ENH: add .iloc attribute to provide location-based indexing #2922

Merged
merged 7 commits into from Mar 7, 2013


@jreback

Updated to include new indexers:

.iloc for pure integer based indexing
.loc for pure label based indexing
.iat for fast scalar access by integer location
.at for fast scalar access by label location

Much updated docs, test suite, and example

In the new test_indexing.py, you can set the _verbose flag to True to get more test output.
Anybody interested can investigate the couple of cases marked no comp, which are where the new
indexing behavior differs from .ix (or .ix doesn't work); this doesn't include cases where a KeyError/IndexError is raised (but .ix lets these through).

Also, I wrote .iloc on top of .ix, but most methods are overridden; it is possible that this lets something through that should not, so please take a look.

Please try this out and let me know if any of the docs or interface semantics are off

@nehalecky

Rad. I am really looking forward to this being merged in master.

@jreback

can you give it a try... let me know of any issues?

@jreback

also...here's a 'feature' that is included

if you specify a 'label' to .loc it will throw a ValueError; this is true EVEN IF the label actually exists and is in the requested axis... 'label' here means anything that is not an integer or a slice

df.loc['a',:] 
df.loc['1',:]

will ALWAYS fail, no matter the index

anyone have an issue with that?

@y-p

Actually, it does what I expect even with a MultiIndex; what is it that you think is missing?
It's consistent with .ix in that picking a single col/row returns a Series rather than a DataFrame,
but I think both cases are a wart.

Great stuff, I'd use this all the time.

@wesm
Python for Data member

I hate to bikeshed, but what are people's thoughts on what this should be called? Either at or loc probably works. We'd also thought about iix ("integer ix"), but I dunno about that.

@jreback

do you think a purely label-based indexer is needed at all?
(I ask because what would you call that, label?)

if not, I'd vote for at or loc; maybe not iix... too much risk of typos

@y-p

I actually like iix.

crop?

@jreback

@y-p I was missing test cases for MultiIndex

@nehalecky

I liked loc, because it is pretty clear as to what it does, but using it is then a departure from the i nomenclature of other existing lookup methods (e.g., irow and icol). Whatever the call, the method names should all fall in line for consistency.

@hugadams

+1 for loc and iloc.

I also think a purely label-based equivalent (e.g. one with strict flags) would be helpful. Of course, that risks overwhelming new users with even more slicing options.

As has already been discussed on the mailing list, when labels are numerical, having a clear option for slicing by row and slicing by value, and raising errors for misuse is quite helpful.

For example, imagine I have spectral data running from 400.0 - 700.0nm. If users are slicing by value, it's too easy for them to do [400:700] when they mean [400.0:700.0] or [400:700.0] and I'd prefer my programs bring this to their attention than assume their intentions.
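A minimal sketch of the distinction, using the `.loc`/`.iloc` pair as it exists in present-day pandas; the wavelength grid and readings are placeholder data, not from the thread:

```python
import numpy as np
import pandas as pd

# Placeholder spectrum: one reading per nanometre from 400.0 to 700.0 nm (301 points)
wavelengths = np.arange(400.0, 701.0, 1.0)
spectrum = pd.Series(np.linspace(0.0, 1.0, len(wavelengths)), index=wavelengths)

# Label-based: endpoints are wavelengths, and both endpoints are included
by_value = spectrum.loc[450.0:460.0]

# Position-based: endpoints are row numbers, and the stop is excluded
by_position = spectrum.iloc[50:61]

# Same rows here only because the grid starts at 400.0 with a 1.0 step
assert by_value.equals(by_position)
```

With one explicit indexer per intent, a by-value slice cannot silently turn into a by-row slice (or vice versa) just because the index holds numbers.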

@jreback

how about iloc?

@jreback

or il?

@y-p

+1 for decree from wes.

@nehalecky

Also like suggested iloc or il. Sounds like integer location. Nice.

Edit: I meant integer.

@hugadams referenced this pull request in hugadams/scikit-spectra on Feb 25, 2013: add tslice function #42 (Closed)

@jreback

any consensus on the name?

il : too short, not clear on meaning
iix : pretty good, except you could accidentally invoke ix
at : ok
iat : ?
loc : ok
iloc : keeps the nomenclature of irow, icol, iget_value
crop : keeps meaning?

so... if we choose between:

loc, iloc , at

@nehalecky

+1 iloc

In detail: I gave this a day to think about and while I do like loc, I am leaning now more towards iloc. This is because if I am ever looking to do any sort of advanced indexing, I immediately hammer out .i<Tab> to view all the methods. I didn't even notice how much I do this until I started to actively think about it and it's clear that I've hardwired myself based off of this existing nomenclature!

On that note, I think using iloc maintains consistency with other indexing methods and also, as a method name, indicates what operation is performed.

@hugadams

Jeff, you had mentioned that you were possibly going to have 2 functions here... one that does strict by-index slicing and one that does strict by-name slicing. If so, why not call the by-index version "iloc" and the by-name version "loc"?

@jreback

I wouldn't call it loc; it would then be too confusing and too easy to use by accident. Is there something that DOESN'T work using .ix?

If you know that you are looking for a label (and have chosen .ix for that reason), this is a label-based indexer. I guess there might be the case where the label doesn't exist, it's an integer, and you have integers as the index, and you want to raise a KeyError instead of returning based on the location?

@stephenwlin

I wouldn't call it loc either, but I think it'd be useful: there are lots of weird corner cases (see #2727, for example) where the choice of integer vs label-based indexing is basically impossible to predict without looking at the code. (The monotonicity of the index has an effect too, which means that two indices with all the same values except one can silently result in different behavior, even if the changed value is not part of the index expression.)

@jreback

so the proposal is then:

iloc strictly integer-based
lab or label strictly label-based
ix try label, fall back to integer (for backward compatibility)

I think we can get this by essentially making ix the label guy and having it raise instead of falling back,
which the new ix can then catch and call iloc

any suggestions for the label getter?

?
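A minimal sketch of the proposed split, emulated with present-day `.loc`/`.iloc`; `ix_like` is a hypothetical name, not a pandas function:

```python
import pandas as pd

def ix_like(obj, key):
    """Hypothetical emulation of the proposed .ix: try labels, then fall back to positions."""
    try:
        # Strict label lookup (depending on the index dtype, a bad label
        # may surface as KeyError or TypeError)
        return obj.loc[key]
    except (KeyError, TypeError):
        # Fallback: reinterpret the same key as an integer position
        return obj.iloc[key]

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
assert ix_like(s, 'b') == 20  # found as a label
assert ix_like(s, 1) == 20    # not a label, so treated as a position
```

This naive version also falls back on an integer index (looking up `1` in an index of `[2, 4, 6]` returns the second row), which is exactly the kind of ambiguity the existing `.ix` tries to avoid on integer indexes.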

@stephenwlin

well, it would be nice if the existing ix behavior were clean enough to factor into a big try: label_based(); except: position_based(), but I really doubt that's the case right now... the logic is pretty spaghetti, and when I tried to make it saner while fixing #2727, lots of tests broke (which is why the final behavior is still weird, although consistent between __getitem__ and __setitem__)... so I think the safest thing to do is to put up with leaving the spaghetti in ix, and with the possible code duplication across three indexers.

i could easily be wrong though :)

@jreback

you are welcome to have a go!

but the basic idea is to have a label-based getter? (and only labels)

@stephenwlin

well, I think the idea is good; I just think it'll necessarily have to be a new, third, code path (even if it's a mostly trivial one)... I'm looking at the positional/label choice logic in _convert_to_indexer right now and I don't think it's possible to encapsulate it with a big try-except (for one thing, there are cases right now where integer indexing is preferred for whatever reason even if label-based indexing could be possible, like in #2727... for example, df.ix[1.0:4] is positional on a floating index even though the underlying Index will happily take an integer label and cast it up to a float if you ask it to, and IIRC it breaks tests when you change this)

@jreback

ok...the obvious question then....

should we make an API change whereby .ix does not fall back (and instead raises)? Then it becomes de facto the label-based indexer

>>> df = DataFrame(np.random.randn(8, 4), index=range(0, 16, 2))
>>> df
           0         1         2         3
0   0.730481  1.529788 -0.581710  0.616712
2   0.453565 -0.859765 -1.271082 -0.818614
4  -0.923394 -0.887154  2.681521 -1.367626
6  -0.214502 -0.044165 -0.027145 -1.204357
8   0.332970 -1.202543  1.275269  0.197951
10 -0.715645  0.262102 -1.950028  0.807226
12  1.283404  0.106109 -1.635975 -0.480751
14  1.582952 -1.137347 -0.163757  0.120903

>>> df.ix[3]
KeyError: 3

#### essentially causing this not to work ####
>>> df.ix[2]
0    0.453565
1   -0.859765
2   -1.271082
3   -0.818614
Name: 2
@stephenwlin

well, it'd have to be a clearly documented API change then, for sure :)

@jreback

@wesm care to chime in?

@stephenwlin

(I'm not sure I understand your example, by the way... which line do you mean is not working? df.ix[2] would still work because it's label-based, and df.ix[3] already fails because the "fallback" logic avoids location-based indexing on an integer index, because of the ambiguity...)

@jreback

you are right, my example is wrong... what I actually mean is that integers would ONLY be used for labels that match, and would not have any positional meaning (float indices are another issue)...

@stephenwlin

btw, I was trying to generate an example of something that's currently a fallback to position-based and would become disallowed and discovered this...

In [91]: df = DataFrame(np.random.randn(8, 4), index=[2, 4, 6, 8, 'null', 10, 12, 14])

In [92]: df
Out[92]: 
             0         1         2         3
2    -0.951922  0.502621  0.346998 -0.784631
4     1.073580 -1.030964  0.783075  0.283990
6    -0.290176  0.236777 -0.042059 -2.613214
8     0.082795  1.196050 -1.983549  2.973472
null -0.345000 -0.998171  1.035359  1.378678
10    1.762567 -0.706646 -1.591715  0.344561
12   -0.219641 -0.786794  0.228584 -0.808036
14    0.411628  0.427615  0.270707  0.160328

In [93]: df.ix[2] # <-- position-based???
Out[93]: 
0   -0.290176
1    0.236777
2   -0.042059
3   -2.613214
Name: 6, Dtype: float64

In [94]: df.ix['null']
Out[94]: 
0   -0.345000
1   -0.998171
2    1.035359
3    1.378678
Name: null, Dtype: float64

so I presume if you had some big CSV where a string happened to pop in where it wasn't supposed to, and you didn't notice it, you'd silently change the semantics of all your integer indexing... :/ (unless there's some explicit data sanitization logic somewhere to handle this...)

all the more reason to provide a way to eliminate the ambiguity if possible (just not sure if breaking ix is worth it vs making a new attribute...)
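With the explicit indexers from this PR, the caller states the intent, and a stray string no longer changes what an integer key means. A sketch in present-day pandas, with deterministic values standing in for the random ones above:

```python
import numpy as np
import pandas as pd

# Same index as above: integer labels plus one stray string
df = pd.DataFrame(np.arange(32).reshape(8, 4),
                  index=[2, 4, 6, 8, 'null', 10, 12, 14])

by_label = df.loc[2]      # the row labelled 2 (the first row)
by_position = df.iloc[2]  # the third row, which is labelled 6

assert by_label.name == 2
assert by_position.name == 6
```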

@jreback

here's the fallback case

>>> df.ix[1]
0   1.073580 
1  -1.030964  
2   0.783075  
3   0.283990
Name: 4, Dtype: float64
@stephenwlin

yeah, but df.ix[2] is worse because it shouldn't even fall back... it's a label that's in the actual index, but it's defaulting to position-based!

@nehalecky

That example is gnarly.

@jreback, I think your suggestion for making .ix purely label-based is a great idea. It took me a while to understand how .ix behaves when first using it, and I still sometimes second-guess myself (thanks @stephenwlin, perfect example). I know decoupling the position-based lookup fallback from .ix might break some people's existing code; however, with a dedicated method for integer-based location lookup, I think this transition wouldn't be so bad if well documented in the API changes.

It's likely this change would make lives easier in the long run. :)

@stephenwlin

hah, this is getting into pathological territory but it gets worse when you extend this to slices:

In [97]: df = DataFrame(np.random.randn(8, 4), index=[2, 4, 6, 8, 'null', 10, 12, 14])

In [98]: df[2:4] # position-based???
Out[98]: 
          0         1         2         3
6  0.844447  0.652539 -0.658187  0.530878
8  1.423447  0.781136 -0.207705 -1.392231

In [99]: df[2:'null'] # label-based...
Out[99]: 
             0         1         2         3
2     0.463806  0.634177  0.578070 -0.617634
4     0.124600 -0.919656 -1.446786 -1.067398
6     0.844447  0.652539 -0.658187  0.530878
8     1.423447  0.781136 -0.207705 -1.392231
null -0.395115  1.058926  0.688837  0.412456
@stephenwlin

maybe a runtime config option is in order?

@jreback

counterpoint to my argument and @nehalecky's is this:

df.ix[0:10,0:5] or df.ix[0:10,0]

and I want this to work

if we shift to purely label based then

df.ix[df.index[0:10],df.columns[0:5]] which is not good
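For comparison, a sketch (present-day pandas, hypothetical labels) showing that the verbose label-only spelling and a purely positional indexer select the same block:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels on both axes
df = pd.DataFrame(np.arange(120).reshape(12, 10),
                  index=[f"r{i}" for i in range(12)],
                  columns=[f"c{j}" for j in range(10)])

# Label-only: positions must first be translated into labels
verbose = df.loc[df.index[0:10], df.columns[0:5]]

# Position-only: the same selection, stated directly
concise = df.iloc[0:10, 0:5]

assert verbose.equals(concise)
assert concise.shape == (10, 5)
```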

@jreback

I think easiest to leave .ix alone to do exactly what its doing (including its pathological, but pretty good guessing)

and provide a .lab that will not guess (and be purely label based)

@stephenwlin

In df.ix[0:10,0:5], do you mean that one of the two axes is position-based and the other is meant to be label-based? Otherwise, if they're both position-based, you could just use df.iloc[0:10,0:5], right?

I could imagine a valid use case for having position-based indexing on one axis and label-based on another, though, which does make it tricky...

@jreback

no, I mean that works now regardless of the indices (unless it's pathological, like an index of [0,2,4,6,8])
yes... I could shift over to .iloc... you are right

@stephenwlin

cool, but mixed position on one axis and label-based on another definitely would be more difficult if ix changed...you'd have to do something like:

df.ix[df.index[0:10], 'start':'end']

(also, this is minor, but I'm fairly sure this would unnecessarily trigger a copy-based take operation rather than an alias-based slice internally, unless it's special-case optimized to recognize that the indices can be converted back into a slice)
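A sketch of the mixed case in present-day pandas, with a hypothetical frame whose columns include `'start'` and `'end'`: either turn the row positions into labels, or turn the column labels into positions with `Index.get_indexer`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(60).reshape(12, 5),
                  columns=['a', 'start', 'mid', 'end', 'z'])

# Row positions converted to labels; label slice on the columns (inclusive)
via_loc = df.loc[df.index[0:10], 'start':'end']

# Column labels converted to positions; a positional slice excludes the stop, hence + 1
first, last = df.columns.get_indexer(['start', 'end'])
via_iloc = df.iloc[0:10, first:last + 1]

assert via_loc.equals(via_iloc)
```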

@jreback

anybody have a non-pathological case where
ix should raise instead of falling back?

@jreback

added slicing with integer lists; are there any other indexers that we should support, aside from integers, slices, and list-likes of integers?

@jreback

I needed it to avoid this behavior, where you specify an illegal slice (where we want an IndexError).
I'm debugging negative indices in any event...

df.iloc[1:5,5:6]
Empty DataFrame
Columns: []
Index: [2, 4, 6, 8]
@stephenwlin

yeah but apparently none of the other python APIs consider that illegal? I don't like it, actually, but apparently you can do ['a','b'][100:1000] (and the same with ndarray) without any complaints and just get an empty list/array back. in any case if you want the check i think it should be [-len, len - 1] for start and [-len - 1, len] for stop (otherwise you allow one-past-the-end start on the positive side and you disallow one-past-the-end end on the negative side, so you'll never be able to include the last element on the negative side)
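A minimal sketch of that check, using exactly the ranges proposed here; `check_slice_bounds` is a hypothetical helper, not part of pandas:

```python
def check_slice_bounds(slc: slice, length: int) -> None:
    """Raise IndexError if a slice endpoint is out of range for an axis of `length`.

    Allowed ranges: start in [-length, length - 1], stop in [-length - 1, length].
    None endpoints (open-ended slices) are always accepted.
    """
    if slc.start is not None and not -length <= slc.start <= length - 1:
        raise IndexError(f"out-of-bounds on slice (start): {slc.start}")
    if slc.stop is not None and not -length - 1 <= slc.stop <= length:
        raise IndexError(f"out-of-bounds on slice (end): {slc.stop}")

x = list('abcdef')
check_slice_bounds(slice(4, 6), len(x))  # accepted; x[4:6] == ['e', 'f']
try:
    check_slice_bounds(slice(4, 10), len(x))
except IndexError as err:
    print(err)  # out-of-bounds on slice (end): 10
```

Plain Python returns `['e', 'f']` for `x[4:10]`; the check trades that leniency for an explicit error.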

@jreback

related to this...does take_nd not deal with negative indices?

In [5]: indices = np.array([-1,1])

In [6]: values
Out[6]: 
array([[-1.41043254,  1.46304052, -1.49114496, -1.51895237],
       [-0.50658412,  0.05933003,  0.33679506, -0.28096733],
       [-0.48683646, -1.95403321, -0.47624251,  0.47096301]])

In [8]: com.take_nd(values,indices,axis=1)
Out[8]: 
array([[        nan,  1.46304052],
       [        nan,  0.05933003],
       [        nan, -1.95403321]])

@stephenwlin

yeah, actually I got bit by this when looking into fixing #2892 on our side...no it doesn't, so it's not a drop-in-replacement for numpy.take and shouldn't be used for user-supplied indices (this doesn't have anything to do with my refactoring, it's always been that way..)

basically, take_nd overloads -1 to mean "empty, fill with fill_value", and all the internal routines that generate indicies use -1 that way, so you can't pass user-supplied indices (which might be negative in the conventional way) directly to take_nd.

@stephenwlin

(also take_nd doesn't really bounds-check in most cases, to speed things up, so that's another reason it can't take user-supplied indices... I am not sure how well we are abiding by this rule, or whether there are user-facing APIs that will let a segfault through this... possibly we should do more checking, actually, since everything in Python is possibly user-facing in theory :/)

@jreback

ok... I can easily fix this (just translate the user indices into positive ones... no biggie)... thanks...

and I am going to keep in the bounds checking (with your change)....I think its better to get an exception (IndexError)
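A minimal numpy sketch of the pitfall and the fix. `take_with_sentinel` is a hypothetical stand-in that mimics the -1-means-missing convention described above (it is not pandas' actual `take_nd`), and `normalize_user_indices` is a hypothetical helper that translates negative positions and bounds-checks them first:

```python
import numpy as np

def take_with_sentinel(values, indexer, axis=0, fill_value=np.nan):
    """Take along `axis`, treating -1 in `indexer` as 'missing, fill with fill_value'."""
    result = np.take(values, indexer, axis=axis).astype(float)
    # -1 entries would have wrapped to the last element; overwrite them with the fill value
    selector = [slice(None)] * values.ndim
    selector[axis] = np.asarray(indexer) == -1
    result[tuple(selector)] = fill_value
    return result

def normalize_user_indices(indices, length):
    """Map user positions (possibly negative) to non-negative ones; IndexError if out of range."""
    indices = np.asarray(indices, dtype=np.intp)
    if ((indices < -length) | (indices >= length)).any():
        raise IndexError("positional indexer is out-of-bounds")
    return np.where(indices < 0, indices + length, indices)

values = np.arange(12.0).reshape(3, 4)
naive = take_with_sentinel(values, [-1, 1], axis=1)  # column 0 comes back all NaN
safe = take_with_sentinel(values, normalize_user_indices([-1, 1], values.shape[1]), axis=1)
# safe holds columns 3 and 1, matching numpy's own negative-index semantics
```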

@stephenwlin

if you're looking into this, you might want to be on the lookout for any other places where we're passing user-supplied indices directly to take_nd without sanitization, since all of those are potential segfaults (I actually started converting everything to take_nd locally because of #2892, until things started breaking with negative and/or out-of-range indices... by then it was so FUBARed I had to revert it all...)

and yeah, I personally like bounds checking slices better too...I don't know why Python lists allow arbitrarily out-of-range slice indices. It's like allowing an STL container to accept an arbitrary iterator out of nowhere in C++, yuck :D

@jreback

Anyone have an issue with this? (it's in the top comment as well)
this is what @stephenwlin and I have been talking about last bunch of comments

There is one significant departure from standard python/numpy slicing semantics.
python/numpy allow slicing past the end of an array without an associated error.

# these are allowed in python/numpy.
In [43]: x = list('abcdef')

In [44]: x[4:10]
Out[44]: ['e', 'f']

In [45]: x[8:10]
Out[45]: []

# this will raise
>>> df.iloc[:,3:6]
IndexError: out-of-bounds on slice (end)
@jreback

.ix doesn't currently handle negative indices in any event

(Pdb) p df
           0         2         4         6
0   0.033202 -0.431927 -0.252642 -0.133124
2  -0.647470  0.541821 -1.734298 -0.014205
4   0.268948 -0.528600  1.233215  0.485311
6   0.277405  0.842330  1.904427  1.088825
8   0.099685  0.511559 -0.044274  2.062759
10 -1.150009 -0.992674 -1.091558 -0.500971
12  0.296256  0.607382  0.372635  0.490391
14 -1.412898  0.814839  0.353179 -0.277610
16  0.521936  0.428918 -1.397244  1.392659
18  0.192156 -0.569827 -0.003237 -1.142187
(Pdb) p df.ix[-1]
*** KeyError: KeyError(-1,)
(Pdb) p df.ix[:,-1]
*** KeyError: KeyError(u'no item named -1',)
@stephenwlin

(minor aside, but I think it allows negative indices when in integers-as-positions mode instead of integers-as-labels mode...I'm pretty sure I saw them being exercised in a few tests, always with string column name though)

@nehalecky

Nice discussion while I was away. I also thought about the case with mixed index-type, after posting my last comment and leaving to go buy polenta, and that's actually a pretty big oversight on my part.

It's actually really convenient to have:

In [10]: df = DataFrame(np.random.randn(6, 4), \
   ....:         index=['a', 'b', 'c', 'd', 'e', 'f'], \
   ....:         columns=['ABC', 'BCD', 'CDE' ,'DEF'])

and do things like:

In [19]: df.ix['c':'e',[1,3]]
Out[19]: 
        BCD       DEF
c -1.610068  0.362089
d  1.236012 -0.778627
e  0.447955 -0.163005

As for the negative indices, I get thrown off often when I try and do what @jreback just demonstrated, but still, am able to get things out like this (just like @stephenwlin's comment):

In [20]: df.ix['c':'e',-1]
Out[20]: 
c    0.362089
d   -0.778627
e   -0.163005
Name: DEF, Dtype: float64

and also like this:

In [27]: df.ix['c':'e',-3:-1]
Out[27]: 
        BCD      CDE
c -1.610068 -1.50532
d  1.236012 -0.22282
e  0.447955 -2.33088

Also, just to demonstrate some of the behavior more:

In [24]: df.ix[-2]
Out[24]: 
ABC    0.077143
BCD    0.447955
CDE   -2.330880
DEF   -0.163005
Name: e, Dtype: float64

In [25]: df.ix[-2:-1]
Out[25]: 
        ABC       BCD      CDE       DEF
e  0.077143  0.447955 -2.33088 -0.163005

And it's for this reason that I have a love/hate relationship with .ix; it's been a few releases and I feel like we are still getting to know each other. :)

Still, I am now not certain about my vote for changing its fallback behavior. But a label-based-only lookup would be nice.

EDIT: Ranges were backwards, sorry.
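Once `.loc` and `.iloc` are strict, mixed forms like the ones above need one axis translated explicitly. A sketch in present-day pandas, with deterministic values in place of the random ones shown:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24.0).reshape(6, 4),
                  index=['a', 'b', 'c', 'd', 'e', 'f'],
                  columns=['ABC', 'BCD', 'CDE', 'DEF'])

# Row labels with column positions: turn the positions into labels
picked = df.loc['c':'e', df.columns[[1, 3]]]
last_col = df.loc['c':'e', df.columns[-1]]  # a Series named 'DEF'

# ...or turn the row labels into positions instead
start, stop = df.index.get_indexer(['c', 'e'])
same = df.iloc[start:stop + 1, [1, 3]]

assert picked.equals(same)
```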

@stephenwlin

I think df.ix['c':'e',-1:-2] should be df.ix['c':'e',-1:-2:-1] actually, to get what you want? :) Unless I'm misinterpreting you.

@nehalecky

You're right, that was a typo. Thank you. Fixed in edited comment. :)

@jreback

@nehalecky do you think you can come up with a case where label-based is actually different from .ix? So far I can't think of one (unless we should COMPLETELY ignore integers).

so df.lab['c':'e',0:2] will fail because you are using a slice that is not in the index

@nehalecky

@jreback, off the top of my head, for a pure label-based lookup, no. I can't think of a case where it would be different. But you bring up an interesting point about ignoring integers. What about integer timestamps that haven't been cast as a DatetimeIndex?

And yes, with what we have been discussing df.lab['c':'e',0:2] would fail because the slice doesn't actually exist in index.

Also, as it feels like we're touching on some fundamental behavior, I thought a little list here could be helpful to visualize the naming of similar features.

the indexings and proposed additions:

  • ix
  • iloc
  • lab

and the other associated lookup methods:

  • irow
  • icol
  • iget_value
  • iget
  • get (series only)
  • get_value

Does this all feel consistent and/or intuitive? (Not to be too over the top, I thought it could be a useful discussion).

@jreback

good list

I am not in favor of adding lab, I think .ix IS the label based index
(but reversing myself from earlier, maybe rename ix to loc)?
obviously need to keep ix too, just keeping the names in common

I WOULD be in favor of deprecating irow, icol in favor of iloc

and replace get_value/set_value/iget_value with the following

.at[row_label,col_label] (for labels only)
.iat[row,col] (for integers only)

so unified indexing then would be:

loc, iloc (general access)
at, iat (fast access for single values)

works for both getting and setting
with the I indicating integer indexing

(get is necessary to maintain dict interface on series, so excluding from this)

pretty radical but makes things consistent
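The `.at`/`.iat` pair proposed here is available in present-day pandas. A minimal sketch of the scalar get/set idiom on a hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2),
                  index=['x', 'y', 'z'], columns=['A', 'B'])

# The same cell, addressed by label and by integer position
assert df.at['y', 'B'] == df.iat[1, 1] == 3

# Both accessors also set single values in place
df.at['z', 'A'] = 100
df.iat[0, 0] = -1
```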

@wesm

Re: get_value/set_value, I actually at one point sat down and optimized the micro/nanoseconds out of those functions, esp get_value. if we ever get dataframe internals pushed down into C/cython then this could be optimized further.

@jreback's proposal seems to make sense to me. I always found ix sort of arbitrary and it would be nice to have a consistent idiom. Obviously ix is never going to go away and can't even get deprecated at this point due to how much it's used in my book lol.

@stephenwlin

I am not in favor of adding lab, I think .ix IS the label based index
(but reversing myself from earlier, maybe rename ix to loc)?
obviously need to keep ix too, just keeping the names in common

So what about the situations right now where ix favors positions over labels even when labels could work (like #2727 or the weird mixed string-integer case above)? Just leave them as-is and have loc adopt the same behavior?

@stephenwlin

fwiw regardless of what happens with ix I don't think any new indexer property (unless it's just an alias for ix) should have any mixing of label-based and position-based semantics whatsoever, even if it's just a big try: labels() except: positions() block around everything, because no matter how hard you try to make the transition between the two regimes clean you'll always find cases where a minor change in data can silently break code (in which case it's better to have an error instead).

simple but somewhat realistic example: if you have an integer index, an integer slice will work as label-based if both endpoints are present in the index or if the index is monotonic; but in the latter case, if a label that breaks monotonicity is added anywhere (which might happen because of a single bad row), suddenly you can't use labels anymore and you'll get entirely different results.

you can, of course, special-case this with more complicated rules about monotonicity detection, etc., but I suspect that going down that route will eventually lead to the same situation we're in now, except you'll have two separate indexers with unpredictable semantics that you have to worry about not breaking for backward compatibility, rather than one :O so better to just make a hard rule against any fallback, imho, even if it's less convenient this way.

@stephenwlin

Actually, going to contradict what I just said, but I could imagine the following maybe working (this is real slippery slope territory...)

  1. try labels
  2. if that fails, check if index has any numerical data (possibly even including datetime64[ns]) whatsoever, re-raise if it does
  3. try positions

The idea being that you only get to step 3 if there's no possibility the integers you provided were really meant to be labels, since your labels are completely non-numeric. It's pretty stringent, but that would preserve the use case of integer indexing on strings, such as the one @nehalecky provided. (It might be expensive unless we cache the result of the check in step 2 somehow, though...I'm not sure if inferred_type captures this 100% accurately already or not)

Maybe there's a way to break this, too, though, with a pathological case? can't really think of one off the top of my head, but who knows? :D

(Also, possibly including datetime64[ns] as numeric is too stringent...probably you'll never find a case where the range of timestamp values overlaps the range of possible positional indexes...but it's a slippery slope the more and more cases you start considering...)
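A minimal sketch of the three-step rule, written on top of present-day `.loc`/`.iloc`. `has_numeric_labels` and `guarded_lookup` are hypothetical names, and treating any `numbers.Number` or `np.datetime64` label as "numeric" is an assumption about where to draw the line:

```python
import numbers
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

def has_numeric_labels(index: pd.Index) -> bool:
    """Step 2 check: could any label plausibly have been meant as an integer?"""
    if is_numeric_dtype(index) or is_datetime64_any_dtype(index):
        return True
    # object-dtype indexes can mix types, so inspect the labels themselves
    return any(isinstance(v, (numbers.Number, np.datetime64)) for v in index)

def guarded_lookup(obj, key):
    try:
        return obj.loc[key]                  # step 1: try labels
    except (KeyError, TypeError):
        if has_numeric_labels(obj.index):    # step 2: ambiguous, so re-raise
            raise
        return obj.iloc[key]                 # step 3: try positions

letters = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
assert guarded_lookup(letters, 1) == 20      # purely non-numeric labels: fallback allowed
```

As noted, the step 2 check scans every label on each failed lookup, so a real implementation would want to cache it.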

@jreback

should iloc/loc accept boolean indexers?
I think yes, and it's easy to do

so going to show an example later but
I think these are the indexing possibilities

loc : labels, slice of labels, listlike of labels, boolean
iloc : positions, slice of integers, neg integers, listlike of integer, boolean

note that float/datetime indices are by definition label-based

anything missing?
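A sketch of boolean indexers with both attributes in present-day pandas (hypothetical data). In recent versions `.iloc` refuses a boolean Series and expects a plain boolean array, hence the `to_numpy()`:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 3, 8], 'b': list('wxyz')}, index=[10, 20, 30, 40])
mask = df['a'] > 2                        # boolean Series aligned on the index

by_label = df.loc[mask]                   # .loc takes the boolean Series directly
by_position = df.iloc[mask.to_numpy()]    # .iloc takes a plain boolean array

assert by_label.equals(by_position)
```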

@janschulz

Thanks a lot for introducing this. .ix was and still is a lot of headache for me (and others, judging by the bug reports about integer slicing), and if that is replaced by two attributes, one for "use this for labels" and one for "use that for positions", I'm very happy :-)

Regarding .ix: why not add a deprecation warning (printed only once per session, or maybe even always in "fixing" mode) that states when the attribute will be removed and what the replacement is? That way users of the book have an idea what to look for as a replacement, and @wesm has an excuse to update the book with all the rest of the shiny new features pandas has gained since the book was printed :-) I would probably buy it again...

@jreback

not sure about deprecations yet (they will probably be FutureWarnings)... maybe for 0.11 we will just introduce the new functionality;
will see as we get closer to release

@nehalecky

Waking up to such a nice discussion, awesome.

I am a little shy about throwing around radical ideas, but I do enjoy them! I don't feel I have enough clout yet with pandas development to make such huge suggestions, but I am totally in sync with the ideas in this thread...

👍 re @jreback's radical proposal:

so unified indexing then would be:

loc, iloc (general access)
at, iat (fast access for single values)

I believe this would clean up the namespace for similar functionality, and it also establishes a clear nomenclature for future functionality (and perhaps even some spillover into other packages, which would be fun to see).

Regarding .ix, my mixed feelings have cleared up a bit after thinking about its use in:

  • @wesm's (grand) book,
  • dependencies in existing code,
  • the convenience it provides as a quick (and sometimes dirty) indexer

and with this in mind I feel it should be left in. But perhaps with better-documented behavior (or slightly restructured), as per @stephenwlin's suggestion (if I understood him correctly):

the following maybe working (this is real slippery slope territory...)

  1. try labels
  2. if that fails, check if index has any numerical data (possibly even including datetime64[ns]) whatsoever, re-raise if it does
  3. try positions

We could think of it as the convenient yet sometimes unpredictable, beloved, legacy indexer, with a healthy YMMV disclaimer attached for the pathological cases out there.

👍E+06 re @jreback's suggestion for boolean indexers, just awesome. I look forward to examples, but the list of the indexing possibilities is grand:

loc : labels, slice of labels, listlike of labels, boolean
iloc : positions, slice of integers, neg integers, listlike of integer, boolean

There is, however, one more thing I'd wish to see, and I'll follow up in an additional comment...

@stephenwlin

and with this in mind I feel it should be left in. But perhaps with better-documented behavior (or slightly restructured), as per @stephenwlin's suggestion (if I understood him correctly):

Well, if .loc is meant to be purely labels and will not fall back to positions, then this wouldn't be needed... just let .ix remain as-is for backward compatibility (well, at least until people have time to migrate over to .loc and .iloc; we definitely can't do it all at once). I only meant that as a suggestion because it seemed like @jreback was in favor of simply renaming .ix to .loc and keeping it as-is, but I might have misunderstood what was being proposed... I was suggesting this as alternate semantics for a new .loc if some kind of fallback behavior was really deemed necessary.

@jreback

will see how far I get, but already created iat, at (pretty trivial actually), iloc basically done, loc is right now a sub-class of ix, so can modify down the road. The testing is the trickiest part here, trying to generalize.

@stephenwlin

so to be clear .loc won't have any fallback to positions?

@jreback

in theory yes, but it's not written yet (it's a subclass of _NDFrameIndexer, so it can reuse whatever), but I have to rip apart the code... I'll push a commit in a bit...

@stephenwlin

if you haven't written it yet, it might be easier just to add a boolean flag _no_positional_fallback (but more terse) to _NDFrameIndexer and pass it through __init__ along with the name...it'll be easier to reuse code without refactoring everything into smaller functions.

@jreback

yep...

though that has the 'quirks' of ix...
I think we ought to be quite strict, for example:
(this is issue #2911)

df = DataFrame([[1,2], [3,4]], index=['X', 'Y'], columns=['A', 'B'])
df.ix[['X', 'Z'], :]

Out[1]: 
    A   B
X   1   2
Z NaN NaN    < -- why does this not error?

If the keys that are passed (once expanded by slicing) are not WHOLLY contained within the index,
it should raise a KeyError?

so this would also be an error (works in ix though)

df = DataFrame(np.random.rand(4,4),index=range(0,8,2))
df.loc[0:3,:]
@janschulz

The last "why does this not error" seems to be similar to #2033...

@stephenwlin

IIRC it's inheriting that behavior from reindex, where it's explicitly allowed.
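This is roughly where pandas ended up: in recent versions `.loc` raises `KeyError` when a list of labels contains a missing one, and `reindex` remains the explicit way to ask for NaN-filled rows. A sketch using the frame from #2911:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['X', 'Y'], columns=['A', 'B'])

# Explicitly request the missing label; its row is filled with NaN
filled = df.reindex(['X', 'Z'])

# Strict label indexing: the missing label 'Z' is an error
try:
    df.loc[['X', 'Z'], :]
except KeyError as err:
    print('KeyError:', err)
```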

@jreback

so all in favor of making .loc STRICT, raise your mouse!
could be softened a bit by only requiring endpoints to be in the index as well...

@janschulz

Strict +1! Either label or index position, no fallback. And put only this new ways into the "newbie" documentation :-)

@nehalecky

+1 .loc STRICT (and leaving .ix as is).

@jreback

just so you understand....the following will also be disallowed:

df = DataFrame(np.random.rand(4,4),columns=['A','B','C','D'])
df.loc[0:2,0:2]

but

df.loc[0:2,df.columns[0:2]]

would of course work
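A sketch of both spellings in present-day pandas (random data as above). The integer column slice is rejected, while the row slice `0:2` is read as labels on the default integer index and, being label-based, includes its stop:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=['A', 'B', 'C', 'D'])

try:
    df.loc[0:2, 0:2]  # integers are not labels on the string columns
except (TypeError, KeyError) as err:
    print(type(err).__name__, err)

sub = df.loc[0:2, df.columns[0:2]]  # rows labelled 0, 1, 2; columns A, B
assert sub.shape == (3, 2)
```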

@jreback

@nehalecky .ix will be the same

@nehalecky

Regarding what I also wanted to see. Perhaps this should be posted in a separate feature request, as I think it would likely bring up some complications to what is being proposed, but while we are on the topic of boolean indexers for labels...

Maybe like some others here, I need the ability to filter on index labels using patterns/regexes. For some of the work I do, this is a really helpful feature, allowing for a flexible approach to selecting/grouping across non-homogeneous data sets, where the construct of a hierarchical index is a little too rigid for the application. Before scoffing at such a bold suggestion, let me just explain a bit. :)

Right now, I achieve this by monkey patching with the following code (disclaimer, nothing pretty here):

import pandas as pd

def lab_select(self, pat, axis=1):
    # keep the columns (axis=1) or rows (axis=0) whose labels contain the pattern `pat`
    if axis == 1:
        s = pd.Series(self.columns)
        return self[s[s.str.contains(pat)]]
    else:
        s = pd.Series(self.index)
        return self.ix[s[s.str.contains(pat)]]

pd.DataFrame.lab_select = lab_select

Heterogeneous data

One scenario where I find this useful is working with mixed vector and scalar datasets. For example, I have a simple time series that I store into a pandas DataFrame, with columns like:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 36320 entries, 2013-02-24 23:50:53+00:00 to 2013-02-25 00:21:08.950000+00:00
Data columns:
lat                   36320  non-null values
long                  36320  non-null values
alt                   36320  non-null values
speed                 36320  non-null values
course                36320  non-null values
verticalAccuracy      36320  non-null values
horizontalAccuracy    36320  non-null values
acceleration_X        36320  non-null values
acceleration_Y        36320  non-null values
acceleration_Z        36320  non-null values
Heading_X             36320  non-null values
Heading_Y             36320  non-null values
Heading_Z             36320  non-null values
TrueHeading           36320  non-null values
MagneticHeading       36320  non-null values
HeadingAccuracy       36320  non-null values
Rotation_X            36320  non-null values
Rotation_Y            36320  non-null values
Rotation_Z            36320  non-null values
dtypes: float64(15), int64(4)

You might notice that the labeling for the parameters is consistent, and I am able to grab, say, all acceleration data via a simple df.lab_select('accel'). 3D vector data share a common subscript, allowing you to grab all vector data with df.lab_select('_'), or similarly, all x component vector data via df.lab_select('_X'). My life feels lighter when I do these things, I swear.
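The same column selections can be sketched with DataFrame.filter, which matches labels by substring (like=) or by regular expression (regex=) along a chosen axis. The column names below are an assumed subset of the ones listed above, and the data is random:

```python
import numpy as np
import pandas as pd

cols = ["lat", "long", "acceleration_X", "acceleration_Y", "acceleration_Z",
        "Heading_X", "Heading_Y", "Heading_Z", "TrueHeading"]
df = pd.DataFrame(np.random.rand(3, len(cols)), columns=cols)

accel = df.filter(like="accel", axis=1)    # substring match, like lab_select('accel')
vectors = df.filter(like="_", axis=1)      # every 3D vector component
x_parts = df.filter(regex="_X$", axis=1)   # only the X components
```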

Finite element data analysis

Outside of vector and scalar quantities, another situation where I've found this functionality extremely helpful is in using pandas in pre- and post-processing of input and output data from legacy finite element simulators (legacy as in, written in Fortran 77). Even while still being generated, the keys that represent each element and element connection are alphanumeric, and have an underlying construct that represents different types of geologic strata and/or structures in a finite element grid. We leverage this underlying construct to analyze unique element groups and/or connections. Right now, this is done via some big perl scripts with regexes, but it's so much nicer to work with pandas.

I've tried fitting this into a hierarchical index, but find that trying to come up with a logic for the many possible combinations of connections between different elements makes it become cumbersome quickly. Instead I just resort to grouping them with a simple regex/pattern, and this works grand.

Anyway, implementing this is another question, and I don't see how it could be implemented under the proposed .loc as an advanced boolean for labels without messing up intended behavior (i.e., in .loc['regex/pattern'] the pattern could itself exactly match a label, or part of one). Perhaps a better option is to expose some of the str methods (like .contains) on the Index class, so one could do df.loc[df.index.contains('regex/pattern')], but I still like the idea of it being a core functionality of the indexers. :)

Again, this could be totally appropriate for a separate issue, or perhaps isn't liked at all. As always, your thoughts are appreciated. Let me know if something isn't clear…. Thank you!

@jreback

pretty sure filter/select does this already (frame methods)

(as a side note, prob going to extend filter to other axes anyhow)

I do similar things, but I have homogeneous axes, so I use a Panel

@jreback

if you want to do this directly, something like this would work:

import re
import numpy as np

matcher = re.compile(regex)
def crit(i):
    return bool(matcher.search(i))

# boolean mask of the matching labels, then the matching labels themselves
mask = np.asarray([crit(label) for label in df.index])
matches = df.index[mask]
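In later pandas versions an Index also exposes the .str accessor, so the regex mask from the snippet above can be built without a Python-level loop and passed straight to .loc, which is close to what was asked for earlier. The element keys below are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical alphanumeric element keys
df = pd.DataFrame(np.arange(8).reshape(4, 2),
                  index=["SAND01", "SAND02", "CLAY01", "FAULT1"],
                  columns=["p", "q"])

mask = df.index.str.contains(r"^SAND")   # boolean mask over the row labels
sand = df.loc[mask]

# Same selection through DataFrame.filter on the row axis
sand_filtered = df.filter(regex=r"^SAND", axis=0)
```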
@nehalecky

Wow. Somehow I haven't been privy to filter! Thanks for the tip and the code suggestions, it seems like they should work fine. In the meantime, look forward to seeing the new commits! Thanks @jreback!

@janschulz

@jreback: yes, I would expect the first example to fail (if loc is the attribute for "by label"): the attributes should dictate whether the slice is interpreted as integer locations (iloc) or labels (loc).

This should also fail:

import numpy as np
from pandas import DataFrame

df = DataFrame(np.random.rand(4,4),columns=['A','B','C','D'], index=['A','B','C','D'])
df.loc[0:2,df.columns[0:2]]

If you want to slice one axis by label and the other by integer position, you would need two steps.

df.loc[:,"A":"B"].iloc[0:2,:]

BTW: "lloc" for "label location" and "iloc" for "integer location"?

@jreback

@JanSchulz

(Pdb) df
          A         B         C         D
A  0.963910  0.828651  0.284636  0.350293
B  0.706649  0.146676  0.302511  0.226628
C  0.615977  0.467022  0.522566  0.528801
D  0.641034  0.581319  0.380756  0.270585
(Pdb) df.loc[:,"A":"B"].iloc[0:2,:]
          A         B
A  0.963910  0.828651
B  0.706649  0.146676
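If the two-step chain feels roundabout, one alternative (a sketch, not the only way) is to convert the positional part into labels first, so that a single .loc call does the whole selection:

```python
import numpy as np
import pandas as pd

labels = ["A", "B", "C", "D"]
df = pd.DataFrame(np.random.rand(4, 4), columns=labels, index=labels)

two_step = df.loc[:, "A":"B"].iloc[0:2, :]

# Turn row positions 0..1 into their labels, then select by label only
one_step = df.loc[df.index[0:2], "A":"B"]
```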
@jreback

anyone had a chance to try it out?

@nehalecky

Hey @jreback, just saw these recent commits, they look really good. I'll fetch these updates and check them out soon. BTW, this:

(Pdb) df.loc[:,"A":"B"].iloc[0:2,:]
          A         B
A  0.963910  0.828651
B  0.706649  0.146676

... was grand to see. I look forward to implementing these new little friends, .loc and .iloc, in my own work.

@janschulz janschulz commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
@@ -32,6 +32,56 @@ attention in this area. Expect more work to be invested higher-dimensional data
structures (including Panel) in the future, especially in label-based advanced
indexing.
+Choice
+------
+Starting in 0.11.0, indexing has had a number of user-requested additions in
+order to support more explicit location based indexing. Pandas now supports
+three types of multi-axis indexing.
+
+ - ``.loc`` is strictly label based, will raise ``KeyError`` when the items are not found,
+ allowed inputs are:
+
+ - A single label, e.g. ``5`` or ``'a'`` (note that an integer *is* a label too)

"(Note that 5 is used as a label of an integer based index)"?

@janschulz janschulz commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
@@ -32,6 +32,56 @@ attention in this area. Expect more work to be invested higher-dimensional data
structures (including Panel) in the future, especially in label-based advanced
indexing.
+Choice
+------
+Starting in 0.11.0, indexing has had a number of user-requested additions in
+order to support more explicit location based indexing. Pandas now supports
+three types of multi-axis indexing.
+
+ - ``.loc`` is strictly label based, will raise ``KeyError`` when the items are not found,
+ allowed inputs are:
+
+ - A single label, e.g. ``5`` or ``'a'`` (note that an integer *is* a label too)
+ - A list or array of labels ``['a', 'b', 'c']``
+ - A slice object with labels ``'a':'f'``
+ - A boolean array
+
+ - ``.iloc`` is strictly integer based, will raise ``IndexError`` when the items are not found

.iloc is strictly integer +position+ based +(from 0 to length-1 of this axis)+, will raise IndexError when the items are not found +(i.e. the underlying structure is not as long)+

@janschulz janschulz commented on the diff Mar 2, 2013
doc/source/dsintro.rst
@@ -437,8 +437,8 @@ The basics of indexing are as follows:
:widths: 30, 20, 10
Select column, ``df[col]``, Series
- Select row by label, ``df.xs(label)`` or ``df.ix[label]``, Series

remove xs example? Where is xs actually still needed?

@janschulz janschulz and 1 other commented on an outdated diff Mar 2, 2013
RELEASE.rst
@@ -120,6 +127,8 @@ pandas 0.11.0
- Support null checking on timedelta64, representing (and formatting) with NaT
- Support setitem with np.nan value, converts to NaT
+ - ``icol`` with negative indicies was return ``nan`` (see GH2922_)

-was return-+returned+?
Was the nan a bug or is that now a feature? Maybe describe the new behaviour?

@jreback
jreback added a note Mar 2, 2013

this wasn't working (with a list of negative indices), but I'm pseudo-deprecating icol anyhow (it's fixed now in any event)

In [11]: df
Out[11]: 
          0         1         2         3
0  0.143576  0.957356  0.990652  0.852501
1  0.187392  0.722276  0.761076  0.037906
2  0.793335  0.975346  0.251745  0.616150
3  0.020577  0.438545  0.872616  0.491270
4  0.480611  0.318490  0.563682  0.994372
5  0.795428  0.786869  0.096081  0.300232

In [12]: df.icol([1,2])
Out[12]: 
          1         2
0  0.957356  0.990652
1  0.722276  0.761076
2  0.975346  0.251745
3  0.438545  0.872616
4  0.318490  0.563682
5  0.786869  0.096081

In [13]: df.icol([-1,-2])
Out[13]: 
    3         2
0 NaN  0.000000
1 NaN  0.990652
2 NaN  0.761076
3 NaN  0.251745
4 NaN  0.872616
5 NaN  0.563682
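With .iloc, negative positions count from the end of the axis, as in plain Python sequences. A small sketch on a frame shaped like the one above (random data, recent pandas assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 4))

last_two = df.iloc[:, [-1, -2]]   # columns at positions 3 and 2, in that order
last_row = df.iloc[-1]            # the last row, as a Series
```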

@janschulz janschulz commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
+ allowed inputs are:
+
+ - A single label, e.g. ``5`` or ``'a'`` (note that an integer *is* a label too)
+ - A list or array of labels ``['a', 'b', 'c']``
+ - A slice object with labels ``'a':'f'``
+ - A boolean array
+
+ - ``.iloc`` is strictly integer based, will raise ``IndexError`` when the items are not found
+ allowed inputs are:
+
+ - An integer e.g. ``5``
+ - A list or array of integers ``[4, 3, 0]``
+ - A slice object with ints ``1:7``
+ - A boolean array
+
+ - ``.ix`` support mixed integer and label based access. It will defer to label based, but

defer? Isn't it "first try label based position"?

@janschulz janschulz commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
@@ -32,6 +32,56 @@ attention in this area. Expect more work to be invested higher-dimensional data
structures (including Panel) in the future, especially in label-based advanced
indexing.
+Choice
+------
+Starting in 0.11.0, indexing has had a number of user-requested additions in
+order to support more explicit location based indexing. Pandas now supports
+three types of multi-axis indexing.
+
+ - ``.loc`` is strictly label based, will raise ``KeyError`` when the items are not found,
+ allowed inputs are:
+
+ - A single label, e.g. ``5`` or ``'a'`` (note that an integer *is* a label too)
+ - A list or array of labels ``['a', 'b', 'c']``
+ - A slice object with labels ``'a':'f'``

"+(Contrary to usual python slices, both the start and the endpoint are returned!)+"
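A minimal illustration of that point, using an arbitrary string index: a label slice with .loc returns both endpoints, while a positional slice with .iloc follows the usual Python rule and excludes the stop:

```python
import pandas as pd

s = pd.Series(range(6), index=list("abcdef"))

by_label = s.loc["a":"c"]   # 'a', 'b' and 'c': both endpoints included
by_position = s.iloc[0:2]   # positions 0 and 1: the stop is excluded
```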

@janschulz janschulz commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
+ - A list or array of labels ``['a', 'b', 'c']``
+ - A slice object with labels ``'a':'f'``
+ - A boolean array
+
+ - ``.iloc`` is strictly integer based, will raise ``IndexError`` when the items are not found
+ allowed inputs are:
+
+ - An integer e.g. ``5``
+ - A list or array of integers ``[4, 3, 0]``
+ - A slice object with ints ``1:7``
+ - A boolean array
+
+ - ``.ix`` support mixed integer and label based access. It will defer to label based, but
+ fallback to integer access. ``.ix`` is the most general and will support any of the inputs
+ to ``.loc`` and ``.iloc``, as well as support for floating point label schemes.
+

+As using integer slices with ix have different behavior depending on whether the slice is interpreted as integer location based or label position based, it's usually better to be explicit and use iloc (integer location) or loc (label location).+
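To see why being explicit matters, consider an integer index whose labels do not line up with the positions (the labels here are arbitrary). The same integer key selects different items depending on the indexer:

```python
import pandas as pd

s = pd.Series(["x", "y", "z"], index=[2, 0, 1])

by_label = s.loc[0]      # the item whose label is 0
by_position = s.iloc[0]  # the item at position 0
```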

@janschulz janschulz commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
+ - A slice object with labels ``'a':'f'``
+ - A boolean array
+
+ - ``.iloc`` is strictly integer based, will raise ``IndexError`` when the items are not found
+ allowed inputs are:
+
+ - An integer e.g. ``5``
+ - A list or array of integers ``[4, 3, 0]``
+ - A slice object with ints ``1:7``
+ - A boolean array
+
+ - ``.ix`` support mixed integer and label based access. It will defer to label based, but
+ fallback to integer access. ``.ix`` is the most general and will support any of the inputs
+ to ``.loc`` and ``.iloc``, as well as support for floating point label schemes.
+
+Multi-axes gettting uses the following notation, using an example with ``.loc`` but applies to all

-gettting-+getting+?
Or better: "Getting values from objects with multiple axes uses the following notation (using .loc as an example, but it applies to .iloc and .ix as well)"?

@janschulz janschulz commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
@@ -32,6 +32,56 @@ attention in this area. Expect more work to be invested higher-dimensional data
structures (including Panel) in the future, especially in label-based advanced
indexing.
+Choice
+------
+Starting in 0.11.0, indexing has had a number of user-requested additions in

From my understanding "indexing" is an internal word; I see these attributes as "getting portions of the source data structure". I understand "indexing" as what Google does with the web. :-)

@janschulz janschulz commented on the diff Mar 2, 2013
doc/source/indexing.rst
@@ -529,27 +731,6 @@ numpy array. For instance,
dflookup.lookup(xrange(0,10,2), ['B','C','A','B','D'])
-Advanced indexing with integer labels

I think this section should still be included in the ".ix" portion of the documentation.

Maybe a new headline "Problems with .ix and integer index" and add a reference to .loc and .iloc?

@janschulz janschulz commented on an outdated diff Mar 2, 2013
pandas/core/indexing.py
+
+ def _getitem_axis(self, key, axis=0):
+ raise NotImplementedError()
+
+ def _getbool_axis(self, key, axis=0):
+ labels = self.obj._get_axis(axis)
+ key = _check_bool_indexer(labels, key)
+ inds, = key.nonzero()
+ try:
+ return self.obj.take(inds, axis=axis)
+ except (Exception), detail:
+ raise self._exception(detail)
+
+class _LocIndexer(_LocationIndexer):
+ """ purely label based location based indexing """
+ _valid_types = "labels (MUST BE INCLUSIVE), slices of labels, slices of integers if the index is integers, boolean"

_valid_types = "label, slices of labels (BOTH endpoints included! Can be slices of integers if the index is integers), listlike of labels, boolean array"
?
[updated to have similar wording as .iloc below]

@jreback

@JanSchulz docs updated.....

@jreback

also if someone has some interesting multi-index use cases

FYI - this is where ix is really useful, as iloc/loc restrict you to integers/labels on ALL levels (that you want to select)

@nehalecky nehalecky commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
@@ -32,6 +32,78 @@ attention in this area. Expect more work to be invested higher-dimensional data
structures (including Panel) in the future, especially in label-based advanced
indexing.
+Choice
+------
+
+Starting in 0.11.0, object selection has had a number of user-requested additions in
+order to support more explicit location based indexing. Pandas now supports
+three types of multi-axis indexing.
+
+ - ``.loc`` is strictly label based, will raise ``KeyError`` when the items are not found,
+ allowed inputs are:
+
+ - A single label, e.g. ``5`` or ``'a'``
+
+ (note that ``5`` when used as a *label* of an integer based index)

Hey @jreback, unless I am misunderstanding some of the terminology, I think you mean label based index here. Also, I think it helps to be a bit more explicit (being that the use of an integer as a label is subtle, yet oh so important, distinction). Something like:
(note that ``5`` when used as a *label* of a label based index. This use is **not** an integer position along index).
Let me know what you think about that.

@nehalecky nehalecky commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
+
+Pandas provides a suite of methods in order to have **purely label based indexing**.
+This is a strict inclusion based protocol. **ALL** of the labels for which you ask,
+must be in the index or a ``KeyError`` will be raised!
+
+When slicing, the start bound is *included*, **AND** the stop bound is *included*.
+Integers are valid labels, but they refer to the label *and not the position*.
+
+The ``.loc`` attribute is the primary access method.
+
+The following are valid inputs:
+
+ - A single label, e.g. ``5`` or ``'a'``
+
+ (note that ``5`` when used as a *label* of an integer based index)
+ - A list or array of labels ``['a', 'b', 'c']``

Here also, if you agree with my suggestions. Again, something like:
(note that ``5`` when used as a *label* of a label based index. This use is **not** an integer position along index).

@nehalecky nehalecky and 1 other commented on an outdated diff Mar 2, 2013
doc/source/indexing.rst
+ raise ``IndexError`` when the requested indicies are out of bounds. Allowed inputs are:
+
+ - An integer e.g. ``5``
+ - A list or array of integers ``[4, 3, 0]``
+ - A slice object with ints ``1:7``
+ - A boolean array
+
+ See more at :ref:`Integer indexing <indexing.integer>`
+
+ - ``.ix`` supports mixed integer and label based access. It is primarily label based, but
+ will fallback to integer positional access. ``.ix`` is the most general and will support
+ any of the inputsx to ``.loc`` and ``.iloc``, as well as support for floating point label schemes.
+ As using integer slices with ``.ix`` have different behavior depending on whether the slice
+ is interpreted as integer location based or label position based, it's usually better to be
+ explicit and use ``.iloc`` (integer location) or ``.loc`` (label location).
+

Great description for .ix. With this, the functionality of the indexers is clear. One small comment: with regard to terminology describing the location / position along indices, do you think it would be better to be more terse, so that we only use one word to refer to this (i.e., location), or unique words for describing position of labels and location of integers? To me, as an advanced data structure, the DataFrame (or pandas data structures in general) changes some core assumptions one might have about fundamental behavior of any array-like structure, due to the amazing functionality via its indexing methods. Semantically, the words position and location are quite similar but do have subtle distinctions (i.e., location tends to be fixed while position has a more relative connotation). It could be of benefit to use this distinction in developing a concise system metaphor, so we are all on the same bandwagon when referring to these behaviors. Just a thought. :)

@jreback
jreback added a note Mar 2, 2013

ok....are you suggesting then we use 'position' when talking about integer indexing, while 'location' for label indexing? Maybe point out a specific place where we should use each?

@jreback

@nehalecky and @JanSchulz: incorporated both your changes (though please still give it a read-through to see whether 'position' indexing versus 'label/location' indexing is unclear, or should be more consistent). Also added a "10 minutes to pandas" section for newbies.....Hopefully short and sweet....Feel free to suggest changes anywhere.

@wesm if you're ok with most of this, I think it might be useful to merge it in so we can take a look at the external docs?

@jreback

anybody have a chance to go over docs again?
10min section, good addition?

@wesm
Python for Data member

Let me give this a more careful look over in a little bit. The 10 minute section looks good at a glance, very cool

@janschulz

The 10min introduction is great!

Re multi-index: an example would be nice (I tried but couldn't get it to work, or simply misunderstood something...)

import pandas as pd
df = pd.DataFrame(data={"a":["A","A","B","B"],"b":["X","Y","X","Y"], "c":[1,2,3,4],"d":[5,6,7,8]})
# First the difference of using label and integer location on an integer index:
df.iloc[1:2]
df.loc[1:2]
df2 = df.set_index(["a","b"])
# This still works and nicely shows the problem of mixing label and integer location
df2.ix["A",0:1]
df2.loc["A",df2.columns[0:1]]
# But how do I select the A->X row? Do I need two slicing operations?
df2.ix["A",:].ix["X",:]

Also some oddities:

df2.ix[1,"d"] # Why does this not use the fallback and return the "B" part of the DataFrame?
@jreback

@JanSchulz

as you noticed, .ix and .loc perform the same here

In [14]: df2.ix['A'].ix['X']
Out[14]: 
c    1
d    5
Name: X, dtype: int64

In [15]: df2.loc['A'].loc['X']
Out[15]: 
c    1
d    5
Name: X, dtype: int64

### this is the method you are after ### (ix/loc interchangeable here)
In [28]: df2.ix[('A','X')]
Out[28]: 
c    1
d    5
Name: (A, X), dtype: int64

.iloc is a shortcut here (but not in general; this is a good example of something you can't currently do)

In [20]: df2.iloc[0]
Out[20]: 
c    1
d    5
Name: (A, X), dtype: int64

your second example is not really supported in the current code (or the new stuff)
you can get some results, but I don't think they are really a good way of doing it...


In [26]: df2.iloc[2,:]
Out[26]: 
c    3
d    7
Name: (B, X), dtype: int64

In [27]: df2.ix[('B','X')]
Out[27]: 
c    3
d    7
Name: (B, X), dtype: int64
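A sketch of the same selections using the frame from the question, assuming a recent pandas where .ix no longer exists: a full tuple picks one row of the MultiIndex, a tuple plus a column label picks a single cell, and xs selects on one inner level:

```python
import pandas as pd

df = pd.DataFrame({"a": ["A", "A", "B", "B"], "b": ["X", "Y", "X", "Y"],
                   "c": [1, 2, 3, 4], "d": [5, 6, 7, 8]})
df2 = df.set_index(["a", "b"])

row_ax = df2.loc[("A", "X")]        # the ('A', 'X') row as a Series
cell = df2.loc[("A", "X"), "d"]     # a single value from that row
inner_x = df2.xs("X", level="b")    # every row whose 'b' level is 'X'
second_row = df2.iloc[1]            # position 1, i.e. the ('A', 'Y') row
```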
@jreback

here's an example of tricky multi-index setting
.ix pretty much handles these cases (with the exception of the last one), where you want to select
MULTIPLE tuples and then set the values in a non-trivial selection set....not sure how much you would really use this.....but...

http://stackoverflow.com/questions/15200598/partial-update-to-dataframe-with-multi-index-index-with-integer-labels/15213525#15213525

@jreback

anyone have any issues with me merging this soonish?

speak now or forever hold your gitses :)

jreback added some commits Feb 24, 2013
@jreback jreback ENH: add .loc attribute to provide location-based indexing
TST: added multi-index tests

DOC: changed loc -> iloc
     added more docs

ENH: added integer lists as indexers to iloc

ENH: raise correctly on out-of-bounds slicing
     support negative indexing in iloc and icol

CLN: move all indexings (ix/iloc) to PandasObject in generic.py
     (except _SeriesIndexer in series.py)
     add name parameter to Indexer creation, makes indexers independent
     of their external names
cb96f77
@jreback jreback TST: new test suite for indexing 02ed791
@jreback jreback ENH: added loc/at/iat indexers ....almost done 7cc64d6
@jreback jreback DOC/TST: revised indexing section in docs
         updated whatsnew
         all tests work

DOC: changes suggested by Jan Schulz
     revised whatsnew to include mostly references to new indexing
28c3d9a
@jreback jreback DOC: added 10min newbie intro to pandas
     changes in indexing suggested by Jan Schulz, and nehalecky

DOC: added plotting,reshaping, more examples in setting to 10min.rst

DOC: more doc updates, added more examples in selection
     added join to 10min

DOC: release notes and whatsnew updates for 10min
fbf1977
@jreback jreback DOC: revamped dtypes section in basics.rst
     fixed removal of foo temp files in 10min

DOC: added to time series in 10min.rst
643e1cb
@jreback jreback DOC: added sorting examples to 10min
BUG: fixed multi-index selection via loc, back to using some
     of ix code (but still do validation if not mi)

ENH: add xs to Series for compatibility, create _xs functions in all objects

DOC: added several sub-sections to 10min
     fixed some references in basics.rst
41793ea
@wesm
Python for Data member

It looks good to go to me. Someday, Jeff, we'll get you to obey 80-character line length =P No APIs are changed, right?

@jreback

no APIs changed. No actual deprecations (just a note that we could deprecate some)

@jreback jreback merged commit 0e17518 into pydata:master Mar 7, 2013
@jreback

ok....docs are updated, so please take a look and let me know any changes

http://pandas.pydata.org/pandas-docs/dev/indexing.html

@nehalecky

Yeah, thank you. I've just pulled in master and installed. Going to start using this immediately in some current work. Thanks for all the great work, it's really appreciated.

@jreback

see this (and many other questions about this)
http://stackoverflow.com/questions/5160339/floating-point-precision-in-python-array

I guess you could use np.round

or

eps = 1e-12
df[(df<eps) & (df>-eps)] = 0
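A small sketch of both options, assuming the goal is to turn floating-point noise near zero into exact zeros: round to a fixed number of decimals, or replace only values whose magnitude is below a tolerance. The tolerance is the one from the snippet above; the data is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v": [0.1 + 0.2 - 0.3, 1.5, -1e-15]})

# Option 1: round everything to a fixed number of decimals
rounded = df.round(10)

# Option 2: zero out only the values within eps of zero
eps = 1e-12
cleaned = df.mask(df.abs() < eps, 0.0)
```

Rounding changes every value, while the mask leaves anything outside the tolerance untouched.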
@jtratner
Python for Data member

I know this is from a while back - but could we add a note to the docs about how you replace irow/icol with iloc? I.e., we have that list of what's deprecated, but do something like:

  • icol - use .iloc[:, n] instead
  • irow - use .iloc[n,:] instead for DataFrame and .iloc[n] for Series
  • iget_value - ??

You can get the functionality of irow in a relatively comprehensible way with head, but iloc for icol feels less intuitive, especially for someone who's just starting out (I guess you can replace with df[df.columns[0]])
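For reference, a sketch of positional selections with .iloc and .iat that cover what icol and irow were used for (those older accessors are not available in current pandas, so only the replacement side is shown; the frame is arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
s = df["A"]

col = df.iloc[:, 1]    # the column at position 1 (what icol(1) selected)
row = df.iloc[1]       # the row at position 1 (what irow(1) selected)
item = s.iloc[1]       # on a Series: the element at position 1
value = df.iat[1, 2]   # fast scalar access: row position 1, column position 2
```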

@jreback

there is an example in the indexing docs and 10min IIRC

@jtratner
Python for Data member

okay I'll try to find it and then add it to the docs there with examples when I do.

@jreback

@hugadams you are commenting on an older iloc issue? is that correct?

@jreback

If the float values are getting 'changed' then they are different and may not match very well; aligning on float indexes is probably not a good idea. You can try 0.14.0, which has a better Float64Index engine (it stores real floats rather than objects, so it is much faster, but the matching behavior should be the same). If you are still having issues, try to narrow it down (maybe pickle/dump the frames right before they are merged/joined and show them), and open a new issue.

@jreback

best to post on a new issue
showing an example of what you are doing (that is copy-pasteable)

@hugadams

Wow, sorry. I was sending this to the mailing list and must have put an autofilled address to this thread. My apologies.
