ENH/BUG: Fix names, levels and labels handling in MultiIndex #4039

Merged
merged 4 commits into from

5 participants

@jtratner
Python for Data member

This PR covers:

Fixes: #4202, #3714, #3742 (there might be some others, but I've blanked on them...)

Bug fixes:

  • MultiIndex preserves names as much as possible and it's now harder to overwrite index metadata by making changes down the line.
  • set_values no longer messes up names.

External API Changes:

  • Names, levels and labels are now validated each time and 'mostly' immutable.
  • names, levels and labels produce containers that are immutable (using new containers FrozenList and FrozenNDArray)
  • MultiIndex now shallow copies levels and labels before storing them.
  • Adds astype method to MultiIndex to resolve issue with set_values in NDFrame
  • Direct setting of levels and labels is "deprecated" with a setter that raises a DeprecationWarning (but still functions)
  • New set_names, set_labels, and set_levels methods allow setting of these attributes and take an inplace=True keyword argument to mutate in place.
  • Index has a rename method that works similarly to the set_* methods.
  • Improved exceptions on Index methods to be more descriptive / more specific (e.g., replacing Exception with ValueError, etc.)
  • Index.copy() now accepts keyword arguments (name=, names=, levels=, labels=) which return a new copy with those attributes set. It also accepts deep, which is there for compatibility with other copy() methods, but doesn't actually change what copy does (though, for MultiIndex, it makes the copy operation slower)

Internal changes:

  • MultiIndex now uses _set_levels, _get_levels, _set_labels, _get_labels internally to handle labels and levels (and uses that directly in __array_finalize__ and __setstate__, etc.)
  • MultiIndex.copy(deep=True) will deepcopy levels, labels, and names.
  • Index objects handle names with _set_names and _get_names.
  • Index now inherits from FrozenNDArray which (mostly) blocks mutable methods (except for view() and reshape())
  • Index now actually copies ndarrays when copy=True is passed to constructor and dtype=None
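
To make the "immutable containers" bullet concrete, here is a minimal sketch of a list subclass that blocks in-place mutation. It is not the implementation added in this PR: `FrozenListSketch` and `_disabled` are hypothetical names, and the real FrozenList may block a different set of methods or raise a different error.

```python
class FrozenListSketch(list):
    """A list subclass that refuses in-place mutation (illustrative only)."""

    def _disabled(self, *args, **kwargs):
        # Every mutating entry point is routed here.
        raise TypeError("%r does not support mutable operations."
                        % type(self).__name__)

    __setitem__ = __delitem__ = _disabled
    append = extend = insert = pop = remove = _disabled
    sort = reverse = clear = __iadd__ = __imul__ = _disabled


names = FrozenListSketch(["first", "second"])

# Still a list, so isinstance(obj, list) checks elsewhere keep working.
assert isinstance(names, list)

# Assigning a single element is rejected instead of silently succeeding.
try:
    names[0] = "apple"
except TypeError as exc:
    error = str(exc)
```

Subclassing list rather than tuple is the point of the sketch: code that special-cases lists still recognizes the container, while item assignment fails loudly.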
@cpcloud cpcloud commented on an outdated diff
pandas/core/index.py
@@ -1568,6 +1561,21 @@ def __repr__(self):
def __len__(self):
return len(self.labels[0])
+ def _get_names(self):
+ return [level.name for level in self.levels]
+
+ def _set_names(self, values):
+ values = list(values)
+ if len(values) != self.nlevels:
+ raise ValueError(('Length of names (%d) must be same as level '
+ '(%d)') % (len(values),self.nlevels))
@cpcloud Python for Data member
cpcloud added a note

complete bikeshedding, but no need for double parens here, as long as there's one set of parens python knows what 2 do :)

@jtratner Python for Data member

heh, I just moved it straight from __new__, certainly worth it to change. :)

@jtratner
Python for Data member

as an aside, this leads to weird things like

idx.names = list("abcdef")
idx.droplevel("d") # Gives an index out of range-esque error
@cpcloud
Python for Data member

yep i used to get that and i hacked around by recreating frames/series and some other trickery that will hopefully never see the light of day :)

@jtratner
Python for Data member

@cpcloud okay, I think I'm understanding more of the problem here: when you slice an index, the levels remain the same...e.g.:

chunklet = idx[-3:]
assert chunklet.levels[0] is idx.levels[0] # True

So, when you assign names, it mutates the underlying levels of both. This seems to follow the convention in __new__ to assign names to the levels directly...but if you do this after the fact, you end up mutating earlier copies.

Moreover, if you pass levels to the MultiIndex constructor, it doesn't copy them, e.g.:

new_idx = MultiIndex(idx.levels, idx.labels)
assert new_idx.levels[0] is idx.levels[0] # True

so what ought to be happening here? Should names be assigned to underlying levels or just left alone?
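
The aliasing described in this comment can be reproduced without pandas. The sketch below uses two hypothetical stand-in classes, `Level` and `TinyMultiIndex` (neither exists in pandas), to show why naming levels in place leaks into every index that shares them, and how a shallow copy of each level would avoid that.

```python
import copy


class Level(object):
    """Stand-in for an Index used as a level: some values plus a name."""

    def __init__(self, values, name=None):
        self.values = values
        self.name = name


class TinyMultiIndex(object):
    """Stand-in for MultiIndex that names its levels in place."""

    def __init__(self, levels, copy_levels=False):
        if copy_levels:
            # Shallow copy: new Level objects, same underlying values.
            levels = [copy.copy(level) for level in levels]
        self.levels = list(levels)

    def set_names(self, names):
        if len(names) != len(self.levels):
            raise ValueError("Length of names (%d) must be same as level (%d)"
                             % (len(names), len(self.levels)))
        for level, name in zip(self.levels, names):
            level.name = name


shared = [Level(["hans", "grethe"]), Level(["1", "2", "3"])]

# Without copying, both indexes hold the very same Level objects.
idx = TinyMultiIndex(shared)
alias = TinyMultiIndex(idx.levels)
alias.set_names(["Name", "Number"])
leaked = [level.name for level in idx.levels]

# With shallow copies, renaming one index leaves the other alone.
safe = TinyMultiIndex(idx.levels, copy_levels=True)
safe.set_names(["a", "b"])
untouched = [level.name for level in idx.levels]
```

After the first rename, `idx` reports the names set through `alias`; after the second, `idx` is unchanged even though `safe` still shares the same underlying values.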

@cpcloud cpcloud commented on an outdated diff
pandas/tests/test_index.py
((22 lines not shown))
+
+ # initializing with bad names (should always be equivalent)
+ major_axis, minor_axis = self.index.levels
+ major_labels, minor_labels = self.index.labels
+ assertRaisesRegexp(ValueError, "^Length of names", MultiIndex, levels=[major_axis, minor_axis],
+ labels=[major_labels, minor_labels],
+ names=['first'])
+ assertRaisesRegexp(ValueError, "^Length of names", MultiIndex, levels=[major_axis, minor_axis],
+ labels=[major_labels, minor_labels],
+ names=['first', 'second', 'third'])
+
+ # names are assigned
+ index.names = ["a", "b"]
+ ind_names = list(index.names)
+ level_names = [level.name for level in index.levels]
+ self.assertListEqual(ind_names, level_names)
@cpcloud Python for Data member
cpcloud added a note

fyi you can't use this because it was introduced in py27

@jtratner Python for Data member
@cpcloud Python for Data member
cpcloud added a note

In python >= 2.7 assertEqual will dispatch to e.g., assertListEqual if two lists are passed, so that's nice.
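
A small standard-library check of that dispatch, for reference. `NamesTest` is a hypothetical test case written only for this illustration; it is not part of the pandas test suite.

```python
import unittest


class NamesTest(unittest.TestCase):
    def runTest(self):
        pass


case = NamesTest()

# Equal lists pass through assertEqual as usual.
case.assertEqual(["a", "b"], ["a", "b"])

# Unequal lists fail with the list-specific message from assertListEqual.
try:
    case.assertEqual(["a", "b"], ["a", "c"])
except AssertionError as exc:
    message = str(exc)
```

The failure message starts with "Lists differ", which is the wording produced by assertListEqual rather than the generic assertEqual message.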

@jtratner
Python for Data member

@cpcloud made those two changes and rebased.

@cpcloud
Python for Data member

@wesm any reason why names should be settable to a sequence larger than nlevels after the fact?

@wesm
Python for Data member

nope. i just never added any validation and left names as a simple attribute.

@cpcloud
Python for Data member

+1 for a validator here...@jtratner what breaks?

@jreback

this is basically same issue as #3742

@cpcloud
Python for Data member

indeed...close this one, that one?

@jreback

def should be some kind of validator on setting of names attrib in index; weird things can happen if e.g. levels are changed

@jreback

can close that one (maybe move example to here though) as another test case

@cpcloud
Python for Data member

Example from #3742 cc @thriveth

I have raised the issue in this question on Stack Overflow, but I'm not sure it ever made it to the Pandas issue tracker.

I have a MultiIndex'ed DataFrame which I want to expand by using set_value(), but doing this destroys the names attribute of the index. This does not happen when setting the value of an already existing entry in the DataFrame. An easily reproducible example is to create the dataframe by:

lev1 = ['hans', 'hans', 'hans', 'grethe', 'grethe', 'grethe']
lev2 = ['1', '2', '3'] * 2
idx = pd.MultiIndex.from_arrays(
    [lev1, lev2], 
    names=['Name', 'Number'])
df = pd.DataFrame(
    np.random.randn(6, 4),
    columns=['one', 'two', 'three', 'four'],
    index=idx)
df = df.sortlevel()
df 

This shows a neat and nice object, just as I expected, with proper naming of the index columns. If I now run:

df.set_value(('grethe', '3'), 'one', 99.34)

the result is also as expected. But if I run:

df.set_value(('grethe', '4'), 'one', 99.34)

The column names of the index are gone, and the names attribute has been set to [None, None].

@jreback

also #3714 same issue too, except assigning to levels, needs validation as well

@jtratner
Python for Data member

This is what I was trying to get across earlier :) If you pass levels through the MultiIndex constructor, they have their names set to the names keyword argument or to None (and are not copied).

        if names is None:
            # !!!This is why names get reset to None
            subarr.names = [None] * subarr.nlevels
        else:
            if len(names) != subarr.nlevels:
                raise AssertionError(('Length of names (%d) must be same as level '
                                      '(%d)') % (len(names),subarr.nlevels))

            subarr.names = list(names)

        # THIS IS WHERE NAMES GET OVERWRITTEN WITHOUT BEING COPIED
        # set the name
        for i, name in enumerate(subarr.names):
            subarr.levels[i].name = name

An easy solution would be for the MultiIndex to copy the levels it receives first and then rename them. Then, any time you set the names attribute, it would set the name on every level and the names attribute on the copied levels, e.g. force _ensure_index to make a copy if it's already an index, so that at this point you'd be all good:

levels = [_ensure_index(lev) for lev in levels]

Because later on it just assigns it to the object:

subarr.levels = levels

I think levels should be a cached_readonly property, so that you don't end up creating indices twice (once in the levels = part and another on the setattr levels). If not, you could turn levels into a property that makes a copy before setting on the object.

If this all makes sense to you, I can write it up into a PR soon.

@jtratner
Python for Data member

@jreback @cpcloud Are Index and MultiIndex supposed to be immutable? If so, then neither should be able to set names directly (like MultiIndex does in its __new__ method). I think it makes more sense to allow them to mutate on names...much easier internally.

@jreback

well they are immutable on the values
but not names / levels (though they really should be)
because state sharing is a problem then

@jtratner
Python for Data member

@cpcloud I learned something today: Python cheats and compares lists first by object equality, then checks individual items...I'm wondering if this error is lurking other places in the code:

>>> arr1, arr2 = np.array(range(10)), np.array(range(10))
>>> assert [arr1, arr2] == [arr1, arr2] # succeeds
>>> assert [arr1, arr2] == [arr1, arr2.copy()]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> assert_almost_equal([arr1, arr2], [arr1.copy(), arr2])
True

I found this in test_copy, where a test case was not actually testing what it claimed to be testing :P .

    def test_copy(self):
        i_copy = self.index.copy()

        # Equal...but not the same object
        self.assert_(i_copy.levels == self.index.levels)
        self.assert_(i_copy.levels is not self.index.levels)
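
The identity shortcut behind this can be shown without numpy. The sketch below is CPython behaviour as far as I know; `Ambiguous` and `_ElementwiseResult` are hypothetical stand-ins for an ndarray and for the elementwise result of comparing two arrays.

```python
class _ElementwiseResult(object):
    """Stand-in for an elementwise comparison result with no single truth value."""

    def __bool__(self):
        raise ValueError("The truth value is ambiguous")

    __nonzero__ = __bool__  # Python 2 spelling


class Ambiguous(object):
    """Mimics an ndarray: == returns something that cannot be truth-tested."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return _ElementwiseResult()

    def copy(self):
        return Ambiguous(self.values)


arr1, arr2 = Ambiguous(range(10)), Ambiguous(range(10))

# Same objects on both sides: the identity check skips __eq__ entirely.
same_objects = [arr1, arr2] == [arr1, arr2]

# A copy is a different object, so __eq__ runs and its result is truth-tested.
try:
    [arr1, arr2] == [arr1, arr2.copy()]
    raised = False
except ValueError:
    raised = True
```

So a test that compares lists of arrays with == only passes when both lists hold the same array objects, which is exactly the case where the comparison says nothing about the values.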
@jtratner
Python for Data member

Actually that whole test is not great, because it's actually (was) testing that two different lists were created.

@cpcloud
Python for Data member

@jtratner that is good to know.

@cpcloud
Python for Data member

i always assumed that sequence equality was done recursively. docs seem to imply that that is the case..strange

@jtratner
Python for Data member

@cpcloud yep, that's exactly right. (I'm sure that's implementation dependent, but it is important to know if using numpy arrays.)

@cpcloud
Python for Data member

maybe some sort of lightweight Levels class is in order? overloading __eq__ to compare levels or something more general for use with labels and levels?

@cpcloud
Python for Data member

Maybe you could use Categorical?

@jtratner
Python for Data member

@cpcloud maybe. Right now I just changed levels, names and labels to return tuples and used assert_almost_equal. Very simple.

On that note - is it okay to make that change? Much easier to prevent erroneous assignment (like index.levels[i] = some_new_index) by producing an immutable object than to try to use something else. Have to change a ton of test cases that assumed names produces a list, but hopefully that's not a big deal...

@cpcloud
Python for Data member

Categorical might be jumping the gun a bit. I would rather have it immutable, but that prevents sharing. a perf comparison might be useful here

@jtratner
Python for Data member

@cpcloud well, I'm using shallow copies/views, which I think means that only metadata is copied. This is necessary anyways, because you want to be able to set names on the underlying levels without worrying about messing up other indexes.

@jtratner
Python for Data member

(I just pushed what I have so far - it's failing because a ton of tests assume that index names, levels and labels will be lists...)

@jtratner
Python for Data member

@cpcloud also, categoricals aren't immutable, right?

@cpcloud
Python for Data member

Just the levels attribute because it's an Index

@cpcloud
Python for Data member

You'd still have to have a tuple of Categorical for multiindex which is not that different from what it sounds like you're doing

@jtratner
Python for Data member

oh, so in other words, change the representation of levels to Categorical? Index is immutable too, so that would be changing the existing behavior (and, potentially, leading to weird behavior if the indices were to be changed under the hood).

@cpcloud
Python for Data member

yeah that's why i said may be jumping the gun on my part

@jtratner
Python for Data member

This may be totally minor, just didn't want to make a decision without being explicit. Currently, to_panel and to_frame do weird things when indexes have no names [because of the naming/mutation issue this PR addresses] and I'm not sure how it's supposed to work.

In master, to_frame mutates the index of the original panel and adds a name if it doesn't exist - this is pretty clearly an error.

>>> from pandas.util.testing import makePanel
>>> panel = makePanel()
>>> panel["ItemA"].index.names
[None]
>>> frame = panel.to_frame() # mutates original panel index
>>> panel["ItemA"].index.names
['major']
>>> frame.to_panel()["ItemA"].index.names
['major']

Here's the behavior I'm currently using:

In [24]: from pandas.util.testing import makePanel

In [25]: panel = makePanel()

In [27]: panel["ItemA"].index.names
Out[27]: [None]

In [28]: frame = panel.to_frame()

In [29]: panel["ItemA"].index.names
Out[29]: [None]

However, the one problem with this approach is that it means that panel.to_frame().to_panel() is not exactly equal to the original panel (nor are individual items equivalent either). (Note: this is only in the case where the index didn't already have a name).
e.g.

In [31]: frame.to_panel()["ItemA"].index.names
Out[31]: ['major']

Now, this wasn't actually the case previously, because it was mutating the original panel to get there, but there was a test case that checked this (incorrectly) and I want to make sure that it's not a big deal to change this.

Here's the test case:

def test_to_frame_mixed(self):
        panel = self.panel.fillna(0)
        panel['str'] = 'foo'
        panel['bool'] = panel['ItemA'] > 0

        lp = panel.to_frame()
        wp = lp.to_panel()
        self.assertEqual(wp['bool'].values.dtype, np.bool_)
        # only passes because original panel has been mutated
        assert_frame_equal(wp['bool'], panel['bool'])

(it passes if you add a check_names=False to the last line)

@jreback

@jtratner

here's the basic problem

Indexes are immutable in their values, so I could have many objects using them, so far so good.
If I then assign, say, names (this applies to levels too) to a particular index, then its names are changed

the side-effect is that names on the other references to it are also changed (they aren't actually changed,
but since they are pointing to it, they are de-facto changed)

really what should happen is that a NEW index should be created if the values, names, or levels are changed (rather than changing the existing ones)...

I believe this is the path you are going down, no?

@jtratner
Python for Data member

@jreback Yep. I already added set_levels, set_names methods (and I guess I should add set_labels) that have a similar call structure to other methods (i.e., they have an inplace option). Right now levels are not assignable, I'd like to deprecate setting names directly, but I'm not sure whether that's too backwards incompatible.

I had to create a separate FrozenList object to return levels, names and labels because PyTables doesn't work if the multiindex names are not a subclass of list (because DataFrame special-cases lists). Well, frankly, pandas just specialcases lists all over the place, so it was easier to go with the flow.

@jreback

I think I saw somewhere that you used a property validator for .levels, so you can intercept the call (and potentially raise/create a new object, whatever)?

@jreback

@jtratner as an FYI: make sure Index recreates your frozen list on unpickling (__setstate__), though this may not be an issue as it's set using the property (e.g. .names)...so prob ok

@jtratner
Python for Data member

@jreback It does - I made sure to change all of those to use the underlying getters and setters instead. I did change __setstate__ and other places to use the underlying methods so everything's recreated properly. [there's some equivalent for numpy that's called for views, right?]

I added _set_levels and _get_levels which handle all the validation for setting and getting levels, but the levels attribute is readonly.

@jtratner
Python for Data member

@jreback btw - you can't create new objects in setters - they are explicitly used for mutation. If we wanted to deprecate setting names / labels directly, then we could do something like this:

def __set_names(self, names):
     warnings.warn("Setting names directly is deprecated, use the set_names method instead", category=DeprecationWarning)
     return self._set_names(names)

names = property(fget=_get_names, fset=__set_names)

And then remove the fset= and __set_names in a few versions from now.
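
A runnable sketch of that pattern, under stated assumptions: `NamedIndexSketch` is a hypothetical class rather than pandas code, and the setter uses a single-underscore name (`_set_names_with_warning`) so that name mangling does not come into play.

```python
import warnings


class NamedIndexSketch(object):
    """Hypothetical stand-in for an index with validated names."""

    def __init__(self, nlevels, names):
        self.nlevels = nlevels
        self._set_names(names)

    def _get_names(self):
        return tuple(self._names)

    def _set_names(self, names):
        # All validation lives in the private setter.
        names = list(names)
        if len(names) != self.nlevels:
            raise ValueError("Length of names (%d) must be same as level (%d)"
                             % (len(names), self.nlevels))
        self._names = names

    def _set_names_with_warning(self, names):
        # The public setter warns, then delegates to the validated path.
        warnings.warn("Setting names directly is deprecated, use the "
                      "set_names method instead",
                      category=DeprecationWarning, stacklevel=2)
        self._set_names(names)

    names = property(fget=_get_names, fset=_set_names_with_warning)


idx = NamedIndexSketch(2, ["a", "b"])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    idx.names = ["x", "y"]  # still works, but warns
```

Dropping the `fset=` argument later would turn the assignment into an AttributeError, which is the removal step described above.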

@jtratner
Python for Data member

and all the copying is just shallow copying (which I think means that it doesn't copy the underlying data, right?)

@jreback

__reduce__ is essentially __getstate__ for built-in types

look at #3714, I think the issue is that indices are shared between multiple objects (frames), and when one has names/labels/levels changed the other changes by definition too.....

so if someone changes, say, names then they should get a new object

@jtratner
Python for Data member

@jreback Yeah, none of that is possible with this PR. That was an issue because MultiIndex wasn't making shallow-copies of levels that were already indexes.

So are you saying that it's okay for these to raise errors (about not being settable) now?

df.columns.names = ["a", "b", "c"]
df.index.name = "b"
@jreback

@jtratner no...I think it should be allowed, but the problem is that you need to do some fancy footwork in order to avoid shared indices being modified as well.

you would have to have an observer type pattern thru a weak ref or use reference counting.....

but not even sure that this comes up a lot...

@jtratner
Python for Data member

@jreback handling names is pretty simple - you just shallow_copy the underlying levels and assign them names. Given that the levels still share the same ndarrays after shallow_copy, that's not a big deal [the sole thing set in __array_finalize__ is name].

Labels are trickier - the constructor was already copying them, so I assumed it was okay to copy them every time too. Is it fair to assume that it's not a big deal to make a copy of labels for every Index object?

@jreback

that's exactly the problem with names
if you change names on df2 and df also shares that index then it implicitly now has the same names
which is fine until you then try to assign names on df and change on df2

this is why having part of an immutable object being mutable is a problem

levels and labels are the same problem

all should be copied in the constructors
and setting should really return new objects (but as I said that is complicated)

@jtratner
Python for Data member

@jreback is there a reason why labels are stored as ndarrays and not Index objects?

@jreback

not sure

Index a little heavyweight for them

maybe a BasicIndex type would be nice here

@cpcloud
Python for Data member

i can say that for my use cases i often have a nested dict that eventually gets turned into a panel and then somewhere down the line back into a frame. i have to be careful if i want to preserve names across all of this, so it would be nice to be able to not worry about it. that being said, i could probably clean up my code to just use frames and avoid the issue entirely.

@jtratner
Python for Data member

@cpcloud if you already have names, this preserves them throughout. Only if it didn't have names to start will it change (btw - if you have a minimal test case to check that could be helpful)

@cpcloud
Python for Data member

@jtratner i think this is just a reiteration of what you have above

In [33]: from string import ascii_lowercase as letters, ascii_uppercase as LETTERS

In [34]: d = {}

In [35]: for i in xrange(10):
   ....:     d[i] = mkdf(10, 10, c_idx_nlevels=10, r_idx_nlevels=10, c_idx_names=letters[:10], r_idx_names=LETTERS[:10])
   ....:

In [36]: p = Panel(d)

In [37]: p.items
Out[37]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)

In [38]: p.majo
p.major_axis  p.major_xs

In [38]: p.major_axis.names
Out[38]: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

In [39]: p.minor_axis.names
Out[39]: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

In [40]: df = p.to_frame(filter_observations=False)

In [41]: df.index.names
Out[41]: ['major', 'minor']

although i'm not sure what can be done here since names would have to be a tuple of tuples or lists. it would be nice if the major and minor axes of a panel get merged into a single index with n1 * n2 * ... * nN where n is the number of levels and the index is the axis (general NDFrame) when p.to_frame was called. that should probably be in a separate PR tho.

@jtratner
Python for Data member

@cpcloud well, it's certainly possible to do I guess. You could use a special MultiIndex for Panel to do it, not sure how much it gains you though. I think that should be a different PR though...this is nearly done, just need to figure out where reindex is overwriting the names attribute (and potentially for #4092 as well).

@cpcloud
Python for Data member

yeah not really much gain, but occasionally i have very large multiindexes that i would like to preserve the names across. anyway it's not a big deal.

@cpcloud
Python for Data member

are we sure about 0.12 for this? how common is setting levels et al? probably not very, just want to make sure since this is an api change

@jreback

i think this can go in 0.13 as one of the first changes (as well as @jtratner object hierarchy changes)

@jtratner
Python for Data member

I have no strong opinion about 0.12 vs. 0.13 for this...just useful to have the object hierarchy changes first so I can finish up this commit.

@jtratner
Python for Data member

@jreback @cpcloud Figured out the problem: it was because MultiIndex didn't define its own astype method. Given that converting a MultiIndex to anything but 'O' dtype makes no sense and raises an error, I was thinking of using this for the MultiIndex astype method. Does this look okay? This is the final element for this set of commits, so once this is decided, can be merged:

    def astype(self, dtype):
        if np.dtype(dtype) == np.object_:
            return self._shallow_copy()
        else:
            raise ValueError("Setting %s dtype to anything other than object is not supported" % self.__class__)

Error, for those interested, is:

In [61]: _.astype(int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-61-08c372efe5bb> in <module>()
----> 1 _.astype(int)

ValueError: setting an array element with a sequence.
@jreback

that looks fine (though you don't actually need the else)

FYI: occasionally there is a need to set a dtype that is not sanctioned. Specifically I am referring to Int64Index: there is one specific occasion where I needed to set it to a platform int (which means int32 on those platforms) in order to do a take, IIRC (also, internally DatetimeIndex is forced to int64 in places). So in general it's good to force dtypes of the indices ON THE PUBLIC API (and to prevent dev mistakes)

+1

@jtratner
Python for Data member

@jreback Index still allows astype with arbitrary dtypes, plus you can always either get at values directly or use a view.

Related question, if you want to set the names on an index of a DataFrame, would the preference be to do:

# mutates index
df.index.set_names(new_names, inplace=True)

or

# replaces with new index
df.index = df.index.set_names(new_names)

I'm thinking the latter should be preferred, but both ways are slightly awkward.
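
The difference between the two spellings matters most when an index is shared. A minimal sketch, assuming hypothetical stand-ins `SharedIndexSketch` and `FrameSketch` (not pandas classes) whose `set_names` either mutates or returns a shallow copy:

```python
import copy


class SharedIndexSketch(object):
    """Hypothetical index: shared values plus a list of names."""

    def __init__(self, values, names):
        self.values = values
        self.names = list(names)

    def set_names(self, names, inplace=False):
        # Mutate self when inplace=True, otherwise rename a shallow copy.
        target = self if inplace else copy.copy(self)
        target.names = list(names)
        if not inplace:
            return target


class FrameSketch(object):
    """Hypothetical frame that only holds a reference to its index."""

    def __init__(self, index):
        self.index = index


shared = SharedIndexSketch([1, 2, 3], ["old"])
df, df2 = FrameSketch(shared), FrameSketch(shared)

# Replacing the index: only df sees the new names.
df.index = df.index.set_names(["new"])
after_replace = (df.index.names, df2.index.names)

# Mutating in place: every holder of the shared index sees the change.
df2.index.set_names(["changed"], inplace=True)
after_inplace = (shared.names, df2.index.names)
```

In this sketch the replacing form leaves `df2` untouched, while the in-place form changes the names for anything else still pointing at `shared`.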

@jreback

df.index.names = new_names is not allowed now?

I agree this is the root of all evil, but seems somewhat common to do

@jtratner
Python for Data member

@jreback I have it with a DeprecationWarning on the setter (along with labels and levels)

@jreback

ahh...ok

df.index.set_names(new_names, inplace=True) is good, very explicit

I am not sure that df.index.set_names(new_names) should be returning an index, that seems kind of odd

@jreback

maybe make it explicit like this

df.index = df.index.copy(names = new_names), e.g. names is already accepted in the constructor, how about in copy too?

@jtratner
Python for Data member

@jreback matches things like df.reindex and df.set_index... on a more practical note, how can an index know what it's attached to? (and/or which dataframe has requested it to change names).

That said, I'm not super convinced that it's important to prevent setting names. It's somewhat important to prevent setting an individual name (e.g. df.index.names[0] = "apple").

Can make copy accept names, levels and labels.

@jreback

note that is an API change (which is good!), you will still be able to implicitly change names on other shared indices (via the inplace kw), but will be much harder to do!

@jtratner
Python for Data member

@jreback okay, so do you want the setter behavior to work?

@jreback

I am still not convinced that df.index = df.index.set_names(new_names) makes sense....

on your other question: the only way for the index to know what it's attached to is to track references (would have to be weak), but I think way too much trouble to do/worry about.

e.g. the Index would have to have a weakdict of the objects to which it is attached (for easy lookup), then say you did a df.index.names = new_names, the index could go thru the list and create a new index to which it now points (which had the original names), basically copy-on-write semantics, but this is way too complicated I think

@jreback

I think df.index.names = new_names is ok (trying to change only a part of the names is bad, so raise on that)

but I don't see why name setting is necessarily bad (or were there objections?)

however, I would definitely outlaw labels/levels setting inplace (and make them call a function which returns a new index), could just be copy?
pretty sure this just leads to trouble

@jtratner
Python for Data member

@jreback , name setting is only bad if you feel it violates the idea that Index is immutable.

@jtratner
Python for Data member

@jreback I'm leaving it at:

  • names are validated on setting
  • levels/labels cannot be set
  • set_* methods for each
  • copy can take names, levels, labels and dtype arguments.
@jreback

looks good to me
maybe in the future we can revisit if we want to allow inplace names setting

@wesm ?

@jtratner
Python for Data member

@jreback do you want to disallow levels/labels setting right now or just deprecate it?

@wesm
Python for Data member

no objections here

@wesm
Python for Data member

i'm not sure we should disallow setting levels ; there is probably production code out there that relies on it (this isn't an uncommon idiom in R, for example)

@jreback

maybe deprecate only in 0.12 for label/levels (though I would raise if you try to set a single level)
point towards copy as the new way?

@jtratner
Python for Data member

How about a warning like this:

import warnings


def _discouraged_setter(property_name, method_name, alternate=None):
    # Build the message once; the wrapper reuses it on every call.
    msg = "Setting %r directly is discouraged." % property_name
    if alternate:
        msg += " Use %r instead." % alternate

    def wrapper(self, *args, **kwargs):
        warnings.warn(msg, category=UserWarning)
        # Delegate to the validating method, e.g. _set_levels.
        meth = getattr(self, method_name)
        return meth(*args, **kwargs)
    wrapper.__name__ = property_name

    return wrapper
@jreback

we might need a different warning for this (as some might want to basically ignore this warning)

maybe explain that Index could be shared among multiple objects?
(or better yet maybe put a warning section in docs in Index internals section and put a link to it here?)

@jtratner
Python for Data member

@jreback I originally had DeprecationWarning, which gets ignored regularly. Changed to UserWarning when it was discouraged. Maybe it's not even important to prevent this? If you're setting levels or labels directly, you're choosing to do so and it shouldn't be a surprise if it causes unexpected behavior. Internal code can use the set_levels and set_labels, etc.

The only difference is that now setting levels and labels invokes validation behind the scenes...

@jreback

I think we need a warning that is not regularly ignored to point out that while ok should use set_*
DeprecationWarning prob ok

see how many tests it shows up on

@jtratner
Python for Data member

DeprecationWarning doesn't show up. Only one test sets levels directly, rest was just internal setting of levels (which were easily changed)

@jtratner
Python for Data member

@jreback @cpcloud this definitely expanded in scope since the initial naming, but it now covers pretty much everything that was discussed (I edited the pull request description at the top to list all the changes). Important to note:

  • names, labels and levels are properties and validated.
  • names can still be set directly.
  • labels and levels can still be set, but generate DeprecationWarning
  • Created FrozenList and FrozenNDArray objects to make levels, labels and names immutable (can't use a tuple because of type-checking in other parts of the pandas library)
  • Index now inherits from FrozenNDArray (which inherits from PandasObject)
@jtratner
Python for Data member

@cpcloud that's good to know (and I was thinking about that for a bit too). I might go explore that (especially because I think using a tuple would take less memory and require less overriding, etc.)

@jtratner
Python for Data member

@cpcloud Hit a problem with that idea: you would have to set it on the metaclass of the object, have to make it Python 2/3 compatible in the metaclass setting and manage how it works everywhere. (so it ends up requiring the metaclass object, the Frozen object and an intermediary class in the MRO required to do metaclass in a way that works for both 2.X and 3.X [e.g., how six does it]). Maybe in the future the typechecking for list could be changed to a general metaclass in core/base (like, isinstance(obj, LookupSequence)) that indicates whether the iterable should be treated like a tuple or like a list.

@cpcloud
Python for Data member

yep i was just reading the pep and about to tell u i was wrong :(

@jtratner
Python for Data member

@cpcloud I took out the PEP8 changes for clarity, so it's just changes relevant to this PR (esp b/c they are relatively far-ranging). all documented & such and works.

@jreback

@jtratner looks good
prob need a little bit of docs in v0.13.0 and in multi index section to explain functionality
can merge first thing after release

@jtratner
Python for Data member
@jtratner
Python for Data member

@jreback @cpcloud I've been performance testing this and I'm getting weird results with test_perf: I ran the performance suite 5 times and certain tests (like series_getitem_scalar and dataframe_getitem_scalar) only showed up in some of the logs. Do you know why this might happen?

@jreback

I think there is a parameter to drop a test from the results if it takes less than like 10 us or something (u can change this); sometimes these tests do

eg 0.005 I think gets dropped (in certain runs)

@hayd
Python for Data member

Should equals be sensitive to this meta-y data:

In [1]: i1 = pd.Index([1,2])

In [2]: i2 = pd.Index([1,2], name='a')

In [3]: i1.equals(i2)
Out[3]: True

Changing may help find some other cases?

@jtratner
Python for Data member

@hayd Let's say I have two different dataframes with different indexes that I want to merge. If the indexes are the same except that they have different names, it seems to me they should be considered equal (given that the merge would happen on the values rather than the names). We could add a parameter to equals that says "check metadata", but aside from that, I could potentially see changing equals causing a performance hit, particularly with MultiIndex.

@hayd
Python for Data member

I disagree there'd be a performance hit, at least not one worth worrying about (usually the names will be much smaller than the entire index).

Does join/merge check whether the indexes are equal?
Agree join/merge is a tricky one for the propagation of names (I think I left it as a "TODO" when I changed check_names argument to assert_frame_equal to check index/columns names by default).... and current behaviour is a little flaky.

Was just suggesting that changing this behaviour may highlight where we have ambiguities (even if we decide not to care).

@jreback

equals is just a check on the values; I agree with @hayd a perf hit is not the issue (I don't think there will be one).
however, equals checks occur a lot.

The bigger issue is we can't distinguish (right now) between 2 indices with equal values but different names, but in some sense, who cares? It's like a reference to a variable. If you have a pointing to a value and b pointing to a value, as long as the value is the same, they should be equal.

e.g. lists work this way

>>> import copy
>>> value = ['foo']
>>> a = copy.copy(value)
>>> b = copy.copy(value)
>>> a == b
True

@hayd
Python for Data member

I think names is a special descriptive property (especially in a MultiIndex) that should always match... (I pretty much always give index names unless using the default).

However I expect you're right that many people probably don't care/aren't careful/don't want it to be fussy, so probably we shouldn't enforce using matching names... :(

@jtratner
Python for Data member

@jreback @hayd this is ready to go. On equals: I personally don't want equals to check metadata-like properties.
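For reference, the value-only behaviour of `equals` discussed above can be checked directly. The sketch assumes a recent pandas release in which `Index.identical` is also available for a metadata-aware comparison; that method is not something this PR adds.

```python
import pandas as pd

i1 = pd.Index([1, 2])
i2 = pd.Index([1, 2], name="a")

# equals() compares the values only, so the differing name is ignored.
print(i1.equals(i2))

# identical() additionally compares metadata such as the name.
print(i1.identical(i2))
```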

@jreback jreback commented on an outdated diff
doc/source/v0.12.0.txt
@@ -171,6 +171,23 @@ API changes
``__repr__``). Plus string safety throughout. Now employed in many places
throughout the pandas library. (:issue:`4090`, :issue:`4092`)
+ - ``Index`` and ``MultiIndex`` changes (:issue:`4039`):
@jreback
jreback added a note

@jtratner I think you need to move this to 0.13?

@jtratner Python for Data member
@jreback

otherwise looks ok to me

@wesm @y-p @cpcloud ?

@wesm
Python for Data member

PR looks fine to me. :bomb:s away

@jreback

@jtratner lmk after u r done fixing release notes

@jtratner
Python for Data member

@jreback I was going back through and I disagree with my earlier choice on this. I had setup __deepcopy__ to return a shallow copy, but I think that's a dumb decision. If you want to copy over all the memory and other things associated with the Index, you should be able to do it. Otherwise, you'd have to call the constructor all over again.

What do you think?

@hayd I feel like you might have an opinion here :smile:

@jreback

oh...thought that was what was decided (on deep=True), default is of course what we have now only shallow copy

so if I df.copy(deep=True)

I will well and truly get a completely new copy

@jtratner
Python for Data member

@jreback yeah, that's what I think it should be. I'm going to go ahead and add a deep=True kwarg to copy here while I'm making sure that __deepcopy__ works correctly for MultiIndexes.

@jtratner
Python for Data member

@jreback this was started a month ago, so my understanding of pandas codebase + role of Index has changed a lot since...

@jtratner
Python for Data member

actually, turns out that master disables it too :p

    def __deepcopy__(self, memo={}):
        """
        Index is not mutable, so disabling deepcopy
        """
        return self

so do we want to change it?

Current does:

def __deepcopy__(self, memo={}):
     return self._shallow_copy()

which preserves metadata

@jtratner
Python for Data member

by "current" I mean this PR

@jreback

hmm...I think you should leave that, and make the user explicitly copy(deep=True) to get an actual deep copy

I am not sure what actually calls __deepcopy__ and there could be some weirdness...(of course you can try it and see)

but we normally don't want anything but shallow copies (except very rarely), so I would vote to be very explicit about it

@jtratner
Python for Data member

@jreback __deepcopy__ is called by copy.deepcopy in Python.

@jtratner
Python for Data member

@jreback honestly not a big deal, just wanted to bring it up b/c of discussion in that other thread about copying indices. It has to shallow copy otherwise you get the weirdness with naming being overridden, etc.

@jreback

I guess you could enable deepcopy to be effectively copy(deep=True) (rather than a shallow)......
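A minimal sketch of that option, using a toy class rather than the real `Index`: `copy.deepcopy()` dispatches to `__deepcopy__`, which here simply forwards to `copy(deep=True)`. The class name `Labelled` and its attributes are hypothetical.

```python
import copy


class Labelled(object):
    """Toy stand-in for an index-like object (hypothetical, not pandas code)."""

    def __init__(self, values, name=None):
        self.values = values
        self.name = name

    def copy(self, deep=False):
        # A deep copy duplicates the underlying values; a shallow one shares them.
        values = copy.deepcopy(self.values) if deep else self.values
        return type(self)(values, name=self.name)

    def __deepcopy__(self, memo=None):
        # copy.deepcopy() lands here, so both spellings behave the same.
        return self.copy(deep=True)


orig = Labelled([1, 2, 3], name="a")
shallow = orig.copy()
deep = copy.deepcopy(orig)

print(shallow.values is orig.values)  # True
print(deep.values is orig.values)     # False
print(deep.name)                      # a
```

Either way the metadata (`name`) travels with the copy; only the treatment of the underlying values differs.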

@jtratner
Python for Data member

@jreback I guess for now should default to what we have, unless there's an explicit need to change, right?

@jtratner
Python for Data member

@jreback especially because it makes it easier to subclass...

@jreback

that's fine; so you are just enabling copy(deep=False), and the user would have to explicitly pass deep=True to get a new (deep) copy, which I think we are only planning on doing if you pass to series/frame.copy(deep=True), which again is not the default

@jtratner
Python for Data member
@jreback

no hurry
just flash a message

merge this now!

@cpcloud
Python for Data member

NOW :exclamation: :smile:

@jtratner
Python for Data member

:trollface:

@cpcloud
Python for Data member

btw @jtratner SO much easier to test on py3 now :+1:

@jreback

agreed +1

@jtratner
Python for Data member

2 things:
1. Deep copy stuff doesn't matter - Python defaults to not copying immutable objects and nothing in an Index can be mutable.
2. Index constructor with copy=True wasn't actually copying ndarrays if dtype=None. I fixed that.

After this PR, we should check that if you pass an ndarray to the DataFrame constructor as an index, that it actually gets copied.
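Point 1 can be seen with the standard library alone. In CPython, `copy.deepcopy` hands back the very same tuple when every element is immutable, and only builds a new one when something inside can change. The variable names below are illustrative.

```python
import copy

names = ("first", "second")
# Every element is immutable, so deepcopy returns the same object.
print(copy.deepcopy(names) is names)

levels = (["a", "b"], ["x", "y"])
# A mutable element forces a real copy of the tuple and of the lists inside it.
dup = copy.deepcopy(levels)
print(dup is levels, dup[0] is levels[0])
```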

@jtratner
Python for Data member

I believe this now works and I added some test cases for copy=True being passed into the Index constructor. It's not an issue for MultiIndex. Now __deepcopy__ just calls copy(deep=True), so that is no longer a separate case. I also made all the copy() constructors accept a deep kwarg even though, in nearly every case I can think of, it shouldn't matter. If you passed a custom object, it could potentially make a difference, so I've included it, but passing deep=True to an Index's copy method likely only causes the copy to be a bit slower.

One thing that I'd like to propose is that it might make sense to have the default on Index be copy=True rather than copy=False and have most pandas objects explicitly tell Index not to copy. That way, you don't get weird errors like this (which violate the idea that Index is immutable):

import numpy as np
from pandas import Index

arr = np.array(range(10))
ind = Index(arr)  # no copy requested, so the index shares arr's memory
arr[0] = 1000
print(ind[0])  # now 1000
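For contrast, a short sketch of the `copy=True` path described above: asking the constructor for a copy decouples the index from the caller's array. It assumes only that `Index` accepts `copy=True`, as stated in this comment.

```python
import numpy as np
import pandas as pd

arr = np.arange(10)
ind = pd.Index(arr, copy=True)   # ask for an independent copy of the data

arr[0] = 1000                    # mutate the caller's array afterwards
print(ind[0])
print(np.may_share_memory(arr, ind.values))
```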
@jtratner
Python for Data member

actually, I think the end there may be a regression from 0.11 where someone changed how copy works...not sure.

@jreback

I'll bet you don't copy an incoming np.array and since that's a view then it can be changed
another way is to actually change the numpy flag

I think

arr.flags.writeable=False

then it will raise if u try to modify it
and won't have the copy overhead !
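A small sketch of the flag approach, assuming plain NumPy. It also shows one limitation: the flag belongs to each array object, so a view taken before the flag was cleared can still write into the shared buffer.

```python
import numpy as np

arr = np.arange(10)
early_view = arr[:]            # view taken while arr is still writeable
arr.flags.writeable = False    # mark arr read-only; no data is copied

try:
    arr[0] = 1000
except ValueError as err:
    print("blocked:", err)

# The earlier view keeps its own writeable flag and still reaches the buffer.
early_view[0] = 1000
print(arr[0])
```

So the flag avoids the copy overhead but only protects against writes made through the flagged array itself.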

@jtratner
Python for Data member

Okay, that is one way to do it. it looks like 0.11 did that.

So do,

  • copy=True => copies array, original array fine
  • copy=False => sets array to not writeable, might not copy array, but will if it has to do any dtype changes, etc. (sets not writeable at very end of constructor)
@jtratner
Python for Data member

okay, but what if I, for example, passed an array to set_index and then wanted to set that to be one of the columns of my object? I think setting the writeable flag might be a bit complicated...

@jtratner
Python for Data member

@jreback btw - the issue was the com._asarray_tuplesafe wasn't always making a copy.

@jreback

hmm

that is used in a lot of places
normally a numpy array is NOT copied when put into a series for example (IMO it should be but that's currently what exists)

@jtratner
Python for Data member

@jreback right, I changed it only in the Index constructor.

@jreback

ok great....if you can put some tests in to check on copying would be great (talking about copying in the Index constructor from various types of inputs)

@jtratner
Python for Data member

@jreback I have tests for just passing copy=True to the Index constructor and this doesn't matter for MultiIndex. Where are you thinking this would matter input wise?

@cpcloud
Python for Data member

btw, how are you testing copying?

np.may_share_memory is currently the most reliable way

you can't compare to base bc of possible chaining

above function will not yield false negatives so if it returns False they are definitely NOT sharing memory

may have false positives (returns True but not actually sharing memory, means that they might share memory)
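A quick illustration of that check on a view and on a copy, assuming plain NumPy arrays:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]        # slicing returns a view onto the same buffer
dup = arr.copy()       # an independent buffer

print(np.may_share_memory(arr, view))  # True
print(np.may_share_memory(arr, dup))   # False
```

In a test, asserting that the result is `False` is the safe direction, since a `False` answer is definitive while `True` only means the memory ranges might overlap.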

@jtratner
Python for Data member

@cpcloud that's useful. I was actually just checking if setting an element changed the index. E.g.

arr = self.strIndex.values
ind = Index(arr, copy=True)
arr[0] = "THIS IS SOME TEXT THAT SHOULDNT GO IN"
self.assertNotEqual(ind[0], arr[0])
@jreback

I guess need to test for a numpy array and list passed (that then should get copied)

@jtratner
Python for Data member

@jreback in the constructor for index, that's already handled. I need to tweak MultiIndex and make sure that it's working.

@jtratner
Python for Data member

@jreback - so what behavior would you expect for these cases?

        levels = np.array(["a", "b", "c"])
        labels = np.array([1, 1, 2, 0, 0, 1, 1])
        val = labels[0]
        mi = MultiIndex(levels=[levels, levels], labels=[labels, labels]) 
        self.assertEqual(mi.labels[0][0], val) # succeeds
        labels[0] = 15
        self.assertEqual(mi.labels[0][0], val) # fails
        dct = {"A": range(10), "B": range(10)}
        ind = np.arange(10)
        val = ind[0]
        df1 = DataFrame(dct, index=ind)
        ind[0] = 30
        self.assertEqual(df1.index[0], val) # fails
@jtratner jtratner referenced this pull request
Closed

rename multi-index #4461

@jtratner
Python for Data member

@jreback I guess I'm just thinking that the defaults in the constructors should be to copy, and then we should put copy=False in all the internal calls...

@jreback

@jtratner current default is copy=False....hmmm...so by default when you copy a frame, you get a copy of the data but not the index....I think we should leave it....too much room for error here (as no copy is really the default)

@hayd
Python for Data member

something weird going on with index.equals (?) at #4458, any ideas? (could just be me)

@jreback

do you have an example?

@hayd
Python for Data member

@jreback nope, can't repro outside of that test.

@jtratner
Python for Data member

@jreback now that we aren't going to copy by default in constructors, etc, I think this is ready to go (since it doesn't make sense to add test cases to assert that ndarray isn't copied when passed to constructor). Just waiting on Travis for confirmation that I didn't miss anything.

@jtratner
Python for Data member

Actually, hold on - I want to add test case from #4226.

@jtratner
Python for Data member

nm ... that was already resolved.

@jreback

ok will review tomorrow

can you squash to smaller num of commits?

@jtratner
Python for Data member
@jreback jreback commented on an outdated diff
doc/source/release.rst
@@ -47,6 +47,12 @@ pandas 0.13
- Added a more informative error message when plot arguments contain
overlapping color and style arguments (:issue:`4402`)
- Significant table writing performance improvements in ``HDFStore``
+ - ``Index.copy()`` and ``MultiIndex.copy()`` now accept keyword arguments to
+ change attributes (like ``names``, ``levels``, ``labels``, etc.)
@jreback
jreback added a note

minor point, take out the etc

@jtratner Python for Data member

will do.

@jreback jreback commented on an outdated diff
doc/source/v0.13.0.txt
@@ -72,6 +72,23 @@ API changes
import os
os.remove(path)
+ - ``Index`` and ``MultiIndex`` changes (:issue:`4039`):
+
@jreback
jreback added a note

you don't need to list all of this here, it just repeats release.rst; instead can you show an example of how you formerly changed names/levels (maybe using code-block so it doesn't evaluate), then the new method using set_*

@jtratner Python for Data member

I'll fix that then.

@jreback jreback commented on an outdated diff
doc/source/v0.13.0.txt
@@ -110,6 +127,15 @@ Bug Fixes
- Suppressed DeprecationWarning associated with internal calls issued by repr() (:issue:`4391`)
+ - Fixed bug in ``DataFrame.set_values`` which was causing name attributes to
@jreback
jreback added a note

don't need these here, repeating release.rst

@jtratner Python for Data member

I took them out.

@jreback jreback commented on an outdated diff
pandas/core/index.py
@@ -38,6 +39,20 @@ def wrapper(self, other):
class InvalidIndexError(Exception):
pass
+
+def _deprecate_setter(property_name, method_name, alternate=None):
@jreback
jreback added a note

I think there might be something like this in pandas.util.decorators.deprecate ?
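The general pattern under discussion (a property setter that still works but warns) can be sketched as follows. `deprecated_setter` and `Demo` are hypothetical names for illustration; this is neither the `_deprecate_setter` helper from the diff nor the existing pandas decorator.

```python
import warnings


def deprecated_setter(setter_name, public_method):
    """Build a property setter that warns, then delegates (hypothetical helper)."""
    def setter(self, value):
        warnings.warn(
            "setting this attribute directly is deprecated; use %s instead"
            % public_method,
            DeprecationWarning,
            stacklevel=2,
        )
        getattr(self, setter_name)(value)
    return setter


class Demo(object):
    def __init__(self, labels):
        self._set_labels(labels)

    def _get_labels(self):
        return self._labels

    def _set_labels(self, labels):
        self._labels = tuple(labels)

    labels = property(fget=_get_labels,
                      fset=deprecated_setter("_set_labels", "set_labels"))


d = Demo([0, 1])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    d.labels = [1, 0]          # still works, but emits a DeprecationWarning

print(d.labels)
print(caught[0].category.__name__)
```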

@jreback jreback commented on the diff
pandas/core/index.py
@@ -326,9 +405,6 @@ def __contains__(self, key):
def __hash__(self):
return hash(self.view(np.ndarray))
- def __setitem__(self, key, value):
@jreback
jreback added a note

are we doing a deprecation warning here? (or is that handled somewhere else?)

@jtratner Python for Data member

This is removed because FrozenNDArray handles this instead.

@jreback jreback commented on an outdated diff
pandas/core/index.py
((42 lines not shown))
+ idx = self
+ else:
+ idx = self._shallow_copy()
+ idx._set_levels(levels)
+ return idx
+
+ def _get_labels(self):
+ return self._labels
+
+ def _set_labels(self, labels, copy=False):
+ if len(labels) != self.nlevels:
+ raise ValueError("Length of levels and labels must be the same.")
+ self._labels = FrozenList(_ensure_frozen(labs,copy=copy)._shallow_copy()
+ for labs in labels)
+
+ # remove me in 0.14 and change to readonly property
@jreback
jreback added a note

add in issue to remove these methods so we don't forget

@jtratner Python for Data member

will do.

@jreback jreback commented on an outdated diff
pandas/core/index.py
((73 lines not shown))
+ Returns
+ -------
+ new index (of same type and class...etc)
+ """
+ if inplace:
+ idx = self
+ else:
+ idx = self._shallow_copy()
+ idx._set_labels(labels)
+ return idx
+
+ # remove me in 0.14 and change to readonly property
+ __set_labels = _deprecate_setter("_set_labels", "set_labels")
+ labels = property(fget=_get_labels, fset=__set_labels)
+
+ def copy(self, names=None, dtype=None, levels=None, labels=None,
@jreback
jreback added a note

shouldn't MultiIndex.copy and Index.copy be the same method (with MI just have an additional levels argument?)

@jtratner Python for Data member

no - they can't be the same, MultiIndex has to have some logic about setting levels and labels - neither of them are long methods anyways.

@jreback
jreback added a note

ok

@jtratner
Python for Data member

@jreback you might want to hold off before looking at this more in-depth - I broke something within Index and need to fix it.

@jreback

hahh...no problemo....let me know

jtratner added some commits
@jtratner jtratner ENH: Index inherits from FrozenNDArray + add FrozenList
* `FrozenNDArray` - thin wrapper around ndarray that disallows setting methods
  (will be used for levels on `MultiIndex`)
* `FrozenList` - thin wrapper around list that disallows setting methods
  (needed because of type checks elsewhere)

Index inherits from FrozenNDArray now and also actually copies for deepcopy.
Assumption is that underlying array is still immutable-ish
285622f
@jtratner jtratner ENH: Make core/index exceptions more descriptive
* `assert_copy`: You can use `assert_copy` to check that two iterables
  produce copies that are not the same object. (uses `assert_almost_equal`
  under the hood).
* Fix assert_almost_equal to handle non-ndarrays (previously failed
  after iterable check)
* Fix test_mixed_panel to reflect true name behavior
bdf270c
@jtratner
Python for Data member

@jreback tell me if you want docs on set_names, set_levels, or the new copy constructor.

@jreback

I think an example for v0.13 is good
and maybe add in indexing.rst? not sure we actually have a section on index names so put where u think

@jtratner
Python for Data member

okay, there's an example in v0.13.0.txt and changed indexing.rst slightly to add the index names section (moved around the Index objects part so it could address MultiIndex too)

@jtratner
Python for Data member

@jreback this is all working now + has the docs, etc.

@jreback jreback commented on an outdated diff
doc/source/release.rst
@@ -82,6 +88,20 @@ pandas 0.13
- removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
- allow a passed locations array or mask as a ``where`` condition (:issue:`4467`)
+ - ``Index`` and ``MultiIndex`` changes (:issue:`4039`):
@jreback
jreback added a note

doesn't this need to be indented? (eg outer level should be same as existing and inner level indented?)

@cpcloud Python for Data member
cpcloud added a note

yep needs to be indented

@jtratner Python for Data member

That indentation level is because the previous entry is a sub-bullet of the HDFStore changes. (This matches up with the outer indentation level, which is 2 spaces.)

@cpcloud Python for Data member
cpcloud added a note

then can you change the ones below it cuz this one sticks out

@jtratner Python for Data member

What follows are sub-bullets of "Index and MultiIndex changes", just like how the HDFStore changes are grouped above it - I can change it if you want, I was matching the look of other elements that have multiple changes.

@cpcloud Python for Data member
cpcloud added a note

oh you're right! sorry about that. only thing is that to prevent sphinx from complaining you should put a newline between an outdented bullet point and the previously indented one

@jtratner Python for Data member

@cpcloud added the space.

@jreback jreback commented on an outdated diff
doc/source/v0.13.0.txt
@@ -72,6 +72,23 @@ API changes
import os
os.remove(path)
+ - Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
+ ``labels``, and ``names``) (:issue:`4039`):
+
+ ..code-block ::
+
+ # instead of setting levels directly
@jreback
jreback added a note

I would make this more explicit, by saying before 0.13 you did this, but that is deprecated, so now use set_*

@jtratner Python for Data member

I fixed this up - see what you think.
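As a rough illustration of the `set_*` style being documented, here is `set_names` on a small `MultiIndex`. The sketch assumes a pandas version where `set_names` accepts `inplace`; it leaves out `set_labels` because later pandas releases renamed labels to codes.

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays([[1, 1, 2], ["a", "b", "a"]],
                               names=["num", "let"])

# set_names returns a new index by default and leaves the original untouched.
renamed = mi.set_names(["number", "letter"])
print(list(mi.names))
print(list(renamed.names))

# inplace=True changes the names on the existing object instead.
mi.set_names(["n", "l"], inplace=True)
print(list(mi.names))
```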

jtratner added some commits
@jtratner jtratner BUG/ENH: Make names, levels and labels properties.
* `names` is now a property *and* is set up as an immutable tuple.
* `levels` are always (shallow) copied now and it is deprecated to set directly
* `labels` are set up as a property now, moving all the processing of
  labels out of `__new__` + shallow-copied.
* `levels` and `labels` are immutable.
* Add names tests, motivating example from #3742, reflect tuple-ish
  output from names, and level names check to reindex test.
* Add set_levels, set_labels, set_names and rename to index
* Deprecate setting labels and levels directly

Similar to other set_* methods...allows mutation if necessary but
otherwise returns same object.

Labels are now converted to `FrozenNDArray` and wrapped in a
`FrozenList`. Should mostly resolve #3714 because you have to work to
actually make assignments to an `Index`.

BUG: Give MultiIndex its own astype method

Fixes issue with set_value forgetting names.
033a932
@jtratner jtratner ENH: Additional keyword arguments for Index.copy()
* Index derivatives can set `name` or `names` as well as
  `dtype` on copy. MultiIndex can set `levels`, `labels`, and `names`.
* Also, `__deepcopy__` just calls `copy(deep=True)`
* Now, BlockManager.copy() takes an additional argument `copy_axes` which
  copies axes as well. Defaults to False.
* `Series.copy()` takes an optional deep argument, which causes it to
  copy its index.
* `DataFrame.copy()` passes `copy_axes=True` when deepcopying.
* Add copy kwarg to MultiIndex `__new__`
5cad4d2
@jreback

ok bombs away

@jreback jreback merged commit 429e9f3 into pydata:master
@jreback

@jtratner

just rebased #3482 on top of this in master, no problem

noticed that you had a separate copy for Series (which copies index and such properly). Is this not also needed in core/internals/copy? specifically, the axes are ALWAYS shallow copied here...?

@jtratner
Python for Data member

@jreback actually, I was flip-flopping on this for a while. I tried copying axes in a few places, and I kept getting issues with the check that self.ref_items is self.items failing. It's fine for series, because it passes it through the constructor again, but DataFrame passes its data to the new object, so it doesn't reassign ref_items. I didn't want to mess with it too much because I wasn't sure where to change it. Maybe I needed to change it in block instead?

@jtratner jtratner deleted the jtratner:fix-multiindex-naming branch
@jtratner
Python for Data member

@jreback After you brought this up - I'm thinking that you could copy index first, then pass it as a parameter to the copy on blocks to overwrite items and ref_items. Maybe that would work? What's the difference between ref_items and items?

@jreback

@jtratner items are references to ref_items that are identical or a take (and not an indexing operation).

So the only thing you can do is to copy it BEFORE you start and then use that one (and yes, could be done in copy). You would have to pass the new ref_items, then change Block.copy to use the new items, e.g. items = ref_items.take(self.ref_locs); very much like a renaming operation.

Commits on Aug 10, 2013
  1. @jtratner

    ENH: Index inherits from FrozenNDArray + add FrozenList

    jtratner committed
    * `FrozenNDArray` - thin wrapper around ndarray that disallows setting methods
      (will be used for levels on `MultiIndex`)
    * `FrozenList` - thin wrapper around list that disallows setting methods
      (needed because of type checks elsewhere)
    
    Index inherits from FrozenNDArray now and also actually copies for deepcopy.
    Assumption is that underlying array is still immutable-ish
  2. @jtratner

    ENH: Make core/index exceptions more descriptive

    jtratner committed
    * `assert_copy`: You can use `assert_copy` to check that two iterables
      produce copies that are not the same object. (uses `assert_almost_equal`
      under the hood).
    * Fix assert_almost_equal to handle non-ndarrays (previously failed
      after iterable check)
    * Fix test_mixed_panel to reflect true name behavior
Commits on Aug 11, 2013
  1. @jtratner

    BUG/ENH: Make names, levels and labels properties.

    jtratner committed
    * `names` is now a property *and* is set up as an immutable tuple.
    * `levels` are always (shallow) copied now and it is deprecated to set directly
    * `labels` are set up as a property now, moving all the processing of
      labels out of `__new__` + shallow-copied.
    * `levels` and `labels` are immutable.
    * Add names tests, motivating example from #3742, reflect tuple-ish
      output from names, and level names check to reindex test.
    * Add set_levels, set_labels, set_names and rename to index
    * Deprecate setting labels and levels directly
    
    Similar to other set_* methods...allows mutation if necessary but
    otherwise returns same object.
    
    Labels are now converted to `FrozenNDArray` and wrapped in a
    `FrozenList`. Should mostly resolve #3714 because you have to work to
    actually make assignments to an `Index`.
    
    BUG: Give MultiIndex its own astype method
    
    Fixes issue with set_value forgetting names.
  2. @jtratner

    ENH: Additional keyword arguments for Index.copy()

    jtratner committed
    * Index derivatives can set `name` or `names` as well as
      `dtype` on copy. MultiIndex can set `levels`, `labels`, and `names`.
    * Also, `__deepcopy__` just calls `copy(deep=True)`
    * Now, BlockManager.copy() takes an additional argument `copy_axes` which
      copies axes as well. Defaults to False.
    * `Series.copy()` takes an optional deep argument, which causes it to
      copy its index.
    * `DataFrame.copy()` passes `copy_axes=True` when deepcopying.
    * Add copy kwarg to MultiIndex `__new__`