API: Deep copy not copying index/columns #4202

Closed
hayd opened this Issue Jul 11, 2013 · 13 comments

Contributor

hayd commented Jul 11, 2013

http://stackoverflow.com/questions/17591104/in-pandas-can-i-deeply-copy-a-dataframe-including-its-index-and-column/17591423#17591423

The deep copy shares its index and columns with the original, so changing the copy's index in place, e.g. its name:

df1 = df.copy()
df1.index.name = 'ffg'

also changes the name of df.index.

Contributor

jreback commented Jul 11, 2013

related #4039
cc @jtratner

this is the case where reference tracking to the indices would help (but too complicated)

so just force copy indices here

Contributor

hayd commented Jul 11, 2013

@jreback do you see the index.name thing? apparently not everyone does (my computer is sulking today, so not trusting it): http://stackoverflow.com/a/17591435/1240268

Contributor

hayd commented Jul 11, 2013

Ah I see, I think that is being done after (which creates a new object):

df2.index = df2.index[::-1]
Contributor

jtratner commented Jul 16, 2013

Btw - this should be resolved by shallow copying indices, since they are supposed to be mostly immutable. That allows you to change metadata. After #4039, MultiIndex and Index will handle this transparently when copied with view() or anything that triggers __array_finalize__, __setstate__, etc.
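A minimal sketch of the shallow-copy idea above, assuming a reasonably recent pandas: `Index.view()` returns a new Index object over the same labels, so metadata such as `name` can be changed on the view without touching the original. The variable names are illustrative only.

```python
import pandas as pd

# Original index with a name attached as metadata.
idx = pd.Index([10, 20, 30], name="foo")

# A shallow copy: a new Index object over the same labels.
shallow = idx.view()

# Renaming the shallow copy leaves the original's metadata alone.
shallow.name = "bar"

print(idx.name, shallow.name)
```

Because the labels themselves are treated as immutable, sharing them between the two objects is safe. Only the per-object metadata needs to be independent.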

Contributor

hayd commented Jul 31, 2013

@jtratner so this is fixed with #4039 ? (if not, will fix once that's merged)

Contributor

jtratner commented Jul 31, 2013

Yes.

@ghost ghost assigned jtratner Sep 9, 2013

Contributor

jtratner commented Sep 11, 2013

@hayd I believe this is all fixed and can be closed, yes?

Contributor

hayd commented Sep 12, 2013

@jtratner test is this:

In [11]: df = pd.DataFrame([1])

In [12]: df.index.name = 'foo'

In [13]: df1 = df.copy()

In [14]: df1.index.name = 'bar'

In [15]: df
Out[15]: 
     0
bar   
0    1
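
The session above can be turned into a small regression check. This is only a sketch, assuming a pandas version that already includes the fix. The extra check on `columns` is an addition that was not part of the original report.

```python
import pandas as pd

df = pd.DataFrame([1])
df.index.name = "foo"

# Deep copy (the default), then rename the copy's index and columns.
df1 = df.copy()
df1.index.name = "bar"
df1.columns.name = "cols"

# With the fix in place, the original keeps its own metadata.
assert df.index.name == "foo", "copy() leaked index metadata to the original"
assert df.columns.name is None, "copy() leaked columns metadata to the original"
assert df1.index.name == "bar"
```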
Contributor

jtratner commented Sep 12, 2013

I will fix that.

Contributor

jtratner commented Sep 12, 2013

@jreback Where is the best opportunity to actually copy an index (both for rows and columns) with the block manager? I think it just needs to be shallow or deep-copied each time and then passed as the new ref_item to the block copying.

Contributor

jreback commented Sep 12, 2013

Right now there is NO deep copying of the index in the BlockManager (BM). I think you just need to change BlockManager.copy to something like this:

    def copy(self, deep=True):
        """
        Make deep or shallow copy of BlockManager

        Parameters
        ----------
        deep : boolean, default True
            If False, return shallow copy (do not copy data)

        Returns
        -------
        copy : BlockManager
        """
        if deep:
            new_axes = [ax.copy(deep=True) for ax in self.axes]
        else:
            new_axes = list(self.axes)
        return self.apply('copy', axes=new_axes, deep=deep,
                          do_integrity_check=False)
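
To illustrate the idea behind the suggested change without depending on pandas internals, here is a toy model in plain Python. `ToyIndex` and `ToyManager` are hypothetical stand-ins, not pandas classes. The sketch only shows why copying the axes on a deep copy keeps metadata changes local, while a shallow copy shares them.

```python
import copy


class ToyIndex:
    """Hypothetical stand-in for an index: labels plus a mutable name."""

    def __init__(self, labels, name=None):
        self.labels = tuple(labels)  # labels are treated as immutable
        self.name = name             # metadata that can be set in place

    def copy(self):
        return ToyIndex(self.labels, self.name)


class ToyManager:
    """Hypothetical stand-in for a block manager: axes plus data."""

    def __init__(self, axes, data):
        self.axes = list(axes)
        self.data = data

    def copy(self, deep=True):
        if deep:
            # New index objects, so metadata changes stay local to the copy.
            new_axes = [ax.copy() for ax in self.axes]
            new_data = copy.deepcopy(self.data)
        else:
            # Same index objects and same data: everything is shared.
            new_axes = list(self.axes)
            new_data = self.data
        return ToyManager(new_axes, new_data)


mgr = ToyManager([ToyIndex([0], name="foo")], data=[[1]])

deep = mgr.copy(deep=True)
deep.axes[0].name = "bar"
name_after_deep = mgr.axes[0].name      # still "foo"

shallow = mgr.copy(deep=False)
shallow.axes[0].name = "baz"
name_after_shallow = mgr.axes[0].name   # now "baz", the axis is shared
```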
Contributor

jtratner commented Sep 12, 2013

@jreback ah okay, there it is...now I remember we were hitting this issue before too...need to make sure the ref_items get passed to the copy constructor.

I think this copy needs a separate kwarg for whether index should be deep copied (because, generally, index always needs to be shallow copied).

Contributor

jreback commented Sep 12, 2013

@jtratner I don't know, I think it's easiest just to copy everything on deep copy?

@jtratner jtratner closed this in #4830 Sep 24, 2013
