BUG: DataFrame constructor ignores copy=True argument if dtype is set #9099

Closed
mairas opened this issue Dec 17, 2014 · 2 comments

@mairas
commented Dec 17, 2014

I just noticed that the DataFrame constructor ignores the copy=True argument if dtype is set. In the snippet below, orig should stay unmodified after any modification of new1 or new2. Instead, the columns of new2 (or at least the first one, as shown in the snippet) reference the same data as orig: the assignment in statement 13 also changes orig, as the output of statement 15 shows.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.4.3
IPython: 2.3.1
sphinx: 1.1.2
patsy: 0.3.0
dateutil: 2.3
pytz: 2014.10
bottleneck: 0.6.0
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.7.4
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

In [4]: orig_data = {
   ...:     'col1': [1.],
   ...:     'col2': [2.],
   ...:     'col3': [3.],}

In [5]: orig = pd.DataFrame(orig_data)

In [6]: new1 = pd.DataFrame(orig, copy=True)

In [7]: new2 = pd.DataFrame(orig, dtype=float, copy=True)

In [8]: new1
Out[8]: 
   col1  col2  col3
0     1     2     3

In [9]: new2
Out[9]: 
   col1  col2  col3
0     1     2     3

In [10]: new1['col1'] = 100.

In [11]: new1
Out[11]: 
   col1  col2  col3
0   100     2     3

In [12]: orig
Out[12]: 
   col1  col2  col3
0     1     2     3

In [13]: new2['col1'] = 200.

In [14]: new2
Out[14]: 
   col1  col2  col3
0   200     2     3

In [15]: orig
Out[15]: 
   col1  col2  col3
0   200     2     3

In [16]:
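
The session above can be condensed into a small script that checks whether the original frame survives a write to the copy. This is only a sketch: `original_survives` is a hypothetical helper name, and it writes through `.iloc` (positional element assignment) rather than the whole-column assignment used in the session, on the assumption that an element write is the more direct probe for shared data.

```python
import pandas as pd


def original_survives(dtype=None):
    """Return True if writing to a copy=True frame leaves the source untouched.

    Hypothetical helper for illustration; not part of pandas.
    """
    orig = pd.DataFrame({"col1": [1.0], "col2": [2.0], "col3": [3.0]})
    # dtype=None reproduces statement 6; dtype=float reproduces statement 7.
    new = pd.DataFrame(orig, dtype=dtype, copy=True)
    # Write into the copy, then check that the source still holds 1.0.
    new.iloc[0, 0] = 200.0
    return bool(orig.iloc[0, 0] == 1.0)


print("copy=True only:      ", original_survives())
print("copy=True with dtype:", original_survives(dtype=float))
```

On a release that honours copy=True in both cases, both lines should report True. On 0.15.2, as reported above, the second check would be expected to fail.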
@jreback

Contributor

commented Dec 18, 2014

The fix should happen here: https://github.com/pydata/pandas/blob/master/pandas/core/generic.py#L127
(the frames are constructed from a frame itself, so construction goes through the manager path).

A pull request on this is welcome; it should be pretty straightforward.
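
Until a fix is released, one possible workaround is to avoid relying on the constructor's copy argument and to copy explicitly before converting. This is a sketch and was not proposed in this thread; `independent_frame` is a hypothetical helper name, and it assumes `DataFrame.copy(deep=True)` followed by `DataFrame.astype` gives the independent data the reporter expected.

```python
import pandas as pd


def independent_frame(source, dtype=None):
    """Return a frame that does not share data with `source`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Take the deep copy first, so the conversion works on separate data.
    out = source.copy(deep=True)
    if dtype is not None:
        out = out.astype(dtype)
    return out


orig = pd.DataFrame({"col1": [1.0], "col2": [2.0], "col3": [3.0]})
new2 = independent_frame(orig, dtype=float)
new2.iloc[0, 0] = 200.0

print(orig)
print(new2)
```

The cost is one extra copy when the dtype actually changes, which is likely acceptable for small frames but worth measuring on large ones.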

@jreback jreback added this to the 0.16.0 milestone Dec 18, 2014

@mairas

Author

commented Dec 18, 2014

OK, here:

#9105
