column datatype conversion impacts whole dataframe when using df.ix indexing #8607

Closed
dmarx opened this issue Oct 22, 2014 · 8 comments · Fixed by #8644
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

@dmarx

dmarx commented Oct 22, 2014

My dataframe has a timestamp column encoded as Unix epoch seconds. When I convert the column by selecting it by name, it works fine, but when I use the '.ix' syntax, every column in the dataframe is coerced to datetime. Example:

import pandas as pd

df = pd.DataFrame(
    {'timestamp':[1413840976, 1413842580, 1413760580], 
     'delta':[1174, 904, 161], 
     'elapsed':[7673, 9277, 1470]
    })

df2 = df.copy()

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
df2.ix[:,2] = pd.to_datetime(df['timestamp'], unit='s')

df

   delta  elapsed           timestamp
0   1174     7673 2014-10-20 21:36:16
1    904     9277 2014-10-20 22:03:00
2    161     1470 2014-10-19 23:16:20

df2

                delta                       elapsed                     timestamp
0 2014-10-20 21:36:16 1970-01-01 00:00:00.000007673 1970-01-01 00:00:01.413840976
1 2014-10-20 22:03:00 1970-01-01 00:00:00.000009277 1970-01-01 00:00:01.413842580
2 2014-10-19 23:16:20 1970-01-01 00:00:00.000001470 1970-01-01 00:00:01.413760580

I strongly suspect the difference in behavior here is a bug and should be resolved. If this is actually "how things should work," I'd greatly appreciate it if someone could explain why the two indexing styles produce different results.
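For readers who want to check the expected behavior on a recent pandas, here is a minimal sketch. It assumes a version where `.ix` is no longer available (it was removed in pandas 1.0), so it assigns the column by label. The helper name `convert_epoch_column` is hypothetical and not part of pandas. The check simply lists which columns changed dtype after the conversion.

```python
import pandas as pd


def convert_epoch_column(frame, column):
    """Return a copy of `frame` with `column` parsed as Unix-epoch seconds.

    Hypothetical helper for illustration; it assigns by column label.
    """
    out = frame.copy()
    out[column] = pd.to_datetime(out[column], unit="s")
    return out


df = pd.DataFrame({
    "timestamp": [1413840976, 1413842580, 1413760580],
    "delta": [1174, 904, 161],
    "elapsed": [7673, 9277, 1470],
})

converted = convert_epoch_column(df, "timestamp")

# Columns whose dtype differs between the original and the converted frame.
changed = [c for c in df.columns if df[c].dtype != converted[c].dtype]
print(changed)  # ['timestamp']
```

If anything other than the target column shows up in `changed`, the conversion has leaked into the rest of the frame, which is the symptom described above.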

@jreback
Contributor

jreback commented Oct 22, 2014

pls pd.show_versions()

@dmarx
Author

dmarx commented Oct 22, 2014

Should be the most current distributions via Anaconda

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.3
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
sqlalchemy: 0.9.4
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None

@jreback
Contributor

jreback commented Oct 22, 2014

this was fixed in 0.14 or 0.14.1

0.15.0 is current and just released

@jreback jreback closed this as completed Oct 22, 2014
@dmarx
Author

dmarx commented Oct 22, 2014

Cool, guess Anaconda is lagging by a few releases (or I need to update?). I'll take your word for it and close the issue. Thanks!

@jreback
Contributor

jreback commented Oct 22, 2014

you should be able to conda update pandas and get 0.15 now

@dmarx
Author

dmarx commented Oct 22, 2014

I don't know how to reopen the issue, but after updating and confirming that I've got 0.15.0, it still exhibits the behavior that inspired me to submit this bug.

pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 1.5
pytz: 2014.7
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

@jorisvandenbossche
Member

Yep, I can confirm I get the same issue with master

@jreback jreback added this to the 0.15.1 milestone Oct 23, 2014
@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Oct 23, 2014
@jreback
Contributor

jreback commented Oct 23, 2014

@dmarx ok, something has gone awry. The two are supposed to behave the same (and not touch anything else). It is actually a bit tricky, as it DOES need to do some dtype inference (and possibly break up a block of dtypes), which is the problem here. For example, say you have 2 int64 columns and change one of them to datetime64[ns]: the single int64 block gets split into 2 blocks, int64 and datetime64[ns] (this is internal).
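The split described above can be observed indirectly. The sketch below does not inspect pandas' internal blocks. It groups column names by dtype kind using only the public `dtypes` attribute, as a rough stand-in for how same-dtype columns are stored together. The helper `columns_by_dtype` and the column names `a` and `b` are made up for illustration.

```python
import pandas as pd


def columns_by_dtype(frame):
    """Group column names by NumPy dtype kind ('i' = integer, 'M' = datetime).

    Hypothetical helper: a public-API proxy for the internal block layout.
    """
    groups = {}
    for name, dtype in frame.dtypes.items():
        groups.setdefault(dtype.kind, []).append(name)
    return groups


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

before = columns_by_dtype(df)
df["a"] = pd.to_datetime(df["a"], unit="s")  # convert only column 'a'
after = columns_by_dtype(df)

print(before)  # {'i': ['a', 'b']}
print(after)   # {'M': ['a'], 'i': ['b']}
```

Both columns start in one integer group. After the conversion, `a` sits in a datetime group of its own while `b` stays an integer column, which is the outcome the indexing code is meant to produce.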
