column datatype conversion impacts whole dataframe when using df.ix indexing #8607

dmarx · 2014-10-22T21:15:03Z

My dataframe has a timestamp column stored as Unix epoch seconds. When I convert the column by selecting it by name it works fine, but when I use the '.ix' syntax the conversion coerces the whole dataframe. Example:

import pandas as pd

df = pd.DataFrame(
    {'timestamp':[1413840976, 1413842580, 1413760580], 
     'delta':[1174, 904, 161], 
     'elapsed':[7673, 9277, 1470]
    })

df2 = df.copy()

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
df2.ix[:,2] = pd.to_datetime(df['timestamp'], unit='s')

df

   delta  elapsed           timestamp
0   1174     7673 2014-10-20 21:36:16
1    904     9277 2014-10-20 22:03:00
2    161     1470 2014-10-19 23:16:20

df2

                delta                       elapsed                     timestamp
0 2014-10-20 21:36:16 1970-01-01 00:00:00.000007673 1970-01-01 00:00:01.413840976
1 2014-10-20 22:03:00 1970-01-01 00:00:00.000009277 1970-01-01 00:00:01.413842580
2 2014-10-19 23:16:20 1970-01-01 00:00:00.000001470 1970-01-01 00:00:01.413760580

I strongly suspect the difference in behavior here is a bug and should be resolved. If this is actually "how things should work," I'd greatly appreciate it if someone could explain why the two indexing styles produce different results.

jreback · 2014-10-22T21:43:40Z

Please post the output of pd.show_versions()

dmarx · 2014-10-22T22:00:52Z

Should be the most current distributions via Anaconda

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.3
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
sqlalchemy: 0.9.4
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None

jreback · 2014-10-22T22:05:26Z

this was fixed in 0.14 or 0.14.1

0.15.0 is current and just released

dmarx · 2014-10-22T22:08:56Z

Cool, guess Anaconda is lagging by a few releases (or I need to update?). I'll take your word for it and close the issue. Thanks!

jreback · 2014-10-22T22:10:37Z

you should be able to run conda update pandas and get 0.15 now

dmarx · 2014-10-22T22:20:07Z

I don't know how to reopen the issue, but after updating and confirming that I've got 0.15.0, it still exhibits the behavior that inspired me to submit this bug.

pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 1.5
pytz: 2014.7
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

jorisvandenbossche · 2014-10-23T06:56:12Z

Yep, I can confirm I get the same issue with master

jreback · 2014-10-23T11:46:35Z

@dmarx OK, something has gone awry. It is supposed to be the same (and not touch anything else). It is actually a bit tricky, as it DOES need to do some dtype inference (and possibly break up a block of dtypes), which is the problem here. E.g. say you have 2 int64 columns and then change one of them to datetime64[ns]: this gets split into 2 blocks, int64 and datetime64[ns] (this is internal).
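
For anyone following along, here is a minimal sketch of the intended outcome described above, assuming a recent pandas install: converting one int64 column by name should change only that column's dtype and leave the other int64 columns alone. It deliberately uses only the by-name assignment from the original report (the path that was reported to work) and inspects the public df.dtypes rather than the internal blocks. The variable names before and after are illustrative only.

```python
import pandas as pd

# Same data as in the original report.
df = pd.DataFrame({
    "timestamp": [1413840976, 1413842580, 1413760580],
    "delta": [1174, 904, 161],
    "elapsed": [7673, 9277, 1470],
})

# Record each column's dtype before the conversion (all int64 here).
before = df.dtypes.astype(str).to_dict()

# Convert only the timestamp column, selecting it by name.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")

# Record the dtypes again: only "timestamp" should have changed.
after = df.dtypes.astype(str).to_dict()

for name in df.columns:
    print(name, before[name], "->", after[name])
```

If any column other than timestamp shows a changed dtype, the assignment has touched more than it should have, which is the symptom reported in this issue.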

jreback closed this as completed Oct 22, 2014

jorisvandenbossche reopened this Oct 23, 2014

jorisvandenbossche added the Bug label Oct 23, 2014

jreback added this to the 0.15.1 milestone Oct 23, 2014

jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Oct 23, 2014

jreback mentioned this issue Oct 27, 2014

BUG: Bug in ix/loc block splitting on setitem (manifests with integer-like dtypes, eg. datetime64) (GH8607) #8644

Merged

jreback closed this as completed in #8644 Oct 27, 2014

jreback modified the milestones: 0.15.2, 0.15.1 Oct 30, 2014
