diff(axis=1) after insert results in unexpected NaN column #10907

Closed
dadkins opened this Issue Aug 26, 2015 · 5 comments


dadkins commented Aug 26, 2015

The following code in pandas 0.16.2 demonstrates the problem:

>>> import pandas as pd
>>> df = pd.DataFrame({'y': pd.Series([2]), 'z': pd.Series([3])})
>>> df
   y  z
0  2  3
>>> df.insert(0, 'x', 1)
>>> df.diff(axis=1)
    x   y  z
0 NaN NaN  1

The following workaround produces the expected result:

>>> df.T.diff().T
    x  y  z
0 NaN  1  1

Versions:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-400.1.1.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.16.2
nose: None
Cython: None
numpy: 1.9.2
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

jreback commented Aug 26, 2015

This needs a .consolidate() first (though that should really be done in the BlockManager).

In [25]: df = pd.DataFrame({'y': pd.Series([2]), 'z': pd.Series([3])})

In [26]: df
Out[26]: 
   y  z
0  2  3

In [27]: df.insert(0, 'x', 1)

In [28]: df._data  
Out[28]: 
BlockManager
Items: Index([u'x', u'y', u'z'], dtype='object')
Axis 1: Int64Index([0], dtype='int64')
IntBlock: slice(1, 3, 1), 2 x 1, dtype: int64
IntBlock: slice(0, 1, 1), 1 x 1, dtype: int64

In [29]: df = df.consolidate()

In [30]: df.diff(axis=1)
Out[30]: 
    x  y  z
0 NaN  1  1

In [31]: df._data
Out[31]: 
BlockManager
Items: Index([u'x', u'y', u'z'], dtype='object')
Axis 1: Int64Index([0], dtype='int64')
IntBlock: slice(0, 3, 1), 3 x 1, dtype: int64
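
To see why the block layout matters, here is a toy model in plain Python. It is only a sketch, not pandas internals: it assumes the operation is applied to each block independently and the results are written back to their column positions. The names blockwise_diff and diff_values and the (positions, values) block representation are hypothetical stand-ins for the IntBlock entries printed above.

```python
import math

NAN = math.nan


def diff_values(values):
    """Difference of each value with its left neighbour; the first has none."""
    return [NAN] + [b - a for a, b in zip(values, values[1:])]


def blockwise_diff(blocks, n_columns):
    """Apply diff to each block on its own, then scatter results back.

    blocks: list of (column_positions, values) pairs. This is a
    hypothetical simplification of a block manager, for illustration only.
    """
    result = [NAN] * n_columns
    for positions, values in blocks:
        for pos, val in zip(positions, diff_values(values)):
            result[pos] = val
    return result


# Layout after df.insert(0, 'x', 1): y and z share a block, x sits alone.
split_blocks = [([1, 2], [2, 3]), ([0], [1])]
# Layout after consolidation: a single block covering x, y, z.
merged_blocks = [([0, 1, 2], [1, 2, 3])]

split_result = blockwise_diff(split_blocks, 3)
merged_result = blockwise_diff(merged_blocks, 3)
print(split_result)   # [nan, nan, 1]
print(merged_result)  # [nan, 1, 1]
```

Under that assumption, the first column of every block has no left neighbour inside its own block, which reproduces the extra NaN under y in the unconsolidated case.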

jreback added this to the Next Major Release milestone Aug 26, 2015


jreback commented Aug 26, 2015

Pull requests welcome!

dadkins commented Aug 26, 2015

I noticed that shift(axis=1) also has this bug. Are there any others that break across block boundaries?

>>> import pandas as pd
>>> df = pd.DataFrame({'y': pd.Series([2]), 'z': pd.Series([3])})
>>> df.insert(0, 'x', 1)
>>> df
   x  y  z
0  1  2  3
>>> df.shift(axis=1)
    x   y  z
0 NaN NaN  2

jreback commented Aug 26, 2015

diff is essentially equivalent to the following (though it is not actually implemented this way):

df.sub(df.shift(axis=1),axis=1)
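
A quick way to check that relationship on a pandas version where the bug is fixed is sketched below. It rebuilds the frame from the report and compares the two results; the variable names manual and builtin are just illustrative, and on an affected version such as 0.16.2 both sides would presumably show the same wrong NaN, since shift(axis=1) has the same problem.

```python
import pandas as pd

# Rebuild the frame from the report: y and z first, then insert x in front.
df = pd.DataFrame({'y': [2], 'z': [3]})
df.insert(0, 'x', 1)

# The "shift then subtract" formulation of a column-wise diff.
manual = df.sub(df.shift(axis=1), axis=1)

# The built-in column-wise diff.
builtin = df.diff(axis=1)

print(manual)
print(builtin)
```

On a fixed version both frames should have NaN under x and 1 under y and z.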

jreback modified the milestone: 0.17.0, Next Major Release Aug 30, 2015


jreback commented Sep 1, 2015

Closed by #10930.

jreback closed this Sep 1, 2015
