DataFrame.fillna() working on row vector instead of column vector? #15522

ixru · 2017-02-27T20:23:37Z

Code Sample, a copy-pastable example if possible

>>> df.head(5)
                       time   id    bid  bid_depth  bid_depth_total  \
0 2017-02-27 11:34:31+00:00  105  148.0      497.0         216589.0
1 2017-02-27 11:34:35+00:00  105    NaN        NaN              NaN
2 2017-02-27 11:34:38+00:00  105    NaN        NaN              NaN
3 2017-02-27 11:34:40+00:00  105    NaN        NaN              NaN
4 2017-02-27 11:34:41+00:00  105    NaN        NaN              NaN

   bid_number  offer  offer_depth  offer_depth_total  offer_number   open  \
0       243.0  148.1      14192.0           530373.0         503.0  147.5
1         NaN    NaN      14272.0           530453.0         504.0    NaN
2         NaN    NaN      14192.0           530373.0         503.0    NaN
3         NaN    NaN      14272.0           530453.0         504.0    NaN
4         NaN    NaN      14492.0           530673.0         505.0    NaN

    high    low   last  change  change_percent     volume        value  trades
0  148.2  147.3  148.0     0.9            0.61  1286830.0  190224000.0  2112.0
1    NaN    NaN    NaN     NaN             NaN        NaN          NaN     NaN
2    NaN    NaN    NaN     NaN             NaN        NaN          NaN     NaN
3    NaN    NaN    NaN     NaN             NaN        NaN          NaN     NaN
4    NaN    NaN    NaN     NaN             NaN        NaN          NaN     NaN

>>> df.fillna(method='pad')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/pandas/core/frame.py", line 2842, in fillna
    downcast=downcast, **kwargs)
  File "/usr/lib/python3.6/site-packages/pandas/core/generic.py", line 3250, in fillna
    downcast=downcast)
  File "/usr/lib/python3.6/site-packages/pandas/core/internals.py", line 3177, in interpolate
    return self.apply('interpolate', **kwargs)
  File "/usr/lib/python3.6/site-packages/pandas/core/internals.py", line 3056, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/lib/python3.6/site-packages/pandas/core/internals.py", line 917, in interpolate
    downcast=downcast, mgr=mgr)
  File "/usr/lib/python3.6/site-packages/pandas/core/internals.py", line 956, in _interpolate_with_fill
    values = self._try_coerce_result(values)
  File "/usr/lib/python3.6/site-packages/pandas/core/internals.py", line 2448, in _try_coerce_result
    result = result.reshape(len(result))
ValueError: cannot reshape array of size 24311 into shape (1,)
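
The last traceback frame hints at where the error message comes from. Below is a minimal NumPy-only sketch, assuming the tz-aware values reach `_try_coerce_result` as a 2-D array of shape `(1, 24311)`. That shape is inferred from the error text, not confirmed against the pandas source. `len()` of such an array is 1, so `reshape(len(result))` asks for shape `(1,)` and fails.

```python
import numpy as np

# Assumption: the values are a 2-D array with one row for the single column
# and one entry per DataFrame row. The shape is inferred from the error text.
n_rows = 24311
result = np.zeros((1, n_rows))

print(len(result))  # len() reports only the first dimension

try:
    result.reshape(len(result))  # same call as in the last traceback frame
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the failure is a shape mismatch inside pandas rather than the fill running along rows.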

Problem description

msgpack of dataframe for replication:
https://www.dropbox.com/s/5skf6v8x2vg103o/dataframe?dl=0

I'm a beginner, so I can only guess at what is wrong, but it seems to be working on rows instead of columns. If I loop through df.columns and fill series by series, I get the expected output, so the problem does not seem to lie with any individual column.
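
The column-by-column approach described above can be sketched roughly as follows. The small frame is a made-up stand-in for the real data, and `ffill()` is used as the equivalent of `fillna(method='pad')`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the reported frame: one tz-aware column
# plus one float column with gaps.
df = pd.DataFrame({
    "time": pd.date_range("2017-02-27", periods=4, tz="UTC"),
    "bid": [148.0, np.nan, np.nan, 148.1],
})

filled = df.copy()
for col in filled.columns:
    # Fill each column on its own, as a Series, instead of the whole frame
    filled[col] = filled[col].ffill()

print(filled)
```

Each `Series.ffill()` call works on a single column, which is why this route avoids the frame-level error shown in the traceback.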

Expected Output

Each NaN is filled with the prior non-null value in the same column.

Output of `pd.show_versions()`

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.8-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.2.0
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: None
boto: None
pandas_datareader: None


jreback · 2017-02-27T20:49:40Z

Can you show the output of `df.info()`?

ixru · 2017-02-27T20:58:16Z

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24311 entries, 0 to 24310
Data columns (total 19 columns):
time                 24311 non-null datetime64[ns, UTC]
id                   24311 non-null int64
bid                  1469 non-null float64
bid_depth            7988 non-null float64
bid_depth_total      11630 non-null float64
bid_number           10765 non-null float64
offer                1370 non-null float64
offer_depth          7864 non-null float64
offer_depth_total    10617 non-null float64
offer_number         9940 non-null float64
open                 1085 non-null float64
high                 1086 non-null float64
low                  1085 non-null float64
last                 1223 non-null float64
change               1223 non-null float64
change_percent       1223 non-null float64
volume               3697 non-null float64
value                3697 non-null float64
trades               3697 non-null float64
dtypes: datetime64[ns, UTC](1), float64(17), int64(1)
memory usage: 3.5 MB
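
The `datetime64[ns, UTC]` entry in the output above is the only timezone-aware column. Here is a small sketch of how such columns could be listed programmatically; the three-column frame is a made-up stand-in for the real data.

```python
import pandas as pd

# Made-up stand-in with the same mix of dtypes as the reported frame
df = pd.DataFrame({
    "time": pd.date_range("2017-02-27", periods=3, tz="UTC"),
    "id": [105, 105, 105],
    "bid": [148.0, None, None],
})

# Timezone-aware datetime columns carry a DatetimeTZDtype
tz_columns = [
    name for name, dtype in df.dtypes.items()
    if isinstance(dtype, pd.DatetimeTZDtype)
]

print(tz_columns)  # ['time']
```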

chris-b1 · 2017-02-27T21:08:43Z

Something to do with datetimetz. Here's a simpler repro:

import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2014-01-01', periods=5, tz='US/Central')})
df.fillna(method='pad')

ValueError                                Traceback (most recent call last)
<ipython-input-77-8f5ecb26a2f6> in <module>()
----> 1 df.fillna(method='pad')
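
To check whether a given pandas installation still shows this behaviour, the repro above can be wrapped in a `try`/`except`. This is only a sketch: it uses `ffill()` as the equivalent of `fillna(method='pad')`, and the `affected` flag is a name made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2014-01-01", periods=5, tz="US/Central")})

try:
    filled = df.ffill()  # same operation as fillna(method='pad')
    affected = False
except ValueError:
    # The reshape error from the report surfaces as a ValueError
    filled = None
    affected = True

print(pd.__version__, "affected" if affected else "not affected")
```

The column has no missing values, so on a working installation `filled` should simply equal `df`.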

jreback · 2017-02-27T21:09:37Z

Yeah, the Block needs to handle these (tz-aware datetimes) correctly.

jreback · 2017-02-27T21:13:12Z

@MatSalm an easy way to do this (though not super pretty):

In [20]: df = pd.DataFrame({'A':pd.date_range('20130101',periods=4,tz='US/Eastern'),'B':[1,2,np.nan,np.nan]})

In [21]: df
Out[21]: 
                          A    B
0 2013-01-01 00:00:00-05:00  1.0
1 2013-01-02 00:00:00-05:00  2.0
2 2013-01-03 00:00:00-05:00  NaN
3 2013-01-04 00:00:00-05:00  NaN

In [23]: df[df.select_dtypes(exclude=['number']).columns].join(df.select_dtypes(include=['number']).fillna(method='pad'))
Out[23]: 
                          A    B
0 2013-01-01 00:00:00-05:00  1.0
1 2013-01-02 00:00:00-05:00  2.0
2 2013-01-03 00:00:00-05:00  2.0
3 2013-01-04 00:00:00-05:00  2.0
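
The same workaround could be wrapped in a small helper. This is a sketch only: `pad_numeric_columns` is a hypothetical name, not a pandas function, and it assumes unique column names. Unlike the bare `join`, it also restores the original column order.

```python
import numpy as np
import pandas as pd


def pad_numeric_columns(frame):
    """Forward-fill numeric columns only; leave all other columns untouched."""
    numeric = frame.select_dtypes(include=["number"]).ffill()
    other = frame.select_dtypes(exclude=["number"])
    # join aligns on the index; selecting by the original column list
    # puts the columns back in their original order
    return other.join(numeric)[list(frame.columns)]


df = pd.DataFrame({
    "B": [1, 2, np.nan, np.nan],
    "A": pd.date_range("20130101", periods=4, tz="US/Eastern"),
})

result = pad_numeric_columns(df)
print(result)
```

Note that missing values in the non-numeric columns (for example `NaT` in a datetime column) are left as they are with this approach.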

ixru · 2017-02-27T21:57:57Z

Thank you

jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Feb 27, 2017

chris-b1 added Bug Timezones Timezone data dtype labels Feb 27, 2017

jreback added Difficulty Intermediate labels Feb 27, 2017

jreback added this to the Next Major Release milestone Feb 27, 2017

mroeschke mentioned this issue Jun 29, 2018 in TST/CLN: Old timezone issues PT3 #21674 (merged)

jreback modified the milestones: Next Major Release, 0.24.0 Jul 2, 2018

jreback closed this as completed in #21674 Jul 2, 2018
