fillna() does not work when value parameter is a list #3435

ijmcf · 2013-04-23T15:52:43Z

Should raise on a passed list to value

The results from the fillna() method are very strange when the value parameter is given a list.

For example, using a simple example DataFrame:

df = pandas.DataFrame({'A': [numpy.nan, 1, 2], 'B': [10, numpy.nan, 12], 'C': [[20, 21, 22], [23, 24, 25], numpy.nan]})
df
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 NaN

df.fillna(value=[100, 101, 102])
A B C
0 100 10 [20, 21, 22]
1 1 101 [23, 24, 25]
2 2 12 102

So it appears the values in the list are used to fill the 'holes' in order, if the list has the same length as the number of holes. But if the list is shorter than the number of holes, the behavior changes to using only the first value in the list:

df.fillna(value=[100, 101])
A B C
0 100 10 [20, 21, 22]
1 1 100 [23, 24, 25]
2 2 12 100

If the list is longer than the number of holes, you get something even more odd:

df.fillna(value=[100, 101, 102, 103])
A B C
0 100 10 [20, 21, 22]
1 1 100 [23, 24, 25]
2 2 12 102

If you provide a dict that specifies the fill values by column, the values from the list are used within that column only:

df.fillna(value={'C': [100, 101]})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 100

Since it's not always practical to know the number of NaN values a priori, or to customize the length of the value list to match it, this is problematic. Furthermore, some desired values get over-interpreted and cannot be used:

For example, if I actually want to replace all NaN instances in a single column with the same list (either empty or non-empty), I can't figure out how to do it:

df.fillna(value={'C': [[100,101]]})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 100

Indeed, if you specify the empty list nothing is filled:

df.fillna(value={'C': list()})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 NaN

But a dict works fine:

df.fillna(value={'C': {0: 1}})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 {0: 1}

df.fillna(value={'C': dict()})
A B C
0 NaN 10 [20, 21, 22]
1 1 NaN [23, 24, 25]
2 2 12 {}

So it appears that fillna() is making a lot of decisions about how the fill values should be applied, and certain desired outcomes can't be achieved because it's being too 'clever'.

jreback · 2013-04-23T15:59:46Z

lists are not allowed (for the reasons you show), should raise on this (only scalar or dict are valid)
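
A minimal sketch of the rule stated above, assuming a pandas release that includes the check from #3585 and that the rejection surfaces as a TypeError. The frame and fill values are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 1.0, 2.0], "B": [10.0, np.nan, 12.0]})

# A scalar fills every NaN with the same value.
filled_scalar = df.fillna(0)

# A dict maps column names to per-column fill values.
filled_by_column = df.fillna({"A": -1, "B": -2})

# A list is rejected instead of being interpreted positionally.
try:
    df.fillna([100, 101])
    list_rejected = False
except TypeError:
    list_rejected = True
```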

jreback · 2013-04-23T16:50:42Z

FYI keeping lists in a frame, while allowed, is not efficient at all. What exactly are you trying to accomplish?

ijmcf · 2013-04-23T18:20:09Z

Good question. I am creating a DataFrame containing a number of key elements of information on a daily process - some of those elements are singular (floats, integers, strings), but some are multiple - and the number of elements can vary day by day from 0 to n. I'm storing those elements currently as lists.

For example, something like the dummy data frame I used in the notes on the Issue.

If you have any suggestions for alternative approaches, I'd be glad to hear them.

Thanks
Iain

jreback · 2013-04-23T18:25:02Z

I would use multiple df's in this case, maybe indexed by a common element
(and then wrap a class around it to manage it)

for your singular elements it looks like a single df is good
for the multiple ones

use another frame that is indexed 0..n (could be along index or columns whatever makes sense)

when you are mixing hierarchical and non-hierarchical (singular) data, it is better to use different objects
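
One way to read this suggestion, as a rough sketch: the column names (`day`, `total`, `position`, `value`) and the data are made up for illustration and do not come from the original poster's process.

```python
import pandas as pd

# One row per day for the single-valued fields.
daily = pd.DataFrame(
    {"day": ["2013-04-22", "2013-04-23"], "total": [3.5, 0.0]}
).set_index("day")

# One row per (day, position) for the variable-length field.
# A day with no items simply has no rows here, so no NaN or list is needed.
items = pd.DataFrame(
    {
        "day": ["2013-04-22", "2013-04-22", "2013-04-22"],
        "position": [0, 1, 2],
        "value": [20, 21, 22],
    }
).set_index(["day", "position"])

# Look up all items for one day through the shared key.
items_for_day = items.xs("2013-04-22", level="day")["value"].tolist()

# Per-day item counts, aligned back onto the daily frame.
counts = items.groupby(level="day").size().reindex(daily.index, fill_value=0)
```

With this layout every column keeps a plain dtype, and "zero items on a day" is represented by the absence of rows rather than by a missing value.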

jreback · 2013-05-13T22:51:49Z

closed by #3585

ariddell · 2013-09-10T20:17:57Z

Is there any alternative here? I frequently see R dataframes that contain lists. Sometimes one needs a little unnormalized data to be associated with a record.

jreback · 2013-09-10T20:29:38Z

can you give an example of input and output?

jtratner · 2013-09-10T20:56:48Z

Could you use a tuple?

ariddell · 2013-09-10T21:58:28Z

Just for the record, here's no less an authority than Trevor Hastie cramming
data structures inside a data frame in R.

> library(lars)
Loaded lars 1.2

> data(diabetes)
> str(diabetes)
'data.frame':   442 obs. of  3 variables:
$ x : AsIs [1:442, 1:10] 0.038075.... -0.00188.... 0.085298.... -0.08906.... 0.005383.... ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr  "age" "sex" "bmi" "map" ...
$ y : num  151 75 141 206 135 97 138 63 110 310 ...
$ x2: AsIs [1:442, 1:64] 0.038075.... -0.00188.... 0.085298.... -0.08906.... 0.005383.... ...
..- attr(*, ".Names")= chr  "age" "age" "age" "age" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr  "1" "2" "3" "4" ...
.. ..$ : chr  "age" "sex" "bmi" "map" ...

Here's my more modest example:

In [3]: df = pd.DataFrame.from_records([dict(id=10, languages=('en','de')), dict(id=11)])

In [4]: df
Out[4]: 
   id languages
0  10  (en, de)
1  11       NaN

In [7]: # doesn't work

In [8]: df.fillna(tuple())
Out[8]: 
   id languages
0  10  (en, de)
1  11       NaN

In [9]: # doesn't work either

In [10]: df.fillna([])
Out[10]: 
   id languages
0  10  (en, de)
1  11       NaN

In [11]: # best I can do

In [12]: df.fillna(set())
Out[12]: 
   id languages
0  10  (en, de)
1  11        ()

I'm using a release version of pandas -- but I gather the list and tuple will raise exceptions.

jreback · 2013-09-10T22:04:00Z

in an object column (eg strings) this is easy and natural

my hesitation is that if u did this in a float column then it would convert to an object dtype
that's the real issue

aside from 'accidentally' putting a list (when u don't mean it)
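
A small illustration of the dtype concern. It builds the Series element by element in plain Python rather than through fillna, and the values are arbitrary:

```python
import numpy as np
import pandas as pd

floats = pd.Series([1.0, np.nan, 3.0])

# Swap the NaN for an empty list, one element at a time.
with_list = pd.Series(
    [[] if pd.isna(x) else x for x in floats], index=floats.index
)

print(floats.dtype)     # float64
print(with_list.dtype)  # object
```

Once a single element is a list, the whole column is stored as generic Python objects and loses its float dtype.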

cpcloud · 2013-09-10T22:07:07Z

That data set is a nice example of how not to structure your data. Using I() to stuff things in a data.frame just seems like a terrible idea.

ariddell · 2013-09-10T22:57:35Z

I like my example of putting lists or tuples. They are perfectly valid NumPy object arrays. A string with comma delimiters just isn't a general option -- what if the underlying strings contain commas?

Now that I think about it -- what is the workaround? I can't do this:

In [10]: df = pd.DataFrame.from_records([dict(id=10, languages=('en','de')), dict(id=11)])

In [11]: df.languages[pd.isnull(df.languages)] = tuple()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

I suppose json.dumps() and json.loads() is probably the way to go?
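
A sketch of what that JSON round trip might look like. The records are invented, and one value deliberately contains a comma to show that it survives:

```python
import json

import pandas as pd

records = [{"id": 10, "languages": ["en", "de,at"]}, {"id": 11}]

# Store each sequence as a JSON string; a missing value becomes "[]".
df = pd.DataFrame(
    {
        "id": [r["id"] for r in records],
        "languages": [json.dumps(r.get("languages", [])) for r in records],
    }
)

# Decode back to Python lists when the values are needed.
decoded = df["languages"].map(json.loads).tolist()
```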

BrenBarn · 2014-04-26T22:08:45Z

Is there an actual solution to this? What are you supposed to do if you actually want a DataFrame/Series whose values are lists, and you want to replace NaN values with an empty list?
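
One workaround that sidesteps fillna entirely is to replace the missing entries element by element. This is a sketch, not an officially recommended approach, and `empty_list_if_missing` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [10, 11], "languages": [["en", "de"], np.nan]})


def empty_list_if_missing(value):
    # Lists are kept as-is; anything else (here, NaN) becomes a fresh empty list.
    return value if isinstance(value, list) else []


df["languages"] = df["languages"].apply(empty_list_if_missing)
```

Each missing entry gets its own new list, so mutating one row's list later does not affect the others.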

jreback · 2014-04-27T13:56:33Z

@BrenBarn you are welcome to open an issue to support this, would be ok. But as you know, supporting lists in a frame is problematic at best (e.g. setting is pretty much impossible), so this has very limited uses, and I would never recommend using it.

Pranjalya · 2020-06-19T11:17:51Z

The dict doesn't work now. :-(

rohetoric · 2023-09-10T12:09:07Z

Sorry but why is this issue closed? What is the solution here?

lodagro mentioned this issue May 6, 2013

fillna/fill_value fails when filling with a list #3526

Closed

cpcloud mentioned this issue May 13, 2013

raise on fillna passed a list or tuple #3585

Merged

jreback closed this as completed May 13, 2013
