fillna() does not work when value parameter is a list #3435
Comments
Lists are not allowed (for the reasons you show); fillna() should raise on this (only a scalar or a dict is valid). |
FYI, keeping lists in a frame, while allowed, is not efficient at all. What exactly are you trying to accomplish? |
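For later readers, here is a minimal sketch of the scalar-or-dict rule. It assumes a recent pandas release, where passing a list as value raises a TypeError; the frame and the fill values are made up for illustration and are not the data from the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few holes (not the reporter's data).
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, np.nan]})

# A scalar fills every hole with the same value.
filled_scalar = df.fillna(0)

# A dict maps column names to the fill value for that column.
filled_dict = df.fillna({"A": -1, "B": -2})

# A list is rejected; the exact error message may differ between versions.
try:
    df.fillna([10, 20, 30])
    list_rejected = False
except TypeError as exc:
    list_rejected = True
    print(exc)
```

The try/except is only there so the sketch keeps running; in real code you would simply not pass a list.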
Good question. I am creating a DataFrame containing a number of key elements of information on a daily process - some of those elements are singular (floats, integers, strings), but some are multiple - and the number of elements can vary day by day from 0 to n. I'm storing those elements currently as lists. For example, something like the dummy data frame I used in the notes on the Issue. If you have any suggestions for alternative approaches, I'd be glad to hear them. Thanks.
On Tuesday, April 23, 2013 at 12:50 PM, jreback wrote:
|
I would use multiple DataFrames in this case, maybe indexed by a common element. For your singular elements, a single df looks good. Use another frame that is indexed 0..n (along the index or the columns, whichever makes sense). When you are mixing hierarchical and non-hierarchical (singular) data, it's better to use two different objects. |
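One way to read the two-frame suggestion, sketched under assumptions: the column names, dates, and values below are hypothetical, and the long-format layout keyed by (date, pos) is just one of several shapes that would fit the description.

```python
import pandas as pd

# Frame 1: the singular values, one row per day (hypothetical columns).
days = pd.Index(
    pd.to_datetime(["2013-04-21", "2013-04-22", "2013-04-23"]), name="date"
)
daily = pd.DataFrame(
    {"temperature": [20.5, 21.0, 19.8], "status": ["ok", "ok", "late"]},
    index=days,
)

# Frame 2: the variable-length values (0..n per day), one row per element,
# indexed by the shared date plus the element's position within that day.
raw = {"2013-04-21": [1.5, 2.5], "2013-04-22": [], "2013-04-23": [7.0, 8.0, 9.0]}
records = [
    (pd.Timestamp(day), pos, val)
    for day, vals in raw.items()
    for pos, val in enumerate(vals)
]
items = pd.DataFrame(records, columns=["date", "pos", "value"]).set_index(
    ["date", "pos"]
)

# The shared date index ties the two frames together, e.g. elements per day.
counts = items.groupby(level="date").size().reindex(daily.index, fill_value=0)
```

A day with no elements simply has no rows in the second frame, so there is nothing to fill with an empty list.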
closed by #3585 |
Is there any alternative here? I frequently see R dataframes that contain lists. Sometimes one needs a little unnormalized data to be associated with a record. |
Can you give an example of input and output? |
Could you use a tuple? |
Just for the record, here's no less an authority than Trevor Hastie cramming
Here's my more modest example:
I'm using a release version of pandas -- but I gather the list and tuple will raise exceptions. |
In an object column (e.g. strings) this is easy and natural. My hesitation is that if you did this in a float column, it would be converted to object dtype, for example from 'accidentally' putting in a list when you don't mean to. |
That data set is a nice example of how not to structure your data. Using |
I like my example of putting lists or tuples. They are perfectly valid NumPy object arrays. A string with comma delimiters just isn't a general option -- what if the underlying strings contain commas? Now that I think about it -- what is the workaround? I can't do this:
I suppose json.dumps() and json.loads() is probably the way to go? |
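A rough sketch of the json.dumps() / json.loads() idea, assuming the lists hold only JSON-serializable values such as strings and numbers. The sample Series is invented for illustration.

```python
import json

import pandas as pd

# Hypothetical column of lists; note the comma inside the first string.
s = pd.Series([["a,b", "c"], [], ["d"]])

# Store each list as a JSON string, so the column holds plain strings.
encoded = s.map(json.dumps)

# Recover the original lists when they are needed again.
decoded = encoded.map(json.loads)
```

Unlike a comma-delimited string, the JSON form survives commas inside the underlying strings, at the cost of an encode/decode step on every access.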
Is there an actual solution to this? What are you supposed to do if you actually want a DataFrame/Series whose values are lists, and you want to replace NaN values with an empty list? |
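One possible workaround, offered as a sketch rather than an officially supported approach: skip fillna() and rebuild the column element by element. It assumes every non-missing value in the column is already a list; the sample Series is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical object column mixing lists and NaN.
s = pd.Series([[1, 2], np.nan, [3], np.nan], dtype=object)

# Keep existing lists, and give each hole its own fresh empty list
# (a new [] per element, so the filled cells do not share one object).
filled = pd.Series(
    [v if isinstance(v, list) else [] for v in s],
    index=s.index,
    dtype=object,
)
```

The same comprehension can be assigned back to a single DataFrame column; it is a plain Python loop, so it will be slow on large frames.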
@BrenBarn you are welcome to open an issue to support this; that would be OK. But as you know, supporting lists in a frame is problematic at best (e.g. setting is pretty much impossible), so this has very limited uses, and I would never recommend using it. |
The dict doesn't work now. :-( |
Sorry but why is this issue closed? What is the solution here? |
Should raise on a passed list to value
The results from the fillna() method are very strange when the value parameter is given a list.
For example, using a simple example DataFrame:
So it appears the values in the list are used to fill the 'holes' in order, if the list has the same length as the number of holes. But if the list is shorter than the number of holes, the behavior changes to using only the first value in the list:
If the list is longer than the number of holes, you get something even more odd:
If you provide a dict that specifies the fill values by column, the values from the list are used within that column only:
Since it's not always practical to know the number of NaN values a priori, or to customize the length of the value list to match it, this is problematic. Furthermore, some desired values get over-interpreted and cannot be used:
For example, if I want to replace all NaN instances in a single column with the same list (either empty or non-empty), I can't figure out how to do it:
Indeed, if you specify the empty list nothing is filled:
But a dict works fine:
So it appears that fillna() is making a lot of decisions about how the fill values should be applied, and certain desired outcomes can't be achieved because it's being too 'clever'.