DataFrame.apply not working with datetimes #6125

Closed
dbew opened this issue Jan 27, 2014 · 2 comments · Fixed by #6126
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

dbew commented Jan 27, 2014

When you use apply on a DataFrame that contains datetimes, the result is unexpected. With a DataFrame of just integers and strings, we get the market names back as expected:

import pandas as pd

positions = pd.DataFrame([[1, 'ABC', 50], [1, 'YUM', 20],
                          [1, 'DEF', 20], [2, 'ABC', 50],
                          [2, 'YUM', 20], [2, 'DEF', 20]],
                         columns=['a', 'market', 'position'])
positions.apply(lambda r: r['market'], axis=1)
Out[210]: 
0    ABC
1    YUM
2    DEF
3    ABC
4    YUM
5    DEF
dtype: object

If we replace the data in column 'a' with datetimes, the result is wrong. The first value in the market column is repeated for every row:

import datetime

positions = pd.DataFrame([[datetime.datetime(2013, 1, 1), 'ABC', 50], 
                           [datetime.datetime(2013, 1, 1), 'YUM', 20],
                           [datetime.datetime(2013, 1, 1), 'DEF', 20],
                           [datetime.datetime(2013, 1, 2), 'ABC', 50],
                           [datetime.datetime(2013, 1, 2), 'YUM', 20], 
                           [datetime.datetime(2013, 1, 2), 'DEF', 20]],
                          columns=['a', 'market', 'position'])
positions.apply(lambda r: r['market'], axis=1)
Out[213]: 
0    ABC
1    ABC
2    ABC
3    ABC
4    ABC
5    ABC
dtype: object

If you replace the lambda function with a function which prints the object passed in, then you can see that you only ever receive the first row of the dataframe:

def print_input(r):
    print(r)
    return 1

positions.apply(print_input, axis=1)
a           2013-01-01 00:00:00
market                      ABC
position                     50
Name: 0, dtype: object
a           2013-01-01 00:00:00
market                      ABC
position                     50
Name: 1, dtype: object
a           2013-01-01 00:00:00
market                      ABC
position                     50
Name: 2, dtype: object
a           2013-01-01 00:00:00
market                      ABC
position                     50
Name: 3, dtype: object
a           2013-01-01 00:00:00
market                      ABC
position                     50
Name: 4, dtype: object
a           2013-01-01 00:00:00
market                      ABC
position                     50
Name: 5, dtype: object
Out[215]: 
0    1
1    1
2    1
3    1
4    1
5    1
dtype: int64

This is new on master; I didn't see it in pandas 0.11.0 or 0.13.0.
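
The report can be condensed into one self-contained check. This is a minimal sketch, not the test added in #6126: it assumes a pandas build that contains the fix, and it simply compares the row-wise apply result with the market column.

```python
import datetime

import pandas as pd

# Same mixed-type frame as in the report: datetimes, strings and integers.
positions = pd.DataFrame(
    [[datetime.datetime(2013, 1, 1), 'ABC', 50],
     [datetime.datetime(2013, 1, 1), 'YUM', 20],
     [datetime.datetime(2013, 1, 1), 'DEF', 20],
     [datetime.datetime(2013, 1, 2), 'ABC', 50],
     [datetime.datetime(2013, 1, 2), 'YUM', 20],
     [datetime.datetime(2013, 1, 2), 'DEF', 20]],
    columns=['a', 'market', 'position'])

# Row-wise apply should see each row in turn, not the first row repeated.
result = positions.apply(lambda r: r['market'], axis=1)
expected = positions['market']

# On an affected build this comparison fails because every entry is 'ABC'.
assert result.tolist() == expected.tolist()
```

Comparing plain lists keeps the check independent of index or dtype details that may differ between pandas versions.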

jreback commented Jan 27, 2014

To improve the performance of apply, I no longer copy the data passed to the applied function; the same buffer is simply overwritten for each row. That doesn't work when datelike types are intermixed (they are themselves a view on the underlying data), so a mixed-type frame has to do this reduction with the slower, Python-based method.
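
The general trade-off between overwriting one buffer and copying each row can be illustrated with plain Python lists. This is only an analogy under stated assumptions, not the pandas internals: the function names are hypothetical, and the failure shown here (stale references) is not the exact failure in the report.

```python
def iter_rows_reusing_buffer(rows):
    """Yield every row through one shared list that is overwritten in place."""
    buffer = [None] * len(rows[0])
    for row in rows:
        buffer[:] = row   # overwrite in place, no new list is allocated
        yield buffer      # the caller receives the same object every time


def iter_rows_copying(rows):
    """Yield a fresh list per row: slower, but each result is independent."""
    for row in rows:
        yield list(row)


rows = [['2013-01-01', 'ABC', 50],
        ['2013-01-01', 'YUM', 20],
        ['2013-01-02', 'DEF', 20]]

# Reading each row immediately works with either strategy.
markets_fast = [r[1] for r in iter_rows_reusing_buffer(rows)]
markets_safe = [r[1] for r in iter_rows_copying(rows)]

# Keeping references to the yielded rows is only safe with copies:
# with the shared buffer, every kept "row" is the same, last-written list.
kept_fast = list(iter_rows_reusing_buffer(rows))
kept_safe = list(iter_rows_copying(rows))
```

The shared buffer avoids an allocation per row, but it is only correct while every consumer looks at the current contents and nothing else holds on to the data.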

dbew commented Jan 28, 2014

Thanks, that's working for me now (on head of master).
