BUG: different apply function behavior when columns with type Timestamp present #17602

Closed
xvwei1989 opened this Issue Sep 20, 2017 · 1 comment


xvwei1989 commented Sep 20, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame([[1, 2], [1, 2]], columns=['a', 'b'])
# Return a dict from the row-wise function (axis=1)
print(df.apply(lambda x: {'s': x['a'] + x['b']}, axis=1))
################
# (AS EXPECTED)
# output: 
# 0    {u's': 3}
# 1    {u's': 3}
# dtype: object
################

# Add one new column of type Timestamp; the applied function is unchanged
df['tm'] = [pd.Timestamp('2017-05-01 00:00:00'), pd.Timestamp('2017-05-02 00:00:00')]
print(df.apply(lambda x: {'s': x['a'] + x['b']}, axis=1))

################
# (WRONG OUTPUT)
# output: 
#       a  b   tm
# 0   NaN NaN NaN
# 1   NaN NaN NaN
################

Problem description

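When the function passed to apply returns a dict, adding a column of type Timestamp to the DataFrame changes the result unexpectedly, even though the applied function itself is unchanged: instead of a Series of dicts, the result is a DataFrame with the original columns filled with NaN.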
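
A possible workaround, sketched under the assumption that the applied function only reads columns a and b: select those columns before calling apply, so each row has the same shape as in the first example above, which behaved as expected. This is an illustration only, not a fix confirmed by the maintainers.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 2]], columns=["a", "b"])
df["tm"] = [pd.Timestamp("2017-05-01 00:00:00"), pd.Timestamp("2017-05-02 00:00:00")]

# Apply only over the columns the function actually reads (an assumption
# of this sketch), so the Timestamp column never enters the row.
result = df[["a", "b"]].apply(lambda x: {"s": x["a"] + x["b"]}, axis=1)
print(result)
```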

Output of pd.show_versions()

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 5.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.14
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None


Contributor

jreback commented Sep 20, 2017

duplicate of #16353 and #15628

.apply infers the output dimension from what the function returns, and a dict looks exactly like a Series. Returning a dict from apply is not idiomatic pandas, not to mention non-performant.
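
For the computation in this report, a vectorized version avoids apply and dict return values altogether. The following is a minimal sketch assuming the goal is simply the row-wise sum of a and b; the names `s` and `records` are illustrative and do not appear in the issue.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 2]], columns=["a", "b"])
df["tm"] = [pd.Timestamp("2017-05-01"), pd.Timestamp("2017-05-02")]

# Column arithmetic is vectorized and does not depend on the other
# columns (such as the Timestamp column) in the frame.
s = df["a"] + df["b"]

# If one dict per row is really needed, build the dicts from the
# vectorized result instead of returning them from apply.
records = [{"s": value} for value in s]
print(records)
```

Because the sum is computed on the two columns directly, no output-shape inference is involved.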

jreback closed this Sep 20, 2017

jreback added this to the No action milestone Sep 20, 2017

jreback added a commit to jreback/pandas that referenced this issue Nov 30, 2017

jreback modified the milestones: No action, 0.22.0 Nov 30, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 2, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 3, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 7, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 10, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 14, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 21, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 23, 2017

jreback added a commit to jreback/pandas that referenced this issue Jan 6, 2018

API/BUG: .apply will correctly infer output shape when axis=1
closes #16353
closes #17348
closes #17437
closes #18573
closes #17970
closes #17892
closes #17602
closes #18775
closes #18901
closes #18919

jreback added a commit to jreback/pandas that referenced this issue Feb 5, 2018

jorisvandenbossche added a commit that referenced this issue Feb 7, 2018

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018
