Strange behaviour when trying to create a series from two columns of a dataframe with apply(tuple, axis=1) #17348

Closed
daltschu opened this Issue Aug 27, 2017 · 2 comments


daltschu commented Aug 27, 2017

pandas behaves unexpectedly when you try to build a Series by applying
tuple (or list) row-wise to two columns of a DataFrame, one of which holds timestamps:

import pandas as pd
import numpy as np
d = pd.DataFrame({'a': pd.Series(np.random.randn(4)), 
                  'b': ['a', 'list', 'of', 'words'], 
                  'ts': pd.date_range('2016-10-01', periods=4, freq='H')})
d
          a      b                  ts
0  0.200813      a 2016-10-01 00:00:00
1  0.316971   list 2016-10-01 01:00:00
2 -0.186392     of 2016-10-01 02:00:00
3 -0.565593  words 2016-10-01 03:00:00

Let's try first with columns 'a' and 'b':

d[['a', 'b']].apply(tuple, axis=1)
0         (0.2008128669491346, a)
1      (0.3169711841447721, list)
2       (-0.1863916899789735, of)
3    (-0.5655926199699992, words)
dtype: object

So far, everything is fine. Now let's do it with 'a' and 'ts':

d[['a', 'ts']].apply(tuple, axis=1)
          a                  ts
0  0.200813 2016-10-01 00:00:00
1  0.316971 2016-10-01 01:00:00
2 -0.186392 2016-10-01 02:00:00
3 -0.565593 2016-10-01 03:00:00

Oops. The result is a DataFrame with the original columns, not a Series of tuples.

It's easy to work around this by "coating" each timestamp (wrapping it in a closure) before the apply and "uncoating" it (calling the closure) afterwards:

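def coating(t):
    # Hide the timestamp inside a zero-argument closure so apply sees an opaque object
    return lambda: t

def uncoating(x, f):
    # Call the closure to recover the original timestamp
    return x, f()

d['coated_ts'] = d['ts'].apply(coating)
d[['a', 'coated_ts']].apply(tuple, axis=1).apply(lambda t: uncoating(*t))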
0     (0.2008128669491346, 2016-10-01 00:00:00)
1     (0.3169711841447721, 2016-10-01 01:00:00)
2    (-0.1863916899789735, 2016-10-01 02:00:00)
3    (-0.5655926199699992, 2016-10-01 03:00:00)
dtype: object

It would be nice if this strange behaviour was corrected.
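
For readers who only need the Series of tuples, one alternative that sidesteps apply and its shape inference entirely is to zip the two columns together. The sketch below is an illustration, not part of the original report: the fixed values in column 'a' and the variable name pairs are assumptions chosen to make the example reproducible.

```python
import pandas as pd

# Small frame mirroring the report; the values in 'a' are fixed here
# (an assumption) instead of random so the result is reproducible.
d = pd.DataFrame({
    'a': [0.2, 0.3, -0.2, -0.6],
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

# zip pairs the columns element by element; wrapping the resulting list of
# tuples in a Series gives an object-dtype Series without going through apply.
pairs = pd.Series(list(zip(d['a'], d['ts'])), index=d.index)
print(pairs)
```

Because no row-wise apply is involved, pandas never tries to infer whether the returned tuples should become columns.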

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@daltschu daltschu changed the title from Strange behaviour when trying to create a series from two columns of a dataframe to Strange behaviour when trying to create a series from two columns of a dataframe with apply(tuple, axis=1) Aug 27, 2017

@gfyoung gfyoung added the Bug label Aug 27, 2017

Member

gfyoung commented Aug 27, 2017

@daltschu: Thanks for reporting this! Indeed, that looks pretty buggy to me. An investigation and a subsequent PR to patch it are welcome!

Contributor

jreback commented Aug 28, 2017

This is a duplicate of #16321, #15628.

When you return a list-like, it is re-converted to columns if its length matches the input shape. It's not really ideal to do this, but not inferring is worse. Not really sure why datetimes make this different. You are welcome to have a look to see if you can make this better / more consistent.
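
As a rough illustration of the two possible output shapes discussed here, the sketch below assumes pandas 0.23.0 or newer, where DataFrame.apply accepts a result_type argument. The frame and the variable names as_tuples and expanded are hypothetical; the behaviour shown is that of those later releases, not of the 0.20.1 version in the report.

```python
import pandas as pd

# Hypothetical frame mirroring the report, with fixed values for reproducibility.
d = pd.DataFrame({
    'a': [0.2, 0.3, -0.2, -0.6],
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

# Assumption: pandas >= 0.23.0. With the default result_type, a list-like
# returned from a row-wise apply is kept as a single value per row.
as_tuples = d[['a', 'ts']].apply(tuple, axis=1)

# Passing result_type='expand' explicitly asks for list-like results
# to be turned into columns instead.
expanded = d[['a', 'ts']].apply(tuple, axis=1, result_type='expand')

print(type(as_tuples).__name__)
print(type(expanded).__name__)
```

Making the choice explicit avoids relying on length-based inference, which is what produced the surprising DataFrame in the report.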

@jreback jreback added the Reshaping label Aug 28, 2017

jreback added a commit to jreback/pandas that referenced this issue Nov 30, 2017

@jreback jreback added this to the 0.22.0 milestone Nov 30, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 2, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 3, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 7, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 10, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 14, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 21, 2017

jreback added a commit to jreback/pandas that referenced this issue Dec 23, 2017

jreback added a commit to jreback/pandas that referenced this issue Jan 6, 2018

API/BUG: .apply will correctly infer output shape when axis=1
closes #16353
closes #17348
closes #17437
closes #18573
closes #17970
closes #17892
closes #17602
closes #18775
closes #18901
closes #18919

@jorisvandenbossche jorisvandenbossche added Apply and removed Reshaping labels Jan 28, 2018

jreback added a commit to jreback/pandas that referenced this issue Feb 5, 2018

jorisvandenbossche added a commit that referenced this issue Feb 7, 2018

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018
