BUG: maybe_convert_objects on soft conversion of datetime can raise unexpectedly #19359

Closed
egslava opened this issue Jan 23, 2018 · 9 comments
Labels: Apply (Apply, Aggregate, Transform), Bug, Dtype Conversions (Unexpected or buggy dtype conversions), good first issue, Timeseries, Usage Question

Comments


egslava commented Jan 23, 2018

Code Sample, a copy-pastable example if possible

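from pandas import Series, DataFrame
from dateutil.parser import parse


def transform(x):
    # Return a Series mixing a tz-aware datetime with a plain string.
    return Series({
        'time': parse("22:05 UTC+1"),
        'title': 'remove this "title" to remove the error, or remove timezone'
    })


# Raises TypeError: unhashable type: 'tzoffset' (pandas 0.22.0, dateutil 2.6.1)
DataFrame(["stub"]).apply(transform)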

Problem description

This raises TypeError: unhashable type: 'tzoffset' instead of returning a new DataFrame. To reproduce it, the returned Series must contain a time with tzinfo AND another key whose value has a different (non-datetime) type.

Expected Output

                                                       0
time                            2018-01-23 22:05:00-01:00
title  remove this "title" to remove the error, or re...

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.6.6
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.9
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: 4.1.1
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Contributor

jreback commented Jan 23, 2018

Simply apply to a single column. Returning a Series from DataFrame.apply like this is completely non-performant and non-idiomatic in any case.

In [7]: from pandas import Series, DataFrame
   ...: from dateutil.parser import parse
   ...: 
   ...: def transform(x):
   ...:     return Series( {
   ...:         'time': parse("22:05 UTC+1"),
   ...:         'title': 'remove this "title" to remove the error, or remove timezone'
   ...:     } )
   ...: 
   ...: DataFrame ( ["stub"] )[0].apply(transform)
   ...: 
Out[7]: 
                       time                                              title
0 2018-01-23 22:05:00-01:00  remove this "title" to remove the error, or re...

jreback closed this as completed Jan 23, 2018
jreback added the "Usage Question" and "Apply" (Apply, Aggregate, Transform) labels Jan 23, 2018
jreback added this to the "No action" milestone Jan 23, 2018
Author

egslava commented Jan 23, 2018

Hello, Jeff Reback!
Thank you so much for your apply! :) I mean, I know how to fix my program; it just took me about 1-2 hours to understand exactly where the mistake is.

This is a simplified example to reproduce the bug. In the real program I iterate over rows (axis=1) and I need access to several columns to generate the result table. I don't want to be pushy, but why isn't the current behavior considered a bug?

Contributor

jreback commented Jan 24, 2018

In [27]: from pandas._libs import lib

In [28]: import dateutil

In [30]: lib.maybe_convert_objects(np.array([datetime(2018, 1, 23, 22, 5, tzinfo=dateutil.tz.tzoffset(None, -3600)),'foo']), convert_datetime=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-b06da0694d4e> in <module>()
----> 1 lib.maybe_convert_objects(np.array([datetime(2018, 1, 23, 22, 5, tzinfo=dateutil.tz.tzoffset(None, -3600)),'foo']), convert_datetime=1)

~/pandas/pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_objects()
   1321     # we try to coerce datetime w/tz but must all have the same tz
   1322     if seen.datetimetz_:
-> 1323         if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
   1324             from pandas import DatetimeIndex
   1325             return DatetimeIndex(objects)

TypeError: unhashable type: 'tzoffset'
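
For readers without this exact environment, here is a minimal stdlib-only sketch of the failure mode in the traceback above. It assumes the tzinfo object simply cannot be hashed; `UnhashableOffset` is a hypothetical stand-in written for this illustration, not dateutil's `tzoffset`.

```python
from datetime import datetime, timedelta, tzinfo


class UnhashableOffset(tzinfo):
    """Hypothetical fixed-offset tzinfo that defines __eq__ but disables hashing."""

    def __init__(self, seconds):
        self._offset = timedelta(seconds=seconds)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return None

    def __eq__(self, other):
        return isinstance(other, UnhashableOffset) and self._offset == other._offset

    __hash__ = None  # explicitly mark instances as unhashable


objects = [datetime(2018, 1, 23, 22, 5, tzinfo=UnhashableOffset(-3600)), "foo"]

try:
    # Same shape as the set comprehension shown in the traceback:
    # building the set has to hash every tzinfo object.
    single_tz = len({getattr(val, "tzinfo", None) for val in objects}) == 1
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

The point is only that a set comprehension hashes each element, so any tzinfo that opts out of hashing makes the check itself raise before the comparison with 1 ever happens.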

jreback reopened this Jan 24, 2018
Contributor

jreback commented Jan 24, 2018

Here is a repro. The apply behavior is actually fine, but it's a lower-level bug. A PR to fix this would be appreciated.

jreback changed the title from "DataFrame.apply(). Lambda can't return a Series with time" to "BUG: maybe_convert_objects on soft conversion of datetime can raise unexpectedly" Jan 24, 2018
jreback modified the milestones: No action, Next Major Release Jan 24, 2018

lingster commented Jan 26, 2018

The change below seems to fix the issue. I'll create a PR unless there are any objections.
(in inference.pyx):

    if seen.datetimetz_:
        if not any({isinstance(val, str) for val in objects}) and \
                len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
            from pandas import DatetimeIndex
            return DatetimeIndex(objects)
        seen.object_ = 1

In [14]: DataFrame(["stub"]).apply(transform)
Out[14]: 
                                                       0
time                           2018-01-26 22:05:00-01:00
title  remove this "title" to remove the error ro rem...
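
One thing that may be worth checking before the PR: the guard above skips the set of tzinfos only when a string is present. The sketch below mirrors the condition in plain Python and assumes an unhashable tzinfo, as the traceback suggests. `UnhashableOffset` and `guarded_single_tz` are hypothetical names for this illustration, and whether the datetime-only path is actually reachable inside pandas is not established here.

```python
from datetime import datetime, timedelta, tzinfo


class UnhashableOffset(tzinfo):
    """Hypothetical stand-in for a tzinfo whose instances cannot be hashed."""

    def utcoffset(self, dt):
        return timedelta(hours=-1)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return None

    __hash__ = None  # instances raise TypeError when placed in a set


def guarded_single_tz(objects):
    # Plain-Python mirror of the condition in the proposed patch.
    return (not any({isinstance(val, str) for val in objects})
            and len({getattr(val, "tzinfo", None) for val in objects}) == 1)


aware = datetime(2018, 1, 26, 22, 5, tzinfo=UnhashableOffset())

# Mixed input: a string is present, so `and` short-circuits and the
# set of tzinfos is never built.
mixed_result = guarded_single_tz([aware, "foo"])

# Datetime-only input: the set of tzinfos is built, so hashing is attempted.
try:
    guarded_single_tz([aware, aware])
    datetime_only_error = None
except TypeError as exc:
    datetime_only_error = exc
```

With this stand-in, the mixed case returns False without hashing anything, while the datetime-only case still raises TypeError.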

Contributor

jreback commented Jan 26, 2018

ok

lingster added a commit to lingster/pandas that referenced this issue Jan 27, 2018
jreback modified the milestones: Next Major Release, 0.23.0 Jan 27, 2018
jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
Contributor

BeforeFlight commented Jul 25, 2019

Shouldn't this be closed? The bug no longer seems to be reproducible:

import pandas as pd
import numpy as np
import dateutil
import pandas._libs.lib as lib

from datetime import datetime

test = lib.maybe_convert_objects(
    np.array([
        datetime(2018, 1, 23, 22, 5, tzinfo=dateutil.tz.tzoffset(None, -3600)),
        'foo'
    ]),
    convert_datetime=1)

print(test, test.dtype)
[datetime.datetime(2018, 1, 23, 22, 5, tzinfo=tzoffset(None, -3600)) 'foo'] object

Also, the datetimetz check now uses a stand-alone function:

def is_datetime_with_singletz_array(values: ndarray) -> bool:
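
For illustration only, here is a rough pure-Python sketch of how a single-timezone check could avoid hashing altogether by comparing tzinfo objects with equality. This is not the body of the pandas function named above; `all_same_tzinfo` is a hypothetical helper, and treating empty input and naive datetimes as "not a single tz" is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def all_same_tzinfo(values):
    """Return True if every value is a tz-aware datetime and all tzinfos compare equal."""
    first = None
    seen_any = False
    for val in values:
        tz = getattr(val, "tzinfo", None)
        if not isinstance(val, datetime) or tz is None:
            # Strings, naive datetimes, and other objects rule out a single tz.
            return False
        if not seen_any:
            first, seen_any = tz, True
        elif tz != first:
            # Equality comparison only; the tzinfo never has to be hashable.
            return False
    return seen_any


minus_one = timezone(timedelta(hours=-1))
plus_two = timezone(timedelta(hours=2))

same = all_same_tzinfo([
    datetime(2018, 1, 23, 22, 5, tzinfo=minus_one),
    datetime(2018, 1, 24, 8, 0, tzinfo=timezone(timedelta(hours=-1))),
])
mixed = all_same_tzinfo([datetime(2018, 1, 23, 22, 5, tzinfo=minus_one), "foo"])
different = all_same_tzinfo([
    datetime(2018, 1, 23, 22, 5, tzinfo=minus_one),
    datetime(2018, 1, 23, 22, 5, tzinfo=plus_two),
])
```

Comparing against the first tzinfo keeps the check linear in the number of values and returns early on the first mismatch.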

@BeforeFlight
Contributor

@jreback

@TomAugspurger
Contributor

Closing.
