BUG: DataFrame with index having tzlocal() timezone could not be saved to parquet #33786

vfilimonov · 2020-04-25T11:27:57Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

If a DataFrame has an index whose timezone is set to dateutil.tz.tzlocal(), it cannot be saved to parquet.

It might be related to #24310

Code Sample, a copy-pastable example

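import pandas as pd
from dateutil.tz import tzlocal

# Daily index localized to the machine's local timezone (a dateutil tzlocal object)
ind = pd.date_range('2020-02-01', '2020-04-14').tz_localize(tzlocal())
x = pd.DataFrame([[1, 2]] * len(ind), index=ind, columns=['A', 'B'])
x.to_parquet('tmp.parquet')  # raises ValueError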

Problem description

The code raises ValueError: Unable to convert timezone "tzlocal()" to string.

However, saving to another format such as CSV works:

x.to_csv('tmp.csv')

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 42.0.1.post20191125
Cython : None
pytest : 5.3.0
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.3
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.3.0
pyxlsb : None
s3fs : 0.4.2
scipy : 1.3.1
sqlalchemy : 1.3.13
tables : 3.4.4
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : None
numba : 0.46.0


jorisvandenbossche · 2020-04-25T12:07:26Z

@vfilimonov dateutil timezones are currently not supported by pyarrow, see https://issues.apache.org/jira/browse/ARROW-5248

So the best option, for now, is to convert the timezone to a datetime.timezone fixed offset or a pytz timezone.
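As a minimal sketch of that suggestion, the index can be converted to UTC with `tz_convert` before writing. The variable name `x_utc` is an assumption made for illustration, and UTC is only one possible target; a named zone string could be passed instead if the local wall-clock times need to be preserved.

```python
import pandas as pd
from dateutil.tz import tzlocal

# Reproduce the frame from the report: index localized with dateutil's tzlocal()
ind = pd.date_range("2020-02-01", "2020-04-14").tz_localize(tzlocal())
x = pd.DataFrame([[1, 2]] * len(ind), index=ind, columns=["A", "B"])

# Convert the index to UTC; the instants are unchanged, only the timezone differs
x_utc = x.copy()
x_utc.index = x_utc.index.tz_convert("UTC")

# The write is left commented out because it needs a parquet engine installed:
# x_utc.to_parquet("tmp.parquet")
```

After reading the file back, `tz_convert` can be applied again to view the data in whatever local zone is needed.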

vfilimonov · 2020-04-25T12:40:20Z

Thank you for the quick response @jorisvandenbossche!

What would be the easiest way to convert the index to a fixed offset?

I could think of a workaround: iterating over the index and parsing the string form of each timestamp:

import pandas as pd
from dateutil.tz import tzlocal
ind = pd.date_range('2020-02-01', '2020-04-14').tz_localize(tzlocal())

# Re-parse each timestamp's string form, which carries a fixed UTC offset
ind = [pd.Timestamp(str(ts)) for ts in ind]
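One possible vectorized alternative, sketched here under assumptions rather than taken from the thread: build a single `datetime.timezone` from the UTC offset of the first timestamp and pass it to `tz_convert`. The names `fixed_tz` and `fixed_ind` are illustrative. Note that one fixed offset does not follow daylight saving changes, so wall-clock times after a DST transition will differ from the local ones.

```python
import datetime

import pandas as pd
from dateutil.tz import tzlocal

ind = pd.date_range("2020-02-01", "2020-04-14").tz_localize(tzlocal())

# Take the UTC offset in effect at the first timestamp (an arbitrary reference point)
fixed_tz = datetime.timezone(ind[0].utcoffset())

# Convert the whole index at once; the instants stay the same
fixed_ind = ind.tz_convert(fixed_tz)
```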

jbrockmendel · 2023-03-02T22:52:47Z

@jorisvandenbossche is there any prospect of this being supported in pyarrow? If not, it might make sense to add a helpful error message for this case.
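Until such a message exists, a user-side check could look roughly like the sketch below. The helper `check_index_timezone` is hypothetical and not part of pandas or pyarrow; it only inspects the module of the index's tzinfo object to guess whether it comes from dateutil.

```python
import pandas as pd
from dateutil.tz import tzlocal


def check_index_timezone(df):
    """Hypothetical helper: fail early with a clearer message for dateutil timezones."""
    tz = getattr(df.index, "tz", None)
    # dateutil tzinfo classes live in modules under the "dateutil" package
    if tz is not None and type(tz).__module__.startswith("dateutil"):
        raise ValueError(
            f"Index timezone {tz!r} is a dateutil timezone, which may not be "
            "writable to parquet; convert it first, e.g. with tz_convert('UTC')."
        )
    return df
```

It could be called as `check_index_timezone(x).to_parquet('tmp.parquet')` so the clearer error is raised before the write is attempted.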

vfilimonov added the Bug and Needs Triage labels on Apr 25, 2020

jorisvandenbossche added the IO Parquet and Timezones labels and removed the Needs Triage label on Apr 25, 2020
