to_json of Series with period dtype results in AttributeError #31917

Open
gerrymanoim opened this issue Feb 12, 2020 · 8 comments
Labels
Enhancement IO JSON read_json, to_json, json_normalize Period Period data type

Comments

@gerrymanoim

Code Sample, a copy-pastable example if possible

import pandas as pd
test = pd.Series(pd.period_range('1/1/2011', freq='B', periods=3))
test.to_json(None, orient='table')

Leads to:

~/miniconda3/envs/qgrid/lib/python3.7/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent)
   2362             compression=compression,
   2363             index=index,
-> 2364             indent=indent,
   2365         )
   2366

~/miniconda3/envs/qgrid/lib/python3.7/site-packages/pandas/io/json/_json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent)
     83         default_handler=default_handler,
     84         index=index,
---> 85         indent=indent,
     86     ).write()
     87

~/miniconda3/envs/qgrid/lib/python3.7/site-packages/pandas/io/json/_json.py in __init__(self, obj, orient, date_format, double_precision, ensure_ascii, date_unit, index, default_handler, indent)
    289             raise ValueError(msg)
    290
--> 291         self.schema = build_table_schema(obj, index=self.index)
    292
    293         # NotImplemented on a column MultiIndex

~/miniconda3/envs/qgrid/lib/python3.7/site-packages/pandas/io/json/_table_schema.py in build_table_schema(data, index, primary_key, version)
    251     if data.ndim > 1:
    252         for column, s in data.items():
--> 253             fields.append(convert_pandas_type_to_json_field(s))
    254     else:
    255         fields.append(convert_pandas_type_to_json_field(data))

~/miniconda3/envs/qgrid/lib/python3.7/site-packages/pandas/io/json/_table_schema.py in convert_pandas_type_to_json_field(arr, dtype)
    115         field["ordered"] = ordered
    116     elif is_period_dtype(arr):
--> 117         field["freq"] = arr.freqstr
    118     elif is_datetime64tz_dtype(arr):
    119         if hasattr(arr, "dt"):

~/miniconda3/envs/qgrid/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'freqstr'

This issue also crops up in DataFrames where a column is a period Series and orient='table' is used.
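
The last frame of the traceback suggests the cause: convert_pandas_type_to_json_field reads arr.freqstr, but a Series does not expose that attribute, while the frequency is available on its dtype. A minimal sketch of the difference, assuming a reasonably recent pandas and using freq='D' rather than the 'B' from the report (business-day periods are deprecated in newer releases):

```python
import pandas as pd

# Same shape as the reported example, but with a daily frequency.
s = pd.Series(pd.period_range("2011-01-01", freq="D", periods=3))

# The attribute the traceback tries to read is not defined on Series ...
series_has_freqstr = hasattr(s, "freqstr")

# ... while the period dtype carries the frequency object.
dtype_freqstr = s.dtype.freq.freqstr

print(series_has_freqstr, dtype_freqstr)
```

This only illustrates where the frequency lives; it is not a claim about how pandas should fix the serializer.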

Problem description

I can't find anything that says this would be unsupported.

Expected Output

A JSON string representing the Series.

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@WillAyd
Member

WillAyd commented Feb 12, 2020

Yeah, I just don't think this is implemented in the serializer. We would take a PR to support it.

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Period Period data type labels Feb 12, 2020
@WillAyd WillAyd added this to the Contributions Welcome milestone Feb 12, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Apr 15, 2020
@jreback jreback modified the milestones: 1.1, Contributions Welcome Jul 10, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Member

@WillAyd would this go through values_for_json? I just started a branch that moves values_for_json from the Block to the EA, want to see if doing so fixes any existing issues and this seems like a candidate.

@WillAyd
Member

WillAyd commented Jun 13, 2023

It's been a while since I looked at it, but I think datetimes still return primitive values in values_for_json, and objToJSON converts those primitive values to textual representations. I think you can short-circuit some of that here if you return strings from values_for_json, but that would introduce inconsistency in how we handle things.

A more robust (and more challenging) approach would be to implement similar period serialization functions in the np_datetime_strings / JSON modules.

@jbrockmendel
Member

in #53680 i tried implementing PeriodArray._values_for_json to return self._format_native_types() and the OP example gave

'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"values","type":"datetime","freq":"B"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"values":"2011-01-03"},{"index":1,"values":"2011-01-04"},{"index":2,"values":"2011-01-05"}]}'

i'm guessing that the "type":"datetime" might not be exactly what we want, but the values look like what i'd expect.
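
For readers hitting this before any fix lands, a rough user-side workaround is to cast the periods to strings before serializing. This is only a sketch under assumptions: it uses freq='D' instead of 'B', it is not what #53680 does, and the schema will not describe the column as a period (the frequency is lost), so it only reproduces the shape of the data values shown above.

```python
import json

import pandas as pd

s = pd.Series(pd.period_range("2011-01-01", freq="D", periods=3))

# Cast each Period to its string form, then serialize as a table.
as_text = s.astype(str)
parsed = json.loads(as_text.to_json(orient="table"))

# The data records now carry plain strings instead of periods.
records = parsed["data"]
print(records)
```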

@jbrockmendel
Member

@WillAyd can you comment on whether the result in the previous comment is "correct"? if so we can fix this in #53680

@WillAyd
Member

WillAyd commented Jul 19, 2023

I don't know the JSON table specification that deeply, so it's hard to say whether that is correct without reading through the spec. The other consideration is how to get that to work with existing JSON keywords like date_format and date_unit.
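
To make the date_format / date_unit concern concrete, here is a small sketch of how those keywords already change the output for a plain datetime Series. Whether and how period serialization should honor them is exactly the open question; the example only shows the existing datetime behavior.

```python
import json

import pandas as pd

s = pd.Series(pd.to_datetime(["2011-01-03"]))

# Epoch output: an integer count of date_unit since 1970-01-01.
epoch_ms = json.loads(s.to_json(date_format="epoch", date_unit="ms"))
epoch_s = json.loads(s.to_json(date_format="epoch", date_unit="s"))

# ISO output: an ISO 8601 string.
iso = json.loads(s.to_json(date_format="iso"))

print(epoch_ms, epoch_s, iso)
```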

@jbrockmendel
Member

oh, if there is a table spec, then do we want to explicitly not support dtypes that are not part of the spec? That would be better than the current OverflowError: Maximum recursion level reached
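
A sketch of what such an explicit rejection could look like from the user side. The helper name check_table_orient_supported is hypothetical and not part of pandas; it only inspects the values' dtypes (not the index) and treats period dtypes as unsupported purely for illustration.

```python
import pandas as pd


def check_table_orient_supported(obj):
    """Hypothetical pre-check: raise a clear error for period dtypes."""
    # A Series has a single dtype; a DataFrame has one per column.
    dtypes = [obj.dtype] if isinstance(obj, pd.Series) else list(obj.dtypes)
    for dtype in dtypes:
        if isinstance(dtype, pd.PeriodDtype):
            raise NotImplementedError(
                f"orient='table' does not support dtype {dtype}"
            )


periods = pd.Series(pd.period_range("2011-01-01", freq="D", periods=3))
try:
    check_table_orient_supported(periods)
    rejected = False
except NotImplementedError:
    rejected = True

# A plain integer Series passes the check without raising.
check_table_orient_supported(pd.Series([1, 2, 3]))
print(rejected)
```

The point of the design is only that a targeted NotImplementedError is easier to act on than a recursion error raised deep inside the serializer.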

@WillAyd
Member

WillAyd commented Jul 19, 2023

The table spec is located at https://specs.frictionlessdata.io/table-schema/ (pulled link from the top of the _table_schema.py module)

I think we already extend it for timezone support, though I'm not sure if the way we extended that was part of the spec or something we invented. The table format is different than the other formats and follows quite a few different paths in the code base.
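
As an illustration of the timezone handling mentioned above, the schema pandas builds for a tz-aware datetime Series can be inspected with pandas.io.json.build_table_schema. This is a sketch that only shows what pandas emits; it does not settle whether the extra key is part of the Table Schema spec.

```python
import pandas as pd
from pandas.io.json import build_table_schema

s = pd.Series(pd.date_range("2011-01-03", periods=2, tz="UTC"))

schema = build_table_schema(s)

# An unnamed Series is described under the field name "values".
values_field = next(f for f in schema["fields"] if f["name"] == "values")
print(values_field)
```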
