Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Extending the orient="table" option to all Table Schema types #55038

Open
1 of 3 tasks
loco-philippe opened this issue Sep 6, 2023 · 0 comments
Open
1 of 3 tasks
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@loco-philippe
Copy link
Contributor

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The orient="table" option of the JSON interface enforces the TableSchema specification but only partially.
The table below shows the deviations from the specification:

Data



Data type
table-schema



format
table-schema



type
pandas


Specification
orient=’table’
pandas


read_json
orient=’table’
datetime default (datetime ISO8601 in UTC) datetime datetime64[ns] ok
number default number float64 ok
integer default integer int64 ok
boolean default boolean bool ok
string default string object ok
custom type default any (custom type) category / string ok
email email string Format not supported
uri uri string Format not supported
binary Binary (base64 string) string Format not supported
uuid uuid string Format not supported
date, time or datetime with parsable format any (parsable ?) date, time, datetime Format partially supported
date, time or datetime with custom format <PATTERN> date, time, datetime Format not supported
duration default (lexical duration ISO8601) duration timedelta64[ns] Read_json not yet implemented
Json data default (json) object Unsupported
Json array default (json array) array Unsupported
date default (date ISO8601) date Unsupported
time default (time ISO8601) time Unsupported
year default year Unsupported
month default yearmonth Unsupported
Point (string) default (string “lon, lat”) geopoint Unsupported
Point (geojson array) array (array [lon, lat]) geopoint Unsupported
Point (json object) object (eg {"lon": 90, "lat": 45}) geopoint Unsupported
Geometry (geojson) default (geojson spec) geojson Unsupported
Geometry (topojson) Topojson (topojson spec) geojson Unsupported
Everything (custom type) (string) Only ‘any‘ is supported

Feature Description

proposal

To have a simple and reversible solution which respect the TableSchema specification, I propose to add an 'External Type' (as defined in some JSON export e.g. "ExtDtype":"Float32").
This External Type (NTVtype) can be added to the name of DataFrame Columns following NTV format (as defined in Internet-Draft).

This proposal is included in the PDEP-0012

The table below shows the correspondence between TableSchema format / type and pandas NTVtype / dtype:

format / type NTVtype / dtype
default / datetime / datetime64[ns]
default / number / float64
default / integer / int64
default / boolean / bool
default / string / object
default / duration / timedelta64[ns]
email / string email / string
uri / string uri / string
default / object object / object
default / array array / object
default / date date / object
default / time time / object
default / year year / int64
default / yearmonth month / int64
array / geopoint point / object
default / geojson geojson / object

Note:

  • other TableSchema format are defined and are to be studied (uuid, binary, topojson, specific format for geopoint and datation)
  • the first six lines correspond to the existing

Issues

The proposal resolves or provides an alternative solution for the following issues: #12997, #14358, #16492, #35420, #35464, #36211, #39537, #49585, #50782, #51375, #52595, #53252

Examples

In [1]: from shapely.geometry import Point
        from datetime import date

In [2]: df = pd.DataFrame({
            'end february::date': ['date(2023,2,28)', 'date(2024,2,29)', 'date(2025,2,28)'],
            'coordinates::point': ['Point([2.3, 48.9])', 'Point([5.4, 43.3])', 'Point([4.9, 45.8])'],
            'contact::email':     ['john.doe@table.com', 'lisa.minelli@schema.com', 'walter.white@breaking.com']
            })

In [3]: df
Out[3]:
        end february::date coordinates::point             contact::email
        0         2023-02-28   POINT (2.3 48.9)         john.doe@table.com
        1         2024-02-29   POINT (5.4 43.3)    lisa.minelli@schema.com
        2         2025-02-28   POINT (4.9 45.8)  walter.white@breaking.com

JSON representation

In [4]: pprint(df.to_json(orient='table'), compact=True, width=140, sort_dicts=False)
Out[4]:
        {'schema': {'fields': [{'name': 'index', 'type': 'integer'},
                               {'name': 'end february', 'type': 'date'},
                               {'name': 'coordinates', 'type': 'geopoint', 'format': 'array'},
                               {'name': 'contact', 'type': 'string', 'format': 'email'}],
                    'primaryKey': ['index'],
                    'pandas_version': '1.4.0'},
         'data': [{'index': 0, 'end february': '2023-02-28', 'coordinates': [2.3, 48.9], 'contact': 'john.doe@table.com'},
                  {'index': 1, 'end february': '2024-02-29', 'coordinates': [5.4, 43.3], 'contact': 'lisa.minelli@schema.com'},
                  {'index': 2, 'end february': '2025-02-28', 'coordinates': [4.9, 45.8], 'contact': 'walter.white@breaking.com'}]}

Alternative Solutions

The alternative solutions are to add a new pandas attribute ext_type or to extend the dtype as NTVtype.

Those solutions have important impacts.

Additional Context

-> NTV repository : https://github.com/loco-philippe/NTV
-> PDEP-0012

@loco-philippe loco-philippe added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

No branches or pull requests

1 participant