Skip to content

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Jan 10, 2020

similar to @cbertinato great work in #28595 but done deeper in the C extension

@WillAyd WillAyd added the IO JSON read_json, to_json, json_normalize label Jan 10, 2020
@@ -407,6 +406,30 @@ static char *int64ToIso(int64_t value, NPY_DATETIMEUNIT base, size_t *len) {
return result;
}

/* Converts the int64_t representation of a duration to ISO; mutates len */
static char *int64ToIsoDuration(int64_t value, size_t *len) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think after this PR going to split these conversion functions off into a separate module as the JSON one is getting rather big

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other PR is #31057 should probably be merged first

@@ -1615,6 +1649,11 @@ char **NpyArr_encodeLabels(PyArrayObject *labels, PyObjectEncoder *enc,
ret[i] = PyObject_Malloc(len + 1);
memcpy(ret[i], cLabel, len + 1);

if (is_datetimelike) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a memory leak in the current implementation where any datetimelike values used in the index would leak their respective character representation out; will add a whatsnew once the file is created

@WillAyd
Copy link
Member Author

WillAyd commented Jan 11, 2020

Current benchmarks:

       before           after         ratio
     [447a3b00]       [4a94f159]
     <master>         <json-timedelta>
+         152±3ms         209±10ms     1.38  io.json.ToJSONISO.time_iso_format('values')
+         168±3ms         213±10ms     1.27  io.json.ToJSONISO.time_iso_format('records')
+         188±4ms         235±10ms     1.25  io.json.ToJSONISO.time_iso_format('split')
+         195±4ms         238±10ms     1.22  io.json.ToJSONISO.time_iso_format('columns')
-        103±10ms         79.2±6ms     0.77  io.json.ToJSON.time_to_json('values', 'df_td_int_ts')
-        137±20ms         95.9±1ms     0.70  io.json.ToJSON.time_to_json('values', 'df_int_float_str')

So the time_iso_format checks are taking a little bit longer but note that they were broken before, i.e. on master you get this:

>>> pd.Series([pd.Timedelta(days=1)], index=[pd.Timedelta(days=1)]).to_json(date_format="iso")
'{"P1DT0H0M0S":"1970-01-02T00:00:00.000Z"}'

Where we are writing as dates and not durations.

I think just a matter of the timedelta -> iso function not being as fast (using sprintf instead of totally hand-coded) but maybe OK for now

@@ -905,3 +905,37 @@ int make_iso_8601_datetime(npy_datetimestruct *dts, char *outstr, int outlen,
outlen);
return -1;
}


int make_iso_8601_timedelta(pandas_timedeltastruct *tds,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we do also have a isoformat function for time deltas defined here:

def isoformat(self) -> str:

But I think the path to get that to work in the JSON module would be way more complicated than implementing in C as we do here (and less consistent with other datetime handling).

Maybe as a follow up we want to use this C function in the Timedelta class

@jbrockmendel
Copy link
Member

pls rebase

@jbrockmendel
Copy link
Member

LGTM cc @jreback

@jbrockmendel
Copy link
Member

needs rebase, other good to go i think

@jreback jreback added this to the 1.1 milestone Mar 17, 2020
@jreback
Copy link
Contributor

jreback commented Mar 17, 2020

completely closes #28256. ?

can you check the perf again.

@WillAyd
Copy link
Member Author

WillAyd commented Mar 17, 2020

Yea closes; added some more test coverage as well.

Here is an updated benchmark:

       before           after         ratio
     [402c5cdd]       [ef08ad6c]
     <json-test-cleanup~1^2>       <json-timedelta>
+        143±20ms         225±10ms     1.57  io.json.ToJSONISO.time_iso_format('values')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@jreback
Copy link
Contributor

jreback commented Mar 17, 2020

this is slower?

@WillAyd
Copy link
Member Author

WillAyd commented Mar 17, 2020

Slower but correct

@jbrockmendel
Copy link
Member

Slower but correct

worth the tradeoff every time.

@jreback jreback merged commit 80078ac into pandas-dev:master Mar 19, 2020
@jreback
Copy link
Contributor

jreback commented Mar 19, 2020

thanks, correct wins every time.

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Mar 23, 2020
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Mar 25, 2020
@WillAyd WillAyd deleted the json-timedelta branch April 12, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timedelta in to_json object array and ISO dates not handled properly
4 participants