
df.to_json() slower in 0.13.x vs 0.12.0 #5765

Closed · acowlikeobject opened this issue Dec 23, 2013 · 16 comments · Fixed by #6137


@acowlikeobject commented Dec 23, 2013

The df.to_json() method seems consistently ~1.8x slower in version 0.13.x (and in a few post-0.12.0 development versions on the git master branch) than in 0.12.0.

Version 0.12.0:

Python 2.7.5+ (default, Sep 17 2013, 15:31:50) 
In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 96.1 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.12.0', '1.7.1') 

Version 0.13.0rc1:

Python 2.7.5+ (default, Sep 17 2013, 15:31:50) 
In [1]: import pandas as pd, numpy as np

In [2]: df = pd.DataFrame(np.random.rand(100000,10))

In [3]: %timeit df.to_json(orient='split')
10 loops, best of 3: 172 ms per loop

In [4]: pd.__version__, np.__version__
Out[4]: ('0.13.0rc1-119-g2485e09', '1.8.0')

The ~1.8x factor seems to hold on my machine across Python versions (2.7.5 vs 3.3.2), DataFrame sizes, orient values, and dtypes (I only tried floats and a DatetimeIndex).

Was there some change to to_json(), or have I goofed something up in my environment?


@dsm054 (Contributor) commented Dec 23, 2013

Should I be worried that it takes me ~700ms to do the same thing, or is the original computer super-fast?

@jreback (Contributor) commented Dec 23, 2013

As an aside, we could probably add a vbench benchmark (or two) in pandas/vb/packers.py, to at least track this from version to version.
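A minimal sketch of what such a benchmark could look like, assuming the Benchmark(code, setup, ...) API from the vbench library pandas used at the time (the benchmark name and start_date below are illustrative, not from this thread):

# Hypothetical vbench entry: Benchmark times a code string against a setup string.
from datetime import datetime
from vbench.api import Benchmark

setup = """
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(100000, 10))
"""

frame_to_json_float = Benchmark(
    "df.to_json(orient='split')",     # the statement being timed
    setup,
    name='frame_to_json_float',
    start_date=datetime(2013, 1, 1),  # earliest commit to track
)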

@jtratner (Contributor) commented Dec 23, 2013

I see similar results: ~200ms on 0.12 vs. ~430ms on 0.13rc.

@danbirken (Contributor) commented Jan 10, 2014

I ran a little to_json() timing test over a bunch of commits, and I believe two slowdowns were introduced since 0.12: a smaller one and a larger one. I'm fairly sure the larger slowdown is this commit: edefe98

In my testing, to_json() takes 1.5x as long at that commit as at its parent. I honestly don't know enough about the tradeoff between locale handling and speed to submit any sort of pull request to fix it, but at least this should help a future person narrow down the bug.

@Komnomnomnom (Contributor) commented Jan 11, 2014

Thanks guys, I'm going to take a look into this one and see if I can improve or remove the slowdown; I haven't had much time over the Christmas period. Thanks @danbirken for narrowing it down :) Do you have a code snippet for the tests you're running?

@danbirken (Contributor) commented Jan 13, 2014

Very similar to the sample in the OP (I just added a hash to verify the results were identical across commits; they are):

import pandas as pd
import hashlib, time

# 500k rows: one int column, one float column
df = pd.DataFrame({'a': range(500000), 'b': [i / 1000. for i in range(500000)]})

total_time = 0

# time 10 serializations, keeping the last output for hashing
for i in range(10):
    start = time.time()
    out = df.to_json()
    total_time += time.time() - start

# hash the JSON output so results can be compared across commits (Python 2)
print 'Time: %.2fs, Hash: %s' % (
    total_time, hashlib.md5(out).hexdigest()
)

Then:

$ git checkout c7b578c2176c9a0e1099b07d9b01be25c6fa3bca
$ python setup.py clean
$ python setup.py build_ext --inplace
$ python timing_test.py
Time: 4.49s, Hash: 2608e38a63003e637a3a108760bdf46e

$ git checkout edefe981ca2422e78045b981bb373cfbeb458097
$ python setup.py clean
$ python setup.py build_ext --inplace
$ python timing_test.py
Time: 6.42s, Hash: 2608e38a63003e637a3a108760bdf46e
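As an aside, this checkout / rebuild / re-time loop can be automated with git bisect run. A rough sketch, using a hypothetical check_speed.py helper (not part of this thread) that exits non-zero when the run is slow; the 5.5s threshold splits the ~4.49s good and ~6.42s bad timings above and is machine-dependent:

# check_speed.py -- hypothetical helper for `git bisect run`;
# exits 1 (bad) when to_json() is slow, 0 (good) otherwise
import sys, time
import pandas as pd

df = pd.DataFrame({'a': range(500000), 'b': [i / 1000. for i in range(500000)]})

total_time = 0
for i in range(10):
    start = time.time()
    df.to_json()
    total_time += time.time() - start

sys.exit(1 if total_time > 5.5 else 0)  # threshold is machine-dependent

$ git bisect start edefe98 c7b578c    # bad commit first, then known-good commit
$ git bisect run sh -c 'python setup.py build_ext --inplace && python check_speed.py'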
@jreback (Contributor) commented Jan 21, 2014

@Komnomnomnom anything on this?

@Komnomnomnom (Contributor) commented Jan 21, 2014

hey @jreback, should have some time this weekend

@jreback (Contributor) commented Jan 21, 2014

np

@Komnomnomnom (Contributor) commented Jan 26, 2014

FYI I looked into this a bit more, and while my locale code does seem to slow things down a bit, the main culprit lands much earlier. For code jsonifying a DataFrame of 100,000 floats, vbench produces:

[vbench timing plot showing when the regression appeared]

which points to this PR (mine!): #4498. I'll see if I can sort it out.
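(For reference, a chart like this can be produced by comparing two commits with pandas' vbench runner of that era; a sketch, assuming the vb_suite/test_perf.py script with base-commit, target-commit and name-regex flags, and a benchmark whose name matches the regex:)

$ python vb_suite/test_perf.py -b v0.12.0 -t HEAD -r to_json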

@jreback (Contributor) commented Jan 27, 2014

@Komnomnomnom we still have a few days left for 0.13.1; if you can pinpoint the slowdown (without compromising functionality / tests), let me know.

@Komnomnomnom (Contributor) commented Jan 28, 2014

@jreback I've tracked down the main slowdowns. A fix is in the works, so I will hopefully get a PR together today or tomorrow.


@jreback (Contributor) commented Jan 28, 2014

gr8

@Komnomnomnom (Contributor) commented Jan 28, 2014

Thanks @acowlikeobject for catching this, and thanks to everyone else for investigating. Please check out my PR for a probable fix.

@acowlikeobject (Author) commented Jan 28, 2014

@Komnomnomnom No worries, thanks for the fix!
