PERF: to_json very slow with lines=True #14408

Closed
joshowen opened this Issue Oct 12, 2016 · 1 comment

Contributor

joshowen commented Oct 12, 2016 edited by jreback

A small, complete example of the issue

import numpy as np
from pandas import DataFrame

N = 100000
C = 5

In [6]: df = DataFrame(dict([('float{0}'.format(i), np.random.randn(N)) for i in range(C)]))

In [7]: df.to_json('foo.json',orient='records',lines=True)

In [8]: %timeit df.to_json('foo.json',orient='records',lines=True)
1 loop, best of 3: 3.66 s per loop

In [9]: %timeit df.to_json('foo.json',orient='records')
10 loops, best of 3: 98.8 ms per loop

As discussed in pydata#14391
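
For readers unfamiliar with the two output shapes being timed above: `orient='records'` writes a single JSON array, while adding `lines=True` writes one JSON object per line. The following is a minimal standard-library sketch of that difference. It does not use pandas and makes no claim about how pandas implements either path; the sample `records` and both helper function names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical sample rows standing in for DataFrame records.
records = [
    {"float0": 0.5, "float1": -1.25},
    {"float0": 2.0, "float1": 3.75},
]


def to_records_json(rows):
    """Serialize rows as one JSON array (the shape of orient='records')."""
    return json.dumps(rows, separators=(",", ":"))


def to_json_lines(rows):
    """Serialize rows as one JSON object per line (the shape of lines=True)."""
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows)


array_text = to_records_json(records)
lines_text = to_json_lines(records)

# Parsing either encoding recovers the same list of row dictionaries.
parsed_from_lines = [json.loads(line) for line in lines_text.splitlines()]
parsed_from_array = json.loads(array_text)
```

Since both encodings carry identical data and differ only in delimiters, the timings reported above (3.66 s versus 98.8 ms, roughly a 37x gap) point at the line-splitting step rather than at serialization of the values themselves.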

jreback added this to the 0.19.1 milestone Oct 12, 2016

Contributor

jreback commented Oct 12, 2016

jreback added a commit to jreback/pandas that referenced this issue Oct 15, 2016

PERF: improved perf in .to_json when lines=True
closes #14408
e855e1f

jreback closed this in 7cad3f1 Oct 15, 2016

tworec added a commit to RTBHOUSE/pandas that referenced this issue Oct 21, 2016

jreback + tworec PERF: improved perf in .to_json when lines=True
closes #14408
13f988c