PERF: json should process data column-by-column (and not use .values) #9037

Closed
jreback opened this issue Dec 7, 2014 · 3 comments
Labels
IO JSON read_json, to_json, json_normalize Performance Memory or execution speed performance

@jreback
Contributor

jreback commented Dec 7, 2014

xref #9027

So to_json first converts the whole DataFrame by using .values. This converts everything to object dtype (if it's a mixed frame) and is pretty expensive perf-wise.

If you do this column by column you get excellent perf (and a tiny bit more complexity in the code).
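
A minimal sketch of the difference described above, assuming a small mixed-dtype frame made up for illustration (the column names and values are not from the issue). It only shows the dtype effect of `.values` on the whole frame versus on each column; it is not the actual `to_json` code path.

```python
import pandas as pd

# Hypothetical mixed frame: one int column, one float column, one string column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5], "c": ["x", "y", "z"]})

# Whole-frame conversion: one 2-D array has to hold every column,
# so the common dtype for a mixed frame is object.
whole = df.values
print(whole.dtype)

# Column-by-column conversion: each column keeps its own dtype,
# so the numeric columns are never boxed into Python objects.
per_column = {name: df[name].values for name in df.columns}
for name, arr in per_column.items():
    print(name, arr.dtype)
```

On a frame like this the numeric columns stay as native integer and float arrays in the per-column case, which is the saving the issue is after.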

cc @Komnomnomnom
@cpcloud

@jreback jreback added Performance Memory or execution speed performance IO JSON read_json, to_json, json_normalize labels Dec 7, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 7, 2014
@Komnomnomnom
Contributor

Thanks @cpcloud for #9028

I should have some time to play with this this week. Will there be any perf impact going column by column with a non-mixed frame?

@jreback
Contributor Author

jreback commented Dec 10, 2014

@Komnomnomnom

you should be able to go column by column for any kind of frame (and perf is essentially the same) except if it's very large.

So the 'ideal' way to do this is actually

for dtype, block in df.blocks.items():
    # block is a single-dtyped frame
    ...

So you get costless single-dtyped frames
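
For readers on recent pandas releases, where the per-dtype block accessor used above is, as far as I know, no longer exposed, the same idea can be approximated with public API. The sketch below groups column names by dtype and converts each group separately. The frame and the `columns_by_dtype` / `block_arrays` names are hypothetical, and unlike the internal blocks, selecting sub-frames this way may copy data, so it is not necessarily costless.

```python
import pandas as pd

# Hypothetical frame with two int columns, one float column and one string column.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": [3, 4], "d": ["x", "y"]})

# Group column names by dtype (keyed by the dtype's string name).
columns_by_dtype = {}
for name, dtype in df.dtypes.items():
    columns_by_dtype.setdefault(str(dtype), []).append(name)

# Convert each single-dtyped group on its own, so numeric groups
# are not upcast to object by the presence of the string column.
block_arrays = {}
for dtype_name, names in columns_by_dtype.items():
    block = df[names]  # single-dtyped sub-frame (may be a copy)
    block_arrays[tuple(names)] = block.to_numpy()

for names, arr in block_arrays.items():
    print(names, arr.dtype, arr.shape)
```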

@jreback
Contributor Author

jreback commented Dec 24, 2014

closed by #9130

@jreback jreback closed this as completed Dec 24, 2014