loading small data takes a long time. #12

Open
davidbp opened this issue Oct 21, 2018 · 1 comment

Comments

@davidbp
davidbp commented Oct 21, 2018

Hello,

I tried to load a small dataset and it takes a long time. Running using Queryverse also takes a significant amount of time.

@time using Queryverse
 17.643986 seconds (43.33 M allocations: 2.193 GiB, 6.44% gc time)
@time df = DataFrame(load("iris.csv"))
17.341491 seconds (55.15 M allocations: 2.607 GiB, 11.01% gc time)

Loading the data a second time in the same session is much faster, though.

@time df = DataFrame(load("iris.csv"))
 0.001057 seconds (5.01 k allocations: 210.609 KiB)

Is there a command to precompile or another way to make this faster?

@davidanthoff
Member

I'm afraid this is a general problem with Julia right now: precompilation doesn't actually save machine code, so even with precompilation, a lot of code has to be recompiled in every new Julia session...

I think there are only two options that could work right now:

1) You could try to compile these packages into your sysimage. I've never done that and it might be very complicated...
2) Load only what you need with using CSVFiles, DataFrames instead of using Queryverse. That cuts down the number of packages being loaded, so it should help with the time the using call takes. It won't help with the second issue (the slow first call to load)...
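
For reference, the second option might look roughly like the sketch below. It is only an illustration under assumptions: the temporary sample.csv file and its two columns are hypothetical stand-ins for iris.csv so that the snippet runs on its own, and load is assumed to be available after using CSVFiles, as in the suggestion above. The timings will differ from machine to machine; the first call still pays the compilation cost, the second does not.

```julia
# Sketch: load only CSVFiles and DataFrames instead of the whole Queryverse.
using CSVFiles, DataFrames

# Hypothetical stand-in for iris.csv so the example is self-contained.
path = joinpath(mktempdir(), "sample.csv")
write(path, "sepal_length,species\n5.1,setosa\n4.9,setosa\n")

# First call in a fresh session: the time includes JIT compilation.
@time df = DataFrame(load(path))

# Second call in the same session: the compiled code is reused.
@time df = DataFrame(load(path))
```

Comparing the time of using CSVFiles, DataFrames against using Queryverse in two fresh sessions should show how much of the startup cost comes from the extra packages.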
