Support for reading multiple symbols with a single query #814
Parallel reads would work great. A process pool or thread pool should help a lot with this many symbols.
Thanks for your reply! I've run some tests and changed my code to:
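The actual snippet did not survive in this copy of the thread. As a rough reconstruction, a threaded multi-symbol read might look like the sketch below, where `FakeLibrary` is a hypothetical stand-in for a real Arctic library handle (its `read()` simulates one per-symbol round-trip query):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an Arctic library handle: each read()
# simulates one round-trip query returning the data for a symbol.
class FakeLibrary:
    def read(self, symbol):
        return {"symbol": symbol, "data": [1, 2, 3]}

library = FakeLibrary()
symbols = [f"df{i}" for i in range(300)]

# Issue the per-symbol reads concurrently; with a real client the GIL
# is released while each thread waits on I/O, so the network
# round-trips overlap instead of running back-to-back.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = dict(zip(symbols, pool.map(library.read, symbols)))

print(len(results))  # one result per symbol
```

Against a real remote MongoDB instance the wall-clock benefit comes from overlapping the network latency, so the number of workers matters more than CPU count.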
Here are the results, where localhost is a local instance of MongoDB and remote is a MongoDB cluster hosted on Atlas on the free tier. Concat and singledfs are the two solutions from my first post; threaded is the solution in this post.
The performance boost of the local threaded read over the others makes me think I don't see the same results with the remote instance because it's the free tier, rather than for other reasons (network overhead and similar). Hopefully I'll get better results with a paid one; I'll update if I decide to go this route. Any input on the matter is very much appreciated.

edit: disregard "preparing aggregation". I used that to compare the performance of iterating over a concat versus an array of dfs.

edit2: I'm testing on a better remote instance and will publish the complete code so that the results can be reproduced, in a few minutes...
So, I've upgraded the remote MongoDB instance to a more powerful one and changed the number of txt files from 150(ish) to 280. Here are the benchmarks:
The complete code I'm using, in case someone wants to reproduce, is here: https://github.com/Saturnix/ArcticFiddle
So it seems like some improvement, no?
Yes, it is! Thanks for the tip!
I think the concatenated DFs might get better performance simply because you'll probably get better compression on one much larger DataFrame than on many small ones compressed independently.
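The compression effect described above is easy to demonstrate in isolation. The sketch below uses zlib purely as an illustrative compressor (not Arctic's actual codec): many small, independently compressed chunks each pay per-stream overhead and can't share redundancy across chunk boundaries, while one large blob can.

```python
import zlib

# A compressible payload split into many small chunks, as if each were
# a separately stored DataFrame sharing a common header/structure.
chunks = [b"price,volume\n" + bytes(str(i), "ascii") * 50 for i in range(200)]
whole = b"".join(chunks)  # the "concatenated" variant

# Total size of the 200 independently compressed chunks...
small_total = sum(len(zlib.compress(c)) for c in chunks)
# ...versus one compression pass over the concatenation.
big = len(zlib.compress(whole))

print(big, small_total)
```

The single-pass result is markedly smaller, which means less data to ship over the network per read.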
@Saturnix closing this issue, feel free to reopen if you have more questions regarding this.
Arctic Version
Arctic Store
Description of problem and/or code sample that reproduces the issue
Pulling many different symbols (300 and more) in a loop is slow (it can take several seconds), especially from a remote MongoDB instance. I suppose this is because a separate query is made for each `.read()`? My solution is to store the dataframes concatenated:
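The concatenation snippet was lost in this copy of the thread. A minimal sketch of the idea with plain pandas follows; the `library.write` call is hypothetical and left as a comment, since only the concatenation itself is shown here:

```python
import pandas as pd

# Two per-symbol DataFrames that would otherwise be stored separately.
df1 = pd.DataFrame({"close": [1.0, 2.0]})
df2 = pd.DataFrame({"close": [3.0, 4.0]})

# Stack them into one DataFrame keyed by symbol name, so a single
# write/read round-trip covers every symbol.
combined = pd.concat({"df1": df1, "df2": df2})
# library.write("all_symbols", combined)  # hypothetical: one write, one read

# After one read, slice an individual symbol back out locally.
out = combined.loc["df1"]
print(out.equals(df1))
```

The trade-off is that every read pulls all symbols, so this pays off when you need most of them anyway.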
Is there any way to do the following, without a performance hit compared to my solution above?

`df1, df2, df3 = library.read(["df1", "df2", "df3"])`