I'd recommend to use parquet partitions #6

Closed
veonua opened this issue Feb 17, 2021 · 4 comments

veonua commented Feb 17, 2021

Parquet gives smaller files and faster save/load times, and it is supported by the majority of data libraries.

Author

veonua commented Feb 17, 2021

wallstreetbets_posts.csv > 920 MB
wallstreetbets_posts.parquet ~ 120 MB

@mattpodolak
Owner

Hi @veonua, thanks for pointing this out. However, this is not functionality of the pmaw library; it is post-processing done after responses have been retrieved with pmaw, so I will be closing this issue.

The documentation will continue to provide an example using .csv, as this benefits the largest number of users.

Author

veonua commented Feb 17, 2021

The proposal was for the cache as well.

Owner

mattpodolak commented Feb 17, 2021

Ah okay, sorry, that wasn't clear. I know Parquet takes up less space than .pickle, which is currently being used, but I haven't done any benchmarks with it yet.

Added as a feature request - #7. Thanks!
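
A rough size benchmark of the kind mentioned above could look something like the sketch below. It is not pmaw code: the `compare_sizes` helper, the file names, and the sample records are all hypothetical, and the Parquet step assumes pandas plus a Parquet engine (pyarrow or fastparquet) are installed. If they are not, that step is skipped.

```python
import csv
import os
import pickle
import tempfile


def compare_sizes(records, out_dir):
    """Write the same records as CSV, pickle and (if possible) Parquet.

    `records` is a non-empty list of flat dicts sharing the same keys.
    Returns a dict mapping format name to file size in bytes.
    """
    sizes = {}
    fieldnames = list(records[0].keys())

    # CSV, using only the standard library.
    csv_path = os.path.join(out_dir, "posts.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    sizes["csv"] = os.path.getsize(csv_path)

    # Pickle, the format the thread says is currently used.
    pickle_path = os.path.join(out_dir, "posts.pickle")
    with open(pickle_path, "wb") as f:
        pickle.dump(records, f, protocol=pickle.HIGHEST_PROTOCOL)
    sizes["pickle"] = os.path.getsize(pickle_path)

    # Parquet is optional: pandas raises ImportError when pandas itself
    # or a Parquet engine (pyarrow / fastparquet) is missing.
    try:
        import pandas as pd

        parquet_path = os.path.join(out_dir, "posts.parquet")
        pd.DataFrame(records).to_parquet(parquet_path, index=False)
        sizes["parquet"] = os.path.getsize(parquet_path)
    except ImportError:
        pass

    return sizes


if __name__ == "__main__":
    # Made-up sample data standing in for retrieved posts.
    sample = [
        {"id": i, "title": f"post {i}", "score": i % 100}
        for i in range(10_000)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for fmt, size in compare_sizes(sample, tmp).items():
            print(f"{fmt}: {size} bytes")
```

The numbers depend heavily on the data (long text columns, repeated values, compression settings), so results on synthetic records like these say little about a real dump such as the one reported earlier in the thread. Timing the save and load steps would need a separate measurement.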
