I'd recommend to use parquet partitions #6

veonua · 2021-02-17T20:25:36Z

It gives a smaller file size and faster save/load times, and it is supported by the majority of data libraries.

veonua · 2021-02-17T20:36:50Z

wallstreetbets_posts.csv > 920 MB
wallstreetbets_posts.parquet ~ 120 MB

mattpodolak · 2021-02-17T20:59:06Z

Hi @veonua, thanks for pointing this out. However, this is not part of the pmaw library's functionality; it is post-processing done after responses have been retrieved with pmaw, so I will be closing this issue.

The documentation will continue to provide an example using .csv, as this benefits the largest number of users.

veonua · 2021-02-17T21:01:45Z

The proposal was for the cache as well.

mattpodolak · 2021-02-17T21:18:42Z

Ah okay, sorry, that wasn't clear. I know it takes up less space than .pickle, which is currently being used, but I haven't done any benchmarks with it yet.

Added as a feature request - #7. Thanks!
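
A benchmark of cache formats could be sketched roughly like this. It uses only the standard library, so gzip-compressed pickle stands in as the second format; a Parquet save/load pair (for example via pandas) could be plugged into the same harness. All function names here are hypothetical and do not reflect how pmaw implements its cache.

```python
import gzip
import os
import pickle
import time
from typing import Any, Callable, Dict


def benchmark_format(
    data: Any,
    save: Callable[[Any, str], None],
    load: Callable[[str], Any],
    path: str,
) -> Dict[str, float]:
    """Time one save and one load of `data`, and report the file size."""
    start = time.perf_counter()
    save(data, path)
    save_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = load(path)
    load_s = time.perf_counter() - start

    # A format that does not round-trip the data is not a usable cache
    if restored != data:
        raise ValueError("round trip changed the data")
    return {"bytes": os.path.getsize(path), "save_s": save_s, "load_s": load_s}


def save_pickle(data: Any, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickle(path: str) -> Any:
    with open(path, "rb") as f:
        return pickle.load(f)


def save_gzip_pickle(data: Any, path: str) -> None:
    with gzip.open(path, "wb") as f:
        pickle.dump(data, f)


def load_gzip_pickle(path: str) -> Any:
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

A single run is noisy, so a real comparison would repeat each measurement several times on a realistically sized response set before drawing conclusions.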

mattpodolak closed this as completed Feb 17, 2021
