Hi, I'm having problems loading large data sets. The census example uses pandas to read `.h5` files, or dask to read castra. Since castra is no longer maintained, I used pandas, but it is really slow. I ended up using `df = dd.from_pandas()` to make another dask data frame. I also tried `dd.from_hdf()`; loading is really fast, but only because the data stays on disk rather than in memory. Plotting this data frame can then be really slow, especially for interactive plots: every time I zoom, it reads from the disk again. Is there a better way to load large data sets? And when the data set is too large for memory, what can I do to make plotting faster?
Various file formats are discussed and benchmarked in #129. Personally, I recommend fastparquet, at least if you are using Python 3.
The osm.ipynb example in datashader shows how to set up dask to work out of core, when the data is too large for memory. It shouldn't be very difficult to get good performance, but you will probably have to study the options provided by fastparquet and dask (partition sizes, caching options, etc.) and experiment with them.