Avoid loading any data for reprs #6722
Comments
So what's the solution here? Add another condition that checks for more than a certain number of variables? Somehow check whether a dataset is cloud-backed?
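A minimal sketch of the first option (capping eager loading by variable count). Everything here is an assumption for illustration: SIZE_THRESHOLD, MAX_EAGER_VARIABLES and should_load_for_repr are hypothetical names and values, not xarray settings or functions.

```python
# Hypothetical heuristic: only eagerly load "small" variables when the
# dataset has few variables, so the total number of loads stays bounded.
# SIZE_THRESHOLD and MAX_EAGER_VARIABLES are illustrative values, not
# xarray settings.
SIZE_THRESHOLD = 100_000
MAX_EAGER_VARIABLES = 10


def should_load_for_repr(var_size: int, n_variables: int, in_memory: bool) -> bool:
    """Decide whether a variable's values would be loaded to build the repr."""
    if in_memory:
        return True  # already loaded, so showing values costs no I/O
    if n_variables > MAX_EAGER_VARIABLES:
        return False  # too many variables: avoid O(n) sequential loads
    return var_size < SIZE_THRESHOLD


# With 48 small lazy variables, nothing is loaded under this rule.
loads = sum(should_load_for_repr(1_000, 48, False) for _ in range(48))
print(loads)  # 0
```

The drawback is that any fixed cap is arbitrary: a dataset just under the cap can still trigger several slow remote reads.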
I think the best thing to do is to not load anything unless asked to. So delete the …
This would be a pretty small change and only applies to loading data into NumPy arrays; for example, the current repr for a variable versus the modified repr for the example dataset above (the modified form is what already happens for large arrays). Seeing a few values at the edges can be nice, though, so this makes me realize how useful data summaries in the metadata (Zarr or STAC) are for large datasets on cloud storage.
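A minimal sketch of the "don't load unless asked" rule, assuming the repr only shows values when the data is already in memory and otherwise prints a placeholder. The function name data_repr_without_loading and its parameters are hypothetical; the placeholder string is modeled on the one xarray prints for large lazy arrays, but its exact format here is an assumption.

```python
from typing import Callable, Sequence


def data_repr_without_loading(
    size: int,
    dtype: str,
    in_memory: bool,
    get_values: Callable[[], Sequence[float]],
) -> str:
    """Proposed rule (sketch): only show values that are already in memory."""
    if in_memory:
        return str(list(get_values()))  # cheap: data is already loaded
    # Lazy data: get_values() is never called, so the repr does no I/O.
    return f"[{size} values with dtype={dtype}]"


def must_not_be_called() -> Sequence[float]:
    raise RuntimeError("repr tried to load lazy data")


print(data_repr_without_loading(1_000, "float64", False, must_not_be_called))
# [1000 values with dtype=float64]
print(data_repr_without_loading(3, "float64", True, lambda: [1.0, 2.0, 3.0]))
# [1.0, 2.0, 3.0]
```

Passing a loader that raises makes it easy to check that the lazy branch never touches the data.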
Is the print still slow if, somewhere just before the load, the array is masked down to only the few start and end elements that are shown?
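A sketch of the "only fetch the edges" idea, assuming a 1-D lazy array that supports slicing. CountingLazyArray and edge_repr are hypothetical stand-ins, not xarray or Zarr classes; the array just counts how many elements a slice pulls from "storage".

```python
class CountingLazyArray:
    """Hypothetical stand-in for a remote 1-D array that counts fetched elements."""

    def __init__(self, n: int):
        self.n = n
        self.fetched = 0  # number of elements pulled from "storage"

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, key: slice) -> list:
        start, stop, step = key.indices(self.n)
        values = [float(i) for i in range(start, stop, step)]
        self.fetched += len(values)
        return values


def edge_repr(arr, edgeitems: int = 3) -> str:
    """Format only the first and last `edgeitems` values, fetching nothing else."""
    n = len(arr)
    if n <= 2 * edgeitems:
        return "[" + " ".join(map(str, arr[0:n])) + "]"
    head = arr[0:edgeitems]
    tail = arr[n - edgeitems:n]
    return "[" + " ".join(map(str, head)) + " ... " + " ".join(map(str, tail)) + "]"


arr = CountingLazyArray(1_000_000)
print(edge_repr(arr))  # [0.0 1.0 2.0 ... 999997.0 999998.0 999999.0]
print(arr.fetched)     # 6
```

This reduces the amount of data read, but not the number of requests: each variable still needs at least one round trip, and a chunked Zarr store reads the whole chunk(s) containing the requested elements. With O(100) variables read one after another, the latency per variable may still dominate.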
What happened?
For "small" datasets, we load the data into memory when displaying the repr. For cloud-backed datasets with a large number of "small" variables, this can take a long time, because O(100) variables are loaded sequentially just to build a repr.
xarray/xarray/core/formatting.py
Lines 548 to 549 in 6c8db5e
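The permalinked lines are not reproduced here. As a rough simulation of the behaviour described above, assuming the rule is "load the values if the array is in memory or its size is below a fixed cutoff", the sketch below shows why a repr of many small lazy variables costs one sequential read per variable. SlowLazyVariable, data_repr, SIZE_THRESHOLD and the 10 ms latency are all hypothetical; the in-memory branch is omitted for brevity.

```python
import time

SIZE_THRESHOLD = 100_000  # assumption: stands in for the size cutoff in formatting.py


class SlowLazyVariable:
    """Hypothetical lazy variable whose load() simulates one remote read."""

    def __init__(self, size: int, latency_s: float):
        self.size = size
        self.latency_s = latency_s
        self.loads = 0

    def load(self) -> list:
        time.sleep(self.latency_s)  # simulate a round trip to cloud storage
        self.loads += 1
        return [0.0] * self.size


def data_repr(var: SlowLazyVariable) -> str:
    # Small arrays are loaded to show their values; large ones get a placeholder.
    if var.size < SIZE_THRESHOLD:
        values = var.load()
        return f"[{len(values)} loaded values]"
    return f"[{var.size} values, not loaded]"


variables = [SlowLazyVariable(size=1_000, latency_s=0.01) for _ in range(48)]
start = time.perf_counter()
reprs = [data_repr(v) for v in variables]  # built one after another, as in a Dataset repr
elapsed = time.perf_counter() - start
print(reprs[0])                         # [1000 loaded values]
print(sum(v.loads for v in variables))  # 48
print(f"{elapsed:.2f} s")               # at least 48 * 0.01 s, since the sleeps add up
```

With real cloud latency in place of the simulated 10 ms, the same loop adds up to seconds rather than milliseconds.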
What did you expect to happen?
Fast reprs!
Minimal Complete Verifiable Example
This dataset has 48 "small" variables.
On 2022.03.0, this repr takes 36.4 s. If I comment out the array.size condition, it takes 6 μs.
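A back-of-the-envelope check on these timings, assuming the 36.4 s is spread evenly across the 48 sequential variable loads (an assumption; the per-variable cost is not reported directly):

```python
# Numbers from this issue; the even split across variables is an assumption.
total_s = 36.4
n_vars = 48
per_var_s = total_s / n_vars
print(f"{per_var_s:.2f} s per variable")  # 0.76 s per variable
# Extrapolated to ~100 variables (the O(100) case described above):
print(f"{per_var_s * 100:.0f} s for 100 variables")  # 76 s for 100 variables
```

Roughly three quarters of a second per variable is consistent with one or more remote round trips per load, which is why skipping the loads brings the repr down to microseconds.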
Relevant log output
No response
Anything else we need to know?
No response
Environment
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.2
distributed: None
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.3.2
pip: 22.1.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: 4.5.0