Code Sample, a copy-pastable example if possible
```python
In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: radmax_ds
Out[3]:
<xarray.Dataset>
Dimensions:   (latitude: 5650, longitude: 12050, time: 3)
Coordinates:
  * latitude   (latitude) float32 13.505002 13.515002 13.525002 13.535002 ...
  * longitude  (longitude) float32 -170.495 -170.485 -170.475 -170.465 ...
  * time       (time) datetime64[ns] 2017-03-07T01:00:00 2017-03-07T02:00:00 ...
Data variables:
    RadarMax  (time, latitude, longitude) float32 ...
Attributes:
    start_date:   03/07/2017 01:00
    end_date:     03/07/2017 01:55
    elapsed:      60
    data_rights:  Respond (TM) Confidential Data. (c) Insurance Services Offi...

In [4]: %timeit foo = radmax_ds.RadarMax.load()
The slowest run took 35509.20 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 216 µs per loop

In [5]: 216 * 35509.2
Out[5]: 7669987.199999999
```
So, without any slicing, it takes approximately 7.5 seconds (216 µs × 35509.2 ≈ 7.67 × 10⁶ µs) to load this complete file into memory. Now let's see what happens when I slice the DataArray and then load it:
```python
In [1]: import xarray as xr

In [2]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [3]: %timeit foo = radmax_ds.RadarMax[::1, ::1, ::1].load()
1 loop, best of 3: 7.56 s per loop

In [4]: radmax_ds.close()

In [5]: radmax_ds = xr.open_dataset('tests/radmax_baseline.nc')

In [6]: %timeit foo = radmax_ds.RadarMax[::1, ::10, ::10].load()
```
I killed this last session after 17 minutes. `top` did not report any unusual I/O wait, and memory usage was not out of control. I am using xarray v0.10.2. My suspicion is that something in the indexing system is causing xarray to read the data in a bad order. Notice that if I slice all the data (`[::1, ::1, ::1]`), the timing works out the same as reading it all in directly. Not shown here is a run where slicing every 100 latitudes and 100 longitudes is faster again, but still not as fast as reading the whole array in at once.
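As a stopgap (without assuming anything about the underlying cause), loading the variable in full first and slicing the resulting in-memory array sidesteps the backend's strided reads entirely. A minimal NumPy sketch of why the in-memory slice is cheap, using a small hypothetical stand-in for the `(time, latitude, longitude)` array:

```python
import numpy as np

# Hypothetical small stand-in for the RadarMax array; the shape is
# chosen only for illustration, not taken from the real file.
data = np.arange(3 * 50 * 60, dtype=np.float32).reshape(3, 50, 60).copy()

# Once the data is in memory, a strided slice is just a view: no copy,
# and no per-element reads from the file.
sub = data[::1, ::10, ::10]
assert sub.shape == (3, 5, 6)
assert sub.base is data  # a view onto the original buffer, not a copy
```

So `radmax_ds.RadarMax.load()[::1, ::10, ::10]` should cost roughly the 7.5 s full read plus a negligible slice, rather than 17+ minutes.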
Let me know if you want a copy of the file. It is a compressed netCDF4 file, taking up only 1.7 MB.
I wonder if this is related to #1985?