Datacube.load performance for multi-band netCDF data #756
Comments
@fre171csiro alright, this is due to chunking along the time dimension.
Datacube reads one time slice at a time. With this file's structure, reading one time slice means reading and uncompressing 75 time slices, then throwing away 74 of them, only to repeat that again for the next slice. This is a known issue that should be addressed within datacube, but doing so requires using the netCDF library instead of GDAL, since the GDAL data model also assumes raster planes, so it will also read one time slice at a time. I suggest you re-chunk your netCDF so that the chunks are small along the time dimension.

I am also concerned about this: does this file have correct geo-registration?
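The read amplification described above can be sketched with quick arithmetic (the 75-slice chunk comes from the file structure discussed here; everything else is illustrative):

```python
# Each compressed chunk in this file spans 75 time slices, but a
# datacube read requests only 1 slice at a time, so every read must
# decompress an entire chunk and discard most of it.
time_slices_per_chunk = 75
slices_needed_per_read = 1

amplification = time_slices_per_chunk / slices_needed_per_read
discarded = time_slices_per_chunk - slices_needed_per_read

print(amplification)  # 75.0x more data decompressed than needed
print(discarded)      # 74 slices thrown away per read
```

With time-chunking of 1, each read would decompress only the slice it needs.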
Related: #625
Thanks @Kirill888 for the feedback and suggestions, I will give the re-chunking a go. As for the geo-registration, I don't know and will get back to you.
Ok, as the data covers the whole of Australia, I think what has happened is that a default of GDA94 is assumed.
As there are no standard parallels defined, nor ellipsoidal parameters, it could be any geographic coordinate system.
@fre171csiro NetCDF/CF specifies the CRS through the grid_mapping mechanism, so it looks like this file is missing a CRS. Maybe the netCDF GDAL driver assumes a default when none is present.
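For reference, NetCDF/CF carries the CRS in a grid_mapping variable that each data variable references. A minimal sketch of what ncdump -h could show for a GDA94 (GRS80 ellipsoid) file; the variable and dimension names here are hypothetical:

```
float band1(time, latitude, longitude) ;
        band1:grid_mapping = "crs" ;
int crs ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:semi_major_axis = 6378137. ;
        crs:inverse_flattening = 298.257222101 ;
```

If nothing like this appears in your file's header, the reader has to guess the CRS.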
Ok, I have tried to re-chunk the netCDF file, but I am not certain that I have achieved the desired outcome, as the load time has slowed down even more.
(Please ignore the spelling :-))
@fre171csiro you also need to increase the chunk size along the lat/lon axes; this file has way too many tiny chunks in each plane.
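One way to re-chunk is nccopy from the netCDF utilities. The dimension names and chunk sizes below are placeholders, not values from this thread; check ncdump -h for your file's actual dimension names and experiment with the sizes:

```shell
# 1 slice along time, larger tiles in lat/lon (sizes are illustrative).
nccopy -c time/1,latitude/256,longitude/256 input.nc rechunked.nc
```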
@fre171csiro can you also share your product definition? I'm curious why it behaves this way.
Seems related to #673; this current issue is an example of where that "fallback" behaviour is more negative than positive.
Yep, confirmed, @fre171csiro.
Our docs are incomplete, but here is the spec for the storage section: datacube-core/datacube/model/schema/dataset-type-schema.yaml, lines 103 to 120 in 00649da
If your files are indeed all the same, you can specify this storage information in the product definition.
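A hedged sketch of what such a storage section might look like, assuming a GDA94 lat/lon grid; every value here is a placeholder to adapt to your data, and the schema file referenced above is the authoritative field list:

```yaml
# Hypothetical storage section -- all values are illustrative.
storage:
  crs: EPSG:4283          # GDA94 geographic
  resolution:
    latitude: -0.05
    longitude: 0.05
```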
Thanks for your help @Kirill888. Do you know of any upcoming tutorials/workshops that cover data preparation, product definitions, dataset definitions/preparation, indexing, and ingesting?
@Kirill888 your suggested chunking has improved the load time.
Expected behaviour
Something comparable to xarray.open_dataset('file_to_load.nc')
Actual behaviour
On the same infrastructure, a datacube.load(...) that loads the same dataset/file is significantly slower: xarray load time = ~8 ms, datacube load = ~28m.
Simple comparison
Steps to reproduce the behaviour
... Include code, command line parameters as appropriate ...
Environment information
Which datacube --version are you using?
Open Data Cube core, version 1.7
What datacube deployment/environment are you running against?
CSIRO (@woodcockr) internal deployment
netCDF metadata
gdalinfo (output is truncated as there are 366 bands)
ncdump -h