Datacube.load performance for multi band netCDF data #756

Closed
fre171csiro opened this issue Jul 3, 2019 · 23 comments

Comments

@fre171csiro

fre171csiro commented Jul 3, 2019

Expected behaviour

Something comparable to xarray.open_dataset('file_to_load.nc')

Actual behaviour

On the same infrastructure, datacube.load(...) on the same dataset/file is significantly slower: xarray load time = ~8 ms, datacube load time = ~28 min.

Simple comparison

image

Steps to reproduce the behaviour

... Include code, command line parameters as appropriate ...

Environment information

  • Which datacube --version are you using?
    Open Data Cube core, version 1.7

  • What datacube deployment/environment are you running against?
    CSIRO (@woodcockr) internal deployment

netCDF metadata

gdalinfo (output is truncated as there are 366 bands)

!gdalinfo /data/qtot/qtot_avg_1912.nc
Warning 1: No UNIDATA NC_GLOBAL:Conventions attribute
Driver: netCDF/Network Common Data Format
Files: /data/qtot/qtot_avg_1912.nc
Size is 841, 681
Coordinate System is `'
Origin = (111.974999999999994,-9.975000000000000)
Pixel Size = (0.050000000000000,-0.050000000000000)
Metadata:
  latitude#long_name=latitude
  latitude#name=latitude
  latitude#standard_name=latitude
  latitude#units=degrees_north
  longitude#long_name=longitude
  longitude#name=longitude
  longitude#standard_name=longitude
  longitude#units=degrees_east
  NC_GLOBAL#var_name=qtot_avg
  NETCDF_DIM_EXTRA={time}
  NETCDF_DIM_time_DEF={366,4}
  NETCDF_DIM_time_VALUES={4382,4383,4384,4385,4386,4387,4388,4389,4390,4391,4392,4393,4394,4395,4396,4397,4398,4399,4400,4401,4402,4403,4404,4405,4406,4407,4408,4409,4410,4411,4412,4413,4414,4415,4416,4417,4418,4419,4420,4421,4422,4423,4424,4425,4426,4427,4428,4429,4430,4431,4432,4433,4434,4435,4436,4437,4438,4439,4440,4441,4442,4443,4444,4445,4446,4447,4448,4449,4450,4451,4452,4453,4454,4455,4456,4457,4458,4459,4460,4461,4462,4463,4464,4465,4466,4467,4468,4469,4470,4471,4472,4473,4474,4475,4476,4477,4478,4479,4480,4481,4482,4483,4484,4485,4486,4487,4488,4489,4490,4491,4492,4493,4494,4495,4496,4497,4498,4499,4500,4501,4502,4503,4504,4505,4506,4507,4508,4509,4510,4511,4512,4513,4514,4515,4516,4517,4518,4519,4520,4521,4522,4523,4524,4525,4526,4527,4528,4529,4530,4531,4532,4533,4534,4535,4536,4537,4538,4539,4540,4541,4542,4543,4544,4545,4546,4547,4548,4549,4550,4551,4552,4553,4554,4555,4556,4557,4558,4559,4560,4561,4562,4563,4564,4565,4566,4567,4568,4569,4570,4571,4572,4573,4574,4575,4576,4577,4578,4579,4580,4581,4582,4583,4584,4585,4586,4587,4588,4589,4590,4591,4592,4593,4594,4595,4596,4597,4598,4599,4600,4601,4602,4603,4604,4605,4606,4607,4608,4609,4610,4611,4612,4613,4614,4615,4616,4617,4618,4619,4620,4621,4622,4623,4624,4625,4626,4627,4628,4629,4630,4631,4632,4633,4634,4635,4636,4637,4638,4639,4640,4641,4642,4643,4644,4645,4646,4647,4648,4649,4650,4651,4652,4653,4654,4655,4656,4657,4658,4659,4660,4661,4662,4663,4664,4665,4666,4667,4668,4669,4670,4671,4672,4673,4674,4675,4676,4677,4678,4679,4680,4681,4682,4683,4684,4685,4686,4687,4688,4689,4690,4691,4692,4693,4694,4695,4696,4697,4698,4699,4700,4701,4702,4703,4704,4705,4706,4707,4708,4709,4710,4711,4712,4713,4714,4715,4716,4717,4718,4719,4720,4721,4722,4723,4724,4725,4726,4727,4728,4729,4730,4731,4732,4733,4734,4735,4736,4737,4738,4739,4740,4741,4742,4743,4744,4745,4746,4747}
  qtot_avg#long_name=Total runoff: averaged across both HRUs (mm)
  qtot_avg#name=qtot_avg
  qtot_avg#standard_name=qtot_avg
  qtot_avg#units=mm
  qtot_avg#_FillValue=-999
  time#calendar=gregorian
  time#long_name=time
  time#name=time
  time#standard_name=time
  time#units=days since 1900-01-01
Corner Coordinates:
Upper Left  ( 111.9750000,  -9.9750000) 
Lower Left  ( 111.9750000, -44.0250000) 
Upper Right ( 154.0250000,  -9.9750000) 
Lower Right ( 154.0250000, -44.0250000) 
Center      ( 133.0000000, -27.0000000) 
Band 1 Block=50x1 Type=Float32, ColorInterp=Undefined
  NoData Value=-999
  Unit Type: mm
  Metadata:
    long_name=Total runoff: averaged across both HRUs (mm)
    name=qtot_avg
    NETCDF_DIM_time=4382
    NETCDF_VARNAME=qtot_avg
    standard_name=qtot_avg
    units=mm
    _FillValue=-999

ncdump -h

netcdf qtot_avg_1912 {
dimensions:
	time = UNLIMITED ; // (366 currently)
	latitude = 681 ;
	longitude = 841 ;
variables:
	int time(time) ;
		time:name = "time" ;
		time:long_name = "time" ;
		time:calendar = "gregorian" ;
		time:units = "days since 1900-01-01" ;
		time:standard_name = "time" ;
	double latitude(latitude) ;
		latitude:name = "latitude" ;
		latitude:long_name = "latitude" ;
		latitude:units = "degrees_north" ;
		latitude:standard_name = "latitude" ;
	double longitude(longitude) ;
		longitude:name = "longitude" ;
		longitude:long_name = "longitude" ;
		longitude:units = "degrees_east" ;
		longitude:standard_name = "longitude" ;
	float qtot_avg(time, latitude, longitude) ;
		qtot_avg:_FillValue = -999.f ;
		qtot_avg:name = "qtot_avg" ;
		qtot_avg:long_name = "Total runoff: averaged across both HRUs (mm)" ;
		qtot_avg:units = "mm" ;
		qtot_avg:standard_name = "qtot_avg" ;

// global attributes:
		:var_name = "qtot_avg" ;
}
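As a quick sanity check on the time axis, the integer values listed under NETCDF_DIM_time_VALUES can be decoded by hand from the `time:units = "days since 1900-01-01"` attribute. The sketch below uses only the standard library; the helper name `decode_days` is made up for illustration and is not part of any netCDF tooling.

```python
from datetime import date, timedelta

# Epoch taken from the file's `time:units = "days since 1900-01-01"` attribute.
EPOCH = date(1900, 1, 1)


def decode_days(days_since_epoch: int) -> date:
    """Convert an integer day offset into a calendar date."""
    return EPOCH + timedelta(days=days_since_epoch)


first = decode_days(4382)  # first entry of NETCDF_DIM_time_VALUES
last = decode_days(4747)   # last entry of NETCDF_DIM_time_VALUES

# Prints the first day, the last day, and the number of daily steps in between.
print(first, last, (last - first).days + 1)
```

The two end points decode to 1912-01-01 and 1912-12-31, which spans the 366 daily steps reported for the time dimension.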
@Kirill888
Member

@fre171csiro can you please re-run ncdump as ncdump -hs? We are particularly interested in the chunking parameters.
example:

        short count_clear(time, y, x) ;
                count_clear:_FillValue = -1s ;
                count_clear:grid_mapping = "crs" ;
                count_clear:units = "1" ;
                count_clear:long_name = "count_clear" ;
                count_clear:coverage_content_type = "modelResult" ;
                count_clear:_Storage = "chunked" ;
                count_clear:_ChunkSizes = 1, 200, 200 ;
                count_clear:_DeflateLevel = 4 ;
                count_clear:_Shuffle = "true" ;
                count_clear:_Fletcher32 = "true" ;
                count_clear:_Endianness = "little" ;

Also, 8 ms for open_dataset is obviously too short to have read the data. Still, computing the mean should read all the data, and that takes only 8 s, far less than 30 min.

And for gdalinfo, can you please re-run with gdalinfo NETCDF:"/data/qtot/qtot_avg_1912.nc":qtot_avg

@fre171csiro
Author

fre171csiro commented Jul 3, 2019

!ncdump -hs /data/qtot/qtot_avg_1912.nc
netcdf qtot_avg_1912 {
dimensions:
	time = UNLIMITED ; // (366 currently)
	latitude = 681 ;
	longitude = 841 ;
variables:
	int time(time) ;
		time:name = "time" ;
		time:long_name = "time" ;
		time:calendar = "gregorian" ;
		time:units = "days since 1900-01-01" ;
		time:standard_name = "time" ;
		time:_Storage = "chunked" ;
		time:_ChunkSizes = 1 ;
		time:_Endianness = "little" ;
	double latitude(latitude) ;
		latitude:name = "latitude" ;
		latitude:long_name = "latitude" ;
		latitude:units = "degrees_north" ;
		latitude:standard_name = "latitude" ;
		latitude:_Storage = "contiguous" ;
		latitude:_Endianness = "little" ;
	double longitude(longitude) ;
		longitude:name = "longitude" ;
		longitude:long_name = "longitude" ;
		longitude:units = "degrees_east" ;
		longitude:standard_name = "longitude" ;
		longitude:_Storage = "contiguous" ;
		longitude:_Endianness = "little" ;
	float qtot_avg(time, latitude, longitude) ;
		qtot_avg:_FillValue = -999.f ;
		qtot_avg:name = "qtot_avg" ;
		qtot_avg:long_name = "Total runoff: averaged across both HRUs (mm)" ;
		qtot_avg:units = "mm" ;
		qtot_avg:standard_name = "qtot_avg" ;
		qtot_avg:_Storage = "chunked" ;
		qtot_avg:_ChunkSizes = 75, 1, 50 ;
		qtot_avg:_DeflateLevel = 1 ;
		qtot_avg:_Shuffle = "true" ;
		qtot_avg:_Endianness = "little" ;

// global attributes:
		:var_name = "qtot_avg" ;
		:_NCProperties = "version=2,netcdf=4.6.2,hdf5=1.10.4" ;
		:_SuperblockVersion = 2 ;
		:_IsNetcdf4 = 1 ;
		:_Format = "netCDF-4" ;

@fre171csiro
Author

fre171csiro commented Jul 3, 2019

!gdalinfo NETCDF:"/data/qtot/qtot_avg_1912.nc":qtot_avg
Warning 1: No UNIDATA NC_GLOBAL:Conventions attribute
Driver: netCDF/Network Common Data Format
Files: /data/qtot/qtot_avg_1912.nc
Size is 841, 681
Coordinate System is `'
Origin = (111.974999999999994,-9.975000000000000)
Pixel Size = (0.050000000000000,-0.050000000000000)
Metadata:
  latitude#long_name=latitude
  latitude#name=latitude
  latitude#standard_name=latitude
  latitude#units=degrees_north
  longitude#long_name=longitude
  longitude#name=longitude
  longitude#standard_name=longitude
  longitude#units=degrees_east
  NC_GLOBAL#var_name=qtot_avg
  NETCDF_DIM_EXTRA={time}
  NETCDF_DIM_time_DEF={366,4}
  NETCDF_DIM_time_VALUES={4382,4383,4384,4385,4386,4387,4388,4389,4390,4391,4392,4393,4394,4395,4396,4397,4398,4399,4400,4401,4402,4403,4404,4405,4406,4407,4408,4409,4410,4411,4412,4413,4414,4415,4416,4417,4418,4419,4420,4421,4422,4423,4424,4425,4426,4427,4428,4429,4430,4431,4432,4433,4434,4435,4436,4437,4438,4439,4440,4441,4442,4443,4444,4445,4446,4447,4448,4449,4450,4451,4452,4453,4454,4455,4456,4457,4458,4459,4460,4461,4462,4463,4464,4465,4466,4467,4468,4469,4470,4471,4472,4473,4474,4475,4476,4477,4478,4479,4480,4481,4482,4483,4484,4485,4486,4487,4488,4489,4490,4491,4492,4493,4494,4495,4496,4497,4498,4499,4500,4501,4502,4503,4504,4505,4506,4507,4508,4509,4510,4511,4512,4513,4514,4515,4516,4517,4518,4519,4520,4521,4522,4523,4524,4525,4526,4527,4528,4529,4530,4531,4532,4533,4534,4535,4536,4537,4538,4539,4540,4541,4542,4543,4544,4545,4546,4547,4548,4549,4550,4551,4552,4553,4554,4555,4556,4557,4558,4559,4560,4561,4562,4563,4564,4565,4566,4567,4568,4569,4570,4571,4572,4573,4574,4575,4576,4577,4578,4579,4580,4581,4582,4583,4584,4585,4586,4587,4588,4589,4590,4591,4592,4593,4594,4595,4596,4597,4598,4599,4600,4601,4602,4603,4604,4605,4606,4607,4608,4609,4610,4611,4612,4613,4614,4615,4616,4617,4618,4619,4620,4621,4622,4623,4624,4625,4626,4627,4628,4629,4630,4631,4632,4633,4634,4635,4636,4637,4638,4639,4640,4641,4642,4643,4644,4645,4646,4647,4648,4649,4650,4651,4652,4653,4654,4655,4656,4657,4658,4659,4660,4661,4662,4663,4664,4665,4666,4667,4668,4669,4670,4671,4672,4673,4674,4675,4676,4677,4678,4679,4680,4681,4682,4683,4684,4685,4686,4687,4688,4689,4690,4691,4692,4693,4694,4695,4696,4697,4698,4699,4700,4701,4702,4703,4704,4705,4706,4707,4708,4709,4710,4711,4712,4713,4714,4715,4716,4717,4718,4719,4720,4721,4722,4723,4724,4725,4726,4727,4728,4729,4730,4731,4732,4733,4734,4735,4736,4737,4738,4739,4740,4741,4742,4743,4744,4745,4746,4747}
  qtot_avg#long_name=Total runoff: averaged across both HRUs (mm)
  qtot_avg#name=qtot_avg
  qtot_avg#standard_name=qtot_avg
  qtot_avg#units=mm
  qtot_avg#_FillValue=-999
  time#calendar=gregorian
  time#long_name=time
  time#name=time
  time#standard_name=time
  time#units=days since 1900-01-01
Corner Coordinates:
Upper Left  ( 111.9750000,  -9.9750000) 
Lower Left  ( 111.9750000, -44.0250000) 
Upper Right ( 154.0250000,  -9.9750000) 
Lower Right ( 154.0250000, -44.0250000) 
Center      ( 133.0000000, -27.0000000) 
Band 1 Block=50x1 Type=Float32, ColorInterp=Undefined
  NoData Value=-999
  Unit Type: mm
  Metadata:
    long_name=Total runoff: averaged across both HRUs (mm)
    name=qtot_avg
    NETCDF_DIM_time=4382
    NETCDF_VARNAME=qtot_avg
    standard_name=qtot_avg
    units=mm
    _FillValue=-999

@Kirill888
Member

@fre171csiro alright, this is due to chunking along the time dimension.

float qtot_avg(time, latitude, longitude) ;
.....
		qtot_avg:_Storage = "chunked" ;
		qtot_avg:_ChunkSizes = 75, 1, 50 ;
		qtot_avg:_DeflateLevel = 1 ;
		qtot_avg:_Shuffle = "true" ;
		qtot_avg:_Endianness = "little" ;

Datacube reads one time slice at a time. With this file structure, reading one time slice means reading and decompressing 75 time slices, throwing away 74 of them, and then repeating that for the next slice. This is a known issue and should be addressed within datacube, but that requires using the netCDF library instead of GDAL: the GDAL data model also assumes raster planes, so it too reads one time slice at a time.

I suggest you re-chunk your netCDF file so that the chunk size is 1 along the time dimension and 512x512 or similar along lat/lon. I believe nccopy might be able to do that.
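The cost of reading a single time slice can be estimated from the array shape and the chunk sizes alone. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes every chunk that intersects the requested plane is decompressed in full and that nothing is cached between slices. The function name `slice_read_cost` is illustrative only.

```python
from math import ceil


def slice_read_cost(shape, chunks):
    """Return (chunks_touched, values_decompressed, amplification) for reading
    one full (latitude, longitude) plane at a single time step.

    Assumes each touched chunk is decompressed whole and nothing is cached.
    """
    _, n_lat, n_lon = shape
    c_time, c_lat, c_lon = chunks
    chunks_touched = ceil(n_lat / c_lat) * ceil(n_lon / c_lon)
    values_decompressed = chunks_touched * c_time * c_lat * c_lon
    amplification = values_decompressed / (n_lat * n_lon)
    return chunks_touched, values_decompressed, amplification


SHAPE = (366, 681, 841)  # (time, latitude, longitude) from the ncdump output

for label, chunks in [("original", (75, 1, 50)), ("suggested", (1, 512, 512))]:
    touched, values, amp = slice_read_cost(SHAPE, chunks)
    print(f"{label:9s} chunks={chunks}: {touched} chunks touched, "
          f"{values} values decompressed, {amp:.1f}x the plane size")
```

Under these assumptions the original layout touches 11577 chunks and decompresses roughly 76 times more values than one plane holds, while a 1 x 512 x 512 layout touches 4 chunks per slice.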

I am also concerned about this:

Coordinate System is `'

does this file have correct geo-registration?

@Kirill888
Member

Related #625

@fre171csiro
Author

Thanks @Kirill888 for the feedback and suggestions. I will give the re-chunking a go. As for the geo-registration, I don't know and will get back to you.

@fre171csiro
Author

fre171csiro commented Jul 3, 2019

OK, as the data covers the whole of Australia, I think what has happened is that a default of GDA94 is assumed:
https://www.spatialreference.org/ref/epsg/4283/

@sixy6e
Contributor

sixy6e commented Jul 3, 2019

As there are no standard parallels defined, nor ellipsoidal parameters, it could be any geographical coordinate system.

@Kirill888
Member

Kirill888 commented Jul 3, 2019

@fre171csiro rasterio is the library we use to read the file; depending on how it was installed, it might use its own bundled version of GDAL. Can you try rio info NETCDF:"/data/qtot/qtot_avg_1912.nc":qtot_avg? It would also be helpful to know the output of

rio --version, rio --gdal-version and gdal-config --version

@Kirill888
Member

NetCDF/CF crs is specified through this mechanism:

http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#appendix-grid-mappings

So it looks like this file is missing a CRS. Maybe the netCDF GDAL driver assumes EPSG:4326 if the axes are latitude/longitude.
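For orientation, the CF mechanism amounts to a small set of attributes: a grid mapping variable that carries the CRS parameters, plus a `grid_mapping` attribute on the data variable that names it (as in the count_clear example earlier in the thread). The sketch below only renders those attributes as text in ncdump style and writes no file. The WGS 84 ellipsoid values are an assumption chosen for illustration, since the thread never establishes which datum this file uses, and `as_cdl` is a made-up helper.

```python
# Assumed values: WGS 84 ellipsoid parameters, used here only for illustration.
# The actual datum of qtot_avg_1912.nc is not established in this thread.
GRID_MAPPING_VAR = "crs"  # variable name borrowed from the count_clear example
grid_mapping_attrs = {
    "grid_mapping_name": "latitude_longitude",
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
    "longitude_of_prime_meridian": 0.0,
}


def as_cdl(var, attrs):
    """Render attributes in the `var:name = value ;` style printed by ncdump."""
    lines = []
    for name, value in attrs.items():
        rendered = f'"{value}"' if isinstance(value, str) else repr(value)
        lines.append(f"{var}:{name} = {rendered} ;")
    return lines


# Attributes of the grid mapping variable, then the link from the data variable.
cdl = as_cdl(GRID_MAPPING_VAR, grid_mapping_attrs)
cdl += as_cdl("qtot_avg", {"grid_mapping": GRID_MAPPING_VAR})
print("\n".join(cdl))
```

None of these attributes appear in the ncdump output above, which is consistent with the empty coordinate system reported by gdalinfo.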

@fre171csiro
Author

!rio --version = 1.0.22
!rio --gdal-version = 2.4.0
!gdal-config --version = 2.4.0

@fre171csiro
Author

So for rio info NETCDF:"/data/qtot/qtot_avg_1912.nc":qtot_avg we get the following where "crs": null

WARNING:rasterio._env:CPLE_AppDefined in No UNIDATA NC_GLOBAL:Conventions attribute {"blockxsize": 50, "blockysize": 1, "bounds": [111.975, -44.025000000000006, 154.025, -9.975], "colorinterp": ["undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", 
"undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", 
"undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined", "undefined"], "count": 366, "crs": null, "descriptions": [null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null], "driver": "netCDF", "dtype": "float32", "height": 681, "indexes": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 
165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366], "mask_flags": [["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], 
["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], 
["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"], ["nodata"]], "nodata": -999.0, "res": [0.05, 0.05], "shape": [681, 841], "tiled": true, "transform": [0.05, 0.0, 111.975, 0.0, -0.05, -9.975, 0.0, 0.0, 1.0], "units": ["mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", 
"mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm", "mm"], "width": 841}

@fre171csiro
Author

OK, I have tried to rechunk the netCDF file, but I am not certain that I have achieved the desired outcome, as the load time has slowed down even more.

nccopy -d5 -w -c time/1,lat/512,lon/512 qtot_avg_1912.nc qtot_avg_1912_chucked.nc

produced

dimensions:
	time = UNLIMITED ; // (366 currently)
	latitude = 681 ;
	longitude = 841 ;
variables:
	int time(time) ;
		time:name = "time" ;
		time:long_name = "time" ;
		time:calendar = "gregorian" ;
		time:units = "days since 1900-01-01" ;
		time:standard_name = "time" ;
		time:_Storage = "chunked" ;
		time:_ChunkSizes = 1 ;
		time:_DeflateLevel = 5 ;
		time:_Endianness = "little" ;
		time:_NoFill = "true" ;
	double latitude(latitude) ;
		latitude:name = "latitude" ;
		latitude:long_name = "latitude" ;
		latitude:units = "degrees_north" ;
		latitude:standard_name = "latitude" ;
		latitude:_Storage = "chunked" ;
		latitude:_ChunkSizes = 681 ;
		latitude:_DeflateLevel = 5 ;
		latitude:_Endianness = "little" ;
		latitude:_NoFill = "true" ;
	double longitude(longitude) ;
		longitude:name = "longitude" ;
		longitude:long_name = "longitude" ;
		longitude:units = "degrees_east" ;
		longitude:standard_name = "longitude" ;
		longitude:_Storage = "chunked" ;
		longitude:_ChunkSizes = 841 ;
		longitude:_DeflateLevel = 5 ;
		longitude:_Endianness = "little" ;
		longitude:_NoFill = "true" ;
	float qtot_avg(time, latitude, longitude) ;
		qtot_avg:_FillValue = -999.f ;
		qtot_avg:name = "qtot_avg" ;
		qtot_avg:long_name = "Total runoff: averaged across both HRUs (mm)" ;
		qtot_avg:units = "mm" ;
		qtot_avg:standard_name = "qtot_avg" ;
		qtot_avg:_Storage = "chunked" ;
		qtot_avg:_ChunkSizes = 1, 1, 50 ;
		qtot_avg:_DeflateLevel = 5 ;
		qtot_avg:_Endianness = "little" ;
		qtot_avg:_NoFill = "true" ;

// global attributes:
		:var_name = "qtot_avg" ;
		:_NCProperties = "version=1|netcdflibversion=4.6.0|hdf5libversion=1.10.0" ;
		:_SuperblockVersion = 0 ;
		:_IsNetcdf4 = 1 ;
		:_Format = "netCDF-4" ;

please ignore the spelling :-)

fre171csiro reopened this Jul 4, 2019
@Kirill888
Member

@fre171csiro you also need to increase the chunk size along the lat/lon axes; this file has far too many tiny chunks in each plane.

@Kirill888
Member

You want _ChunkSizes = 1, 512, 512
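To put a number on "too many tiny chunks", one can count how many chunks are needed to tile the whole variable. This is a minimal sketch using only the shape and chunk sizes from the ncdump listings; it does not model the per-chunk lookup and decompression overhead, which is assumed to be what dominates when chunks hold only 50 values each.

```python
from math import ceil, prod


def total_chunks(shape, chunks):
    """Number of chunks needed to tile an array of `shape` with `chunks`."""
    return prod(ceil(n / c) for n, c in zip(shape, chunks))


SHAPE = (366, 681, 841)  # (time, latitude, longitude)

tiny = total_chunks(SHAPE, (1, 1, 50))       # layout produced by the first nccopy run
large = total_chunks(SHAPE, (1, 512, 512))   # layout suggested in this thread

print(f"1 x 1 x 50 chunks:     {tiny}")
print(f"1 x 512 x 512 chunks:  {large}")
```

The first layout needs 4237182 chunks for the variable and the second needs 1464, so each time slice goes from 11577 chunk reads to 4.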

@Kirill888
Member

You need latitude and longitude instead of lat and lon in your conversion command.
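One way to avoid this kind of mismatch is to build the chunk specification from the dimension names the file actually declares. The sketch below is a hypothetical helper, not part of nccopy or datacube: it reuses the `-d5 -w -c` flags from the command above, checks the names against the dimensions shown by ncdump, and only prints the resulting command line. The output file name is also just an example.

```python
import shlex

# Dimension names as declared in the ncdump output for this file.
FILE_DIMS = ("time", "latitude", "longitude")


def nccopy_command(src, dst, chunking, deflate=5):
    """Build an nccopy argument list, rejecting dimension names not in the file."""
    unknown = [name for name in chunking if name not in FILE_DIMS]
    if unknown:
        raise ValueError(f"dimension(s) not in file: {unknown}")
    chunkspec = ",".join(f"{name}/{size}" for name, size in chunking.items())
    return ["nccopy", f"-d{deflate}", "-w", "-c", chunkspec, src, dst]


cmd = nccopy_command(
    "qtot_avg_1912.nc",
    "qtot_avg_1912_chunked.nc",  # example output name
    {"time": 1, "latitude": 512, "longitude": 512},
)
print(shlex.join(cmd))
```

Passing `{"time": 1, "lat": 512, "lon": 512}` to this helper raises a ValueError instead of producing a command whose chunk sizes do not match any dimension.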

@Kirill888
Member

@fre171csiro can you also share your product definition, I'm curious why .load works even though files have no CRS and a sample dataset definition would be helpful.

@Kirill888
Member

Seems related #673.

this current issue is an example of where this "fallback" behaviour is more negative than positive.

@fre171csiro
Author

fre171csiro commented Jul 4, 2019

@fre171csiro can you also share your product definition, I'm curious why .load works even though files have no CRS and a sample dataset definition would be helpful.

name: AWRA_Flow_Total
description: Flow_Total

# If unsure use eo
# platform/instrument are optional, copy from source product
# or omit if combining products from multiple platforms
metadata_type: eo
metadata:
  product_type: awra_qtot_avg
  format:
    name: NetCDF
  platform:
    code: Srn
  instrument:
    name: qtot_avg

storage:
  crs: EPSG:4326
  resolution:
    longitude: 0.05
    latitude: -0.05

measurements:
  # Repeat for all variables
  - name: Flow_total
    dtype: float32
    nodata: -999.
    units: '1'

@Kirill888
Member

Yep, confirmed. @fre171csiro the storage section is intended for "ingested" products, which generate storage files that match the storage spec exactly. For external data it is only valid if the geo-referencing is identical across all files. Also note that resolution and CRS are not sufficient to fully describe the pixel grid: one also needs to know where the pixel boundaries lie. If that is not specified, datacube assumes that a pixel edge coincides with x=0, y=0.
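The alignment point can be checked with a little arithmetic on the origin and pixel size reported by gdalinfo. The sketch below uses `Decimal` to avoid floating-point noise and measures how far the file's first pixel edge sits from a grid whose edges fall on integer multiples of the resolution, which is the default anchoring described above. The function name is illustrative.

```python
from decimal import Decimal


def offset_from_zero_anchored_grid(origin, resolution):
    """Distance from `origin` back to the nearest lower pixel edge of a grid
    whose edges fall on integer multiples of `resolution`."""
    return Decimal(origin) % Decimal(resolution)


# Origin and pixel size as reported by gdalinfo (longitude axis).
lon_offset = offset_from_zero_anchored_grid("111.975", "0.05")
# Latitude axis: magnitudes are used here; the sign only encodes orientation.
lat_offset = offset_from_zero_anchored_grid("9.975", "0.05")

print(lon_offset, lat_offset)
print("fraction of a pixel:", lon_offset / Decimal("0.05"))
```

Both offsets come out as 0.025, half a pixel, so a grid anchored at x=0, y=0 does not coincide with the file's own pixel edges. That appears consistent with the coordinates in the final load output further down, whose pixel centres differ from the file's.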

@Kirill888
Member

Our docs are incomplete, but here is the spec for the storage section:

storage:
  type: object
  properties:
    chunking:
      type: object
    crs:
      type: string
    dimension_order:
      type: array
    resolution:
      type: object
    tile_size:
      type: object
    origin:
      type: object
    driver:
      type: string
  additionalProperties: false

If your files are indeed all the same, you can specify origin, but the best thing is to fix the geo-referencing in the files themselves, to be sure your data is returned with correct coordinates.

@fre171csiro
Author

Thanks for your help @Kirill888. Do you know if there are any upcoming tutorials or workshops that help with data prep, product definitions, dataset definitions/prep, indexing and ingesting?

@fre171csiro
Author

@Kirill888 your suggested chunking has improved load time

%%time
import datacube
with datacube.Datacube(env='awra') as dc:
    %time data = dc.load(product='AWRA_Flow_Total', time=('1912-01-01', '1912-12-31'), resolution=(-0.05,0.05))#, latitude=(-85.53), longitude=(56.77))#, time=('1910-01-01'))
data
CPU times: user 53.1 s, sys: 26.7 s, total: 1min 19s
Wall time: 2min 54s
CPU times: user 54.4 s, sys: 27.3 s, total: 1min 21s
Wall time: 2min 55s
<xarray.Dataset>
Dimensions:     (latitude: 700, longitude: 900, time: 366)
Coordinates:
  * time        (time) datetime64[ns] 1912-01-01 1912-01-02 ... 1912-12-31
  * latitude    (latitude) float64 -10.03 -10.08 -10.12 ... -44.88 -44.92 -44.98
  * longitude   (longitude) float64 110.0 110.1 110.1 ... 154.9 154.9 155.0
Data variables:
    Flow_total  (time, latitude, longitude) float32 -999.0 -999.0 ... -999.0
Attributes:
    crs:      EPSG:4326


3 participants