Datacube.load performance for multi-band netCDF data #756
Comments
@fre171csiro alright, this is due to chunking along the time dimension.
Datacube reads one time slice at a time. With this file's structure, reading one time slice means reading and uncompressing 75 time slices, then throwing away 74 of them, only to repeat that again for the next slice. This is a known issue that should be addressed within datacube, but doing so requires using the netCDF library instead of GDAL, since the GDAL data model also assumes raster planes, so it will also read one time slice at a time. I suggest you re-chunk your netCDF so that the chunks are small along the time dimension.

I am also concerned about this: does this file have correct geo-registration?
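The read amplification described above can be sketched with quick arithmetic (the 75-slice chunk comes from the file structure discussed here; everything else is illustrative):

```python
# Each compressed chunk in this file spans 75 time slices, but a
# datacube read requests only 1 slice at a time, so every read must
# decompress an entire chunk and discard most of it.
time_slices_per_chunk = 75
slices_needed_per_read = 1

amplification = time_slices_per_chunk / slices_needed_per_read
discarded = time_slices_per_chunk - slices_needed_per_read

print(amplification)  # 75.0x more data decompressed than needed
print(discarded)      # 74 slices thrown away per read
```

With time-chunking of 1, each read would decompress only the slice it needs.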
Related: #625
Thanks @Kirill888 for the feedback and suggestions, I will give the re-chunking a go. As for the geo-registration, I don't know and will get back to you.
Ok, as the data covers the whole of Australia, I think what has happened is that a default of GDA94 is assumed.
As there are no standard parallels defined, nor ellipsoidal parameters, it could be any geographic coordinate system.
@fre171csiro NetCDF/CF specifies the CRS through the grid_mapping mechanism, so it looks like this file is missing a CRS. Maybe the netCDF GDAL driver assumes a default when none is present.
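For reference, NetCDF/CF carries the CRS in a grid_mapping variable that each data variable references. A minimal sketch of what ncdump -h could show for a GDA94 (GRS80 ellipsoid) file; the variable and dimension names here are hypothetical:

```
float band1(time, latitude, longitude) ;
        band1:grid_mapping = "crs" ;
int crs ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:semi_major_axis = 6378137. ;
        crs:inverse_flattening = 298.257222101 ;
```

If nothing like this appears in your file's header, the reader has to guess the CRS.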
Ok, I have tried to re-chunk the netCDF file, but I am not certain that I have achieved the desired outcome, as the load time has slowed down even more.
(Please ignore the spelling :-))
@fre171csiro you also need to increase the chunk size along the lat/lon axes; this file has way too many tiny chunks in each plane.
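One way to re-chunk is nccopy from the netCDF utilities. The dimension names and chunk sizes below are placeholders, not values from this thread; check ncdump -h for your file's actual dimension names and experiment with the sizes:

```shell
# 1 slice along time, larger tiles in lat/lon (sizes are illustrative).
nccopy -c time/1,latitude/256,longitude/256 input.nc rechunked.nc
```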
@fre171csiro can you also share your product definition? I'm curious why it behaves this way.
Seems related to #673; this current issue is an example of where that "fallback" behaviour is more negative than positive.
Yep, confirmed, @fre171csiro.
Our docs are incomplete, but here is the spec for the storage section: datacube-core/datacube/model/schema/dataset-type-schema.yaml, lines 103 to 120 in 00649da
If your files are indeed all the same, you can specify this storage information in the product definition.
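A hedged sketch of what such a storage section might look like, assuming a GDA94 lat/lon grid; every value here is a placeholder to adapt to your data, and the schema file referenced above is the authoritative field list:

```yaml
# Hypothetical storage section -- all values are illustrative.
storage:
  crs: EPSG:4283          # GDA94 geographic
  resolution:
    latitude: -0.05
    longitude: 0.05
```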
Thanks for your help @Kirill888. Do you know of any upcoming tutorials/workshops that cover data preparation, product definitions, dataset definitions/preparation, indexing, and ingesting?
@Kirill888 your suggested chunking has improved the load time.
Expected behaviour
Something comparable to xarray.open_dataset('file_to_load.nc')
Actual behaviour
On the same infrastructure, a datacube.load(...) that loads the same dataset/file is significantly slower: xarray load time = ~8 ms, datacube load = ~28m.
Simple comparison
Steps to reproduce the behaviour
... Include code, command line parameters as appropriate ...
Environment information
Which datacube --version are you using?
Open Data Cube core, version 1.7
What datacube deployment/environment are you running against?
CSIRO (@woodcockr) internal deployment
netCDF metadata
gdalinfo (output is truncated as there are 366 bands)
ncdump -h