dc.load returning almost no scenes for a large polygon but many for a small polygon in the same area #276

Closed
BexDunn opened this issue Aug 4, 2017 · 3 comments

@BexDunn

commented Aug 4, 2017

Expected behaviour

I expect dc.load to retrieve a similar number of scenes over a given area, whether the query uses a big polygon or a small polygon inside the same area.

Actual behaviour

When I plot the scenes retrieved for the big polygon, I get a triangular region with only about 2 observations in it, whereas the small polygon returns 300-600 scenes for that same area (over a 30-year epoch, ls5, ls7 and ls8).
[image: plot of the scenes retrieved]

Steps to reproduce the behaviour

https://github.com/rjdunn/GWnotebooks/blob/master/Dask_wetness_anomaly_debugger_040817-forDC.ipynb

The code above should run with either of the queries below. You may wish to change the paths for the output files.

Queries used:

Big polygon

 'geopolygon': Geometry(POLYGON ((-83105.2314368732 -1451602.90924564,-33391.0599240792 -1455644.18659597,-37415.0420370399 -1505610.83402089,-87117.7910054179 -1501569.35056345,-83105.2314368732 -1451602.90924564)), PROJCS["GDA94_Australian_Albers",GEOGCS["GCS_GDA_1994",DATUM["Geocentric_Datum_of_Australia_1994",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["standard_parallel_1",-18],PARAMETER["standard_parallel_2",-36],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",132],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["Meter",1]]),
 'time': ('1987-01-01', '2016-12-31')}

Little polygon

 'geopolygon': Geometry(POLYGON ((131.554434234076 -13.8009653022795,131.547892015813 -13.8790036027955,131.496441846499 -13.8746177747694,131.503013176119 -13.7965809082819,131.554434234076 -13.8009653022795)), GEOGCS["GCS_WGS_1984",DATUM["WGS_1984",SPHEROID["WGS_84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]),
 'time': ('1987-01-01', '2016-12-31')}

Environment information

  • Which datacube --version are you using?
    1.3.2
  • What datacube deployment/environment are you running against?
    Behaviour is the same on raijin and on VDI
@Kirill888

Contributor

commented Aug 8, 2017

After some analysis, an error was identified in Datacube.load when the dask_chunks parameter is used: the custom fuser function and skip_broken_datasets do not flow through to .

The missing fuse_func means the default fuser is used, which is completely broken for PQ masks. This in turn leaves no data in the south part of the tile after masking is applied.

[image: broken_fuser_sample]

The bottom image should look like the top-right one; instead it is an exact copy of the top-left one.
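
To make the failure mode above concrete, here is a minimal pure-Python sketch of how a nodata-based fuser can leave the first dataset untouched when it is applied to bit-flag masks. Everything in it is an assumption for illustration: the flag layout (`VALID_BIT`, `EDGE`, `GOOD`), the nodata value, and both fuser functions are hypothetical stand-ins, not the datacube implementation or the fuser used in the notebook.

```python
# Hypothetical flag layout, chosen only for this illustration.
VALID_BIT = 1 << 8          # assumed: set when the pixel holds a usable observation
NODATA = 0                  # assumed declared nodata value of the mask band
EDGE = 0b1                  # assumed: non-zero flags, but VALID_BIT not set
GOOD = VALID_BIT | 0xFF     # assumed: a fully valid pixel


def default_fuser(dest, src, nodata=NODATA):
    """Nodata-based fusing: copy src into dest only where dest == nodata."""
    for i, value in enumerate(dest):
        if value == nodata:
            dest[i] = src[i]


def pq_aware_fuser(dest, src):
    """Bit-aware fusing: take src where dest is not valid, AND flags where both are."""
    for i, (d, s) in enumerate(zip(dest, src)):
        if not d & VALID_BIT:
            dest[i] = s
        elif s & VALID_BIT:
            dest[i] = d & s


# Two scenes from the same day: one valid in the north, one valid in the south.
north = [GOOD, GOOD, EDGE, EDGE]
south = [EDGE, EDGE, GOOD, GOOD]

fused_default = list(north)
default_fuser(fused_default, south)   # no pixel equals NODATA, so nothing is copied

fused_pq = list(north)
pq_aware_fuser(fused_pq, south)       # southern pixels are filled from the second scene

print(fused_default == north)                     # first scene wins everywhere
print(all(v & VALID_BIT for v in fused_pq))       # every pixel is valid after fusing
```

Under these assumptions the nodata-based result is an exact copy of the first scene, so masking on the valid bit afterwards discards the southern pixels, which matches the symptom described above.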

@Kirill888

Contributor

commented Aug 8, 2017

The error was in datacube.api.core.py::fuse_lazy and was partially fixed by this commit:

639774b#diff-73367154868e05ad209c8e4c81a320b8

The fix explicitly names the fuse_func parameter.

However, the call chain load->make_dask_array->fuse_lazy->_fuse_measurement still drops the skip_broken_datasets=<False|True> parameter.
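
As a rough illustration of this class of bug, the sketch below shows one way an option can be lost when it is forwarded positionally, and how naming it avoids that. The function names and signatures here are hypothetical stand-ins written for this example; they are not the actual datacube signatures.

```python
def fuse_measurement(dest, datasets, skip_broken_datasets=False, fuse_func=None):
    """Hypothetical innermost call: report which options actually arrived."""
    return {"skip_broken_datasets": skip_broken_datasets, "fuse_func": fuse_func}


def fuse_lazy_positional(datasets, fuse_func=None):
    # Bug pattern: the third positional argument binds to skip_broken_datasets,
    # so fuse_func silently stays at its default of None.
    return fuse_measurement(None, datasets, fuse_func)


def fuse_lazy_named(datasets, skip_broken_datasets=False, fuse_func=None):
    # Both options are forwarded by name, so neither can land in the wrong slot.
    return fuse_measurement(None, datasets,
                            skip_broken_datasets=skip_broken_datasets,
                            fuse_func=fuse_func)


def my_fuser(dest, src):
    """Placeholder custom fuser; its body is irrelevant to the illustration."""


positional = fuse_lazy_positional([], fuse_func=my_fuser)
named = fuse_lazy_named([], skip_broken_datasets=True, fuse_func=my_fuser)

print(positional["fuse_func"] is None)   # custom fuser was lost
print(named["fuse_func"] is my_fuser)    # custom fuser arrives intact
print(named["skip_broken_datasets"])     # flag is forwarded as well
```

In Python 3, declaring such options as keyword-only (a bare `*` before them in the signature) would turn the positional mistake into a `TypeError` instead of a silent fallback to the default.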

@Kirill888

Contributor

commented Aug 8, 2017

Kirill888 added a commit that referenced this issue Aug 28, 2017

Fix for issue #276
`fuse_func` parameter was not passed correctly, adding explicit name.

Kirill888 closed this Dec 4, 2017
