Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More geometry tools (will be used for reproject eventually) #600

Merged
merged 23 commits into from
Nov 23, 2018

Conversation

Kirill888
Copy link
Member

@Kirill888 Kirill888 commented Nov 20, 2018

  • Clean up geometry tests (doc -> pytest)
  • Move geometry module around
    • utils.geometry.py -> utils.geometry._base.py
    • re-expose public methods from _base
  • Copied most methods from dea_proto: geom.py to utils.geometry.tools.py
  • Removed use_threads option from core

Top level method of interest is:

  • compute_reproject_roi(src:GeoBox, dst:GeoBox) -> (roi: slice, scale: float)

Given two geoboxes find region within source GeoBox that overlaps with destination GeoBox, also compute scale factor. The idea is that:

  • src[roi] -> scale -> reproject -> dst
  • OR
  • src(scale)[roi(scale)] -> reproject -> dst

Here roi is "minimal", padding is configureable though, so you only read what you need. Also scale can be used to pick the right kind of overview level to read.

@codecov
Copy link

codecov bot commented Nov 20, 2018

Codecov Report

Merging #600 into develop will increase coverage by 0.72%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #600      +/-   ##
===========================================
+ Coverage    85.33%   86.05%   +0.72%     
===========================================
  Files           82       84       +2     
  Lines         7846     7975     +129     
===========================================
+ Hits          6695     6863     +168     
+ Misses        1151     1112      -39
Impacted Files Coverage Δ
datacube/virtual/impl.py 89.62% <100%> (-0.05%) ⬇️
datacube/utils/geometry/tools.py 100% <100%> (ø)
datacube/utils/geometry/_base.py 98.69% <100%> (ø)
datacube/utils/geometry/__init__.py 100% <100%> (ø)
datacube/api/core.py 88.57% <100%> (+5.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6b0a05...93e85e2. Read the comment docs.

@coveralls
Copy link

coveralls commented Nov 20, 2018

Coverage Status

Coverage increased (+0.7%) to 86.031% when pulling 93e85e2 on kk-geometry into e6b0a05 on develop.

- move utils/geometry.py to utils/geometry/_base.py
- re-export public classes and methods
These will be used for reproject roi computation
- estimate affine transform from points
- decompose affine transform into parts
- figure out scale change for a locally linear transform
with some changes to use datacube structure, also adding tests, also with
special case for "same crs" case.
If two geoboxes share same CRS their pixel<>pixel transform is a linear mapping,
expose that fact and the actual transform via `.linear` attribute on the
returned function.
Copied from dea-proto + adding special case for same crs transforms
@Kirill888 Kirill888 changed the title WIP: todo More geometry tools (will be used for reproject eventually) Nov 21, 2018
@Kirill888 Kirill888 requested a review from omad November 21, 2018 00:15
Kirill888 and others added 10 commits November 21, 2018 16:39
Not all CRSs have epsg code, deal gracefully with those that don't
Reasons for removal

- creates new threadpool on every invocation (high overhead)
- doesn't expose important parameters like level of concurrency
- netcdf reading/writing is not thread-safe, so this won't help in the most
  common case
- needs really large reads to actually have a benefit (see high overhead)
- introduces dependency on not-portable SharedArray library
- was added for s3aio driver and was tuned for that use-case, should go into
  s3aio driver
- no tests
avoid circular dependencies
This address issue #602. Also `.buffer` was swapping order of y/x padding,
fixing that too.
Copy link
Member

@omad omad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't pretend to have reviewed the details of every new function, but

  • I like the refactoring and cleaning
  • The code looks readable
  • Tests look reasonable
  • Includes some documentation

I've added some of your PR comments about compute_reproject_roi() into the code since it seems to fit, and touched up a couple of the other docstrings.

Thanks Kirill

@mergify mergify bot merged commit a3c7506 into develop Nov 23, 2018
@delete-merged-branch delete-merged-branch bot deleted the kk-geometry branch November 23, 2018 04:54
@woodcockr
Copy link
Member

The removal of the use_threads option has an immediate and serious detrimental impact on the S3AIO library performance and the CSIRO DC deployment on AWS and all associated client applications. Any CSIRO DC or ODC users working in the AWS environment and using the S3AIO driver are advised to move to CSIRO EASI github fork of the odc repository until this change is discussed and resolved.
The repository fork can be found at https://github.com/csiro-easi/datacube-core and will not be tracking changes from the ODC in order to remain isolated from this change.

@Kirill888
Copy link
Member Author

@woodcockr we are working on making concurrent open/read possible with the driver infrastructure, this is a significant change, but it would allow proper implementation of concurrency within the driver itself. Removing use_threads at this time significantly speeds up this development.

I have document problems with that approach in a commit message see here df544c8

Replicated below:

  • creates new threadpool on every invocation (high overhead)
  • doesn't expose important parameters like level of concurrency
  • netcdf reading/writing is not thread-safe, so this won't help in the most common case
  • needs really large reads to actually have a benefit (see high overhead)
  • introduces dependency on not-portable SharedArray library
  • was added for s3aio driver and was tuned for that use-case, should go into s3aio driver
  • no tests

So it's also an issues of correctness for those users of datacube that rely on ingested netcdfs as their primary storage format.

@woodcockr
Copy link
Member

woodcockr commented Nov 28, 2018

@Kirill888 I read and understood the "problems" you indicated, but it is a breaking change for s3aio applications and performance of the driver. Thankfully I picked it up before we updated to the latest release. I didn't expect a pull request labelled "More geometry tools" to contain a unrelated to geometry and breaking change. I was curious on a few things you were doing and just happened to spot this in time.
We've moved all our upstream development connections over to our fork until we can resolve this. Unfortunately git doesn't support selective changesets, only completed ones so we can't work around this. We have been contacted by other groups previously using s3aio but we've no way of knowing who else is impacted so I thought I would add the note because there will be a sudden drop in performance, not to mention an error in your application if people update.

@Kirill888
Copy link
Member Author

fair point, my bad for not putting this into a separate PR

@woodcockr
Copy link
Member

A separate PR would be helpful, so would discussing it with those impacted before it being accepted.

Now we need to figure out how to fix it...can we roll back the change and split the PR and commit the other changes and then discuss the breaking change?

was pretty sure we'd isolated the impact on anything other than S3AIO when it's pull request was first accepted, including the dependency issue (it's optional).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants