-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize angle-based modis interpolation for dask #35
Conversation
Codecov Report
@@ Coverage Diff @@
## main #35 +/- ##
==========================================
- Coverage 79.81% 79.77% -0.05%
==========================================
Files 18 18
Lines 1214 1221 +7
==========================================
+ Hits 969 974 +5
- Misses 245 247 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Also run "black" on file to deal with long variable names
geotiepoints/modisinterpolator.py
Outdated
y = ( | ||
np.arange((coarse_scan_length + 1) * fine_scan_length) % fine_scan_length | ||
) + 0.5 | ||
y = y[fine_scan_length // 2 : -(fine_scan_length // 2)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran black on this file so stickler is going to be mad about this unless I update flake8 config to ignore it.
geotiepoints/modisinterpolator.py
Outdated
def _expand_tiepoint_array_1km( | ||
coarse_pixels_per_1km, | ||
coarse_scan_width, | ||
fine_pixels_per_coarse_pixel, | ||
arr, | ||
fine_scan_length, | ||
): | ||
arr = np.repeat(arr, fine_scan_length, axis=1) | ||
arr = np.concatenate( | ||
(arr[:, : fine_scan_length // 2, :], arr, arr[:, -(fine_scan_length // 2):, :]), axis=1 | ||
) | ||
arr = np.repeat(arr.reshape((-1, coarse_scan_width - 1)), fine_pixels_per_coarse_pixel, axis=1) | ||
return np.hstack((arr, arr[:, -fine_pixels_per_coarse_pixel:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mraspaud I'm concerned about this function. These operations involve a lot of temporary arrays. I would normally prefer to create one single result array and then fill it in. I was about to change it until I realized that the "corner" arrays (arr
in this function) are one row and one column short by design. Is this required? I'm wondering if the logic here would be simpler if those were able to stay the original size, especially if it is only cut short to avoid duplicate computation. I'm sure it isn't, but thought I'd mention it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh I almost got it with a single array (locally) but the repeats and the scan dimension hurt my head. I'll play with it more tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ugly, but this works as a replacement for this method but dask profiling shows it uses the tiniest bit more memory. I was hoping to not need the repeat
calls, but I don't think broadcasting can be used to make that work.
repeated_arr = np.repeat(np.repeat(arr, fine_scan_length, axis=1), fine_pixels_per_coarse_pixel, axis=2)
new_arr = np.empty((arr.shape[0], fine_scan_length * (arr.shape[1] + 1), fine_pixels_per_coarse_pixel * coarse_scan_width), dtype=arr.dtype)
new_arr[:, fine_pixels_per_coarse_pixel // 2 : -(fine_pixels_per_coarse_pixel // 2), :-fine_pixels_per_coarse_pixel] = repeated_arr
half_scan_size = fine_scan_length // 2
half_coarse_pixel_as_fine = fine_pixels_per_coarse_pixel // 2
new_arr[:, :half_scan_size, :-fine_pixels_per_coarse_pixel] = new_arr[:, half_coarse_pixel_as_fine:half_coarse_pixel_as_fine + half_scan_size, :-fine_pixels_per_coarse_pixel]
new_arr[:, -half_scan_size:, :-fine_pixels_per_coarse_pixel] = new_arr[:, -(half_coarse_pixel_as_fine + half_scan_size):-half_coarse_pixel_as_fine, :-fine_pixels_per_coarse_pixel]
new_arr[:, :, -fine_pixels_per_coarse_pixel:] = new_arr[:, :, -(2 * fine_pixels_per_coarse_pixel):-fine_pixels_per_coarse_pixel]
new_arr = new_arr.reshape((-1, coarse_scan_width * fine_pixels_per_coarse_pixel))
return new_arr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this is definitely unreadable :)
Ok so this overall code performs better (twice as fast-ish), but the code is pretty gross. I couldn't use the class methods because everything was being passed to map_blocks so at least the way it was it wouldn't have worked. But now all the values that were class properties are basically passed to every method. I could create a class inside the map_blocks function 🤔 Thoughts @mraspaud? |
geotiepoints/modisinterpolator.py
Outdated
coarse_resolution=None, | ||
fine_resolution=None, | ||
coarse_scan_width=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do these need to be passed vs retrieve from self
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read the note in the docstring. The decorator uses them to determine what the output shape and chunk size is going to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a bit ugly :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please tell me another option that allows me to reuse the map_blocks helper decorator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could move the __init__
logic into this method but then the __init__
has no purpose. I could change this back into a function (again) and create the class inside the map_blocks'd function. Then this all makes sense.
# Conflicts: # geotiepoints/modisinterpolator.py
Regarding the small coverage loss, this seems to be mostly for error cases not being checked in the scan-aligned block decorator. I'd like to wait to add tests for these as I do more restructuring of these modules in #38. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just one comment.
This was mostly discussed on slack, but the angle-based interpolation solution for MODIS interpolation is pretty slow and produces a lot of dask tasks. Additionally, it produces very odd chunking of the results for some reason (perhaps due to slicing and appending). This PR is an attempt at cleaning some of that up and trying to get some consistency between the angled-based and the "simple" interpolation.
git diff origin/main **/*py | flake8 --diff