Optimize angle-based modis interpolation for dask #35

Merged
merged 21 commits from djhoese:optimize-modis-interp into pytroll:main on Jun 2, 2022

Conversation

djhoese (Member) commented Mar 8, 2022

This was mostly discussed on Slack, but the angle-based interpolation solution for MODIS data is pretty slow and produces a lot of dask tasks. Additionally, it produces very odd chunking of the results for some reason (perhaps due to slicing and appending). This PR is an attempt at cleaning some of that up and getting some consistency between the angle-based and the "simple" interpolation (a rough illustration of the dask-graph problem follows the checklist below).

  • Closes #xxxx
  • Tests added
  • Tests passed
  • Passes git diff origin/main **/*py | flake8 --diff
  • Fully documented
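
As a rough, hypothetical illustration (invented array sizes, not this PR's code) of why per-scan slicing and appending bloats the dask graph compared to a single map_blocks call over scan-aligned chunks:

    import dask.array as da

    # Hypothetical 5km-resolution latitudes: 20 scans of 10 rows each.
    lats = da.random.random((200, 270), chunks=(10, 270))

    # Per-scan slicing + concatenation: every slice, per-scan computation,
    # and the final concatenate add their own tasks to the graph.
    pieces = [lats[i * 10:(i + 1) * 10] * 2 for i in range(20)]  # "* 2" stands in for interpolation
    sliced = da.concatenate(pieces, axis=0)

    # One map_blocks call over scan-aligned chunks: a single extra task per
    # input chunk, and the output chunking mirrors the input chunking.
    mapped = da.map_blocks(lambda block: block * 2, lats, dtype=lats.dtype)

    print(len(sliced.__dask_graph__()), len(mapped.__dask_graph__()))

The exact numbers don't matter; the point is that the per-scan graph grows with the number of slice and concatenate operations, while the map_blocks graph only grows with the number of chunks.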

djhoese self-assigned this Mar 8, 2022

codecov bot commented Mar 8, 2022

Codecov Report

Merging #35 (4002297) into main (3632ec7) will decrease coverage by 0.04%.
The diff coverage is 96.53%.

@@            Coverage Diff             @@
##             main      #35      +/-   ##
==========================================
- Coverage   79.81%   79.77%   -0.05%     
==========================================
  Files          18       18              
  Lines        1214     1221       +7     
==========================================
+ Hits          969      974       +5     
- Misses        245      247       +2     
Flag Coverage Δ
unittests 79.77% <96.53%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
geotiepoints/simple_modis_interpolator.py 93.79% <93.93%> (-0.65%) ⬇️
geotiepoints/modisinterpolator.py 98.63% <98.13%> (-0.12%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3632ec7...4002297.


coveralls commented Mar 8, 2022

Coverage Status

Coverage decreased (-0.04%) to 78.891% when pulling 4002297 on djhoese:optimize-modis-interp into 3632ec7 on pytroll:main.

djhoese marked this pull request as ready for review March 9, 2022 02:36
y = (
np.arange((coarse_scan_length + 1) * fine_scan_length) % fine_scan_length
) + 0.5
y = y[fine_scan_length // 2 : -(fine_scan_length // 2)]
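
For orientation, here is what that indexing produces with small, hypothetical sizes (coarse_scan_length = 2, fine_scan_length = 10; for illustration only):

    import numpy as np

    coarse_scan_length = 2   # hypothetical: coarse rows per scan
    fine_scan_length = 10    # hypothetical: fine rows per coarse row

    y = (
        np.arange((coarse_scan_length + 1) * fine_scan_length) % fine_scan_length
    ) + 0.5
    y = y[fine_scan_length // 2 : -(fine_scan_length // 2)]
    print(y)
    # [5.5 6.5 7.5 8.5 9.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 0.5 1.5 2.5 3.5 4.5]
    # Trimming half a fine scan from each end shifts the sawtooth so the
    # fine-pixel centers straddle the coarse tie point rows.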
djhoese (Member Author):

I ran black on this file so stickler is going to be mad about this unless I update flake8 config to ignore it.

Comment on lines 279 to 291
def _expand_tiepoint_array_1km(
coarse_pixels_per_1km,
coarse_scan_width,
fine_pixels_per_coarse_pixel,
arr,
fine_scan_length,
):
arr = np.repeat(arr, fine_scan_length, axis=1)
arr = np.concatenate(
(arr[:, : fine_scan_length // 2, :], arr, arr[:, -(fine_scan_length // 2):, :]), axis=1
)
arr = np.repeat(arr.reshape((-1, coarse_scan_width - 1)), fine_pixels_per_coarse_pixel, axis=1)
return np.hstack((arr, arr[:, -fine_pixels_per_coarse_pixel:]))
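
For reference, a tiny shape-check of the repeat/concatenate/hstack steps in this function, using hypothetical sizes (a 1-scan, 1x2 "corner" array with coarse_scan_width = 3, fine_scan_length = 4, fine_pixels_per_coarse_pixel = 2):

    import numpy as np

    # Hypothetical tiny "corner" array: (scans, coarse rows - 1, coarse_scan_width - 1)
    arr = np.arange(2, dtype=float).reshape((1, 1, 2))
    coarse_scan_width = 3
    fine_scan_length = 4
    fine_pixels_per_coarse_pixel = 2

    out = np.repeat(arr, fine_scan_length, axis=1)          # (1, 4, 2): coarse rows -> fine rows
    out = np.concatenate(
        (out[:, : fine_scan_length // 2, :], out, out[:, -(fine_scan_length // 2):, :]),
        axis=1,
    )                                                       # (1, 8, 2): pad half a fine scan top and bottom
    out = np.repeat(
        out.reshape((-1, coarse_scan_width - 1)), fine_pixels_per_coarse_pixel, axis=1
    )                                                       # (8, 4): coarse columns -> fine pixels
    out = np.hstack((out, out[:, -fine_pixels_per_coarse_pixel:]))  # (8, 6): last column block duplicated
    print(out.shape)  # (8, 6) == ((1 coarse row + 1) * 4, 3 * 2)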
djhoese (Member Author):

@mraspaud I'm concerned about this function. These operations involve a lot of temporary arrays. I would normally prefer to create one single result array and then fill it in. I was about to change it until I realized that the "corner" arrays (arr in this function) are one row and one column short by design. Is this required? I'm wondering if the logic here would be simpler if those were able to stay the original size, especially if it is only cut short to avoid duplicate computation. I'm sure it isn't, but thought I'd mention it.

djhoese (Member Author):

Ugh I almost got it with a single array (locally) but the repeats and the scan dimension hurt my head. I'll play with it more tomorrow.

djhoese (Member Author):

It's ugly, but this works as a replacement for this method; dask profiling shows it uses the tiniest bit more memory, though. I was hoping not to need the repeat calls, but I don't think broadcasting can be used to make that work.

    repeated_arr = np.repeat(np.repeat(arr, fine_scan_length, axis=1), fine_pixels_per_coarse_pixel, axis=2)
    new_arr = np.empty((arr.shape[0], fine_scan_length * (arr.shape[1] + 1), fine_pixels_per_coarse_pixel * coarse_scan_width), dtype=arr.dtype)
    new_arr[:, fine_pixels_per_coarse_pixel // 2 : -(fine_pixels_per_coarse_pixel // 2), :-fine_pixels_per_coarse_pixel] = repeated_arr
    half_scan_size = fine_scan_length // 2
    half_coarse_pixel_as_fine = fine_pixels_per_coarse_pixel // 2
    new_arr[:, :half_scan_size, :-fine_pixels_per_coarse_pixel] = new_arr[:, half_coarse_pixel_as_fine:half_coarse_pixel_as_fine + half_scan_size, :-fine_pixels_per_coarse_pixel]
    new_arr[:, -half_scan_size:, :-fine_pixels_per_coarse_pixel] = new_arr[:, -(half_coarse_pixel_as_fine + half_scan_size):-half_coarse_pixel_as_fine, :-fine_pixels_per_coarse_pixel]
    new_arr[:, :, -fine_pixels_per_coarse_pixel:] = new_arr[:, :, -(2 * fine_pixels_per_coarse_pixel):-fine_pixels_per_coarse_pixel]
    new_arr = new_arr.reshape((-1, coarse_scan_width * fine_pixels_per_coarse_pixel))
    return new_arr

Reviewer (Member):

ok, this is definitely unreadable :)

djhoese (Member Author) commented Mar 9, 2022

Ok, so this overall code performs better (twice as fast-ish), but the code is pretty gross. I couldn't use the class methods because everything was being passed to map_blocks, so at least the way it was written it wouldn't have worked. But now all the values that were class properties are basically passed to every method. I could create a class inside the map_blocks'd function 🤔

Thoughts @mraspaud?
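
For what it's worth, a minimal, hypothetical sketch (invented names and sizes, not this PR's code) of the "create the class inside the map_blocks'd function" idea, so only plain numpy blocks and simple keyword values cross the dask boundary:

    import dask.array as da
    import numpy as np

    class _BlockInterpolator:
        """Hypothetical stand-in for the interpolator; it just repeats rows."""

        def __init__(self, factor):
            self.factor = factor

        def interpolate(self, block):
            return np.repeat(block, self.factor, axis=0)

    def _interp_block(block, factor=2):
        # The class is built *inside* the map_blocks'd function, so dask only
        # ever sees numpy blocks and simple picklable keyword values.
        return _BlockInterpolator(factor).interpolate(block)

    lats = da.random.random((20, 30), chunks=(10, 30))
    fine_lats = da.map_blocks(
        _interp_block, lats, factor=2,
        dtype=lats.dtype, chunks=(20, 30),  # each 10-row block becomes 20 rows
    )
    print(fine_lats.shape)  # (40, 30)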

djhoese (Member Author) commented Mar 9, 2022

I also don't like the CPU usage:

[screenshot: CPU usage profile]

This says to me that the processing is wasting time doing something else (memory allocation maybe?).

Comment on lines 135 to 137
coarse_resolution=None,
fine_resolution=None,
coarse_scan_width=None,
Reviewer (Member):

Why do these need to be passed in vs. retrieved from self?

djhoese (Member Author):

Read the note in the docstring. The decorator uses them to determine what the output shape and chunk size are going to be.

Reviewer (Member):

that's a bit ugly :)

djhoese (Member Author):

Please tell me another option that allows me to reuse the map_blocks helper decorator.

djhoese (Member Author):

I could move the __init__ logic into this method but then the __init__ has no purpose. I could change this back into a function (again) and create the class inside the map_blocks'd function. Then this all makes sense.
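
To make the trade-off in this thread concrete, here is a rough, hypothetical sketch (not the actual decorator in this repository) of why a reusable map_blocks helper decorator would want the resolutions as explicit keyword arguments: it has to derive the output chunk sizes before dispatching to map_blocks.

    import functools

    import dask.array as da
    import numpy as np

    def _scan_aligned_map_blocks(func):
        """Hypothetical decorator: compute output chunks from the resolutions."""
        @functools.wraps(func)
        def wrapper(arr, *, coarse_resolution=None, fine_resolution=None, **kwargs):
            row_factor = coarse_resolution // fine_resolution  # e.g. 5000 // 1000 == 5
            new_chunks = (tuple(c * row_factor for c in arr.chunks[0]), arr.chunks[1])
            return da.map_blocks(
                func, arr,
                coarse_resolution=coarse_resolution, fine_resolution=fine_resolution,
                dtype=arr.dtype, chunks=new_chunks, **kwargs,
            )
        return wrapper

    @_scan_aligned_map_blocks
    def _fake_interp(block, coarse_resolution=None, fine_resolution=None):
        # Stand-in block function: repeat rows by the resolution ratio.
        return np.repeat(block, coarse_resolution // fine_resolution, axis=0)

    coarse = da.random.random((20, 27), chunks=(10, 27))
    fine = _fake_interp(coarse, coarse_resolution=5000, fine_resolution=1000)
    print(fine.chunks)  # ((50, 50), (27,))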

djhoese (Member Author) commented May 31, 2022

Regarding the small coverage loss, this seems to be mostly for error cases not being checked in the scan-aligned block decorator. I'd like to wait to add tests for these as I do more restructuring of these modules in #38.

mraspaud (Member) left a review comment:

Looks good, just one comment.

geotiepoints/modisinterpolator.py: outdated review thread, resolved.
djhoese merged commit 5e14e42 into pytroll:main Jun 2, 2022
djhoese deleted the optimize-modis-interp branch June 2, 2022 20:30