Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MODIS Interpolation Comparisons #39

Closed
djhoese opened this issue May 17, 2022 · 11 comments · Fixed by #41
Closed

MODIS Interpolation Comparisons #39

djhoese opened this issue May 17, 2022 · 11 comments · Fixed by #41
Assignees

Comments

@djhoese
Copy link
Member

djhoese commented May 17, 2022

While working on my 3 PRs for optimizing MODIS interpolation (#35, #36, #38), I noticed that in real world cases my cython-based PRs were producing invalid/bad images where the bow-tie pixels of an image were clearly being mapped incorrectly from the default interpolation used by Satpy (the geotiepoints/modisinterpolator.py version that uses the CVIIRS algorithm). This happens even though the unit tests pass! This means that the tests aren't strict/rigid enough in their thresholds to the point that they let invalid/bad results through.

NOTE: All plots/images in this issue were made with the current main branch and don't use my updated code.

I've since been trying to do a deep dive on what is going on and how the "simple" interpolation compares to the CVIIRS interpolation. I've discovered at least one problem with the last column of the data. Here is a zoomed in view of the differences between the two methods' longitude arrays, specifically the upper right corner, when used on real Aqua data:

image

You may be able to see on the right most columns (where I've removed the black border line in the matplotlib plot) that the differences are suddenly much higher for the last column. If I look at the individual plots for these longitudes I can't see anything obvious in either method's results. I'll post the images I've generated so far to start the discussion, but we need to come up with ways to decide which method to use as "truth".

Note I refer to the CVIIRS algorithm as "angle-based" because it requires solar zenith angles. I refer to the other method as "simple" because it pretty close to basic bilinear interpolation from the basic understanding I have.

geotiepoints_interp_anglebased
geotiepoints_interp_differences
geotiepoints_interp_simple

CC @simonrp84

@djhoese
Copy link
Member Author

djhoese commented May 17, 2022

Ah, the last "column" is actually 3 columns of data (here's a zoomed in view of the last 200 columns of the image, only showing the last 10 columns in this view):

image

Edit: I think given the angle-based/CVIIRS algorithm has this last line of this method I blame CVIIRS (where cols == 4):

    def _expand_tiepoint_array_1km(self, arr, lines, cols):
        arr = da.repeat(arr, lines, axis=1)
        arr = da.concatenate((arr[:, :lines//2, :], arr, arr[:, -(lines//2):, :]), axis=1)
        arr = da.repeat(arr.reshape((-1, self.cscan_full_width - 1)), cols, axis=1)
        return da.hstack((arr, arr[:, -cols:]))

@mraspaud
Copy link
Member

Thanks a lot for looking into this! The last columns are extrapolated, right?

@djhoese
Copy link
Member Author

djhoese commented May 17, 2022

So if I exclude the last 3 columns from the lon/lat difference limit we get much better differences. Considering all pixels we get (lon diff min, lon diff max, lat diff min, lat diff max):

-0.0030819955405547717 0.06203039353681561 -0.031740776286142136 0.00018770053469552295

Excluding 3 right-most columns:

-0.0030819955405547717 0.0018535406073283411 -0.0003218550081101057 0.00018770053469552295

Plotting the differences with these new limits shows:

geotiepoints_interp_differences

A difference of -.00308199554 (the maximum longitude difference) at the equator is a difference of 343.086 meters in the mercator projection. Considering this geolocation interpolation is 1km to 250m, that's still a pretty significant difference.

@djhoese
Copy link
Member Author

djhoese commented May 17, 2022

@mraspaud as far as extrapolation, I'd have to check that one link that @simonrp84 linked to on slack, but I don't think there is extrapolation in the column/cross-track direction but there is in the row/along-track direction. In the simple interpolation I use scipy's map_coordinates with these indexes:

    x = np.arange(res_factor * num_cols, dtype=np.float32) * (1. / res_factor)
    # 0.375 for 250m, 0.25 for 500m
    y = np.arange(res_factor * ROWS_PER_SCAN, dtype=np.float32) * \
        (1. / res_factor) - (res_factor * (1. / 16) + (1. / 8))

So the x/column indexes are all positive and less than or equal to the original indexes. The y/row indexes have the - (res_factor * (1. / 16) + (1. / 8)) term which should mean that the first couple rows of the output geolocation are extrapolated...I think. But also the "simple" algorithm has special handling for these (the first two rows and the last 2 rows) to perform a special linear extrapolation.

@djhoese
Copy link
Member Author

djhoese commented May 17, 2022

Ok so here is the information that Simon had linked to: https://www.icare.univ-lille.fr/modis-geolocation/

We can see from this image that the last 3 250m columns are indeed "after" the last defined 1km column (in red):

image

There is a similar image for the first columns which show that the first 250m column is inline with the first 1km column's X coordinate. This matches what the simple interpolation is doing from what I can tell (first x index is 0, last index is 0.75 beyond the last column index (or 0.25 pixels beyond the edge of the last 1km pixel).

This does suggest that the code I showed above that is in the CVIIRS/angle-based interpolation where it appends the last 4 columns in the tiepoint array is wrong (again, from what I understand). It should at least be extrapolating those columns' tiepoints.

@mraspaud I propose we replace the "valid"/truth test data with the output of the simple interpolation and compare the CVIIRS/angle-based interpolation based on that. I also propose that for 1km->500m and 1km->250m interpolation, Satpy should use the simple interpolation by default. We can then work toward:

  1. Fixing the interpolation of the angled-based interpolation for at least these last 3 columns.
  2. Possibly add 5km->1km interpolation to the simple interpolation.

@mraspaud
Copy link
Member

I will try to have a look at this tomorrow

@djhoese
Copy link
Member Author

djhoese commented May 24, 2022

After some discussion on slack and in a call, @mraspaud has pointed out some major issues in the simple interpolation. If we plot the the geodetic distance between pixels cross track (this is actually the average of the same row in multiple scans) we see this:

image

The simple linear interpolation has very clear steps between control points. This isn't unexpected but isn't great either if we can do any better especially given the cviirs algorithm is able to produce much cleaner line overall. Initial tests of using order=2 in the map_coordinates call shows much better interpolation in the middle of the scan but the ends (due to the interpolation) go wacky.

Additionally this plot shows the last columns of the simple interpolation are actually just a repeat of the last input pixel. This is due to the "nearest" mode used for pixels outside the input region. There isn't a mode builtin to map_coordinates that will do anything better. This is overall a bug/failing in the simple interpolation that we will need to work on at a future date.

@djhoese
Copy link
Member Author

djhoese commented May 24, 2022

I'll make a PR soon, but I've come up with a "good enough" hack to extrapolate the simple interpolation's right-most columns. Even with this though, I'm a little concerned by the different between simple and cviirs at the tops and bottoms of scans. Here's a zoom in on the word part of the longitude array (the upper-left in this particular data case).

image

@mraspaud
Copy link
Member

mraspaud commented May 25, 2022

I found a bug in the modisintepolator.py that is due to the different nature of the modis tiepoints compared to the cviirs one. The fix is pretty simple, and improve the interpolation a the scan edges a lot.
Here is the patch for main:

diff --git a/geotiepoints/modisinterpolator.py b/geotiepoints/modisinterpolator.py
index 6d70db2..5fb83a3 100644
--- a/geotiepoints/modisinterpolator.py
+++ b/geotiepoints/modisinterpolator.py
@@ -23,7 +23,14 @@
 """Interpolation of geographical tiepoints using the second order interpolation
 scheme implemented in the CVIIRS software, as described here:
 Compact VIIRS SDR Product Format User Guide (V1J)
-http://www.eumetsat.int/website/wcm/idc/idcplg?IdcService=GET_FILE&dDocName=PDF_DMT_708025&RevisionSelectionMethod=LatestReleased&Rendition=Web
+https://www.eumetsat.int/media/45988
+and
+Anders Meier Soerensen, Stephan Zinke,
+A tie-point zone group compaction schema for the geolocation data of S-NPP and NOAA-20 VIIRS SDRs to reduce file sizes
+in memory-sensitive environments,
+Applied Computing and Geosciences, Volume 6, 2020, 100025, ISSN 2590-1974,
+https://doi.org/10.1016/j.acags.2020.100025.
+(https://www.sciencedirect.com/science/article/pii/S2590197420300070)
 """
 
 import xarray as xr
@@ -34,9 +41,8 @@
 from .geointerpolator import lonlat2xyz, xyz2lonlat
 
 R = 6371.
-# Aqua scan width and altitude in km
-scan_width = 10.00017
-H = 705.
+# Aqua altitude in km
+H = 709.
 
 
 def compute_phi(zeta):
@@ -51,7 +57,7 @@ def compute_zeta(phi):
     return np.arcsin((R + H) * np.sin(phi) / R)
 
 
-def compute_expansion_alignment(satz_a, satz_b, satz_c, satz_d):
+def compute_expansion_alignment(satz_a, satz_b, satz_c, satz_d, scan_width):
     """All angles in radians."""
     zeta_a = satz_a
     zeta_b = satz_b
@@ -180,7 +186,7 @@ def interpolate(self, lon1, lat1, satz1):
 
         satz_a, satz_b, satz_c, satz_d = get_corners(da.deg2rad(satz1))
 
-        c_exp, c_ali = compute_expansion_alignment(satz_a, satz_b, satz_c, satz_d)
+        c_exp, c_ali = compute_expansion_alignment(satz_a, satz_b, satz_c, satz_d, self.cscan_width)
 
         x, y = self.get_coords(scans)
         i_rs, i_rt = da.meshgrid(x, y)

@djhoese
Copy link
Member Author

djhoese commented May 25, 2022

@mraspaud with that change things are much better:

image

Note the 1e-5 on the latitude difference and the extra 0 in the longitude difference decimal.

@mraspaud
Copy link
Member

Yeah, I think we're approaching numerical precision as the lons and lats are in float32

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants