Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract bounding box from NetCDF/HDF5 files and insert into the geospatial metadata block #9541

Merged
merged 12 commits into from
May 9, 2023

Conversation

pdurbin
Copy link
Member

@pdurbin pdurbin commented Apr 19, 2023

insert the bounding box into the geospatial metadata block

What this PR does / why we need it:

Prevent users from having to manually enter bounding box values. As with FITS, we get the values from the files themselves.

Which issue(s) this PR closes:

Special notes for your reviewer:

I tried pretty hard to get the compound field code working but ended up switching to something that works, jsonParser.parseField which is also used by the native API. I switched it back in 189c774.

If the longitude "domain" is 0 to 360 instead of -180 to 180, we convert it. See also this issue:

Suggestions on how to test this:

Upload various NetCDF files, especially ones that have a bounding box. Here's a nice small one with a bounding box that the code tests: https://www.ncei.noaa.gov/data/international-comprehensive-ocean-atmosphere/v3/archive/nrt/ICOADS_R3.0.0_1662-10.nc

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Not really. The bounding box is populated. Here's what it looks like when you are just looking at (not editing) a dataset:

It seems a little weird that we don't show any labels (north, south, east, west). There's an open issue for this at #6589 but it's out of scope.

Screen Shot 2023-04-19 at 4 55 39 PM

The values look more reasonable when you are editing:

Screen Shot 2023-04-19 at 4 55 56 PM

Update: this seems like an easy fix so I made a PR:

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

Included. I updated the User Guide.

insert the bounding box into the geospatial metadata block
@mreekie mreekie added the Size: 3 A percentage of a sprint. 2.1 hours. label Apr 26, 2023
@mreekie mreekie moved this from Ready for Review ⏩ to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Apr 26, 2023
@pdurbin
Copy link
Member Author

pdurbin commented May 1, 2023

I've been saying at standup that I need to tweak something for S3 but I was wrong. S3 works fine so in f5be4a8 I removed the FIXME and replaced it with some details.

I'll note that @JR-1991 @atrisovic and I are still discussing a bit the getStandardLongitude method in this PR. I'm happy to discus this more with a reviewer. Here's an awesome diagram Jan made and posted:

diagram

I'm putting this in "ready for review".

@pdurbin pdurbin moved this from IQSS Team - In Progress 💻 to Ready for Review ⏩ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 1, 2023
@pdurbin pdurbin removed their assignment May 1, 2023
@sekmiller sekmiller self-assigned this May 2, 2023
@sekmiller sekmiller moved this from Ready for Review ⏩ to In Review 🔎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 2, 2023
return null;
}
if (westAsFloat > 180 || eastAsFloat > 180) {
Float westStandard = westAsFloat - 360;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we're converting 0-360 to -180-180 then we need to subtract just 180 from the given value, not 360.

Copy link
Contributor

@JR-1991 JR-1991 May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sekmiller I was also puzzled with this at the beginning, but subtracting 360° is the correct procedure. The math only tells half the truth, because the prime meridian (0° longitude) needs to stay at Greenwich and thus the interval's "cycle point" where both ends meet changes by 180°. Thus, this has to be attributed for, which leads to the 360°.

That's how I understood it so far, but this illustration may explain it better:

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JR-1991 thanks! Even with that explanation, it's hard to keep straight in my head.

I was relying on a tool to check that maps are the same. That is these two URLs (minus 360) show the same map:

Here's a screenshot of the maps (and a bit more commentary) I originally posted at https://dataverse.zulipchat.com/#narrow/stream/376593-geospatial/topic/extract.20lat.2Flong.20.239331/near/348542658

Screen Shot 2023-05-03 at 9 20 22 AM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JR-1991 OK. Looking at the globe and the example location makes sense when the value is between 180 and 360, but what the code does here is if the value for either is over 180 it subtracts 360 from both so there is a potential for a situation where one val is > 180 and the other is less and in that case, say 170, and subtracting 360 results in -190 which is off the scale below -180.

Copy link
Contributor

@JR-1991 JR-1991 May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sekmiller thanks for the review! Indeed subtracting from both would lead to a failure. I tested it visually and in the case of $\theta_1 = [280°, 170°]$ the result should translate to $\theta_2 = [-80°, 170°]$ - Hence if a degree is under 180 it should not be subtracted. I have updated the illustration:

image

I understand it in this way, that within this domain transformation only one half of the circle is changing. Note, in 0° - 360° domain the first half is the same as in the domain of (-180°) - 180° and thus there is a 1:1 relation if degrees are within 0-180. Yet, the other half does change to another range in a more drastic way. Tipping over 180° by a degree suddenly leads you to -179° which in the other domain corresponds to 181° - Hence the 360° provides a valid transformation for this case.

It's quite complicated though 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should add to the figure, that $f(\theta)$ is defined for $\theta \in [0°, 360°]$ and if $\theta < 0°$ its already in the desired domain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to address all this in 690b8a4 just now and left this comment about it: #9541 (comment)

Maybe I should have resolve other conversation and gotten it down to just this one. Oh well. Please see the other comment.

Copy link
Member

@atrisovic atrisovic May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through the thread and agree with the comments! Instead of the lines 154-182, we can check each coordinate by itself and it would work well, ie:

Float westStandard = westAsFloat;
Float eastStandard = eastAsFloat;

 if (westAsFloat > 180) {
             westStandard = westAsFloat - 360;
}

if (eastAsFloat > 180){
             eastStandard = eastAsFloat - 360;
}

WestAndEastLongitude updatedWeLong = new WestAndEastLongitude(westStandard.toString(), eastStandard.toString());
return updatedWeLong;

assertEquals(new WestAndEastLongitude("-16.320007", "-6.220001"), extractor.getStandardLongitude(new WestAndEastLongitude("343.68", "353.78")));
// One is over 180. Change
assertEquals(new WestAndEastLongitude("-260.0", "-179.0"), extractor.getStandardLongitude(new WestAndEastLongitude("100", "181")));
// Both are under 180. No change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JR-1991 it seems to me that the test above should fail as the result of -260 is outside the range of -180 - 180.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I'll ping Jan for some input on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pdurbin just for clarification, the right hand-side extractor transforms by subtracting 360° from both?

If so, then I have to agree, this test should fail and truly evaluate to $[100°, -179°]$ due to the above comment and that there can't be -260° since the domain has it's lower bound at -180°.

Given the proposed change in the calculation, the test should look like this:

assertEquals(
    new WestAndEastLongitude("100.0", "-179.0"),
    extractor.getStandardLongitude(new WestAndEastLongitude("100", "181"))
);

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's late on a Friday but I powered through and pushed some fixes around longitude at 2edb351.

Almost as importantly, I added docs! Rules in the User Guide about how users (and QA) should expect the code to behave. I'd love to hear what you (@JR-1991) and @atrisovic and @sekmiller (whom I've been bugging all day) think! Here they are:

  • West Longitude and East Longitude are expected to be in the range of -180 and 180. (When using :ref:geospatial-search, you should use this range for longitude.)
  • If West Longitude and East Longitude are both over 180 (outside the expected -180:180 range), 360 will be subtracted to shift the values from the 0:360 range to the expected -180:180 range.
  • If either West Longitude or East Longitude are less than zero but the other latitude is greater than 180 (which would imply an indeterminate domain, a lack of clarity of if the domain is -180:180 or 0:360), metadata will be not be extracted.

I also added more tests. I'm happy to add more if you all think of more edge cases. I don't have real NetCDF files to test with that have wacky values. It might be a challenge for QA to find NetCDF files to exercise all the rules.

@pdurbin pdurbin self-assigned this May 3, 2023
@github-actions

This comment has been minimized.

@@ -1531,72 +1654,71 @@ private void processDatasetMetadata(FileMetadataIngest fileMetadataIngest, Datas
// create a new compound field value and its child
//
DatasetFieldCompoundValue compoundDsfv = new DatasetFieldCompoundValue();
int nonEmptyFields = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whitespace change and below is from me removing this code and then (just) re-adding it because it seems to be working afterall! My test still passes at least. 😅

@coveralls
Copy link

Coverage Status

Coverage: 20.221% (+0.08%) from 20.146% when pulling 690b8a4 on 9331-extract-bounding-box2 into c07f3a8 on develop.

@pdurbin pdurbin removed their assignment May 5, 2023
@pdurbin
Copy link
Member Author

pdurbin commented May 5, 2023

I'm ready for more review so I'm removing myself from this PR. Here are the changes since the last round:

1fb2798 add API test for FITS file metadata extraction #9331
189c774 restore original "extract from compound" code #9331
2edb351 document and fix up the rules for longitude, add tests #9331

Yes, I was convince the code to extract from compound fields wasn't working. I put it back and now it works! Well, it was always working, I suppose. Now I'm feeding it the data it expects!

Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. thanks for the changes

IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from In Review 🔎 to Ready for QA ⏩ May 8, 2023
@sekmiller sekmiller removed their assignment May 8, 2023
@github-actions

This comment has been minimized.

return null;
}
if (westAsFloat > 180 || eastAsFloat > 180) {
Float westStandard = westAsFloat - 360;
Copy link
Member

@atrisovic atrisovic May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through the thread and agree with the comments! Instead of the lines 154-182, we can check each coordinate by itself and it would work well, ie:

Float westStandard = westAsFloat;
Float eastStandard = eastAsFloat;

 if (westAsFloat > 180) {
             westStandard = westAsFloat - 360;
}

if (eastAsFloat > 180){
             eastStandard = eastAsFloat - 360;
}

WestAndEastLongitude updatedWeLong = new WestAndEastLongitude(westStandard.toString(), eastStandard.toString());
return updatedWeLong;

@kcondon kcondon self-assigned this May 8, 2023
@kcondon
Copy link
Contributor

kcondon commented May 9, 2023

Issues found:

  1. Preview for netcdf file shows permissions issue:
Screen Shot 2023-05-09 at 11 39 47 AM Screen Shot 2023-05-09 at 11 40 02 AM
  1. Removing a netcdf file with bounding box does not remove the bounding box info from dataset metadata. [Kevin] I'm told this is unsupported so maybe a note as discussed that extraction is "add only".

  2. Not sure whether all files with spatial data can be extracted https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/2129/tiles/1950/9597_1950/catalog.html?dataset=2129t/1950/9597_1950/dayl.nc , https://git.earthdata.nasa.gov/projects/LPDUR/repos/appeears-tutorials/raw/MCD12Q1.006_500m_aid0001.nc?at=refs%2Fheads%2Fmain&_ga=2.149397558.1682554425.1683648699-1018054437.1683648699 , https://data.isimip.org/files/cbe2b84e-4984-4f6b-ab68-263e7e862181/ so as discussed, maybe add a note about which fields we look for: geospatial_lon_min
    geospatial_lon_max
    geospatial_lat_max
    geospatial_lat_min

@pdurbin
Copy link
Member Author

pdurbin commented May 9, 2023

Preview for netcdf file shows permissions issue:

Sadly, this is a known issue. It's in "sprint ready", at least:

The fix will need to happen over in the previewers repo so there's an issue to track it there:

@github-actions
Copy link

github-actions bot commented May 9, 2023

📦 Pushed preview application image as

ghcr.io/gdcc/dataverse:9331-extract-bounding-box2

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@kcondon kcondon merged commit 85bde07 into develop May 9, 2023
9 of 10 checks passed
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA ✅ to Done 🚀 May 9, 2023
@kcondon kcondon deleted the 9331-extract-bounding-box2 branch May 9, 2023 18:58
@pdurbin pdurbin added this to the 5.14 milestone May 10, 2023
@@ -367,7 +369,20 @@ public List<DataFile> saveAndAddFilesToDataset(DatasetVersion version,
} else {
logger.fine("Failed to extract indexable metadata from file " + fileName);
}
} else if (FileUtil.MIME_TYPE_INGESTED_FILE.equals(dataFile.getContentType())) {
} else if (fileMetadataExtractableFromNetcdf(dataFile, tempLocationPath)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks direct uploads for me. When using a direct upload, the file is unattached and the owner is set to 'null' on line 336. However, 'fileMetadataExtractableFromNetcdf' needs the owner dataset to determine the folder inside storageIO.open();. This results in a NullPointerException, preventing adding the files to the dataset. I have only tested with the local filesystem, but from looking at the code, this probably also breaks uploading files directly to S3. Can you investigate and fix it? Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ErykKul yikes! This has already been merged. 😬 Could you please create an issue about this?

Copy link
Contributor

@kcondon kcondon May 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reproduced this in develop: direct file upload via UI appears to upload correctly, ie the progress bar completes and the file is present in the uploaded file list but fails on save with a grayed out screen as if waiting for a completion signal and a server log error:
diredct_upload_ui_err.txt
upload_fail

Note that direct file upload continues to work via API. Also, deleting the file continues to work via ui.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce it too. I created an issue:

Thanks, both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 3 A percentage of a sprint. 2.1 hours.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Extracting lat/long and insert into geospatial bounding box fields
8 participants