Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SEVIRI native reader with 'time_parameters' metadata #1877

Merged
merged 5 commits into from Aug 5, 2022

Conversation

simonrp84
Copy link
Member

@simonrp84 simonrp84 commented Nov 9, 2021

Edit: The original purpose of this PR has changed. Originally it switched the native reader to use the observation time for the generic start and end time, but since its original creation a new "standard" has been described to use nominal time for the start/end time and put all time metadata in a special time_parameters subdictionary of .attrs.

Original Description:

The SEVIRI HRIT and NAT readers are currently inconsistent with their start/end time values:

from satpy import Scene
scn = Scene([my_nat_file], reader='seviri_l1b_native')
scn.load(['IR_108'])
print("NATIVE")
print(scn['IR_108'].attrs['start_time'])
print(scn['IR_108'].attrs['end_time'])

scn = Scene(my_hrit_files, reader='seviri_l1b_hrit')
scn.load(['IR_108'])
print("HRIT")
print(scn['IR_108'].attrs['start_time'])
print(scn['IR_108'].attrs['end_time'])

Gives:

NATIVE
2021-11-09 12:00:10.388905
2021-11-09 12:15:10.076995

HRIT 
2021-11-09 12:00:10.388000
2021-11-09 12:12:43.089000

This is because the HRIT reader selects:

  • ['ImageProductionStats']['ActualScanningSummary']['ForwardScanStart'] and
  • ['ImageProductionStats']['ActualScanningSummary']['ForwardScanEnd'] from the trailer / epilogue.

The NAT reader, though, selects:

  • ['ImageAcquisition']['PlannedAcquisitionTime']['TrueRepeatCycleStart'] and
  • ['ImageAcquisition']['PlannedAcquisitionTime']['PlannedRepeatCycleEnd'] from the header.

In this PR I update the NAT reader to select the same start/end time values as the HRIT reader uses. With the PR, the output of the above code snippet is:

NATIVE
2021-11-09 12:00:10.388000
2021-11-09 12:12:43.089000

HRIT 
2021-11-09 12:00:10.388000
2021-11-09 12:12:43.089000

@mraspaud
Copy link
Member

mraspaud commented Nov 9, 2021

Thanks for the quick PR! waiting for the expert opinions here, @sfinkens @ameraner ?

@sjoro
Copy link
Collaborator

sjoro commented Nov 9, 2021

a philosophical question: are the Satpy start_time and end_time meant to represent the repeat cycle times or the actual scanning times? as you can see the end times are then fundamentally different. i might be the one who selected the repeat cycle times for the native-reader back in the days...
in the latter case the trailer has, as indicated, the actual start and end times, though the planned times should be accurate to a fractions of a second (says the documentation) and could be used in case there is any inconvenience of reading these data from the trailer.

@sfinkens
Copy link
Member

sfinkens commented Nov 9, 2021

Looks good to me, thanks for fixing this! They should indeed be consistent.

Regarding philosophy: I'd vote for the actual scanning times as provided by the HRIT reader and also ahi_hsd or abi_l1b for example. That is more consistent with swath data which don't have the concept of repeat cycles. But of course it would be nice to have the nominal timestamps in the attributes as well. In ahi_hsd they're called scheduled time

Also pinging @ninahakansson : I think this doesn't break our workflow, because the end time is computed by level1c4pps here

@mraspaud
Copy link
Member

Not in the scope of this PR, but to react to @sjoro 's question, I would say that different users might have different needs. I suppose a solution would be to have multiple attributes in this case, eg start_scan_time, start_nominal_time, etc
But if we want to be consistent between polar and geo satellites, I would say that the default start_time and end_time should indeed refer to scanning times.

@mraspaud
Copy link
Member

@simonrp84 looks like you broke the seviri_native tests :)

@pnuu
Copy link
Member

pnuu commented Nov 10, 2021

But if we want to be consistent between polar and geo satellites, I would say that the default start_time and end_time should indeed refer to scanning times.

The GEO users (the people looking at the images) expect the timestamps to match the nominal times, which means start_time should stay as it is and the (filename) timestamps should align with the nominal repeat cycle.

@mraspaud
Copy link
Member

The GEO users (the people looking at the images) expect the timestamps to match the nominal times, which means start_time should stay as it is and the (filename) timestamps should align with the nominal repeat cycle.

So we are lucky, because the scanning time in HRIT that we have used all the time as start_time is pretty much in sync with the nominal time :) Maybe a more sustainable solution would be to add new attributes with nominal times and use that in our filenamepatterns (in trollflow2) ?

@sjoro
Copy link
Collaborator

sjoro commented Nov 10, 2021

i'm ok with start_time and end_time being the actual scanning times, feels more consistent with LEO data/instruments. but as @mraspaud suggested, i feel like we'd need to have additional attributes like start_nominal_time, end_nominal_time. would be useful for plotting purposes.

most likely even needed in SIFT... currently we just take the start_time and as you can guess, the dataset marker on the GUI timeline is not placed exactly at the nominal repeat cycle start time, but a bit forward, e.g. 10 seconds. we've had a few users asking about this.

another issue could be (i'm inventing this use case) that a user clicks at 12:14 on the SIFT time line, the end time for SEVIRI data was 12:12.48. at 12:14 there's no valid data to be displayed. here i', imagining that some start and end times could be used from the Satpy readers to give timeframe when that dataset is "valid" and should be displayed. apologies for mixing SIFT in the mixture here, hehe. :)

@simonrp84
Copy link
Member Author

The GEO users (the people looking at the images) expect the timestamps to match the nominal times, which means start_time should stay as it is and the (filename) timestamps should align with the nominal repeat cycle.

Presumably most of the GEO users are using HRIT data, and they seem OK with the actual rather than nominal times in that reader.

I do agree that it'd be useful to provide also the nominal times, so like @sjoro suggested I will add start_nominal_time and end_nominal_time to this PR.

@ameraner
Copy link
Member

I also think that having the scan end time as end_time is the most technically correct and consistent option (it would also match the end time contained in the native filenames), but I share the concerns of having time gaps between the validity periods of repeat cycles... could be problematic for monitoring/LEO-GEO matching/plotting tools etc.

So I agree that adding extra attrs with the nominal repeat cycle time is useful, and would suggest to make this difference clear in the reader documentation.

PS: we would need to do the same on the NetCDF reader (these lines compute the time) and use the forward_scan_start_day, forward_scan_start_msec, forward_scan_end_day, forward_scan_end_msec global attributes values.

@codecov
Copy link

codecov bot commented Nov 10, 2021

Codecov Report

Merging #1877 (e5daa33) into main (92130df) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1877   +/-   ##
=======================================
  Coverage   93.95%   93.96%           
=======================================
  Files         285      285           
  Lines       43754    43834   +80     
=======================================
+ Hits        41108    41187   +79     
- Misses       2646     2647    +1     
Flag Coverage Δ
behaviourtests 4.73% <0.00%> (+<0.01%) ⬆️
unittests 94.63% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...y/tests/reader_tests/test_seviri_l1b_hrit_setup.py 100.00% <ø> (ø)
satpy/readers/seviri_l1b_hrit.py 90.42% <100.00%> (+0.27%) ⬆️
satpy/readers/seviri_l1b_native.py 86.25% <100.00%> (+0.86%) ⬆️
satpy/tests/reader_tests/test_seviri_l1b_native.py 100.00% <100.00%> (ø)
satpy/composites/__init__.py 90.26% <0.00%> (-0.01%) ⬇️
satpy/tests/test_composites.py 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Collaborator

@sjoro sjoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@coveralls
Copy link

coveralls commented Nov 10, 2021

Coverage Status

Coverage increased (+0.008%) to 94.578% when pulling e5daa33 on simonrp84:sev_time_fix into 92130df on pytroll:main.

@simonrp84
Copy link
Member Author

Ok, I have now added nominal_start_time and nominal_end_time attributes to both the HRIT and NAT readers plus (hopefully!) updated the tests correctly with these.

you can access the attributes like so:

files = glob(f'{idir}/*.nat')
scn_nat = Scene(files, reader='seviri_l1b_native')
scn_nat.load(['IR_108], upper_right_corner='NE')
print(scn_nat['IR_108'].attrs['start_time'], scn_nat['IR_108'].attrs['end_time'])
print(scn_nat['IR_108'].attrs['nominal_start_time'], scn_nat['IR_108'].attrs['nominal_end_time'])

These will display minute values of 00, 15, 30 and 45 for nominal_end_time, while end_time will now display 12, 27, 42, 57.

I haven't updated the netcdf reader (as @ameraner suggested) simply as I'm not familiar with that reader and don't have much time to make changes in unfamiliar code. Hopefully someone else can do that!

Copy link
Member

@sfinkens sfinkens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks a lot! I have a couple of questions/comments:

  • Is there a reason why the HRIT reader doesn't have properties for nominal times?
  • Regarding naming consistency: Should scheduled_time in the AHI HSD reader be deprecated and renamed to nominal_start_time?
  • Please shortly explain the nominal time attributes in the metadata chapter in the docs

satpy/readers/seviri_l1b_native.py Outdated Show resolved Hide resolved
satpy/readers/seviri_l1b_native.py Outdated Show resolved Hide resolved
satpy/tests/reader_tests/test_seviri_l1b_native.py Outdated Show resolved Hide resolved
@simonrp84
Copy link
Member Author

simonrp84 commented Nov 11, 2021

  • Is there a reason why the HRIT reader doesn't have properties for nominal times?

Because I couldn't find a way to access them as a user! But you raise a good point. Have updated for this now.

  • Regarding naming consistency: Should scheduled_time in the AHI HSD reader be deprecated and renamed to nominal_start_time?

Probably! Should we have that as a separate PR?

  • Please shortly explain the nominal time attributes in the metadata chapter in the docs

Will do, may take some time as I'll need to figure out how to edit the docs.

@sfinkens
Copy link
Member

👍 Separate PR for AHI sounds good!

@simonrp84
Copy link
Member Author

Ok, I've now updated the documentation and have also added a couple of small extra tests test that exceptions are raised if a bad grid origin or earth model is found in the data.

Copy link
Member

@sfinkens sfinkens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me now! Also thanks for adding those extra tests. Just one final nitpick, then good to merge from my point of view!

@@ -584,7 +601,7 @@ def create_test_header(earth_model, dataset_id, is_full_disk, is_rapid_scan):
reference_grid: {
'ColumnDirGridStep': column_dir_grid_step,
'LineDirGridStep': line_dir_grid_step,
'GridOrigin': 2, # south-east corner
'GridOrigin': gridorigin, # south-east corner
Copy link
Member

@sfinkens sfinkens Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now outdated. Could you please explain in the docstring that the default value of 2 represents the south-east corner and then remove this comment?

@sfinkens
Copy link
Member

Can't tell why coveralls thinks the coverage dropped...

…d NAT readers, swap the `start_time` and `end_time` in the NAT reader to match those used in the HRIT reader.
@simonrp84 simonrp84 reopened this Aug 2, 2022
@djhoese
Copy link
Member

djhoese commented Aug 3, 2022

In the semi-recent AHI work I did we defined a new time_parameters dictionary in the DataArray.attrs. I wonder if we should update this PR (how much do you want to hate me?) to use that dictionary instead of the new attributes in the root level of the .attrs dictionary.

https://satpy.readthedocs.io/en/latest/readers.html#time-metadata

@sfinkens
Copy link
Member

sfinkens commented Aug 3, 2022

https://satpy.readthedocs.io/en/latest/readers.html#time-metadata

Oh, this looks very nice! But maybe this can be done in a separate PR?

@djhoese
Copy link
Member

djhoese commented Aug 3, 2022

I'd be OK with that. I'm a little worried people are going to get used to the new attributes here, but I'm not a user of the SEVIRI native reader so maybe that's OK.

@djhoese
Copy link
Member

djhoese commented Aug 3, 2022

@sfinkens @simonrp84 What if I did it in this PR?

Edit: I noticed that CodeScene doesn't like the quality of this PR so I thought I'd clean that up and add the time_parameter changes.

@sfinkens
Copy link
Member

sfinkens commented Aug 3, 2022

Go ahead :)

@simonrp84
Copy link
Member Author

Feel free, I won't have time until tomorrow evening but can do it then if you're busy today.

@djhoese
Copy link
Member

djhoese commented Aug 3, 2022

Step 1 was to fix the code quality issues. That is done now. Now I'll try to rework the time_parameters.

@djhoese
Copy link
Member

djhoese commented Aug 3, 2022

@sfinkens @simonrp84 I have an...issue. The decision in this PR was that start_time/end_time should be the observation time, not the nominal time. It was suggested in the related AHI PRs that start_time/end_time should typically be the nominal time. In this way times are typically consistent between bands/files which improves caching for things like sensor angles calculated during rayleigh correction. I suppose if I changed this in the Native reader I'd also have to do this for the HRIT reader.... 😢

@djhoese
Copy link
Member

djhoese commented Aug 3, 2022

At this time I have not changed what start_time/end_time are, but would like to change them to nominal times. As-is, I'd be OK merging this PR, making a satpy release, and having this discussion another time. But, if we can have the discussion now then it may make things easier in the future and not have to worry about users using different versions of satpy.

@simonrp84
Copy link
Member Author

This seems reasonable to me, but if we also change the HRIT file to use the planned start and end times (rather than the current use of observation start and end times) then the reader is no longer tolerant to the situation where the prologue file is missing for a timeslot: As the plannet start/end times are in the prologue whilst the observation start/end times are in the epilogue.

Would like the opinion of the met center people on this - is losing the tolerance to a missing prologue acceptable?

If we keep this PR for the native reader and deal with HRIT separately then this is not a problem, however.

@djhoese
Copy link
Member

djhoese commented Aug 3, 2022

That is an interesting problem...however is that how the reader is actually configured to work? I mean, does the satpy reader allow for no prologue file. On mobile but I didn't think the code had a try/except around getting the nominal times so it would have failed with no prologue anyway.

@pnuu
Copy link
Member

pnuu commented Aug 4, 2022

As the plannet start/end times are in the prologue whilst the observation start/end times are in the epilogue.

I thought we just use the datetime from the filenames for HRIT 🤔

Also, I at least have the segment gatherer configured so that the PRO file is always required, whether we need it or not.

@mraspaud
Copy link
Member

mraspaud commented Aug 4, 2022

I think atm the hrit reader won't work if any of épi or pro files are missing. We probably could change that, at the cost of having degraded metadata when these files are missing.

Regarding start and end time. We have seen that using real acquisition times can slow down the reading significantly. So I would suggest to use metadata/nominal values for these as default for all readers, and further make the acquisition time per line available as a coordinate for users that require more precise values.
The idea behind this is that the nominal times are largely sufficient for quick sorting or browsing, and that's what they are needed for in 90% of the cases. For other cases, looking at the time coordinate is OK i think.

@sfinkens
Copy link
Member

sfinkens commented Aug 4, 2022

@mraspaud Good point with the performance. I'm a bit torn regarding the other time attributes. On the one hand I like your idea of the scanline acquisition being the only reference. Especially if a user selects a subset of scanlines, there's no mechanism to update the attributes accordingly. On the other hand I'm not sure whether all file formats provide acquisition timestamps. So I feel like it would make sense to keep the time_parameters as described in the docs, as a compromise...

@djhoese
Copy link
Member

djhoese commented Aug 4, 2022

So once I fix the test that @simonrp84 didn't update (test driven development!), it sounds like we're all OK with this being merged even if it undermines the original idea/purpose of this PR?

@djhoese
Copy link
Member

djhoese commented Aug 4, 2022

Oh no. The satpos property in the native reader uses self.start_time to get the satellite position. With this change, the satellite_actual_altitude changes by ~1.5 meters which makes the tests fail. Should this use the observation time or the nominal time? If it uses the observation time then every file will have a slightly different position and may not cache well. If it uses the nominal then it is technically less accurate but more cache-able. Actually, I think I round the satellite altitude in the angle generation anyway so the caching might not be a problem. I'll change it to use observation time.

@simonrp84
Copy link
Member Author

Good spot, I missed that - using the nominal times will be slightly less accurate, but no less so than the existing assumption that using the nominal times instead of actual times is OK for angle generation.

@djhoese djhoese changed the title Update SEVIRI native reader to select the correct scene start/end times. Update SEVIRI native reader with 'time_parameters' metadata Aug 4, 2022
@djhoese
Copy link
Member

djhoese commented Aug 4, 2022

I've modified the title and description to describe what this PR actually (currently) does. If there are no other comments today I will probably merge this as-is and we can plan on bringing the HRIT reader up to date in a future PR.

@djhoese djhoese merged commit 700ad69 into pytroll:main Aug 5, 2022
@sfinkens
Copy link
Member

sfinkens commented Aug 9, 2022

Thanks @djhoese !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants