Define time metadata options and usage #2012
Regarding averaging, I should point out that PR #473 does averaging in the
In order to cover the potential use cases, I give my thumbs up for various suggested start and end times. I'm OK with the suggested naming of
Good initiative, thanks for starting this! I like the proposed triplet
Just a note here: if we start using cloud data,
So in the If yes, maybe So, in my opinion, we need to have a
@mraspaud Good point. I'm not sure if there is a good name otherwise. Like @ameraner You'll have to forgive me, but I'm not super familiar with the difference in the terms you're using. In the SEVIRI or FCI case, if we simplify it and say the instrument should start scanning a new image every 15 minutes, I would say the
Exactly. Those two times can differ by several minutes. In the SEVIRI world, for example, most users would consider one SEVIRI image to be "valid" for 15 minutes (the time period that passes between the repetitions of the acquisition, hence commonly called "repeat cycle", or "slot" in my argumentation), even though the actual scanning takes only 12 minutes. So for 3 minutes there is an acquisition gap, during which the instrument is not recording anything while it prepares to start the next acquisition. Any visualisation tool showing SEVIRI images would keep one SEVIRI image on screen for 15 minutes, disregarding the 3-minute acquisition gap. This is what Sauli was referring to on Slack, talking about SIFT:
--> hence the need for Satpy to provide these gapless On the other hand, any user wanting, e.g., to compare SEVIRI precisely against another instrument would only consider the actual scanning period, hence the need to provide the
Now I'm confused about what we're confused about. In this SEVIRI discussion there are 2 time ranges, right? We have the "scheduled" or "repeat cycle" time, which for SEVIRI would be 15 minutes and would include "nominal" times, where by "nominal" I mean the timing the instrument was trying to meet (within these 15 minutes). We also have the "observation" time, where the data actually represents the Earth, starting at the first/start time and stopping at the second/end time. So @ameraner you originally said:
My answer is yes, that's exactly what scheduled time is. The observation time is this acquisition time. Or... are you saying there is the scheduled time (the "pretty" human-readable time 12:03:00), the repeat cycle time (12:03:15.444 to 12:18:15.444), and the observation time (12:03:33.677 to 12:15:24.323)? Here the repeat cycle time is the overall 15-minute time slot of the observation, while the observation time is the actual time range during which data was being recorded (shifted from the slot time because of additional hardware movement or calibration).
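For illustration, the three candidate ranges in the question above can be written down side by side. This is a minimal Python sketch using the example clock times from this comment on an arbitrary, made-up date; the TimeRange class is hypothetical and not part of Satpy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimeRange:
    """A closed time interval [start, end] (hypothetical helper)."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start

    def contains(self, other: "TimeRange") -> bool:
        """True if ``other`` lies entirely inside this range."""
        return self.start <= other.start and other.end <= self.end


# Made-up date; the clock times are the ones quoted in the comment.
day = datetime(2022, 1, 1)
scheduled_start = day.replace(hour=12, minute=3)  # the "pretty" 12:03:00
repeat_cycle = TimeRange(
    day.replace(hour=12, minute=3, second=15, microsecond=444000),
    day.replace(hour=12, minute=18, second=15, microsecond=444000),
)
observation = TimeRange(
    day.replace(hour=12, minute=3, second=33, microsecond=677000),
    day.replace(hour=12, minute=15, second=24, microsecond=323000),
)

print(repeat_cycle.duration)               # 0:15:00
print(observation.duration)                # 0:11:50.646000
print(repeat_cycle.contains(observation))  # True
```

The observation range sits strictly inside the repeat cycle range, which is the gap the SEVIRI discussion is about.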
I feel like we're slowly converging on the concepts, with still some nomenclature misunderstandings. Maybe with a full example what I mean becomes more clear: In the SEVIRI case, we end up with 3 times
the next image could have
So the only pretty time is the In the first comments, I think, what I call here "slot" time was being referred to as "scheduled". My whole point is that the word "scheduled" can be misunderstood to refer to the (quite useless) planning time as described above. Using Maybe I'm overthinking this as I'm too influenced by the specific nomenclature and timing information of SEVIRI. If this still doesn't make sense, I'll also be able to sleep OK with "scheduled" 😄
@ameraner Makes sense. I agree that the "planned time" is not something most of us probably deal with, and it was not something I had considered. I'm going to ask around among the GOES-R folks and see what other terminology is used for these kinds of things. Maybe we can land on something that is clearer than "scheduled" time. One last question: I assume that all SEVIRI file formats provide all 3 time ranges you've talked about?
The observation start/end times and the scheduled end time are explicitly in the files. The planned start time is not given (because by the time the file is created, the "true" observation start time is known). The repeat cycle times are not there either; we always have to calculate them manually by rounding the observation times.
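Since the repeat cycle times have to be derived by rounding, that step could look roughly like the Python sketch below. It assumes 15-minute SEVIRI repeat cycles aligned to midnight (slots at :00, :15, :30, :45) and floors rather than rounds to nearest; the function name is hypothetical, not an existing Satpy or reader helper.

```python
from datetime import datetime, timedelta

# Assumption: SEVIRI repeat cycles are 15 minutes long and aligned to midnight
# (slots start at :00, :15, :30 and :45).
REPEAT_CYCLE = timedelta(minutes=15)


def repeat_cycle_range(obs_start: datetime, cycle: timedelta = REPEAT_CYCLE):
    """Return (start, end) of the repeat cycle that contains ``obs_start``.

    The observation start time is floored (not rounded to nearest) to the
    cycle boundary, so an observation starting a few seconds late still maps
    to its own slot.
    """
    midnight = obs_start.replace(hour=0, minute=0, second=0, microsecond=0)
    n_cycles = (obs_start - midnight) // cycle  # timedelta // timedelta -> int
    start = midnight + n_cycles * cycle
    return start, start + cycle


# Made-up observation start a few seconds after the nominal 12:00 slot.
start, end = repeat_cycle_range(datetime(2022, 1, 1, 12, 0, 9, 500000))
print(start.time(), end.time())  # 12:00:00 12:15:00
```

Flooring is the safer choice here because the observation start always lags the slot start; rounding to nearest would only misbehave if that lag exceeded half a cycle, but flooring avoids the question entirely.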
The repeat cycle
So I brought this up with some NOAA folks and some other people at SSEC, and the general feeling was that "slot" by itself is a confusing word since people tend to think of orbital slots of satellites. I got generally positive feedback for "scheduled" for the human-friendly repeat cycle time. Some people pointed out that the difference between the "planned" time and the observation time is so small that it won't have an effect on any data analysis (that's my phrasing/understanding). It was also brought up that the difference between a scheduled time and an observation time, even for angle generation, isn't going to make a huge difference in generating pretty pictures. If the SEVIRI files don't have the "planned" time in them, then I say we go with "scheduled_start_time" and "observation_start_time" (and their "end" counterparts). @ameraner, you had mentioned that you could at least sleep OK with that decision. We still need to think about "filename" versus something else (@mraspaud, have any other ideas come up in the last couple of days?). Another related piece of metadata we could consider including is the "repeat cycle" as a timedelta for the duration of the scheduled time, but I suppose that's redundant.
By
I tend to use @ameraner said this:
Off topic, but I don't think that's totally correct. At least for the native L1.5 data there's the Regarding @djhoese's comments:
That's true for the NOAA sats but might not be true for other sats. INSAT-3DR, for example, often has a minute or two gap between planned and actual times. Point being, if we have both planned and actual times listed in the file, we should also include them in the attributes from the reader whenever possible. On the substance of the discussion: I like
Doesn't it? None of the image data in question was recorded/observed after
I've started playing around with this idea with AHI HSD to see how it affected performance. Looks like using scheduled time for
Interestingly, I also tried setting
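The performance argument for scheduled times comes down to caching: if the angle calculation is memoized on the time it is given, bands that share one scheduled time reuse a single result, whereas per-band observation times force one calculation per band. Below is a toy Python sketch of that effect; compute_angles is a hypothetical stand-in that only counts how often it runs (not Satpy's real angle code), and the band times are made up.

```python
from datetime import datetime
from functools import lru_cache

calls = 0


@lru_cache(maxsize=None)
def compute_angles(area_name: str, time: datetime) -> str:
    """Hypothetical stand-in for an expensive solar/sensor angle calculation."""
    global calls
    calls += 1
    return f"angles for {area_name} at {time:%H:%M:%S}"


# band name -> (observation start time, scheduled start time); made-up values
bands = {
    "B01": (datetime(2022, 1, 1, 12, 0, 20), datetime(2022, 1, 1, 12, 0)),
    "B02": (datetime(2022, 1, 1, 12, 0, 21), datetime(2022, 1, 1, 12, 0)),
    "B03": (datetime(2022, 1, 1, 12, 0, 23), datetime(2022, 1, 1, 12, 0)),
}


def count_angle_computations(use_scheduled: bool) -> int:
    """Count real (uncached) angle computations needed for all bands."""
    global calls
    calls = 0
    compute_angles.cache_clear()
    for obs_time, sched_time in bands.values():
        compute_angles("full_disk", sched_time if use_scheduled else obs_time)
    return calls


print(count_angle_computations(use_scheduled=False))  # 3
print(count_angle_computations(use_scheduled=True))   # 1
```

The trade-off is the one raised in the thread: the scheduled time gives one shared result, at the cost of angles that are off by the few seconds or minutes between scheduled and observed times.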
This discussion will probably happen on Slack (as it has been so far today), but the current suggestions are:
Additionally, @mraspaud started the discussion on putting these time ranges into a sub-dictionary, similar to data_arr.attrs['time_parameters']['observation_start_time']. Anyone have a better name that
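As a rough picture of the sub-dictionary idea, the Python sketch below uses a plain dict standing in for data_arr.attrs. The nested key names follow the names floated in this thread and are proposals, not an existing Satpy convention; all times are made up. It also shows why a separate "repeat cycle" timedelta, mentioned earlier as probably redundant, can simply be derived.

```python
from datetime import datetime, timedelta

# Made-up times for one SEVIRI-like image.
obs_start = datetime(2022, 1, 1, 12, 0, 9)
obs_end = datetime(2022, 1, 1, 12, 12, 41)
sched_start = datetime(2022, 1, 1, 12, 0)
sched_end = sched_start + timedelta(minutes=15)

# Stand-in for data_arr.attrs; the nested key names are proposals from this
# discussion, not an existing Satpy convention.
attrs = {
    "start_time": obs_start,  # general-purpose range stays at the top level
    "end_time": obs_end,
    "time_parameters": {
        "scheduled_start_time": sched_start,
        "scheduled_end_time": sched_end,
        "observation_start_time": obs_start,
        "observation_end_time": obs_end,
    },
}

tp = attrs["time_parameters"]
print(tp["observation_start_time"])  # 2022-01-01 12:00:09
# Optional fields (e.g. scan times) may be missing for some readers.
print(tp.get("scan_start_time"))  # None
# The repeat cycle duration can be derived rather than stored as its own key.
print(tp["scheduled_end_time"] - tp["scheduled_start_time"])  # 0:15:00
```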
Feature Request
Is your feature request related to a problem? Please describe.
See #2010, #1461, #1384, and #473 for related discussions. In Satpy, we have standardized that readers should provide a start_time and optionally an end_time to define the time range of the data being loaded. However, this is often not the only piece of time information we have. For example, geostationary satellites that have a nominal schedule for their observations will have both the time the data was supposed to be recorded and the time it actually was recorded. This has become an issue in things like AHI HRIT/HSD performance, where the start_time was set to the observation time of the data and differed between each band. This results in things like solar and sensor angles being calculated separately for each band even though they represent the same "scene" of space/time. While it may be more accurate to use the observation time, better performance can be achieved if the scheduled/nominal time is used.
Describe the solution you'd like
I propose two things be added/changed to Satpy:
1. start_time and end_time as general definitions for the time range of the data, which will actually be one of the following time fields. Additionally, I think we should have a scheduled_start_time, scheduled_end_time, observation_start_time, and observation_end_time. I believe we already have some non-defined standard for scan times? That would be another good parameter to allow. We could also have a forecast_time or a model_time for model data to distinguish when the processing was run/started and what time it is forecasting for (I don't deal with this data much, so tell me when I'm wrong). I think as part of this we should say that start_time should be equal to observation_start_time when possible for consistency across readers and better performance.
2. A satpy.config parameter for telling angle generation what time field to use: a field like angle_time_reference, which can be set to the metadata key name for the time to use. It would default to start_time, but could be set to observation_start_time or scan_start_time. Thinking about this more, if the angle generation was updated to handle a range of times (like interpolated between start and end), then maybe this key should be either observation, scan, or scheduled, but always default to start_time/end_time.
Describe any changes to existing user workflow
Anyone using the reader metadata of start_time and end_time may see slight differences (hopefully only small) in their calculations. Otherwise, the workflow should be unchanged except for those users who care about the accuracy of the time.
Additional context
Keyword arguments to the readers would be an option, like in #1384, but I'm realizing now that this prevents all the information from being provided to the user, which is worse than choosing what information to provide.
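To make the proposed angle_time_reference setting more concrete, here is a minimal Python sketch of the range-based variant described in the proposal: the setting holds a prefix (observation, scan, or scheduled) and falls back to start_time/end_time when unset or when the reader lacks that field, and a single reference time is taken as the midpoint of the chosen range. The plain CONFIG dict stands in for satpy.config; angle_time_reference and both helper functions are hypothetical, and the midpoint is just one simple reading of "interpolated between start and end".

```python
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical stand-in for satpy.config; "angle_time_reference" is the
# proposed option, not an existing Satpy setting. None means "use the
# default start_time/end_time".
CONFIG = {"angle_time_reference": None}


def angle_time_range(attrs: dict, config: dict = CONFIG) -> Tuple[datetime, datetime]:
    """Return the (start, end) range that angle generation should use."""
    prefix: Optional[str] = config.get("angle_time_reference")
    if prefix is not None:
        # e.g. "observation" -> observation_start_time / observation_end_time
        start_key, end_key = f"{prefix}_start_time", f"{prefix}_end_time"
        if start_key in attrs and end_key in attrs:
            return attrs[start_key], attrs[end_key]
    # Default (and fallback when the reader lacks the field): general range.
    return attrs["start_time"], attrs["end_time"]


def angle_reference_time(attrs: dict, config: dict = CONFIG) -> datetime:
    """Collapse the chosen range to a single time at its midpoint."""
    start, end = angle_time_range(attrs, config)
    return start + (end - start) / 2


attrs = {
    "start_time": datetime(2022, 1, 1, 12, 0, 9),
    "end_time": datetime(2022, 1, 1, 12, 12, 41),
    "observation_start_time": datetime(2022, 1, 1, 12, 0, 9),
    "observation_end_time": datetime(2022, 1, 1, 12, 12, 41),
    "scheduled_start_time": datetime(2022, 1, 1, 12, 0),
    "scheduled_end_time": datetime(2022, 1, 1, 12, 15),
}

print(angle_reference_time(attrs))  # 2022-01-01 12:06:25
print(angle_reference_time(attrs, {"angle_time_reference": "scheduled"}))  # 2022-01-01 12:07:30
```

Using the scheduled range gives every band the same reference time, which is what makes cached angles reusable across bands.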