Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul how we handle dates/times for maps #7673

Open
ayshih opened this issue Jun 13, 2024 · 7 comments · May be fixed by #7682
Open

Overhaul how we handle dates/times for maps #7673

ayshih opened this issue Jun 13, 2024 · 7 comments · May be fixed by #7682
Labels
Discussion An issue opened for, or undergoing discussion. 🔥 🔥 Urgent issue or fix map Affects the map submodule
Milestone

Comments

@ayshih
Copy link
Member

ayshih commented Jun 13, 2024

This is spun out from #7606. I propose an overhaul of how we handle dates/times for maps. To briefly summarize the FITS standard:

  • DATE-OBS is commonly used in older files, and usually (but not always!) refers to the start of the observation
  • DATE-BEG refers to the start of the observation
  • DATE-END refers to the end of the observation
  • DATE-AVG refers to the "average" time of the observation, and the standard is suggestive (but not explicit) that the observer metadata is coupled to this time. Also, the standard does not prescribe a specific way to calculate this average.

To briefly summarize our current code, GenericMap has the following properties:

  • .date_start, which is just DATE-BEG
  • .date_end, which is just DATE-END
  • .date_average, which is DATE-AVG (if present), or falls back to the arithmetic mean of DATE-BEG and DATE-END (if present)
  • .date, which is in order of decreasing priority: DATE-OBS > .date_average > .date_start > .date_end > the current time

Issues:

  • The .wcs property is instantiated with only one of the possible times, with .wcs.wcs.dateobs set to .date, and the other times are not preserved. Also, since the coordinate frame constructed from this WCS uses .date, it may not be accurate if DATE-OBS is present but is not the same time as the observer metadata (e.g., DATE-AVG).
  • The .observer_coordinate property is instantiated with its time as .date, and as above, that may not be accurate if DATE-OBS is present but is not the same time as the observer metadata (e.g., DATE-AVG).

Proposed overhaul plan:

  1. Re-define .date_average to be the reference time for observer metadata. The new logic would be in order of decreasing priority: DATE-AVG > .date. We should remove the assumption that the arithmetic mean of DATE-BEG and DATE-END is the correct way to construct the average time.
  2. Re-define .date to be the "canonical" time to refer to an observation, which is commonly the start of the observation, and would be the time that, e.g., is used in plot titles. The new priority would be DATE-OBS > .date_start > DATE-AVG > .date_end > the time that sunpy.map was first imported in the current session (see Rotating a map results in a different date #5543 (comment)).
  3. Re-define .observer_coordinate so that it uses .date_average instead of .date
  4. Re-define .wcs so that it populates the following using properties, which can be overridden by source subclasses:
  • .wcs.wcs.dateobs with .date (note that this will never be None)
  • .wcs.wcs.dateavg with .date_average (note that this will never be None)
  • .wcs.wcs.datebeg with .date_start if not None
  • .wcs.wcs.dateend with .date_end if not None
  1. Define our WCS->frame mapping so that it prioritizes .wcs.dateavg over .wcs.dateobs. This is actually already the case, but I list it for completeness:
    dateobs = wcs.wcs.dateavg or wcs.wcs.dateobs or None

Considerations:

  1. If we do the above, we might want to change the name of .date_average, and potentially of .date, to make it more clear what they mean and to enable some deprecation.
  2. There would be the potential confusion that .observer_coordinate.date would be equal to .date_average, and potentially different from .date..
@ayshih ayshih added map Affects the map submodule Discussion An issue opened for, or undergoing discussion. labels Jun 13, 2024
@Cadair
Copy link
Member

Cadair commented Jun 14, 2024

I am ok with this, but I am not convinced by:

We should remove the assumption that the arithmetic mean of DATE-BEG and DATE-END is the correct way to construct the average time.

I know there are potentially other ways of populating DATE-AVG but this seems like a sensible default? Are you suggesting removing it because of falling back to .date for the reference time?

I am not really clear on why you would have -beg and -end but not -avg, and if you do have both then I don't know why we would choose -beg as the reference time and not -end surely in that case picking the center point is the least-wrong & least-correct option? 😆

@ayshih
Copy link
Member Author

ayshih commented Jun 15, 2024

Small post before a big post...

I know there are potentially other ways of populating DATE-AVG but this seems like a sensible default?

I agree that if we had to have a default, then the midpoint of DATE-BEG and DATE-END would be sensible. My inclination is that providing sensible defaults is the realm of source-specific subclasses as opposed to the standards-based GenericMap.

One could make an analogous argument that .date_start, in the absence of DATE-BEG/DATE-AVG/DATE-END, ought to fall back to DATE-OBS, since that's almost certainly correct for the start of the observation. Again, it doesn't seem to me like something that GenericMap should do.

@ayshih
Copy link
Member Author

ayshih commented Jun 15, 2024

My proposal above is a little knotted because I was desperately trying to avoid adding a new date property. The consequence is that .date_average has to serve the double duty of both being the prosaic "average" date of the observation and the critical reference date of the observer metadata. This means that .date_average has to default to something for our machinery to work, and that default can get weird because it is more important for observer location that it matches a date in the header than that it is consistent with a common notion of the "average" date.

If we can stomach adding a new date property, the proposal becomes cleaner and – importantly – straightforward to document.

Proposed alternate overhaul plan:

  1. Define a new date property .observer_date (name suggestions welcome) to be the reference date for observer metadata. The logic would be DATE-AVG > DATE-OBS > DATE-BEG > DATE-END > .date. It is intentional that none of the (potentially overridden) .date_* properties are given priority over header keywords. A source subclass needs to override this specific property, and not just one of the .date_* properties, to force a date that is not in the header.
  2. Re-define .date to be the "canonical" time to refer to an observation, which is commonly the start of the observation, and would be the time that, e.g., is used in plot titles. The new priority would be DATE-OBS > .date_start > .date_average > .date_end > the time that sunpy.map was first imported in the current session (see Rotating a map results in a different date #5543 (comment)).
  3. Re-define .observer_coordinate so that it uses .observer_date instead of .date
  4. Re-define .wcs so that it populates the following using properties, which can be overridden by source subclasses:
  • .wcs.wcs.dateobs with .date (note that this will never be None)
  • .wcs.wcs.dateavg with .observer_date (note that this will never be None)
  • .wcs.wcs.datebeg with .date_start if not None
  • .wcs.wcs.dateend with .date_end if not None
  1. Define our WCS->frame mapping so that it prioritizes .wcs.dateavg over .wcs.dateobs. This is actually already the case, but I list it for completeness:
    dateobs = wcs.wcs.dateavg or wcs.wcs.dateobs or None

Considerations:

  • No change needs to be made to the existing .date_average property (although I'm still inclined to remove the default computation)
  • It is unintuitive that .wcs.wcs.dateavg is populated with .observer_date rather than .date-average, but that's how to get observer_date into the machinery. If DATE-AVG is defined, then they will be equal (barring overriding). If DATE-AVG is not defined, .observer_date may return something that is not what one might anticipate for the "average" date, but since DATE-AVG is not actually defined, the return can't actually be formally wrong.

@Cadair Cadair added the 🔥 🔥 Urgent issue or fix label Jun 19, 2024
@Cadair Cadair added this to the 6.0.0 milestone Jun 19, 2024
@Cadair
Copy link
Member

Cadair commented Jun 19, 2024

I agree with all of this apart from populating wcs.dateobs with .date under all situations. I think we are introducing ambiguities into the WCS object if we populate the (old, effectively deprecated, and arguably shouldn't be used any more) dateobs property if it's not in the header.

I am going to start implementing this with wcs.wcs.dateobs = self.meta.get("DATE-OBS"), and then rely on dateavg to always be set as the canonical date.

@Cadair Cadair linked a pull request Jun 19, 2024 that will close this issue
@Cadair
Copy link
Member

Cadair commented Jun 20, 2024

In the process of working on this I am now wondering why we are prioritising DATE-OBS over DATE-BEG for the .date property. I feel like we did this initially when adding FITS 4 support because we didn't want to break things, but given we are here breaking things it feels like we really should be prioritising the new well defined keys over the older one. My proposal for .date would then be:

  • DATE-BEG
  • DATE-OBS
  • DATE-AVG
  • DATE-END

I would put -OBS at the end but I can see someone somewhere making a stupid file with -AVG and -OBS but not -BEG


Edit:

On Albert's prompting I went and had another look over the specs. I assume @ayshih is recommending we put DATE-OBS first as it should normally be the start time, but there could be a situation where you have specific reason for it not to be:

The FITS 4 spec section 4.4.2.2 states:

The value of DATE-OBS shall be assumed to refer to the start of an observation, unless another interpretation is clearly explained in the comment field.

I would note however that the time paper says:

DATE-OBS is already defined in Sect. 4.4.2.2 of the FITS Standard. It is not specifically defined as the start time of the observation and has also been used to indicate some form of mean observing date and time. In order to specify a start date and time unambiguously one should use: [...] DATE-BEG, DATE-AVG, DATE-END

Given the ambiguity in both the historical definition of DATE-OBS and deliberate definition of how to calculate DATE-AVG I guess by prioritising DATE-OBS we are allowing a very very niche use case where your coordinate system is defined using some value of DATE-AVG but your "canonical" observation time isn't the start of your observation or your AVG time either.

@ayshih
Copy link
Member Author

ayshih commented Jun 25, 2024

I guess by prioritising DATE-OBS we are allowing a very very niche use case where your coordinate system is defined using some value of DATE-AVG but your "canonical" observation time isn't the start of your observation or your AVG time either.

It doesn't matter how niche the use case is. The FITS standard lays out DATE-OBS as the place to record the canonical observation time. The other DATExxxx keywords aren't even discussed in Section 4.4.2.2. Both the FITS standard and the time paper also quite reasonably say that DATE-BEG is the place to unambiguously record the start of the observation time, but that doesn't have to be the same as the canonical observation time. When DATE-OBS is equal DATE-BEG, priority is moot. When DATE-OBS is not equal to DATE-BEG, we should trust the data source to have a good reason to specify DATE-OBS that way and prioritize it over DATE-BEG.

As a toy example for why a date file would have this, imagine an imager that produces an integrated image product every UTC minute from summing individual frame captures that are a few seconds long. Because the frame-capture times may not be exactly synchronized to UTC, you can conceivably end up keywords with values like these:

  • DATE-OBS: 2024-Jun-25 12:07:00 (with a corresponding descriptive comment for why it is not the literal start of the observation)
  • DATE-BEG: 2024-Jun-25 12:06:59
  • DATE-AVG: 2024-Jun-25 12:07:30
  • DATE-END: 2024-Jun-25 12:08:02

Thus, the canonical observation time (exactly on the UTC minute) is not the same as the literal start of the observation. Also, DATE-AVG here is the halfway point of the canonical minute, and is not exactly equal the arithmetic mean of DATE-BEG and DATE-END. As far as I'm concerned, such keyword values conform to the FITS standard, and .date should return DATE-OBS and not DATE-BEG.

@Cadair
Copy link
Member

Cadair commented Jun 25, 2024

Makes sense to me, gosh this is complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion An issue opened for, or undergoing discussion. 🔥 🔥 Urgent issue or fix map Affects the map submodule
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants