Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding schema for Source Catalog #374

Closed
wants to merge 10 commits into from

Conversation

bmorris3
Copy link
Contributor

@bmorris3 bmorris3 commented Feb 14, 2024

(work in progress)

This PR implements a basic schema for the output from the Source Catalog Step being developed in spacetelescope/romancal#1102, with data models implemented in spacetelescope/roman_datamodels#317.

Checklist

Copy link

codecov bot commented Feb 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.38%. Comparing base (0fb1239) to head (6bf137c).
Report is 7 commits behind head on main.

❗ Current head 6bf137c differs from pull request most recent head 25c146c. Consider uploading reports for the commit 25c146c to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #374   +/-   ##
=======================================
  Coverage   95.38%   95.38%           
=======================================
  Files           4        4           
  Lines         195      195           
=======================================
  Hits          186      186           
  Misses          9        9           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@WilliamJamieson WilliamJamieson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial comments

src/rad/resources/schemas/source_catalog-1.0.0.yaml Outdated Show resolved Hide resolved
datamodel_name: SourceCatalogModel
archive_meta: None
type: object
properties:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General questions: Since this is a data model should it have a meta keyword? If so what metadata should be included.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know yet – I'll follow up when I get directions from RTB.

@schlafly
Copy link
Collaborator

Re possibly including metadata, when I make fits catalogs for other surveys, what I do is include the image header as the FITS header for the catalog. You could also imagine including the step arguments so as to record some information about how the catalog was created.

For FITS the headers are usually pretty simple, but the asdf meta area can be more extensive so it's less clear to me that this makes as much sense there. It's also nice if the files are easy to open; I'm not sure what number of dependencies one needs to bring in to open a file that contains one of our image headers.

@bmorris3
Copy link
Contributor Author

In our last meeting, @schlafly checked if there was any downside to copying (parts of?) metadata from the L3 image used to produce the source catalog. @WilliamJamieson confirmed that ASDF would not complain if for example, GWCS was not installed, when loading a source catalog containing a copy of the metadata from its L3 image.

So it sounds like we're likely to include some kind of metadata, though we don't know which yet.

@bmorris3 bmorris3 marked this pull request as ready for review February 19, 2024 18:50

title: Source catalog generated by the Source Catalog Step.
datamodel_name: SourceCatalogModel
archive_meta: None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

archive_meta is used to indicate this is the entry point for storing data in the db (e.g. wfi_image-1.0.0.yaml, guidewindow-1.0.0.yaml, etc.) Is this such a file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The honest answer is: I don't know. I'm waiting on instructions from RTB on these specs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdupriestsci The source catalogs and the segmentation maps are officially a data product that needs to be stored in the archive. They are new products we are developing now but "at the level of wfi_image". Does this answer your question? What would be the value of this keyword then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nden I'm not sure what you mean by "at the level of wfi_image". If they are their own files and not part of the wfi_image file then having the archive_meta: None is correct (we are not yet assigning a value to this keyword yet, we're just checking for its existence.

Please note that we don't have filetypes for Segmentation Maps or Source Catalogs in the archive (unless they're going under a different name), so this will require coordination with the db (either Azure or Oak, depending on who's doing it).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdupriestsci Sorry for the bad terminology. By "at the same level as wfi_image" I meant that these will be separate files. And thanks for clarifying the correct value in this case is None.

@ddavis-stsci @schlafly Could you coordinate with the rest of the team the work on the filetypes in the DB?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should bring this up in the RAD group but I think the correct filetype will be "Science WFI Level 4", FileTypeID=64.

Copy link
Collaborator

@nden nden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote what I think the initial metadata for these two schemas should be.
We can change/improve the metadata when we have more information.

CHANGES.rst Outdated Show resolved Hide resolved
src/rad/resources/schemas/segmentation_map-1.0.0.yaml Outdated Show resolved Hide resolved
exact_datatype: true
meta:
anyOf:
- $ref: common-1.0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need all of common here. In the absence of direction what to do I will list what I think we need to add to the metadata.

  • remove common

- type: object
properties:
filename:
type: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under properties add

  • basic
  • coordinates
  • wfi_optical_element
  • program
  • visit (if a visit level catalog)

remove filename as it is included in basic


title: Source catalog generated by the Source Catalog Step.
datamodel_name: SourceCatalogModel
archive_meta: None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdupriestsci The source catalogs and the segmentation maps are officially a data product that needs to be stored in the archive. They are new products we are developing now but "at the level of wfi_image". Does this answer your question? What would be the value of this keyword then?

- type: object
properties:
segmentation_map:
type: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should follow what the segmentation_map schema includes.

@nden nden requested review from WilliamJamieson and schlafly and removed request for WilliamJamieson February 26, 2024 14:39
@schlafly
Copy link
Collaborator

I think the easiest approach here is to literally duplicate the corresponding image metadata into the catalog metadata. i.e., catalog.meta = image.meta.

This is what we do, e.g., for unWISE and DECaPS (albeit with FITS files).

I think the only danger here is when the metadata gets complicated. e.g., right now in the romancal L2 files we store tweakreg_catalog in the metadata. That obviously gets circular and bad; I think we should instead store that outside the metadata. This also means that there will be things like stnode.Filename and gWCS objects in the metadata. If that makes the asdf files less accessible to general users that would be a problem.

But conceptually this is duplicating metadata that we have decided is important to understand the image and so it applies equally well to understanding the catalog. Especially for the L3s where the amount of metadata is presently modest this makes a lot of sense to me. For the L2s where the metadata currently contains a lot of cruft, we'll be copying that cruft over, but I'd propose we think about pruning that cruft from the L2s rather than pruning it when selecting a subset of metadata to copy from the L2s.

@nden nden added this to the Build 14 milestone Feb 26, 2024
@nden
Copy link
Collaborator

nden commented Feb 27, 2024

Of course this is the schema for the files so you can't really assign catalog.meta = image.meta but need to define them. Might as well figure out which parts make sense to be included.

@schlafly
Copy link
Collaborator

I'm not good at reading rad schemas---I think we should try to follow the image metadata structure if we do this, and I think I read this schema as saying e.g. that the engineering_quality will be accessible as meta.engineering_quality. Meanwhile for the images it'sat meta.visit.engineering_quality.

For L2 catalogs, the most important metadata to bring over is the photometry table that does the unit conversions; let's bring that as well. But for L3 catalogs we don't need that. Optional?

The next most important thing is the WCS in my opinion, so let's copy that too.

Then it becomes deeply unclear to me what we need. 'instrument' for the detector and 'exposure' for the times would cover most things. Those are both L2s that we would make optional?

Another approach would just be to do something like one metadata keyword "image_filename" which is the filename of the image that the catalog was run on, and another metadata keyword "image_metadata" that is just given as an "object" and is identically the L2 / L3 image schema copied over, trying to avoid making a new set of metadata we need to separately track and copy...

description: |
Photometry and astrometry computed in the Source Catalog Step.
tag: tag:astropy.org:astropy/table/table-1.*
meta:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please put meta above the binary contents for both this and segmentation_map-1.0.0.yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, order matters?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter code-wise (nor in the end file, you.. you set that order below). More about readability of schemas.

@nden
Copy link
Collaborator

nden commented Mar 29, 2024

Superseded by #393

@nden nden closed this Mar 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants