hsinfang
Collaborator

@hsinfang hsinfang commented Mar 2, 2023

No description provided.

The CHAINED collections used to be added manually to the export.yaml.
This commit moves that step into the script so that it is done in one go.
This also refreshes the test data repo.
@hsinfang hsinfang force-pushed the tickets/DM-38066 branch 3 times, most recently from 868333e to 0773d23 Compare March 8, 2023 00:31
@hsinfang hsinfang marked this pull request as ready for review March 8, 2023 00:33
@hsinfang hsinfang requested a review from kfindeisen March 8, 2023 19:15
Member

@kfindeisen kfindeisen left a comment

Looks good! I'm most concerned about process_test.py; some clarification on why we can't upload to the development service would be helpful.

args = _make_parser().parse_args()
butler = Butler(args.central_repo)
inst_name = args.inst
local_storage = os.path.join("/lscratch", os.getlogin(), "tmp")
Member

The system- (and account-?) specific nature of this expression makes me very nervous. Why not just use tempdir?
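A minimal sketch of the `tempfile` alternative, offered as an illustration rather than the script's actual code. The helper name `make_local_storage` and the `prompt-processing-` prefix are hypothetical. `tempfile` chooses a unique directory under the platform's temp location (honoring `TMPDIR`), so nothing depends on `/lscratch` existing. It also avoids `os.getlogin()`, which the Python docs note can fail when there is no controlling terminal.

```python
import os
import tempfile


def make_local_storage(prefix="prompt-processing-"):
    """Create a private scratch directory for the local repo.

    Hypothetical helper: mkdtemp returns a fresh, uniquely named directory
    that is readable and writable only by the current user.
    The caller is responsible for deleting it.
    """
    return tempfile.mkdtemp(prefix=prefix)


# Alternatively, a context manager removes the directory automatically.
with tempfile.TemporaryDirectory(prefix="prompt-processing-") as local_storage:
    # ... build the local butler repo under local_storage here ...
    assert os.path.isdir(local_storage)
```

The context-manager form is usually preferable for a test script, because the scratch repo is removed even if processing raises.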

Comment on lines 76 to 77
os.environ["DB_APDB"] = "postgres"
os.environ["USER_APDB"] = "postgres"
Member

@kfindeisen kfindeisen Mar 8, 2023

It's a bit unfortunate that the use of Postgres is hardcoded into MiddlewareInterface. That should be fixable once DM-36772 is fixed, but the latter ticket is a low priority.

In the meantime, how do you ensure that this database exists and does not conflict with other users?
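One partial answer, sketched under assumptions: patch the environment only for the duration of the run with `unittest.mock.patch.dict`, so the hardcoded values never leak into the rest of the process. Use a per-user database name so that concurrent users do not share one APDB. The `apdb_<user>` naming scheme and the helper `apdb_environment` are hypothetical. Nothing in this thread confirms that MiddlewareInterface accepts an arbitrary `DB_APDB` value. This sketch also does not create the database.

```python
import getpass
import os
from unittest import mock


def apdb_environment(user=None):
    """Return APDB-related environment settings for a test run.

    Hypothetical naming scheme: a per-user database name keeps two
    developers from writing to the same APDB on a shared server.
    """
    user = user or getpass.getuser()
    return {"DB_APDB": f"apdb_{user}", "USER_APDB": "postgres"}


# patch.dict restores os.environ on exit, even if processing fails.
with mock.patch.dict(os.environ, apdb_environment("alice")):
    assert os.environ["DB_APDB"] == "apdb_alice"
```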


collection = interface.instrument.makeDefaultRawIngestRunName()
_log.debug("Use query %s on %s", args.data_query, butler)
data_ids = butler.registry.queryDataIds(
Member

Shouldn't this be queryDatasets if you're looking specifically for raws? I think all the processing below would be simpler with DatasetRef objects as well.
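A sketch of the suggestion, assuming the `Registry.queryDatasets` interface of `lsst.daf.butler` from that period. That interface takes a dataset type name plus `collections=` and `where=`, and yields `DatasetRef` objects that carry a `dataId`. The helper `find_raw_refs` is hypothetical. The registry is passed in so that the logic can be exercised without a real repository.

```python
import logging

_log = logging.getLogger(__name__)


def find_raw_refs(registry, collection, where=""):
    """Return the raw datasets in ``collection`` that match ``where``.

    Hypothetical helper. ``registry`` is assumed to behave like
    ``lsst.daf.butler.Registry``, whose ``queryDatasets`` yields
    ``DatasetRef`` objects. Each ref already identifies one raw dataset,
    so no separate data-ID-to-dataset lookup is needed afterwards.
    """
    refs = list(registry.queryDatasets("raw", collections=collection, where=where))
    for ref in refs:
        _log.debug("Found raw %s", ref.dataId)
    return refs
```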

detector=data_id["detector"],
instrument=inst_name,
)
_log.debug("Process %s", data_id.full)
Member

How long is data_id.full? I think just printing data_id should give all the information that's actually human-friendly.

Otherwise butler complains the sizes do not match.
hsinfang added 2 commits March 9, 2023 16:27
For the foreseeable future, templates are instrument-dependent, and
we do not plan to build multi-instrument coadds as generic templates.

The collection name is changed from "templates" to "<instrument>/templates".

The template collection name in the test repo is updated too.
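For illustration only, a hypothetical helper that builds the per-instrument template collection name described above. `LATISS` is just an example instrument short name. The rejection of empty or slash-containing names is an assumption of this sketch, not a rule stated in the PR.

```python
def template_collection(instrument: str) -> str:
    """Return the per-instrument template collection, e.g. "LATISS/templates".

    Hypothetical helper; rejects names that would produce a malformed path.
    """
    if not instrument or "/" in instrument:
        raise ValueError(f"Invalid instrument short name: {instrument!r}")
    return f"{instrument}/templates"
```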
The central repo may have more data than we want, including
non-production data. Instead of exporting all CALIBRATION collections
from the central repo, we only want those in the default calibration
chain.

As of Feb 2023, exporting a `~CollectionType.CHAINED` collection
does not automatically export its child collections. Also, there can
be CHAINED collections inside CHAINED collections. Instead of picking
collections and re-constructing the chain, we will just preserve the
entire collection structure and export all of it, even though some
collections may not carry data of interest.
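A small traversal makes the nested-chain point concrete. The sketch assumes two lookups passed in as callables, so it runs without a butler: one that returns a collection's type name, and one that returns a CHAINED collection's children. In `lsst.daf.butler` these roughly correspond to `Registry.getCollectionType(name).name` and `Registry.getCollectionChain(name)`; treat that mapping as an assumption. The function collects every collection reachable from the default chain, including those inside nested chains, so that all of them can be exported with the structure intact.

```python
from typing import Callable, List, Sequence


def collect_chain(
    root: str,
    get_type: Callable[[str], str],
    get_children: Callable[[str], Sequence[str]],
) -> List[str]:
    """Return ``root`` plus every collection reachable through CHAINED links.

    Depth-first, parents before children, in chain order. Each name is
    listed once, even if it appears in several chains.
    """
    order: List[str] = []
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        if get_type(name) == "CHAINED":
            # Push children reversed so the first child is popped first.
            stack.extend(reversed(list(get_children(name))))
    return order
```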
@hsinfang hsinfang force-pushed the tickets/DM-38066 branch 2 times, most recently from 9a642f9 to 63c3334 Compare March 14, 2023 13:18
Extra collections in the chain in the central repo used to cause
MissingCollectionError when preparing the local butler if some collections
were not needed for the incoming visit. Here we just export the
entire default chain, even if some collections are empty and not useful.
Typically the default chain includes refcats, calibrations, skymaps, etc.
While we can ensure that the templates collection exists in the
central repo, it may not be chained into the default collection.
Currently that is the case in most shared repos. So we export it
separately and chain it in the local repo.

This is a workaround until Prompt Processing has its own
default top-level collection in the shared repos.
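A sketch of the workaround's last step: the local repo's chain consists of the exported default chain plus the separately exported templates collection. The helper and collection names are hypothetical. The result could presumably be passed to `Registry.setCollectionChain` on the local repo, but that is an assumption about how the script applies it.

```python
from typing import List, Sequence


def local_chain_children(default_children: Sequence[str], templates: str) -> List[str]:
    """Hypothetical helper: children for the local repo's default chain.

    The templates collection is appended only if the central default
    chain does not already include it, so it is never listed twice.
    """
    children = list(default_children)
    if templates not in children:
        children.append(templates)
    return children
```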
There can be multiple skymaps in the central repo and
all skymaps are in the "skymaps" collection.
We need to specify which skymap to use.
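Because several skymaps share the `skymaps` collection, the skymap has to be named explicitly rather than inferred. Below is a hypothetical sketch of that check, with the available skymap names passed in. The helper name and the choice of `LookupError` are illustrative assumptions.

```python
from typing import Sequence


def choose_skymap(available: Sequence[str], requested: str) -> str:
    """Return ``requested`` if it is one of the ``available`` skymaps.

    Hypothetical helper: fails loudly instead of guessing when the
    configured skymap is absent from the "skymaps" collection.
    """
    if requested not in available:
        raise LookupError(
            f"Skymap {requested!r} not found; available: {sorted(available)}"
        )
    return requested
```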
This reference catalog is used to calibrate AuxTel imaging surveys;
see RFC-819 and DM-33444.
While having zero templates is troublesome in most cases, we
may sometimes want to use the prompt processing system to
run, for example, just ISR without needing templates. Emit
a warning instead of raising an error when no templates
are found.
@hsinfang
Collaborator Author

Given the caveats about the test script process_test.py, I decided to remove it from this PR. It mocks up things that may not be used and is very confusing. A better way to actually test processing AuxTel data with the system will be possible after DM-38269.

@hsinfang hsinfang merged commit 95a8c13 into main Mar 14, 2023
@hsinfang hsinfang deleted the tickets/DM-38066 branch March 14, 2023 15:26