-
Notifications
You must be signed in to change notification settings - Fork 0
DM-38066: Reconcile /repo/embargo as the central repo #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f05f3a5
to
8ddc400
Compare
The CHAINED collections used to be added manually to the export.yaml. This commit moves it to the script so that it is done in one go. This also refreshes the test data repo.
868333e
to
0773d23
Compare
0773d23
to
dff55ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I'm most concerned about process_test.py
; some clarification on why we can't upload to the development service would be helpful.
python/tester/process_test.py
Outdated
args = _make_parser().parse_args() | ||
butler = Butler(args.central_repo) | ||
inst_name = args.inst | ||
local_storage = os.path.join("/lscratch", os.getlogin(), "tmp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The system- (and account-?) specific nature of this expression makes me very nervous. Why not just use tempdir
?
python/tester/process_test.py
Outdated
os.environ["DB_APDB"] = "postgres" | ||
os.environ["USER_APDB"] = "postgres" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate that the use of Postgres is hardcoded into MiddlewareInterface
. That should be fixable once DM-36772 is fixed, but the latter ticket is a low priority.
In the meantime, how do you ensure that this database exists and does not conflict with other users?
python/tester/process_test.py
Outdated
|
||
collection = interface.instrument.makeDefaultRawIngestRunName() | ||
_log.debug("Use query %s on %s", args.data_query, butler) | ||
data_ids = butler.registry.queryDataIds( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be queryDatasets
if you're looking specifically for raws? I think all the processing below would be simpler with datasetRef
as well.
python/tester/process_test.py
Outdated
detector=data_id["detector"], | ||
instrument=inst_name, | ||
) | ||
_log.debug("Process %s", data_id.full) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long is data_id.full
? I think just printing data_id
should give all the information that's actually human-friendly.
Otherwise butler complains the sizes do not match.
eebe859
to
ed8298c
Compare
In the foreseeable future, templates are instrument-dependent and we do not plan to build multi-instrument coadds as generic templates for now. The collection name is changed from "templates" to "<instrument>/templates". The template collection name in the test repo is updated too.
The central repo may have more data than we want, including non-production data. Instead of exporting all CALIBRATION collections from the central repo, we only want those in the default calibration chain. As of Feb 2023, exporting a `~CollectionType.CHAINED` collection does not automatically export its child collections. Also, there can be CHAINED collections inside CHAIEND collections. Instead of picking collections and re-constructing the chain, we will just preserve the entire collection structure, and export them all despite that some may not carry data of interest.
9a642f9
to
63c3334
Compare
Extra collections in the chain in the central repo used to cause MissingCollectionError in preparing the local butler if some collections are not necessary for the incoming visit. Here we just export the entire default chain even if some collections are empty and not useful. Typically the default chain include refcats, calibration, skymap, etc.
While we can ensure that the templates' collection exists in the central repo, it may not be chained the default collection. Currently that is the case in most shared repos. So we export it separately and chain it in the local repo. This is a workaround until the prompt processing has its own default top-level collection in the shared repos.
There can be multiple skymaps in the central repo and all skymaps are in the "skymaps" collection. We need to specify which skymap to use.
This reference catalog is used to calibrate AuxTel imaging surveys; see RFC-819 and DM-33444.
While in most cases, having 0 templates is troublesome, we may sometimes want to use the prompt processing system to run, for example, just ISR without needing templates. Give a warning instead of raising an error when no templates are found.
63c3334
to
e44f285
Compare
With the caveats about the test script process_test.py, I decided to remove it from this PR. It mocks up things that may not be used and is very confusing. And a better way to actually test processing AuxTel data with the system will be possible after DM-38269. |
No description provided.