-
Notifications
You must be signed in to change notification settings - Fork 0
DM-49822: Remove Butler from upload_images. #275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/tester/upload_from_repo.py
Outdated
for visit in visit_list: | ||
group = increment_group(instrument, group, 1) | ||
refs = prepare_one_visit(kafka_url, group, butler, instrument, visit) | ||
uris = [butler.getURI(ref) for ref in refs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory butler.get_many_uris
is faster than calling getURI in a loop (for direct butler it will do a single database query).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use something like that, but I didn't see that in (daily) pipelines.lsst.io. Did something break the docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it's in the base class. A shame it doesn't get propagated to the derived.
Since this ticket is better practice and works, are we ok to (rebase and then) merge to main? Or is there a way we can wrangle |
|
||
path = transferred[0].path | ||
path = os.path.join(temp_dir, uri.basename()) | ||
ResourcePath(path).transfer_from(uri, transfer="copy") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure what this code is trying to do, but it looks like it's downloading a file from remote to posix path, optionally extracting a sidecar json, and then uploading the raw it to a new location. Is that correct?
Why isn't this doing a direct transfer from one location to another using ResourcePath? Secondly, why isn't the JSON header extracted in the remote using astropy.io.fits with ResourcePath.to_fsspec? Would simplify the code a lot (and use ResourcePath.write without boto calls).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a direct transfer because it needs to modify some headers in the files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we modifying the headers of raw files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are giving them new group ID, new exposure ID, etc to fake new exposures. The script sends a next_visit event with that metadata, and the prompt service will attempt to process the matching exposure (matching via the headers).
No description provided.