Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42306: Performance optimizations #942

Merged
merged 26 commits into from Jan 23, 2024
Merged

DM-42306: Performance optimizations #942

merged 26 commits into from Jan 23, 2024

Conversation

timj
Copy link
Member

@timj timj commented Jan 16, 2024

Depends on lsst/resources#76

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link

codecov bot commented Jan 16, 2024

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (b7211c8) 88.37% compared to head (f516a08) 88.39%.
Report is 1 commits behind head on main.

Files Patch % Lines
python/lsst/daf/butler/_config.py 80.00% 1 Missing and 1 partial ⚠️
python/lsst/daf/butler/_config_support.py 50.00% 1 Missing and 1 partial ⚠️
python/lsst/daf/butler/mapping_factory.py 77.77% 1 Missing and 1 partial ⚠️
python/lsst/daf/butler/_location.py 95.23% 0 Missing and 1 partial ⚠️
python/lsst/daf/butler/datastore/file_templates.py 97.77% 0 Missing and 1 partial ⚠️
python/lsst/daf/butler/datastores/fileDatastore.py 93.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #942      +/-   ##
==========================================
+ Coverage   88.37%   88.39%   +0.01%     
==========================================
  Files         303      303              
  Lines       38979    39070      +91     
  Branches     8221     8238      +17     
==========================================
+ Hits        34448    34535      +87     
- Misses       3339     3341       +2     
- Partials     1192     1194       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@timj timj force-pushed the tickets/DM-42306 branch 10 times, most recently from b16c8b9 to 81b8fc3 Compare January 17, 2024 23:36
@timj timj requested a review from TallJimbo January 18, 2024 15:59
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizations look great.

I'm not sure I understand how the transfer-from dry-run option works (maybe because I'm not sure precisely how the checking for existing datasets in the target butler works even without dry-run).

I do think we should drop the ThreadPoolExecutor commit for now.

python/lsst/daf/butler/_butler.py Show resolved Hide resolved
python/lsst/daf/butler/datastores/fileDatastore.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/datastores/fileDatastore.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/datastore/file_templates.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/_location.py Outdated Show resolved Hide resolved
If the relative path is coming directly from a datastore record
we know it can't be something that is trying to ../ out of the
datastore.

The safety check in the Location constructor has significant
overhead.

Although with client/server this might be less true.
This can lead to some small performance improvement when ResourcePath
does not need to check itself whether something is a file or not.
This makes it easier for code to copy a Location without having
to know if deepcopy should be used.
The .fields method was slow and also being called four times.
Now call a grouped fields method once that returns all the answers.
This reverts commit f212bf5.

We have decided that we are not read for the thread pool speedup.
@timj
Copy link
Member Author

timj commented Jan 22, 2024

I'm not sure I understand how the transfer-from dry-run option works (maybe because I'm not sure precisely how the checking for existing datasets in the target butler works even without dry-run).

It does all the work that it would have to do to do the transfer but it doesn't copy any files or write to the target butler. I just modified how it works such that it will report that N datasets have already been transferred rather than dry run assuming that it always has to transfer everything.

@timj timj force-pushed the tickets/DM-42306 branch 2 times, most recently from 5138c62 to cc7c1cb Compare January 23, 2024 16:07
Dry run is more useful if it really does report what it is going
to do rather than reporting that it might transfer all the files
to the target butler.
@timj timj force-pushed the tickets/DM-42306 branch 2 times, most recently from 4d5ec6a to f1578f7 Compare January 23, 2024 16:21
@timj timj force-pushed the tickets/DM-42306 branch 2 times, most recently from cc7657e to f516a08 Compare January 23, 2024 23:06
@timj timj merged commit 458dcd7 into main Jan 23, 2024
18 checks passed
@timj timj deleted the tickets/DM-42306 branch January 23, 2024 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants