DM-22599: Develop PipelineTask unit test framework #114
Conversation
The code has been cleaned up, and tests added.
It's hard to explicitly provide correct keys without understanding how the Butler dimensions system works in detail. Moving key constraints to automated (if simple-minded) code greatly reduces the burden on callers.
Each test should have its own collection for isolation, but creating a completely new repository each time is impractical.
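One hypothetical way to get per-test isolation without rebuilding the repository is to generate a unique run-collection name for each test and create only the collection, not the repo. The sketch below is illustrative only: the `unique_collection` helper and the use of `uuid4` are my assumptions, not code from this PR.

```python
import uuid

def unique_collection(prefix="test"):
    """Return a collection name that is vanishingly unlikely to collide.

    Illustrative sketch: in real code a name like this would be used as
    the run collection when constructing a Butler against the shared
    test repository (assumption, not code from this PR).
    """
    return f"{prefix}_{uuid.uuid4().hex}"

# Each call yields a distinct collection, so tests writing datasets
# into "their" collection cannot see each other's outputs.
names = {unique_collection() for _ in range(100)}
assert len(names) == 100
```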
The code has been cleaned up, and minimal tests added.
Is all the time in |
No, it's about 80% from |
Weird because for me running |
The daf_butler tests create a great many new butler repos, and only two tests take longer than one second. One is an S3 test that takes just over a second, and the other is a big registry test taking 7 seconds. If making a repository for tests had 5 seconds of overhead, the daf_butler tests would take an incredibly long time. What do you get if you run the daf_butler tests with |
Looks like I slightly overestimated the time for
While I'm at it, here are the finer-grained timings:
|
Well, I checked out the branch and ran the tests myself and I see:
so your machine is really really slow for some reason. |
Well, I know that my computer runs tests faster than Jenkins (at the level of individual packages), so I think there's still room for concern. |
Are you using a local SSD or an NFS mount? |
I do have a comment on the code itself. I really think that the test code for creating butlers and dataset types should be moved to daf_butler. daf_butler already has some of these functions in the helper packages inside the daf_butler tests directory, but it seems they need to be consolidated with your code here and moved to |
A suggestion from @timj on Slack (possibly redundant with the merge proposed above): use an in-memory SQLite database for the registry to speed up the Butler operations. |
Since the config is not being specified in your API, I think you can create your own Config to pass in:

```python
c = lsst.daf.butler.Config()
c["registry", "db"] = "sqlite:///:memory:"
Butler.makeRepo(root, config=c)
```
I can confirm that using an in-memory SQLite database can make things go tremendously faster. I discovered on a recent ticket that one of our Registry tests does an absurdly large number of inserts (playing with spatial indexing on regions covering ~half the sphere), and it was totally fine until I tried running that test against on-disk databases. |
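The effect is easy to demonstrate with plain `sqlite3`, independent of the Butler. This is an illustration of why per-commit disk syncs dominate, not a benchmark of daf_butler itself; the row counts and timings are arbitrary assumptions.

```python
import os
import sqlite3
import tempfile
import time

def time_inserts(db_path, n=500):
    """Insert n rows, committing each one, and return elapsed seconds."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    start = time.perf_counter()
    for i in range(n):
        conn.execute("INSERT INTO t (v) VALUES (?)", (str(i),))
        conn.commit()  # each commit forces a disk sync for file-backed DBs
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

mem = time_inserts(":memory:")
with tempfile.TemporaryDirectory() as d:
    disk = time_inserts(os.path.join(d, "test.sqlite3"))
print(f"in-memory: {mem:.3f}s, on-disk: {disk:.3f}s")
```

On most machines the in-memory run finishes far faster, because `:memory:` databases never touch the filesystem at all.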
Given the scope of this change, I'm closing this PR and will open a new one once the |
This PR adds a module, `lsst.pipe.base.tests`, with test utilities specific to `PipelineTask` subclasses. The intent of these tests is to enable unit testing of:

- whether `Connections` are correctly written and whether they match the inputs and outputs of the `run` method
- the `runQuantum` method

Note that, because it depends on a real Butler, the test code has some performance limitations: each call to `makeTestRepo` takes 4-6 seconds on my machine, and each call to `makeUniqueButler` takes an extra second.
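Given those per-call costs, the natural pattern is to pay the repository cost once per test class and do only cheap per-test setup. The sketch below shows that shape with a stand-in `FakeRepo`; the stand-in class and its methods are illustrative assumptions, not the PR's actual API (the real helpers here are `makeTestRepo` and `makeUniqueButler`).

```python
import unittest

class FakeRepo:
    """Stand-in for an expensive-to-create test repository (illustrative)."""
    def __init__(self):
        self.collections = []          # pretend repo state

    def new_collection(self, name):
        self.collections.append(name)  # cheap per-test step
        return name

class PipelineTaskTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Pay the ~4-6 s repository cost once per class
        # (analogous to makeTestRepo in this PR).
        cls.repo = FakeRepo()

    def setUp(self):
        # Cheap per-test isolation (analogous to makeUniqueButler):
        # each test gets its own collection in the shared repo.
        self.collection = self.repo.new_collection(self.id())

    def test_runs_in_own_collection(self):
        self.assertIn(self.collection, self.repo.collections)
```

With this shape, a suite of N tests pays the repository cost once rather than N times, which is the same trade-off the PR description sketches.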