New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-26690: Add command-line tool for Registry.queryDataIds #402
Conversation
43c7627
to
845db52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the code comments, I'm not entirely sure I completely understand the output of the command so it would be great if there was some extra help text somewhere (I'm not sure how easy it is to add more information in click vs adding extra text to the sphinx page. Mentioning DimensionGraph in the help string may be confusing.
A few things jumped out:
- I was able to run with a dimensions argument and no
--collections
option so the docstring was confusing at best. It wasn't at all clear what I would do with the dimension argument and it was a pleasant surprise when for a whim I asked for "patch" on a calexp dataset and was rewarded by it telling me the tract/patch for each calexp. Even better with htm7. That's great but not obvious. - Sometimes you get identical rows repeated and it's not clear why. It might be because there are repeated options with dimensions that we do not see but it doesn't help anybody to include the duplicates. Filtering them out should be straightforward.
- Table sorting should be used as is now done in query-datasets.
collections=collections) | ||
|
||
if len(results.graph) > 0: | ||
return Table([{str(k): v for k, v in item.full.items()} for item in list(results)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need the list()
at the end there? results
is an iterator so it should iterate automatically without first going through the list.
After you create the table you will need to sort the columns as we now do for query-datasets (we want to prioritize spatial/temporal columns over others).
tests/test_cliCmdQueryDataIds.py
Outdated
datasets=datasets, | ||
where=where) | ||
|
||
def _assertTablesEqual(self, tables, expectedTables): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tweaked this method a little on the branch I recently merged. I had a problem with a missing comma in expectedTables definition with one element leading to the tables being rows not tables. I added an assert to check they were all tables. You can move helper methods to lsst.daf.butler.tests.
tests/test_cliCmdQueryDataIds.py
Outdated
def setUp(self): | ||
self.root = tempfile.mkdtemp(dir=TESTDIR) | ||
Butler.makeRepo(self.root, config=Config(self.configFile)) | ||
self.butlerConfigFile = os.path.join(self.root, "butler.yaml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being stored in an instance property? I think it's only used a few lines below. Also, you don't need it -- you should take the return value of Butler.makeRepo
and pass that directly to the Butler constructor below.
self.butler.registry) | ||
|
||
# Add needed Dimensions | ||
self.butler.registry.insertDimensionData("instrument", {"name": "DummyCamComp"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may want to take a look at lsst.daf.butler.tests.makeTestRepo
which does a lot of this heavy lifting for you. The downside is that it expects a single butler per class and assumes you'll create a new collection for each test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't work out how to make it do what I need, and I think it mocks some behaviors in a way that's hard to understand (it mentions a registry override, and arbitrary cross assignment of relationships). In a day I couldn't get it to do what I needed, Jim directed me at something else (test_simpleButler), and I understand what this code does (and I think it's clear from reading it, but maybe just because it's made of stuff that I do grok). So, I'd like to leave it as is if that's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'm not sure what the real problem is because makeTestRepo doesn't mock anything -- you can call it with empty dataIds iterables and it's still less code than calling Butler.makeRepo yourself isn't it?
tests/test_cliCmdQueryDataIds.py
Outdated
[563, 234, 456.7, 752, 8, 9, 27]) | ||
|
||
@staticmethod | ||
def _addDatasetType(datasetTypeName, dimensions, storageClass, registry): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use lsst.daf.butler.tests.addDatasetType
tests/test_cliCmdQueryDataIds.py
Outdated
return datasetType | ||
|
||
@staticmethod | ||
def _queryDataIds(repo, dimensions=(), collections=(), datasets=None, where=None, ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this trampoline gives us over importing queryDataIds from script and using it directly. Is it solely the defaulted arguments? If so can you add a comment here?
tests/test_cliCmdQueryDataIds.py
Outdated
res = self._queryDataIds(self.root, dimensions=("visit",)) | ||
expected = AstropyTable( | ||
array(( | ||
("R", "DummyCamComp", "d-r", "423", "1"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The numbers here have to be numbers and not strings.
@repo_argument(required=True) | ||
@dimensions_argument(help=unwrap("""DIMENSIONS are dimensions of the data IDs to yield. Will be expanded to a | ||
complete DimensionGraph. --collections is required if using this | ||
argument.""")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--collections
would be required for the --datasets
option, not this argument.
I'd just change this docstring to "DIMENSIONS are the keys of the data IDs to yield. Will be expanded to include any dependencies." I think that's a bit more beginner-friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact I think it would help a lot if we gave some examples here and said "such as exposure, instrument, or tract".
Are we planning on at some point adding an option that would list the full record for the dimension? eg listing the exposure.obs_id or detector.full_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning on at some point adding an option that would list the full record for the dimension? eg listing the exposure.obs_id or detector.full_name?
There is a simpler Python API for that via a different registry method, queryDimensionRecords
. I figured we'd just make that another subcommand on another ticket; the full-records expanded data ID you can get back from queryDataIds
(by calling expanded()
on the result object) isn't something that lends itself to command-line pretty-printing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. That would be handy so I'll maybe take a look at that myself.
One other thing I forgot was that typing |
a38a225
to
3a80090
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment about your new table assert.
tests/test_cliCmdQueryDatasets.py
Outdated
@@ -177,7 +165,7 @@ def testShowURI(self): | |||
"visit", "URI")), | |||
) | |||
|
|||
self._assertTablesEqual(tables, expectedTables) | |||
assertAstropyTablesEqual(self, tables, expectedTables) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want assertAstropyTablesEqual to be in a class and then use multiple inheritance in this TestCase so that it inherits from unittest.TestCase and the butlerTestHelper (or whatever you call it). Then you'd still be able to do self.assertTablesEqual
here and make it look like a normal test assert. We use the multiple inheritance technique a lot in obs_base butler tests.
d4e8cdd
to
6a6da12
Compare
6a6da12
to
dccd9eb
Compare
No description provided.