New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-25101: butler ingest-raws CLI ingesting subsets of files in directory #362
Conversation
b67e9d7
to
5dd5854
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Specifically do not burn FITS into daf_butler, use existing directory hierarchy, consider the future of ButlerURI.
python/lsst/daf/butler/core/utils.py
Outdated
@@ -53,6 +55,10 @@ | |||
from lsst.utils import doImport | |||
|
|||
|
|||
# regular expression that can be used to find supported fits file extensions. | |||
fits_re = r"\.fit[s]?\b" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the default regex for fits files should be in daf_butler. daf_butler knows nothing about fits files. It's only specifically obs_base raw ingest that wants to find fits files. The default should be .*
python/lsst/daf/butler/core/utils.py
Outdated
@@ -307,3 +313,38 @@ def __setstate__(self: Any, state: Any) -> None: # noqa: N807 | |||
assert not state | |||
cls.__setstate__ = __setstate__ | |||
return cls | |||
|
|||
|
|||
def findFiles(values: Iterable[str], regex: Optional[str] = None) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to have to be changed soon to use ButlerURI -- not for this ticket -- but maybe for now the name should be findFileResources
or something to indicate that we aren't always going to be local files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you said in the middleware meeting that this impl would not work for e.g. S3 but that it's good enough for now. The name change is a good idea, will do.
python/lsst/daf/butler/core/utils.py
Outdated
|
||
Returns | ||
------- | ||
files: `list` [`str`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resources
tests/test_utils.py
Outdated
@@ -164,5 +166,69 @@ def testTypeNames(self): | |||
self.assertEqual(getFullTypeName(item), typeName) | |||
|
|||
|
|||
class FindFilesTestCase(unittest.TestCase): | |||
|
|||
def setUp(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than setting up a temp directory and the like, you can just ask it to look in tests/config
for \.yaml\b
files and check that you find some representative ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there overhead in tempdir that you're worried about? IMO it's better to do it this way & to know exactly what is and isn't expected, and to be able to test for that. And also better to not depend on a resource where it's not obvious that changing that will affect this test. But if you really want I'll change it(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more code to support when the purpose of the test is to find files and not set up test infrastructure. It will be a bit faster to not create the temp dir but we create loads of them anyhow. I do understand that this way means you have full control over the results, but the other way doesn't have to be fragile if you are only looking for a subset of files rather than an exact list of every file and if we were to remove some of those config files there'd be breakage in many places. I'm not going to insist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to change the unit test, getting expected lists of files using glob. I'm not sure it's really better, but it works and eliminates the temp file infrastructure.
accepts a list of files to return and directories to search for files, and a regex string to use when searching for files.
5dd5854
to
b5736fb
Compare
No description provided.