Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25101: butler ingest-raws CLI ingesting subsets of files in directory #362

Merged
merged 3 commits into from Sep 1, 2020

Conversation

n8pease
Copy link
Contributor

@n8pease n8pease commented Aug 28, 2020

No description provided.

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Specifically do not burn FITS into daf_butler, use existing directory hierarchy, consider the future of ButlerURI.

@@ -53,6 +55,10 @@
from lsst.utils import doImport


# regular expression that can be used to find supported fits file extensions.
fits_re = r"\.fit[s]?\b"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the default regex for fits files should be in daf_butler. daf_butler knows nothing about fits files. It's only specifically obs_base raw ingest that wants to find fits files. The default should be .*

@@ -307,3 +313,38 @@ def __setstate__(self: Any, state: Any) -> None: # noqa: N807
assert not state
cls.__setstate__ = __setstate__
return cls


def findFiles(values: Iterable[str], regex: Optional[str] = None) -> List[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to have to be changed soon to use ButlerURI -- not for this ticket -- but maybe for now the name should be findFileResources or something to indicate that we aren't always going to be local files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you said in the middleware meeting that this impl would not work for e.g. S3 but that it's good enough for now. The name change is a good idea, will do.


Returns
-------
files: `list` [`str`]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resources

@@ -164,5 +166,69 @@ def testTypeNames(self):
self.assertEqual(getFullTypeName(item), typeName)


class FindFilesTestCase(unittest.TestCase):

def setUp(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rather than setting up a temp directory and the like, you can just ask it to look in tests/config for \.yaml\b files and check that you find some representative ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there overhead in tempdir that you're worried about? IMO it's better to do it this way & to know exactly what is and isn't expected, and to be able to test for that. And also better to not depend on a resource where it's not obvious that changing that will affect this test. But if you really want I'll change it(?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more code to support when the purpose of the test is to find files and not set up test infrastructure. It will be a bit faster to not create the temp dir but we create loads of them anyhow. I do understand that this way means you have full control over the results, but the other way doesn't have to be fragile if you are only looking for a subset of files rather than an exact list of every file and if we were to remove some of those config files there'd be breakage in many places. I'm not going to insist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to change the unit test, getting expected lists of files using glob. I'm not sure it's really better, but it works and eliminates the temp file infrastructure.

accepts a list of files to return and directories to search for
files, and a regex string to use when searching for files.
@n8pease n8pease merged commit 4e2051f into master Sep 1, 2020
@timj timj deleted the tickets/DM-25101 branch February 16, 2024 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants