New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-26383: Add webdav datastore tests to daf_butler #356
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I do have some comments that should be thought about before I can merge. If you are going to do further work I will add you to the daf_butler repo as a collaborator.
@@ -93,7 +93,7 @@ def __init__(self, other=None, searchPaths=None): | |||
uri = ButlerURI(other, forceDirectory=True) | |||
uri.updateFile("butler.yaml") | |||
other = uri.geturl() | |||
elif uri.scheme == "https": | |||
elif uri.scheme.startswith("http"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should no longer be needed since I fixed that code in #355
@@ -181,6 +181,9 @@ def _write_in_memory_to_artifact(self, inMemoryDataset: Any, ref: DatasetRef) -> | |||
raise FileExistsError(f"Cannot write file for ref {ref} as " | |||
f"output file {location.uri} exists.") | |||
|
|||
if not location.uri.dirname().exists(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to consider how we deal with this in the new RemoteFileDatastore that has replaced this implementation. For now we should probably just add it and absorb the one extra S3 call.
@@ -1,6 +1,6 @@ | |||
datastore: | |||
cls: lsst.daf.butler.datastores.webdavDatastore.WebdavDatastore | |||
root: https://some.webdav.server/butlerRoot | |||
root: http://localhost:8080/butlerRoot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment to the YAML here noting that this is an explicit URL for the test server rather than something random.
tests/test_butler.py
Outdated
self.tmpConfigFile = posixpath.join(rooturi, "butler.yaml") | ||
|
||
def tearDown(self): | ||
rooturi = f"http://{self.serverName}/{self.root}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be safer to store this in the object in setUp
so you can guarantee consistency in rooturi
.
tests/test_butler.py
Outdated
self.datastoreStr = f"datastore={self.root}" | ||
self.datastoreName = [f"WebdavDatastore@{rooturi}"] | ||
|
||
Butler.makeRepo(rooturi, config=config, forceConfigRoot=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makeRepo
can take a ButlerURI
(but make sure you use forceDirectory
) if that makes things easier for you.
tests/test_butler.py
Outdated
|
||
config = { | ||
"host": "0.0.0.0", | ||
"port": 8080, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to use 8080? Does it deal with people who are running local web servers there already? If you run the tests with pytest -n5 tests/test_butler.py
won't it start multiple web servers, one for each subprocess? Don't they clash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't necessarily have to use 8080, any port above 1024 is probably fine (we may run into permissions issues if using any port below 1024).
I've added some necessary checks : if there is another service running on the selected port, the webdav server will fail to start and all tests will raise an OSError and fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running pytest with -n5
for example does not seem to be an issue and succeeds. I'm not sure if it's something related to how pytest
works with multithread, or if it's just the fact that, once the first webdav server is up, subsequent webdav server threads silently fail and tests all run against the first one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if using port 0 works (it depends on whether you can ask the server what port it has been assigned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems plausible that they are sharing the same server. This seems a bit risky if the first subprocess completes before the others have finished talking to it. I imagine we are lucking out because the webdav tests are a small fraction of all the tests.
"""Starts a local webdav-compatible HTTP server, | ||
Listening on http://localhost:8080 | ||
This server only runs when this test class is instantiated, | ||
and then shuts down. Must be started is a separate thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What causes it to shut down? Can it be shutdown in tearDownClass
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we use a daemon
thread, which is automatically killed once the main program finishes its execution.
Another solution would be to make the thread non-daemon and explicitly ask it to terminate in tearDown(), but I think the result would be identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do kill it in tearDownClass then that might make it more obvious that the other subprocesses are using it.
tests/test_uri.py
Outdated
@@ -358,11 +358,16 @@ def testQuoting(self): | |||
class WebdavURITestCase(unittest.TestCase): | |||
|
|||
def setUp(self): | |||
os.environ["WEBDAV_AUTH_METHOD"] = "TOKEN" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should use unittest.mock.patch here to patch the environment so we know we aren't contaminating other tests that might be called later. I haven't entirely thought about how this can be done without having the env set for each test separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in ebf3ebe.
@unittest.mock.patch.dict
class decorator only applies to methods with test
prefix in the name, which means it doesn't apply to setUp() and tearDown() for example: I had to manually decorate those methods in test_butler.py.
After rebase, I'm having trouble with edit: the issue was with using POST (privously we used PUT) to transfer a local file to the webdav repository. Moving from POST to PUT fixed it. |
ebf3ebe
to
f8ccdab
Compare
@@ -710,6 +710,9 @@ def _extractIngestInfo(self, path: Union[str, ButlerURI], ref: DatasetRef, *, | |||
# Work out the name we want this ingested file to have | |||
# inside the datastore | |||
tgtLocation = self._calculate_ingested_datastore_name(srcUri, ref, formatter) | |||
if not tgtLocation.uri.dirname().exists(): | |||
log.debug("Folder %s does not exist yet.", tgtLocation.uri.dirname().geturl()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need the geturl()
here because there is a __str__
method that does this.
7cd4a09
to
340bbfd
Compare
@timj I think this new version addresses all your observations. |
3ba4129
to
b8ae901
Compare
Following merged PR #351, here we're adding explicit WebdavDatastore tests. This requires starting a local webdav server against which requests are run. Adds a dependency to
wsgidav
andcheroot
packages, for testing only. Tests are skipped if dependencies cannot be met.Also implementing ButlerURI.parent() method, which returns the parent directory of the current ButlerURI (same as dirname() for a file-like URI).