Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-26383: Add webdav datastore tests to daf_butler #356

Merged
merged 22 commits into from Aug 28, 2020

Conversation

bgounon
Copy link
Contributor

@bgounon bgounon commented Aug 20, 2020

Following merged PR #351, here we're adding explicit WebdavDatastore tests. This requires starting a local webdav server against which requests are run. Adds a dependency to wsgidav and cheroot packages, for testing only. Tests are skipped if dependencies cannot be met.
Also implementing ButlerURI.parent() method, which returns the parent directory of the current ButlerURI (same as dirname() for a file-like URI).

@bgounon bgounon changed the title Add webdav datastore tests to daf_butler [DM-26383] Add webdav datastore tests to daf_butler Aug 20, 2020
@bgounon bgounon changed the title [DM-26383] Add webdav datastore tests to daf_butler DM-26383: Add webdav datastore tests to daf_butler Aug 20, 2020
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I do have some comments that should be thought about before I can merge. If you are going to do further work I will add you to the daf_butler repo as a collaborator.

@@ -93,7 +93,7 @@ def __init__(self, other=None, searchPaths=None):
uri = ButlerURI(other, forceDirectory=True)
uri.updateFile("butler.yaml")
other = uri.geturl()
elif uri.scheme == "https":
elif uri.scheme.startswith("http"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should no longer be needed since I fixed that code in #355

@@ -181,6 +181,9 @@ def _write_in_memory_to_artifact(self, inMemoryDataset: Any, ref: DatasetRef) ->
raise FileExistsError(f"Cannot write file for ref {ref} as "
f"output file {location.uri} exists.")

if not location.uri.dirname().exists():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to consider how we deal with this in the new RemoteFileDatastore that has replaced this implementation. For now we should probably just add it and absorb the one extra S3 call.

@@ -1,6 +1,6 @@
datastore:
cls: lsst.daf.butler.datastores.webdavDatastore.WebdavDatastore
root: https://some.webdav.server/butlerRoot
root: http://localhost:8080/butlerRoot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to the YAML here noting that this is an explicit URL for the test server rather than something random.

self.tmpConfigFile = posixpath.join(rooturi, "butler.yaml")

def tearDown(self):
rooturi = f"http://{self.serverName}/{self.root}"
Copy link
Member

@timj timj Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be safer to store this in the object in setUp so you can guarantee consistency in rooturi.

self.datastoreStr = f"datastore={self.root}"
self.datastoreName = [f"WebdavDatastore@{rooturi}"]

Butler.makeRepo(rooturi, config=config, forceConfigRoot=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makeRepo can take a ButlerURI (but make sure you use forceDirectory) if that makes things easier for you.


config = {
"host": "0.0.0.0",
"port": 8080,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to use 8080? Does it deal with people who are running local web servers there already? If you run the tests with pytest -n5 tests/test_butler.py won't it start multiple web servers, one for each subprocess? Don't they clash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't necessarily have to use 8080, any port above 1024 is probably fine (we may run into permissions issues if using any port below 1024).
I've added some necessary checks : if there is another service running on the selected port, the webdav server will fail to start and all tests will raise an OSError and fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running pytest with -n5 for example does not seem to be an issue and succeeds. I'm not sure if it's something related to how pytest works with multithread, or if it's just the fact that, once the first webdav server is up, subsequent webdav server threads silently fail and tests all run against the first one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using port 0 works (it depends on whether you can ask the server what port it has been assigned).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems plausible that they are sharing the same server. This seems a bit risky if the first subprocess completes before the others have finished talking to it. I imagine we are lucking out because the webdav tests are a small fraction of all the tests.

"""Starts a local webdav-compatible HTTP server,
Listening on http://localhost:8080
This server only runs when this test class is instantiated,
and then shuts down. Must be started is a separate thread.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What causes it to shut down? Can it be shutdown in tearDownClass ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we use a daemon thread, which is automatically killed once the main program finishes its execution.
Another solution would be to make the thread non-daemon and explicitly ask it to terminate in tearDown(), but I think the result would be identical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do kill it in tearDownClass then that might make it more obvious that the other subprocesses are using it.

@@ -358,11 +358,16 @@ def testQuoting(self):
class WebdavURITestCase(unittest.TestCase):

def setUp(self):
os.environ["WEBDAV_AUTH_METHOD"] = "TOKEN"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use unittest.mock.patch here to patch the environment so we know we aren't contaminating other tests that might be called later. I haven't entirely thought about how this can be done without having the env set for each test separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in ebf3ebe.
@unittest.mock.patch.dict class decorator only applies to methods with test prefix in the name, which means it doesn't apply to setUp() and tearDown() for example: I had to manually decorate those methods in test_butler.py.

@bgounon
Copy link
Contributor Author

bgounon commented Aug 25, 2020

After rebase, I'm having trouble with _extractIngestInfo in fileLikeDatastore not transferring files as expected, I'll update the PR when this is figured out.

edit: the issue was with using POST (privously we used PUT) to transfer a local file to the webdav repository. Moving from POST to PUT fixed it.

@@ -710,6 +710,9 @@ def _extractIngestInfo(self, path: Union[str, ButlerURI], ref: DatasetRef, *,
# Work out the name we want this ingested file to have
# inside the datastore
tgtLocation = self._calculate_ingested_datastore_name(srcUri, ref, formatter)
if not tgtLocation.uri.dirname().exists():
log.debug("Folder %s does not exist yet.", tgtLocation.uri.dirname().geturl())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the geturl() here because there is a __str__ method that does this.

@bgounon
Copy link
Contributor Author

bgounon commented Aug 26, 2020

@timj I think this new version addresses all your observations.
In particular, the webdav server now dynamically chooses a free port on which to listen. This allows for multiple webdav servers to start simultaneously without interfering with each other (for example when using pytest -nX).
Also, the server is now gracefully shut down in tearDownClass instead of using daemon threads.

@bgounon bgounon requested a review from timj August 27, 2020 08:29
@timj timj merged commit 3815080 into lsst:master Aug 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants