Complete sync and async copy/get/put tests #1267
@martindurant It would be good if you could check the
fsspec/asyn.py
Outdated
in ``fsspec.config.conf``, falling back to 1/8th of the system limit .
"""
from fsspec.implementations.local import LocalFileSystem, make_path_posix
from .implementations.local import (
Do we get circular imports if we move all these from the methods to the top of the module?
Moving the imports to the top works fine locally, I've pushed a commit to check in CI.
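For readers wondering why imports are sometimes placed inside methods in the first place, here is a minimal sketch of the failure mode the question above refers to. It is not fsspec code: the module names (`demo_top_a`, `demo_lazy_b`, and so on) and helper functions are hypothetical and are written to a temporary directory purely for illustration. In this PR, moving the imports to the top was reported to work fine.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def write_module(directory, name, source):
    """Write a throwaway module file into ``directory``."""
    Path(directory, f"{name}.py").write_text(textwrap.dedent(source))


def can_import(name):
    """Return True if ``name`` imports cleanly, False on ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    # Both pairs of modules import each other; only the import placement differs.
    write_module(tmp, "demo_top_a", """
        from demo_top_b import helper_b  # top-level import

        def helper_a():
            return "a"
    """)
    write_module(tmp, "demo_top_b", """
        from demo_top_a import helper_a  # runs before helper_a is defined

        def helper_b():
            return helper_a() + "b"
    """)
    write_module(tmp, "demo_lazy_a", """
        from demo_lazy_b import helper_b

        def helper_a():
            return "a"
    """)
    write_module(tmp, "demo_lazy_b", """
        def helper_b():
            from demo_lazy_a import helper_a  # deferred until call time
            return helper_a() + "b"
    """)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        top_level_ok = can_import("demo_top_a")    # circular, fails
        deferred_ok = can_import("demo_lazy_a")    # circular but deferred, works
        combined = sys.modules["demo_lazy_b"].helper_b()
    finally:
        sys.path.remove(tmp)
```

The top-level pair fails because `demo_top_b` asks for a name that `demo_top_a` has not defined yet, while the deferred pair only resolves the name when `helper_b` is called. If no such cycle exists between the modules involved, top-level imports are the simpler choice.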
martindurant left a comment
Basically all good, just a couple of questions for you (one of which escaped as a standalone comment)
fsspec/tests/abstract/__init__.py
Outdated
fs.touch(fs_join(subdir, "subfile2"))
fs.touch(fs_join(nesteddir, "nestedfile"))
return source
source = self._scenario_cp(fs, fs_join, fs_path)
Little surprised here, because it looks like we need cp to be working correctly before we get to running the tests. Wasn't touch() more explicit?
Looking further, I see you just moved the touch() calls. But isn't _scenario_cp misnamed, then? In general, for any of these that a downstream developer is expected to implement (if any), it would be worthwhile describing all the arguments of all the methods better. Perhaps general documentation in https://filesystem-spec.readthedocs.io/en/latest/developer.html and/or https://filesystem-spec.readthedocs.io/en/latest/copying.html would help.
Downstream developers won't need to re-implement the "scenario" functions; as long as they have implemented mkdir and touch, these will work fine.
I had planned to add developer documentation for these new tests; it is the last but one item in my original comment of this PR. But I don't want downstream developers to be using them yet, not until I at least have both s3fs and gcsfs working using them. We are very nearly but not quite there yet.
I can think of two possible improvements for this PR.
- AbstractTestHarness is a mixture of functions that might need to be overwritten in downstream projects (e.g. fs_join) and those that will not (e.g. fs_scenario_cp). It is not clear which is in which category. So I could move the latter group into a new parent class of AbstractFixtures and just keep the overridable ones in AbstractFixtures. Both classes need docstrings too, to explain this split and highlight what functions may need to be overridden.
- As you say, names like fs_scenario_cp are not good! When we have many of the existing tests moved to the new structure I envisage a handful of "setup scenarios", one of which will be used in each test. The idea is to avoid each test having its own setup of directories and/or files as that gets unwieldy and instead have a minimal set of setup scenarios. At this stage there is just the one scenario which is the "setup scenario that is used for all cp, get and put tests" that I have shortened to "scenario_cp". But I can see that having "cp" in the name is confusing as it implies that copying occurs when the function is called. I am very happy to change the name, I just need a better one! It could be scenario_for_copying. It could just have an ID rather than a name, although I am using IDs like 1a in the copying docs so numbers and letters could be confusing. scenario_alpha? I don't like that either.
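The split proposed in the first point could look roughly like the sketch below. It is only an illustration under stated assumptions: `ToyFileSystem`, `BaseScenarios` and `scenario_for_copying` are hypothetical names, the real helpers are pytest fixtures rather than plain methods, and the directory layout is inferred from the `subdir`, `nesteddir`, `subfile2` and `nestedfile` names visible in the diff above.

```python
import posixpath


class ToyFileSystem:
    """Hypothetical stand-in for a filesystem that only offers mkdir and touch."""

    def __init__(self):
        self.dirs = set()
        self.files = set()

    def mkdir(self, path):
        self.dirs.add(path)

    def touch(self, path):
        self.files.add(path)


class BaseScenarios:
    """Setup scenarios that downstream projects are not expected to override."""

    def scenario_for_copying(self, fs, fs_join, fs_path):
        # Builds the tree using only mkdir and touch; nothing is copied here.
        source = fs_join(fs_path, "source")
        subdir = fs_join(source, "subdir")
        nesteddir = fs_join(subdir, "nesteddir")
        for directory in (source, subdir, nesteddir):
            fs.mkdir(directory)
        fs.touch(fs_join(subdir, "subfile2"))
        fs.touch(fs_join(nesteddir, "nestedfile"))
        return source


class AbstractFixtures(BaseScenarios):
    """Hooks that a downstream project may need to override."""

    def fs_join(self, *parts):
        # Default assumption: POSIX-style separators; override if that is wrong.
        return posixpath.join(*parts)


fixtures = AbstractFixtures()
toy_fs = ToyFileSystem()
source_dir = fixtures.scenario_for_copying(toy_fs, fixtures.fs_join, "root")
```

With this shape a downstream project would only touch `AbstractFixtures`, and the docstrings on the two classes carry the explanation of which group is which.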
how about bulk_operations_scenario, perhaps suffixed with 0 if there might eventually be more of them?
I think
Likely moto...
Yes, s3fs tests pass locally with
This adds a full set of AbstractGetTests and AbstractPutTests that are equivalent to the existing AbstractCopyTests, and ports the recent changes from synchronous copy/get/put code (#1250, #1254, #1255, #1259) to async. This completes the top 10 most common cp/get/put use cases documented by #713.
The abstract test suite all passes locally for me using local and memory filesystems (the latter skips 4 tests, see below) and also s3fs (using the branch that is fsspec/s3fs#713).
Noteworthy implementation details:
- [...] AbstractCopyTests, AbstractGetTests and AbstractPutTests. This doesn't make any real difference for local and memory tests, but makes the s3fs tests much faster than individual test scope. The scope could be made once per test session, but that would involve a bit of a refactor so is not planned yet.
- [...] source and target directories for these tests uses fixtures that automatically clean up after themselves at the end of each test, ensuring that the filesystem is empty ready for the next test.
Work to do in future PRs soon:
- memory get tests are skipped as currently there is a bespoke code path that does not auto create a local directory when required. Needs a fix to MemoryFileSystem.
- [...] source and target directories yet.
- gcsfs doesn't pass all the new tests yet.
I propose the following order of operations:
- [...] s3fs is happy with the new changes, then review and merge of that PR.
- [...] fsspec is advisable then, as one of my recent PRs broke async get in some situations and this is fixed in this PR.
Longer term there are other improvements that are not urgent:
- [...] cp/get/put use cases.
- [...] fsspec.
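The self-cleaning source and target directories mentioned under the implementation details follow a setup, yield, teardown shape. The sketch below shows that shape with the standard library only, assuming a local directory tree; the name `source_and_target` and the file name `file1` are hypothetical, and the PR itself implements this with pytest fixtures rather than a context manager.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def source_and_target(root):
    """Create source and target directories, then remove them afterwards."""
    source = os.path.join(root, "source")
    target = os.path.join(root, "target")
    os.makedirs(source)
    os.makedirs(target)
    try:
        yield source, target
    finally:
        # Teardown runs even if the test body raises, so the next test
        # starts from an empty root.
        shutil.rmtree(source, ignore_errors=True)
        shutil.rmtree(target, ignore_errors=True)


root = tempfile.mkdtemp()
try:
    with source_and_target(root) as (source, target):
        with open(os.path.join(source, "file1"), "w") as handle:
            handle.write("data")
        existed_during_test = os.path.exists(os.path.join(source, "file1"))
    leftover = os.listdir(root)
finally:
    shutil.rmtree(root, ignore_errors=True)
```

A pytest yield fixture has the same structure: the code before `yield` is the setup, and the code after it is the cleanup that runs when the fixture's scope ends.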