Complete sync and async copy/get/put tests #1267

ianthomas23 · 2023-05-16T10:13:05Z

This adds a full set of AbstractGetTests and AbstractPutTests that are equivalent to the existing AbstractCopyTests, and ports the recent changes from synchronous copy/get/put code (#1250, #1254, #1255, #1259) to async. This completes the top 10 most common cp/get/put use cases documented by #713.

The abstract test suite all passes locally for me using local and memory filesystems (the latter skips 4 tests, see below) and also s3fs (using the branch that is fsspec/s3fs#713).

Noteworthy implementation details:

Filesystem scope for tests is per test class, so there is one setup-and-teardown combination for each of the AbstractCopyTests, AbstractGetTests and AbstractPutTests. This doesn't make any real difference for local and memory tests, but makes the s3fs tests much faster than individual test scope. The scope could be made once per test session, but that would involve a bit of a refactor so is not planned yet.
Setting up the source and target directories for these tests uses fixtures that automatically clean up after themselves at the end of each test, ensuring that the filesystem is empty ready for the next test.

Work to do in future PRs soon:

Four of the memory get tests are skipped as currently there is a bespoke code path that does not auto create a local directory when required. Needs a fix to MemoryFileSystem.
I haven't fully dealt with all combinations of trailing slashes on source and target directories yet.
gcsfs doesn't pass all the new tests yet.

I propose the following order of operations:

Review and merge this when CI happy.
I'll update Test using new abstract test harness s3fs#713 to check that s3fs is happy with the new changes, then review and merge of that PR.
A new release of fsspec is advisable then, as one of my recent PRs broke async get in some situations and this is fixed in this PR.
I'll fix the above 3 "future PRs" issues.

Longer term there are other improvements that are not urgent:

Cover some of the less-common cp/get/put use cases.
Test more of implementations that ship in fsspec.
Document how to use the abstract test harness in downstream projects.
Move tests from original test scripts to the new abstract test classes.

fsspec/implementations/tests/test_memory.py

ianthomas23 · 2023-05-16T10:15:53Z

@martindurant It would be good if you could check the async code looks OK as that is the area I am least familiar with.

martindurant · 2023-05-19T13:40:15Z

fsspec/asyn.py

        in ``fsspec.config.conf``, falling back to 1/8th of the system limit .
        """
-        from fsspec.implementations.local import LocalFileSystem, make_path_posix
+        from .implementations.local import (


Do we get circular imports if we move all these from the methods to the top of the module?

Moving the imports to the top works fine locally, I've pushed a commit to check in CI.

martindurant

Basically all good, just a couple of questions for you (one of which escaped as a standalone comment)

martindurant · 2023-05-19T13:43:10Z

fsspec/tests/abstract/__init__.py

-        fs.touch(fs_join(subdir, "subfile2"))
-        fs.touch(fs_join(nesteddir, "nestedfile"))
-        return source
+        source = self._scenario_cp(fs, fs_join, fs_path)


Little surprised here, because it looks like we need cp to be working correctly before we get to running the tests. Wasn't touch() more explicit?

Looking further, I see you just moved the touch() calls. But isn't _scenario_cp misnamed, then? In general, for any of these that a downstream developer is expected to implement (if any), it would be worthwhile describing all the arguments of all the methods better. Perhaps general documentation in https://filesystem-spec.readthedocs.io/en/latest/developer.html and/or https://filesystem-spec.readthedocs.io/en/latest/copying.html would help.

Downstream developers won't need to re-implement the "scenario" functions, as long as they have implemented mkdir and touch these will work fine.

I had planned to add developer documentation for these new tests, it is the last but one item in my original comment of this PR. But I don't want downstream developers to be using them yet, not until I at least have both s3fs and gcsfs working using them. We are very nearly but not quite there yet.

I can think of two possible improvements for this PR.

AbstractTestHarness is a mixture of functions that might need to be overwritten in downstream projects (e.g. fs_join) and those that will not (e.g. fs_scenario_cp). It is not clear which is in which category. So I could move the latter group into a new parent class of AbstractFixtures and just keep the overridable ones in AbstractFixtures. Both classes need docstrings too, to explain this split and highlight what functions may need to be overridden.

As you say, names like fs_scenario_cp are not good! When we have many of the existing tests moved to the new structure I envisage a handful of "setup scenarios", one of which will be used in each test. The idea is to avoid each test having its own setup of directories and/or files as that gets unwieldy and instead have a minimal set of setup scenarios. At this stage there is just the one scenario which is the "setup scenario that is used for all cp, get and put tests" that I have shortened to "scenario_cp". But I can see that having "cp" in the name is confusing as it implies that copying occurs when the function is called. I am very happy to change the name, I just need a better one! It could be scenario_for_copying. It could just have an ID rather than a name although I am using IDs like 1a in the copying docs so numbers and letters could be confusing. scenario_alpha? I don't like that either.

how about bulk_operations_scenario, perhaps suffixed with 0 if there might eventually be more of them?

ianthomas23 · 2023-05-24T11:55:23Z

I think s3fs test failure is unrelated as the other recent PR has the same error. CI passed OK 5 days ago. Differences in environment between passing and failing:

package	passed	failing
c-ares	1.19.0	1.19.1
coverage	7.2.5	7.2.6
grpcio	1.54.2	1.55.0
libgrpc	1.54.2	1.55.0
libprotobuf	3.21.12	4.23.1
moto	4.1.9	4.1.10
protobuf	4.21.12	4.23.1
requests	2.29.0	2.31.0
typing-extensions	4.5.0	4.6.0
websocket-client	1.5.1	1.5.2

martindurant · 2023-05-24T14:20:17Z

Likely moto...

ianthomas23 · 2023-05-24T16:43:21Z

Likely moto...

Yes, s3fs tests pass locally with moto==4.1.9 and fail with moto==4.1.10. We are not receiving a "BucketKeyEnabled" key when querying head_object.

ianthomas23 added 5 commits May 15, 2023 08:54

AbstractGetTests

7f06b3d

AbstractPutTests

d3abf8b

Make abstract test fixtures member not static functions

2b16809

Async cp/get/put

09d3e5e

Test filesystems have class scope and tests clean up after themselves

9a4a579

ianthomas23 commented May 16, 2023

View reviewed changes

fsspec/implementations/tests/test_memory.py Show resolved Hide resolved

martindurant reviewed May 19, 2023

View reviewed changes

Move imports to top of asyn.py

d35b140

ianthomas23 mentioned this pull request May 23, 2023

Issues using fsspec.spec put method on version 2023.5.0 #1270

Closed

ianthomas23 added 2 commits May 24, 2023 11:06

Use bulk_operations_scenario name

190c0b3

Separate class for functions that may/may not be overridden

31bf56c

martindurant merged commit 5720dfc into fsspec:master May 25, 2023

This was referenced May 30, 2023

Create parent directories in get_file #1282

Merged

put() inconsistent with cp #882

Closed

put method inconsistent between multiple runs fsspec/s3fs#737

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Complete sync and async copy/get/put tests #1267

Complete sync and async copy/get/put tests #1267

Uh oh!

ianthomas23 commented May 16, 2023

Uh oh!

Uh oh!

ianthomas23 commented May 16, 2023

Uh oh!

martindurant May 19, 2023

Uh oh!

ianthomas23 May 22, 2023

Uh oh!

martindurant left a comment

Uh oh!

martindurant May 19, 2023

Uh oh!

martindurant May 19, 2023

Uh oh!

ianthomas23 May 22, 2023

Uh oh!

martindurant May 23, 2023

Uh oh!

ianthomas23 commented May 24, 2023

Uh oh!

martindurant commented May 24, 2023

Uh oh!

ianthomas23 commented May 24, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Complete sync and async copy/get/put tests #1267

Complete sync and async copy/get/put tests #1267

Uh oh!

Conversation

ianthomas23 commented May 16, 2023

Uh oh!

Uh oh!

ianthomas23 commented May 16, 2023

Uh oh!

martindurant May 19, 2023

Choose a reason for hiding this comment

Uh oh!

ianthomas23 May 22, 2023

Choose a reason for hiding this comment

Uh oh!

martindurant left a comment

Choose a reason for hiding this comment

Uh oh!

martindurant May 19, 2023

Choose a reason for hiding this comment

Uh oh!

martindurant May 19, 2023

Choose a reason for hiding this comment

Uh oh!

ianthomas23 May 22, 2023

Choose a reason for hiding this comment

Uh oh!

martindurant May 23, 2023

Choose a reason for hiding this comment

Uh oh!

ianthomas23 commented May 24, 2023

Uh oh!

martindurant commented May 24, 2023

Uh oh!

ianthomas23 commented May 24, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants