Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s3: fsspec migration #5683

Merged
merged 37 commits into from
May 5, 2021
Merged

Conversation

isidentical
Copy link
Contributor

Supersedes the first attempt on #5622.

@isidentical isidentical marked this pull request as draft March 24, 2021 08:20
@isidentical isidentical changed the title s3: fsspec migration [WIPs3: fsspec migration Mar 24, 2021
@isidentical isidentical changed the title [WIPs3: fsspec migration [WIP] s3: fsspec migration Mar 24, 2021
@isidentical isidentical mentioned this pull request Mar 24, 2021
13 tasks
@efiop efiop mentioned this pull request Mar 30, 2021
2 tasks
@isidentical isidentical marked this pull request as ready for review April 12, 2021 07:58
@isidentical
Copy link
Contributor Author

I'm marking this PR ready to review. The test suite passes on every version, though since we are awaiting a new s3fs release we can't just merge it as is yet (only one line change in setup to point to pypi s3fs instead of github one solve this though, so it won't block reviews).

@isidentical isidentical requested a review from a team April 12, 2021 08:00
tests/remotes/s3.py Outdated Show resolved Hide resolved
tests/docker-compose.yml Outdated Show resolved Hide resolved
tests/remotes/s3.py Outdated Show resolved Hide resolved
@isidentical
Copy link
Contributor Author

Benchmarks from the latest revision, same in general but there seems to be a slight improvement on dvc status -c which might be related with dircache on fsspec: https://gist.github.com/isidentical/9e0246c4fe700c019a49cd6ec6de2440

Comment on lines +9 to +12
motoserver:
image: motoserver/moto
ports:
- "5000"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isidentical, now that we are using docker 😩, please turn the following config off so that s3 tests are not run by default:

"s3": True,

Copy link
Member

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

And s3fs is out too https://pypi.org/project/s3fs/ (as well as gcsfs), just waiting on adlfs 🙂

@shcheklein
Copy link
Member

hey, guys - quick ask - could we summarize (and may be visualize) before after? It's hard for me to figure out what do specific dvc status -c commands mean in the gist, what do we tests, what we don't test ...

(not a blocker for merge obviously, you decide - but would be really helpful to understand better what's going on and review)

🙏

@isidentical
Copy link
Contributor Author

It's hard for me to figure out what do specific dvc status -c commands mean in the gist, what do we tests, what we don't test ...

Actually, it is a very simple example of push + pull of big single file / a dataset of a lot of small files with status -c's in the middle of every operation (e.g before push / after push) etc.

(not a blocker for merge obviously, you decide - but would be really helpful to understand better what's going on and review)

I'll try to prepare a proper table with before/after of different scenerios.

@isidentical
Copy link
Contributor Author

We've discovered a bug that caused exists() calls on s3fs to take 6~ seconds for non-existent files, now this bug is resolved and it is taking 1.8-2.0~ seconds. This is still worse than our current implementation (for both existing and non-existing files, when the calls are made separately. This overhead is compensated when list_hashes_exist performs calls in pararel) (0.6-0.7~ seconds). For the last day, I've been debugging the cause and it appears to be the cost of aiobotocore + aiohttp (and all the underlying stuff).

@efiop efiop mentioned this pull request May 3, 2021
2 tasks
@isidentical isidentical changed the title [WIP] s3: fsspec migration s3: fsspec migration May 5, 2021
Copy link
Member

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outstanding work 🔥

Let's merge and move forward for now, aiobotocore issue is a problem, but we can live with it for now (we also have some time before the next release, etc).

Thank you! 🙏

@efiop efiop merged commit d4a3df7 into iterative:master May 5, 2021
@efiop efiop added the refactoring Factoring and re-factoring label May 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Factoring and re-factoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants