Support directories with millions of files. #95

Closed
2 tasks done
davies opened this issue Jan 20, 2021 · 6 comments
Labels: area/metadata, area/performance, kind/feature

Comments

@davies
Contributor

davies commented Jan 20, 2021

What would you like to be added:

Currently, we fetch the attributes of all files in a single directory with a single batch request to Redis. That request could be slow or fail, and it could block other requests.

We can split it into smaller batches, for example, 1000 per batch.

Why is this needed:

The number of files could be in the millions, and we don't want people to be bitten by that.

Backlog

@davies davies added the kind/feature New feature or request label Jan 20, 2021
@davies davies added this to the Release 1.0 milestone Jan 21, 2021
@suzaku
Contributor

suzaku commented Jan 21, 2021

Any more detail for this issue? In what kind of operation are the file attributes fetched?

@davies
Contributor Author

davies commented Jan 21, 2021

When you run ls, Readdir() is called. When plus is true, it fetches the attributes of all entries in one batch.

@suzaku
Contributor

suzaku commented Jan 21, 2021

Got it, I can give this one a try.

@xiaogaozi xiaogaozi added the area/metadata Issues or PRs related to metadata label Jan 22, 2021
@suzaku
Contributor

suzaku commented Jan 22, 2021

If there are millions of files, we might already run into trouble at the first HGetAll call. Any idea how to work around it?

@davies
Contributor Author

davies commented Jan 22, 2021

Yes. Right now, we don't have a workaround; that could be the next challenge.

@davies
Contributor Author

davies commented Jan 28, 2021

The second part is done by #128

davies pushed a commit that referenced this issue Jan 28, 2021
davies pushed a commit that referenced this issue Jan 28, 2021
davies pushed a commit that referenced this issue Jan 28, 2021