Will open an unbounded number of files, and keep them open #281

Closed
inducer opened this issue May 4, 2015 · 4 comments

@inducer
Contributor

inducer commented May 4, 2015

I use dulwich in a course management system, and as part of that use case, a user issues lots of tiny fetches, resulting in lots of tiny packfiles. It looks as though dulwich keeps all of these files open, as the server process will eventually run out of file descriptors--even if the corresponding Repo gets garbage-collected.

I encountered this with 0.9.8. I'd much appreciate any help.

Thanks!

@keis
Contributor

keis commented May 5, 2015

dulwich does not support git garbage collection or repacks AFAIK. I ran into the exact same issue and we ended up just putting a git repack -a -d in cron on the server.
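The cron workaround can also be driven from Python. This is a minimal sketch, assuming `git` is on the PATH; the `repack` helper and its `runner` parameter are hypothetical names introduced here for illustration and are not part of dulwich.

```python
import subprocess

# The same command as in the cron job described above.
REPACK_CMD = ["git", "repack", "-a", "-d"]


def repack(repo_path, runner=subprocess.run):
    """Run `git repack -a -d` with repo_path as the working directory.

    `runner` is injectable so the call can be exercised without git
    installed; by default it is subprocess.run.
    """
    # check=True raises CalledProcessError if git exits with a non-zero status.
    return runner(REPACK_CMD, cwd=repo_path, check=True)
```

Calling `repack("/path/to/repo")` from a periodic job would consolidate the small packs into one, which is what keeps the number of files the server has to open low.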

@garyvdm
Contributor

garyvdm commented May 5, 2015

I think @inducer was talking about the python Repo object being garbage collected by the python gc, rather than a git gc.

@inducer:

How many packfiles are we talking about? Please can you do ls .git/objects/pack/ | wc -l to find out how many packs + index files there are.
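If a shell is not handy, roughly the same count can be taken from Python. This is a sketch under stated assumptions: `count_pack_files` is a hypothetical helper, and unlike `ls | wc -l` it counts only `.pack` and `.idx` files rather than every directory entry.

```python
import os


def count_pack_files(repo_path, bare=False):
    """Count the .pack and .idx files in a repository's pack directory.

    For a non-bare repository the packs live under .git/objects/pack;
    for a bare one they live directly under objects/pack.
    """
    git_dir = repo_path if bare else os.path.join(repo_path, ".git")
    pack_dir = os.path.join(git_dir, "objects", "pack")
    if not os.path.isdir(pack_dir):
        return 0
    return sum(
        1 for name in os.listdir(pack_dir)
        if name.endswith((".pack", ".idx"))
    )
```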

I did some testing and the files do get closed when a Repo object gets garbage-collected.

But I can reproduce getting IOError: [Errno 24] Too many open files by creating a repo where the number of packs + indexes exceeds the max open files. With this script:

import shutil
import tempfile

from dulwich.repo import Repo
from dulwich.objects import Blob

repo_dir = tempfile.mkdtemp('dulwich_open_file_test')
repo = None
try:
    repo = Repo.init_bare(repo_dir)
    print('create packs')
    # Add one blob per call so that the store ends up with many tiny packs.
    for i in range(2500):
        blob = Blob.from_string('Blob {}'.format(i).encode('utf8'))
        repo.object_store.add_objects([(blob, None), ])

    print('read packs')
    # Reading every object makes the store open the pack files.
    for sha in repo.object_store:
        obj = repo.get_object(sha)
finally:
    # Guard against init_bare having failed before repo was assigned.
    if repo is not None:
        repo.object_store.close()
    del repo
    shutil.rmtree(repo_dir)

So given this, and for other performance reasons, I would also recommend running git repack -a -d to reduce the number of pack files.
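Whether a given repository can hit this error depends on the process's open-file limit. The sketch below reads that limit with the standard-library `resource` module (Unix only) and compares it with a pack-file count. The helper name `packs_may_exhaust_descriptors` and the `headroom` default are assumptions made for illustration; the descriptors a real server needs for sockets, logs and so on will differ.

```python
import resource


def open_file_limit():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def packs_may_exhaust_descriptors(n_pack_files, headroom=64):
    """Return True if opening n_pack_files could exceed the soft limit.

    `headroom` is a rough allowance for descriptors the process uses for
    other purposes; 64 is an arbitrary illustrative value.
    """
    soft, _hard = open_file_limit()
    if soft == resource.RLIM_INFINITY:
        return False
    return n_pack_files + headroom > soft
```

With a soft limit of 1024, for example, the 2500 packs created by the script above (5000 files counting the indexes) would be flagged.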

@jelmer jelmer added the bug label May 23, 2015
@jelmer
Owner

jelmer commented Jan 15, 2017

Note that pack files are now closed when they are no longer used; however, Dulwich doesn't yet repack automatically.

@jelmer
Owner

jelmer commented Aug 6, 2017

See #296 for repack. I'll close this bug since the other file descriptor issues are fixed.
