Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vcf2zarr leaks semaphore objects #209

Open
tomwhite opened this issue May 15, 2024 · 16 comments
Open

vcf2zarr leaks semaphore objects #209

tomwhite opened this issue May 15, 2024 · 16 comments

Comments

@tomwhite
Copy link
Contributor

From #201 (comment)

On a Mac:

vcf2zarr convert sample.vcf.gz sample.zarr -p 1
    Scan: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [00:00<00:00, 2.57files/s]
 Explode: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00/9.00 [00:00<00:00, 22.8vars/s]
  Encode: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 927/927 [00:00<00:00, 1.62kB/s]
Finalise: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21.0/21.0 [00:00<00:00, 1.64karray/s]
/Users/tom/miniconda3/envs/bio2zarr/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
@jeromekelleher
Copy link
Contributor

Can you try that with just explode, and with different numbers of workers please?

vcf2zarr convert sample.vcf.gz sample.icf

@tomwhite
Copy link
Contributor Author

vcf2zarr explode sample.vcf.gz sample.icf
Do you want to overwrite sample.icf? (use --force to skip this check) [y/N]: y
    Scan: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [00:00<00:00, 2.48files/s]
 Explode: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00/9.00 [00:00<00:00, 23.1vars/s]
/Users/tom/miniconda3/envs/bio2zarr/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

vcf2zarr explode sample.vcf.gz sample.icf -p 3
Do you want to overwrite sample.icf? (use --force to skip this check) [y/N]: y
    Scan: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.00/1.00 [00:00<00:00, 2.51files/s]
 Explode: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00/9.00 [00:00<00:00, 20.8vars/s]
/Users/tom/miniconda3/envs/bio2zarr/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@jeromekelleher
Copy link
Contributor

I had a go at this in #214 - would you mind trying it out and seeing if it resolves this problem please @tomwhite?

@tomwhite
Copy link
Contributor Author

Still seeing the warning with the new code...

@jeromekelleher
Copy link
Contributor

Hmm, ok thanks. I might need a bit of help with this one then, it's too obscure to track down with being able to reproduce. Any chance you could take a look?🙏

@tomwhite
Copy link
Contributor Author

I tried removing all the tqdm code (including imports) and I still get the warning. I'll keep looking, but I'm not really sure what is going on here!

@jeromekelleher
Copy link
Contributor

Great to know that tqdm isn't causing this! I guess it must be something to do with the locks associated with the multiprocessing.Value. Is there anyway we can get more detailed feedback on where these semaphores are being leaked?

@jeromekelleher jeromekelleher added this to the 0.1.0 release milestone May 23, 2024
@jeromekelleher
Copy link
Contributor

Did you have any luck tracking this down @tomwhite? Some things that would be useful to try:

  • When you run with -p0 (and everything is run synchronously in a single thread) does this show up?
  • If you comment out the initializer and initargs arguments here, and make the update_progress function a no-op, do we still leak semaphores?

I'd really like to get rid of this...

@jeromekelleher
Copy link
Contributor

Can we just capture the warning in main, as a workaround also?

@jeromekelleher
Copy link
Contributor

Looks like this is some python 3.9 on mac quirk - I've reproduced on CI on both ARM and intel:
Screenshot from 2024-05-25 23-09-09

@jeromekelleher
Copy link
Contributor

I made an attempt to catch the warnings in #226, but it's tricky to do this via CI. I'm sure the warnings are harmless, and as it's only on Python 3.9 I don't think we need to make it a release blocker. Would be nice to just suppress the warning, at the same time, though.

@jeromekelleher
Copy link
Contributor

https://discuss.pytorch.org/t/issue-with-multiprocessing-semaphore-tracking/22943

Some comments here on how to suppress.

@jeromekelleher
Copy link
Contributor

Just to update here that I'm working on tracking this down. It's a doozy...

@jeromekelleher
Copy link
Contributor

I've tried lots of different ways to resolve this, and have come to the conclusion that it's something quite specific to Python 3.9 on Macs. Given that we don't leak semaphores on later Python versions it seems likely to me that this is an underlying bug in Python, and (given the substantial effort in trying to find workarounds) there's likely not much we can do about it.

  • Silencing the warnings is misguided (Try to silence harmless warning on py 3.9 Mac #226) because leaking semaphores genuinely is bad, and also we can't do it for console scripts (there's no mechanism to catch warnings at the Python-interpreter shutdown level here).
  • One guess at what was happening is that the default multiprocessing context in Python 3.9 for macs was still "fork" and so switching to "spawn" was leading to dangling resources. This didn't work (Set single mp context at early import #227)
  • Another idea was to free the the progress Value associated with each of the context managers on exit. This didn't work, as the leak is most likely happening because of the worker processes (Explicitly free multiprocessing.Value for progress #228)
  • The final attempt was to try being entirely explicit about the lifetimes of these Values both in the worker process and main process. This is done in See if semaphore leak is from concurrent.futures #230, along with various bits of extra debugging. This is a good place to start if you are interested in trying to track this down.

So, the conclusion is to mark this as a known issue, and to document the problem, suggesting that users move Python version if they are going to be doing serious work on their macs.

@tomwhite
Copy link
Contributor Author

tomwhite commented Jun 3, 2024

I way away last week, so thanks for tracking this down and documenting everything!

I agree that it's fine to document the Python 3.9 limitation. Python 3.9 won't be around much longer anyway (https://scientific-python.org/specs/spec-0000/).

jeromekelleher added a commit to jeromekelleher/bio2zarr that referenced this issue Jun 3, 2024
jeromekelleher added a commit to jeromekelleher/bio2zarr that referenced this issue Jun 3, 2024
jeromekelleher added a commit to jeromekelleher/bio2zarr that referenced this issue Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants