
Fix fineweb's parallel processing on windows #4

Open

wants to merge 1 commit into base: master
Conversation

ltogniolli

Fineweb importing fails on Windows with "Attempt to start a new process before the current process has finished its bootstrapping phase." because Windows spawns worker processes instead of forking them, so each child re-imports the main module. Fix it by adding an entry point guard (`if __name__ == "__main__":`), as indicated in the multiprocessing documentation.
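For reference, the guard looks roughly like this. This is a minimal sketch; the function and variable names are illustrative, not the actual fineweb.py code. Under the spawn start method the pool must only be created when the module runs as `__main__`:

```python
import multiprocessing as mp

def tokenize(doc):
    # Placeholder for the real per-document tokenization work.
    return len(doc)

def main():
    docs = ["some", "example", "documents"]
    nprocs = max(1, mp.cpu_count() - 2)
    with mp.Pool(nprocs) as pool:
        for tokens in pool.imap(tokenize, docs, chunksize=16):
            pass  # write the tokens out to a shard here

if __name__ == "__main__":
    # Without this guard, each spawned worker re-imports the module and
    # tries to start its own pool, triggering the bootstrapping error.
    main()
```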

@lukasugar

Interestingly, I got the same error on macOS. Your fix worked!

@gerardaristizabalpla4 commented Jun 30, 2024

THANK YOU!!

Were you able to run everything that Andrej ran on Windows? I was having issues with torch.compile and scaled_dot_product_attention because the Triton package apparently can't be installed on Windows. Thank you!

@GrahamTheCoder commented Jul 3, 2024

@lukasugar Does it run? Yes, it just isn't as fast, because there's no Windows implementation of:

  • NCCL - so you're limited to 1 GPU
  • Flash Attention v2 - so attention is slower
  • Triton - so torch.compile isn't available, so again slower (see the sketch after this list)
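One way to keep the same script runnable on Windows is to simply skip the compile step there. This is just a sketch of that pattern (with a stand-in model), not the repo's actual training code:

```python
import sys
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the real model

# torch.compile relies on Triton, which has no Windows build,
# so only enable compilation on non-Windows platforms.
use_compile = sys.platform != "win32"
if use_compile:
    model = torch.compile(model)
```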

On a 3090 (24 GB) I get about 52K t/s; if I run it under Ubuntu, even under WSL2, I get 62K t/s. Bare-metal Linux was maybe 0.5% faster, but opens up multi-GPU as an option.

For multi-GPU on Windows I did try using torch.distributed.init_process_group(backend="gloo") and omitting --standalone from the torchrun command, but I get "Cannot use ReduceOp.AVG with Gloo" - ReduceOp.AVG only works with NCCL. I haven't tried treating the GPUs as if they were each on a separate machine.
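If anyone wants to experiment further: Gloo does support SUM, so one possible workaround is to emulate AVG manually by summing across ranks and dividing by the world size. This is a generic sketch of that pattern launched via torchrun, not this repo's training loop:

```python
import torch
import torch.distributed as dist

def all_reduce_avg(tensor: torch.Tensor) -> torch.Tensor:
    # Gloo has no ReduceOp.AVG, so sum across ranks and divide locally.
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    tensor /= dist.get_world_size()
    return tensor

if __name__ == "__main__":
    # torchrun provides RANK / WORLD_SIZE / MASTER_ADDR in the environment.
    dist.init_process_group(backend="gloo")
    x = torch.ones(4) * (dist.get_rank() + 1)
    all_reduce_avg(x)  # every rank now holds the same averaged tensor
    dist.destroy_process_group()
```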
