
Indexing: handle cpu & single-gpu without using multiprocessing & dist. data parallel #290

Merged (8 commits) on Jan 14, 2024

Conversation

@Anmol6 (Contributor) commented on Jan 11, 2024

No description provided.

@Anmol6 changed the title from "Handle rank 1 case without mp" to "Indexing: handle cpu & single-gpu without using multiprocessing & dist. data parallel" on Jan 11, 2024
@Anmol6 marked this pull request as ready for review on January 11, 2024 at 11:04
@bclavie (Collaborator) commented on Jan 12, 2024

I think after the latest batch of fixes it looks good now! Thank you so much for doing this!

@@ -30,6 +30,8 @@ class RunSettings:
     total_visible_gpus = torch.cuda.device_count()
     gpus: int = DefaultVal(total_visible_gpus)
 
+    use_rank1_fork: bool = DefaultVal(False)
@bclavie (Collaborator) commented on this line, Jan 12, 2024
Just a quick note (well, two):

  • Should this not be the opposite logic? i.e. defaults to True (and the associated logic changes in the code)? I feel like it'd make more sense with the naming
  • Do we want this to be a config item rather than a flag passed to the function? I think either works fine and maybe having it as part of config is more fitting for how the rest of the code is structured.
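As a minimal sketch of the trade-off in the second point: with the setting living on the config, a launcher can branch on it directly, running single-device indexing in the calling process and only falling back to torch.multiprocessing plus a distributed process group when there are multiple ranks (or when the fork is explicitly requested). Only use_rank1_fork and gpus come from the diff above; the launcher structure, run_indexing, and _distributed_worker are hypothetical names for illustration, not ColBERT's actual code.

# Hypothetical launcher sketch; only use_rank1_fork and gpus come from the diff above.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run_indexing(rank: int, nranks: int):
    # Placeholder for the actual per-rank indexing work.
    print(f"indexing on rank {rank} of {nranks}")

def _distributed_worker(rank: int, nranks: int, port: str):
    # Each spawned process joins a process group before doing its share of the work.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = port
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=nranks)
    try:
        run_indexing(rank, nranks)
    finally:
        dist.destroy_process_group()

def launch(config):
    single_device = config.gpus <= 1  # CPU-only or a single visible GPU
    if single_device and not config.use_rank1_fork:
        # No fork, no process group: run indexing directly in the calling process.
        run_indexing(rank=0, nranks=1)
    else:
        # Multi-GPU (or fork explicitly requested): one worker process per rank.
        nranks = max(config.gpus, 1)
        mp.spawn(_distributed_worker, args=(nranks, "29500"), nprocs=nranks)

Under a structure like this, flipping the default of use_rank1_fork only changes which branch a bare config takes, which is why the naming/default question in the first bullet matters.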

@Anmol6 (Contributor, Author) replied:
I was using "fork" in the sense of a code fork, but I'll revert to "fork" as in multiprocessing. Thanks for explaining the terminology!

@bclavie (Collaborator) replied:

Thanks, all looking good to me now! Just needs Omar's final approval 👍

@fblissjr commented on Jan 12, 2024

This fixed a problem I was having downstream in RAGatouille, which was always running in distributed mode on WSL2 with a single GPU. Thank you for the PR!

bclavie/RAGatouille#30 (comment)

Edit: it looks like the trainer still always forces distributed torch; the collection indexer change fixed indexing, though. Definitely the right direction.
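For context on the remaining trainer issue, the usual way to avoid forcing distributed torch is to guard the DistributedDataParallel wrapping (and the process-group requirement) on the number of ranks, roughly as in the sketch below. This is a generic pattern, not the ColBERT trainer's actual code; setup_model_for_training, rank, and nranks are assumed names.

# Generic sketch: wrap in DistributedDataParallel only when there is more than one rank.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_model_for_training(model: torch.nn.Module, rank: int, nranks: int):
    if torch.cuda.is_available():
        device = torch.device(f"cuda:{rank}")
    else:
        device = torch.device("cpu")
    model = model.to(device)

    if nranks > 1:
        # Distributed path: requires torch.distributed.init_process_group to have run.
        assert dist.is_initialized(), "call init_process_group before wrapping in DDP"
        model = DDP(model, device_ids=[rank] if device.type == "cuda" else None)
    # Single-rank (CPU or one GPU) path: return the bare model, no process group needed.
    return model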
