Sync preprocesses before loading the processor at run_speech_recognition_ctc.py (#21926)

Merged
sgugger merged 4 commits into huggingface:main from mpenagar:main
Apr 5, 2023

Conversation

mpenagar (Contributor) commented Mar 3, 2023

What does this PR do?

Make sure all processes wait until data is saved before loading the processor from the output_dir in the pytorch/speech-recognition/run_speech_recognition_ctc.py example.

Issue:

  • Non-main processes might try to load the processor from the output_dir before it has been saved.
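The race can be illustrated with a minimal, self-contained sketch (the names `fake_output_dir` and `load_processor` are invented stand-ins for the real filesystem and AutoProcessor.from_pretrained, not code from the script): a replica that reads the output_dir before the main process has written the processor files finds nothing there.

```python
# Illustrative stand-in for output_dir on disk (all names invented for this sketch).
fake_output_dir = {}

def load_processor(fs):
    # Stand-in for loading the processor from output_dir;
    # returns None when the config files are not there yet.
    return fs.get("preprocessor_config.json")

# A replica reads too early: nothing has been saved yet.
too_early = load_processor(fake_output_dir)

# The main process saves; after that, the replica's load succeeds.
fake_output_dir["preprocessor_config.json"] = {"feature_size": 1}
after_save = load_processor(fake_output_dir)
```

The PR's fix is to synchronize the processes so that the "save" step above is guaranteed to have happened before any replica attempts the load.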

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Make sure all processes wait until data is saved before loading the processor from the output_dir
HuggingFaceDocBuilderDev commented Mar 3, 2023

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) commented Mar 3, 2023

cc @sanchit-gandhi

sanchit-gandhi (Contributor) left a comment

Thanks @mpenagar, great catch! Would you mind updating the seq2seq ASR fine-tuning script too?

if is_main_process(training_args.local_rank):

It's exactly the same update that's required there!

mpenagar (Contributor, Author) commented Mar 7, 2023

Updated the seq2seq ASR fine-tuning script. I'm not very good with GitHub; I guess there is no need to open a new PR.

mpenagar (Contributor, Author) commented Mar 8, 2023

Wait... there is something I'm not getting right.

As far as I understand from the documentation, any code inside a with training_args.main_process_first(): block should be executed only by the main process:

A context manager for torch distributed environment where
one needs to do something on the main process, while blocking
replicas, and when it's finished releasing the replicas.

One such use is for datasets’s map feature which to be efficient
should be run once on the main process, which upon completion
saves a cached version of results and which then automatically
gets loaded by the replicas.

But in my experience, the code is executed by all the processes, not just the main one. Take this minimal `example.py`:

from transformers import TrainingArguments, HfArgumentParser
from transformers.trainer_utils import is_main_process

def main():
    parser = HfArgumentParser((TrainingArguments,))
    (training_args,) = parser.parse_args_into_dataclasses()
    rank = training_args.local_rank
    main_process = is_main_process(rank)
    print(f'\nBEFORE WITH - local_rank={rank} is_main_process={main_process}')
    with training_args.main_process_first():
        print(f'\nINSIDE WITH - local_rank={rank}')

if __name__ == "__main__":
    main()

If I execute it in a 4 GPU node:

OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node 4 example.py --output_dir none

The synchronization is working, but all processes execute the "INSIDE" print.

Executing with the newer torchrun does the same:

OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=4 example.py --output_dir none

What am I getting wrong?

sgugger (Collaborator) commented Mar 8, 2023

No, as the name indicates, it executes the code in the context manager on the main process first, and then on all the others. The code is indeed executed by all processes, just in a certain order.

Since with Datasets everything is cached, executing the preprocessing inside that context manager means that process 0 will do the preprocessing, and then all the others will load the result from the cache without needing to redo it.
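This ordering can be mimicked with threads in a small, illustrative stand-in for main_process_first() (every name here is invented for the sketch; the real context manager uses torch.distributed barriers, not an Event): every rank runs the body, but rank 0 is guaranteed to finish it before any replica starts.

```python
import threading

world_size = 4
order = []                        # records which rank ran the "with" body, in order
order_lock = threading.Lock()
main_done = threading.Event()     # simplified stand-in for the distributed barrier

def run_body(rank):
    # Replicas block until the main process has finished the body,
    # mirroring `with training_args.main_process_first():`.
    if rank != 0:
        main_done.wait()
    with order_lock:
        order.append(rank)        # every rank executes the body
    if rank == 0:
        main_done.set()           # release the replicas

threads = [threading.Thread(target=run_body, args=(r,)) for r in range(world_size)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After joining, `order` starts with rank 0 and contains all four ranks: main first, then everyone else, exactly as sgugger describes.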

mpenagar (Contributor, Author) commented Mar 8, 2023

Ok, then the PR is not correct, since all the processes will try to write the JSON files. I removed the original:

if is_main_process(training_args.local_rank):

that should be there inside the with block...
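The corrected pattern can be sketched with the same thread-based simulation (all names invented; a threading.Barrier stands in for the distributed synchronization): all processes enter the block, only rank 0 writes the files, and every process loads only after the write is done.

```python
import threading

world_size = 4
fake_fs = {}                       # illustrative stand-in for output_dir
writes = []                        # records which ranks actually wrote
write_lock = threading.Lock()
barrier = threading.Barrier(world_size)
loaded = [None] * world_size

def run(rank):
    # Every process enters the "with main_process_first()" body,
    # but only the main process writes the JSON files.
    if rank == 0:
        with write_lock:
            writes.append(rank)
        fake_fs["preprocessor_config.json"] = "saved"
    barrier.wait()                 # replicas wait until the save is done
    loaded[rank] = fake_fs["preprocessor_config.json"]

threads = [threading.Thread(target=run, args=(r,)) for r in range(world_size)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly one write happens, yet every rank loads successfully: that is the combination of the is_main_process check inside the main_process_first block that the final version of the PR uses.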

sgugger (Collaborator) commented Mar 8, 2023

Indeed, your changes are perfect. Is this ready to be merged now?

mpenagar (Contributor, Author) commented Mar 9, 2023

It is working on my end without any problem.

mpenagar changed the title from "Sync preocesses before loading the processor at run_speech_recognition_ctc.py" to "Sync preprocesses before loading the processor at run_speech_recognition_ctc.py" Mar 9, 2023
sanchit-gandhi (Contributor) left a comment

Thanks for iterating and fixing the seq2seq training script as well @mpenagar! I'll merge once you've confirmed this is ready!

huggingface deleted a comment from github-actions bot Apr 4, 2023
sanchit-gandhi (Contributor) commented Apr 4, 2023

Is this good for merge @mpenagar? Changes LGTM!

mpenagar (Contributor, Author) commented Apr 5, 2023

Yes, it is ready. Anyway, I don't know how GitHub works. Should I close the PR (there is a "Close" button there)?

sgugger merged commit d5239ba into huggingface:main Apr 5, 2023
sanchit-gandhi (Contributor) commented:

Awesome, thanks for confirming @mpenagar and for your contribution 🤗

novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
…ion_ctc.py (huggingface#21926)

* Update run_speech_recognition_ctc.py

Make sure all processes wait until data is saved before loading the processor from the output_dir

* Make sure all processes wait until data is saved before loading the processor from the output_dir

* Update run_speech_recognition_ctc.py

* Update run_speech_recognition_seq2seq.py

4 participants