Introduce expressivity_predict, and change pretssel_inference to expressivity_evaluate. #251

Merged: 4 commits merged into main from expressivity_predict on Dec 10, 2023

@kauterry kauterry commented Dec 7, 2023

This PR does the following:

Testing:

expressivity_predict <input_audio_path> --tgt_lang spa --model_name seamless_expressivity --vocoder_name vocoder_pretssel --output_path spa_whisper.wav

2023-12-08 22:29:24,508 INFO -- seamless_communication.cli.expressivity.predict.predict: Running inference on device=device(type='cuda', index=0) with dtype=torch.float16.
Using the cached tokenizer of seamless_expressivity. Set force to True to download again.
Using the cached tokenizer of seamless_expressivity. Set force to True to download again.
/private/home/krs/miniconda3/envs/fairseq2/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2023-12-08 22:29:31,425 INFO -- seamless_communication.cli.expressivity.predict.predict: text_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(1, 200), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2023-12-08 22:29:31,425 INFO -- seamless_communication.cli.expressivity.predict.predict: unit_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(25, 50), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2023-12-08 22:29:31,425 INFO -- seamless_communication.cli.expressivity.predict.predict: unit_generation_ngram_filtering=False
2023-12-08 22:29:32,436 INFO -- seamless_communication.cli.expressivity.predict.predict: Saving expressive translated audio in spa
2023-12-08 22:29:32,463 INFO -- seamless_communication.cli.expressivity.predict.predict: Translated text in spa: ¿Por qué estás golpeando mi jukebox?
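
For scripting the same run, here is a minimal sketch (not part of this PR) that assembles the `expressivity_predict` invocation shown above. `build_predict_cmd` is a hypothetical helper name, and it uses only the flags that appear in the command above; the default model and vocoder names are copied from that command.

```python
import shlex
from typing import List


def build_predict_cmd(
    input_audio_path: str,
    tgt_lang: str,
    output_path: str,
    model_name: str = "seamless_expressivity",
    vocoder_name: str = "vocoder_pretssel",
) -> List[str]:
    """Build the argument list for the expressivity_predict CLI.

    Hypothetical helper: only flags seen in the PR's test command are used.
    """
    return [
        "expressivity_predict",
        input_audio_path,
        "--tgt_lang", tgt_lang,
        "--model_name", model_name,
        "--vocoder_name", vocoder_name,
        "--output_path", output_path,
    ]


if __name__ == "__main__":
    cmd = build_predict_cmd("input.wav", "spa", "spa_whisper.wav")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it (requires seamless_communication to be installed):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell-quoting problems with audio paths that contain spaces.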

expressivity_evaluate eng_spa_100.tsv --task s2st --tgt_lang spa --output_path expressivity_whisper --ref_field tgt_text --model_name seamless_expressivity --vocoder_name vocoder_pretssel --duration_factor 1.0

Using the cached tokenizer of seamless_expressivity. Set force to True to download again.
Using the cached tokenizer of seamless_expressivity. Set force to True to download again.
2023-12-08 22:23:50,414 INFO -- seamless_communication.cli.expressivity.evaluate.evaluate: text_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(1, 200), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2023-12-08 22:23:50,415 INFO -- seamless_communication.cli.expressivity.evaluate.evaluate: unit_generation_opts=SequenceGeneratorOptions(beam_size=5, soft_max_seq_len=(25, 50), hard_max_seq_len=1024, step_processor=None, unk_penalty=0.0, len_penalty=1.0)
2023-12-08 22:23:50,415 INFO -- seamless_communication.cli.expressivity.evaluate.evaluate: unit_generation_ngram_filtering=False
/private/home/krs/miniconda3/envs/fairseq2/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
100%|██████████████████████████████████████████████| 99/99 [00:40<00:00, 2.44it/s]
2023-12-08 22:24:32,111 INFO -- seamless_communication.cli.expressivity.evaluate.evaluate: Processed 99 hyps, 99 refs
2023-12-08 22:24:32,128 INFO -- seamless_communication.cli.expressivity.evaluate.evaluate: Output results in expressivity_whisper/eng_spa_100/generate-eng_spa_100.tsv
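
To locate the results file from a script, the layout can be sketched as below. This is inferred from the single log line above (`--output_path expressivity_whisper` plus the input `eng_spa_100.tsv`), not from the CLI's source, so the real tool may build the path differently; `expected_results_path` is a hypothetical helper.

```python
from pathlib import Path


def expected_results_path(data_file: str, output_path: str) -> Path:
    """Guess where expressivity_evaluate writes its results TSV.

    Assumption: <output_path>/<input stem>/generate-<input stem>.tsv,
    as suggested by the log line in this PR.
    """
    stem = Path(data_file).stem  # "eng_spa_100.tsv" -> "eng_spa_100"
    return Path(output_path) / stem / f"generate-{stem}.tsv"


if __name__ == "__main__":
    print(expected_results_path("eng_spa_100.tsv", "expressivity_whisper"))
```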

pytest -v --device cuda:0

========================== 21 passed in 95.65s (0:01:35) ==========================

@facebook-github-bot added the CLA Signed label Dec 7, 2023
@kauterry kauterry marked this pull request as ready for review December 9, 2023 06:36

@elbayadm elbayadm left a comment

Approving for the tutorial

@kauterry kauterry merged commit 6ab3787 into main Dec 10, 2023
1 check passed
@kauterry kauterry deleted the expressivity_predict branch December 10, 2023 05:54
gcmvn_mean = torch.tensor(_gcmvn_mean, device=device, dtype=dtype)
gcmvn_std = torch.tensor(_gcmvn_std, device=device, dtype=dtype)

wav, sample_rate = torchaudio.load(args.input)

I thought you'd want to use AudioDecoder?

Contributor Author

I needed to use this because I resample to 16 kHz when the user supplies arbitrary audio.
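
The load-then-resample step described here can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `load_16khz` and `needs_resampling` are hypothetical names, and the use of `torchaudio.functional.resample` is an assumption about how the resampling could be done.

```python
TARGET_SAMPLE_RATE = 16_000  # 16 kHz, the rate mentioned in the comment above


def needs_resampling(sample_rate: int, target: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True when the audio is not already at the target rate."""
    return sample_rate != target


def load_16khz(path: str):
    """Load an audio file and resample it to 16 kHz if necessary.

    Hypothetical helper; torchaudio is imported lazily so the check above
    stays usable on machines without torch installed.
    """
    import torchaudio

    # torchaudio.load returns a (channels, frames) tensor and the file's rate.
    wav, sample_rate = torchaudio.load(path)
    if needs_resampling(sample_rate):
        wav = torchaudio.functional.resample(
            wav, orig_freq=sample_rate, new_freq=TARGET_SAMPLE_RATE
        )
    return wav, TARGET_SAMPLE_RATE
```

Loading with `torchaudio.load` exposes the file's native sample rate, which is what makes the conditional resample possible.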

@yilinyang7

I'd suggest to leverage this file: https://github.com/facebookresearch/seamless_communication/blob/main/demo/expressive/app.py

It does the same thing.

@kauterry
Contributor Author

> I'd suggest to leverage this file: https://github.com/facebookresearch/seamless_communication/blob/main/demo/expressive/app.py
>
> It does the same thing.

That file should actually leverage the expressivity/predict.py file, because in your suggestion we'll have the issue of circular imports. Please feel free to send a refactor PR.

@yilinyang7

> That file should actually leverage the expressivity/predict.py file, because in your suggestion we'll have the issue of circular imports. Please feel free to send a refactor PR.

I don't think it's ideal to change those files (e.g. the HF demo and our public demo code), since they're up and running now.

gwenzek pushed a commit that referenced this pull request Jan 18, 2024
* make dot/ and test_data folder

* linting

* linting

* Guil's comments

---------

Co-authored-by: Tuan Tran <tuantran@devfair0436.h2.fair>

5 participants