Adjust setup.py. Map scripts to binary names. Adjust Readme docs. #23

Merged
mavlyutovr merged 9 commits into main from adjust_setup_py
Aug 23, 2023

Conversation

Contributor

@mavlyutovr commented Aug 22, 2023

Adjust setup.py:

  • import requirements from files
  • set up metadata
  • add scripts as separate projects
  • add script->binary name mapping
  • add symlinks from project modules when installing in editable mode
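
For readers following along, here is a minimal sketch of the first and fourth bullets: loading requirements from a file and turning a script-to-binary mapping into `console_scripts` entries. The binary names come from the commands in this PR; the `module:function` targets and the helper names `read_requirements` and `console_scripts` are assumptions for illustration, not the code that was merged.

```python
from pathlib import Path
from typing import Dict, List


def read_requirements(path: str) -> List[str]:
    """Return requirement specifiers from a pip-style file, skipping blanks and comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


# Assumed mapping: installed binary name -> "module:function" target.
# The module paths below are hypothetical; only the binary names appear in this PR.
SCRIPT_TARGETS: Dict[str, str] = {
    "m4t_prepare_dataset": "m4t_scripts.finetune.dataset:main",
    "m4t_finetune": "m4t_scripts.finetune.finetune:main",
    "m4t_predict": "m4t_scripts.predict.predict:main",
}


def console_scripts(targets: Dict[str, str]) -> List[str]:
    """Format the mapping as 'name=module:function' strings, sorted by binary name."""
    return [f"{name}={target}" for name, target in sorted(targets.items())]
```

In a `setup.py`, these would typically be passed as `install_requires=read_requirements("requirements.txt")` and `entry_points={"console_scripts": console_scripts(SCRIPT_TARGETS)}`.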

Adjust Readme docs:

  • use binary names instead of python <path to script.py>
  • add a mention of an extra requirement for libsndfile (for fairseq2)

Adjustments in the script files:

  • set logging
  • move argument parsing under main()
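
A rough sketch of the script layout these two bullets describe, assuming a dataset-preparation script. The flag names are taken from the commands in this PR; the logger name, log format, help texts, and the choice to make every flag required are assumptions.

```python
import argparse
import logging
from pathlib import Path
from typing import List, Optional

# Logging is configured once at import time so every entry point shares the format.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s -- %(name)s: %(message)s",
)
logger = logging.getLogger("prepare_dataset")  # hypothetical logger name


def init_parser() -> argparse.ArgumentParser:
    """Build the parser; nothing is parsed at import time."""
    parser = argparse.ArgumentParser(description="Prepare a dataset manifest (sketch).")
    parser.add_argument("--source_lang", type=str, required=True, help="Source language")
    parser.add_argument("--target_lang", type=str, required=True, help="Target language")
    parser.add_argument("--split", type=str, required=True, help="Dataset split name")
    parser.add_argument("--save_dir", type=Path, required=True, help="Output directory")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    """Entry point: parse arguments here so importing the module has no side effects."""
    args = init_parser().parse_args(argv)
    logger.info(
        "Preparing split %s (%s -> %s) in %s",
        args.split, args.source_lang, args.target_lang, args.save_dir,
    )
```

A console-script entry point calls `main()` with no arguments, so `argparse` falls back to `sys.argv`. Adding an `if __name__ == "__main__": main()` guard keeps `python <script>.py` working as well.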

E2E evaluation:

conda create  -y -n sc309 python=3.9
conda activate sc309
conda install -y -c conda-forge libsndfile
pip install .
mkdir -p ~/dataset
m4t_prepare_dataset  --source_lang eng --target_lang kor --split validation --save_dir ~/dataset
m4t_predict ~/dataset/downloads/extracted/633d087081013940f8c282cc7b5155e838ece8e0293d1651b7c684c47c563276/dev/10010138729160973689.wav s2st eng --output_path ~/out.wav
m4t_prepare_dataset  --source_lang eng --target_lang kor --split train --save_dir ~/dataset
m4t_finetune --train_dataset ~/dataset/train_manifest.json --eval_dataset ~/dataset/validation_manifest.json  --model_name seamlessM4T_medium --save_model_to ~/checkpoint.pt --log_steps 1 --batch_size 2

The same flow when installing in editable mode (also tested with Python 3.10):

conda create  -y -n sc310 python=3.10
conda activate sc310
conda install -y -c conda-forge libsndfile
pip install -e .
mkdir -p ~/dataset
python scripts/m4t/finetune/dataset.py --source_lang eng --target_lang kor --split validation --save_dir ~/dataset
python scripts/m4t/predict/predict.py ~/dataset/downloads/extracted/633d087081013940f8c282cc7b5155e838ece8e0293d1651b7c684c47c563276/dev/10010138729160973689.wav s2st eng --output_path ~/out.wav
python scripts/m4t/finetune/dataset.py --source_lang eng --target_lang kor --split train --save_dir ~/dataset
python scripts/m4t/finetune/finetune.py --train_dataset ~/dataset/train_manifest.json --eval_dataset ~/dataset/validation_manifest.json  --model_name seamlessM4T_medium --save_model_to ~/checkpoint.pt --log_steps 1 --batch_size 2

@facebook-github-bot added the CLA Signed label Aug 22, 2023
requirements.txt Outdated
@@ -3,3 +3,4 @@ datasets
torchaudio
soundfile
librosa
fairseq2
Contributor

As fairseq2 is at an early stage, shall we pin a version to avoid compatibility issues?

Contributor Author

Sg, fixed
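
As a side note on the pinning suggestion, here is a small sketch of a check that flags requirement lines with no version constraint. The helper name `unconstrained` is hypothetical, and `X.Y.Z` in the usage note is a placeholder, not the version chosen in this PR. The check is deliberately simple and does not handle URL or `-r` style lines.

```python
import re
from typing import Iterable, List

# Any of these characters indicates a version specifier such as ==, >=, ~= or !=.
_SPECIFIER_CHARS = re.compile(r"[=<>~!]")


def unconstrained(requirements: Iterable[str]) -> List[str]:
    """Return requirement names that carry no version specifier."""
    result = []
    for line in requirements:
        # Drop trailing comments and environment markers before checking.
        req = line.split("#", 1)[0].split(";", 1)[0].strip()
        if req and not _SPECIFIER_CHARS.search(req):
            result.append(req)
    return result
```

For example, `unconstrained(["datasets", "fairseq2==X.Y.Z"])` would report only `datasets`.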

README.md Outdated
```
conda install -y -c conda-forge libsndfile
```
At this point, fairseq2 has confirmed support only for Linux and macOS.
Contributor

Shall we make it explicit that macOS is supported ✅ but no pre-built package is available ❌, as Can mentioned in #19:

right now we don't have pre-built packages available for macOS. For the time-being I suggest using a Linux box. We plan to have macOS wheels available pretty soon.

Contributor Author

Added

setup.py Outdated
package_dir={"": "src"},
package_data={"": ["assets/cards/*.yaml"]},
version="1.0.0",
packages=find_packages(where="src") + ['m4t_scripts.finetune', 'm4t_scripts.predict'],
Contributor

Do we expect heavy code (that's not the main entry point) in finetune / predict, or something importable? I was thinking we should move everything that needs to be imported into src/. But I'm open to this change here.

Contributor Author

This is to expose CLI scripts as executables.
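
To illustrate why the script packages need to be importable for this to work: a `console_scripts` entry has the form `name=module:function`, and the generated executable imports the module and calls the function. Below is a simplified sketch of that lookup. The helper `resolve_entry_point` is hypothetical and is not how pip or setuptools implement it internally.

```python
import importlib
from typing import Any, Tuple


def resolve_entry_point(spec: str) -> Tuple[str, Any]:
    """Split 'name=module:attr' and import the referenced object."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module_name, _, attr_path = target.partition(":")
    # The import fails unless the module's package is installed or on sys.path.
    obj: Any = importlib.import_module(module_name)
    if attr_path:
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return name, obj
```

If the package holding the script is not listed in `packages`, the import step fails after a regular (non-editable) install, which is one reason to add the script packages there.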

@kauterry self-requested a review August 22, 2023 20:41
Contributor

@kauterry left a comment

Could you rename python scripts/m4t/predict/predict.py to m4t_predict here too? https://github.com/facebookresearch/seamless_communication/blob/main/scripts/m4t/predict/README.md

Contributor

@kauterry left a comment

This creates new directories seamless_communication and m4t_scripts; do we want that? cc @cndn

@mavlyutovr merged commit 21241a6 into main Aug 23, 2023
1 check passed
@mavlyutovr deleted the adjust_setup_py branch August 23, 2023 00:06
@cndn mentioned this pull request Aug 23, 2023
class cmd_for_editable_mode(develop):
    def run(self):
        # add symlinks for modules if install in editable mode
        _add_symlinks()
Contributor

Maybe I am missing something, but are these symlinks necessary? Normally pip install -e . adds the relative "src" directory to Python's sys.path, so Python should be able to resolve the imports.

Contributor Author

@mavlyutovr Aug 23, 2023

Python's editable mode creates a symlink in site-packages to the module's parent folder. This logic does not work if you have more than one module to expose and they don't share a parent folder.
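
For context, a hedged sketch of what a helper like `_add_symlinks()` could do. The PR does not show its body, so the function below, its name `add_symlinks`, and the link-to-target mapping in the usage note are assumptions based on the directory names mentioned in this thread.

```python
import os
from pathlib import Path
from typing import Dict


def add_symlinks(root: Path, links: Dict[str, str]) -> None:
    """Create (or refresh) directory symlinks under root.

    links maps a link name to a target path relative to root.
    """
    for link_name, target in links.items():
        link = root / link_name
        if link.is_symlink():
            link.unlink()  # refresh a link left over from an earlier install
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        # A relative target is resolved against the directory containing the link.
        os.symlink(target, link, target_is_directory=True)
```

A call such as `add_symlinks(Path("."), {"seamless_communication": "src/seamless_communication", "m4t_scripts": "scripts/m4t"})` would place both modules under one parent folder. The target paths here are guesses from the repository layout discussed above.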

Comment on lines +144 to +146
# symlinks
seamless_communication
m4t_scripts
Contributor

@kauterry Nov 5, 2023

What Ruslan means by "more than one module to expose" @cbalioglu

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
5 participants