[DocTests Speech] Add doc tests for all speech models #15031

Merged

patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented Jan 4, 2022

What does this PR do?

This PR revives the doc tests and adds doc tests for all speech models now that the new docs are finished.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten patrickvonplaten changed the title from "[DocTests] Revive doc tests" to "[WIP][DocTests] Revive doc tests" on Jan 4, 2022
Member

@LysandreJik LysandreJik left a comment

Great! Let's take the opportunity to have the doctests passing on this PR before merging! (5 files failing now, but for simple issues I believe)

.github/workflows/doctests.yml (resolved)
src/transformers/models/wavlm/modeling_wavlm.py (outdated, resolved)
Collaborator

@sgugger sgugger left a comment

Thanks a lot for tackling those! Could you elaborate on exactly what issue you had with black?

.github/workflows/doctests.yml (resolved)
src/transformers/file_utils.py (outdated, resolved)
src/transformers/file_utils.py (resolved)
src/transformers/file_utils.py (resolved)
utils/documentation_tests.txt (outdated, resolved)
@patrickvonplaten patrickvonplaten changed the title from "[WIP][DocTests] Revive doc tests" to "[WIP][DocTests] Add doc tests for all speech models" on Jan 25, 2022
@patrickvonplaten patrickvonplaten changed the title from "[WIP][DocTests] Add doc tests for all speech models" to "[WIP][DocTests Speech] Add doc tests for all speech models" on Jan 25, 2022
diff = clean_code != code
if diff:
    print(f"Overwriting content of {code_file}.")
    with open(code_file, "w", encoding="utf-8", newline="\n") as f:
Collaborator

I think we want to put them in some temp dir to run the tests and not overwrite the existing one (otherwise we will get modifications we don't want when running the doctest locally).

Contributor Author

sounds good!

Contributor Author

Actually, I made two commands now: one that adds the line and one that reverts it. I think this is more intuitive for local testing than copying the files, and this way we also don't need to add a temp path before every file to be tested.

Collaborator

We should document somewhere the instruction that has to be run before/after testing locally then.

Contributor Author

I'm adding a long intro at the beginning of the file. @sgugger, do you think it would be a good idea to add a README.md to utils so that community contributors have some docs on how to use the utils scripts?

Collaborator

I was thinking more of making a section in the doc page for the tests.

Contributor Author

Ok that would be good for me as well!
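
For anyone following this thread, the snippet under review rewrites a file only when the processed text differs from what is on disk. A minimal sketch of that pattern is below. The `transform` argument is a hypothetical stand-in, since the actual processing done by `utils/prepare_for_doc_test.py` is not shown in this conversation, and `overwrite_if_changed` is an illustrative name rather than a function from the repository.

```python
from typing import Callable


def overwrite_if_changed(code_file: str, transform: Callable[[str], str]) -> bool:
    """Apply `transform` to a file and rewrite it only when the text changes.

    `transform` is a hypothetical placeholder for the real processing step.
    Returns True when the file was rewritten, False otherwise.
    """
    with open(code_file, "r", encoding="utf-8", newline="\n") as f:
        code = f.read()

    clean_code = transform(code)
    diff = clean_code != code
    if diff:
        print(f"Overwriting content of {code_file}.")
        # newline="\n" keeps line endings stable across platforms
        with open(code_file, "w", encoding="utf-8", newline="\n") as f:
            f.write(clean_code)
    return diff
```

Calling it once with an "add" transform and once with the matching inverse should leave the working tree as it was, which is presumably what the two-command approach relies on for local testing.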

@patrickvonplaten patrickvonplaten changed the title from "[WIP][DocTests Speech] Add doc tests for all speech models" to "[DocTests Speech] Add doc tests for all speech models" on Jan 26, 2022
Collaborator

@sgugger sgugger left a comment

Thanks, great to have those running again!

- "tests/**"
- ".github/**"
- "templates/**"
types: [assigned, opened, synchronize, reopened]
Collaborator

I think we decided to remove assigned from those @LysandreJik, can you confirm?

Member

Yeah but Patrick mentioned that he'll remove that from the PR before merging as it only needs to run on schedule

...     "hf-internal-testing/tiny-random-unispeech-sat"
... )
>>> model = UniSpeechForPreTraining.from_pretrained("microsoft/unispeech-large-1500h-cv")
>>> # TODO: Add full pretraining example
Collaborator

I would remove the doctest syntax while waiting for a tested example, but I don't think it's good to remove the example from the doc entirely.

Contributor Author

Yeah, the problem is that it was just copy-pasted and never worked.
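
To illustrate the suggestion of dropping the doctest syntax while keeping the example visible: the standard-library doctest parser only collects lines that start with the `>>>` prompt, so the same snippet written as plain code stays in the docs but is not executed as a test. This is a toy sketch with made-up snippet text, not the UniSpeech example itself.

```python
import doctest

# Snippet written with doctest prompts: each ">>>" line becomes a test example.
with_prompts = """
>>> total = 1 + 1
>>> total
2
"""

# The same snippet as plain code: still readable in the docs, but not collected.
without_prompts = """
total = 1 + 1
total
"""

parser = doctest.DocTestParser()
n_with = len(parser.get_examples(with_prompts))
n_without = len(parser.get_examples(without_prompts))
print(f"collected with prompts: {n_with}, without prompts: {n_without}")
```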

>>> # for contrastive loss training model should be put into train mode
>>> model.train()
>>> loss = model(input_values, mask_time_indices=mask_time_indices).loss
>>> # TODO: Add full pretraining example
Collaborator

Same here

Contributor Author

Yeah, the problem is that it was just copy-pasted and never worked.

Comment on lines +55 to +57
- name: Clean files after doctests
  run: |
    python utils/prepare_for_doc_test.py src docs --remove_new_line
Collaborator

This won't be run if the previous instruction fails (but I'm not sure we care).

Contributor Author

True, but yeah I'm also not sure if it matters really as it's the only test run in this suite
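
For local runs, where leftover modifications do matter, one way to guarantee the clean-up is a `try`/`finally` wrapper. The sketch below is hedged: the clean-up command is the one from the workflow step above, but the "add" command is assumed to be the same script without `--remove_new_line`, and `run_doctests_with_cleanup` is a hypothetical helper, not something in the repository.

```python
import subprocess
import sys

# Clean-up command, taken from the workflow step under review.
CLEAN = [sys.executable, "utils/prepare_for_doc_test.py", "src", "docs", "--remove_new_line"]
# Assumption: the same script without the flag is the "add" command.
PREPARE = [sys.executable, "utils/prepare_for_doc_test.py", "src", "docs"]


def run_doctests_with_cleanup(test_cmd, run=subprocess.run):
    """Prepare the files, run the doctests, and always revert the files.

    `run` defaults to subprocess.run and can be swapped for a stub in tests.
    """
    run(PREPARE, check=True)
    try:
        # check=False: a failing test run is reported through the return code
        return run(test_cmd, check=False).returncode
    finally:
        # Executes even when the test command fails or raises
        run(CLEAN, check=True)
```

The `test_cmd` argument is whatever doctest invocation is used locally, passed as an argument list.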

Member

@LysandreJik LysandreJik left a comment

This looks good to me, thank you for working on it @patrickvonplaten

utils/prepare_for_doc_test.py (resolved)
@@ -35,8 +35,16 @@ jobs:
run: |
  apt -y update && apt install -y libsndfile1-dev
  pip install --upgrade pip
  pip install .[dev]
Contributor Author

Will revert this once we have a stable dev Docker image.

@@ -19,7 +19,7 @@ env:

 jobs:
   run_doctests:
-    runs-on: [self-hosted, docker-gpu, single-gpu]
+    runs-on: [self-hosted, docker-gpu-test, single-gpu]
Contributor Author

leaving on -test for now

@patrickvonplaten patrickvonplaten merged commit 9f831bd into huggingface:master Jan 27, 2022
@patrickvonplaten patrickvonplaten deleted the start_cleaning_doc_tests branch January 27, 2022 13:29