
prepare repo for auto-formatters #1546

Merged
merged 39 commits into from Feb 17, 2022

Conversation

pmeier
Contributor

@pmeier pmeier commented Jan 28, 2022

Addresses pytorch/data#169 (comment). Don't review yet. I'll explain everything as soon as the setup is done.

Contributor Author

@pmeier pmeier left a comment

CI is now fully configured. I intentionally did not yet push the changes that the auto-formatters will make, since there are so many of them that this PR would become unreviewable afterwards. Let me know if you are happy with the configs and I'll pull the trigger as soon as this is approved. I'll also add a paragraph to the contributing guidelines once the configs are approved.

@pmeier pmeier marked this pull request as ready for review January 28, 2022 09:34
@mthrok
Contributor

mthrok commented Jan 28, 2022

Is this effort coordinated with #1534?

@parmeet
Contributor

parmeet commented Jan 28, 2022

> Is this effort coordinated with #1534?

cc: @abhinavarora

@abhinavarora
Contributor

> Is this effort coordinated with #1534?

No, this is not coordinated. @pmeier, I also have #1534, which is similar to what torchaudio and vision have and is in sync with Meta's internal workflow. Would you like to merge the PRs together? I will be happy to address your feedback as well :-)

Contributor

@abhinavarora abhinavarora left a comment

Hi @pmeier , I closed my PR #1534 in favor of this PR which is much more detailed. Overall your changes look great to me! I will verify them with our internal workflow as well and then approve the PR :-)

One question for you: do you think we can use pre-commit to run clang-format as well, like I was doing in #1534? That way, OSS contributors will not have to worry about it when making commits.

@pmeier
Contributor Author

pmeier commented Jan 31, 2022

Hey @abhinavarora, I'm sorry, I was not aware that there was another effort to add this functionality.

> One question for you: do you think we can use pre-commit to run clang-format as well, like I was doing in #1534? That way, OSS contributors will not have to worry about it when making commits.

Yes, that is indeed a better way to deal with this. Let me adapt the PR.
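For reference, a clang-format hook in pre-commit could look roughly like the sketch below; the mirror repo and the pinned rev are illustrative, not necessarily what this PR ends up using:

```yaml
# .pre-commit-config.yaml (sketch, not the final config)
repos:
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v14.0.6  # illustrative pin
    hooks:
      - id: clang-format
        types_or: [c, c++]
```

With something like this in place, `pre-commit run --all-files` would reformat the C/C++ sources along with everything else, so contributors never have to invoke the binary by hand.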

Contributor

@abhinavarora abhinavarora left a comment

Overall LGTM! Just added a nit comment. Feel free to merge the PR after addressing that. Thank you so much for enforcing high code standards in the torchtext repo and bringing us on par with the other PyTorch domain libraries.

@pmeier
Contributor Author

pmeier commented Feb 1, 2022

@abhinavarora

> Feel free to merge the PR after addressing that.

I don't have merge rights, so you'll have to. Just to make sure we are not talking past each other: you want to merge this PR after the auto-format changes are pushed, not before, right?

Regardless, it would be nice to confirm that nothing breaks by getting the unit-test workflows to pass after I have pushed the changes. But the CI currently seems toasted. Are you sure you want to merge this PR with all the changes without being able to confirm through CI that everything is fine? One thing we stumbled over when adding this to torchvision was that we implicitly relied on a custom import order in some modules; usort changed that, and importing torchvision failed.
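A minimal sketch of the kind of import-order dependency described above (package and module names are made up for illustration): a package whose `__init__.py` must import submodule `a` before `b`, because `b` reaches into `pkg.a` at import time. If an import sorter reorders the two lines, importing the package fails.

```python
import importlib
import os
import sys
import tempfile


def build_pkg(root, init_body):
    """Write a tiny package 'pkg' whose __init__.py is init_body."""
    pkg = os.path.join(root, "pkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(init_body)
    with open(os.path.join(pkg, "a.py"), "w") as f:
        f.write("REGISTRY = []\n")
    with open(os.path.join(pkg, "b.py"), "w") as f:
        # Touches pkg.a while pkg is still being initialized, so it only
        # works if __init__.py imported 'a' before 'b'.
        f.write("import pkg\npkg.a.REGISTRY.append('b')\n")


def import_ok(init_body):
    """Return True if 'import pkg' succeeds with the given __init__.py."""
    with tempfile.TemporaryDirectory() as root:
        build_pkg(root, init_body)
        sys.path.insert(0, root)
        # Purge any cached copies so each run starts fresh.
        for name in [m for m in list(sys.modules) if m.split(".")[0] == "pkg"]:
            del sys.modules[name]
        importlib.invalidate_caches()
        try:
            importlib.import_module("pkg")
            return True
        except AttributeError:
            return False
        finally:
            sys.path.remove(root)


# The original order works; the "sorted" order breaks the import.
print(import_ok("from pkg import a\nfrom pkg import b\n"))  # True
print(import_ok("from pkg import b\nfrom pkg import a\n"))  # False
```

In the failing case, `b`'s `pkg.a` attribute access raises `AttributeError` because the partially initialized package does not have the `a` attribute yet, which is exactly the class of breakage a blind re-sort can introduce.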

@abhinavarora
Contributor

> Regardless, it would be nice to confirm that nothing breaks by getting the unittest workflows to pass after I have pushed the changes. But the CI currently seems toasted. Are you sure you want to merge this PR with all the changes without being able to confirm everything is fine through CI? One thing we stumbled over when adding this to torchvision was that we implicitly relied on a custom import order in some modules. usort changed that and so importing torchvision failed.

@pmeier Thank you for sharing your experience with torchvision. In that case, let's add the changes from the formatters in this PR. Once added, we can validate the changes on CI. Since some dataset tests on CI are toasted (due to some caching issues on CI), we can also validate the changes by running pytest locally. I will be happy to do that on my end before merging the PR.

To summarize, I could take on the following action items after you add the auto-formatter changes:

  1. Run pytest locally to validate the changes.
  2. Import this PR to Meta's internal repo and validate there.
  3. Merge the PR.

Please let me know if you have any concerns. Thank you again for this effort :-)

@abhinavarora
Contributor


Hi @pmeier, I validated this PR against Meta's internal workflow and ran into some discrepancies. To address them, I took the following steps:

  1. Downgraded the version of usort in pre-commit to match Meta's internal workflow. This is the same version used in audio and vision.
  2. Added a .clang-format style file, as present in the vision and audio repos, so that C/C++ formatting is in sync.

I also merged the latest code and formatted it in this PR so that there are no merge conflicts. I will now go ahead and merge this PR. Thank you so much for your great work in onboarding us to high code-formatting standards!

@pmeier
Contributor Author

pmeier commented Feb 4, 2022

> Downgraded the version of usort in pre-commit to match Meta's internal workflow. This is the same version used in audio and vision.

I don't know how Meta's internals work, but I think upgrading the internal workflows would be the way to go. The only reason torchvision is using usort==0.6.4 is that, at the time of adding the hooks, this was the latest version. In fact, I have a PR open that upgrades it to 1.0.1: pytorch/vision#5106. @NicolasHug wanted to merge this (soon?) after "fixing" the internal tooling.

There are two major improvements of 1.0.1 over 0.6.4:

  1. Multiple import statements that import from the same module will be merged into one.

  2. Imports inside an import statement are sorted lexicographically.

For example,

    from foo import baz
    from foo import spam, ham
    from foo import bar

will be turned into

    from foo import bar, baz, ham, spam

@pmeier
Contributor Author

pmeier commented Feb 4, 2022

There are two problems with clang-format:

  1. Having never run run-clang-format.py myself, I was under the impression that it would fix the inconsistencies automatically. This is not the case: it only shows you the diff. The binary itself has a -i flag that enables in-place fixes. So we should either change the wording in the contributing guide to indicate that the user needs to make the changes themselves, or drop the Python script in favor of running the binary directly.

  2. Running the clang-format binary, either directly or through the Python script, is far from trivial. The binary is not statically linked and requires libtinfo.so.5. In the CI jobs we simply do apt install libtinfo5, but what if your system (like mine) has no option to install this (outdated) version? My next step was to install ncurses=5 from conda-forge, since it provides libtinfo.so.5, only to discover that python>=3.7 requires ncurses>=6. Since the upcoming release will drop Python 3.6 support, a contributor needs a separate environment just to run clang-format. To add insult to injury, you have to manually set LD_LIBRARY_PATH because libtinfo.so.5 is not found by default. After jumping through all these hoops, the final blocker I hit was that I still couldn't run the binary with the -i flag, since

    libtinfo.so.5: no version information available (required by ./clang-format-linux64)

    Although I'm very much in favor of letting an auto-formatter do its job rather than "abusing" it only as a linter, I can't recommend going that route.

    In any case, we should greatly expand the contributing guide to explain the steps necessary to even run clang-format. Plus, since the binary is provided by Meta, could you ask the person responsible to statically link it so we don't have to deal with all this?
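The difference between the two invocations could be sketched roughly as follows; the script flags, binary name, and paths here reflect my local setup and are illustrative, not a documented recipe:

```sh
# The wrapper script only reports a diff; it does not modify files:
python run-clang-format.py -r torchtext/csrc

# Formatting in place means invoking the binary itself with -i,
# after pointing the loader at a directory containing libtinfo.so.5:
export LD_LIBRARY_PATH=/path/to/ncurses5/lib
find torchtext/csrc \( -name '*.cpp' -o -name '*.h' \) -print0 \
  | xargs -0 ./clang-format-linux64 -i
```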

@NicolasHug
Member

> I have a PR open that upgrades it to 1.0.1: pytorch/vision#5106. @NicolasHug wanted to merge this (soon?) after "fixing" the internal tooling.

Yes, you'll have to tweak another thing internally to use the latest ufmt version:
https://fb.workplace.com/groups/pyfmt/permalink/925830131387837/

@codecov

codecov bot commented Feb 16, 2022

Codecov Report

Merging #1546 (8a12398) into main (8808e7e) will decrease coverage by 0.04%.
The diff coverage is 82.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1546      +/-   ##
==========================================
- Coverage   85.33%   85.28%   -0.05%     
==========================================
  Files          58       58              
  Lines        2496     2488       -8     
==========================================
- Hits         2130     2122       -8     
  Misses        366      366              
Impacted Files Coverage Δ
torchtext/experimental/datasets/raw/wmt14.py 21.66% <25.00%> (-1.29%) ⬇️
torchtext/data/utils.py 51.63% <45.00%> (ø)
torchtext/data/datasets_utils.py 74.77% <66.66%> (+0.11%) ⬆️
torchtext/data/functional.py 61.66% <66.66%> (ø)
...orchtext/experimental/datasets/raw/wmtnewscrawl.py 68.18% <66.66%> (ø)
torchtext/_download_hooks.py 53.48% <81.81%> (ø)
torchtext/nn/modules/multiheadattention.py 92.40% <83.33%> (ø)
torchtext/utils.py 81.06% <84.61%> (ø)
torchtext/models/roberta/bundler.py 88.33% <87.50%> (ø)
torchtext/models/roberta/modules.py 84.66% <87.50%> (-0.10%) ⬇️
... and 42 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8808e7e...8a12398. Read the comment docs.

@abhinavarora
Contributor

> There are two problems with clang-format […]

@pmeier you are right. I will create an option in run-clang-format.py to also format the files. As far as statically linking the binaries is concerned, let me reach out to the teams that own them to see if this is an option!

@abhinavarora abhinavarora merged commit c31a400 into pytorch:main Feb 17, 2022
@pmeier pmeier deleted the pre-commit branch February 22, 2022 16:10
@pmeier
Contributor Author

pmeier commented Feb 22, 2022

FYI: a recent update in the build dependencies of ufmt broke the pre-commit hook. You are going to see lint CI failures. We moved them out of the hooks temporarily in pytorch/vision#5454.

@parmeet
Contributor

parmeet commented Feb 22, 2022

> FYI: a recent update in the build dependencies of ufmt broke the pre-commit hook. You are going to see lint CI failures. We moved them out of the hooks temporarily in pytorch/vision#5454.

cc: @abhinavarora
