Automate Release: From scratch builds #1739

Merged
merged 32 commits into from
Aug 26, 2022

Conversation

msaroufim
Member

@msaroufim msaroufim commented Jul 14, 2022

Description

This PR adds a manually triggerable GitHub Action that makes an official release. Assuming a code freeze, it runs a set of scripts on the master branch that build binaries and upload them to the official channels. It automates a total of 27 binaries.

  • Conda - 18 = 3 * 3 * 2 binaries: 3 packages (torchserve, torch-model-archiver, torch-workflow-archiver) on 3 operating systems (Windows, macOS, Ubuntu), each built for py38 and py39
  • Pypi - 3 binaries, one per package
  • Docker - 4 binaries, one for each of CPU and GPU and another two for their respective latest tags
  • Docker KFS - 2 binaries one for CPU and one for GPU
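
The arithmetic behind the 27 binaries can be checked with a short sketch. The package, OS, and Python version names come from this description and the build output further down; grouping them into tuples and the function name `count_release_binaries` are illustrative assumptions, not code from this PR.

```python
from itertools import product

# Names taken from the PR description; this layout is only an illustration.
PACKAGES = ("torchserve", "torch-model-archiver", "torch-workflow-archiver")
OPERATING_SYSTEMS = ("windows", "mac", "ubuntu")
PYTHON_VERSIONS = ("py38", "py39")


def count_release_binaries():
    """Return the number of binaries per channel, as described in the PR."""
    # One conda binary per (package, OS, Python version) combination.
    conda = len(list(product(PACKAGES, OPERATING_SYSTEMS, PYTHON_VERSIONS)))
    # One PyPI binary per package.
    pypi = len(PACKAGES)
    # CPU and GPU images, plus their respective latest tags.
    docker = 4
    # One CPU and one GPU image for KFS.
    docker_kfs = 2
    return {"conda": conda, "pypi": pypi, "docker": docker, "docker_kfs": docker_kfs}


counts = count_release_binaries()
total = sum(counts.values())
print(counts, total)
```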

Each of these needs credentials that are stored as GitHub secrets, so anyone on the team can now trigger an official release; it is no longer restricted to Meta employees. To protect the team from fat-finger errors, the workflow also requires 2 approvers to fully launch.

There are 2 schools of thought when it comes to promoting binaries

  1. This PR: Create binaries from scratch
  2. Promote nightly binaries to official: This is the workflow Docker encourages because it saves build time and storage, but it is finicky and not worth the trouble for Conda and PyPI. See "Fully Automate Release: Via download, renaming and promotion" #1735 for a draft of how this would work. I opted out of this design for two reasons: on PyPI we would need to change both the package version and the package name from torchserve-nightly to torchserve, which was not easy to do, and if you manually compress a file, conda won't verify it. I asked the conda team about this in "Extend anaconda copy to allow name & version changes" conda/conda#11613 and haven't heard back.

This change also means we cannot do hot fixes for a single binary: we need to bump version.txt and upload everything.

Fixes #1666

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

For testing we cannot actually upload official images, so I used the 2 forms of testing below: a dry_run test showing which commands will be run, and a test that uploads to a staging environment called marksaroufim.

End-to-end test (does everything except upload): https://github.com/msaroufim/serve/actions/runs/2680194992. I cannot upload real binaries to PyPI because that would publish an actual release, and I don't have access to the test.pypi.org account, which is owned by AWS. The logs nevertheless show that the upload should work as expected.

Conda & Pypi tests

dry_run

build.py

https://gist.github.com/msaroufim/c2dbac8739c4fcb5017af77b2db2fd0d

upload.py

(serve) ubuntu@ip-172-31-16-198:~/serve/binaries$ python upload.py --upload-pypi-packages --upload-conda-packages --dry_run --test-pypi
Executing command: twine upload /home/ubuntu/serve/binaries/../dist/* --username __token__ --repository-url https://test.pypi.org/legacy/
Executing command: twine upload /home/ubuntu/serve/binaries/../model-archiver/dist/* --username __token__ --repository-url https://test.pypi.org/legacy/
Executing command: twine upload /home/ubuntu/serve/binaries/../workflow-archiver/dist/* --username __token__ --repository-url https://test.pypi.org/legacy/
All packages uploaded to test.pypi.org successfully. Please install package as 'pip install -i https://test.pypi.org/simple/ <package-name>'
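
A minimal sketch of how the three twine commands in the log above could be assembled. The function name `build_twine_commands` and its parameters are hypothetical; only the command strings themselves are taken from the dry-run output, and the actual upload.py may be structured differently.

```python
def build_twine_commands(binaries_dir, test_pypi=False):
    """Build one `twine upload` command per package dist directory.

    Hypothetical helper: the directory layout mirrors the dry-run log above.
    """
    dist_dirs = ["dist", "model-archiver/dist", "workflow-archiver/dist"]
    commands = []
    for dist_dir in dist_dirs:
        command = f"twine upload {binaries_dir}/../{dist_dir}/* --username __token__"
        if test_pypi:
            # Point twine at the test index instead of the real PyPI.
            command += " --repository-url https://test.pypi.org/legacy/"
        commands.append(command)
    return commands


commands = build_twine_commands("/home/ubuntu/serve/binaries", test_pypi=True)
for command in commands:
    print(f"Executing command: {command}")
```

Building the commands as plain strings first is what makes a dry run cheap: the same list can be printed or executed.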

Staging environment

build.py

(serve) ubuntu@ip-172-31-16-198:~/serve/binaries/conda/output/linux-64$ ls
current_repodata.json      repodata.json                repodata_from_packages.json.bz2            torch-workflow-archiver-0.2.4-py38_0.tar.bz2  torchserve-0.6.0-py39_0.tar.bz2
current_repodata.json.bz2  repodata.json.bz2            torch-model-archiver-0.6.0-py38_0.tar.bz2  torch-workflow-archiver-0.2.4-py39_0.tar.bz2
index.html                 repodata_from_packages.json  torch-model-archiver-0.6.0-py39_0.tar.bz2  torchserve-0.6.0-py38_0.tar.bz2

upload.py

Conda binaries can be seen here when uploaded to the marksaroufim organization https://anaconda.org/marksaroufim/repo

PyPI uploads to test.pypi.org need permissions which I don't have quite yet, but the upload seems to be working OK: https://gist.github.com/msaroufim/6f6c053e9dbe0d5b097abf8d1054d512

Docker tests

Staging environment

Dry_run

For convenience I've also added new support for dry_run in our build.py and upload.py scripts

pytorch/torchserve docker test

(serve) ubuntu@ip-172-31-16-198:~/serve/docker$ python build_upload_release.py --dry_run                                                                                            
Namespace(dry_run=True, organization='pytorch') 
Executing command: ./build_image.sh -bt dev -t pytorch/torchserve:latest
Executing command: ./build_image.sh -bt dev -g -cv cu102 -t pytorch/torchserve:latest-gpu
Executing command: docker tag pytorch/torchserve:latest pytorch/torchserve:latest-cpu
Executing command: docker tag pytorch/torchserve:latest pytorch/torchserve:0.6.0-cpu
Executing command: docker tag pytorch/torchserve:latest-gpu pytorch/torchserve:0.6.0-gpu
Executing command: docker push pytorch/torchserve:latest
Executing command: docker push pytorch/torchserve:latest-cpu
Executing command: docker push pytorch/torchserve:latest-gpu
Executing command: docker push pytorch/torchserve:0.6.0-cpu
Executing command: docker push pytorch/torchserve:0.6.0-gpu
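
The build, tag, and push sequence above follows directly from the organization name and the TorchServe version. Below is a rough sketch that regenerates the same ten commands; `docker_release_commands` and its parameters are hypothetical names, and the real build_upload_release.py may differ.

```python
def docker_release_commands(organization, version, cuda_version="cu102"):
    """Return the build, tag and push commands for a torchserve release.

    Hypothetical helper that mirrors the dry-run log above.
    """
    image = f"{organization}/torchserve"
    build = [
        f"./build_image.sh -bt dev -t {image}:latest",
        f"./build_image.sh -bt dev -g -cv {cuda_version} -t {image}:latest-gpu",
    ]
    # (source tag, new tag) pairs: versioned tags are aliases of the latest images.
    retags = [
        ("latest", "latest-cpu"),
        ("latest", f"{version}-cpu"),
        ("latest-gpu", f"{version}-gpu"),
    ]
    tag = [f"docker tag {image}:{src} {image}:{dst}" for src, dst in retags]
    pushed = ["latest", "latest-cpu", "latest-gpu", f"{version}-cpu", f"{version}-gpu"]
    push = [f"docker push {image}:{name}" for name in pushed]
    return build + tag + push


docker_commands = docker_release_commands("pytorch", "0.6.0")
for command in docker_commands:
    print(f"Executing command: {command}")
```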

pytorch/torchserve-kfs docker test

Executing command: ./build_image.sh -t pytorch/torchserve-kfs:0.6.0
Executing command: ./build_image.sh -g -t pytorch/torchserve-kfs:0.6.0-gpu
Executing command: docker push pytorch/torchserve-kfs:0.6.0
Executing command: docker push pytorch/torchserve-kfs:0.6.0-gpu

Related changes

This PR also includes some related changes that were due:

  • Deleting retag_binary.sh since we are no longer encouraging that workflow
  • Deleting pip/build_wheels.sh because that's been replaced by a function called build_dist_whl()
  • Created a helper function get_ts_version() which inspects the value of __version__ so I can append the correct version to docker image names
  • Refactored try_and_handle() helper function with dry_run support into a utils folder and changed most of the os.system() calls in build.py and upload.py
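
For readers unfamiliar with the pattern, a helper with dry_run support might look roughly like the sketch below. The signature, return value, and error handling are assumptions for illustration; the version in the utils folder of this PR may differ.

```python
import subprocess


def try_and_handle(cmd, dry_run=False):
    """Print a shell command and run it unless dry_run is set.

    Illustrative sketch only. Returns True if the command was executed
    and False if it was only printed.
    """
    print(f"Executing command: {cmd}")
    if dry_run:
        return False
    try:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, shell=True, check=True)
    except subprocess.CalledProcessError as err:
        raise RuntimeError(f"Command failed: {cmd}") from err
    return True


# In dry-run mode nothing is executed, so this is safe to call anywhere.
executed = try_and_handle("docker push pytorch/torchserve:latest", dry_run=True)
```

Unlike a bare os.system() call, which returns the exit status and leaves checking it to the caller, this stops the release at the first failing command.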

Manual approval

Manual approvals work by making the GitHub Action run in a specific environment, in this case one called official-release, which can only be run if triggered by either @lxning or @msaroufim. We can easily add more people as needed in GitHub settings. So if Mark triggers this workflow then Li should approve it, and vice versa.

Screen Shot 2022-07-15 at 4 39 24 PM

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@msaroufim msaroufim requested a review from agunapal July 15, 2022 00:05
Review thread on docker/docker_nightly.py (outdated, resolved)
@msaroufim msaroufim requested review from lxning and maaquib July 16, 2022 00:11
@msaroufim msaroufim requested a review from agunapal July 19, 2022 23:46
Review thread on binaries/build.py (resolved)
Review thread on binaries/upload.py (outdated, resolved)
@msaroufim msaroufim added the ci label Jul 20, 2022
@codecov

codecov bot commented Aug 8, 2022

Codecov Report

Merging #1739 (2bfde0b) into master (696442b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1739   +/-   ##
=======================================
  Coverage   45.23%   45.23%           
=======================================
  Files          64       64           
  Lines        2602     2602           
  Branches       60       60           
=======================================
  Hits         1177     1177           
  Misses       1425     1425           


Review thread on binaries/build.py (outdated, resolved)
Co-authored-by: Aaqib <maaquib@gmail.com>
@msaroufim msaroufim requested a review from maaquib August 26, 2022 20:31
@msaroufim msaroufim merged commit b62e5d7 into master Aug 26, 2022
@msaroufim msaroufim deleted the scratch branch September 22, 2022 22:34
Successfully merging this pull request may close these issues.

Fully automated release
3 participants