Use Ansible for building wheels and provisioning docker images. #4531
Merged
41 commits:

| Commit | Message | Author |
|---|---|---|
| `de88ad1` | Add basic ansible configuration for bazel and installing apt pkgs | |
| `24e4baf` | Add apt repos and some signing keys | |
| `a514d19` | Add pip packages | mateuszlewko |
| `4e8cfb2` | Don't use apt-key for adding repo keys | mateuszlewko |
| `1be9ba7` | Don't use apt-key for adding repo keys | mateuszlewko |
| `9613b09` | Add fetch_srcs role for fetching PyTorch and XLA repos | mateuszlewko |
| `3187aed` | Add patches application | mateuszlewko |
| `6f59512` | Add role for compling PyTorch and XLA sources | mateuszlewko |
| `50eafb3` | WIP in build srcs | mateuszlewko |
| `f211dce` | Succesfully build XLA | mateuszlewko |
| `0890c58` | Clean-up and merge env variables; Separate stage; arch and accelerato… | mateuszlewko |
| `d9591e5` | Fix passing env variables; Add missing XLA_SANDBOX_BUILD | mateuszlewko |
| `50cf442` | Rename playbooks dir to ansible | mateuszlewko |
| `56dfbe2` | Add cloudbuild file that uses ansible playbook | mateuszlewko |
| `a319f14` | Add 'signed-by' to all apt repos | mateuszlewko |
| `31d6c22` | Add placeholders for release config vars | mateuszlewko |
| `d298a41` | Add release build | mateuszlewko |
| `c59b3cb` | Disable verbose ansible in docker build | mateuszlewko |
| `96fbb5f` | Add ansible config file and enable displaying tasks duration | mateuszlewko |
| `2be17cd` | Add TORCH_XLA_VERSION env variable, which is used when building XLA | mateuszlewko |
| `642bc8e` | Disable Ansible warnings about no inventory; Force git clone; revert … | mateuszlewko |
| `28fa5cd` | Add basic tests for bazel and fetch_srcs roles | mateuszlewko |
| `bcda96e` | Add import tests for build_srcs | mateuszlewko |
| `ea4d548` | Set git versions for which imports work | mateuszlewko |
| `a79495d` | Pass env vars to imports test | mateuszlewko |
| `271c90c` | Add configure_env role and apply minor cleanup | mateuszlewko |
| `b88d169` | Don't replace existing env var entries in /etc/environment | mateuszlewko |
| `f5ba73f` | Merge branch 'pytorch:master' into master | mateuszlewko |
| `b6892c1` | Move ansible dir to /docker/experimental | mateuszlewko |
| `074bd49` | Remove vars_prompt so that the playbook is not interactive | mateuszlewko |
| `e673729` | Shorten variable validation error message | mateuszlewko |
| `281c4aa` | Add readme file; cleanup some variables | mateuszlewko |
| `c7ded27` | Change git revisions to head | mateuszlewko |
| `c053a5b` | Remove variable from task name that's not substituted | mateuszlewko |
| `0d6c8a7` | Fix link formatting in README.md | mateuszlewko |
| `a5d36de` | Append env variables to bashrc and zshrc instead of /etc/environment | mateuszlewko |
| `666df60` | Bump cuda packages version; add sympy | mateuszlewko |
| `13b36a3` | Downgrade to clang-10 | mateuszlewko |
| `00ce2bf` | Remove libomp5 | mateuszlewko |
| `7124a8c` | Set correct version for libcudnn8 | mateuszlewko |
| `fb0b9f4` | Merge branch 'pytorch:master' into ansible | mateuszlewko |
`@@ -0,0 +1,6 @@`

```yaml
---
# .ansible-lint

profile: moderate
skip_list:
  - schema[tasks]
```
`@@ -0,0 +1,58 @@`

```markdown
# Ansible playbook

This Ansible playbook performs the following actions on localhost:
* install required pip and apt packages, depending on the specified stage,
  architecture and accelerator (see [apt.yaml](config/apt.yaml) and
  [pip.yaml](config/pip.yaml)),
* fetch bazel (via bazelisk; the version is set by `bazelisk_version` in the
  `bazel` role),
* fetch PyTorch and XLA sources at master (or specific revisions,
  see role `fetch_srcs` in [playbook.yaml](playbook.yaml)),
* set required environment variables (see [env.yaml](config/env.yaml)),
* build and install PyTorch and XLA wheels,
* run infrastructure tests (see `*/tests.yaml` files in [roles](roles)).

## Prerequisites

* Python 3.8+
* Ansible. Install with `pip install ansible`.

## Running

The playbook requires 3 variables to be passed explicitly. They configure the
playbook's behavior (installed pip/apt packages and set environment variables):
* `stage`: build or release. Different packages are installed depending on
  the chosen stage.
* `arch`: aarch64 or amd64. Architecture of the built image and wheels.
* `accelerator`: tpu or cuda. Available accelerator.

The variables can be passed through the `-e` flag: `-e "<var>=<value>"`.

Example: `ansible-playbook playbook.yaml -e "stage=build arch=amd64 accelerator=tpu"`

## Config structure

The playbook configuration is split into 4 files, one per logical system.
The configuration is loaded as playbook variables, which are then passed
to specific roles and tasks.
Only variables in [config/env.yaml](config/env.yaml) are passed as env variables.

* [apt.yaml](config/apt.yaml) - specifies apt packages for each stage and
  architecture or accelerator.
  Packages shared between all architectures and accelerators in a given stage
  are specified in `*_common`. They are combined with the architecture- and
  accelerator-specific lists.

  This config also contains a list of required apt repos and signing keys.
  These variables are mainly consumed by the [install_deps](roles/install_deps/tasks/main.yaml) role.

* [pip.yaml](config/pip.yaml) - similarly to apt.yaml, lists pip packages per stage and arch / accelerator.
  In both the pip and apt config files, the stage and arch / accelerator are
  concatenated together and specified under one key (e.g. build_amd64, release_tpu).

* [env.yaml](config/env.yaml) - contains Ansible variables that are passed as env variables when
  building PyTorch and XLA (`build_env`). Variables in `release_env` are saved in
  `/etc/environment` (only for the `release` stage).

* [vars.yaml](config/vars.yaml) - Ansible variables used in other config files and throughout the playbook.
  Not associated with any particular system.

Variables from these config files are loaded dynamically during playbook execution
(see [playbook.yaml](playbook.yaml)).
```
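The README's rule that `*_common` packages are combined with the arch- and accelerator-specific lists can be sketched in plain Python. This is an illustrative model only: it assumes the YAML has already been parsed into dicts (for example with PyYAML), and the helper `select_packages` is hypothetical, not part of the playbook, which performs the same selection with Jinja expressions.

```python
from typing import Dict, List, Optional


def select_packages(
    pkgs: Dict[str, Optional[List[str]]], stage: str, arch: str, accelerator: str
) -> List[str]:
    """Hypothetical helper mirroring the playbook's list concatenation.

    Keys are built as "<stage>_common", "<stage>_<arch>" and
    "<stage>_<accelerator>", in that order.
    """
    result: List[str] = []
    for suffix in ("common", arch, accelerator):
        # Missing keys and empty YAML keys (parsed as None) add nothing,
        # similar to the playbook's `default([], true)` filter.
        result += pkgs.get(f"{stage}_{suffix}") or []
    return result


if __name__ == "__main__":
    apt_pkgs = {
        "build_common": ["ccache", "git"],
        "build_amd64": ["clang-10"],
        "build_aarch64": None,  # an empty YAML key parses to None
        "build_cuda": ["cuda-toolkit-11-8"],
    }
    print(select_packages(apt_pkgs, "build", "amd64", "tpu"))
    # ['ccache', 'git', 'clang-10']
    print(select_packages(apt_pkgs, "build", "aarch64", "cuda"))
    # ['ccache', 'git', 'cuda-toolkit-11-8']
```

An accelerator with no matching key (here `build_tpu`) simply contributes nothing, which is what lets the config files omit sections that would be empty.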
`@@ -0,0 +1,14 @@`

```ini
# See https://docs.ansible.com/ansible/latest/reference_appendices/config.html
# for various configuration options.

[defaults]
# Display task execution duration.
callbacks_enabled = profile_tasks
# The playbook is only run on the implicit localhost.
# Silence warning about empty hosts inventory.
localhost_warning = False

[inventory]
# Silence warning about no inventory.
# This option is available since Ansible 2.14 (available only with Python 3.9+).
inventory_unparsed_warning = False
```
`@@ -0,0 +1,61 @@`

```yaml
# Contains lists of apt packages for each stage (build|release) and arch or accelerator.
apt:
  pkgs:
    build_common:
      - ccache
      - curl
      - git
      - gnupg
      - libopenblas-dev
      - ninja-build
      - procps
      - python3-pip
      - rename
      - vim
      - wget

    build_cuda:
      - cuda-libraries-11-8
      - cuda-toolkit-11-8
      - cuda-minimal-build-11-8
      - libcudnn8=8.8.0.121-1+cuda11.8
      - libcudnn8-dev=8.8.0.121-1+cuda11.8

    build_amd64:
      - "clang-{{ clang_version }}"

    build_aarch64:
      - scons
      - gcc-10
      - g++-10

    release_common:
      - curl
      - git
      - gnupg
      - google-cloud-cli
      - libgomp1
      - libopenblas-base
      - patch

    release_cuda:
      - cuda-libraries-11-8
      - cuda-minimal-build-11-8
      - libcudnn8=8.8.0.121-1+cuda11.8

  # Specify objects with string fields `url` and `keyring`.
  # The keyring path should start with /usr/share/keyrings/ for debian and ubuntu.
  signing_keys:
    - url: https://apt.llvm.org/llvm-snapshot.gpg.key
      keyring: /usr/share/keyrings/llvm.pgp
    - url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
      keyring: /usr/share/keyrings/cloud.google.gpg
    - url: "https://developer.download.nvidia.com/compute/cuda/repos/{{ cuda_repo }}/x86_64/3bf863cc.pub"
      keyring: /usr/share/keyrings/cuda.pgp

  repos:
    # signed-by path should match the corresponding keyring path above.
    - "deb [signed-by=/usr/share/keyrings/llvm.pgp] http://apt.llvm.org/{{ llvm_debian_repo }}/ llvm-toolchain-{{ llvm_debian_repo }}-{{ clang_version }} main"
    - "deb-src [signed-by=/usr/share/keyrings/llvm.pgp] http://apt.llvm.org/{{ llvm_debian_repo }}/ llvm-toolchain-{{ llvm_debian_repo }}-{{ clang_version }} main"
    - "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main"
    - "deb [signed-by=/usr/share/keyrings/cuda.pgp] https://developer.download.nvidia.com/compute/cuda/repos/{{ cuda_repo }}/x86_64/ /"
```
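The apt config states an invariant that is only enforced by convention: every `signed-by=` path in `repos` should match a `keyring` in `signing_keys`. A minimal Python check, assuming both lists have already been loaded from the YAML into plain Python structures, could look like this; `unmatched_repos` is a hypothetical helper, not something the PR adds.

```python
import re
from typing import Dict, List

# Captures the keyring path from a "signed-by=" option in a one-line apt source.
SIGNED_BY = re.compile(r"signed-by=([^\]\s]+)")


def unmatched_repos(signing_keys: List[Dict[str, str]], repos: List[str]) -> List[str]:
    """Return repo lines whose signed-by keyring is not declared in signing_keys."""
    keyrings = {key["keyring"] for key in signing_keys}
    unmatched = []
    for line in repos:
        match = SIGNED_BY.search(line)
        if match is None or match.group(1) not in keyrings:
            unmatched.append(line)
    return unmatched


if __name__ == "__main__":
    keys = [
        {"url": "https://apt.llvm.org/llvm-snapshot.gpg.key",
         "keyring": "/usr/share/keyrings/llvm.pgp"},
        {"url": "https://packages.cloud.google.com/apt/doc/apt-key.gpg",
         "keyring": "/usr/share/keyrings/cloud.google.gpg"},
    ]
    repos = [
        "deb [signed-by=/usr/share/keyrings/llvm.pgp] http://apt.llvm.org/buster/ llvm-toolchain-buster-10 main",
        "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main",
        "deb [signed-by=/usr/share/keyrings/cuda.pgp] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /",
    ]
    # The CUDA keyring is deliberately left out of `keys`, so only that line is reported.
    print(unmatched_repos(keys, repos))
```

Such a check could run in CI before the image build; otherwise a mismatch would likely surface only later, as a key or signature error during `apt-get update`.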
`@@ -0,0 +1,42 @@`

```yaml
# Variables that will be stored in /etc/environment file for the release stage.
# They'll be accessible for all processes on the host.
release_env:
  common:
    CC: "clang-{{ clang_version }}"
    CXX: "clang++-{{ clang_version }}"
    LD_LIBRARY_PATH: "$LD_LIBRARY_PATH:/usr/local/lib"

  tpu:
    ACCELERATOR: tpu
    TPUVM_MODE: 1

  cuda:
    TF_CUDA_COMPUTE_CAPABILITIES: 7.0,7.5,8.0
    XLA_CUDA: 1

# Variables that will be passed to shell environment only for building PyTorch and XLA libs.
build_env:
  common:
    LD_LIBRARY_PATH: "$LD_LIBRARY_PATH:/usr/local/lib"
    # Set explicitly to 0 as setup.py defaults this flag to true if unset.
    BUILD_CPP_TESTS: 0
    CC: "clang-{{ clang_version }}"
    CXX: "clang++-{{ clang_version }}"
    PYTORCH_BUILD_NUMBER: 1
    TORCH_XLA_VERSION: "{{ package_version }}"
    PYTORCH_BUILD_VERSION: "{{ package_version }}"
    XLA_SANDBOX_BUILD: 1

  amd64:
    ARCH: amd64

  aarch64:

  cuda:
    TF_CUDA_COMPUTE_CAPABILITIES: 7.0,7.5,8.0
    XLA_CUDA: 1

  tpu:
    ACCELERATOR: tpu
    TPUVM_MODE: 1
```
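The playbook merges the `build_env` and `release_env` sections with `combine`, in the order common, arch, accelerator. Without `recursive=True`, `combine` does a shallow merge in which later dictionaries override earlier keys, so `dict.update` is a close Python analogue for these flat mappings. The sketch below uses a hypothetical `merge_env` helper and also shows why an empty section such as `aarch64:` needs a fallback: YAML parses it as null. Converting values to strings is an assumption made here so the result can be used directly as a process environment.

```python
from typing import Any, Dict, Optional


def merge_env(
    env: Dict[str, Optional[Dict[str, Any]]], arch: str, accelerator: str
) -> Dict[str, str]:
    """Merge "common", then arch, then accelerator; later keys win."""
    merged: Dict[str, Any] = {}
    for section in ("common", arch, accelerator):
        # An empty YAML section such as `aarch64:` parses to None.
        merged.update(env.get(section) or {})
    # os.environ and subprocess env mappings require string values.
    return {name: str(value) for name, value in merged.items()}


if __name__ == "__main__":
    build_env = {
        "common": {"BUILD_CPP_TESTS": 0, "XLA_SANDBOX_BUILD": 1},
        "amd64": {"ARCH": "amd64"},
        "aarch64": None,
        "cuda": {"TF_CUDA_COMPUTE_CAPABILITIES": "7.0,7.5,8.0", "XLA_CUDA": 1},
    }
    print(merge_env(build_env, "aarch64", "cuda"))
```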
`@@ -0,0 +1,53 @@`

```yaml
# Contains lists of pip packages for each stage (build|release) and arch or accelerator.
pip:
  pkgs:
    # Shared between all architectures and accelerators for the build stage.
    build_common:
      - astunparse
      - cffi
      - cloud-tpu-client
      - cmake
      - coverage
      - dataclasses
      - expecttest==0.1.3
      - future
      - git-archive-all
      - google-api-python-client
      - google-cloud-storage
      - hypothesis
      - lark-parser
      - ninja
      - numpy
      - oauth2client
      - pyyaml
      - requests
      - setuptools
      - six
      - tensorboard
      - tensorboardX
      - tqdm
      - typing
      - typing_extensions
      - sympy

    build_amd64:
      - mkl
      - mkl-include

    build_aarch64:

    # Shared between all architectures and accelerators for the release stage.
    release_common:
      - numpy
      - pyyaml
      - mkl
      - mkl-include

    release_tpu:
      - torch_xla[tpuvm]

  # Packages that will be installed with the `--no-deps` flag.
  pkgs_nodeps:
    release_common:
      - torchvision
      - pillow
```
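The `install_deps` role is not part of this diff, so exactly how it invokes pip is not shown here. As an assumption-labelled sketch, the two package lists map onto two pip invocations, the second with pip's `--no-deps` option so that only the listed distributions are installed. `pip_install_commands` is a hypothetical name; it only builds argument lists and does not run them.

```python
import sys
from typing import List


def pip_install_commands(pkgs: List[str], pkgs_nodeps: List[str]) -> List[List[str]]:
    """Build (but do not run) pip invocations for the two package lists."""
    base = [sys.executable, "-m", "pip", "install"]
    commands: List[List[str]] = []
    if pkgs:
        commands.append(base + pkgs)
    if pkgs_nodeps:
        # --no-deps tells pip to install only the listed distributions,
        # without resolving or installing their dependencies.
        commands.append(base + ["--no-deps"] + pkgs_nodeps)
    return commands


if __name__ == "__main__":
    for cmd in pip_install_commands(["numpy", "pyyaml"], ["torchvision", "pillow"]):
        print(" ".join(cmd[1:]))
```

Each argument list could then be executed with `subprocess.run(cmd, check=True)`, so that a failed install stops the build.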
`@@ -0,0 +1,7 @@`

```yaml
# Used for fetching cuda from the right repo, see apt.yaml.
cuda_repo: ubuntu1804
# Used for fetching clang from the right repo, see apt.yaml.
llvm_debian_repo: buster
clang_version: 10
# PyTorch and PyTorch/XLA wheel versions.
package_version: 2.0
```
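Values from vars.yaml are substituted into apt.yaml and env.yaml through Jinja placeholders such as `{{ clang_version }}`. The sketch below mimics only the simplest case, a single variable name inside `{{ }}`, to show what the LLVM repo line renders to. It is a deliberately tiny stand-in, not Jinja itself, and `render` is a hypothetical helper.

```python
import re
from typing import Mapping

# Matches "{{ name }}" placeholders that reference a single variable.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: Mapping[str, object]) -> str:
    """Substitute simple {{ var }} placeholders; a tiny subset of Jinja."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


VARS = {"cuda_repo": "ubuntu1804", "llvm_debian_repo": "buster", "clang_version": 10}

if __name__ == "__main__":
    repo = ("deb [signed-by=/usr/share/keyrings/llvm.pgp] "
            "http://apt.llvm.org/{{ llvm_debian_repo }}/ "
            "llvm-toolchain-{{ llvm_debian_repo }}-{{ clang_version }} main")
    print(render(repo, VARS))
    # deb [signed-by=/usr/share/keyrings/llvm.pgp] http://apt.llvm.org/buster/ llvm-toolchain-buster-10 main
```

One YAML detail worth keeping in mind: `package_version: 2.0` is parsed as the float 2.0, which renders as "2.0", but a value such as `2.10` would be parsed as 2.1. Quoting version strings (`"2.10"`) avoids that coercion.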
`@@ -0,0 +1,88 @@`

```yaml
- name: "Install build dependencies"
  hosts: localhost
  connection: local

  # The playbook requires passing 3 variables explicitly:
  # - stage: build or release. Different packages are installed depending on
  #   the chosen stage.
  # - arch: aarch64 or amd64. Architecture of the built image and wheels.
  # - accelerator: tpu or cuda. Available accelerator.
  pre_tasks:
    - name: "Validate required variables"
      ansible.builtin.assert:
        # `default=''` makes an unset variable fail the assertion (showing
        # fail_msg) instead of aborting the lookup with a generic error.
        that: lookup('ansible.builtin.vars', item.name, default='') is regex(item.pattern)
        fail_msg: >-
          Variable '{{ item.name }}' doesn't match pattern '{{ item.pattern }}'.
          Pass the required variable with: -e "{{ item.name }}=<value>"
      loop:
        - name: stage
          pattern: ^(build|release)$
        - name: arch
          pattern: ^(aarch64|amd64)$
        - name: accelerator
          pattern: ^(tpu|cuda)$

    - name: "Include vars from config files"
      ansible.builtin.include_vars:
        file: "config/{{ item }}"
      loop:
        # vars.yaml should be the first as other config files depend on it.
        - vars.yaml
        - apt.yaml
        - pip.yaml
        - env.yaml

  roles:
    - bazel

    - role: install_deps
      vars:
        apt_keys: "{{ apt.signing_keys }}"

        # If a key (like `pip.pkgs.build_aarch64`) is defined but not set to
        # anything, its value is null and cannot be concatenated with a list.
        # `v | default([], true)` replaces `v` with an empty list if it is
        # undefined or evaluates to false.
        # See https://jinja.palletsprojects.com/en/3.0.x/templates/#jinja-filters.default.
        apt_pkgs: "{{
          apt.pkgs[stage + '_common'] | default([], true) +
          apt.pkgs[stage + '_' + arch] | default([], true) +
          apt.pkgs[stage + '_' + accelerator] | default([], true)
          }}"

        apt_repos: "{{ apt.repos }}"

        pip_pkgs: "{{
          pip.pkgs[stage + '_common'] | default([], true) +
          pip.pkgs[stage + '_' + arch] | default([], true) +
          pip.pkgs[stage + '_' + accelerator] | default([], true)
          }}"

        pip_pkgs_nodeps: "{{
          pip.pkgs_nodeps[stage + '_common'] | default([], true) +
          pip.pkgs_nodeps[stage + '_' + arch] | default([], true) +
          pip.pkgs_nodeps[stage + '_' + accelerator] | default([], true)
          }}"

    - role: fetch_srcs
      vars:
        src_root: "/src"
        pytorch_git_rev: HEAD
        xla_git_rev: HEAD

    - role: build_srcs
      vars:
        src_root: "/src"
        env_vars: "{{
          build_env.common | default({}, true) |
          combine(build_env[arch] | default({}, true)) |
          combine(build_env[accelerator] | default({}, true))
          }}"

    - role: configure_env
      vars:
        env_vars: "{{
          release_env.common | default({}, true) |
          combine(release_env[arch] | default({}, true)) |
          combine(release_env[accelerator] | default({}, true))
          }}"
      when: stage == "release"
```
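The "Validate required variables" pre-task can be mirrored in Python to show which `-e` values pass. Ansible's `regex` test uses search semantics by default, which `re.search` reproduces; because the patterns are anchored with `^...$`, the whole value must match. `validation_errors` and `parse_extra_vars` are hypothetical helpers, and the parser is simplified: it handles only space-separated `key=value` pairs and ignores the quoting rules that Ansible's own `-e` parsing supports.

```python
import re
from typing import Dict, List

# Same patterns as the "Validate required variables" pre-task.
REQUIRED = {
    "stage": r"^(build|release)$",
    "arch": r"^(aarch64|amd64)$",
    "accelerator": r"^(tpu|cuda)$",
}


def parse_extra_vars(arg: str) -> Dict[str, str]:
    """Parse a space-separated "key=value" string, as passed to -e."""
    return dict(item.split("=", 1) for item in arg.split())


def validation_errors(extra_vars: Dict[str, str]) -> List[str]:
    """Return one message per required variable that is missing or invalid."""
    errors = []
    for name, pattern in REQUIRED.items():
        # A missing variable is treated as "", like default='' in the playbook.
        value = extra_vars.get(name, "")
        if re.search(pattern, value) is None:
            errors.append(
                f"Variable '{name}' doesn't match pattern '{pattern}'. "
                f'Pass the required variable with: -e "{name}=<value>"'
            )
    return errors


if __name__ == "__main__":
    print(validation_errors(parse_extra_vars("stage=build arch=amd64 accelerator=tpu")))
    for message in validation_errors(parse_extra_vars("stage=test arch=amd64")):
        print(message)
```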
`@@ -0,0 +1 @@`

```yaml
bazelisk_version: 1.15.0
```
`@@ -0,0 +1,10 @@`

```yaml
- name: "Download bazelisk v{{ bazelisk_version }}"
  ansible.builtin.get_url:
    url: "https://github.com/bazelbuild/bazelisk/releases/download/v{{ bazelisk_version }}/bazelisk-linux-amd64"
    dest: /usr/local/bin/bazel
    # Owner can write; everyone can read and execute the binary.
    mode: 'u=rwx,g=rx,o=rx'

- name: "Tests"
  include_tasks: tests.yaml
  tags:
    - tests
```
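As written, the download task always fetches `bazelisk-linux-amd64`, even though the playbook also accepts `arch=aarch64`. Bazelisk publishes Linux binaries for both `amd64` and `arm64` on its GitHub releases page; assuming that asset naming holds for the pinned `bazelisk_version` (worth verifying on the release page), the URL could be chosen per architecture. The mapping and `bazelisk_url` below are a hypothetical sketch, shown in Python for brevity; in the role itself this would be a Jinja expression.

```python
from typing import Dict

# Hypothetical mapping from the playbook's `arch` values to bazelisk asset suffixes.
BAZELISK_ASSET_ARCH: Dict[str, str] = {"amd64": "amd64", "aarch64": "arm64"}


def bazelisk_url(version: str, arch: str) -> str:
    """Return the bazelisk release URL for a playbook `arch` value."""
    asset_arch = BAZELISK_ASSET_ARCH[arch]  # KeyError for unsupported arches
    return (
        "https://github.com/bazelbuild/bazelisk/releases/download/"
        f"v{version}/bazelisk-linux-{asset_arch}"
    )


if __name__ == "__main__":
    for arch in ("amd64", "aarch64"):
        print(arch, bazelisk_url("1.15.0", arch))
```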
`@@ -0,0 +1,3 @@`

```yaml
- name: "Bazel --version runs successfully"
  ansible.builtin.command:
    cmd: bazel --version
```
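The test above passes when `bazel --version` exits with status 0, since `ansible.builtin.command` marks a task as failed on a non-zero return code. A stand-alone Python equivalent, using a hypothetical `runs_successfully` helper, might look like the following. With bazelisk installed as `bazel`, the first invocation may download the Bazel release itself, hence the generous timeout.

```python
import shutil
import subprocess
import sys
from typing import List


def runs_successfully(cmd: List[str], timeout: float = 300.0) -> bool:
    """Return True if cmd[0] is found on PATH and the command exits with status 0."""
    if shutil.which(cmd[0]) is None:
        return False
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0


if __name__ == "__main__":
    print("bazel:", runs_successfully(["bazel", "--version"]))
    # Sanity check of the helper itself using the current interpreter.
    print("python:", runs_successfully([sys.executable, "--version"]))
```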
Do we need to worry about any conda deps in ansible?
I removed conda from the experimental build to simplify the process and to integrate better with Bazel in the future, which supports `pip` packages.