Merged
41 commits
de88ad1
Add basic ansible configuration for bazel and installing apt pkgs
Jan 18, 2023
24e4baf
Add apt repos and some signing keys
Jan 18, 2023
a514d19
Add pip packages
mateuszlewko Jan 19, 2023
4e8cfb2
Don't use apt-key for adding repo keys
mateuszlewko Jan 19, 2023
1be9ba7
Don't use apt-key for adding repo keys
mateuszlewko Jan 19, 2023
9613b09
Add fetch_srcs role for fetching PyTorch and XLA repos
mateuszlewko Jan 20, 2023
3187aed
Add patches application
mateuszlewko Jan 20, 2023
6f59512
Add role for compling PyTorch and XLA sources
mateuszlewko Jan 20, 2023
50eafb3
WIP in build srcs
mateuszlewko Jan 20, 2023
f211dce
Succesfully build XLA
mateuszlewko Jan 23, 2023
0890c58
Clean-up and merge env variables; Separate stage; arch and accelerato…
mateuszlewko Jan 23, 2023
d9591e5
Fix passing env variables; Add missing XLA_SANDBOX_BUILD
mateuszlewko Jan 23, 2023
50cf442
Rename playbooks dir to ansible
mateuszlewko Jan 23, 2023
56dfbe2
Add cloudbuild file that uses ansible playbook
mateuszlewko Jan 23, 2023
a319f14
Add 'signed-by' to all apt repos
mateuszlewko Jan 23, 2023
31d6c22
Add placeholders for release config vars
mateuszlewko Jan 23, 2023
d298a41
Add release build
mateuszlewko Jan 23, 2023
c59b3cb
Disable verbose ansible in docker build
mateuszlewko Jan 23, 2023
96fbb5f
Add ansible config file and enable displaying tasks duration
mateuszlewko Jan 24, 2023
2be17cd
Add TORCH_XLA_VERSION env variable, which is used when building XLA
mateuszlewko Jan 24, 2023
642bc8e
Disable Ansible warnings about no inventory; Force git clone; revert …
mateuszlewko Jan 24, 2023
28fa5cd
Add basic tests for bazel and fetch_srcs roles
mateuszlewko Jan 24, 2023
bcda96e
Add import tests for build_srcs
mateuszlewko Jan 24, 2023
ea4d548
Set git versions for which imports work
mateuszlewko Jan 27, 2023
a79495d
Pass env vars to imports test
mateuszlewko Jan 27, 2023
271c90c
Add configure_env role and apply minor cleanup
mateuszlewko Jan 30, 2023
b88d169
Don't replace existing env var entries in /etc/environment
mateuszlewko Jan 30, 2023
f5ba73f
Merge branch 'pytorch:master' into master
mateuszlewko Jan 30, 2023
b6892c1
Move ansible dir to /docker/experimental
mateuszlewko Jan 30, 2023
074bd49
Remove vars_prompt so that the playbook is not interactive
mateuszlewko Jan 31, 2023
e673729
Shorten variable validation error message
mateuszlewko Jan 31, 2023
281c4aa
Add readme file; cleanup some variables
mateuszlewko Jan 31, 2023
c7ded27
Change git revisions to head
mateuszlewko Jan 31, 2023
c053a5b
Remove variable from task name that's not substituted
mateuszlewko Jan 31, 2023
0d6c8a7
Fix link formatting in README.md
mateuszlewko Feb 9, 2023
a5d36de
Append env variables to bashrc and zshrc instead of /etc/environment
mateuszlewko Feb 15, 2023
666df60
Bump cuda packages version; add sympy
mateuszlewko Feb 15, 2023
13b36a3
Downgrade to clang-10
mateuszlewko Feb 15, 2023
00ce2bf
Remove libomp5
mateuszlewko Feb 15, 2023
7124a8c
Set correct version for libcudnn8
mateuszlewko Feb 15, 2023
fb0b9f4
Merge branch 'pytorch:master' into ansible
mateuszlewko Feb 22, 2023
6 changes: 6 additions & 0 deletions docker/experimental/ansible/.ansible-lint
@@ -0,0 +1,6 @@
---
# .ansible-lint

profile: moderate
skip_list:
- schema[tasks]
58 changes: 58 additions & 0 deletions docker/experimental/ansible/README.md
@@ -0,0 +1,58 @@
# Ansible playbook

This Ansible playbook performs the following actions on localhost:
* installs the required pip and apt packages, depending on the specified stage,
architecture and accelerator (see [apt.yaml](config/apt.yaml) and
[pip.yaml](config/pip.yaml)),
* fetches Bazelisk and installs it as `bazel` (version configured in
[roles/bazel/defaults/main.yaml](roles/bazel/defaults/main.yaml)),
* fetches PyTorch and XLA sources at master (or specific revisions,
see role `fetch_srcs` in [playbook.yaml](playbook.yaml)),
* sets the required environment variables (see [env.yaml](config/env.yaml)),
* builds and installs PyTorch and XLA wheels,
* runs infrastructure tests (see `*/tasks/tests.yaml` files in [roles](roles)).

## Prerequisites

* Python 3.8+
* Ansible. Install with `pip install ansible`.

## Running

The playbook requires three variables to be passed explicitly. They configure
playbook behavior (which pip/apt packages are installed and which environment variables are set):
* `stage`: build or release. Different packages are installed depending on
the chosen stage.
* `arch`: aarch64 or amd64. Architecture of the built image and wheels.
* `accelerator`: tpu or cuda. Available accelerator.

The variables can be passed with the `-e` flag: `-e "<var>=<value>"`.

Example: `ansible-playbook playbook.yaml -e "stage=build arch=amd64 accelerator=tpu"`
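
The playbook rejects any value outside the sets listed above. As a rough illustration only, the check can be sketched in Python. The patterns are copied from the "Validate required variables" task in playbook.yaml. The function name `validate_vars` is hypothetical, and treating a missing variable as an empty string is an assumption of this sketch (the playbook's variable lookup would raise an error instead).

```python
import re

# Patterns copied from the "Validate required variables" task in playbook.yaml.
REQUIRED_VARS = {
    "stage": r"^(build|release)$",
    "arch": r"^(aarch64|amd64)$",
    "accelerator": r"^(tpu|cuda)$",
}


def validate_vars(extra_vars):
    """Return a list of error messages; an empty list means every variable is valid.

    Hypothetical helper: a missing variable is treated as an empty string here,
    which is an assumption of this sketch, not the playbook's behavior.
    """
    errors = []
    for name, pattern in REQUIRED_VARS.items():
        value = extra_vars.get(name, "")
        # re.search mirrors the default match type of Ansible's `regex` test.
        if not re.search(pattern, value):
            errors.append(f"Variable '{name}' doesn't match pattern '{pattern}'")
    return errors


if __name__ == "__main__":
    print(validate_vars({"stage": "build", "arch": "amd64", "accelerator": "tpu"}))
    print(validate_vars({"stage": "test", "arch": "amd64"}))
```

Because the patterns are anchored with `^` and `$`, only exact values such as `build` or `amd64` pass; `builds` or `x86_64` would be rejected.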

## Config structure

The playbook configuration is split into four files, one per logical system.
The configuration is loaded as playbook variables, which are then passed
to specific roles and tasks.
Only variables in [config/env.yaml](config/env.yaml) are passed as env variables.

* [apt.yaml](config/apt.yaml) - specifies apt packages for each stage and
architecture or accelerator.
Packages shared between all architectures and accelerators in a given stage
are specified in `*_common`. They are appended to any architecture specific list.

This config also contains a list of required apt repos and signing keys.
These variables are mainly consumed by the [install_deps](roles/install_deps/tasks/main.yaml) role.

* [pip.yaml](config/pip.yaml) - similarly to apt.yaml, lists pip packages per stage and arch / accelerator.
In both the pip and apt config files, the stage and the arch / accelerator are
joined with an underscore and specified under one key (e.g. build_amd64, release_tpu).

* [env.yaml](config/env.yaml) - contains Ansible variables that are passed as env variables when
building PyTorch and XLA (`build_env`). Variables in `release_env` are saved in `/etc/environment` (executed for the `release` stage).

* [vars.yaml](config/vars.yaml) - Ansible variables used in other config files and throughout the playbook.
Not associated with any particular system.
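
The way the per-stage package lists described above are combined can be sketched in Python. This is a minimal model of the Jinja expressions in playbook.yaml, not the playbook's actual code: the mapping is a trimmed, illustrative copy of `pip.pkgs`, and the function name `select_pkgs` is hypothetical.

```python
# Trimmed, illustrative copy of the `pip.pkgs` mapping. None models a key that
# is defined but left empty in YAML (e.g. `build_aarch64:`).
PIP_PKGS = {
    "build_common": ["numpy", "pyyaml"],
    "build_amd64": ["mkl", "mkl-include"],
    "build_aarch64": None,
    "release_common": ["numpy", "pyyaml"],
    "release_tpu": ["torch_xla[tpuvm]"],
}


def select_pkgs(pkgs, stage, arch, accelerator):
    """Concatenate the common, arch and accelerator lists for one stage."""
    selected = []
    for suffix in ("common", arch, accelerator):
        # `or []` mirrors `default([], true)`: missing or empty keys become [].
        selected += pkgs.get(f"{stage}_{suffix}") or []
    return selected


if __name__ == "__main__":
    print(select_pkgs(PIP_PKGS, "build", "amd64", "tpu"))
    print(select_pkgs(PIP_PKGS, "release", "aarch64", "tpu"))
```

Keys that are absent (such as `build_tpu` here) and keys that are present but empty are handled the same way, which is why the config files can leave a section like `build_aarch64:` blank.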

Variables from these config files are dynamically loaded (during playbook execution),
see [playbook.yaml](playbook.yaml).
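
The `build_env` and `release_env` maps from env.yaml are merged in playbook.yaml with Ansible's `combine` filter, in the order common, arch, accelerator. The sketch below models that merge in Python under the assumption that later maps override earlier ones on key conflicts, as `combine` does. The dictionary is a trimmed copy of `build_env`, and `merge_env` is a hypothetical name.

```python
# Trimmed, illustrative copy of `build_env` from config/env.yaml. None models
# a section that is defined but empty (e.g. `aarch64:`).
BUILD_ENV = {
    "common": {"BUILD_CPP_TESTS": 0, "XLA_SANDBOX_BUILD": 1},
    "amd64": {"ARCH": "amd64"},
    "aarch64": None,
    "cuda": {"XLA_CUDA": 1},
    "tpu": {"ACCELERATOR": "tpu", "TPUVM_MODE": 1},
}


def merge_env(env, arch, accelerator):
    """Merge the common, arch and accelerator sections into one dict."""
    merged = {}
    for section in ("common", arch, accelerator):
        # `or {}` mirrors `default({}, true)`; update() lets later sections win.
        merged.update(env.get(section) or {})
    return merged


if __name__ == "__main__":
    print(merge_env(BUILD_ENV, "amd64", "tpu"))
    print(merge_env(BUILD_ENV, "aarch64", "cuda"))
```

With this ordering, an accelerator section could override a value set in `common`, so shared defaults belong in `common` and specific overrides in the arch or accelerator sections.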
14 changes: 14 additions & 0 deletions docker/experimental/ansible/ansible.cfg
@@ -0,0 +1,14 @@
# See https://docs.ansible.com/ansible/latest/reference_appendices/config.html
# for various configuration options.

[defaults]
# Displays tasks execution duration.
callbacks_enabled = profile_tasks
# The playbook is only run on the implicit localhost.
# Silence warning about empty hosts inventory.
localhost_warning = False

[inventory]
# Silence warning about no inventory.
# This option is available since Ansible 2.14 (available only with Python 3.9+).
inventory_unparsed_warning = False
61 changes: 61 additions & 0 deletions docker/experimental/ansible/config/apt.yaml
@@ -0,0 +1,61 @@
# Contains lists of apt packages for each stage (build|release) and arch or accelerator.
apt:
pkgs:
build_common:
- ccache
- curl
- git
- gnupg
- libopenblas-dev
- ninja-build
- procps
- python3-pip
- rename
- vim
- wget

build_cuda:
- cuda-libraries-11-8
- cuda-toolkit-11-8
- cuda-minimal-build-11-8
- libcudnn8=8.8.0.121-1+cuda11.8
- libcudnn8-dev=8.8.0.121-1+cuda11.8

build_amd64:
- "clang-{{ clang_version }}"

build_aarch64:
- scons
- gcc-10
- g++-10

release_common:
- curl
- git
- gnupg
- google-cloud-cli
- libgomp1
- libopenblas-base
- patch

release_cuda:
- cuda-libraries-11-8
- cuda-minimal-build-11-8
- libcudnn8=8.8.0.121-1+cuda11.8

# Specify objects with string fields `url` and `keyring`.
# The keyring path should start with /usr/share/keyrings/ for debian and ubuntu.
signing_keys:
- url: https://apt.llvm.org/llvm-snapshot.gpg.key
keyring: /usr/share/keyrings/llvm.pgp
- url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
keyring: /usr/share/keyrings/cloud.google.gpg
- url: "https://developer.download.nvidia.com/compute/cuda/repos/{{ cuda_repo }}/x86_64/3bf863cc.pub"
keyring: /usr/share/keyrings/cuda.pgp

repos:
# signed-by path should match the corresponding keyring path above.
- "deb [signed-by=/usr/share/keyrings/llvm.pgp] http://apt.llvm.org/{{ llvm_debian_repo }}/ llvm-toolchain-{{ llvm_debian_repo }}-{{ clang_version }} main"
- "deb-src [signed-by=/usr/share/keyrings/llvm.pgp] http://apt.llvm.org/{{ llvm_debian_repo }}/ llvm-toolchain-{{ llvm_debian_repo }}-{{ clang_version }} main"
- "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main"
- "deb [signed-by=/usr/share/keyrings/cuda.pgp] https://developer.download.nvidia.com/compute/cuda/repos/{{ cuda_repo }}/x86_64/ /"
42 changes: 42 additions & 0 deletions docker/experimental/ansible/config/env.yaml
@@ -0,0 +1,42 @@
# Variables that will be stored in /etc/environment file for the release stage.
# They'll be accessible for all processes on the host.
release_env:
common:
CC: "clang-{{ clang_version }}"
CXX: "clang++-{{ clang_version }}"
LD_LIBRARY_PATH: "$LD_LIBRARY_PATH:/usr/local/lib"

tpu:
ACCELERATOR: tpu
TPUVM_MODE: 1

cuda:
TF_CUDA_COMPUTE_CAPABILITIES: 7.0,7.5,8.0
XLA_CUDA: 1

# Variables that will be passed to shell environment only for building PyTorch and XLA libs.
build_env:
common:
LD_LIBRARY_PATH: "$LD_LIBRARY_PATH:/usr/local/lib"
# Set explicitly to 0 as setup.py defaults this flag to true if unset.
BUILD_CPP_TESTS: 0
CC: "clang-{{ clang_version }}"
CXX: "clang++-{{ clang_version }}"
PYTORCH_BUILD_NUMBER: 1
TORCH_XLA_VERSION: "{{ package_version }}"
PYTORCH_BUILD_VERSION: "{{ package_version }}"
XLA_SANDBOX_BUILD: 1

amd64:
ARCH: amd64

aarch64:

cuda:
TF_CUDA_COMPUTE_CAPABILITIES: 7.0,7.5,8.0
XLA_CUDA: 1

tpu:
ACCELERATOR: tpu
TPUVM_MODE: 1

53 changes: 53 additions & 0 deletions docker/experimental/ansible/config/pip.yaml
@@ -0,0 +1,53 @@
# Contains lists of pip packages for each stage (build|release) and arch or accelerator.
Review comment (Collaborator): Do we need to worry about any conda deps in ansible?

Reply (Collaborator): I removed conda from the experimental build to simplify the process and integrate better with Bazel in the future, which has support for pip packages.

pip:
pkgs:
# Shared between all architectures and accelerators for the build stage.
build_common:
- astunparse
- cffi
- cloud-tpu-client
- cmake
- coverage
- dataclasses
- expecttest==0.1.3
- future
- git-archive-all
- google-api-python-client
- google-cloud-storage
- hypothesis
- lark-parser
- ninja
- numpy
- oauth2client
- pyyaml
- requests
- setuptools
- six
- tensorboard
- tensorboardX
- tqdm
- typing
- typing_extensions
- sympy

build_amd64:
- mkl
- mkl-include

build_aarch64:

# Shared between all architectures and accelerators for the release stage.
release_common:
- numpy
- pyyaml
- mkl
- mkl-include

release_tpu:
- torch_xla[tpuvm]

# Packages that will be installed with the `--no-deps` flag.
pkgs_nodeps:
release_common:
- torchvision
- pillow
7 changes: 7 additions & 0 deletions docker/experimental/ansible/config/vars.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Used for fetching cuda from the right repo, see apt.yaml.
cuda_repo: ubuntu1804
# Used for fetching clang from the right repo, see apt.yaml.
llvm_debian_repo: buster
clang_version: 10
# PyTorch and PyTorch/XLA wheel versions.
package_version: 2.0
88 changes: 88 additions & 0 deletions docker/experimental/ansible/playbook.yaml
@@ -0,0 +1,88 @@
- name: "Install build dependencies"
hosts: localhost
connection: local

# The playbook requires passing 3 variables explicitly:
# - stage: build or release. Different packages are installed depending on
# the chosen stage.
# - arch: aarch64 or amd64. Architecture of the built image and wheels.
# - accelerator: tpu or cuda. Available accelerator.
pre_tasks:
- name: "Validate required variables"
ansible.builtin.assert:
that: "{{ lookup('ansible.builtin.vars', item.name) is regex(item.pattern) }}"
fail_msg: |
"Variable '{{ item.name }}' doesn't match pattern '{{ item.pattern }}'"
"Pass the required variable with: -e \"{{ item.name }}=<value>\""
loop:
- name: stage
pattern: ^(build|release)$
- name: arch
pattern: ^(aarch64|amd64)$
- name: accelerator
pattern: ^(tpu|cuda)$

- name: "Include vars from config files"
ansible.builtin.include_vars:
file: "config/{{ item }}"
loop:
# vars.yaml should be the first as other config files depend on it.
- vars.yaml
- apt.yaml
- pip.yaml
- env.yaml

roles:
- bazel

- role: install_deps
vars:
apt_keys: "{{ apt.signing_keys }}"

# If a variable (like `pip.pkgs.build_aarch64`) is defined but not set to
# anything, it evaluates to null and cannot be concatenated with a list.
# Use `v | default([], true)` to set `v` to an empty array if it evaluates to false.
# See https://jinja.palletsprojects.com/en/3.0.x/templates/#jinja-filters.default.
apt_pkgs: "{{
apt.pkgs[stage + '_common'] | default([], true) +
apt.pkgs[stage + '_' + arch] | default([], true) +
apt.pkgs[stage + '_' + accelerator] | default([], true)
}}"

apt_repos: "{{ apt.repos }}"

pip_pkgs: "{{
pip.pkgs[stage + '_common'] | default([], true) +
pip.pkgs[stage + '_' + arch] | default([], true) +
pip.pkgs[stage + '_' + accelerator] | default([], true)
}}"

pip_pkgs_nodeps: "{{
pip.pkgs_nodeps[stage + '_common'] | default([], true) +
pip.pkgs_nodeps[stage + '_' + arch] | default([], true) +
pip.pkgs_nodeps[stage + '_' + accelerator] | default([], true)
}}"

- role: fetch_srcs
vars:
src_root: "/src"
pytorch_git_rev: HEAD
xla_git_rev: HEAD

- role: build_srcs
vars:
src_root: "/src"
env_vars: "{{
build_env.common | default({}, true) |
combine(build_env[arch] | default({}, true)) |
combine(build_env[accelerator] | default({}, true))
}}"

- role: configure_env
vars:
env_vars: "{{
release_env.common | default({}, true) |
combine(release_env[arch] | default({}, true)) |
combine(release_env[accelerator] | default({}, true))
}}"
when: stage == "release"
1 change: 1 addition & 0 deletions docker/experimental/ansible/roles/bazel/defaults/main.yaml
@@ -0,0 +1 @@
bazelisk_version: 1.15.0
10 changes: 10 additions & 0 deletions docker/experimental/ansible/roles/bazel/tasks/main.yaml
@@ -0,0 +1,10 @@
- name: "Download bazelisk v{{ bazelisk_version }}"
ansible.builtin.get_url:
url: "https://github.com/bazelbuild/bazelisk/releases/download/v{{ bazelisk_version }}/bazelisk-linux-amd64"
dest: /usr/local/bin/bazel
mode: 'u=rwx,g=rw,o=r'

- name: "Tests"
include_tasks: tests.yaml
tags:
- tests
3 changes: 3 additions & 0 deletions docker/experimental/ansible/roles/bazel/tasks/tests.yaml
@@ -0,0 +1,3 @@
- name: "Bazel --version runs successfully"
ansible.builtin.command:
cmd: bazel --version