This pipeline glues together the **Model Training**, **Evaluation**, **Serving**, and **Data Operations** components. The ultimate goal is rapid development-to-deployment cycles with iterative improvements—this is the *Ops* in MLOps.

We'll provision resources and install tooling through infrastructure-as-code:

* **Terraform**: Manages our cloud infra declaratively.
* **Ansible**: Installs Kubernetes and Argo ecosystem tools.
* **Argo CD**: Enables GitOps-based continuous delivery.
* **Argo Workflows**: Powers the container-native orchestration of our ML pipelines.


Start by cloning the infrastructure repository, including its submodules:

In [2]:
git clone --recurse-submodules https://github.com/ho1447/ML-SysOps_Project.git

Cloning into 'ML-SysOps_Project'...
remote: Enumerating objects: 269, done.
remote: Counting objects: 100% (79/79), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 269 (delta 37), reused 43 (delta 12), pack-reused 190 (from 1)
Receiving objects: 100% (269/269), 2.84 MiB | 3.49 MiB/s, done.
Resolving deltas: 100% (111/111), done.


The `continous_X_pipeline` directory has the following structure:

    ├── tf
    │   └── kvm
    ├── ansible
    │   ├── general
    │   ├── pre_k8s
    │   ├── k8s
    │   ├── post_k8s
    │   └── argocd
    ├── k8s
    │   ├── platform
    │   ├── staging
    │   ├── canary
    │   └── production
    └── workflows

# 2. Setup Environment

## Install and configure Terraform

### Download a Terraform client

In [3]:
mkdir -p /work/.local/bin
wget https://releases.hashicorp.com/terraform/1.10.5/terraform_1.10.5_linux_amd64.zip
unzip -o -q terraform_1.10.5_linux_amd64.zip
mv terraform /work/.local/bin
rm terraform_1.10.5_linux_amd64.zip

--2025-05-13 23:47:10--  https://releases.hashicorp.com/terraform/1.10.5/terraform_1.10.5_linux_amd64.zip
Resolving releases.hashicorp.com (releases.hashicorp.com)... 18.238.171.62, 18.238.171.95, 18.238.171.54, ...
Connecting to releases.hashicorp.com (releases.hashicorp.com)|18.238.171.62|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 27714924 (26M) [application/zip]
Saving to: ‘terraform_1.10.5_linux_amd64.zip’


2025-05-13 23:47:10 (130 MB/s) - ‘terraform_1.10.5_linux_amd64.zip’ saved [27714924/27714924]
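As a small variation on the download step above, the release URL can be built from a pinned version variable, so that upgrading Terraform means changing a single value. This is only a sketch: the helper name `tf_zip_url` and the variables `TF_VERSION` and `TF_URL` are illustrative and not part of the project repository. The URL pattern is copied from the `wget` command above.

```shell
# Hypothetical helper: build the HashiCorp release URL for a Terraform version.
# The pattern mirrors the URL used in the wget command above.
tf_zip_url() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  printf 'https://releases.hashicorp.com/terraform/%s/terraform_%s_%s_%s.zip\n' \
    "$version" "$version" "$os" "$arch"
}

TF_VERSION="1.10.5"   # version pinned in this notebook
TF_URL="$(tf_zip_url "$TF_VERSION")"
echo "$TF_URL"

# To download with it (requires network access), one could run:
#   wget "$TF_URL"
```

Keeping the version in one variable also makes it easier to reuse the same value for the `unzip` and `rm` steps.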



The Terraform CLI is now installed in `/work/.local/bin`. To run Terraform commands directly from the terminal, we need to add this directory to our PATH environment variable, which lists the directories the shell searches for executables. We prepend it so that this copy takes precedence over any other installation.

In [4]:
export PATH=/work/.local/bin:$PATH

To verify that Terraform is properly installed and accessible, we can run the terraform command without any subcommands. If configured correctly, this should display the Terraform usage/help information.

In [5]:
terraform

Usage: terraform [global options] <subcommand> [args]

The available commands for execution are listed below.
The primary workflow commands are given first, followed by
less common or more advanced commands.

Main commands:
  init          Prepare your working directory for other commands
  validate      Check whether the configuration is valid
  plan          Show changes required by the current configuration
  apply         Create or update infrastructure
  destroy       Destroy previously-created infrastructure

All other commands:
  console       Try Terraform expressions at an interactive command prompt
  fmt           Reformat your configuration in the standard style
  force-unlock  Release a stuck lock on the current workspace
  get           Install or upgrade remote Terraform modules
  graph         Generate a Graphviz graph of the steps in an operation
  import        Associate existing infrastructure with a Terraform resource
  login         Obtain and save credentials for a

: 127
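If the help text does not appear, a quick way to tell a PATH problem from a broken install is to ask the shell where it resolves the command. The sketch below uses the shell built-in `command -v`; the helper name `check_tool` is hypothetical.

```shell
# Hypothetical helper: report whether a command resolves on the current PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Prints the resolved location, or a hint if terraform cannot be found.
check_tool terraform || echo "Check that /work/.local/bin is on PATH"
```

Once the command resolves, `terraform version` is a quieter check that prints only the installed version instead of the full help text.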

### Configure the PATH

Both the Terraform and Ansible executables are installed to a non-standard location (`/work/.local/bin`) rather than a system-wide one. To run commands like `terraform` or `ansible-playbook` from any directory in the terminal, this location must be included in the PATH. Setting `PYTHONUSERBASE` to `/work/.local` additionally tells `pip install --user` to place packages and their command-line scripts under that same directory.

In [6]:
# runs in Chameleon Jupyter environment
export PATH=/work/.local/bin:$PATH
export PYTHONUSERBASE=/work/.local
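Because the `export PATH=...` line may be run more than once in a session, the directory can end up listed several times. This is harmless but clutters the variable. A minimal sketch of an idempotent alternative follows; the helper name `path_prepend` is hypothetical.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;        # otherwise put it first
  esac
  export PATH
}

path_prepend /work/.local/bin
path_prepend /work/.local/bin    # second call is a no-op
echo "$PATH"
```

Wrapping both sides in colons (`":$PATH:"` and `":$1:"`) ensures the match is against a whole PATH entry rather than a substring of a longer directory name.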

## Kubespray

For Kubernetes installation, we will use Kubespray, a collection of Ansible playbooks for deploying production-ready Kubernetes clusters. Before proceeding, we install Kubespray's Python dependencies from its `requirements.txt`:

In [9]:
PYTHONUSERBASE=/work/.local pip install --user -r /home/jgr7704_nyu_edu/work/ML-SysOps_Project-main\ 2/continous_X_pipeline/ansible/k8s/kubespray/requirements.txt
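To see what `PYTHONUSERBASE` changes in the command above: `pip install --user` installs into Python's user base directory, and this variable overrides its default location. The sketch below asks Python which user base it would use. It assumes `python3` is available on the PATH, and the variable name `USER_BASE` is illustrative.

```shell
# Ask Python where `pip install --user` would place packages.
# `python3 -m site --user-base` can exit non-zero when user site-packages are
# disabled (for example inside a virtualenv), so the exit status is ignored here.
USER_BASE="$(PYTHONUSERBASE=/work/.local python3 -m site --user-base || true)"
echo "user base: $USER_BASE"

# On Linux, command-line scripts installed with --user go into the bin/
# subdirectory of the user base.
echo "scripts directory: $USER_BASE/bin"
```

This is why the same `/work/.local/bin` directory that holds `terraform` also needs to be on the PATH for tools installed through pip.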



To authenticate with the OpenStack environment, we’ll retrieve credentials via the Horizon dashboard:

- Navigate to the Chameleon Cloud website.
- Select Experiment > KVM@TACC from the main menu.
- Log in if prompted.
- At the top left, ensure the appropriate project is selected in the project dropdown (e.g., “CHI-XXXXXX”).

From the left-hand sidebar:

- Expand the Identity section and click on Application Credentials.
- Click Create Application Credential.
- Provide the name mlops-lab.
- Set an appropriate Expiration Date.
- Click Create Application Credential.
- Download the resulting clouds.yaml file.

Finally, ensure the clouds.yaml file is placed in the correct location. Terraform will automatically look for this file in either of the following locations:

- `~/.config/openstack/`
- The current working directory where Terraform is executed

For simplicity and consistency, we will copy the clouds.yaml file to the directory from which we intend to run our Terraform commands.

In [10]:
cp clouds.yaml /home/jgr7704_nyu_edu/work/ML-SysOps_Project-main\ 2/continous_X_pipeline/tf/kvm/clouds.yaml
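Because `clouds.yaml` holds the application credential secret, it can be worth checking that the file exists before copying it and restricting its permissions afterwards. This is a minimal sketch, not the project's own tooling: the helper name `install_clouds_yaml` is hypothetical, and the destination directory is passed as an argument rather than hard-coded.

```shell
# Hypothetical helper: copy clouds.yaml into a Terraform directory and make it
# readable and writable only by the current user, since it holds a secret.
install_clouds_yaml() {
  local src="$1" dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "missing $src: download it from the Application Credentials page first" >&2
    return 1
  fi
  if [ ! -d "$dest_dir" ]; then
    echo "destination directory $dest_dir does not exist" >&2
    return 1
  fi
  cp "$src" "$dest_dir/clouds.yaml" && chmod 600 "$dest_dir/clouds.yaml"
}

# Example call; the destination is a placeholder for your own tf/kvm directory.
# install_clouds_yaml clouds.yaml "/path/to/continous_X_pipeline/tf/kvm"
```

Quoting the destination argument also avoids the backslash escaping needed when the path contains spaces.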