
development environment

This folder contains infrastructure as code (IaC) for a minimal development environment that supports swapping backend machines and associated GPU(s) to meet the demands of a given development task. It currently uses terraform with the google cloud platform provider and the google notebooks instance resource, but it could be adapted for other cloud platforms, providers, or resources (see the terraform documentation for further reference). It is assumed all commands are run from within this folder.

workflow

The expected workflow is to

  • set up a development machine with make up,
  • connect to the machine via the associated jupyter lab server accessible from the google cloud platform user interface for interactive use,
  • ssh to the machine from a terminal or IDE such as VS Code for library development,
  • toggle the machine off and on with make stop and make start, and
  • destroy associated compute resources with make down.
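
A minimal sketch of this lifecycle from a terminal, assuming the Makefile targets described in this document and a populated .env file

make up        # provision the development machine; the hostname/IP is printed at the end
make ssh_gcp   # or ssh to the printed hostname from a terminal or IDE
make stop      # pause the machine between work sessions
make start     # resume the machine
make down      # destroy the compute resources when finished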

prerequisites

software

configuration

environment variables

  • set environment variables

    • dotenv-gen.sh is provided to help construct a .env file that is read by the Makefile to set environment variables. If you do not want to use dotenv-gen.sh, you can create a .env file by hand (as informally described, for example, in dotenv) containing all of the variables written to .env at the end of dotenv-gen.sh, and remove the reference to dotenv-gen.sh in the Makefile.

    • example .env file (see below for variables related to the startup script)

      TF_VAR_project=<GCP Project ID> # your google cloud platform project ID
      TF_VAR_email=<GCP account email address> # your google cloud platform account email address
      TF_VAR_credentials_file=~/.config/gcloud/application_default_credentials.json # local path to your application default credentials
      TF_VAR_notebooks_name=reponame-dev-notebook # name to assign to your development virtual machine
      GITHUB_USERNAME=username # github username under which startup scripts are uploaded as github gists
      GITHUB_ORG_NAME=githuborg # name of the github org or user containing the github repository with code for development
      GITHUB_REPO_NAME=reponame # name of a github repository with a conda environment yaml file
      GITHUB_BRANCH_NAME=master # name of github repository branch to checkout
      GITHUB_REPO_CONDA_ENV_PATH_NAME=conda/environment-gpu.yml # path to conda environment yaml file in the github repository
      GH_PAT=ghp_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # github personal access token with repo scope
      GH_REPO=${GITHUB_USERNAME}/${GITHUB_REPO_NAME} # derived
      GCP_GACD=$(shell cat service-account-credentials.json) # GCP service account credentials
      GCP_SERVICE_ACCOUNT=111111111111-compute@developer.gserviceaccount.com # GCP service account email
      GCP_PROJECT_ID=${TF_VAR_project} # this is an alias for the gcp project ID
      GCP_REGION=us-central1 # the google cloud platform region for application deployment
      GCP_RUN_SERVICE_NAME=app-test # the service name for application deployment
      PKG_ARCHIVE_URL=us-central1-docker.pkg.dev/${GCP_PROJECT_ID}/${GITHUB_REPO_NAME} # the url to the GAR package
      PKG_APP=${GITHUB_REPO_NAME}app # the package name containing the application
      PKG_IMAGE_TAG=latest # the package tag to deploy
      MLFLOW_TRACKING_URI=https://server.mlflow # the url to the mlflow tracking server
      MLFLOW_TRACKING_USERNAME=username # the username for the mlflow tracking server
      MLFLOW_TRACKING_PASSWORD=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # the key for the mlflow tracking server
      TF_VAR_post_startup_script_url=https://gist.githubusercontent.com/githubusername/b6c8cd158b00f99d21511a905cc7626a/raw/post-startup-script-dev-notebook.sh # publicly accessible URL to a startup script
      GITHUB_STARTUP_SCRIPT_GIST_ID=b6c8cd158b00f99d21511a905cc7626a # the github gist ID if you would like to use a github gist
    • set variables using pass or manually

      • execute pass insert github_username

      • complete the same process for gcp_credentials_file, gcp_email, gcp_project, gcp_notebooks_name, github_org, github_repo, github_branch, and github_repo_conda_env_path (a sketch of these commands appears at the end of this list)

      • gcp_credentials_file contains the path to application default credentials. The most common value is ~/.config/gcloud/application_default_credentials.json

      • check these are all defined with $ pass

        $ pass
        Password Store
        ├── gcp_credentials_file
        ├── gcp_email
        ├── gcp_project
        ├── gcp_notebooks_name
        ├── github_org
        ├── github_repo
        ├── github_branch
        ├── github_repo_conda_env_path
        └── github_username
  • if there is a variable you would like to set that is not currently exposed in the environment, review/edit terraform.tfvars

    • you can optionally set parameters not currently read from environment variables in this file
    • for example, you may want to set the machine type, accelerator/GPU type, disk size, etc
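
  • if you use pass, the following is a minimal sketch of seeding the password store (assuming dotenv-gen.sh reads the entry names shown above; pass prompts for each value)

    for name in gcp_credentials_file gcp_email gcp_project gcp_notebooks_name \
                github_username github_org github_repo github_branch \
                github_repo_conda_env_path; do
      pass insert "$name"
    done
    pass   # confirm the entries listed above are present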

startup script

  • edit/generate startup script

    • review/edit startup-script-gen.sh

      • this script is executed by default at the top level of the Makefile to set variables and upload post-startup-script.sh to a publicly accessible location for consumption by the virtual machine. A copy of the latter will be downloaded to and executed from the path /opt/c2d/post_start.sh on the remote machine.

    • if you would like to avoid using this script, add values for the following variables to .env and comment out the reference to startup-script-gen.sh in the Makefile

        TF_VAR_post_startup_script_url=https://gist.githubusercontent.com/githubusername/b6c8cd158b00f99d21511a905cc7626a/raw/post-startup-script-dev-notebook.sh # publicly accessible URL to a startup script
        GITHUB_STARTUP_SCRIPT_GIST_ID=b6c8cd158b00f99d21511a905cc7626a # the github gist ID if you would like to use a github gist
    • edit template-post-startup-script.sh

  • Uploading multiple revisions of the startup script to the associated github gist in quick succession may cause the content served from the raw gist URL to fall out of sync with the github server cache. You may find it helpful to run

    gh gist list
    make -n delete_gist
    make delete_gist

    to refresh the github gist ID associated with your startup script. If you are confident in how this works in your environment, you can likely just run make delete_gist.
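
  • if you prefer to manage the gist manually rather than via startup-script-gen.sh, the following is a hedged sketch of publishing the rendered startup script with the github cli and deriving the two variables above (the raw URL layout and file name are assumptions based on the examples in this document)

    GIST_URL=$(gh gist create --public "post-startup-script-${TF_VAR_notebooks_name}.sh")
    GIST_ID=${GIST_URL##*/}
    echo "GITHUB_STARTUP_SCRIPT_GIST_ID=${GIST_ID}"
    echo "TF_VAR_post_startup_script_url=https://gist.githubusercontent.com/${GITHUB_USERNAME}/${GIST_ID}/raw/post-startup-script-${TF_VAR_notebooks_name}.sh"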

test

  • when the requirements above are satisfied, make test will do the following
    • upload post-startup-script-$(TF_VAR_notebooks_name).sh to a github gist
    • print TF_VAR* and GITHUB* environment variables

usage

Makefile

The primary interface is via the Makefile, which is used here as a modular collection of short shell scripts rather than as a build system. You can fill in environment variables and print each command prior to running it with make -n <target>, such as make -n up. Please see GNU make for further reference. The primary targets are

make up - create -OR- update the instance
make stop - stop the instance
make start - start the instance
make down - delete the instance

All other targets are auxiliary. The Makefile primarily serves to document the commands commonly used to work with the terraform resource(s). If you do not want to use make, you can simply copy a command from the Makefile and run it manually in the terminal.
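
As a rough, hedged illustration only (the real commands live in the Makefile; preview them with make -n <target>), the primary targets wrap operations roughly along these lines

# assumes the TF_VAR_* variables from .env are exported in the current shell
terraform init && terraform apply                          # roughly what make up does
gcloud compute instances stop "${TF_VAR_notebooks_name}"   # roughly make stop (zone flag omitted)
gcloud compute instances start "${TF_VAR_notebooks_name}"  # roughly make start
terraform destroy                                          # roughly make down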

data disk management

The data disk associated with a given notebooks_name is retained and reattached even after running make down and make up. This is useful to avoid losing work, especially when spot/preemptibility is enabled (not currently supported by the terraform google_notebooks_instance resource). However, retaining the persistent disk incurs a cost. If you want to disable this behavior and delete the data disk automatically when destroying a machine, set no_remove_data_disk = false in terraform.tfvars. To manually delete the data disk associated with the current value of notebooks_name, run make -n delete_data_disk to verify the correct disk would be deleted and then rerun without -n to delete the data disk.
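
For example, to confirm which persistent disk would be deleted before committing to it (the filter expression assumes the data disk name contains the notebooks_name value)

gcloud compute disks list --filter="name~${TF_VAR_notebooks_name}"   # confirm the candidate disk
make -n delete_data_disk                                             # print the delete command without running it
make delete_data_disk                                                # delete the data disk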

machine images

Check the machine images available from the deeplearning-platform-release project by running make show_disk_images. You can modify the machine image by setting vm_image_project and vm_image_family in terraform.tfvars. You can alternatively use a docker image by reviewing and editing the content of notebooks-instance.tf to use container_image instead of vm_image; run make show_container_images to list available images. Note, however, that using a container image rather than a disk image requires a different post-startup configuration process, which can be incorporated into a derivative container image.
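
If you prefer to browse images directly with gcloud rather than via make show_disk_images, a sketch along these lines may help (the family filter is only an example)

gcloud compute images list \
    --project deeplearning-platform-release \
    --no-standard-images \
    --sort-by=~creationTimestamp \
    --filter="family~gpu"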

remote usage

ssh

The Makefile will run

gcloud compute config-ssh

to update your ssh configuration file and print the configured hostname at the end of make up. From a terminal, you can ssh into the hostname printed at the end of make up or try make ssh_gcp. To connect from IDEs or elsewhere, it may be helpful to update your ~/.ssh/config file with something similar to the following (updated to reference the key files you use with google cloud platform)

Host gcp
    HostName <IP_ADDRESS>
    IdentityFile ~/.ssh/google_compute_engine
    UserKnownHostsFile=~/.ssh/google_compute_known_hosts
    IdentitiesOnly=yes
    CheckHostIP=no
    StrictHostKeyChecking=no
    RequestTTY Yes
    RemoteCommand cd /home/jupyter && sudo su jupyter

The IP_ADDRESS of the remote host is printed at the end of make up. If you need to look it up later, you can run gcloud compute instances list to display the IP_ADDRESS of the virtual machine. If you are using VS Code, you may need to manually set "remote.SSH.enableRemoteCommand": true so that the RemoteCommand is executed within the ssh session.
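
To fetch only the external IP for the Host entry above, something like the following may help (the zone is an assumption; use the zone set in terraform.tfvars)

gcloud compute instances describe "${TF_VAR_notebooks_name}" \
    --zone us-central1-a \
    --format="get(networkInterfaces[0].accessConfigs[0].natIP)"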

If you use the container image rather than the disk image to set up the virtual machine, you may find an alternative RemoteCommand useful

Host gcp
    ...
    RemoteCommand sudo docker exec -it payload-container /bin/bash

github

You may find it useful to execute a script similar to the following (substituting your own name, email, github username, and personal access token)

mkdir -p $HOME/.config/gh && \
printf "\n[user]
    name = Your Name
    email = your@email
[credential]
    helper = store\n" >> $HOME/.gitconfig && \
printf "github.com:
    oauth_token: ghp_github_oauth_token
    user: githubusername
    git_protocol: https" > $HOME/.config/gh/hosts.yml && \
printf "https://githubusername:ghp_github_oauth_token@github.com\n" > $HOME/.git-credentials

to support github integration from the remote server.
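
If you prefer an interactive setup over writing these files by hand, the github cli can likely achieve the same result

gh auth login --hostname github.com --git-protocol https   # paste the personal access token when prompted
gh auth setup-git                                           # configure git to use gh as the credential helper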