# TABLE OF CONTENTS:
---
* [Notebook Summary](#Notebook-Summary)
* [Setup](#Setup)
    * [Connect to Workspace](#Connect-to-Workspace)
* [Conda Environments](#Conda-Environments)
    * [Environment Artifacts](#Environment-Artifacts)
        * [Development Environment](#Development-Environment)
        * [Training Environment](#Training-Environment)
        * [Deployment Environment](#Deployment-Environment)
    * [Development Environment Setup](#Development-Environment-Setup)
    * [Environment Registration](#Environment-Registration)
---

# Notebook Summary

This notebook explains how to create a conda development environment and a Jupyter kernel based on it for development in Jupyter. It also creates Azure Machine Learning (AML) environments for development, training and deployment. Run it from the azureml_py36 kernel on the AML Compute Instance, or from any other kernel that has the azureml.core library installed, so that the new environments can be saved to the workspace with the AML Python SDK. Subsequent notebooks should then be run from the newly created Jupyter kernel, which contains all packages needed for development.

# Setup

In [2]:
# Import libraries
import azureml.core
from azureml.core import Environment, Workspace

print(f"azureml.core Version: {azureml.core.__version__}")

azureml.core Version: 1.20.0


### Connect to Workspace

To connect to and communicate with the AML workspace, instantiate a workspace object with the AML Python SDK. `Workspace.from_config()` reads the workspace details from a `config.json` file.

In [3]:
# Connect to the AML workspace using interactive authentication
ws = Workspace.from_config()
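If `Workspace.from_config()` cannot find or read its configuration, a quick standard-library check of the file can help. The sketch below is hypothetical and not part of the SDK: it assumes the three keys `subscription_id`, `resource_group` and `workspace_name` of the `config.json` downloadable from the Azure portal, and it approximates the search by walking up from a start directory and looking for `config.json` either directly or in a `.azureml` subfolder. The SDK's exact search order is an internal detail and may differ.

```python
import json
from pathlib import PurePath

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def candidate_config_paths(start):
    """Hypothetical helper: config.json locations to check, nearest directory first.

    Approximates the search done by Workspace.from_config(); the SDK's order may differ.
    """
    start = PurePath(start)
    paths = []
    for directory in (start, *start.parents):
        paths.append(directory / "config.json")
        paths.append(directory / ".azureml" / "config.json")
    return paths


def parse_workspace_config(text):
    """Parse config.json text and fail early if a required key is missing or empty."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return config
```

In a notebook, the first candidate for which `pathlib.Path(p).is_file()` is true can be passed through `parse_workspace_config` to see which value is wrong before retrying `Workspace.from_config()`.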

# Conda Environments

### Environment Artifacts

To ensure reproducibility and to facilitate collaboration, a .yml file is created for each environment. Whenever a new package is needed, it must be added to the corresponding .yml file.

Environment-related artifacts live in their own parent directory, `<PROJECT_ROOT>/environments`. Conda .yml files get their own subdirectory within it (`<PROJECT_ROOT>/environments/conda`).
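This layout can be created idempotently from a notebook. A minimal sketch, assuming the notebooks live one level below `<PROJECT_ROOT>`, as the `../environments/conda/...` paths used by the `%%writefile` cells suggest; `ensure_conda_env_dir` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def ensure_conda_env_dir(project_root):
    """Create <project_root>/environments/conda if needed and return its path."""
    conda_dir = Path(project_root) / "environments" / "conda"
    # parents=True also creates environments/; exist_ok=True makes reruns harmless
    conda_dir.mkdir(parents=True, exist_ok=True)
    return conda_dir


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real project root
    with tempfile.TemporaryDirectory() as root:
        print(ensure_conda_env_dir(root).relative_to(root))
```

From a notebook in a subfolder of the project, `ensure_conda_env_dir("..")` would prepare the directory the .yml files are written to.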

As a general rule, use conda with the default channels as the primary means of installing Python packages. If conda reports that a package is not available, either try conda-forge or another well-known channel, which offer more packages than the default channel, or fall back to pip. Conda ensures that the dependencies of all packages are satisfied simultaneously; without this guarantee, installing packages with pip often leads to broken environments.
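In a conda environment file, this preference shows up as structure: conda packages are plain entries under `dependencies`, while pip packages are nested under a single `pip:` entry. The sketch below is a hypothetical helper that separates the two. It assumes the file has already been loaded into a dict, for example with PyYAML's `yaml.safe_load` (PyYAML is not part of the standard library); the dict here is an excerpt of the development specification written later in this notebook.

```python
def split_dependencies(spec):
    """Return (conda_packages, pip_packages) from a parsed conda environment spec."""
    conda_packages, pip_packages = [], []
    for entry in spec.get("dependencies", []):
        if isinstance(entry, dict):
            # The nested "pip:" block is loaded as {"pip": [...]}
            pip_packages.extend(entry.get("pip", []))
        else:
            conda_packages.append(entry)
    return conda_packages, pip_packages


# Excerpt of the development spec, in the shape yaml.safe_load would return it
spec = {
    "name": "stanford-dogs-dev-env",
    "dependencies": ["python=3.7.1", "pytorch::pytorch=1.7.0", {"pip": ["azureml-core==1.20.0"]}],
    "channels": ["conda-forge", "pytorch"],
}
conda_packages, pip_packages = split_dependencies(spec)
print("conda:", conda_packages)
print("pip:", pip_packages)
```

Listing the pip section separately makes it easier to review whether a pip entry is in fact available from one of the configured conda channels and, following the rule above, belongs in the conda list instead.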

#### Development Environment

Create a **development_environment.yml** file which contains all packages needed for a conda environment for development. This .yml file will be used to build an AML environment as well as a local development environment and Jupyter kernel for your notebook development.

If dependencies are added during the model development phase, add them to this .yml file and rebuild both the conda environment and the Jupyter kernel. Do not install packages directly with "pip install" or "conda install"; this keeps every package tracked in the .yml file and the environment reproducible.

If, for any reason, you want to install packages only in your local development environment without adding them to the .yml file, still do **not use plain "pip install" or "conda install"** from the notebook. These commands do not guarantee that the packages end up in the conda environment of the Jupyter kernel you are currently running. Instead, use the syntax in the following cells, which targets the interpreter (`sys.executable`) and environment prefix (`sys.prefix`) of the running kernel. Only do this to try out packages, and remember to add any package you want to keep to the .yml file.

In [4]:
# # Do this for pip packages
# import sys
# !{sys.executable} -m pip install matplotlib

In [5]:
# # Do this for conda packages
# import sys
# !conda install --yes --prefix {sys.prefix} matplotlib
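The two cells above rely on IPython's `!` shell escape. The same idea can be expressed with `subprocess`, which makes it explicit that the running kernel's interpreter (`sys.executable`) and environment prefix (`sys.prefix`) are targeted. A minimal sketch; the function names are hypothetical, and `run` defaults to `False` so nothing is installed by accident.

```python
import subprocess
import sys


def pip_install_command(*packages):
    """pip command bound to the interpreter of the running kernel."""
    return [sys.executable, "-m", "pip", "install", *packages]


def conda_install_command(*packages):
    """conda command targeting the environment prefix of the running kernel."""
    return ["conda", "install", "--yes", "--prefix", sys.prefix, *packages]


def install(command, run=False):
    """Print the command and execute it only if run=True."""
    print(" ".join(command))
    if run:
        # check=True raises CalledProcessError if the installation fails
        subprocess.run(command, check=True)
    return command


install(pip_install_command("matplotlib"))  # dry run: only prints the command
```

Calling `install(conda_install_command("matplotlib"), run=True)` would perform the conda variant for real.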

In [6]:
%%writefile ../environments/conda/development_environment.yml
name: stanford-dogs-dev-env
dependencies:
- joblib=0.13.2
- matplotlib=3.3.3
- python=3.7.1
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- tqdm=4.38.0
- pip:
    - azure-cli
    - azure-cognitiveservices-vision-customvision
    - azureml-core==1.20.0
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets
    - ipykernel
    - python-dotenv==0.15.0
channels:
- conda-forge
- pytorch

Overwriting ../environments/conda/development_environment.yml
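Because `azureml-core` is pinned to 1.20.0 in the file above, it can be worth checking that the running kernel actually has that version. The sketch below is a hypothetical line-based helper that extracts exact pins (`name=version` for conda, `name==version` for pip) from a spec laid out like the one above. It is not a general YAML parser; for anything more complex, loading the file with PyYAML would be the more robust choice.

```python
import re

# Matches list items such as "- scipy=1.6.0", "- pytorch::torchvision=0.8.1"
# and "    - azureml-core==1.20.0"; unpinned items like "- azure-cli" are skipped
PIN_PATTERN = re.compile(r"^\s*-\s+(?:[\w.-]+::)?([\w.-]+)==?([\w.]+)\s*$")


def pinned_versions(yml_text):
    """Map package name -> pinned version for every exactly pinned list item."""
    pins = {}
    for line in yml_text.splitlines():
        match = PIN_PATTERN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


# Usage in the notebook (requires azureml.core in the running kernel):
# import azureml.core
# with open("../environments/conda/development_environment.yml") as f:
#     pins = pinned_versions(f.read())
# assert pins["azureml-core"] == azureml.core.__version__
```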


#### Training Environment

Also create .yml files for training and deployment. Each file name is prefixed with its stage, i.e. **training_environment.yml** and **deployment_environment.yml**. Likewise, the environment names are labeled by stage, namely **stanford-dogs-train-env** and **stanford-dogs-deploy-env**.
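The naming convention can be captured in one place so that file names and environment names cannot drift apart. A minimal sketch: the short labels `dev`, `train` and `deploy` are taken from the environment names used in this notebook, while `STAGES` and `stage_spec` are hypothetical names.

```python
from pathlib import PurePosixPath

# Stage prefix of the .yml file -> short label used in the environment name
STAGES = {"development": "dev", "training": "train", "deployment": "deploy"}


def stage_spec(stage, conda_dir="../environments/conda"):
    """Return (yml path, environment name) for a stage following the convention."""
    yml_path = PurePosixPath(conda_dir) / f"{stage}_environment.yml"
    env_name = f"stanford-dogs-{STAGES[stage]}-env"
    return str(yml_path), env_name


for stage in STAGES:
    print(stage_spec(stage))
```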

In [7]:
%%writefile ../environments/conda/training_environment.yml
name: stanford-dogs-train-env
dependencies:
- joblib=0.13.2
- matplotlib=3.3.3
- python=3.7.1
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- tqdm=4.38.0
- pip:
    - azure-cli
    - azureml-core==1.20.0
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets
    - ipykernel
    - python-dotenv==0.15.0
channels:
- conda-forge
- pytorch

Overwriting ../environments/conda/training_environment.yml
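Apart from the environment name, the training file above differs from the development file only in its pip section, which drops `azure-cognitiveservices-vision-customvision`. As the files grow, such differences become easy to miss. A small sketch with a hypothetical helper; the two lists are copied from the pip sections of the two files.

```python
def compare_dependencies(reference, other):
    """Return (only_in_reference, only_in_other) as sorted lists."""
    reference, other = set(reference), set(other)
    return sorted(reference - other), sorted(other - reference)


dev_pip = ["azure-cli", "azure-cognitiveservices-vision-customvision", "azureml-core==1.20.0",
           "azureml-defaults", "azureml-sdk", "azureml-widgets", "ipykernel", "python-dotenv==0.15.0"]
train_pip = ["azure-cli", "azureml-core==1.20.0", "azureml-defaults", "azureml-sdk",
             "azureml-widgets", "ipykernel", "python-dotenv==0.15.0"]

only_dev, only_train = compare_dependencies(dev_pip, train_pip)
print("Only in development:", only_dev)
print("Only in training:", only_train)
```

Comparing exact strings also flags version changes, since `azureml-core==1.20.0` and `azureml-core==1.21.0` count as different entries.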


#### Deployment Environment

In [8]:
%%writefile ../environments/conda/deployment_environment.yml
name: stanford-dogs-deploy-env
dependencies:
- joblib=0.13.2
- matplotlib=3.3.3
- python=3.7.1
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- tqdm=4.38.0
- pip:
    - azure-cli
    - azureml-core==1.20.0
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets
    - ipykernel
    - python-dotenv==0.15.0
channels:
- conda-forge
- pytorch

Overwriting ../environments/conda/deployment_environment.yml


### Development Environment Setup

In order to create a local conda development environment and Jupyter kernel to develop with, execute the following steps in the terminal from the project root directory.

`conda env create -f environments/conda/development_environment.yml --force`

`conda activate stanford-dogs-dev-env`

`python -m ipykernel install --user --name=stanford-dogs-dev-env`

The environment will then be available as a kernel in JupyterLab or Jupyter:

<img src="../docs/images/kernel_selection.png" alt="kernel_selection" width="400"/>
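The terminal steps above can also be scripted. `conda activate` changes the state of an interactive shell and has no lasting effect when launched from a script, so the sketch below instead uses `conda run -n <env>` to run the kernel installation inside the new environment. It is a hedged sketch: it assumes a conda version that provides `conda run` and still accepts `--force` for `conda env create`, the function names are hypothetical, and commands are only executed when `run=True`.

```python
import subprocess

ENV_NAME = "stanford-dogs-dev-env"
SPEC_FILE = "environments/conda/development_environment.yml"  # relative to the project root


def setup_commands(env_name=ENV_NAME, spec_file=SPEC_FILE):
    """Commands that create the conda environment and register it as a Jupyter kernel."""
    return [
        ["conda", "env", "create", "-f", spec_file, "--force"],
        # conda run executes the command inside env_name without activating it in this shell
        ["conda", "run", "-n", env_name,
         "python", "-m", "ipykernel", "install", "--user", f"--name={env_name}"],
    ]


def run_setup(run=False):
    """Print each setup command and execute it only if run=True."""
    for command in setup_commands():
        print(" ".join(command))
        if run:
            subprocess.run(command, check=True)


run_setup()  # dry run from the project root; use run_setup(run=True) to execute
```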

In order to list the available environments, the following command can be used:

`conda env list`

In order to remove an environment and the corresponding Jupyter kernel, the following commands can be used.

`jupyter kernelspec remove stanford-dogs-dev-env` (run this from within the environment)

`conda deactivate`

`conda env remove -n stanford-dogs-dev-env`
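To check which environments and kernels exist, before or after a removal, the listings can also be read programmatically: `conda env list --json` and `jupyter kernelspec list --json` both print JSON. The sketch below assumes the layouts `{"envs": [<prefix>, ...]}` and `{"kernelspecs": {<name>: {...}}}`, and that a named environment's directory name equals its name. The helper names are hypothetical.

```python
import json
import subprocess
from pathlib import PurePath


def conda_env_names(json_text):
    """Environment names (directory names) from `conda env list --json` output."""
    return [PurePath(prefix).name for prefix in json.loads(json_text).get("envs", [])]


def kernel_names(json_text):
    """Kernel names from `jupyter kernelspec list --json` output."""
    return sorted(json.loads(json_text).get("kernelspecs", {}))


def query(command):
    """Run a listing command and return its standard output as text."""
    result = subprocess.run(command, check=True, stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


# Usage:
# "stanford-dogs-dev-env" in conda_env_names(query(["conda", "env", "list", "--json"]))
# "stanford-dogs-dev-env" in kernel_names(query(["jupyter", "kernelspec", "list", "--json"]))
```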

### Environment Registration

Registering the environments saves them to the AML workspace, from which they can be retrieved on other compute targets or by other data scientists at every stage of the ML lifecycle.
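The three registration cells differ only in the environment name, the .yml file and, for deployment, the inferencing stack version. A hedged sketch of keeping these per-stage settings in one table: `REGISTRATION_SETTINGS` and `registration_settings` are hypothetical names, and the code deliberately avoids importing `azureml` so it runs anywhere. The commented loop applies the settings with the same SDK calls used in the registration cells.

```python
BASE_IMAGE = "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1"

# stage -> (environment name, conda specification file, inferencing stack version or None)
REGISTRATION_SETTINGS = {
    "development": ("stanford-dogs-dev-env", "../environments/conda/development_environment.yml", None),
    "training": ("stanford-dogs-train-env", "../environments/conda/training_environment.yml", None),
    "deployment": ("stanford-dogs-deploy-env", "../environments/conda/deployment_environment.yml", "latest"),
}


def registration_settings(stage):
    """Return the settings used to build and register the AML environment of a stage."""
    name, file_path, stack_version = REGISTRATION_SETTINGS[stage]
    return {
        "name": name,
        "file_path": file_path,
        "base_image": BASE_IMAGE,
        "inferencing_stack_version": stack_version,
    }


# Applying the settings (requires azureml.core's Environment and the workspace object ws):
# for stage in REGISTRATION_SETTINGS:
#     s = registration_settings(stage)
#     env = Environment.from_conda_specification(name=s["name"], file_path=s["file_path"])
#     env.docker.enabled = True
#     env.docker.base_image = s["base_image"]
#     if s["inferencing_stack_version"] is not None:
#         env.inferencing_stack_version = s["inferencing_stack_version"]
#     env.register(workspace=ws)
```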

In [9]:
# Create the AML development environment from conda development_environment.yml file
dev_env = Environment.from_conda_specification(name="stanford-dogs-dev-env",
                                               file_path="../environments/conda/development_environment.yml")

# Enable docker-based environment
dev_env.docker.enabled = True

# Specify docker base image from mcr
dev_env.docker.base_image = "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1"

# Register the environment
dev_env.register(workspace=ws)


In [10]:
# Create the AML training environment from conda training_environment.yml file
train_env = Environment.from_conda_specification(name="stanford-dogs-train-env",
                                                 file_path="../environments/conda/training_environment.yml")

# Enable docker-based environment
train_env.docker.enabled = True

# Specify docker base image from mcr
train_env.docker.base_image = "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1"

# Register the environment
train_env.register(workspace=ws)


In [11]:
# Create the AML deployment environment from conda deployment_environment.yml file
deployment_env = Environment.from_conda_specification(name="stanford-dogs-deploy-env",
                                                      file_path="../environments/conda/deployment_environment.yml")

# Enable docker-based environment
deployment_env.docker.enabled = True

# Specify docker base image from mcr
deployment_env.docker.base_image = "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1"

# Add the AML inferencing stack to the image for deployment ("latest" is the value the SDK accepts)
deployment_env.inferencing_stack_version = "latest"

# Register the environment
deployment_env.register(workspace=ws)
