# TABLE OF CONTENTS:
---
* [Notebook Summary](#Notebook-Summary)
* [Setup](#Setup)
    * [Notebook Parameters](#Notebook-Parameters)
    * [Connect to Workspace](#Connect-to-Workspace)
* [Conda Environments](#Conda-Environments)
    * [Environment Artifacts](#Environment-Artifacts)
        * [Development Environment (1)](#Development-Environment-(1))
        * [Training Environment (2)](#Training-Environment-(2))
        * [Deployment Environment (3)](#Deployment-Environment-(3))
    * [Development Environment Setup](#Development-Environment-Setup)
    * [AML Environment Registration](#AML-Environment-Registration)
---

# Notebook Summary

This notebook contains instructions on how to create a local conda development environment as well as a Jupyter kernel based on this local conda environment. The conda development environment can be used for IDE-based development with the Azure Machine Learning (AML) Compute Instance (CI), e.g. with VSCode. The Jupyter kernel can be used for development on the AML CI using JupyterLab, Jupyter or AML notebooks. 

This notebook also shows how to create AML environments for development (1), training (2) and deployment (3). AML environments are registered to the AML workspace and can be shared by Data Scientists across different compute targets, which makes environments easy to reproduce. It is recommended to create separate environments for development (1), training (2) and deployment (3) because not every dependency is needed in every stage of the ML lifecycle, and the fewer dependencies an environment has, the less error-prone it is. For example, visualization libraries might only be necessary during development and metric tracking libraries might only be necessary during training, while the deployment environment should be as lightweight as possible.

**Note**: This notebook should be run from the azureml_py38 Jupyter kernel in the AML CI (it is available by default) or from any other Jupyter kernel that has the azureml.core library installed, so that the new environments can be registered to the AML workspace using the AML Python SDK. Subsequent notebooks should then be run from the newly created Jupyter kernel, which is based on your local conda development environment and therefore contains all packages needed for development.
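A quick way to confirm that the running kernel meets this requirement is to check whether `azureml.core` can be imported. The following is a minimal sketch; the helper name `has_module` is hypothetical and not part of the AML SDK.

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be imported by the running kernel."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "azureml") is missing.
        return False


print("azureml.core available:", has_module("azureml.core"))
```

If this prints `False`, switch to a kernel that has the AML Python SDK installed before continuing.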

# Setup

In [1]:
# Import libraries
import azureml.core
import os
from azureml.core import Environment, Workspace

print(f"azureml.core Version: {azureml.core.__version__}")

azureml.core Version: 1.30.0


### Notebook Parameters

Specify the notebook parameters which are used in the source code below.

In [2]:
# Specify the AML development environment name (1)
# (this will be the name with which the dev environment is registered to the AML workspace)
dev_env_name = "dogs_clf_dev_env"

# Specify the AML training environment name (2)
# (this will be the name with which the train environment is registered to the AML workspace)
train_env_name = "dogs_clf_train_env"

# Specify the AML deployment environment name (3)
# (this will be the name with which the deploy environment is registered to the AML workspace)
deploy_env_name = "dogs_clf_deploy_env"

# Specify the base docker image that will underlie all 3 environments
base_docker_image = "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1"

In addition to the notebook parameters, the conda .yml files further down the notebook have to be modified to reflect all required dependencies.

### Connect to Workspace

In order to connect and communicate with the AML workspace, a workspace object needs to be instantiated using the AML Python SDK.

In [3]:
# Connect to the AML workspace using interactive authentication
ws = Workspace.from_config()

**Note:** From the AML CI, the above command works without any additional steps. See `https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?tabs=azure-portal#download-a-configuration-file` for more information on how to download the config.json to another compute target.
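On another compute target it can help to check where a downloaded config.json sits before calling `Workspace.from_config()`. The sketch below assumes that the SDK looks for `config.json` directly in a directory or in its `.azureml` subdirectory, walking up through parent directories; this search order is an assumption based on the SDK documentation and may differ between SDK versions. The helper name `find_workspace_config` is hypothetical.

```python
from pathlib import Path


def find_workspace_config(start="."):
    """Return the first config.json found from `start` upwards, or None."""
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        # Assumed candidate locations: <dir>/config.json and <dir>/.azureml/config.json
        for candidate in (directory / "config.json",
                          directory / ".azureml" / "config.json"):
            if candidate.is_file():
                return candidate
    return None


print(find_workspace_config() or "No config.json found from the current directory upwards")
```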

# Conda Environments

### Environment Artifacts

To ensure reproducibility and to facilitate collaboration, a conda .yml file is created for each environment. Whenever new packages are needed, they must be added to these .yml files.

Environment-related artifacts sit in their own parent directory `<PROJECT_ROOT>/environments`. Conda .yml files have their own directory within the environments directory (`<PROJECT_ROOT>/environments/conda`).
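The `%%writefile` cells further down write into this directory, and writing a file into a directory that does not exist fails. The following minimal sketch creates the layout up front; the helper name `ensure_conda_dir` is hypothetical, and the project root is assumed to be the parent of the notebook directory, as the relative paths in this notebook suggest.

```python
from pathlib import Path


def ensure_conda_dir(project_root):
    """Create <project_root>/environments/conda if needed and return its path."""
    conda_dir = Path(project_root) / "environments" / "conda"
    conda_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return conda_dir


# Example usage from the notebook directory (assumption: project root is ".."):
# ensure_conda_dir("..")
```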

In general, it is recommended to install Python packages with conda from the default channels first. If conda reports that the package you want does not exist, use pip (or try conda-forge and other well-known channels, which have more packages available than the default conda channel). Conda makes sure that the dependencies of all packages are fulfilled simultaneously; the lack of this check often leads to broken environments when packages are installed with pip.

#### Development Environment (1)

Create a **development_environment.yml** file which contains all packages needed for a conda environment for development. This .yml file will be used to build an AML environment as well as a local development environment and Jupyter kernel for notebook development.

If any dependencies are added during model development, they should be added to this .yml file, and the conda environment and Jupyter kernel should be rebuilt. To maintain reproducibility and keep all packages tracked in the .yml file, packages should not be installed directly with "pip install" or "conda install".

If for any reason you still want to install packages only in your local development environment without adding them to the .yml file, you should **not use "pip install" or "conda install"** from the notebook either, because this does not guarantee that the packages are installed in the conda environment of the Jupyter kernel that you are currently running. Instead, use the syntax from the following cells to make certain that the packages are installed in the conda environment of your currently running kernel (or open the terminal, activate the conda environment and pip install / conda install your packages from there). Only do this to test packages, and do not forget to add the packages you want to keep to the .yml file eventually.

In [4]:
# # Do this for pip packages
# import sys
# !{sys.executable} -m pip install matplotlib

In [5]:
# # Do this for conda packages
# import sys
# !conda install --yes --prefix {sys.prefix} matplotlib

In [6]:
%%writefile ../environments/conda/development_environment.yml
name: dogs_clf_dev_env
channels:
- conda-forge
- pytorch
dependencies:
- joblib=0.13.2
- matplotlib=3.3.3
- pip=21.0.1
- pytest=6.2.2
- python=3.7.1
- python-dotenv=0.8.2
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- tqdm=4.38.0
- pip:
    - azure-cli==2.3.1
    - azure-cognitiveservices-vision-customvision==3.1.0
    - azureml-core==1.20.0
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets
    - ipykernel==5.5.5

Overwriting ../environments/conda/development_environment.yml


#### Training Environment (2)

Also create .yml files for training and deployment. Each file name is prefixed with its stage, i.e. **training_environment.yml** and **deployment_environment.yml**. The environment names are likewise labeled by stage, namely **dogs_clf_train_env** and **dogs_clf_deploy_env**.

In [7]:
%%writefile ../environments/conda/training_environment.yml
name: dogs_clf_train_env
channels:
- conda-forge
- pytorch
dependencies:
- joblib=0.13.2
- matplotlib=3.3.3
- pip=21.0.1
- python=3.7.1
- python-dotenv=0.8.2
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- tqdm=4.38.0
- pip:
    - azure-cli==2.3.1
    - azureml-core==1.20.0
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets

Overwriting ../environments/conda/training_environment.yml


#### Deployment Environment (3)

In [8]:
%%writefile ../environments/conda/deployment_environment.yml
name: dogs_clf_deploy_env
channels:
- conda-forge
- pytorch
dependencies:
- joblib=0.13.2
- pip=21.0.1
- python=3.7.1
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- pip:
    - azure-cli==2.3.1
    - azureml-core==1.20.0
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets

Overwriting ../environments/conda/deployment_environment.yml
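Since reproducibility depends on pinned versions, it can be useful to list which dependencies in a .yml file carry no version pin. The following is a minimal, line-based sketch that only handles the flat layout used in this notebook; it is not a general YAML parser, and the helper name `parse_dependencies` is hypothetical. The sample text is the deployment .yml from the cell above.

```python
DEPLOY_YML = """\
name: dogs_clf_deploy_env
channels:
- conda-forge
- pytorch
dependencies:
- joblib=0.13.2
- pip=21.0.1
- python=3.7.1
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- pip:
    - azure-cli==2.3.1
    - azureml-core==1.20.0
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets
"""


def parse_dependencies(yml_text):
    """Map package name -> pinned version (None if unpinned) for a flat conda .yml."""
    packages = {}
    in_dependencies = False
    for raw in yml_text.splitlines():
        line = raw.strip()
        if line == "dependencies:":
            in_dependencies = True
            continue
        # Skip everything before "dependencies:", non-list lines and the "- pip:" header.
        if not in_dependencies or not line.startswith("- ") or line.endswith(":"):
            continue
        spec = line[2:].strip().split("::")[-1]  # drop a "channel::" prefix
        name, _, version = spec.partition("=")   # handles both "=" and "=="
        packages[name] = version.lstrip("=") or None
    return packages


deploy = parse_dependencies(DEPLOY_YML)
unpinned = sorted(name for name, version in deploy.items() if version is None)
print("Unpinned packages:", unpinned)
```

For the deployment file this reports `azureml-defaults`, `azureml-sdk` and `azureml-widgets` as unpinned. The same helper can be applied to the development and training files to compare their package sets.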


### Development Environment Setup

In order to create a local conda development environment and Jupyter kernel to develop with, execute the following steps in the terminal from the project root directory.

`conda env create -f environments/conda/development_environment.yml --force`

`conda activate dogs_clf_dev_env`

`python -m ipykernel install --user --name=dogs_clf_dev_env`

**Note**: You might have to run `conda init bash` first and restart the terminal.

The environment will then be available as a kernel in JupyterLab or Jupyter:

<img src="../docs/images/kernel_selection.png" alt="kernel_selection" width="400"/>

In order to list the available environments, the following command can be used:

`conda env list`

In order to remove an environment and the corresponding Jupyter kernel, the following commands can be used.

`jupyter kernelspec remove dogs_clf_dev_env` (run this from within the environment)

`conda deactivate`

`conda env remove -n dogs_clf_dev_env`
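The environment check can also be done from Python, for example before rebuilding the environment. This sketch assumes that `conda env list --json` prints a JSON object with an `envs` list of environment paths and that a `conda` executable is on the `PATH`; the helper names are hypothetical.

```python
import json
import shutil
import subprocess
from pathlib import Path


def env_names(conda_json):
    """Extract environment names (last path component) from `conda env list --json` output."""
    # Note: the base environment appears under its install directory name, not as "base".
    return [Path(path).name for path in json.loads(conda_json)["envs"]]


def conda_env_exists(name):
    """Return True if a conda environment called `name` is listed by conda."""
    if shutil.which("conda") is None:
        return False
    result = subprocess.run(["conda", "env", "list", "--json"],
                            capture_output=True, text=True, check=True)
    return name in env_names(result.stdout)


# Example usage:
# print(conda_env_exists("dogs_clf_dev_env"))
```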

### AML Environment Registration

The environments are saved to the AML workspace, from which they can be retrieved and used on another compute target and/or by another Data Scientist in any phase of the ML lifecycle.

In [9]:
# Create the AML development environment from conda development_environment.yml file
dev_env = Environment.from_conda_specification(name=dev_env_name,
                                               file_path="../environments/conda/development_environment.yml")

# Specify docker base image from mcr
dev_env.docker.base_image = base_docker_image

# Register the environment
dev_env.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "dogs_clf_dev_env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "conda-forge",
                "pytor

In [10]:
# Create the AML training environment from conda training_environment.yml file
train_env = Environment.from_conda_specification(name=train_env_name,
                                                 file_path="../environments/conda/training_environment.yml")

# Specify docker base image from mcr
train_env.docker.base_image = base_docker_image

# Register the environment
train_env.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "dogs_clf_train_env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "conda-forge",
                "pyt

In [11]:
# Create the AML deployment environment from conda deployment_environment.yml file
deploy_env = Environment.from_conda_specification(name=deploy_env_name,
                                                  file_path="../environments/conda/deployment_environment.yml")

# Specify docker base image from mcr
deploy_env.docker.base_image = base_docker_image

# For an inferencing environment, the inferencing_stack_version has to be set to "latest"
deploy_env.inferencing_stack_version = "latest"

# Register the environment
deploy_env.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": "latest",
    "name": "dogs_clf_deploy_env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "conda-forge",
               

Now you should be able to see your registered AML environments in the AML workspace:
    
<img src="../docs/images/aml_environments.png" alt="aml_environments" width="800"/>   
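Once registered, an environment can be fetched by name from any compute target that can reach the workspace. The following minimal sketch assumes the `Environment.get` method of the AML Python SDK v1 (`azureml.core`); the helper name `load_registered_environment` is hypothetical.

```python
def load_registered_environment(workspace, name, version=None):
    """Fetch a registered AML environment by name (latest version if `version` is None)."""
    # Imported lazily so the helper can be defined in kernels without azureml.core.
    from azureml.core import Environment

    return Environment.get(workspace=workspace, name=name, version=version)


# Example usage (requires a workspace connection):
# from azureml.core import Workspace
# ws = Workspace.from_config()
# train_env = load_registered_environment(ws, "dogs_clf_train_env")
# print(train_env.name, train_env.version)
```

Passing an explicit `version` pins a run to one registered version of the environment, which keeps training and deployment reproducible when the .yml files change later.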