# TABLE OF CONTENTS:
---
* [Conda Environments](#Conda-Environments)
    * [Environment Artifacts](#Environment-Artifacts)
    * [Development Environment Setup](#Development-Environment-Setup)
    * [Environment Registration](#Environment-Registration)
---

Run this notebook from the azureml_py36 kernel on an Azure Machine Learning (AML) Compute Instance, or from any other kernel that has the azureml.core library installed, so that the new environment can be saved to the workspace using the AML Python SDK. Run all subsequent notebooks from the newly created kernel.

In [8]:
import os

from azureml.core import Environment, Workspace
from azureml.core.authentication import MsiAuthentication

# Conda Environments

### Environment Artifacts

Environment-related artifacts sit in their own parent directory called **"environments"**. Conda .yml files have their own directory within the environments directory called **"conda"**. The following cells create these directories if they do not exist yet.

In [9]:
environment_dir = os.path.join(os.getcwd(), "../environments")
os.makedirs(environment_dir, exist_ok=True)
print(f"Environment directory {environment_dir} has been created.")

Environment directory /mnt/batch/tasks/shared/LS_root/mounts/clusters/sbirkamlci/code/pytorch-use-cases-azure-ml/image_classification_template_project/notebooks/../environments has been created.


In [10]:
conda_dir = os.path.join(os.getcwd(), "../environments/conda")
os.makedirs(conda_dir, exist_ok=True)
print(f"Conda directory {conda_dir} has been created.")

Conda directory /mnt/batch/tasks/shared/LS_root/mounts/clusters/sbirkamlci/code/pytorch-use-cases-azure-ml/image_classification_template_project/notebooks/../environments/conda has been created.
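
The printed paths above still contain the `..` segment, and the message says "created" even when the directory already existed. As a minimal sketch of an alternative, the hypothetical helper `ensure_dir` below (not part of the original notebook) resolves the path and reports whether the directory was actually created. The temporary directory only stands in for the project layout.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create `path` (and missing parents); return (resolved_path, created)."""
    resolved = Path(path).resolve()
    created = not resolved.exists()
    resolved.mkdir(parents=True, exist_ok=True)
    return resolved, created


# Demo in a throwaway directory that stands in for the project root.
project_root = Path(tempfile.mkdtemp())
conda_dir, created = ensure_dir(project_root / "notebooks" / ".." / "environments" / "conda")
state = "has been created" if created else "already exists"
print(f"Conda directory {conda_dir} {state}.")
```

In the notebook itself, the same helper could be called with `Path.cwd() / ".." / "environments" / "conda"`.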


Create an **environment.yml** file that lists all packages needed for a conda environment covering development, training and deployment, to ensure reproducibility and easy collaboration with other team members. This .yml file is used to build an AML environment as well as a local development environment and Jupyter kernel for your notebook experience. If the stages (development, training and deployment) differ greatly, create a separate conda environment file for each stage. In that case, prefix each file with its stage, e.g. **training_environment.yml**, and prefix the environment name in the same way, e.g. **training-pytorch-aml-env**. Whenever dependencies are added during model development, add them to this .yml file and rebuild the conda environment and kernel. To keep the environment reproducible, do not install packages ad hoc with "pip install" or "conda install".
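
The stage prefix convention described above can be captured in a small helper. This is only an illustrative sketch: the function name `stage_names` and its defaults are assumptions, not something the project defines.

```python
def stage_names(stage=None, base_env="pytorch-aml-env", base_file="environment.yml"):
    """Return (file_name, environment_name) for an optional stage prefix.

    Files are prefixed with "<stage>_" and environment names with "<stage>-",
    following the examples given in the text.
    """
    if not stage:
        return base_file, base_env
    return f"{stage}_{base_file}", f"{stage}-{base_env}"


for stage in (None, "development", "training", "deployment"):
    file_name, env_name = stage_names(stage)
    print(f"{file_name} -> {env_name}")
```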

In general, prefer conda with the default channels for installing Python packages. If conda reports that the package you want does not exist, use pip, or try conda-forge and other well-known channels, which carry more packages than the default conda channel. Conda checks that the dependencies of all packages are satisfied simultaneously; the lack of such a check often leads to broken environments when using pip.

In [11]:
%%writefile ../environments/conda/environment.yml
name: pytorch-aml-env
dependencies:
- joblib=0.13.2
- matplotlib=3.3.3
- mlflow=1.13.1
- python=3.7.1
- pytorch::pytorch=1.7.0
- pytorch::torchvision=0.8.1
- scipy=1.6.0
- tqdm=4.38.0
- pip:
    - azure-cli
    - azureml-contrib-functions
    - azureml-defaults
    - azureml-sdk
    - azureml-widgets
    - ipykernel
    - python-dotenv==0.15.0
channels:
- conda-forge
- pytorch

Overwriting ../environments/conda/environment.yml


**Note:** azureml-contrib-functions is needed for deployment to Azure Functions.
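
Since reproducibility depends on pinned versions, it can help to list the entries in the file that carry no version constraint (in the file above, most of the pip packages). The sketch below is a deliberately naive line scan using only the standard library. It assumes the simple layout used here (top-level keys at column 0, list items starting with `- `) and is not a general YAML parser. The function name `unpinned_packages` and the shortened sample file are illustrative.

```python
def unpinned_packages(yml_text):
    """Return dependency entries without a version constraint (naive line scan)."""
    unpinned = []
    in_dependencies = False
    for raw in yml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not raw.startswith((" ", "-")):
            # Top-level key such as "name:", "dependencies:" or "channels:".
            in_dependencies = line == "dependencies:"
            continue
        if not in_dependencies or not line.startswith("- "):
            continue
        entry = line[2:].strip()
        if entry.endswith(":"):
            continue  # nested section header such as "pip:"
        if not any(op in entry for op in ("=", "<", ">")):
            unpinned.append(entry)
    return unpinned


sample = """\
name: pytorch-aml-env
dependencies:
- python=3.7.1
- pytorch::pytorch=1.7.0
- pip:
    - azureml-sdk
    - python-dotenv==0.15.0
channels:
- conda-forge
"""
print(unpinned_packages(sample))
```

To check the real file, read it with `open("../environments/conda/environment.yml").read()` and pass the text to the function.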

If for any reason you want to install packages only in your local development environment without adding them to the .yml file, you should still **not use a plain "pip install" or "conda install"** from the notebook. Doing so does not guarantee that the packages end up in the conda environment of the kernel you are currently running. Instead, use the syntax from the following cells, which targets the conda environment of the running kernel. Only do this to test packages, and remember to add any package you want to keep to the .yml file.

In [5]:
# # Do this for pip packages
# import sys
# !{sys.executable} -m pip install matplotlib

In [6]:
# # Do this for conda packages
# import sys
# !conda install --yes --prefix {sys.prefix} matplotlib
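
The two commented cells rely on `sys.executable` and `sys.prefix` to target the environment of the running kernel. As a rough sketch, the same commands can be built in plain Python, which also works outside a notebook where the `!` syntax is unavailable. The helper names are illustrative, and nothing is installed unless `run` is called explicitly.

```python
import subprocess
import sys


def pip_install_command(package):
    """Command that installs `package` with the pip of the running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def conda_install_command(package):
    """Command that installs `package` into the environment of the running interpreter."""
    return ["conda", "install", "--yes", "--prefix", sys.prefix, package]


def run(command):
    """Run a command and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.check_call(command)


# Show which interpreter and environment the current kernel is using.
print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
print("pip command:", " ".join(pip_install_command("matplotlib")))
```

Calling `run(pip_install_command("matplotlib"))` would then perform the installation for the current kernel.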

### Development Environment Setup

To create a local conda environment and Jupyter kernel for development, run the following commands in a terminal from the project root directory.

`conda env create -f environments/conda/environment.yml` (append `--force` to replace an existing environment of the same name)

`conda activate pytorch-aml-env`

`python -m ipykernel install --user --name=pytorch-aml-env`

The environment will then be available as a kernel in JupyterLab or Jupyter:

<img src="../docs/images/kernel_selection.png" alt="kernel_selection" width="400"/>

To list the available environments, use the following command:

`conda env list`

To remove an environment and the corresponding Jupyter kernel, use the following commands:

`jupyter kernelspec remove pytorch-aml-env` (run this from within the environment)

`conda deactivate`

`conda env remove -n pytorch-aml-env`

### Environment Registration

The environment is saved to the workspace, from where it can be retrieved on another compute target, by another data scientist, or for other stages of the ML lifecycle (e.g. model training or model deployment).
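
The retrieval side might look like the following sketch, which wraps the SDK's `Environment.get` call. The wrapper name `get_registered_environment` is hypothetical, and the import sits inside the function so that the helper can be defined on machines where the AML SDK is not installed. It assumes the environment has already been registered under the given name.

```python
def get_registered_environment(workspace, name="pytorch-aml-env", version=None):
    """Fetch a registered environment from an AML workspace.

    With version=None the SDK returns the latest registered version.
    """
    # Imported lazily; requires the azureml-core package at call time.
    from azureml.core import Environment

    return Environment.get(workspace=workspace, name=name, version=version)
```

On another compute, a call such as `get_registered_environment(Workspace.from_config())` would return the environment for use in training or deployment configurations.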

In [7]:
# Connect to the AML workspace with interactive authentication.
# For alternative connection options (e.g. for automated workloads) see the aml_snippets directory.
ws = Workspace.from_config()

# Create an environment from conda environment.yml file
env = Environment.from_conda_specification(name="pytorch-aml-env",
                                           file_path="../environments/conda/environment.yml")

# Enable docker-based environment
env.docker.enabled = True

# Specify docker base image from mcr
env.docker.base_image = "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1"

# For an inferencing environment, the inferencing_stack_version has to be set to "latest"
env.inferencing_stack_version = "latest"

# Register the environment
env.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20201113.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": true,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": "latest",
    "name": "pytorch-aml-env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "conda-forge",
                "pyt