# Azure ML quickstart
## 1. Create local run environment
```powershell
 pip install azureml-core azureml-telemetry azureml-widgets azureml-dataset-runtime
```
## 2. Create Azure Compute Cluster
![AzureML Create compute cluster](./../img/create_cluster.png "Create compute cluster")

![AzureML Naming convention](./../img/cluster_name.png "Cluster naming convention")

## 3. Upload your dataset (if not present)
Try to do this in a separate script.

In [1]:
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core.datastore import Datastore

1. Connect to your workspace
    - Ideally, download the `config.json` of the workspace and load the connection settings from it.
    - ![config.json](./../img/config.png "config.json for connecting to AML")
    - Use `InteractiveLoginAuthentication` to be redirected to a web browser, where you log in to Azure ML. (Use your own credentials ;)

In [3]:
ws = Workspace.from_config(path='azure_config.json', auth=InteractiveLoginAuthentication())

UserErrorException: UserErrorException:
	Message: You are currently logged-in to 25733538-6b16-4aa3-8ed6-297eb79b8e06 tenant. You don't have access to bc82dccd-f19d-42cb-9ce3-0d5df33ef086 subscription, please check if it is in this tenant. All the subscriptions that you have access to in this tenant are = 
 [SubscriptionInfo(subscription_name='Azure for Students', subscription_id='41027adc-1ed8-4205-8562-161a2155faa1')]. 
 Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.
	InnerException None
	ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "You are currently logged-in to 25733538-6b16-4aa3-8ed6-297eb79b8e06 tenant. You don't have access to bc82dccd-f19d-42cb-9ce3-0d5df33ef086 subscription, please check if it is in this tenant. All the subscriptions that you have access to in this tenant are = \n [SubscriptionInfo(subscription_name='Azure for Students', subscription_id='41027adc-1ed8-4205-8562-161a2155faa1')]. \n Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk."
    }
}
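
The error above says that the logged-in tenant cannot see the subscription the workspace config points to. A quick sanity check is to read the config file yourself before connecting. The sketch below assumes the usual keys of a downloaded workspace config (`subscription_id`, `resource_group`, `workspace_name`); the helper name `read_workspace_config` is made up for illustration.

```python
import json

# Keys a downloaded workspace config is assumed to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def read_workspace_config(path):
    """Load a workspace config file and fail early if a key is missing."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return cfg
```

Compare `read_workspace_config('azure_config.json')['subscription_id']` with the subscriptions listed in the error message. If the subscription is correct but lives in a different tenant, `InteractiveLoginAuthentication` also accepts a `tenant_id` argument.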

2. Get the default Datastore (you don't have privileges for any other)

In [None]:
datastore = Datastore.get_default(ws)

3. Get the list of all files you want to upload
    - There is no other way: datasets in AzureML are just files, and the full path in each file name acts as the directory tree.
    - __Be extra careful with `relative_root` and `target_path`__
       - For example, my upload loads data from the local directory "data/..." and uploads it to "datasets/TexDat".
       - `relative_root` is the common leading part (the *parent directories*) of the local file paths. It is stripped from each path, and what remains is placed under `target_path`.
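
To make the `relative_root` / `target_path` relationship concrete, here is a small pure-Python sketch of the path mapping. It only mimics the documented behaviour and is not SDK code; the function name `remote_path` and the file name `img_001.png` are made up for illustration.

```python
import posixpath

def remote_path(local_file, relative_root, target_path):
    """Approximate where a local file ends up in the datastore."""
    # Strip the relative_root prefix from the local path ...
    rel = posixpath.relpath(local_file, relative_root)
    # ... and re-root what is left under target_path.
    return posixpath.join(target_path, rel)

print(remote_path("../data/train/img_001.png", "../data", "datasets/TexDat"))
# prints: datasets/TexDat/train/img_001.png
```

With `relative_root='../data/train'` instead, the same file would land directly in `datasets/TexDat/img_001.png` and the `train` folder would be lost, which is why the root deserves a second look before uploading.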

In [None]:
import os
files_tr = os.listdir('../data/train')
files_tr = list(map(lambda x : os.path.join('../data/train/', x), files_tr))
files_v = os.listdir('../data/val')
files_v = list(map(lambda x : os.path.join('../data/val/', x), files_v))
files_te = os.listdir('../data/test')
files_te = list(map(lambda x : os.path.join('../data/test/', x), files_te))
files = files_tr+files_v+files_te

datastore.upload_files(files=files, relative_root='../data', target_path='datasets/TexDat')
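
The three `listdir`/`map` pairs above can be folded into one helper. This is only a sketch under the assumption that the data lives in `train`, `val` and `test` folders as in the cell above; `collect_files` is a hypothetical name. Unlike `os.listdir`, it skips subdirectories.

```python
from pathlib import Path

def collect_files(root, splits=("train", "val", "test")):
    """Return the files directly inside each split folder under root, as strings."""
    files = []
    for split in splits:
        split_dir = Path(root) / split
        # sorted() keeps the upload order stable between runs
        files.extend(str(p) for p in sorted(split_dir.iterdir()) if p.is_file())
    return files
```

`files = collect_files('../data')` can then be passed to the same `datastore.upload_files(...)` call.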

## 4. Create Dataset
For me personally, it was easier to create the Dataset manually in Studio than to use the API...

![Create dataset](./../img/create_dataset.gif "Create Dataset, ..., File, Blob, ..., Path to files/**")

## 5. Create Control run script
There are several tutorials in the AzureML documentation. These are probably the most useful:
1. [Hello World](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-hello-world)
2. [Train your ML network](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-train)
3. [Use own data](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-bring-data)
4. [Create SW Environment](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments)
5. [Where & How to save outputs from your training](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-save-write-experiment-files)


### 5.1. Let's start here with useful imports
If you are starting from here and don't have the imports from the previous steps, here they are:

In [1]:
import os
import azureml.core
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core.environment import Environment
from azureml.core import ScriptRunConfig
from azureml.core import Dataset
from azureml.data.datapath import DataPath
from azureml.core.authentication import InteractiveLoginAuthentication

print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.28.0


### 5.2 Don't forget to log in to your workspace
(usually the authentication is required only the first time)

In [2]:
ws = Workspace.from_config('azure_config.json', auth=InteractiveLoginAuthentication())
print(ws.name, ws.location, ws.resource_group, sep='\t')

a0047stufiitml01	westeurope	a0047-STUFIIT-ML01


### 5.3 Get your created Dataset - for use in training

In [6]:
dataset = Dataset.get_by_name(workspace=ws, name='TexDat')

### 5.4 Pretty much everything is configured by this ScriptRunConfig
Read the [documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py). <br>
If you ever wondered how Azure runs your scripts: it uploads your `source_directory` as a snapshot. Use `.gitignore` to exclude anything your script does not need, because the whole snapshot is limited to 300 MB, including `./outputs`. <br>
You can [override snapshot size limit](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-save-write-experiment-files#storage-limits-of-experiment-snapshots), though.
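
Before submitting, it can help to check how close `source_directory` is to the 300 MB limit mentioned above. The sketch below simply sums file sizes on disk; the helper name `directory_size_mb` is made up, and it does not honour `.gitignore`, so it may overestimate what actually gets uploaded.

```python
import os

SNAPSHOT_LIMIT_MB = 300  # limit quoted in the text above

def directory_size_mb(root):
    """Sum the sizes of all regular files under root, in MiB."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked data folders are not counted
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total / (1024 * 1024)
```

For example, `directory_size_mb('../') > SNAPSHOT_LIMIT_MB` is a hint that data or checkpoints are sitting inside the source directory.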

In [12]:
config = ScriptRunConfig(source_directory='../',
                         script='main.py',
                         arguments=['--wandb', "use your own key :P",
                                    '--data_path', dataset.as_mount()],  # as_mount() mounts the Dataset on the compute target
                         compute_target='P100x1-hudec')  # the compute cluster you created in step 2
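
On the receiving side, `main.py` has to accept the arguments passed above. A minimal sketch of what that parsing could look like is shown below; the real `main.py` is not part of this notebook, so everything here is hypothetical, including the assumption that the mounted dataset arrives as a plain path string.

```python
import argparse

def parse_args(argv=None):
    """Parse the two arguments that the ScriptRunConfig above passes in."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--wandb", type=str, default=None,
                        help="Weights & Biases API key")
    parser.add_argument("--data_path", type=str, required=True,
                        help="Directory where the dataset is mounted")
    return parser.parse_args(argv)

# Example values only; on the cluster they come from the submitted run
args = parse_args(["--wandb", "my-key", "--data_path", "/tmp/mounted/TexDat"])
print(args.data_path)
# prints: /tmp/mounted/TexDat
```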

### 5.4 Set the name of your experiment
... where all your runs will be aggregated

In [8]:
experiment = Experiment(workspace=ws, name='Texture2018')

### 5.5 Create/Select your required remote environment
###### Note1: Use `pip install opencv-python-headless`; with the standard opencv package, AzureML returns an import error
###### Note2: Beware that not all the latest libraries are available on AzureML... PyTorch 1.8.0 thankfully is ;)

#### 5.5.1 You can list [curated](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-a-curated-environment) azure environments
Then select one of the already registered environments, if it has everything you need for your ML project

In [35]:
for i,key in enumerate(ws.environments.keys()):
    print(f"{i}.",key)

0. pt-wandb-env
1. tf-wandb-env
2. Torch1.8-Opencv-wandb
3. AzureML-Pytorch1.7-Cuda11-OpenMpi4.1.0-py36
4. AzureML-Scikit-learn0.24-Cuda11-OpenMpi4.1.0-py36
5. AzureML-TensorFlow2.4-Cuda11-OpenMpi4.1.0-py36
6. AzureML-Minimal
7. AzureML-Triton
8. AzureML-Minimal-Inference-CPU
9. AzureML-TensorFlow-1.15-Inference-CPU
10. AzureML-PyTorch-1.6-CPU
11. AzureML-XGBoost-0.9-Inference-CPU
12. AzureML-PyTorch-1.6-Inference-CPU
13. AzureML-DeepSpeed-0.3-GPU
14. AzureML-TensorFlow-2.3-CPU
15. AzureML-PyTorch-1.6-GPU
16. AzureML-TensorFlow-2.3-GPU
17. AzureML-TensorFlow-2.2-CPU
18. AzureML-Tutorial
19. AzureML-TensorFlow-2.2-GPU
20. AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu
21. AzureML-sklearn-0.24-ubuntu18.04-py37-cuda11-gpu
22. AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu
23. AzureML-VowpalWabbit-8.8.0
24. AzureML-PyTorch-1.3-CPU
25. AzureML-PyTorch-1.5-CPU
26. AzureML-PyTorch-1.5-GPU
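
In the listing above, curated environments carry the `AzureML-` prefix, while the rest were registered by workspace users. A small sketch for separating the two groups follows; the helper name `split_environments` is made up, and it works on any iterable of names, for example `ws.environments.keys()`.

```python
def split_environments(names, curated_prefix="AzureML-"):
    """Split environment names into (curated, custom) by name prefix."""
    curated = [n for n in names if n.startswith(curated_prefix)]
    custom = [n for n in names if not n.startswith(curated_prefix)]
    return curated, custom

# A few names copied from the listing above
curated, custom = split_environments([
    "pt-wandb-env", "tf-wandb-env", "Torch1.8-Opencv-wandb",
    "AzureML-Minimal", "AzureML-Triton",
])
print(custom)
# prints: ['pt-wandb-env', 'tf-wandb-env', 'Torch1.8-Opencv-wandb']
```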


You can also print the libraries each environment includes (the counter limits the output to the first three environments)

In [38]:
envs = Environment.list(workspace=ws)
count = 0
for env in envs:
    print("Name",env)
    print("packages", envs[env].python.conda_dependencies.serialize_to_string())
    count += 1
    if count == 3:
        break

Name pt-wandb-env
packages channels:
- conda-forge
- anaconda
dependencies:
- python=3.6.2
- scipy
- scikit-image
- scikit-learn
- numpy
- pandas
- pip:
  - azureml-defaults
  - azureml-core
  - azureml-dataset-runtime
  - wandb
  - Pillow
  - torch==1.6.0
  - torchvision==0.7.0
name: azureml_ab95fef37536a3cdc05badd51f080bd5

Name tf-wandb-env
packages channels:
- conda-forge
- anaconda
dependencies:
- python=3.6.2
- scipy
- scikit-image
- scikit-learn
- numpy
- pandas
- pip:
  - azureml-core
  - azureml-dataset-runtime
  - tensorflow-gpu==2.2.0
  - wandb
  - Pillow
name: azureml_fa25278502cdb61f88f71980f535b1b5

Name Torch1.8-Opencv-wandb
packages channels:
- anaconda
- conda-forge
dependencies:
- python=3.6.2
- pip:
  - wandb==0.10.30
  - torch>=1.8.1
  - torch-summary
  - numpy>=1.19.3
  - opencv-python-headless>=4.4.0.46
  - matplotlib>=3.3.3
  - tqdm==4.60.0
  - azureml-core==1.28.0
- pip
name: azureml_d0e2eed3435e05eb20828b9dfa35f644



#### 5.5.2 Or you can create your own environment
.. from scratch, using conda dependencies or a pip [requirements.txt](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-conda-dependencies-or-pip-requirements-files)
.. don't forget to register it in the workspace
.. you can enable Docker for quicker loading...

In [None]:
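from azureml.core.runconfig import DockerConfiguration
env = Environment.from_pip_requirements(name='Torch1.8-Opencv-wandb', file_path='../requirements.txt')
env.register(workspace=ws)
# A DockerConfiguration only takes effect once it is attached to the run config (or passed to ScriptRunConfig as docker_runtime_config)
config.run_config.docker = DockerConfiguration(use_docker=True)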

#### 5.5.3. Select the one environment you desire
Once your environment is created and registered, `Environment.get(...)` is the only call you will need

In [49]:
env = Environment.get(workspace=ws, name='Torch1.8-Opencv-wandb')

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210301.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "Torch1.8-Opencv-wandb",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda

#### 5.5.4. And of course, we need to use the environment

In [13]:
config.run_config.environment = env


### 5.6 Submit your experiment with config to AzureML
...and watch it in your Jupyter notebook or in AzureML Studio.

In [14]:
run = experiment.submit(config)

In [28]:
aml_url = run.get_portal_url()
print(aml_url)

https://ml.azure.com/runs/Texture2018_1621849379_0e646884?wsid=/subscriptions/bc82dccd-f19d-42cb-9ce3-0d5df33ef086/resourcegroups/a0047-STUFIIT-ML01/workspaces/a0047stufiitml01&tid=5dbf1add-202a-4b8d-815b-bf0fb024e033
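
If you would rather wait for the run from a script than keep the portal open, polling the run status is one option. The sketch below relies only on `run.get_status()`; the set of terminal states and the helper name `wait_until_done` are assumptions made for illustration. The SDK also ships `run.wait_for_completion(show_output=True)`, which does the waiting for you.

```python
import time

# Status strings assumed to mean that the run has finished
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}

def wait_until_done(run, poll_seconds=30.0):
    """Poll run.get_status() until the run reaches a terminal state."""
    status = run.get_status()
    while status not in TERMINAL_STATES:
        print("status:", status)
        time.sleep(poll_seconds)
        status = run.get_status()
    return status
```

Calling `wait_until_done(run)` after `experiment.submit(config)` blocks until the run finishes and returns its final status.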


# There is a lot more to do with Azure...
Like logging for example... <br>
I did not try it, because I use [wandb](https://wandb.ai). <br>

## Important
To log anything, use the AML [Run class](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)?view=azure-ml-py#azureml_core_Run_get_context) <br>
To save your model, write it to the default output directory `./outputs`. Build the path with `os`, then pass it to the usual torch/tf save calls.
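
A minimal sketch of the saving pattern, assuming the training script writes into `outputs` relative to its working directory. `save_checkpoint` is a made-up helper, and JSON stands in for the real model format so the example runs without torch or tf.

```python
import json
import os

def save_checkpoint(state, filename, output_dir="outputs"):
    """Write a JSON-serializable state dict into the run's output folder."""
    # Create the folder if the run has not written anything yet
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    return path
```

With PyTorch, the same path handling applies: build the path as above and call `torch.save(model.state_dict(), path)` instead of `json.dump`.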

This is where you will find it all: <br>
![Runs Outputs](./../img/run_outputs.png)