In [1]:
!pip install azureml-core --quiet
!pip install azure-ml-component --quiet

# Connect to AzureML Workspace

An Azure Machine Learning (AML) workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create: experiments, compute targets, models, datastores, and so on.

First, download the workspace credentials as `config.json` from the Azure portal. In the upper-right corner, click the "Visual Studio Enterprise Subscription" tab, then click the "Download config file" link.
![](images/config_download.png)
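
Before connecting, it can help to confirm that the downloaded file is complete. The sketch below is not part of the AzureML SDK: `check_workspace_config` is a hypothetical helper, and the three key names are an assumption based on what a downloaded `config.json` normally contains.

```python
import json
from pathlib import Path

# Assumed keys of the downloaded config.json (not taken from this tutorial).
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path="config.json"):
    """Return the parsed config, or raise ValueError naming the missing keys."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # A key counts as missing if it is absent or empty.
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return config
```

Calling `check_workspace_config()` in the notebook's working directory before `Workspace.from_config()` gives a clearer error than a failed login if the file is truncated or saved in the wrong place.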

In [2]:
from azureml.core import Workspace, Experiment, Run, Datastore, Dataset
from azure.ml.component import Component, dsl

# This reads config.json and will prompt you to log in to Azure
workspace = Workspace.from_config()

# Let's take a look at the workspace information
workspace

Workspace.create(name='kaggle', subscription_id='dba5253c-7f60-45fe-86c5-a01af9f846f0', resource_group='kaggle-rg')

# Uploading a dataset to Azure Blob Storage

To upload your dataset to Azure Blob Storage:
* Go to the "data" tab and click the "Create" button.
  ![](images/register_dataset_1.png)

* Give your dataset a unique name. Select the "File" type (AzureML v1 API) and click the "Next" button.

  IMPORTANT: do not select v2, or the rest of this tutorial will fail.
  ![](images/register_dataset_4.png)

* Select the "Upload files from local" option and click the "Next" button.
  ![](images/register_dataset_3.png)

* Select "workspaceblobstore" as the datastore. Do not use anything else!
  ![](images/register_dataset_5.png)

* Upload the dataset file and click the "Next" button.
  ![](images/register_dataset_2.png)

# Submit an experiment that uses your dataset

* First, select a base environment. It's easier to start with one of the curated Ubuntu environments officially managed by Microsoft, which come with PyTorch and other ML libraries pre-installed. List of curated environments in AzureML: [Azure Machine Learning Curated Environments (API v1)](https://learn.microsoft.com/en-us/azure/machine-learning/resource-curated-environments?view=azureml-api-2&viewFallbackFrom=azureml-api-1).
  * (Advanced) To create a custom environment, please refer to [Manage Azure Machine Learning environments with the CLI & SDK](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2?view=azureml-api-2&tabs=cli&viewFallbackFrom=azureml-api-1#use-a-curated-environment)
* Then, select an experiment. In AzureML, an experiment is like a folder that contains all of its runs. You can create a new experiment or select an existing one.
* Finally, create a `ScriptRunConfig` that describes the script to run and the environment to use.

For more info, please refer to the official documentation: [Configure and submit training jobs](https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-set-up-training-targets?view=azureml-api-1#create-an-environment).

In [3]:
from azureml.core import Environment

# Get a Python environment based on a curated PyTorch 1.10 image from Azure ML.
# Changing this is not recommended, as building your own custom environment takes longer.
environment = Environment.get(workspace, name="AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu")

## Create/select an experiment

An experiment is like a folder for your runs. You can create a new experiment or select an existing one. The best practice is to create a new experiment for each new task. Rules of thumb:
* If you are trying to solve a new problem or to use a new model, create a new experiment.
* If you are trying to improve an existing model (either with code/config changes or dataset updates), use the same experiment.

In [4]:
# Create a new experiment folder, one per project/model type
from azureml.core import Experiment

experiment_name = 'sample_experiment'
experiment = Experiment(workspace=workspace, name=experiment_name)
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| sample_experiment | kaggle | Link to Azure Machine Learning studio | Link to Documentation |


# Choose your compute target

In the "compute" tab, you can see the available compute targets. Click on one to see its GPU type, the maximum cluster size, how busy each node is, and the price per hour.
![](images/run_1.png)

# Create a ScriptRunConfig

A [`ScriptRunConfig`](https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) object holds the information needed to submit a run: the script, the compute target, and the environment. It's a simple way to configure a single run, but not the only one. For a step inside a pipeline, you can use a `PythonScriptStep` object instead. For more info, please refer to the official documentation: [Configure and submit training jobs](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#create-an-environment).

In [5]:
from azureml.core import ScriptRunConfig
from azureml.core import Dataset

# Create a script config that will run the training script using the dataset
# WARNING: the `script` argument only accepts Python scripts!
src = ScriptRunConfig(source_directory="./sample_code/",
                      script='train.py',
                      compute_target='gpu-t4-lp',
                      environment=environment)
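
The config above runs `train.py` with no command-line arguments. If the script needs inputs such as a data location, `ScriptRunConfig` also accepts an `arguments` list that is forwarded to the script. The contents of `train.py` are not shown in this tutorial, so the following is only a sketch of what the receiving side could look like; the flag names `--data-path` and `--epochs` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags forwarded to the training script."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    # Both flag names are illustrative assumptions, not the tutorial's actual train.py.
    parser.add_argument("--data-path", required=True,
                        help="Folder or file that holds the training data")
    parser.add_argument("--epochs", type=int, default=1,
                        help="Number of training epochs")
    return parser.parse_args(argv)


# Passing an explicit list mimics what the script would receive on the cluster.
args = parse_args(["--data-path", "./data", "--epochs", "3"])
print(f"Training on {args.data_path} for {args.epochs} epoch(s)")
```

On the submission side, the matching change would be to pass something like `arguments=["--data-path", "<path>", "--epochs", 3]` to `ScriptRunConfig`, with the path filled in for your own dataset.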

# Finally, submit the experiment
Don't worry if your run contains bugs! You can always cancel it :)

In [6]:
# Submit the experiment
run = experiment.submit(config=src)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| sample_experiment | sample_experiment_1683939214_46697197 | azureml.scriptrun | Starting | Link to Azure Machine Learning studio | Link to Documentation |


You will then see the run in the "Jobs" tab. Click on it to manage the run and view its logs.
![](images/run_2.png)

Azure ML will take some time to prepare the job (usually ~10-15 minutes), because it has to prepare the cluster, the environment, and the image. Once everything is ready, the job starts running and `std_log.txt` appears, where you can see the logs from your job.
![](images/run_3.png)
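
Instead of refreshing the "Jobs" tab during the preparation phase, the status can also be polled from the notebook. The helper below is a sketch, not part of the SDK: `wait_for_terminal_state` is a hypothetical name, it only relies on the object having a `get_status()` method (which `Run` provides), and the set of terminal status strings is an assumption.

```python
import time

# Assumed terminal status strings returned by run.get_status().
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_state(run, poll_seconds=30.0, timeout_seconds=3600.0):
    """Poll run.get_status() until the run finishes or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    last_status = None
    while True:
        status = run.get_status()
        if status != last_status:
            # Only print when the status changes, to keep the output short.
            print(f"Run status: {status}")
            last_status = status
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

The SDK's own `run.wait_for_completion(show_output=True)` blocks in a similar way and streams the logs, so the helper above is mainly useful when you want a timeout or custom handling per status.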

# Tips

* If you're using `V100` or `T4` computes (`gpu-v100-x1` or `gpu-t4-lp`), it is **strongly** advised to train in `FP16`. (`BF16` is only hardware-accelerated on newer GPU generations, so prefer it only if you move to such a compute.) 16-bit training will accelerate your pipeline and allow you to use larger batch sizes.
* Since the warmup time on AzureML is long, please consider experimenting on a local machine or Google Colab first, and then submit the job to AzureML once you're 100% sure that your code is working.
* To save resources, cancel the job as soon as you see that it's not working. You can always submit a new job.
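
The advice to test locally before submitting can be partly automated. The sketch below runs the training script once on the local machine and reports whether it exited cleanly. `smoke_test` is a hypothetical helper, and the default paths simply mirror the `source_directory` and `script` values used earlier in this tutorial.

```python
import subprocess
import sys


def smoke_test(script="train.py", source_directory="./sample_code/",
               args=(), timeout_seconds=300):
    """Run the script locally and return True if it exits with code 0."""
    try:
        result = subprocess.run(
            [sys.executable, script, *args],
            cwd=source_directory,   # same working directory the job would use
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        print(f"{script} did not finish within {timeout_seconds} seconds")
        return False
    if result.returncode != 0:
        # The tail of stderr usually holds the traceback.
        print(result.stderr[-2000:])
    return result.returncode == 0
```

A typical pattern would be to call `smoke_test()` with arguments that make the script run on a tiny subset of the data, and only call `experiment.submit(config=src)` when it returns `True`.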