# Working with Compute

When you run a script as an Azure Machine Learning experiment, you need to define the execution context for the experiment run. The execution context is made up of:

* The Python environment for the script, which must include all Python packages used in the script.
* The compute target on which the script will be run. This could be the local workstation from which the experiment run is initiated, or a remote compute target such as a training cluster that is provisioned on demand.

In this lab, you'll explore *environments* and *compute targets* for experiments.

## Connect to Your Workspace

The first thing you need to do is to connect to your workspace using the Azure ML SDK.

> **Note**: If the authenticated session with your Azure subscription has expired since you completed the previous exercise, you'll be prompted to reauthenticate.

In [2]:
from azureml import core

ws = core.Workspace.from_config()
print(f'Ready to use Azure ML {core.VERSION} to work with {ws.name}')

Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (portalocker 2.0.0 (/anaconda/envs/azureml_py36/lib/python3.6/site-packages), Requirement.parse('portalocker~=1.0'), {'msal-extensions'}).


Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FAEJXSQL8 to authenticate.
Interactive authentication successfully completed.
Ready to use Azure ML 1.11.0 to work with workspace


## Prepare Data

In this lab, you'll use a dataset containing details of diabetes patients. Run the cell below to create this dataset. If you already created it in a previous lab, the code will find the existing version.

In [3]:
default_ds = ws.get_default_datastore()

if 'diabetes dataset' not in ws.datasets:
    default_ds.upload_files(
        files=['./data/diabetes.csv', './data/diabetes2.csv'],
        target_path='diabetes-data/',
        overwrite=True,
        show_progress=True,
    )

    tab_data_set = core.Dataset.Tabular.from_delimited_files(
        path=(default_ds, 'diabetes-data/*.csv')
    )

    tab_data_set = tab_data_set.register(
        workspace=ws,
        name='diabetes dataset',
        description='diabetes data',
        tags={'format': 'CSV'},
        create_new_version=True,
    )
    print('Dataset registered.')
else:
    print('Dataset already registered.')

Dataset already registered.


## Define an Environment

When you run a Python script as an experiment in Azure Machine Learning, a Conda environment is created to define the execution context for the script. Azure Machine Learning provides a default environment that includes many common packages, including the **azureml-defaults** package, which contains the libraries necessary for working with an experiment run, as well as popular packages like **pandas** and **numpy**.

You can also define your own environment and add packages by using **conda** or **pip**, to ensure your experiment has access to all the libraries it requires. 

Run the following cell to create an environment for the diabetes experiment.

In [4]:
from azureml.core import conda_dependencies

diabetes_env = core.Environment("diabetes-experiment-env")
diabetes_env.python.user_managed_dependencies = False
diabetes_env.docker.enabled = True

diabetes_packages = conda_dependencies.CondaDependencies.create(
    conda_packages=['scikit-learn'],
    pip_packages=['azureml-defaults', 'azureml-dataprep[pandas]']
)

diabetes_env.python.conda_dependencies = diabetes_packages

print(diabetes_env.name, 'defined.')

diabetes-experiment-env defined.
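To make the split between **conda** and **pip** packages more concrete, here is a minimal sketch of how the same package lists could be arranged in the layout of a conda environment file. The `build_conda_spec` helper is hypothetical and uses only the standard library; the specification that the SDK actually generates (for example its channels and Python version) may differ.

```python
import json


def build_conda_spec(name, conda_packages, pip_packages):
    """Return a dict shaped like a conda environment file (illustrative only)."""
    dependencies = list(conda_packages)
    if pip_packages:
        # In a conda environment file, pip packages are nested under a "pip" key.
        dependencies.append("pip")
        dependencies.append({"pip": list(pip_packages)})
    return {"name": name, "dependencies": dependencies}


spec = build_conda_spec(
    "diabetes-experiment-env",
    conda_packages=["scikit-learn"],
    pip_packages=["azureml-defaults", "azureml-dataprep[pandas]"],
)
print(json.dumps(spec, indent=2))
```

Keeping the two lists separate mirrors the `conda_packages` and `pip_packages` arguments used in the cell above.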


Now you can use the environment for the experiment by assigning it to an Estimator (or RunConfig).

The following code assigns the environment you created to a generic estimator, and submits an experiment. As the experiment runs, observe the run details in the widget and in the **azureml-logs/60_control_log.txt** output log, where you'll see the conda environment being built.

In [5]:
from azureml.train import estimator
from azureml import widgets

script_params = {
    '--regularization': 0.1
}

diabetes_ds = ws.datasets.get("diabetes dataset")

experiment_folder = 'diabetes-training-logistic'
config = estimator.Estimator(
    source_directory=experiment_folder,
    inputs=[diabetes_ds.as_named_input('diabetes')],
    script_params=script_params,
    compute_target = 'local',
    environment_definition = diabetes_env,
    entry_script='diabetes_training.py'
)

experiment = core.Experiment(workspace = ws, name = 'diabetes-training')

run = experiment.submit(config=config)
widgets.RunDetails(run).show()
run.wait_for_completion()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

{'runId': 'diabetes-training_1597345346_2c6d001d',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2020-08-13T19:08:05.113366Z',
 'endTimeUtc': '2020-08-13T19:08:21.342847Z',
 'properties': {'_azureml.ComputeTargetType': 'local',
  'ContentSnapshotId': '10467a7d-9c2c-4a4d-a1eb-d316550532cb',
  'azureml.git.repository_uri': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'mlflow.source.git.repoURL': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '5c1eefa0df81f69f138a39b10008e0fad0c65849',
  'mlflow.source.git.commit': '5c1eefa0df81f69f138a39b10008e0fad0c65849',
  'azureml.git.dirty': 'False'},
 'inputDatasets': [{'dataset': {'id': 'ae295d27-1d6d-4897-b7d2-d19f9275b922'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'diabetes', 'mechanism': 'Direct'}}],
 'runDefinition': {'script': 'diabetes_training.py',
  'scriptType': None,
  'useAbsolutePath': False,
  '

The experiment successfully used the environment, which included all of the packages it required.
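The `script_params` dictionary is passed to the entry script as command-line arguments. The training script itself is not shown in this lab, so the following is only a sketch of how such a script might read the `--regularization` value with `argparse`; the `reg_rate` name and the default of 0.01 are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments an estimator might pass to the entry script."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--regularization",
        type=float,
        dest="reg_rate",
        default=0.01,  # hypothetical default, used when the argument is omitted
        help="regularization rate for the model",
    )
    # parse_known_args ignores any extra arguments instead of raising an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--regularization", "0.1"])
print("Regularization Rate", args.reg_rate)
```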

Having gone to the trouble of defining an environment with the packages you need, you can register it in the workspace.

In [6]:
diabetes_env.register(workspace=ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20200423.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": true,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "diabetes-experiment-env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "co

## Run an Experiment on a Remote Compute Target

In many cases, your local compute resources may not be sufficient to run a complex or long-running experiment that needs to process a large volume of data, and you may want to take advantage of the ability to dynamically create and use compute resources in the cloud.

Azure ML supports a range of compute targets, which you can define in your workspace and use to run experiments, paying for the resources only when you use them. In this case, we'll run the diabetes training experiment on a compute cluster with a unique name of your choosing, so let's verify that the cluster exists (and if not, create it) so we can use it to run training experiments.

> **Important**: Change the value of **cluster_name** in the code below (*susumu-cluster* in this example) to a unique name for your own compute cluster before running it!

In [8]:
from azureml.core import compute
from azureml.core import compute_target

cluster_name = "susumu-cluster"

try:
    training_cluster = compute.ComputeTarget(
        workspace=ws, name=cluster_name,
    )
    print('Found existing cluster, use it.')
except compute_target.ComputeTargetException:
    compute_config = compute.AmlCompute.provisioning_configuration(
        vm_size='STANDARD_D2_V2', max_nodes=1
    )
    training_cluster = compute.ComputeTarget.create(
        ws, cluster_name, compute_config,
    )

training_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
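The cell above follows a "get or create" pattern: try to look up the compute target by name, and provision it only if the lookup raises an exception. The control flow can be sketched without the SDK as follows. `FakeWorkspace` and `TargetNotFound` are hypothetical stand-ins for the workspace and for `ComputeTargetException`, not part of Azure ML.

```python
class TargetNotFound(Exception):
    """Stand-in for the exception raised when a compute target does not exist."""


class FakeWorkspace:
    """In-memory stand-in that stores compute targets by name."""

    def __init__(self):
        self._targets = {}

    def get(self, name):
        try:
            return self._targets[name]
        except KeyError:
            raise TargetNotFound(name) from None

    def create(self, name, config):
        self._targets[name] = dict(config, name=name)
        return self._targets[name]


def get_or_create(workspace, name, config):
    """Return (target, created), creating the target only when it is missing."""
    try:
        return workspace.get(name), False
    except TargetNotFound:
        return workspace.create(name, config), True


ws_stub = FakeWorkspace()
config = {"vm_size": "STANDARD_D2_V2", "max_nodes": 1}
first, created_first = get_or_create(ws_stub, "my-cluster", config)
second, created_second = get_or_create(ws_stub, "my-cluster", config)
print(created_first, created_second)  # True False
```

Because the second call finds the existing target, running the cell repeatedly does not provision a duplicate cluster.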


Now you're ready to run the experiment on the compute you created. You can do this by specifying the **compute_target** parameter in the estimator. You can set this to either the name of the compute target or a **ComputeTarget** object.

You'll also reuse the environment you registered previously.

In [9]:
registered_env = core.Environment.get(ws, 'diabetes-experiment-env')

script_params = {
    '--regularization': 0.1
}

diabetes_ds = ws.datasets.get("diabetes dataset")

config = estimator.Estimator(
    source_directory=experiment_folder,
    inputs=[diabetes_ds.as_named_input('diabetes')],
    script_params=script_params,
    compute_target = cluster_name,
    environment_definition = registered_env,
    entry_script='diabetes_training.py',
)

experiment = core.Experiment(workspace = ws, name = 'diabetes-training')

run = experiment.submit(config=config)
widgets.RunDetails(run).show()
run.wait_for_completion()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

{'runId': 'diabetes-training_1597345960_c766c1f6',
 'target': 'susumu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2020-08-13T19:16:15.61313Z',
 'endTimeUtc': '2020-08-13T19:19:54.975503Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '10467a7d-9c2c-4a4d-a1eb-d316550532cb',
  'azureml.git.repository_uri': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'mlflow.source.git.repoURL': 'https://github.com/susumuasaga/mslearn-aml-labs',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '5c1eefa0df81f69f138a39b10008e0fad0c65849',
  'mlflow.source.git.commit': '5c1eefa0df81f69f138a39b10008e0fad0c65849',
  'azureml.git.dirty': 'False',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [{'dataset': {'id': 'ae295d27-1d6d-4897-b7d2-d19f9275b922'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'diabetes', 'mech

The experiment takes considerably longer to run on the cluster because a container image must be built with the conda environment, and then the cluster nodes must be started and the image deployed before the script can run. For a simple experiment like the diabetes training script, this may seem inefficient. However, imagine you needed to run a more complex experiment with a large volume of data that would take several hours on your local workstation. In that case, dynamically creating more scalable compute may reduce the overall time significantly.
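To put a rough number on the difference, you can compare the `startTimeUtc` and `endTimeUtc` values from the run details of the local run and the cluster run in this lab. The sketch below uses only the standard library; `run_duration_seconds` is a hypothetical helper, and these timestamps may not capture every phase of a run (such as queueing), so treat the result as an approximation.

```python
from datetime import datetime

# Format of the UTC timestamps shown in the run details.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration_seconds(start_utc, end_utc):
    """Return the elapsed seconds between two run-detail timestamps."""
    start = datetime.strptime(start_utc, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_utc, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


# Timestamps copied from the local run and the cluster run in this lab.
local_seconds = run_duration_seconds(
    "2020-08-13T19:08:05.113366Z", "2020-08-13T19:08:21.342847Z"
)
remote_seconds = run_duration_seconds(
    "2020-08-13T19:16:15.61313Z", "2020-08-13T19:19:54.975503Z"
)
print(f"local:  {local_seconds:.1f} s")   # local:  16.2 s
print(f"remote: {remote_seconds:.1f} s")  # remote: 219.4 s
```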

While you're waiting for the experiment to run, you can check on the status of the compute in the widget above or in [Azure Machine Learning studio](https://ml.azure.com).

> **Note**: After some time, the widget may stop updating. You'll be able to tell the experiment run has completed by the information displayed immediately below the widget and by the fact that the kernel indicator at the top right of the notebook window has changed from **&#9899;** (indicating the kernel is running code) to **&#9711;** (indicating the kernel is idle).

After the experiment has finished, you can get the metrics and files generated by the experiment run. The files will include logs for building the image and managing the compute.

In [10]:
# Print each metric logged by the run
metrics = run.get_metrics()
for key, value in metrics.items():
    print(key, value)
print('\n')
# List the log and output files generated by the run
for file in run.get_file_names():
    print(file)

Regularization Rate 0.1
Accuracy 0.7891111111111111
AUC 0.8568509052814499


azureml-logs/55_azureml-execution-tvmps_3c9b746289bf78fc189d682dc258fbfdc6ebec21635181d8a34aa5fe45eafd75_d.txt
azureml-logs/65_job_prep-tvmps_3c9b746289bf78fc189d682dc258fbfdc6ebec21635181d8a34aa5fe45eafd75_d.txt
azureml-logs/70_driver_log.txt
azureml-logs/75_job_post-tvmps_3c9b746289bf78fc189d682dc258fbfdc6ebec21635181d8a34aa5fe45eafd75_d.txt
azureml-logs/process_info.json
azureml-logs/process_status.json
logs/azureml/107_azureml.log
logs/azureml/dataprep/backgroundProcess.log
logs/azureml/dataprep/backgroundProcess_Telemetry.log
logs/azureml/dataprep/engine_spans_l_5ea49ddd-002d-4c9d-a5a8-4ae63046cf04.jsonl
logs/azureml/dataprep/python_span_l_5ea49ddd-002d-4c9d-a5a8-4ae63046cf04.jsonl
logs/azureml/job_prep_azureml.log
logs/azureml/job_release_azureml.log
outputs/diabetes_model.pkl
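The file list mixes compute and image logs with the model that the script saved. As a small sketch, the names can be grouped by their top-level folder to separate the outputs from the logs. The `group_by_top_folder` helper is hypothetical, and the sample names are a subset copied from the listing above.

```python
from collections import defaultdict


def group_by_top_folder(file_names):
    """Group run file names by the folder before the first '/'."""
    groups = defaultdict(list)
    for name in file_names:
        folder, _, rest = name.partition("/")
        groups[folder].append(rest)
    return dict(groups)


sample_files = [
    "azureml-logs/70_driver_log.txt",
    "azureml-logs/process_info.json",
    "logs/azureml/job_prep_azureml.log",
    "outputs/diabetes_model.pkl",
]
grouped = group_by_top_folder(sample_files)
print(grouped["outputs"])  # ['diabetes_model.pkl']
```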


**More Information**:

- For more information about environments in Azure Machine Learning, see [Reuse environments for training and deployment by using Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments).
- For more information about compute targets in Azure Machine Learning, see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/azure/machine-learning/concept-compute-target).