Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# 00 Environment Setup
---

This notebook walks you through the steps needed to configure your environment for this solution accelerator, including:
1. Connecting to your workspace and creating a config.json (*this can be skipped if running on a Notebook VM*)
2. Deploying a compute cluster
3. Creating and registering additional containers in blob storage for logging

### Prerequisites
At this point, you should have created your AML workspace. If you haven't created one already, you can create one in step 3.

## 1.0 Verify Azure ML SDK Version

In [None]:
import azureml.core

print("This notebook was created using version 1.0.74.1 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")

If you are using an older version of the SDK than the one this notebook was created with, you should upgrade your SDK.
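
The version comparison can also be done programmatically. The following is a minimal sketch that is not part of the original notebook: the helper names `version_tuple` and `is_older` are hypothetical, and it assumes version strings are dot-separated numbers such as `1.0.74.1`.

```python
import re

NOTEBOOK_SDK_VERSION = "1.0.74.1"  # version this notebook was created with


def version_tuple(version: str) -> tuple:
    """Turn '1.0.74.1' into (1, 0, 74, 1); non-numeric parts are ignored."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def is_older(installed: str, required: str = NOTEBOOK_SDK_VERSION) -> bool:
    """Return True when the installed version sorts before the required one."""
    return version_tuple(installed) < version_tuple(required)


# Illustrative version strings only
for candidate in ("1.0.69", "1.0.74.1", "1.0.85"):
    status = "upgrade recommended" if is_older(candidate) else "ok"
    print(candidate, "->", status)
```

In the notebook itself, you could pass `azureml.core.VERSION` as the first argument to `is_older`.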

## 2.0 Connect to your Azure ML workspace

### Workspace parameters

To configure this solution accelerator to use your Azure ML workspace, supply the following information:
* Your subscription id
* The name of your resource group
* A name for your workspace
* (*optional*) The region that will host your workspace

The following cell lets you specify your workspace parameters. It uses the Python function `os.getenv` to read values from environment variables, which is useful for automation. If an environment variable is not set, the parameter falls back to the specified default value.

If you do not have a workspace set up, refer to the next section to create a new workspace. 

Replace the default values in the cell below with your workspace parameters:

In [None]:
import os

subscription_id = os.getenv("SUBSCRIPTION_ID", default="<my-subscription-id>")
resource_group = os.getenv("RESOURCE_GROUP", default="<my-resource-group>")
workspace_name = os.getenv("WORKSPACE_NAME", default="<my-workspace-name>")
workspace_region = os.getenv("WORKSPACE_REGION", default="westus2")
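
Before moving on, it can help to confirm that none of the parameters still hold a placeholder default. The following is a minimal sketch, assuming placeholders are written in angle brackets as in the cell above; the helper name `find_placeholders` is hypothetical.

```python
def find_placeholders(params: dict) -> list:
    """Return the names of parameters that are empty or still look like '<...>'."""
    return [
        name
        for name, value in params.items()
        if not value or (value.startswith("<") and value.endswith(">"))
    ]


# Illustrative values only; in the notebook you would pass the variables set above.
example_params = {
    "subscription_id": "<my-subscription-id>",
    "resource_group": "my-ml-rg",
    "workspace_name": "<my-workspace-name>",
}
unset = find_placeholders(example_params)
print("Parameters still to fill in:", unset)
```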

## 3.0 Access workspace & write config

The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters.  If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method.  The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. 

In [None]:
from azureml.core import Workspace

try:
    ws = Workspace(subscription_id = subscription_id, 
                   resource_group = resource_group, 
                   workspace_name = workspace_name)
    # write the details of the workspace to a configuration file to the notebook library
    ws.write_config(path="../", file_name="config.json")
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception as err:
    print("Workspace not accessible. Change your parameters or create a new workspace below")
    print("Details:", err)
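
If later notebooks fail to load the workspace, one thing to check is the contents of the written `config.json`. The sketch below assumes the file holds the three keys `subscription_id`, `resource_group`, and `workspace_name`; the helper names `load_config` and `missing_config_keys` are hypothetical.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_config(path) -> dict:
    """Read a workspace config file and return its contents as a dict."""
    return json.loads(Path(path).read_text())


def missing_config_keys(config: dict) -> list:
    """Return the required keys that are absent or empty in the config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Illustrative config only; an empty result means every key is present.
example_config = {"subscription_id": "0000", "resource_group": "my-ml-rg"}
print("Missing keys:", missing_config_keys(example_config))
```

In the notebook, `missing_config_keys(load_config("../config.json"))` would inspect the file written by the cell above.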

When connecting to the workspace, you may be prompted to complete interactive authentication. If you receive errors, refer to [Authentication in AzureML](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azureml.ipynb). If you have multiple subscriptions, you can use `InteractiveLoginAuthentication` to specify which directory (tenant) to use. The following commented-out cell shows a sample.

In [None]:
# from azureml.core.authentication import InteractiveLoginAuthentication

# interactive_auth = InteractiveLoginAuthentication(tenant_id="my-tenant-id")

# from azureml.core import Workspace

# ws = Workspace(subscription_id="my-subscription-id",
#                resource_group="my-ml-rg",
#                workspace_name="my-ml-workspace",
#                auth=interactive_auth)

### Create a new workspace 
If you don't have an existing workspace, you can use the command below to create one. The cell will create an Azure ML workspace for you in your subscription provided you have the correct permissions.

**Note**: A *basic* workspace is created by default. If you would like to create an *enterprise* workspace, please specify ```sku = 'enterprise'```. Please visit our [pricing page](https://azure.microsoft.com/en-us/pricing/details/machine-learning/) for more details on enterprise edition.

In [None]:

# Create the workspace using the specified parameters
# ws = Workspace.create(name = workspace_name,
#                      subscription_id = subscription_id,
#                      resource_group = resource_group, 
#                      location = workspace_region,
#                      create_resource_group = True,
#                      sku = 'basic',
#                      exist_ok = True)
# ws.get_details()

# write the details of the workspace to a configuration file to the notebook library
# ws.write_config(path="../", file_name="config.json")

## 4.0 Create a compute cluster
The following cell creates an AML compute cluster to run the training, scoring, and forecasting pipelines. This is a one-time setup, so you won't need to rerun it in later notebooks.

The cell creates an AML compute cluster called 'train-many-model'. The VM size is STANDARD_D13_V2, with a minimum node count of 5 and a maximum of 20.
The D-series VMs are used for tasks that require higher compute power and temporary disk performance. This [page](https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-sizes-specs) will give you more information on VM sizes to help you decide which will best fit your use case. 
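
Because the minimum node count is 5, five nodes stay allocated even when no pipeline is running, and idle nodes are still billed. The sketch below is a rough way to reason about that. The helper `idle_node_hours` is hypothetical and deliberately leaves out prices, which vary by region and VM size.

```python
def idle_node_hours(min_nodes: int, idle_hours: int) -> int:
    """Node-hours consumed while the cluster sits at its minimum size."""
    return min_nodes * idle_hours


# Example: the cluster described above left idle over a 48-hour weekend
weekend = idle_node_hours(min_nodes=5, idle_hours=48)
print("Idle node-hours with min_nodes=5:", weekend)

# The same weekend with min_nodes=0 allocates nothing while idle
print("Idle node-hours with min_nodes=0:", idle_node_hours(min_nodes=0, idle_hours=48))
```

If the cluster will sit unused for long stretches, a lower `min_nodes` trades slower job start-up for fewer idle node-hours.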

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "train-many-model"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D13_V2',
                                                           min_nodes=5,
                                                           max_nodes=20)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

## 5.0 Create blob containers

For each pipeline, a separate blob container is used to store the prediction results and logs. This section creates two blob containers, one for training output and one for forecasting output. We'll then register these containers as datastores in the workspace in the next section.

First, identify the connection string of your blob storage account. You can find it in the Azure portal: navigate to your storage account and, under *Settings*, click the *Access keys* tab. Copy one of the connection strings and paste the value into the cell below.

In [None]:
connect_str = '<my-connection-string>' # Storage account connection string

**Note:** If you can't import BlobServiceClient, run the following pip install command to install the package. After you install the package, you'll need to restart the kernel and re-run the cell:

In [None]:
!pip install azure-storage-blob==12.1.0

In [None]:
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(connect_str)

# Create the training and forecasting containers.
# Rerunning this cell is safe: containers that already exist are left as-is.
for container_name in ('training-output', 'forecasting-output'):
    try:
        blob_service_client.create_container(container_name)
    except ResourceExistsError:
        print('Container already exists, use it:', container_name)
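
Container names must follow the Azure Blob Storage naming rules: 3 to 63 characters, lowercase letters, digits, and hyphens only, starting and ending with a letter or digit, with no consecutive hyphens. If you rename the containers, a quick check like the following can catch an invalid name before the request is sent. The helper `is_valid_container_name` is hypothetical and not part of the Azure SDK.

```python
import re

# 3-63 characters, lowercase letters/digits/hyphens, no leading or trailing
# hyphen, and no two hyphens in a row.
_CONTAINER_NAME = re.compile(r"(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_container_name(name: str) -> bool:
    """Return True when the name satisfies the blob container naming rules."""
    return _CONTAINER_NAME.fullmatch(name) is not None


for name in ("training-output", "forecasting-output", "Training_Output", "a--b"):
    print(name, "->", "valid" if is_valid_container_name(name) else "invalid")
```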

## 6.0 Register blob containers to the workspace

After creating the blob containers, register them with the workspace as datastores. You'll use these datastores in the training, scoring, and forecasting notebooks.

We'll need the storage account name and account key. You can find both in the Azure portal: navigate to your storage account and, under *Settings*, click the *Access keys* tab, where the *Storage account name* and *Key* are listed. Paste the values into the cell below.

In [None]:
account_name = "<my-account-name>" # Storage account name
account_key = "<my-account-key>" # Storage account key
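
The account name and key also appear inside a storage connection string, as `AccountName=...` and `AccountKey=...` segments separated by semicolons. As an alternative to copying them by hand, the following sketch extracts them from such a string. The helper `parse_connection_string` is hypothetical, and the example string below is made up.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dict.

    Only the first '=' in each segment separates key from value, because
    base64-encoded account keys can end in '=' padding.
    """
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue
        key, _, value = segment.partition("=")
        parts[key.strip()] = value.strip()
    return parts


# Made-up connection string for illustration only
example = (
    "DefaultEndpointsProtocol=https;AccountName=examplestorage;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(example)
print("Account name:", parsed["AccountName"])
```

In the notebook, you could apply the same helper to `connect_str` and assign `parsed["AccountName"]` and `parsed["AccountKey"]` to `account_name` and `account_key`.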

Now we can register the blob containers with the workspace. Later notebooks refer to each datastore by its **datastore_name**.

In [None]:
from azureml.core import Datastore

training_output_datastore = Datastore.register_azure_blob_container(
           workspace=ws,
           datastore_name='training_output_datastore', 
           account_name=account_name,
           container_name='training-output', 
           account_key=account_key)

forecasting_output_datastore = Datastore.register_azure_blob_container(
           workspace=ws,
           datastore_name='forecasting_output_datastore',
           account_name=account_name, 
           container_name='forecasting-output', 
           account_key=account_key)  
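
Once registered, a datastore can be looked up by name from any notebook that has loaded the workspace. The sketch below wraps `Datastore.get` in a hypothetical helper, `get_output_datastore`; the `DATASTORE_NAMES` mapping is an assumption made for illustration and simply mirrors the names registered above. The Azure ML import is done inside the function so that the definitions load even where the SDK is not installed.

```python
# Names registered in the cell above, keyed by the pipeline that uses them
DATASTORE_NAMES = {
    "training": "training_output_datastore",
    "forecasting": "forecasting_output_datastore",
}


def get_output_datastore(workspace, pipeline: str):
    """Fetch the registered output datastore for a pipeline by its name."""
    datastore_name = DATASTORE_NAMES[pipeline]  # KeyError for unknown pipelines
    from azureml.core import Datastore  # imported lazily; requires the Azure ML SDK
    return Datastore.get(workspace, datastore_name)


# Usage in a later notebook, assuming `ws` was loaded with Workspace.from_config():
# training_output_datastore = get_output_datastore(ws, "training")
```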

You can use [Microsoft Storage Explorer](https://docs.microsoft.com/en-us/azure/vs-azure-tools-storage-manage-with-storage-explorer?tabs=windows) to easily view your datastores and files. 

---

## Next steps

Now that you've created your configuration file, you're all set to move on to the [01_Data_Preparation.ipynb](https://github.com/microsoft/solution-accelerator-many-models/blob/master/01_Data_Preparation/01_Data_Preparation.ipynb) notebook to prepare your datasets.