## Create Azure Machine Learning datasets for Anomaly Detection

Azure Machine Learning datasets are useful for both local and remote experiments. This notebook covers the following steps:

1. Configure workspace using credentials for Azure subscription
2. Download the dataset from ADLS Gen2
3. Upload the featured dataset into the default datastore in Azure
4. Register the featured dataset into Azure


## Configure workspace using credentials for Azure subscription

As part of the setup, you have already created a Workspace. To run AutoML, you also need to create an Experiment. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to that problem.

In [12]:
# Install the required package
!pip install azure-storage-blob==2.1.0

# Import the libraries
from azureml.core import Workspace

# Importing user defined config
import config

# Import the subscription details as below to access the resources
subscription_id = config.subscription_id
resource_group = config.resource_group
workspace_name = config.workspace_name

try:
    workspace = Workspace(
        subscription_id=subscription_id,
        resource_group=resource_group,
        workspace_name=workspace_name,
    )
    # Write the workspace details to a local configuration file so that
    # Workspace.from_config() can load them later
    workspace.write_config()
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception:
    # Catch Exception rather than using a bare except, so Ctrl+C still interrupts the cell
    print("Workspace not accessible. Change your parameters or create a new workspace below")

Workspace configuration succeeded. Skip the workspace creation steps below
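
The cell above reads its settings from a user-defined `config` module that is not shown in this notebook. As a minimal sketch, one way to back such a module is to read the values from environment variables so that keys are not committed alongside the notebook. The setting names mirror the attributes the notebook reads from `config`; the environment variable names and the `load_settings` helper are hypothetical.

```python
import os

# Hypothetical mapping: keys mirror the attributes the notebook reads from
# `config`; the environment variable names are assumptions for illustration.
REQUIRED_SETTINGS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_ML_WORKSPACE",
    "STORAGE_ACCOUNT_NAME": "AZURE_STORAGE_ACCOUNT_NAME",
    "STORAGE_ACCOUNT_ACCESS_KEY": "AZURE_STORAGE_ACCOUNT_ACCESS_KEY",
}


def load_settings(environ=os.environ):
    """Return a dict of settings, raising if any environment variable is missing or empty."""
    missing = [var for var in REQUIRED_SETTINGS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(sorted(missing)))
    return {name: environ[var] for name, var in REQUIRED_SETTINGS.items()}
```

A `config.py` built this way could end with `globals().update(load_settings())`, which would expose each setting as a module attribute such as `config.subscription_id`.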


## Download the dataset from ADLS Gen2

In [13]:
# Set up the credentials for ADLS Gen2
import os
from azure.storage.blob import BlockBlobService

# Set up the blob storage configuration
STORAGE_ACCOUNT_NAME = config.STORAGE_ACCOUNT_NAME
STORAGE_ACCOUNT_ACCESS_KEY = config.STORAGE_ACCOUNT_ACCESS_KEY
STORAGE_CONTAINER_NAME = "azureml-mfg"

blob_service = BlockBlobService(STORAGE_ACCOUNT_NAME, STORAGE_ACCOUNT_ACCESS_KEY)

# Create the local data folder if it doesn't exist
os.makedirs('anomalydata', exist_ok=True)

output_file_path = os.path.join(os.getcwd(), "anomalydata", "mfg_anomaly_pdm.csv")
output_blob_file = "mfg_anomaly_pdm.csv"

# Download the CSV from the ADLS Gen2 storage container to the local path
blob_service.get_blob_to_path(STORAGE_CONTAINER_NAME, output_blob_file, output_file_path)



<azure.storage.blob.models.Blob at 0x7f9e1ddada20>
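
Before uploading the file to the datastore, it can be worth confirming that the download produced a usable CSV. The following is a minimal sketch using only the standard library. The `summarize_csv` helper is hypothetical, and it assumes the file is UTF-8 encoded and starts with a header row, neither of which this notebook confirms.

```python
import csv
import os


def summarize_csv(path):
    """Return (column_names, data_row_count) for a CSV file with a header row."""
    # Fail early if the download did not create a file or left it empty
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise FileNotFoundError(f"Expected a non-empty file at {path}")
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # assumption: the first row holds the column names
        row_count = sum(1 for _ in reader)  # count the remaining data rows
    return header, row_count
```

In this notebook it could be called as `header, row_count = summarize_csv(output_file_path)`.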

## Upload the featured dataset into the default datastore in Azure

In [15]:
# Upload the dataset to the default datastore
from azureml.core.dataset import Dataset

# Get the workspace's default datastore and upload the local folder to it
ds = workspace.get_default_datastore()
ds.upload(src_dir='./anomalydata', target_path='mfganomalydata', overwrite=True, show_progress=True)

# Create a tabular dataset that points at the uploaded CSV
final_df = Dataset.Tabular.from_delimited_files(path=ds.path('mfganomalydata/mfg_anomaly_pdm.csv'))

Uploading an estimated of 1 files
Uploading ./anomalydata/mfg_anomaly_pdm.csv
Uploaded ./anomalydata/mfg_anomaly_pdm.csv, 1 files out of an estimated total of 1
Uploaded 1 files


## Register the featured dataset into Azure

In [None]:
# Register the dataset in Azure ML
train_data_registered = final_df.register(workspace=workspace,
                                          name='pdmanomalymfg',
                                          description='Synapse Mfg data',
                                          tags={'type': 'Mfg', 'date': '2020'},
                                          create_new_version=False)
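
Once registered, the dataset can be fetched by name from any notebook or script that has access to the workspace. The following is a minimal sketch, assuming the `azureml-core` SDK used above is installed and that the registration succeeded under the name `pdmanomalymfg`. The `load_registered_dataset` helper is hypothetical; `Dataset.get_by_name` and `to_pandas_dataframe` are the SDK calls it wraps.

```python
def load_registered_dataset(workspace, name="pdmanomalymfg"):
    """Fetch a registered tabular dataset by name and return it as a pandas DataFrame."""
    # Imported inside the function so the helper can be defined even where
    # the Azure ML SDK is not installed
    from azureml.core import Dataset

    dataset = Dataset.get_by_name(workspace, name=name)
    return dataset.to_pandas_dataframe()
```

For example, `df = load_registered_dataset(workspace)` would return the registered data as a DataFrame for local inspection.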