## Create Azure Machine Learning datasets for Customer Churn prediction

Azure Machine Learning datasets can be extremely useful for your local or remote experiments. In this notebook, we will do the following things.

1. Configure workspace using credentials for Azure subscription
2. Download the dataset from ADLS Gen2
3. Upload the featured dataset into the default datastore in Azure
4. Register the featured dataset into Azure

## Disclaimer

By accessing this code, you acknowledge the code is made available for presentation and demonstration purposes only and that the code (1) is not subject to SOC 1 and SOC 2 compliance audits, and (2) is not designed or intended to be a substitute for the professional advice, diagnosis, treatment, or judgment of a certified financial services professional. Do not use this code to replace, substitute, or provide professional financial advice, or judgement. You are solely responsible for ensuring the regulatory, legal, and/or contractual compliance of any use of the code, including obtaining any authorizations or consents, and any solution you choose to build that incorporates this code in whole or in part.

© 2021 Microsoft Corporation. All rights reserved

## Configure workspace using credentials for Azure subscription

As part of the setup you have already created a Workspace. To run AutoML, you also need to create an Experiment. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem.

In [1]:
# !pip install azure-mgmt-resource==10.2.0

In [2]:
# pip install azure-mgmt-resource

In [3]:
# pip install azure-mgmt-resource==19.0.0

In [4]:
# Install the required package

from azureml.core import Workspace
import GlobalVariables

print(GlobalVariables.subscription_id)
print(GlobalVariables.resource_group)
print(GlobalVariables.workspace_name)

try:
    workspace = Workspace(subscription_id =GlobalVariables.subscription_id, 
                          resource_group =GlobalVariables.resource_group,
                          workspace_name =GlobalVariables.workspace_name)
    # write the details of the workspace to a configuration file to the notebook library
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except:
    print("Workspace not accessible. Change your parameters or create a new workspace below")

3f01ab49-a56f-4ee7-97fa-d23155156b42
media-test
amlws-zquzavzvs5x6m-pm0208
Workspace configuration succeeded. Skip the workspace creation steps below


## Download the  dataset from ADLS Gen2

In [5]:
## setting up the credentials for ADLS Gen2
import os
from azure.storage.blob import ContainerClient


STORAGE_CONTAINER_NAME = "azuremldatasets"

container_client = ContainerClient.from_connection_string(GlobalVariables.STORAGE_ACCOUNT_CONNECTION_STRING, STORAGE_CONTAINER_NAME) 
blobs_list = container_client.list_blobs()


output_file_path=os.path.join(os.getcwd(),"data", "retail_banking_customer_churn.csv")
output_blob_file= "retail_banking_customer_churn_data.csv"

# Create a project_folder if it doesn't exist
if not os.path.isdir('data'):
    os.mkdir('data')


# uploading the csv to the ADLSGen2 storage container
container_client.upload_blob(name=output_blob_file, data=output_file_path, overwrite=True)



<azure.storage.blob._blob_client.BlobClient at 0x7fba38149898>

## Upload the featured dataset into the default datastore in Azure

In [7]:
from sklearn import datasets
from azureml.core.dataset import Dataset
from scipy import sparse
import os 
 
# Create a project_folder if it doesn't exist
if not os.path.isdir('data'):
    os.mkdir('data')
 
 
ds = workspace.get_default_datastore()
ds.upload(src_dir='./data', target_path='retail_banking', overwrite=True, show_progress=True)
 
final_df = Dataset.Tabular.from_delimited_files(path=ds.path('retail_banking/retail_banking_customer_churn_data.csv'))

Uploading an estimated of 2 files
Uploading ./data/prepared_customer_churn_data.csv
Uploaded ./data/prepared_customer_churn_data.csv, 1 files out of an estimated total of 2
Uploading ./data/retail_banking_customer_churn_data.csv
Uploaded ./data/retail_banking_customer_churn_data.csv, 2 files out of an estimated total of 2
Uploaded 2 files


## Register the featured dataset into Azure

In [8]:
# train_data_registered = Dataset.get_by_name(amlworkspace,"train_data",version='latest')
#train_data_registered.unregister_all_versions()

train_data_registered = final_df.register(workspace=workspace,
                                 name='customer_churn',
                                 description='Synapse Retail Banking Customer Churn Dataset - Original',
                                 tags= {'type': 'Banking', 'date':'2020'},
                                 create_new_version=True)