## Create Azure Machine Learning datasets for Customer Churn prediction

Registering data as an Azure Machine Learning dataset lets local and remote experiments load the same versioned data by name. In this notebook, we will:

1. Configure the workspace using your Azure subscription credentials
2. Download the dataset from ADLS Gen2
3. Upload the featurized dataset to the workspace's default datastore
4. Register the featurized dataset in the workspace

## For demonstration purposes only. Please customize to meet your enterprise security and compliance requirements.
Disclaimer: By accessing this code, you acknowledge the code is made available for presentation and demonstration purposes only and that the code: (1) is not subject to SOC 1 and SOC 2 compliance audits; (2) is not designed or intended to be a substitute for the professional advice, diagnosis, treatment, or judgment of a certified financial services professional; (3) is not designed, intended or made available as a medical device; and (4) is not designed or intended to be a substitute for professional medical advice, diagnosis, treatment or judgement. Do not use this code to replace, substitute, or provide professional financial advice or judgment, or to replace, substitute or provide medical advice, diagnosis, treatment or judgement. You are solely responsible for ensuring the regulatory, legal, and/or contractual compliance of any use of the code, including obtaining any authorizations or consents, and any solution you choose to build that incorporates this code in whole or in part. 


## Configure workspace using credentials for Azure subscription

As part of the setup, you have already created a workspace. The next cell connects to it and saves its details to a local configuration file. If you later run AutoML, you will also need an Experiment: an Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem.

In [1]:
# Optional: install the storage SDK if it is missing. BlobClient (used below)
# requires azure-storage-blob v12 or later; the legacy 2.x package does not provide it.
# !pip install "azure-storage-blob>=12"

from azureml.core import Workspace

subscription_id='#SUBSCRIPTION_ID#'
resource_group='#RESOURCE_GROUP_NAME#'
workspace_name='#ML_WORKSPACE_NAME#'

try:
    workspace = Workspace(subscription_id=subscription_id,
                          resource_group=resource_group,
                          workspace_name=workspace_name)
    # Save the workspace details to .azureml/config.json so later sessions can call Workspace.from_config()
    workspace.write_config()
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception as e:
    print("Workspace not accessible. Change your parameters or create a new workspace below")
    print(e)


Workspace configuration succeeded. Skip the workspace creation steps below
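If one of the `#...#` template placeholders above is left unchanged, the cell only reports that the workspace is not accessible. A quick pre-flight check can flag unfilled values before connecting. This is a minimal sketch: `find_placeholders` is a hypothetical helper, and it assumes placeholders follow the `#UPPER_CASE#` pattern used in this notebook.

```python
from __future__ import annotations

import re

# Assumed placeholder convention: upper-case letters, digits, and underscores
# between two '#' characters, e.g. "#SUBSCRIPTION_ID#".
PLACEHOLDER = re.compile(r"#[A-Z0-9_]+#")


def find_placeholders(**settings: str) -> dict[str, list[str]]:
    """Return {setting_name: [placeholders]} for every value that still contains one."""
    unfilled = {}
    for name, value in settings.items():
        hits = PLACEHOLDER.findall(value)
        if hits:
            unfilled[name] = hits
    return unfilled


# Example: two values still hold their template placeholders.
missing = find_placeholders(
    subscription_id="#SUBSCRIPTION_ID#",
    resource_group="#RESOURCE_GROUP_NAME#",
    workspace_name="my-workspace",  # hypothetical, already-filled value
)
print(missing)
```

In the notebook you could pass the real variables, for example `find_placeholders(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)`, and stop if the result is non-empty. The same check works on the storage connection string.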

## Download the dataset from ADLS Gen2

In [2]:
from azure.storage.blob import BlobClient
import pandas as pd
from io import BytesIO

CONNECTIONSTRING = 'DefaultEndpointsProtocol=https;AccountName=#STORAGE_ACCOUNT_NAME#;AccountKey=#STORAGE_ACCOUNT_KEY#;EndpointSuffix=core.windows.net'
CONTAINER_NAME = 'retail-banking-customer-churn'

BLOBNAME = 'retail_banking_customer_churn_for_model.csv'
# Download the blob from the ADLS Gen2 container into memory (read the stream once)
blob = BlobClient.from_connection_string(conn_str=CONNECTIONSTRING, container_name=CONTAINER_NAME, blob_name=BLOBNAME)
blob_bytes = blob.download_blob().readall()

# Parse the CSV into a DataFrame and save a local copy for the upload step.
# index=False keeps pandas from writing an extra unnamed index column into the file.
data = pd.read_csv(BytesIO(blob_bytes))
data.to_csv(BLOBNAME, header=True, index=False)

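The storage connection string used above is a semicolon-separated list of `key=value` settings. To confirm which account and endpoint it targets without printing the key, a small parser is enough. This is a minimal sketch: `parse_connection_string` is a hypothetical helper (the SDK's `BlobClient.from_connection_string` does its own parsing), and `mystorage` and `abc123==` are made-up values.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key1=Value1;Key2=Value2;...' into a dict.

    Each segment is split on the first '=' only, because base64 account keys
    can end in '=' padding characters.
    """
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment (no '='): {segment!r}")
        parts[key] = value
    return parts


conn = ("DefaultEndpointsProtocol=https;AccountName=mystorage;"
        "AccountKey=abc123==;EndpointSuffix=core.windows.net")
settings = parse_connection_string(conn)

# Rebuild the blob service endpoint from the parsed parts (the key is never printed).
blob_endpoint = f"{settings['DefaultEndpointsProtocol']}://{settings['AccountName']}.blob.{settings['EndpointSuffix']}"
print(blob_endpoint)  # https://mystorage.blob.core.windows.net
```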



In [3]:
data.head()


   age  marital  housing  ...  job_student  job_technician  job_unemployed
0   56        1        1  ...            0               0               0
1   57        1        1  ...            0               0               0
2   37        1        2  ...            0               0               0
3   40        1        1  ...            0               0               0
4   56        1        1  ...            0               0               0

[5 rows x 32 columns]
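Judging by their names, the `job_*` columns in the preview look like a one-hot encoding of a job category; that is an assumption, not something the notebook states. If it holds, a quick sanity check before registering the data is that each row sets at most one of them. This is a minimal sketch with a hypothetical helper, `one_hot_violations`, run on made-up rows shaped like the preview.

```python
def one_hot_violations(rows, prefix="job_"):
    """Return indices of rows whose prefix-columns are not a valid one-hot group.

    A row is flagged if any such column holds something other than 0 or 1,
    or if more than one column in the group is set. Rows with no column set
    are allowed, in case one category was dropped during encoding.
    """
    bad = []
    for i, row in enumerate(rows):
        group = [v for k, v in row.items() if k.startswith(prefix)]
        if any(v not in (0, 1) for v in group) or sum(group) > 1:
            bad.append(i)
    return bad


# Hypothetical rows (only a few of the 32 columns shown).
sample = [
    {"age": 56, "job_student": 0, "job_technician": 0, "job_unemployed": 0},
    {"age": 37, "job_student": 0, "job_technician": 1, "job_unemployed": 0},
    {"age": 40, "job_student": 1, "job_technician": 1, "job_unemployed": 0},
]
print(one_hot_violations(sample))  # [2]
```

With the notebook's DataFrame, `data.to_dict("records")` yields rows in this shape, although a vectorized check such as `data.filter(like="job_").sum(axis=1)` is simpler for the full table.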

## Upload the featurized dataset to the default datastore in Azure

In [5]:
from shutil import copyfile
from azureml.core.dataset import Dataset
import os

# Create a local 'data' folder to hold the files to upload (no-op if it already exists)
os.makedirs('data', exist_ok=True)

copyfile("retail_banking_customer_churn_for_model.csv", "data/retail_banking_customer_churn_for_model.csv")

# Upload everything under ./data to the 'retail_banking' folder of the workspace's default datastore
ds = workspace.get_default_datastore()
ds.upload(src_dir='./data', target_path='retail_banking', overwrite=True, show_progress=True)

# Create a tabular dataset that points at the uploaded CSV
final_df = Dataset.Tabular.from_delimited_files(path=ds.path('retail_banking/retail_banking_customer_churn_for_model.csv'))


Uploading an estimated of 1 files
Uploading ./data/retail_banking_customer_churn_for_model.csv
Uploaded ./data/retail_banking_customer_churn_for_model.csv, 1 files out of an estimated total of 1
Uploaded 1 files
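The `from_delimited_files` call relies on the file landing at `retail_banking/retail_banking_customer_churn_for_model.csv` in the datastore, that is, `target_path` joined with each file's path relative to `src_dir`. The sketch below lists the paths an upload should produce under that assumption, which can help when pointing a dataset at nested folders. `expected_datastore_paths` is a hypothetical helper and does not call the Azure SDK.

```python
import os
import posixpath
import tempfile


def expected_datastore_paths(src_dir: str, target_path: str) -> list:
    """List datastore paths for every file under src_dir, assuming the upload
    preserves folder structure relative to src_dir beneath target_path."""
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        rel_root = os.path.relpath(root, src_dir)
        for name in files:
            rel = name if rel_root == "." else os.path.join(rel_root, name)
            # Datastore paths use '/' regardless of the local OS separator.
            paths.append(posixpath.join(target_path, *rel.split(os.sep)))
    return sorted(paths)


# Recreate the notebook's local layout in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "retail_banking_customer_churn_for_model.csv"), "w").close()
    uploaded = expected_datastore_paths(tmp, "retail_banking")

print(uploaded)  # ['retail_banking/retail_banking_customer_churn_for_model.csv']
```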

## Register the featurized dataset in Azure

In [7]:
# Optional: to remove previously registered versions first, uncomment the two lines below
# churn_registered = Dataset.get_by_name(workspace, name='customer_churn', version='latest')
# churn_registered.unregister_all_versions()

train_data_registered = final_df.register(workspace=workspace,
                                          name='customer_churn',
                                          description='Synapse Retail Banking Customer Churn Dataset - Original',
                                          tags={'type': 'Banking', 'date': '2020'},
                                          create_new_version=True)

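With `create_new_version=True`, registering under an existing name adds a new version instead of failing, and `version='latest'` retrieves the most recent one. The toy registry below illustrates only that versioning idea. It is an in-memory stand-in, not the azureml API; its method names mirror the SDK calls above for readability, and the real SDK's exact behavior when `create_new_version=False` may differ, so consult the `Dataset.register` documentation.

```python
class ToyDatasetRegistry:
    """In-memory illustration of name -> versioned registrations (not azureml)."""

    def __init__(self):
        self._versions = {}  # name -> list of payloads; list index + 1 = version

    def register(self, name, payload, create_new_version=False):
        versions = self._versions.setdefault(name, [])
        if versions and not create_new_version:
            raise ValueError(f"'{name}' is already registered; pass create_new_version=True")
        versions.append(payload)
        return len(versions)  # version number assigned to this registration

    def get_by_name(self, name, version="latest"):
        versions = self._versions[name]
        return versions[-1] if version == "latest" else versions[version - 1]

    def unregister_all_versions(self, name):
        self._versions.pop(name, None)


registry = ToyDatasetRegistry()
v1 = registry.register("customer_churn", "churn.csv (run 1)", create_new_version=True)
v2 = registry.register("customer_churn", "churn.csv (run 2)", create_new_version=True)
print(v1, v2, registry.get_by_name("customer_churn"))  # 1 2 churn.csv (run 2)
```

Re-running the registration cell therefore accumulates versions; older ones stay retrievable by number until they are unregistered.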

