# Customer Classification using E-Commerce Dataset


## Azure ML Workspace

### Set up your development environment
All the setup for development work can be accomplished in a Python notebook. Setup includes:

- Importing Python packages
- Connecting to a workspace to enable communication between your local computer and remote resources
- Creating an experiment to track all your runs
- Creating a remote compute target to use for training

#### Import packages
Import the Python packages you need in this session, and display the Azure Machine Learning SDK version:

In [1]:
import numpy as np
import matplotlib.pyplot as plt

import azureml.core
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.0.17


#### Connect to a workspace
Create a workspace object from the existing workspace.

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\t')

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


Found the config file in: /home/nbuser/library/config.json
Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FTKLDS448 to authenticate.
Interactive authentication successfully completed.
ecommerce-ws	eastus	ecommerce-aml	eastus
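
The warning in the output above recommends ServicePrincipalAuthentication for unattended runs. A minimal sketch of that approach follows. The environment variable names (AML_TENANT_ID, AML_SP_ID, AML_SP_SECRET) and both helper functions are hypothetical, and it assumes an SDK version in which Workspace.from_config accepts an auth argument:

```python
import os


def read_sp_credentials(env=os.environ):
    """Collect service principal settings from (hypothetical) environment variables."""
    names = ("AML_TENANT_ID", "AML_SP_ID", "AML_SP_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


def connect_unattended():
    """Connect to the workspace without an interactive login prompt."""
    # Imported here so the credential check above works without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    creds = read_sp_credentials()
    # Positional order: tenant id, service principal id, service principal secret.
    auth = ServicePrincipalAuthentication(
        creds["AML_TENANT_ID"], creds["AML_SP_ID"], creds["AML_SP_SECRET"])
    return Workspace.from_config(auth=auth)
```

Calling connect_unattended() in place of Workspace.from_config() should skip the device login step, provided the service principal has access to the workspace.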


#### Create an experiment
Create an experiment to track the runs in your workspace. A workspace can have multiple experiments:

In [3]:
experiment_name = 'final_experiment'

from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)

#### Create or attach an existing compute resource
Azure Machine Learning Compute is a managed service that lets data scientists train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create an Azure Machine Learning Compute cluster as your training environment. The code below creates the cluster if it doesn't already exist in your workspace.

In [4]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpucluster")
# environment variables are read as strings, so convert the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))

# This example uses a CPU VM. To use a GPU VM, set the SKU to STANDARD_NC6
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_D2_V2")


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and isinstance(compute_target, AmlCompute):
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

creating a new compute target...
Creating
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-03-26T09:05:57.422000+00:00', 'creationTime': '2019-03-26T09:05:54.230893+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-03-26T09:06:10.349008+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}


### Explore data

#### Upload data to the cloud
Now make the data accessible remotely by uploading that data from your local machine into Azure. Then it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload or download data. You can also interact with it from your remote compute targets. It's backed by an Azure Blob storage account.

In [5]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name, ds.name)

AzureBlob ecommercews3975002265 azureml-blobstore-99335e38-05f7-46cc-8735-3c57faeb0b5f workspaceblobstore


In [6]:
data_folder = os.path.join(os.getcwd(), 'Data')
os.makedirs(data_folder, exist_ok = True)
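
Because os.makedirs creates an empty Data folder when none exists, the upload step would then have nothing to send. A small local check, sketched here with a hypothetical helper named find_csv_files, can catch that before any data goes to the datastore:

```python
import glob
import os


def find_csv_files(folder):
    """Return the sorted CSV paths in folder, or raise if there are none."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise FileNotFoundError("no CSV files found in " + folder)
    return paths
```

Calling find_csv_files(data_folder) just before the upload fails early with a clear message if data.csv has not been copied into the folder yet.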

In [7]:
ds.upload(src_dir=data_folder, target_path='Cloud_Data', overwrite=True, show_progress=True)

Uploading /home/nbuser/library/Data/data.csv
Uploaded /home/nbuser/library/Data/data.csv, 1 files out of an estimated total of 1


$AZUREML_DATAREFERENCE_907391106fc645e1aa2532ebd6d709fa

### Train on a remote cluster
For this task, submit the job to the remote training cluster you set up earlier. To submit a job you:

- Create a directory
- Create a training script
- Create an estimator object
- Submit the job

#### Create a directory
Create a directory to deliver the necessary code from your computer to the remote resource.

In [8]:
import os
script_folder = os.path.join(os.getcwd(), "Train")
os.makedirs(script_folder, exist_ok=True)

#### Create a training script
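See train.ipynb for the training code. The estimator below runs an entry script named train.py, so that script must be in the script folder (Train) before you submit the job.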
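
The training script itself is not reproduced in this notebook. As a rough sketch only, a train.py compatible with the estimator below could look like the following. The --data-folder argument, the data.csv file name, the Accuracy metric, and the outputs/test_model.pkl path come from this notebook. The "label" column, the logistic regression model, the 80/20 split, and the use of pickle are assumptions made for illustration and are not the author's actual method:

```python
import argparse
import os
import pickle
import sys


def parse_args(argv=None):
    """Parse the --data-folder argument that the estimator passes to the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-folder", dest="data_folder", required=True,
                        help="mount point of the uploaded data")
    return parser.parse_args(argv)


def main(argv=None):
    # Heavy imports stay inside main() so the argument parsing can be checked locally.
    import pandas as pd
    from azureml.core import Run
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    args = parse_args(argv)
    df = pd.read_csv(os.path.join(args.data_folder, "data.csv"))

    # Hypothetical schema: numeric feature columns plus a "label" column.
    X = df.drop(columns=["label"])
    y = df["label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Hypothetical model choice; swap in the real classifier here.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)

    # Log the metric to the run so it can be read back with run.get_metrics().
    run = Run.get_context()
    run.log("Accuracy", accuracy)

    # Save the model under outputs/ so it can be registered from the run later.
    os.makedirs("outputs", exist_ok=True)
    with open(os.path.join("outputs", "test_model.pkl"), "wb") as f:
        pickle.dump(clf, f)


# Run only when the estimator (or a user) actually passed --data-folder.
if __name__ == "__main__" and any(a.startswith("--data-folder") for a in sys.argv[1:]):
    main()
```

Files written to the outputs folder are uploaded to the run record, which is why the later cells can list outputs/test_model.pkl and register it as a model.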

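#### Create an estimator
An estimator object describes how to run the training script. It specifies the directory that contains the script, the compute target, the script parameters, the entry script, and the conda packages the script needs:

In [14]: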
from azureml.train.estimator import Estimator

script_params = {
    '--data-folder': ds.path('Cloud_Data').as_mount()
}

est = Estimator(source_directory=script_folder,
                compute_target=compute_target,
                script_params=script_params,
                entry_script='train.py',
                conda_packages=["scikit-learn", "matplotlib", "pandas", "numpy", "seaborn", "nltk", "jinja2"])

#### Submit the job to the cluster
Submit the estimator to the experiment to start the run:

In [15]:
run = exp.submit(config=est)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| final_experiment | final_experiment_1553593109_bdfad989 | azureml.scriptrun | Queued | Link to Azure Portal | Link to Documentation |


In [16]:
from azureml.widgets import RunDetails
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

In [17]:
run.wait_for_completion(show_output=False)

{'runId': 'final_experiment_1553593109_bdfad989',
 'target': 'cpucluster',
 'status': 'Completed',
 'startTimeUtc': '2019-03-26T09:44:13.690549Z',
 'endTimeUtc': '2019-03-26T09:47:32.289081Z',
 'properties': {'azureml.runsource': 'experiment',
  'ContentSnapshotId': '9b7cf02d-b0db-4526-9c92-93d615286055'},
 'runDefinition': {'script': 'train.py',
  'arguments': ['--data-folder',
   '$AZUREML_DATAREFERENCE_a16b05518ab649c6bf6e652278416d21'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'cpucluster',
  'dataReferences': {'a16b05518ab649c6bf6e652278416d21': {'dataStoreName': 'workspaceblobstore',
    'mode': 'Mount',
    'pathOnDataStore': 'Cloud_Data',
    'pathOnCompute': None,
    'overwrite': False}},
  'jobName': None,
  'autoPrepareEnvironment': True,
  'maxRunDurationSeconds': None,
  'nodeCount': 1,
  'environment': {'name': 'Experiment final_experiment Environment',
   'version': 1,
   'python': {'interpreterPath': 'python',
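
The dictionary returned by wait_for_completion includes the startTimeUtc and endTimeUtc fields shown above, so the run duration can be worked out locally. This is a small sketch that assumes both timestamps always carry fractional seconds and a trailing Z, as they do in this output. The helper name run_duration_seconds is hypothetical:

```python
from datetime import datetime


def run_duration_seconds(details):
    """Return the elapsed time between a run's start and end timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(details["startTimeUtc"], fmt)
    end = datetime.strptime(details["endTimeUtc"], fmt)
    return (end - start).total_seconds()


# Timestamps copied from the run details above.
details = {
    "startTimeUtc": "2019-03-26T09:44:13.690549Z",
    "endTimeUtc": "2019-03-26T09:47:32.289081Z",
}
print(round(run_duration_seconds(details), 1))  # 198.6
```

For this run the training job took a little over three minutes from start to finish.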
 

In [18]:
print(run.get_metrics())
print(run.get_file_names())

{'Accuracy': 0.9143538008178975}
['azureml-logs/55_batchai_execution.txt', 'azureml-logs/60_control_log.txt', 'azureml-logs/80_driver_log.txt', 'azureml-logs/azureml.log', 'outputs/test_model.pkl']


In [19]:
# register model 
model = run.register_model(model_name='test_model', model_path='outputs/test_model.pkl')
print(model.name, model.id, model.version, sep = '\t')

test_model	test_model:1	1
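
Registration can also be made conditional on the logged metric, so that a weak model never reaches the registry. A minimal sketch follows, with a hypothetical helper named should_register and an arbitrary threshold of 0.9 chosen only for illustration:

```python
def should_register(metrics, metric_name="Accuracy", threshold=0.9):
    """Return True when the named metric was logged and meets the threshold."""
    value = metrics.get(metric_name)
    return value is not None and value >= threshold


# Possible use with the run from this notebook:
# if should_register(run.get_metrics()):
#     model = run.register_model(model_name='test_model',
#                                model_path='outputs/test_model.pkl')
```

With the accuracy of about 0.914 reported above, this check would allow the registration to go ahead.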


### Clean up resources
Delete the compute cluster when you no longer need it:

In [53]:
compute_target.delete()