## Retail Sales Prediction Using Azure Machine Learning With Data Stored In Data Lakehouse

In this notebook, we train and score a predictive model for a retail company. We ingest the data from the data lakehouse and use it to build the model.

![](https://stretaildemodev.blob.core.windows.net/notebookimages/business_meeting.jpg?sp=r&st=2022-01-04T00:27:29Z&se=2023-12-31T08:27:29Z&spr=https&sv=2020-08-04&sr=b&sig=2mJCyZkCQKyNh02%2FSf%2BXX5qYp44%2BSoDl%2FQfdp66Hg5k%3D)

### Retail Data Dictionary
- store - store number 
- brand - brand indicator
- week - week number
- logmove - log of units sold
- price
- feat - feature advertisement
- age60 - percentage of the population that is aged 60 or older
- education - percentage of the population that has a college degree
- ethnicity - percentage of the population that is black or Hispanic
- income - median income
- hhlarge - percentage of households with 5 or more persons
- workwom - percentage of women with full-time jobs
- hval150 - percentage of households worth more than $150,000
- sstrdist - distance to the nearest warehouse store
- sstrvol - ratio of sales of this store to the nearest warehouse store
- cpdist5 - average distance in miles to the nearest five supermarkets
- cpwvol5 - ratio of sales of this store to the average of the nearest five stores
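
Because `logmove` is the log of units sold, anything predicted for this column is on a log scale. The following is a minimal sketch of converting between the two scales. It assumes the natural logarithm (the dictionary does not state the base), and the sample values are made up for illustration rather than taken from `RetailSalesData`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the values are illustrative, not taken from RetailSalesData
sample = pd.DataFrame({"store": [2, 2, 5], "logmove": [9.0, 8.5, 10.1]})

# Assumption: logmove uses the natural log, so exp() recovers units sold
sample["units_sold"] = np.exp(sample["logmove"])

# Round-trip check: taking the log again returns the original column
roundtrip_ok = bool(np.allclose(np.log(sample["units_sold"]), sample["logmove"]))
print(sample)
```

The same `np.exp` step would apply to model predictions if they need to be reported as unit counts.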

### Importing Libraries

In [16]:
import azureml.core
import pandas as pd

from azureml.core import Experiment, Workspace, Dataset, Datastore
from azureml.train.automl import AutoMLConfig
from notebookutils import mssparkutils
from azureml.data.dataset_factory import TabularDatasetFactory


### Configuring Workspace and Experiment

In [None]:
linkedService_name = "AzureMLService"
experiment_name = "syndreamdemoretaildev-RetailSalesData-20211231061227"

# Alternative: resolve the workspace through the Synapse linked service
# ws = mssparkutils.azureML.getWorkspace(linkedService_name)
ws = Workspace.get(name="#ML_WORKSPACE#",
                   subscription_id="#SUBSCRIPTION_ID#",
                   resource_group="#RESOURCE_GROUP_NAME#")
experiment = Experiment(ws, experiment_name)
datastore = Datastore.get_default(ws)



### Reading data from table

In [30]:
df = spark.sql("SELECT * FROM `CDPRetailCloth`.`RetailSalesData`")
display(df)


### Converting to Pandas Dataframe

In [None]:
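# Collect the Spark DataFrame to the driver as a pandas DataFrame
pandasdf = df.toPandas()
# Make sure the label column is numeric before registering the dataset
pandasdf["logmove"] = pd.to_numeric(pandasdf["logmove"])
dataset = TabularDatasetFactory.register_pandas_dataframe(pandasdf, datastore, name=experiment_name + "-dataset")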

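
By default, `pd.to_numeric` raises an error on the first value it cannot parse. One way to inspect such values before registering the dataset is to coerce them to `NaN` and count them. This is a sketch only: the frame below is a hypothetical stand-in for `pandasdf`, and dropping unparseable rows is an assumption about how you might want to handle them.

```python
import pandas as pd

# Hypothetical frame standing in for pandasdf; values are illustrative only
frame = pd.DataFrame({"logmove": ["9.01", "8.72", "n/a"],
                      "price": ["1.59", "2.09", "1.99"]})

# errors="coerce" turns unparseable strings into NaN instead of raising
frame["logmove"] = pd.to_numeric(frame["logmove"], errors="coerce")

# Count rows whose label could not be parsed, then drop them
bad_rows = int(frame["logmove"].isna().sum())
clean = frame.dropna(subset=["logmove"]).reset_index(drop=True)
print(bad_rows, len(clean))
```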

### Model Configuration

In [26]:
automl_config = AutoMLConfig(spark_context = sc,
                             task = "regression",
                             training_data = dataset,
                             label_column_name = "logmove",
                             primary_metric = "normalized_root_mean_squared_error",
                             experiment_timeout_hours = 0.5,
                             max_concurrent_iterations = 2,
                             enable_onnx_compatible_models = False)

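
The primary metric chosen above is the normalized root mean squared error. As a rough illustration of what that number means, the sketch below divides the RMSE by the range of the true values, which is how the Azure Machine Learning documentation describes the metric. The function name `normalized_rmse` and the sample values are hypothetical, and AutoML's internal computation may differ in detail.

```python
import numpy as np

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    value_range = y_true.max() - y_true.min()
    if value_range == 0:
        raise ValueError("y_true has zero range; normalized RMSE is undefined")
    return rmse / value_range

# Illustrative logmove values only
actual = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 8.5, 10.5, 10.5]
score = normalized_rmse(actual, predicted)
print(score)
```

Lower values are better; normalizing by the range makes scores comparable across label columns with different scales.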

### Submitting Experiment

In [27]:
run = experiment.submit(automl_config)


Submitting spark run.

In [28]:
displayHTML("<a href='{}' target='_blank'>Your experiment in Azure Machine Learning portal: {}</a>".format(run.get_portal_url(), run.id))


### Getting the Best Model

In [29]:
import mlflow
import mlflow.sklearn

# Block until the AutoML run finishes
run.wait_for_completion()

# Get the best child run and its fitted model from the AutoML run
best_run, non_onnx_model = run.get_output()

artifact_path = experiment_name + "_artifact"

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment(experiment_name)

# Use a separate name so the AutoML run object `run` is not shadowed
with mlflow.start_run() as mlflow_run:
    # Log the model as an artifact of the MLflow run
    mlflow.sklearn.log_model(non_onnx_model, artifact_path)

    # Register the model in the AML model registry
    mlflow.register_model("runs:/" + mlflow_run.info.run_id + "/" + artifact_path, experiment_name + "-Best")


<ModelVersion: creation_timestamp=1641251612962, current_stage='None', description='', last_updated_timestamp=1641251612962, name='syndreamdemoretaildev-RetailSalesData-20211231061227-Best', run_id='bd3dbe35-928b-457a-8501-acf3d946f571', run_link='', source='azureml://experiments/syndreamdemoretaildev-RetailSalesData-20211231061227/runs/bd3dbe35-928b-457a-8501-acf3d946f571/artifacts/syndreamdemoretaildev-RetailSalesData-20211231061227_artifact', status='READY', status_message='', tags={}, user_id='', version='2'>

Registered model 'syndreamdemoretaildev-RetailSalesData-20211231061227-Best' already exists. Creating a new version of this model...
2022/01/03 23:13:33 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: syndreamdemoretaildev-RetailSalesData-20211231061227-Best, version 2
Created version '2' of model 'syndreamdemoretaildev-RetailSalesData-20211231061227-Best'.
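
The registration call addresses the logged model with a `runs:/<run_id>/<artifact_path>` URI, and MLflow addresses a registered version with `models:/<name>/<version>`. The sketch below only builds those two strings from the values shown in the output above. The helper names `run_model_uri` and `registered_model_uri` are hypothetical and are not part of MLflow.

```python
def run_model_uri(run_id: str, artifact_path: str) -> str:
    """URI of a model logged under a specific MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"

def registered_model_uri(name: str, version: int) -> str:
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"

experiment_name = "syndreamdemoretaildev-RetailSalesData-20211231061227"

# Run id and version are copied from the ModelVersion output above
source_uri = run_model_uri("bd3dbe35-928b-457a-8501-acf3d946f571",
                           experiment_name + "_artifact")
model_uri = registered_model_uri(experiment_name + "-Best", 2)
print(source_uri)
print(model_uri)
```

With the tracking URI set as in the cell above, a string like `model_uri` is what you would typically pass to `mlflow.sklearn.load_model` to retrieve the registered model for scoring; that step is not executed here.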