## Retail Sales Forecasting Model Building with AutoML

In this notebook we train and score a predictive model for a retail company. We ingest the data from the data lakehouse, then build the model with AutoML.

### Retail Data Dictionary
- store - store number 
- brand - brand indicator
- week - week number
- logmove - log of units sold
- price - price of a single unit
- feat - feature advertisement
- age60 - percentage of the population that is aged 60 or older
- educucation - percentage of the population that has a college degree
- ethinicity - percentage of the population that is black or Hispanic
- income - median income
- hhlarge - percentage of households with 5 or more persons
- workwom - percentage of women with full-time jobs
- hval150 - percentage of households worth more than $150,000
- sstrdist - distance to the nearest warehouse store
- sstrvol - ratio of sales of this store to the nearest warehouse store
- cpdist5 - average distance in miles to the nearest 5 supermarkets
- cpwvol5 - ratio of sales of this store to the average of the nearest five stores
- time - date and time
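
Because the label `logmove` is a log of units sold, forecasts come back on the log scale and need to be inverted before they can be read as unit counts. The sketch below assumes a natural logarithm, which the dictionary does not state explicitly; the helper names and the sample value are hypothetical.

```python
import math

def units_to_logmove(units):
    """Apply the log transform (ASSUMPTION: natural log) to a unit count."""
    return math.log(units)

def logmove_to_units(logmove):
    """Invert the log transform to recover units sold."""
    return math.exp(logmove)

# Made-up example value, not taken from the dataset
units = 8000
logmove = units_to_logmove(units)
recovered = logmove_to_units(logmove)
print(round(logmove, 4), round(recovered, 2))
```

If the column turns out to use a different base, only the two helper bodies need to change.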

![Sales Forecasting](https://stretaildemodev.blob.core.windows.net/notebookimages/business_meeting2.jpg?sp=r&st=2022-01-04T00:29:08Z&se=2023-12-31T08:29:08Z&spr=https&sv=2020-08-04&sr=b&sig=Y8IoqTvt5VTjZFsDyEQ9FWtoZ9wvHjLoBJwr6nmyc7I%3D)

### Importing libraries

In [1]:
import azureml.core
import GlobalVariables as gv

from azureml.core import Experiment, Workspace, Dataset, Datastore
from azureml.train.automl import AutoMLConfig

### Configuring Workspace and Experiment

In [2]:
#linkedService_name = "AzureMLService"
experiment_name = "syndreamdemoretaildev-RetailSalesData-20211231100029"
ws = Workspace.get(name=gv.WORKSPACE_NAME, subscription_id=gv.SUBSCIPTION_ID, resource_group=gv.RESOURCE_GROUP)
# ws = mssparkutils.azureML.getWorkspace(linkedService_name)

experiment = Experiment(ws, experiment_name)

### Reading data from table

In [3]:
# Read the CSV file into a pandas DataFrame
import pandas as pd
#df = spark.sql("SELECT * FROM CDPRetailCloth.RetailSalesData")
df = pd.read_csv("retail_sales_datasetv2.csv")
datastore = Datastore.get_default(ws)
dataset = Dataset.Tabular.register_pandas_dataframe(df, datastore, "dataset_from_pandas_df", show_progress=True)
#dataset = TabularDatasetFactory.register_spark_dataframe(df, datastore, name = experiment_name + "-dataset")

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/54b7795c-f1df-4f63-9750-438a32d6ccf1/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.
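
Forecasting tasks generally expect at most one row per timestamp within each series, here each `store` and `brand` pair. A quick check like the one below can be run on the DataFrame before it is registered. This is a minimal sketch: the helper name `find_duplicate_timestamps` is hypothetical and the sample rows are made up for illustration.

```python
import pandas as pd

def find_duplicate_timestamps(df, time_col="time", id_cols=("store", "brand")):
    """Return every row whose (series id, timestamp) key occurs more than once."""
    keys = list(id_cols) + [time_col]
    parsed = df.copy()
    # Parse the time column so equal timestamps compare equal regardless of string format
    parsed[time_col] = pd.to_datetime(parsed[time_col])
    return parsed[parsed.duplicated(subset=keys, keep=False)]

# Made-up sample rows; store 2 / brand 1 has two rows for 1990-06-21
sample = pd.DataFrame({
    "store": [2, 2, 2, 5],
    "brand": [1, 1, 1, 1],
    "time": ["1990-06-14", "1990-06-21", "1990-06-21", "1990-06-14"],
    "logmove": [9.0, 8.7, 8.9, 9.3],
})
duplicates = find_duplicate_timestamps(sample)
print(len(duplicates))
```

On the real data the call would be `find_duplicate_timestamps(df)`; an empty result means every series has unique timestamps.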


### Model Configuration

In [4]:
from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parameters = ForecastingParameters(
    time_column_name="time",
    forecast_horizon="auto",
    time_series_id_column_names=["store", "brand"],
)

automl_config = AutoMLConfig(
    task="forecasting",
    training_data=dataset,
    label_column_name="logmove",
    primary_metric="normalized_root_mean_squared_error",
    experiment_timeout_hours=0.5,
    max_concurrent_iterations=2,
    n_cross_validations=5,
    forecasting_parameters=forecasting_parameters,
)
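
The primary metric chosen above, `normalized_root_mean_squared_error`, is the root mean squared error divided by the range of the actual values, so it can be compared across series with different scales. The sketch below shows that calculation on made-up numbers. It is an illustration only: the function name is hypothetical, and AutoML's own computation (for example, how it normalizes per series) may differ in detail.

```python
import math

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the actual values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("range of y_true is zero; normalization is undefined")
    return math.sqrt(mse) / value_range

# Made-up logmove values for illustration
actual = [9.0, 8.0, 10.0, 9.0]
predicted = [9.5, 8.5, 9.5, 8.5]
score = normalized_rmse(actual, predicted)
print(score)
```

Lower values are better, and a score of 0 means the predictions match the actual values exactly.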

### Submitting Experiment

In [5]:
run = experiment.submit(automl_config)



| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| syndreamdemoretaildev-RetailSalesData-20211231100029 | AutoML_2fe1a8e8-f4a6-4d2f-83dd-fedf978e15b3 | automl | Preparing | Link to Azure Machine Learning studio | Link to Documentation |


INFO:interpret_community.common.explanation_utils:Using default datastore for uploads


### Get best model

In [6]:
run.wait_for_completion()

import mlflow

# Get best model from automl run
best_run, non_onnx_model = run.get_output()

artifact_path = experiment_name + "_artifact"

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment(experiment_name)

# Use a separate name for the MLflow run so the AutoML run held in `run` is not overwritten
with mlflow.start_run() as mlflow_run:
    # Log the fitted model as an MLflow artifact under artifact_path
    mlflow.sklearn.log_model(non_onnx_model, artifact_path)

    # Register the logged model in the AML model registry
    mlflow.register_model("runs:/" + mlflow_run.info.run_id + "/" + artifact_path, experiment_name + "-Best")

Successfully registered model 'syndreamdemoretaildev-RetailSalesData-20211231100029-Best'.
2022/01/03 23:24:42 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: syndreamdemoretaildev-RetailSalesData-20211231100029-Best, version 1
Created version '1' of model 'syndreamdemoretaildev-RetailSalesData-20211231100029-Best'.
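
The registration step relies on two MLflow URI schemes: `runs:/<run_id>/<artifact_path>` points at a model logged inside a run, and `models:/<name>/<version>` points at a registered model version. The sketch below only builds those strings. The helper names are hypothetical and the run ID is a placeholder, not a value from this notebook.

```python
def run_model_uri(run_id, artifact_path):
    """URI of a model artifact logged under an MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"

def registry_model_uri(model_name, version):
    """URI of a specific version of a registered model."""
    return f"models:/{model_name}/{version}"

experiment_name = "syndreamdemoretaildev-RetailSalesData-20211231100029"
artifact_path = experiment_name + "_artifact"

# "PLACEHOLDER_RUN_ID" stands in for the real run ID of the MLflow run
source_uri = run_model_uri("PLACEHOLDER_RUN_ID", artifact_path)
registered_uri = registry_model_uri(experiment_name + "-Best", 1)
print(source_uri)
print(registered_uri)
```

A URI of the second form can be passed to `mlflow.pyfunc.load_model` to load the registered version for scoring, assuming the tracking URI is set as above and the environment has the packages the AutoML model depends on.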
