**Important: Do not use in production. For demonstration purposes only. Please review the legal notices before continuing.**

## Retail Sales Forecasting Model Building with AutoML

Using AutoML, we will train and evaluate a retail forecasting model in this notebook.

### Retail Data Dictionary
- store - store number 
- brand - brand indicator
- week - week number
- logmove - log of units sold
- price - price of a single unit
- feat - whether the product was featured in an advertisement
- age60 - percentage of the population that is aged 60 or older
- educucation - percentage of the population that has a college degree
- ethinicity - percentage of the population that is Black or Hispanic
- income - median income
- hhlarge - percentage of households with 5 or more persons
- workwom - percentage of women with full-time jobs
- hval150 - percentage of households worth more than $150,000
- sstrdist - distance to the nearest warehouse store
- sstrvol - ratio of sales of this store to the nearest warehouse store
- cpdist5 - average distance in miles to the nearest 5 supermarkets
- cpwvol5 - ratio of sales of this store to the average of the nearest five stores
- time - date and time of the observation
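
Because `logmove` stores the logarithm of units sold, a forecast has to be exponentiated before it can be read as a unit count. The sketch below assumes the natural logarithm was used (the data dictionary does not state the base) and works on made-up values, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the values are illustrative only.
sample = pd.DataFrame({"store": [2, 2], "brand": [1, 2], "logmove": [9.0, 8.5]})

# Assumption: logmove = ln(units sold), so exp() inverts it.
sample["units"] = np.exp(sample["logmove"])

# Round-trip check: taking the log again recovers the original column.
roundtrip_ok = bool(np.allclose(np.log(sample["units"]), sample["logmove"]))
print(sample)
print("round trip ok:", roundtrip_ok)
```

If the column turns out to use a different base, replace `np.exp` with the matching inverse.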

### Importing libraries

In [1]:
#%pip install certifi==2022.9.24

In [2]:
import azureml.core
import pandas as pd
from azureml.core import Experiment, Workspace, Dataset, Datastore
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.forecasting_parameters import ForecastingParameters
import mlflow

import logging

### Configuring Workspace and Experiment

In [3]:
ws = Workspace.from_config()

# choose a name for the run history container in the workspace
experiment_name = "aml_retail_sales_forecasting"

experiment = Experiment(ws, experiment_name)

output = {}
output["Subscription ID"] = ws.subscription_id
output["Workspace"] = ws.name
output["SKU"] = ws.sku
output["Resource Group"] = ws.resource_group
output["Location"] = ws.location
output["Run History Name"] = experiment_name
output["SDK Version"] = azureml.core.VERSION
pd.set_option("display.max_colwidth", None)
outputDf = pd.DataFrame(data=output, index=[""])
outputDf.T

| | |
| --- | --- |
| Subscription ID | 506e86fc-853c-4557-a6e5-ad72114efd2b |
| Workspace | amlws-midp |
| SKU | Basic |
| Resource Group | rg-midpwithazurecosmos-prod |
| Location | eastus2 |
| Run History Name | aml_retail_sales_forecasting |
| SDK Version | 1.48.0 |


### Creating the dataset for Azure Machine Learning

In [5]:
df = pd.read_csv("retail_sales_datasetv2.csv")
datastore = Datastore.get_default(ws)
dataset = Dataset.Tabular.register_pandas_dataframe(df, datastore, "dataset_from_pandas_df", show_progress=True)

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/4b3f6576-db1c-4433-884b-c7d72fce888a/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.
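
Forecasting generally requires that each time series has at most one row per timestamp. It can help to check this with pandas before registering the data. The sketch below is a minimal version of that check, assuming the series are identified by `store` and `brand`. The helper name `find_duplicate_timestamps` and the toy frame are hypothetical.

```python
import pandas as pd

def find_duplicate_timestamps(frame, time_col, id_cols):
    """Return rows that share a timestamp with another row in the same series."""
    parsed = frame.copy()
    parsed[time_col] = pd.to_datetime(parsed[time_col])
    # keep=False marks every member of a duplicated group, not just the repeats.
    mask = parsed.duplicated(subset=id_cols + [time_col], keep=False)
    return parsed[mask]

# Made-up rows: store 2 / brand 1 has two entries for the same week.
toy = pd.DataFrame({
    "store": [2, 2, 2, 5],
    "brand": [1, 1, 1, 1],
    "time": ["1990-06-14", "1990-06-21", "1990-06-21", "1990-06-14"],
    "logmove": [9.0, 8.7, 8.9, 9.3],
})

dupes = find_duplicate_timestamps(toy, "time", ["store", "brand"])
n_series = toy.groupby(["store", "brand"]).ngroups
print(f"{n_series} series, {len(dupes)} rows with duplicated timestamps")
```

Running the same helper on `df` with the real column names would show whether any series needs aggregating or de-duplicating first.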


### Model Configuration

In [7]:
forecasting_parameters = ForecastingParameters(
    time_column_name="time",
    forecast_horizon="auto",
    time_series_id_column_names=["store", "brand"],
)

automl_config = AutoMLConfig(
    task="forecasting",
    training_data=dataset,
    label_column_name="logmove",
    primary_metric="normalized_root_mean_squared_error",
    experiment_timeout_hours=0.3,
    max_concurrent_iterations=2,
    #compute_target=compute_target,
    enable_early_stopping=True,
    n_cross_validations="auto",  # Feel free to set to a small integer (>=2) if runtime is an issue.
    max_cores_per_iteration=-1,
    verbosity=logging.INFO,
    forecasting_parameters=forecasting_parameters,
)

run = experiment.submit(automl_config, show_output=True)

No run_configuration provided, running on local with default configuration
Running in the active local environment.


| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| aml_retail_sales_forecasting | AutoML_55f03bef-c576-4c29-b7f8-95c76c2a2b18 | automl | Preparing | Link to Azure Machine Learning studio | Link to Documentation |


Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: DatasetFeaturization. Beginning to featurize the CV split.
Current status: DatasetFeaturizationCompleted. Completed featurizing the CV split.
Current status: DatasetFeaturization. Beginning to featurize the CV split.
Current status: DatasetFeaturizationCompleted. Completed featurizing the CV split.
Current status: DatasetFeaturization. Beginning to featurize the CV split.
Current status: DatasetFeaturizationCompleted. Completed featurizing the CV split.
Current status: DatasetFeaturization. Beginning to featurize the CV split.
Current status: DatasetFeaturizationCompleted. Completed featurizing the CV split.
Current status: DatasetFeaturization. Beginning to featurize the CV split.
Current status: DatasetFeaturizationCompleted.

INFO:interpret_community.common.explanation_utils:Using default datastore for uploads
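
The primary metric chosen above is the normalized root mean squared error. As a rough guide to what it measures, the sketch below uses one common definition: RMSE divided by the range of the actual values. The exact computation inside AutoML, for example how values are normalized across individual time series, may differ. The function name `normalized_rmse` and the sample values are hypothetical.

```python
import numpy as np

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the actual values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    value_range = y_true.max() - y_true.min()
    if value_range == 0:
        raise ValueError("actuals are constant; range normalization is undefined")
    return rmse / value_range

# Illustrative values on the logmove scale.
actual = [8.0, 9.0, 10.0]
predicted = [8.5, 9.0, 9.5]
score = normalized_rmse(actual, predicted)
print(round(score, 4))
```

Lower is better. Because the error is scaled by the range of the target, scores are easier to compare across series with different sales volumes.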


### Retrieving the best model

In [8]:
# Get best model from automl run
best_run, non_onnx_model = run.get_output()

experiment_name = 'aml_retail_sales_forecasting'
artifact_path = experiment_name + "_artifact"

### Registering the best model using mlflow

In [9]:
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment(experiment_name)

# Use a distinct name so the AutoML `run` object from the earlier cell is not overwritten
with mlflow.start_run() as mlflow_run:
    # Log the fitted model as an artifact of this MLflow run
    mlflow.sklearn.log_model(non_onnx_model, artifact_path)

    # Register the logged model in the AML model registry
    mlflow.register_model("runs:/" + mlflow_run.info.run_id + "/" + artifact_path, "nrf-RetailSalesForecast")

Successfully registered model 'nrf-RetailSalesForecast'.
2023/02/02 16:36:54 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: nrf-RetailSalesForecast, version 1
Created version '1' of model 'nrf-RetailSalesForecast'.
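
The registration call builds a `runs:/<run_id>/<artifact_path>` URI by string concatenation. Once registered, a model version can be addressed with a `models:/<name>/<version>` URI. The sketch below shows both formats with plain string helpers. The helper names are hypothetical and `<run-id>` is a placeholder, not a real run ID.

```python
def run_artifact_uri(run_id, artifact_path):
    """URI of an artifact logged under a specific MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"

def registered_model_uri(name, version):
    """URI of a specific version in the model registry."""
    return f"models:/{name}/{version}"

# Placeholder run ID; substitute the ID of the run that logged the model.
source_uri = run_artifact_uri("<run-id>", "aml_retail_sales_forecasting_artifact")
model_uri = registered_model_uri("nrf-RetailSalesForecast", 1)
print(source_uri)
print(model_uri)
```

A URI like `model_uri` can then be passed to `mlflow.pyfunc.load_model`, assuming the loading environment has the packages the AutoML model depends on.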
