# Retail Sales Performance and Inventory Reservation Use Case

This notebook uses RETAILER_UNION_V from SAP Datasphere, which is federated from BigQuery, together with DISTRIBUTOR_V, PRODUCT_V, and RETAIL_V, which are local table views in SAP Datasphere.

# Install fedml_gcp package

In [1]:
pip install fedml_gcp

Collecting fedml_gcp
  Obtaining dependency information for fedml_gcp from https://files.pythonhosted.org/packages/6c/54/1e7d6fbb936a1b5b3bce7b26b47d2ad1a228a1e431a195066b768b3e4d39/fedml_gcp-2.1.1-py3-none-any.whl.metadata
  Downloading fedml_gcp-2.1.1-py3-none-any.whl.metadata (1.4 kB)
Collecting hdbcli (from fedml_gcp)
  Obtaining dependency information for hdbcli from https://files.pythonhosted.org/packages/22/24/16242b80461ff1c56f114e4c65b8ae595798bae98ba583101a7e14390f4b/hdbcli-2.19.21-cp34-abi3-manylinux1_x86_64.whl.metadata
  Downloading hdbcli-2.19.21-cp34-abi3-manylinux1_x86_64.whl.metadata (6.1 kB)
Collecting google (from fedml_gcp)
  Downloading google-3.0.0-py2.py3-none-any.whl (45 kB)
Downloading fedml_gcp-2.1.1-py3-none-any.whl (11 kB)
Downloading hdbcli-2.19.21-cp34-abi3-manylinux1_x86_64.whl (10.8 MB)

# Import Libraries

In [2]:
import os
#os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

import time
from fedml_gcp import dwcgcp

## Constants used throughout the notebook

In [3]:
PROJECT_ID = 'de-gcp-ema-con-e7906-npd-1'
REGION = 'us-central1'

BUCKET_NAME = 'fedml-demo'
BUCKET_URI = "gs://"+BUCKET_NAME
BUCKET_FOLDER = 'fedML'
MODEL_OUTPUT_DIR = BUCKET_URI+'/'+BUCKET_FOLDER
GCS_PATH_TO_MODEL_ARTIFACTS = MODEL_OUTPUT_DIR+'/model/'

TRAINING_PACKAGE_PATH = 'RetailTest'
PREDICTOR_PACKAGE_PATH = 'RetailTestPredictor'
JOB_NAME = "retail-training"

MODEL_DISPLAY_NAME = "retail-model"
DEPLOYED_MODEL_DISPLAY_NAME = 'retail-deployed-model'

TAR_BUNDLE_NAME = 'Retail.tar.gz'

CONTAINER_REGISTRY_REPOSITORY = 'retail'
IMAGE = 'image-'+str(int(time.time()))
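
As a quick sanity check, the derived storage paths can be recomputed and printed before any job is submitted. The sketch below repeats only the two base constants from the cell above and rebuilds the derived values with f-strings; it introduces no new names.

```python
# Recompute the derived Cloud Storage paths from the base constants.
BUCKET_NAME = 'fedml-demo'
BUCKET_FOLDER = 'fedML'

BUCKET_URI = f"gs://{BUCKET_NAME}"
MODEL_OUTPUT_DIR = f"{BUCKET_URI}/{BUCKET_FOLDER}"
GCS_PATH_TO_MODEL_ARTIFACTS = f"{MODEL_OUTPUT_DIR}/model/"

print(MODEL_OUTPUT_DIR)             # gs://fedml-demo/fedML
print(GCS_PATH_TO_MODEL_ARTIFACTS)  # gs://fedml-demo/fedML/model/
```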

# Create DwcGCP Instance to access class methods and train model

The bucket passed here must already exist in Cloud Storage.

For details on this constructor, refer to the library's README.

In [4]:
params = {'project': PROJECT_ID,
          'location': REGION,
          'staging_bucket': BUCKET_URI}

In [5]:
dwc = dwcgcp.DwcGCP(params)

# Create tar bundle of script folder so GCP can use it for training

For more information on the dwc.make_tar_bundle() function, refer to the library's README.

Before running this cell, make sure the script package contains all the files a training job needs.

In [6]:
dwc.make_tar_bundle(TAR_BUNDLE_NAME, 
                    TRAINING_PACKAGE_PATH, 
                    BUCKET_FOLDER+'/train/'+TAR_BUNDLE_NAME)


2024-01-09 21:06:41,337: fedml_gcp.logger INFO: File Retail.tar.gz uploaded to fedML/train/Retail.tar.gz.
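
One way to check the package before bundling is to look for the files a Python training package usually contains. The sketch below is an assumption, not part of fedml_gcp: the helper `missing_package_files` is hypothetical, and the required-file list (`setup.py`, `trainer/__init__.py`, `trainer/task.py`) is inferred from the `trainer.task` module name used later in this notebook. Adjust it to match your own package.

```python
from pathlib import Path
import tempfile

# Hypothetical helper: the file list is an assumption based on the
# 'trainer.task' module name, not something fedml_gcp prescribes.
def missing_package_files(package_dir,
                          required=("setup.py",
                                    "trainer/__init__.py",
                                    "trainer/task.py")):
    """Return the required files that are absent from package_dir."""
    root = Path(package_dir)
    return [rel for rel in required if not (root / rel).is_file()]

# Demo on a throwaway directory that only contains trainer/task.py.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "RetailTest"
    (pkg / "trainer").mkdir(parents=True)
    (pkg / "trainer" / "task.py").write_text("# training entry point\n")
    demo_missing = missing_package_files(pkg)

print(demo_missing)
```

In the notebook, calling `missing_package_files(TRAINING_PACKAGE_PATH)` before `dwc.make_tar_bundle(...)` would surface an incomplete package early.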


## Choose the training and deployment images

Pre-built training containers are listed here: https://cloud.google.com/vertex-ai/docs/training/create-python-pre-built-container

Pre-built deployment containers are listed here: https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers

In [7]:
TRAIN_VERSION = "sklearn-cpu.1-0"
DEPLOY_VERSION = "sklearn-cpu.1-3"

TRAIN_IMAGE = "us-docker.pkg.dev/vertex-ai/training/{}:latest".format(TRAIN_VERSION)
DEPLOY_IMAGE = "us-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(DEPLOY_VERSION)

In [8]:
#table_name = 'DATA_VIEW'
job_dir = 'gs://'+BUCKET_NAME

cmd_args = [
    "--job-dir=" + str(job_dir),
    "--bucket_name=" + str(BUCKET_NAME),
    "--bucket_folder=" + str(BUCKET_FOLDER),
    "--package_name=" + 'trainer',
    "--dist_table="+ 'DISTRIBUTOR_V',
    "--dist_size="+ '1',
    "--product_table="+ 'PRODUCT_V',
    "--product_size="+ '1',
    "--retailer_table="+ 'RETAIL_V',
    "--retailer_size="+ '1',
    "--combined_retailer_table="+ 'RETAILER_UNION_V',
    "--combined_retailer_size="+ '1',
    "--lgbmregression_objective="+ 'regression'
    
]
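
These arguments are passed as command-line flags to the training module. The actual `trainer.task` script is not shown in this notebook; the sketch below is a guess at how such a script could read the flags with `argparse`. The function name `parse_training_args` is hypothetical, and treating the `*_size` values as floats is an assumption.

```python
import argparse

def parse_training_args(argv=None):
    """Hypothetical parser mirroring the flags in cmd_args."""
    parser = argparse.ArgumentParser(description="Sketch of a trainer.task parser")
    parser.add_argument("--job-dir", default=None)  # stored as args.job_dir
    parser.add_argument("--bucket_name", required=True)
    parser.add_argument("--bucket_folder", required=True)
    parser.add_argument("--package_name", default="trainer")
    # One table name and one size per data source.
    for prefix in ("dist", "product", "retailer", "combined_retailer"):
        parser.add_argument(f"--{prefix}_table", required=True)
        # Assumption: sizes are numeric; the notebook passes them as '1'.
        parser.add_argument(f"--{prefix}_size", type=float, default=1.0)
    parser.add_argument("--lgbmregression_objective", default="regression")
    return parser.parse_args(argv)

example_args = parse_training_args([
    "--job-dir=gs://fedml-demo",
    "--bucket_name=fedml-demo",
    "--bucket_folder=fedML",
    "--package_name=trainer",
    "--dist_table=DISTRIBUTOR_V", "--dist_size=1",
    "--product_table=PRODUCT_V", "--product_size=1",
    "--retailer_table=RETAIL_V", "--retailer_size=1",
    "--combined_retailer_table=RETAILER_UNION_V", "--combined_retailer_size=1",
    "--lgbmregression_objective=regression",
])
print(example_args.combined_retailer_table, example_args.combined_retailer_size)
```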

In [9]:
inputs ={
    'display_name':JOB_NAME,
    'python_package_gcs_uri':BUCKET_URI + '/' + BUCKET_FOLDER+'/train/'+TAR_BUNDLE_NAME,
    'python_module_name':'trainer.task',
    'container_uri':TRAIN_IMAGE,
    'model_serving_container_image_uri':DEPLOY_IMAGE,
}

In [11]:
run_job_params = {'model_display_name':MODEL_DISPLAY_NAME,
                  'args':cmd_args,
                  'replica_count':1,
                  'base_output_dir':MODEL_OUTPUT_DIR,
                  'sync':True}

In [12]:
job = dwc.train_model(training_inputs=inputs,
                      training_type='customPythonPackage',
                      params=run_job_params)

2024-01-09 21:07:16,338: fedml_gcp.logger INFO: creating custom python package training job
2024-01-09 21:07:16,360: fedml_gcp.logger INFO: running training job
Training Output directory:
gs://fedml-demo/fedML 
View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1571969231610707968?project=179499000957
CustomPythonPackageTrainingJob projects/179499000957/locations/us-central1/trainingPipelines/1571969231610707968 current state:
PipelineState.PIPELINE_STATE_RUNNING
View backing custom job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/8917618400293814272?project=179499000957

## Deployment

For information on the dwc.deploy() function, refer to the library's README.

Here we deploy a custom predictor for the model trained above.

In [13]:
from RetailTestPredictor.predictor import MyPredictor

In [16]:
cpr_model_config = {
    'src_dir': PREDICTOR_PACKAGE_PATH,
    'output_image_uri':f"gcr.io/{PROJECT_ID}/{CONTAINER_REGISTRY_REPOSITORY}/{IMAGE}",
    'predictor':MyPredictor,
    'requirements_path':os.path.join(PREDICTOR_PACKAGE_PATH, "requirements.txt"),
    'no_cache':True

}
upload_config = {
    'display_name':DEPLOYED_MODEL_DISPLAY_NAME,
    'artifact_uri':GCS_PATH_TO_MODEL_ARTIFACTS,
}

In [17]:
model = dwc.upload_custom_predictor(cpr_model_config, upload_config)

2024-01-09 22:12:15,218: fedml_gcp.logger INFO: building custom predictor routine


FileNotFoundError: [Errno 2] No such file or directory: 'docker': 'docker'
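
The `FileNotFoundError` above means the `docker` executable could not be found on the machine running the notebook, so the predictor image could not be built. A small pre-flight check along these lines can catch that before the build is attempted. This is a sketch using only the standard library; the helper name `docker_available` is hypothetical.

```python
import shutil

def docker_available(executable="docker"):
    """Return True if the given executable is found on PATH."""
    return shutil.which(executable) is not None

if not docker_available():
    # The custom predictor build shells out to the Docker CLI,
    # so it cannot succeed in an environment without it.
    print("Docker CLI not found on PATH; install Docker or run this "
          "notebook in an environment where it is available.")
```

Running this check before `dwc.upload_custom_predictor(...)` turns the stack trace into an actionable message.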

In [None]:
model_config = {'machine_type': "n1-standard-4", 'traffic_split':{"0": 100}}
endpoint = dwc.deploy(model, model_config)

# Inferencing

In [None]:
import os
import pandas as pd
import numpy as np
import json
from fedml_gcp import DbConnection

In [None]:
data = { 'instances': 
    [
        {"dist_table": 'DISTRIBUTOR_V'},
        {"dist_size": '1'},
        {"product_table":"PRODUCT_V"},
        {"product_size":"1"},
        {"retailer_table": "RETAIL_V"},
        {"retailer_size": "1"},
        {"combined_retailer_table":"RETAILER_UNION_V"},
        {"combined_retailer_size": "1"}
    ]
}
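
The payload is a list of single-key dictionaries. How `MyPredictor` consumes it is not shown in this notebook; if a flat mapping is easier to work with on the receiving side, the entries could be merged as in the sketch below. The helper `merge_instances` is hypothetical.

```python
# Same payload as in the cell above.
data = {'instances': [
    {"dist_table": 'DISTRIBUTOR_V'},
    {"dist_size": '1'},
    {"product_table": "PRODUCT_V"},
    {"product_size": "1"},
    {"retailer_table": "RETAIL_V"},
    {"retailer_size": "1"},
    {"combined_retailer_table": "RETAILER_UNION_V"},
    {"combined_retailer_size": "1"},
]}

def merge_instances(instances):
    """Collapse a list of single-key dicts into one dict (later keys win)."""
    merged = {}
    for entry in instances:
        merged.update(entry)
    return merged

request_params = merge_instances(data['instances'])
print(request_params["combined_retailer_table"])
```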


In [None]:
response = dwc.predict(endpoint=endpoint, predict_params=data)

# Write results back to SAP Datasphere

In [None]:
result_df = pd.DataFrame(response, columns=['retailer', 'productsku', 'calendar_year',
                          'calendar_month', 'Predictions'])

In [None]:
types = {'retailer': 'int',
         'productsku': 'int',
         'calendar_year': 'int',
         'calendar_month': 'int'}
result_df = result_df.astype(types)

In [None]:
result_df['ID'] = result_df.index

In [None]:
result_df.head(10)
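
The reshaping steps above can be tried offline on made-up rows. The sketch assumes the endpoint response is a list of rows in the same column order used above; the values in `sample_response` are invented for illustration and are not real predictions.

```python
import pandas as pd

# Invented rows standing in for the endpoint response (assumption:
# one row per prediction, ordered like the column list below).
sample_response = [
    ["101", "5001", "2024", "1", 12.5],
    ["102", "5002", "2024", "2", 7.25],
]
columns = ['retailer', 'productsku', 'calendar_year',
           'calendar_month', 'Predictions']

sample_df = pd.DataFrame(sample_response, columns=columns)

# Cast the key columns to integers, as done for result_df.
sample_df = sample_df.astype({'retailer': 'int',
                              'productsku': 'int',
                              'calendar_year': 'int',
                              'calendar_month': 'int'})

# Use the row index as the primary key column.
sample_df['ID'] = sample_df.index
print(sample_df.dtypes)
```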

In [None]:
db2 = DbConnection(url='RetailTestPredictor/config.json')

In [None]:
db2.create_table("CREATE TABLE Retail_Predictions_GCP (ID INTEGER PRIMARY KEY, retailer INTEGER, productsku INTEGER, calendar_year INTEGER, calendar_month INTEGER, Predictions FLOAT(2))")


In [None]:
db2.insert_into_table('Retail_Predictions_GCP', result_df)