<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Tutorial on generating an explanation for a tabular model on Watson OpenScale

This notebook walks through creating a tabular Watson Machine Learning (WML) model, creating a subscription, configuring explainability, and finally generating an explanation for a transaction.

### Contents
- [1. Setup](#setup)
- [2. Creating and deploying a tabular model](#deployment)
- [3. Subscriptions](#subscription)
- [4. Explainability](#explainability)

**Note**: If you are using Watson Studio, run the notebook on at least the 'Default Python 3.5 XS' environment for faster results.

<a id="setup"></a>
## 1. Setup

### 1.1 Install Watson OpenScale and WML packages

In [None]:
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1

In [None]:
!pip install --upgrade watson-machine-learning-client --no-cache | tail -n 1

Note: Restart the kernel to ensure that the new libraries are used.

### 1.2 Configure credentials

Get the Watson OpenScale `apikey` from the [Bluemix console](https://console.bluemix.net/): click `Manage->Account->Users`, select `Platform API Keys` from the sidebar, and then click the "Create" button.

To obtain the Watson OpenScale `instance_id` (GUID, entered as `instance_guid` in the cell below), open the [cloud console](https://cloud.ibm.com/resources), click `Services`, and then click anywhere on the Watson OpenScale service tile except the service link. The GUID appears in the sidebar that opens on the right.

In [None]:
AIOS_CREDENTIALS = {
    "instance_guid": "*****",
    "apikey": "*****", 
    "url": "https://api.aiopenscale.cloud.ibm.com"
}
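
To keep secrets out of the notebook, the same dictionary can be built from environment variables instead of being typed into the cell. The following is a minimal sketch: the helper `load_aios_credentials` and the variable names `AIOS_INSTANCE_GUID`, `AIOS_APIKEY`, and `AIOS_URL` are hypothetical and not part of any IBM library.

```python
import os

# Default endpoint, copied from the credentials cell above.
DEFAULT_URL = "https://api.aiopenscale.cloud.ibm.com"


def load_aios_credentials(env=None):
    """Build the AIOS_CREDENTIALS dict from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    required = ("AIOS_INSTANCE_GUID", "AIOS_APIKEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "instance_guid": env["AIOS_INSTANCE_GUID"],
        "apikey": env["AIOS_APIKEY"],
        "url": env.get("AIOS_URL", DEFAULT_URL),
    }
```

With the variables exported before the kernel starts, `AIOS_CREDENTIALS = load_aios_credentials()` would replace the hard-coded cell.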

Generate or fetch the WML credentials by clicking on `Credentials` in the sidebar of the provisioned WML page. 

In [None]:
WML_CREDENTIALS = {
    "apikey": "*****",
    "iam_apikey_description": "*****",
    "iam_apikey_name": "*****",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
    "iam_serviceid_crn": "*****",
    "instance_id": "*****",
    "password": "*****",
    "url": "https://us-south.ml.cloud.ibm.com",
    "username": "*****"
}

Generate COS credentials by going to the provisioned COS service page and selecting the `Writer` role. Specify the following in the "Add Inline Configuration Parameters (Optional)" field: `{"HMAC":true}`.

In [None]:
cos_credentials = {
    "apikey": "*****",
    "cos_hmac_keys": {
        "access_key_id": "*****",
        "secret_access_key": "*****"
    },
    "endpoints": "https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints",
    "iam_apikey_description": "*****",
    "iam_apikey_name": "*****",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
    "iam_serviceid_crn": "*****",
    "resource_instance_id": "*****"
}
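
Before moving on, it can help to confirm that no `*****` placeholder was left in any of the credential dictionaries, including nested ones such as `cos_hmac_keys`. The helper below is a hypothetical convenience written for this notebook, not part of any SDK.

```python
def find_placeholders(credentials, placeholder="*****", prefix=""):
    """Return the dotted key paths whose value is still the placeholder."""
    paths = []
    for key, value in credentials.items():
        path = prefix + key
        if isinstance(value, dict):
            # Recurse into nested dictionaries such as cos_hmac_keys.
            paths.extend(find_placeholders(value, placeholder, path + "."))
        elif value == placeholder:
            paths.append(path)
    return paths


# Illustrative input only; the values here are made up.
sample = {
    "apikey": "*****",
    "cos_hmac_keys": {"access_key_id": "abc", "secret_access_key": "*****"},
}
print(sorted(find_placeholders(sample)))
```

Calling `find_placeholders(cos_credentials)` should return an empty list once every value has been filled in.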

<a id="deployment"></a>
## 2. Creating and deploying a tabular model

The dataset used is "GoSales Transactions for Naive Bayes Model", which is publicly available (on Data Science Experience Community). The file is downloaded in the next step.

The dataset details anonymous outdoor equipment purchases and is suited to multiclass classification. It is used below to create a model that predicts the product line a customer is likely to be most interested in, such as golf accessories, camping equipment, and others.

### 2.1 Load the training data

In [None]:
!rm -f GoSales_Tx.csv
!wget https://raw.githubusercontent.com/pmservice/wml-sample-models/master/spark/product-line-prediction/data/GoSales_Tx.csv
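
Since the task is multiclass classification, it is worth checking how many rows each product line has before training. The sketch below uses only the standard library; the helper name `label_distribution` is hypothetical, and the label column name `PRODUCT_LINE` is taken from the model code later in this notebook.

```python
import csv
from collections import Counter


def label_distribution(csv_path, label_column="PRODUCT_LINE"):
    """Count how many rows fall into each class of the label column."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[label_column] for row in reader)


# Example usage once the file has been downloaded:
# for product_line, count in label_distribution("GoSales_Tx.csv").most_common():
#     print(product_line, count)
```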

### 2.2 Uploading the dataset to COS 

In [None]:
!pip install ibm-cos-sdk

from ibm_botocore.client import Config
import ibm_boto3
  
cos_url = "*****" # example: https://s3-api.us-geo.objectstorage.softlayer.net
auth_endpoint = "*****" # example: https://iam.bluemix.net/oidc/token
bucket = "*****"

cos = ibm_boto3.resource(service_name="s3",
        ibm_api_key_id=cos_credentials["apikey"],
        ibm_auth_endpoint=auth_endpoint,
        config=Config(signature_version='oauth'),
        endpoint_url=cos_url)

# Delete the file if it already exists in COS
cos.Object(bucket, 'GoSales_Tx.csv').delete()

# Upload the file to COS
cos.Object(bucket, 'GoSales_Tx.csv').upload_file('GoSales_Tx.csv')
print("\nUpload Complete")

### 2.3 Creating a model

**Note**: Skip the pyspark install step below if you are using a Spark kernel on Watson Studio.

In [None]:
!pip install pyspark==2.3.1

**Note**: When running this notebook locally, if the `SparkSession` import below fails, set the `SPARK_HOME` environment variable to the path of the `pyspark` installation.
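
One way to set `SPARK_HOME` from inside the notebook is sketched here. It assumes that, for a pip-installed `pyspark`, the package directory itself can serve as `SPARK_HOME`; verify this for your installation. The helper names are hypothetical.

```python
import importlib.util
import os


def guess_spark_home():
    """Return the directory of the installed pyspark package, or None."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def ensure_spark_home():
    """Keep an existing SPARK_HOME; otherwise set it if pyspark can be located."""
    current = os.environ.get("SPARK_HOME")
    if current:
        return current
    guessed = guess_spark_home()
    if guessed is not None:
        os.environ["SPARK_HOME"] = guessed
    return guessed


print(ensure_spark_home())
```

Run this before the `SparkSession` import; if it prints `None`, `pyspark` was not found in the current environment.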

In [None]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv(path="GoSales_Tx.csv", sep=",", header=True, inferSchema=True)
df.show(5, truncate=False)

In [None]:
train_df, test_df = df.randomSplit([0.8, 0.2], seed=12345)
print("Total count of data set: {}".format(df.count()))
print("Total count of training data set: {}".format(train_df.count()))
print("Total count of test data set: {}".format(test_df.count()))
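
The printed counts will usually be close to, but not exactly, 80% and 20% of the total, because each row is assigned to a split at random. The plain-Python sketch below illustrates that idea with a seeded generator; it is an illustration only and not Spark's implementation of `randomSplit`.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to a split independently, in proportion to the weights."""
    rng = random.Random(seed)
    total = float(sum(weights))
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random() * total
        cumulative = 0.0
        for index, weight in enumerate(weights):
            cumulative += weight
            if draw < cumulative:
                splits[index].append(row)
                break
        else:
            # Guard against floating-point rounding at the upper edge.
            splits[-1].append(row)
    return splits


train_rows, test_rows = random_split(range(1000), [0.8, 0.2], seed=12345)
print(len(train_rows), len(test_rows))
```

Re-running with the same seed reproduces the same split, which is why the notebook fixes `seed=12345`.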

In [None]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml import Pipeline, Model

stringIndexer_label = StringIndexer(inputCol="PRODUCT_LINE", outputCol="label").fit(df)
stringIndexer_prof = StringIndexer(inputCol="PROFESSION", outputCol="PROFESSION_index")
stringIndexer_gend = StringIndexer(inputCol="GENDER", outputCol="GENDER_index")
stringIndexer_mar = StringIndexer(inputCol="MARITAL_STATUS", outputCol="MARITAL_STATUS_index")

In [None]:
vectorAssembler_features = VectorAssembler(inputCols=["GENDER_index", "AGE", "MARITAL_STATUS_index", "PROFESSION_index"], outputCol="features")
rf = RandomForestClassifier(labelCol="label", featuresCol="features")
labelConverter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=stringIndexer_label.labels)
pipeline = Pipeline(stages=[stringIndexer_label, stringIndexer_prof, stringIndexer_gend, stringIndexer_mar, vectorAssembler_features, rf, labelConverter])
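
In this pipeline, `StringIndexer` turns each string column into numeric indices and `IndexToString` maps the predicted index back to a product line name. By default `StringIndexer` orders labels by descending frequency, so the most frequent value gets index 0. The plain-Python sketch below mimics that behaviour; the alphabetical tie-break and the sample values other than "Single" are assumptions made for illustration.

```python
from collections import Counter


def fit_labels(values):
    """Order distinct values by descending frequency (ties broken alphabetically)."""
    counts = Counter(values)
    return sorted(counts, key=lambda value: (-counts[value], value))


def to_index(labels, value):
    """Forward mapping, analogous to what StringIndexer produces."""
    return float(labels.index(value))


def to_label(labels, index):
    """Reverse mapping, analogous to what IndexToString produces."""
    return labels[int(index)]


# Made-up sample column for illustration.
column = ["Single", "Married", "Married", "Single", "Married", "Unspecified"]
labels = fit_labels(column)
print(labels)
print(to_index(labels, "Single"), to_label(labels, 0.0))
```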

In [None]:
model_rf = pipeline.fit(train_df)

In [None]:
predictions = model_rf.transform(test_df)
evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictions)

print("Accuracy = %g" % accuracy)
print("Test Error = %g" % (1.0 - accuracy))
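
For reference, the accuracy metric is the fraction of test rows whose predicted label equals the true label, and the test error is its complement. A minimal plain-Python sketch, with the hypothetical helper `accuracy_score_simple` and made-up values:

```python
def accuracy_score_simple(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one row is required")
    correct = sum(1 for label, prediction in zip(labels, predictions) if label == prediction)
    return correct / float(len(labels))


# Made-up indices for illustration: three of four predictions match.
example_accuracy = accuracy_score_simple([0, 1, 2, 1], [0, 1, 1, 1])
print("Accuracy = %g" % example_accuracy)
print("Test Error = %g" % (1.0 - example_accuracy))
```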

In [None]:
training_data_reference = {
    "name": "GoSales data reference",
    "connection": {
      "iam_url": auth_endpoint,
      "api_key": cos_credentials["apikey"],
      "url": cos_url,
      "resource_instance_id": cos_credentials["resource_instance_id"]
    },
    "source": {
      "firstlineheader": "true",
      "file_format": "csv",
      "file_name": "GoSales_Tx.csv",
      "bucket": bucket,
      "type": "bluemixcloudobjectstorage"
    }
}

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)
MODEL_NAME = "Go-Sales multiclass model"

model_props = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TRAINING_DATA_REFERENCE: training_data_reference
}

# publish model 
published_model_details = wml_client.repository.store_model(model=model_rf, meta_props=model_props, training_data=train_df, pipeline=pipeline)

In [None]:
model_uid = wml_client.repository.get_model_uid(published_model_details)
print(model_uid)

### 2.4 Deploying the model

In [None]:
deployment = wml_client.deployments.create(model_uid, MODEL_NAME + " deployment")

In [None]:
scoring_url = wml_client.deployments.get_scoring_url(deployment)
print(scoring_url)

In [None]:
# Get a response from the deployed model for an input row

fields = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]
values = [["M", 26, "Single", "Other"]]

payload = {"fields": fields, "values": values}
response = wml_client.deployments.score(scoring_url=scoring_url, payload=payload)

In [None]:
print(response)
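
The scoring payload is a small table: `fields` holds the column names and `values` holds one list per row. The sketch below pairs them up into dictionaries, which is easier to read. Applying it to `response` assumes the response uses the same `fields`/`values` layout, which you should confirm from the printed output; the helper name `rows_to_dicts` is hypothetical.

```python
def rows_to_dicts(table):
    """Convert a {'fields': [...], 'values': [[...], ...]} table into a list of dicts."""
    fields = table["fields"]
    return [dict(zip(fields, row)) for row in table["values"]]


# The same payload that is sent to the deployment above.
example_payload = {
    "fields": ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"],
    "values": [["M", 26, "Single", "Other"]],
}
print(rows_to_dicts(example_payload))
```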

<a id="subscription"></a>
## 3. Subscriptions

### 3.1 Configuring OpenScale

In [None]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import WatsonMachineLearningAsset

aios_client = APIClient(AIOS_CREDENTIALS)
aios_client.version

**Note**: If the cell above fails on the first attempt, re-run it.

In [None]:
aios_client.data_mart.bindings.list()

### 3.2 Subscribe the asset

In [None]:
from ibm_ai_openscale.supporting_classes import *

subscription = aios_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.MULTICLASS_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='PRODUCT_LINE',
    feature_columns = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"],
    categorical_columns = ["GENDER", "MARITAL_STATUS", "PROFESSION"],
    prediction_column='predictedLabel',
    probability_column='probability'
))
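
The column arguments have to agree with each other and with the training data. The following sketch checks a few things that this notebook's configuration satisfies: every categorical column is also a feature column, the label is not a feature, and no feature is repeated. These rules are assumptions about what a consistent configuration looks like, not checks documented by OpenScale, and the helper name is hypothetical.

```python
def check_subscription_columns(feature_columns, categorical_columns, label_column):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    unknown = [name for name in categorical_columns if name not in feature_columns]
    if unknown:
        problems.append("categorical columns not in feature columns: {}".format(unknown))
    if label_column in feature_columns:
        problems.append("label column {!r} is also listed as a feature".format(label_column))
    if len(set(feature_columns)) != len(feature_columns):
        problems.append("feature columns contain duplicates")
    return problems


print(check_subscription_columns(
    feature_columns=["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"],
    categorical_columns=["GENDER", "MARITAL_STATUS", "PROFESSION"],
    label_column="PRODUCT_LINE",
))
```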

### 3.3 Get subscription

In [None]:
aios_client.data_mart.subscriptions.list()

In [None]:
subscription.get_details()

### 3.4 Score the model and get the transaction ID

In [None]:
fields = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]
values = [["M", 26, "Single", "Other"]]

payload = {"fields": fields, "values": values}
response = wml_client.deployments.score(scoring_url=scoring_url, payload=payload)

**Note**: Wait a few seconds before running the cell below so that the scoring request has time to appear in the payload logging table.

In [None]:
transaction_id = subscription.payload_logging.get_table_content().scoring_id[0]
print(transaction_id)
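
Instead of waiting by hand, the lookup can be retried until the record shows up. This is a generic sketch: the helper `retry` is hypothetical, and it catches any exception because the exact error raised while the table is still empty is not confirmed here.

```python
import time


def retry(action, attempts=5, delay_seconds=5.0):
    """Call action() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # broad on purpose; see the note above
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise last_error


# Example usage with the objects defined in this notebook:
# transaction_id = retry(
#     lambda: subscription.payload_logging.get_table_content().scoring_id[0]
# )
```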

<a id="explainability"></a>
## 4. Explainability

### 4.1 Configure Explainability

In [None]:
subscription.explainability.enable()
subscription.explainability.get_details()

### 4.2 Get explanation for the transaction

In [None]:
subscription.explainability.run(transaction_id, background_mode=False)