<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Tutorial on generating an explanation for a text-based model on Watson OpenScale

This notebook walks through creating a text-based Watson Machine Learning model, creating a subscription, configuring explainability, and finally generating an explanation for a transaction.

### Contents
- [1. Setup](#setup)
- [2. Creating and deploying a text-based model](#)
- [3. Subscriptions](#subscription)
- [4. Explainability](#explainability)

**Note**: If using Watson Studio, run the notebook on at least the 'Default Python 3.5 XS' environment for faster results.

<a id="setup"></a>
## 1. Setup

### 1.1 Install Watson OpenScale and WML packages

In [None]:
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1

In [None]:
!pip install --upgrade watson-machine-learning-client --no-cache | tail -n 1

Note: Restart the kernel to ensure that the newly installed libraries are used.
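
After restarting, it can help to confirm which versions the kernel actually sees. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_version` is hypothetical and not part of either package.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip commands above
for name in ("ibm-ai-openscale", "watson-machine-learning-client"):
    print(name, installed_version(name))
```

A result of `None` suggests that the install step ran against a different Python environment than the one backing the kernel.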

### 1.2 Configure credentials

Get the Watson OpenScale `apikey` by going to the [Bluemix console](https://console.bluemix.net/) and clicking `Manage->Account->Users`. Select `Platform API Keys` from the sidebar, then click the "Create" button.

To obtain the Watson OpenScale `instance_id` (GUID), open the [cloud console](https://cloud.ibm.com/resources), click `Services`, then click anywhere on the Watson OpenScale service tile except the service link. The GUID appears in the sidebar that opens on the right.

In [None]:
AIOS_CREDENTIALS = {
    "instance_guid": "*****",
    "apikey": "*****", 
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

Generate or fetch the WML credentials by clicking on `Credentials` in the sidebar of the provisioned WML page. 

In [None]:
WML_CREDENTIALS = {
    "apikey": "*****",
    "instance_id": "*****",
    "url": "https://us-south.ml.cloud.ibm.com"
}
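
To avoid pasting secrets into the notebook, the credentials can instead be read from environment variables. This is one possible sketch, not part of the original tutorial: the variable names `WML_APIKEY`, `WML_INSTANCE_ID`, and `WML_URL` and the helper `wml_credentials_from_env` are hypothetical choices.

```python
import os


def wml_credentials_from_env(environ=os.environ):
    """Build a WML credentials dict from (hypothetical) environment variables."""
    required = ("WML_APIKEY", "WML_INSTANCE_ID")
    missing = [key for key in required if not environ.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "apikey": environ["WML_APIKEY"],
        "instance_id": environ["WML_INSTANCE_ID"],
        # Fall back to the URL used in the cell above when none is set
        "url": environ.get("WML_URL", "https://us-south.ml.cloud.ibm.com"),
    }
```

With the variables exported before the kernel starts, `WML_CREDENTIALS = wml_credentials_from_env()` would replace the hard-coded dictionary. The same pattern applies to `AIOS_CREDENTIALS`.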

## 2. Creating and deploying a text-based model

The dataset is the UCI-ML SMS Spam Collection dataset, available at https://archive.ics.uci.edu/ml/machine-learning-databases/00228/. It is a binary classification dataset with the labels 'ham' and 'spam'.
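
The raw file holds one message per line, with the label and the text separated by a tab. The sketch below shows that layout on two made-up sample lines (they are illustrative, not quoted from the dataset); `parse_line` and `label_counts` are hypothetical helper names.

```python
from collections import Counter


def parse_line(line):
    """Split a raw 'label<TAB>text' line into a (label, text) pair."""
    label, text = line.rstrip("\n").split("\t", 1)
    return label, text


def label_counts(lines):
    """Count how many messages carry each label."""
    return Counter(parse_line(line)[0] for line in lines)


# Illustrative lines in the same layout as the raw file
sample = [
    "ham\tSee you at lunch?\n",
    "spam\tWIN a prize now! Reply YES\n",
]
print(label_counts(sample))
```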

### 2.1 Loading the training data

In [None]:
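# Download the training data and save it as 'SMSSpam.csv'

!pip install pandas
# -f: do not fail when no earlier download exists
!rm -f smsspamcollection.zip
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip
# -o: overwrite files left over from an earlier run without prompting
!unzip -o smsspamcollection.zip

import pandas as pd

# The raw file is tab-separated with no header row; write it back out
# as a comma-separated file with named columns
raw_df = pd.read_csv("SMSSpamCollection", sep="\t", header=None, encoding="utf-8")
raw_df.to_csv("SMSSpam.csv", header=["label", "text"], sep=",", index=False)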

In [None]:
!rm SMSSpamCollection
!rm readme
!rm smsspamcollection.zip

### 2.2 Creating a model

**Note**: Skip the pyspark install step below if you are using a Spark kernel on Watson Studio.

In [None]:
!pip install pyspark==2.3.1

**Note**: When running this notebook locally, if the `SparkSession` import below fails, set the `SPARK_HOME` environment variable to the path of the `pyspark` installation.
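
One way to set the variable from inside the notebook is sketched below. It assumes `pyspark` was installed with pip, in which case the package directory itself can usually serve as `SPARK_HOME`; the helper names are hypothetical.

```python
import importlib.util
import os


def guess_spark_home():
    """Locate the pip-installed pyspark package directory without importing it."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def ensure_spark_home(environ=os.environ):
    """Keep an existing SPARK_HOME; otherwise set it from the guessed location."""
    if environ.get("SPARK_HOME"):
        return environ["SPARK_HOME"]
    guessed = guess_spark_home()
    if guessed is not None:
        environ["SPARK_HOME"] = guessed
    return guessed


print(ensure_spark_home())
```

Run this before the `SparkSession` import. A printed `None` means `pyspark` could not be found in the current environment.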

In [None]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv(path="SMSSpam.csv", header=True, multiLine=True, escape='"')
df.show(5, truncate = False)

In [None]:
train_df, test_df = df.randomSplit([0.8, 0.2], seed=12345)
print("Total count of data set: {}".format(df.count()))
print("Total count of training data set: {}".format(train_df.count()))
print("Total count of test data set: {}".format(test_df.count()))

In [None]:
!pip install nltk
from pyspark.ml.feature import StringIndexer, IndexToString, CountVectorizer, Tokenizer, IDF, StopWordsRemover
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline, Model
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download('punkt')
nltk.download('stopwords')
stop_words = list(set(stopwords.words('english')))

stringIndexer_label = StringIndexer(inputCol="label", outputCol="label_ix").fit(df)
tokenizer = Tokenizer(inputCol="text", outputCol="words")
stopword_remover = StopWordsRemover(inputCol="words", outputCol="filtered_words").setStopWords(stop_words)
count = CountVectorizer(inputCol="filtered_words", outputCol="rawFeatures")
idf = IDF(inputCol="rawFeatures", outputCol="features")
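# Gradient-boosted tree classifier (the variable keeps the name `nb` because the pipeline cell below refers to it)
nb = GBTClassifier(labelCol="label_ix")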
labelConverter = IndexToString(inputCol="prediction", outputCol="predictionLabel", labels=stringIndexer_label.labels)
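
To make the feature stages above more concrete, here is a rough pure-Python approximation of tokenizing, stop-word removal, term counting, and IDF weighting. It is only a sketch: the stop-word list is a tiny hypothetical one, whitespace handling is simplified, and the smoothed IDF formula `log((n_docs + 1) / (doc_freq + 1))` is the one Spark documents for its `IDF` stage.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (the notebook uses the NLTK English list)
STOP_WORDS = {"a", "the", "to", "and", "me"}


def tokenize(text):
    """Lowercase and split on whitespace, roughly like Spark's Tokenizer."""
    return text.lower().split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    return [token for token in tokens if token not in stop_words]


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    counts = [Counter(remove_stop_words(tokenize(doc))) for doc in documents]
    n_docs = len(counts)
    # Number of documents in which each term occurs at least once
    doc_freq = Counter(term for count in counts for term in count)
    return [
        {term: tf * math.log((n_docs + 1) / (doc_freq[term] + 1))
         for term, tf in count.items()}
        for count in counts
    ]


weights = tf_idf(["Win cash now", "Call me now"])
print(weights)
```

Terms that occur in every document, such as "now" here, receive a weight of zero, which is why common words contribute little to the classifier's input.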

In [None]:
pipeline = Pipeline(stages=[stringIndexer_label, tokenizer, stopword_remover, count, idf, nb, labelConverter])
model = pipeline.fit(train_df)
predictions = model.transform(test_df)
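# Score on the classifier's raw output rather than the hard 0/1 prediction, so the ROC curve has more than one threshold
evaluator = BinaryClassificationEvaluator(labelCol="label_ix", rawPredictionCol="rawPrediction", metricName="areaUnderROC")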
auc = evaluator.evaluate(predictions)

print("Area under ROC curve = %g" % auc)
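
As a rough illustration of what the area under the ROC curve measures, it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The pairwise sketch below is not how Spark computes the metric and is only practical for small inputs; `area_under_roc` is a hypothetical helper.

```python
def area_under_roc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


print(area_under_roc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

Three of the four positive/negative pairs in this example are ordered correctly, so the value is 0.75.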

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

MODEL_NAME = "Text Binary Classifier"
wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

model_props = {
    wml_client.repository.ModelMetaNames.NAME: "{}".format(MODEL_NAME),
}

# publish model 
published_model_details = wml_client.repository.store_model(model=model, meta_props=model_props, training_data=train_df, pipeline=pipeline)

!rm SMSSpam.csv

In [None]:
model_uid = wml_client.repository.get_model_uid(published_model_details)
print(model_uid)

### 2.3 Deploying the model

In [None]:
deployment = wml_client.deployments.create(model_uid, MODEL_NAME + " deployment")

In [None]:
scoring_url = wml_client.deployments.get_scoring_url(deployment)
print(scoring_url)

## 3. Subscriptions

### 3.1 Configuring AIOS

In [None]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import WatsonMachineLearningAsset

aios_client = APIClient(AIOS_CREDENTIALS)
aios_client.version

**Note**: Please re-run the above cell if it doesn't work the first time.

In [None]:
aios_client.data_mart.bindings.list()

### 3.2 Subscribe the asset

In [None]:
from ibm_ai_openscale.supporting_classes import *

subscription = aios_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    label_column='label',
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.UNSTRUCTURED_TEXT,
    feature_columns = ["text"],
    categorical_columns = ["text"],
    prediction_column='predictionLabel',
    probability_column='probability'
))

### 3.3 Get subscription

In [None]:
aios_client.data_mart.subscriptions.list()

In [None]:
subscription.get_details()

### 3.4 Score the model and get transaction-id

In [None]:
text = "SIX chances to win CASH! From 100 to 20,000 pounds txt> CSH11 and send to 87575. Cost 150p/day, 6days, 16+ TsandCs apply Reply HL 4 info"
payload = {"fields": ["text"], "values": [[text]]}

response = wml_client.deployments.score(scoring_url=scoring_url, payload=payload)

In [None]:
print(response)
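
The request payload pairs a list of field names with a list of value rows. Assuming the response follows the same `fields`/`values` layout (an assumption worth checking against the printed output above), each row can be turned into a dictionary for easier reading. Both helpers below are hypothetical, and the sample response is made up for illustration.

```python
def build_payload(texts):
    """Build a scoring payload with one row per input text."""
    return {"fields": ["text"], "values": [[text] for text in texts]}


def rows_from_response(response):
    """Pair each value row with the field names (assumed response layout)."""
    fields = response["fields"]
    return [dict(zip(fields, row)) for row in response["values"]]


# Made-up response in the assumed layout, for illustration only
example_response = {
    "fields": ["text", "predictionLabel"],
    "values": [["hello there", "ham"]],
}
print(build_payload(["hello there"]))
print(rows_from_response(example_response))
```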

**Note**: Please wait for a few seconds before running the cell below.

In [None]:
transaction_id = subscription.payload_logging.get_table_content().scoring_id[0]
print(transaction_id)
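
Instead of waiting a fixed number of seconds, the lookup can be retried until the payload record shows up. The helper below is a generic polling sketch, not part of the OpenScale client; `wait_for` and `first_scoring_id` are hypothetical names, and the commented usage assumes `get_table_content()` returns a table that supports `len()`.

```python
import time


def wait_for(fetch, timeout=60.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns a non-None value or the timeout expires."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError("no result within {} seconds".format(timeout))
        sleep(interval)


# Possible usage in this notebook (left commented out; it needs a live subscription):
# def first_scoring_id():
#     table = subscription.payload_logging.get_table_content()
#     return table.scoring_id[0] if len(table) else None
#
# transaction_id = wait_for(first_scoring_id)
```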

## 4. Explainability

### 4.1 Configure Explainability

In [None]:
subscription.explainability.enable()
subscription.explainability.get_details()

### 4.2 Get explanation for the transaction

In [None]:
subscription.explainability.run(transaction_id, background_mode=False)