<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Use pySpark to predict `Business Area` and `Action`</b></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
</table>

<a id="setup"></a>
## 1. Set up

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  If you do not have an existing instance of [Watson Machine Learning (WML) Service](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/), create one (a free plan is offered; information about how to create the instance is [here](https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html)).
-  Make sure that you are using a Spark 2.1 kernel

<a id="load"></a>
## 2. Load and explore data

In this section you will load the data as an Apache Spark DataFrame and perform a basic exploration.

Load the data into a Spark DataFrame by reading the `CAR_RENTAL_TRAINING_WOJTEK` table from the dashDB (Db2) database over JDBC with the Spark `read.jdbc` method.

In [3]:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# @hidden_cell
# The following code is used to access your data and contains your credentials.
# You might want to remove those credentials before you share your notebook.

properties_0fbabdb9e8424779891ff40e77263ce1 = {
    'driver': 'com.ibm.db2.jcc.DB2Driver',
    'jdbcurl': 'jdbc:db2://dashdb-entry-yp-dal10-01.services.dal.bluemix.net:50000/BLUDB',
    'user': 'dash6973',
    'password': '<db2-password>'
}

table_name = 'CAR_RENTAL_TRAINING_WOJTEK'

# Read the training table from the DASH6973 schema over JDBC
df_data = spark.read.jdbc(properties_0fbabdb9e8424779891ff40e77263ce1['jdbcurl'], table='DASH6973.' + table_name, properties=properties_0fbabdb9e8424779891ff40e77263ce1)
df_data.head()

Row(ID=587, Gender='Female', Status='D', Children=1, Age=Decimal('42.39'), Customer_Status='Active', Car_Owner='No', Customer_Service='I would like them to be more speedy.  Also I would like the rental company to check the car for previous damage while I am standing there.', Satisfaction=0, Business_Area='Product: Functioning', Action='Free Upgrade')
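Rather than pasting the password into the notebook, the JDBC connection properties can be read from environment variables. A minimal sketch, assuming hypothetical variable names `DB2_JDBC_URL`, `DB2_USER`, and `DB2_PASSWORD` that you set yourself; the returned dict has the same shape as `properties_0fbabdb9e8424779891ff40e77263ce1` above.

```python
import os


def db2_jdbc_properties(env=os.environ):
    """Build JDBC properties for spark.read.jdbc from environment variables.

    DB2_JDBC_URL, DB2_USER and DB2_PASSWORD are hypothetical names chosen
    for this sketch; any mapping (for example a plain dict) can be passed.
    """
    required = ("DB2_JDBC_URL", "DB2_USER", "DB2_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail early with a clear message instead of a JDBC login error
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "driver": "com.ibm.db2.jcc.DB2Driver",
        "jdbcurl": env["DB2_JDBC_URL"],
        "user": env["DB2_USER"],
        "password": env["DB2_PASSWORD"],
    }
```

In the notebook you would then call `props = db2_jdbc_properties()` followed by `spark.read.jdbc(props['jdbcurl'], table='DASH6973.' + table_name, properties=props)`.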

Explore the loaded data by using the following Apache Spark DataFrame methods:
-  print the schema (`printSchema`)
-  count all records (`count`)
-  count the records for each `Action` and `Business_Area` value (`groupBy` and `count`)

In [4]:
df_data.printSchema()

root
 |-- ID: integer (nullable = true)
 |-- Gender: string (nullable = true)
 |-- Status: string (nullable = true)
 |-- Children: integer (nullable = true)
 |-- Age: decimal(6,2) (nullable = true)
 |-- Customer_Status: string (nullable = true)
 |-- Car_Owner: string (nullable = true)
 |-- Customer_Service: string (nullable = true)
 |-- Satisfaction: integer (nullable = true)
 |-- Business_Area: string (nullable = true)
 |-- Action: string (nullable = true)



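As you can see, the data contains eleven fields. `Business_Area` and `Action` are the fields you would like to predict (labels), and `Customer_Service` holds the free-text customer comment that is used to predict `Business_Area`.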

In [5]:
print("Number of records: " + str(df_data.count()))

Number of records: 243


As you can see, the data set contains 243 records.

In [6]:
df_data.select('Action').groupBy('Action').count().show()

+--------------------+-----+
|              Action|count|
+--------------------+-----+
|                  NA|  137|
|             Voucher|   21|
|    Premium features|   15|
|On-demand pickup ...|   28|
|        Free Upgrade|   42|
+--------------------+-----+



In [7]:
df_data.select('Business_Area').groupBy('Business_Area').count().show()

+--------------------+-----+
|       Business_Area|count|
+--------------------+-----+
|Service: Accessib...|   13|
|Product: Functioning|   75|
|   Service: Attitude|   12|
|Service: Orders/C...|   16|
|Product: Availabi...|   21|
|Product: Pricing ...|   12|
|Product: Information|    4|
|  Service: Knowledge|   90|
+--------------------+-----+



<a id="model"></a>
## 3. Create an Apache Spark machine learning model

In this section you will learn how to:

- [3.1 Prepare data](#prep)
- [3.2 Create an Apache Spark machine learning pipeline](#pipe)
- [3.3 Train a model](#train)

### 3.1 Prepare data<a id="prep"></a>

In this subsection you will split your data into:
- train data set (about 80% of the records)
- test data set (about 20% of the records)

In [8]:
splitted_data = df_data.randomSplit([0.8, 0.2], 24)
train_data = splitted_data[0]
test_data = splitted_data[1]

print("Number of training records: " + str(train_data.count()))
print("Number of testing records : " + str(test_data.count()))

Number of training records: 200
Number of testing records : 43
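`randomSplit` does not cut the data at exactly 80/20 (that would be 194.4 and 48.6 records). Each row is assigned to a split at random with probability proportional to the split's weight, so the sizes only approximate the weights, which is why you get 200 and 43. The pure-Python sketch below illustrates that idea only; it is not Spark's implementation (Spark samples per partition with its own random generator), so the same seed will not reproduce the 200/43 split.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row independently to a split, with probability proportional to its weight."""
    total = float(sum(weights))
    # Cumulative upper bounds, e.g. [0.8, 0.2] -> [0.8, 1.0]
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        x = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            # The last split also catches any floating-point shortfall in the bounds
            if x < upper or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


train_rows, test_rows = random_split(list(range(243)), [0.8, 0.2], seed=24)
print(len(train_rows), len(test_rows))
```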


### 3.2 Create the pipeline<a id="pipe"></a>

In this section you will create an Apache Spark machine learning pipeline and then train the model.

In [9]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler, HashingTF, IDF, Tokenizer, SQLTransformer
from pyspark.ml.classification import RandomForestClassifier, DecisionTreeClassifier, LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml import Pipeline, Model

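## 1st model for `Business_Area` prediction

In the following steps, turn the free-text `Customer_Service` field into TF-IDF features: `Tokenizer` splits the text into words, `HashingTF` hashes the words into term-frequency vectors, and `IDF` rescales those vectors. Then use the `StringIndexer` transformer to convert the `Business_Area` label to numeric indices.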

In [10]:
tokenizer = Tokenizer(inputCol="Customer_Service", outputCol="words")

In [11]:
hashing_tf = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol='hash')
idf = IDF(inputCol=hashing_tf.getOutputCol(), outputCol="area_features", minDocFreq=5)  # minDocFreq=5: terms found in fewer than 5 documents get an IDF weight of 0
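To see what these three stages compute, here is a small pure-Python sketch. Like Spark's `Tokenizer`, it lower-cases the text and splits on every single whitespace character, so a double space yields an empty token (as in the scoring output later in this notebook). As a simplification, it counts raw terms instead of hashing them into 262,144 buckets as `HashingTF` does by default, and it applies the smoothed formula idf = log((m + 1) / (df + 1)) that Spark's `IDF` uses, with weight 0 for terms below `minDocFreq`. The function names are illustrative, not part of any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Like Spark's Tokenizer: lower-case, then split on every single whitespace character
    return re.split(r"\s", text.lower())


def fit_idf(tokenized_docs, min_doc_freq=0):
    """Return {term: idf} using log((m + 1) / (df + 1)).

    m is the number of documents and df the number of documents that contain
    the term; terms found in fewer than min_doc_freq documents get weight 0.0.
    """
    m = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: (math.log((m + 1) / (df + 1)) if df >= min_doc_freq else 0.0)
        for term, df in doc_freq.items()
    }


def tf_idf(tokens, idf):
    # Raw term count times IDF weight; terms not seen during fitting get 0.0
    return {term: count * idf.get(term, 0.0) for term, count in Counter(tokens).items()}


docs = [tokenize(t) for t in ["The car was late", "the car was dirty", "staff were rude"]]
idf = fit_idf(docs, min_doc_freq=2)
print(tf_idf(tokenize("the car was late"), idf))
```

With `min_doc_freq=2`, `late` occurs in only one document, so its weight is 0 even though it appears in the input; this is what `minDocFreq=5` does to rare words in the customer comments.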

In [12]:
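# Fit on the full data set so that every Business_Area value gets an index, even one missing from the training split
string_indexer_area = StringIndexer(inputCol="Business_Area", outputCol="area_label").fit(df_data)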

In [13]:
dt_area = DecisionTreeClassifier(labelCol='area_label', featuresCol=idf.getOutputCol() , predictionCol='prediction_area', probabilityCol='probability_area', rawPredictionCol='rawPrediction_area')

Finally, use `IndexToString` to convert the predicted indices back to the original `Business_Area` labels.

In [14]:
labelConverter = IndexToString(inputCol="prediction_area", outputCol="predictedAreaLabel", labels=string_indexer_area.labels)
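`StringIndexer` assigns index 0 to the most frequent label, 1 to the next, and so on, and `IndexToString` inverts that mapping with the `labels` list. A pure-Python sketch of the round trip, checked against the `Action` counts shown earlier. Breaking ties alphabetically is an assumption of this sketch (it matches the `Business_Area` order printed later, where two labels share a count of 12); older Spark versions may not guarantee a tie order.

```python
from collections import Counter


def fit_string_indexer(values):
    """Distinct labels by descending frequency; ties broken alphabetically (assumption)."""
    counts = Counter(values)
    return sorted(counts, key=lambda label: (-counts[label], label))


def string_index(values, labels):
    # StringIndexer-style: label -> index, stored as a float like Spark ML does
    lookup = {label: float(i) for i, label in enumerate(labels)}
    return [lookup[v] for v in values]


def index_to_string(indices, labels):
    # IndexToString-style: index -> original label
    return [labels[int(i)] for i in indices]


# Action counts from the groupBy table in section 2
action_counts = {"NA": 137, "Voucher": 21, "Premium features": 15,
                 "On-demand pickup location": 28, "Free Upgrade": 42}
action_values = [label for label, n in action_counts.items() for _ in range(n)]
action_labels = fit_string_indexer(action_values)
print(action_labels)
```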

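Now build the pipeline. A pipeline chains transformers (`Tokenizer`, `HashingTF`, the already fitted `StringIndexer` model, `IndexToString`) and estimators (`IDF`, `DecisionTreeClassifier`) into one sequence of stages.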

In [15]:
pipeline_area = Pipeline(stages=[tokenizer, hashing_tf, idf, string_indexer_area, dt_area, labelConverter])

### Check the sub-model quality

---

## 2nd model for `Action` prediction

---

In [16]:
string_indexer_gender = StringIndexer(inputCol="Gender", outputCol="gender_ix")
string_indexer_customer_status = StringIndexer(inputCol="Customer_Status", outputCol="customer_status_ix")
string_indexer_status = StringIndexer(inputCol="Status", outputCol="status_ix")
string_indexer_owner = StringIndexer(inputCol="Car_Owner", outputCol="owner_ix")

In [17]:
assembler = VectorAssembler(inputCols=["gender_ix", "customer_status_ix", "status_ix", "owner_ix", "Children", "Age", "Satisfaction", idf.getOutputCol()], outputCol="features")

In [18]:
string_indexer_action = StringIndexer(inputCol="Action", outputCol="label").fit(df_data)

In [19]:
label_action_converter = IndexToString(inputCol="prediction", outputCol="predictedActionLabel", labels=string_indexer_action.labels)

In [20]:
dt_action = DecisionTreeClassifier()  # default columns: labelCol='label', featuresCol='features', predictionCol='prediction'

### Check the sub-model quality

----

## One pipeline and one model for `Business Area` & `Action`

---

In [21]:
dt_action = DecisionTreeClassifier()

In [22]:
vector_assembler = VectorAssembler(inputCols=["gender_ix", "customer_status_ix", "status_ix", "owner_ix", "Children", "Age", "Satisfaction", 'prediction_area'], outputCol="features")

In [23]:
pipeline = Pipeline(stages=[tokenizer, hashing_tf, idf, string_indexer_area, dt_area, labelConverter, string_indexer_gender, string_indexer_customer_status, string_indexer_status, string_indexer_action, string_indexer_owner, vector_assembler, dt_action, label_action_converter])

In [24]:
model = pipeline.fit(train_data)

In [25]:
predictions = model.transform(test_data)
predictions.show(2)

+---+------+------+--------+-----+---------------+---------+--------------------+------------+--------------------+------------+--------------------+--------------------+--------------------+----------+--------------------+--------------------+---------------+--------------------+---------+------------------+---------+-----+--------+--------------------+--------------------+--------------------+----------+--------------------+
| ID|Gender|Status|Children|  Age|Customer_Status|Car_Owner|    Customer_Service|Satisfaction|       Business_Area|      Action|               words|                hash|       area_features|area_label|  rawPrediction_area|    probability_area|prediction_area|  predictedAreaLabel|gender_ix|customer_status_ix|status_ix|label|owner_ix|            features|       rawPrediction|         probability|prediction|predictedActionLabel|
+---+------+------+--------+-----+---------------+---------+--------------------+------------+--------------------+------------+----------

In [26]:
evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictions)

print("Accuracy = %g" % accuracy)

Accuracy = 0.744186


In [27]:
predictions.select('Action', 'predictedActionLabel').groupBy('Action', 'predictedActionLabel').count().show()

+--------------------+--------------------+-----+
|              Action|predictedActionLabel|count|
+--------------------+--------------------+-----+
|On-demand pickup ...|        Free Upgrade|    2|
|On-demand pickup ...|On-demand pickup ...|    1|
|        Free Upgrade|        Free Upgrade|    5|
|        Free Upgrade|    Premium features|    2|
|On-demand pickup ...|    Premium features|    1|
|             Voucher|             Voucher|    1|
|    Premium features|             Voucher|    2|
|                  NA|                  NA|   25|
|        Free Upgrade|On-demand pickup ...|    1|
|    Premium features|        Free Upgrade|    3|
+--------------------+--------------------+-----+
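The accuracy reported by `MulticlassClassificationEvaluator` is the share of test rows on the diagonal of this table: 25 + 5 + 1 + 1 = 32 of 43 rows, or 0.744186. Because `NA` dominates the labels, per-class recall and a majority-class baseline give a fuller picture. A pure-Python sketch over the counts copied from the table (the truncated `On-demand pickup ...` label is written out in full, as listed by `string_indexer_action.labels`):

```python
from collections import defaultdict

# (actual Action, predicted Action) -> number of test rows, from the table above
confusion_counts = {
    ("On-demand pickup location", "Free Upgrade"): 2,
    ("On-demand pickup location", "On-demand pickup location"): 1,
    ("Free Upgrade", "Free Upgrade"): 5,
    ("Free Upgrade", "Premium features"): 2,
    ("On-demand pickup location", "Premium features"): 1,
    ("Voucher", "Voucher"): 1,
    ("Premium features", "Voucher"): 2,
    ("NA", "NA"): 25,
    ("Free Upgrade", "On-demand pickup location"): 1,
    ("Premium features", "Free Upgrade"): 3,
}


def accuracy(counts):
    # Correct predictions (diagonal) divided by all test rows
    correct = sum(n for (actual, predicted), n in counts.items() if actual == predicted)
    return correct / sum(counts.values())


def support(counts):
    # Number of test rows per actual label
    totals = defaultdict(int)
    for (actual, _), n in counts.items():
        totals[actual] += n
    return totals


def recall_per_class(counts):
    # Share of each actual label's rows that were predicted correctly
    totals = support(counts)
    hits = defaultdict(int)
    for (actual, predicted), n in counts.items():
        if actual == predicted:
            hits[actual] += n
    return {label: hits[label] / total for label, total in totals.items()}


def majority_baseline(counts):
    # Accuracy of always predicting the most frequent actual label
    totals = support(counts)
    return max(totals.values()) / sum(totals.values())


print("accuracy = %g" % accuracy(confusion_counts))  # accuracy = 0.744186
print(recall_per_class(confusion_counts))
print("majority baseline = %.3f" % majority_baseline(confusion_counts))
```

On this test set, always predicting `NA` would already score 25/43, about 0.58; 25 of the 32 correct predictions are `NA` rows, and no `Premium features` row is predicted correctly.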



In [28]:
predictions_train = model.transform(train_data)

In [29]:
from pyspark.sql.functions import col

predictions_train.select('ID', 'Customer_Service', 'predictedAreaLabel', 'predictedActionLabel').filter(~col('predictedActionLabel').isin(['NA', 'Voucher'])).show(100, truncate=False)

+----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-------------------------+
|ID  |Customer_Service                                                                                                                                                                                                                                                                                                                       |predictedAreaLabel                |predictedActionLabel     |
+----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

<a id="persistence"></a>
## 4. Store the model in the WML repository

First, you must install and import the `watson-machine-learning-client` library.

**Note**: Python 3.5 and Apache Spark 2.1 are required.

In [30]:
!rm -rf $PIP_BUILD/watson-machine-learning-client

In [31]:
!pip install watson-machine-learning-client --upgrade

Requirement already up-to-date: watson-machine-learning-client in /usr/local/src/conda3_runtime.v40/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages
Collecting tqdm (from watson-machine-learning-client)
  Downloading https://files.pythonhosted.org/packages/93/24/6ab1df969db228aed36a648a8959d1027099ce45fad67532b9673d533318/tqdm-4.23.4-py2.py3-none-any.whl (42kB)
    100% |████████████████████████████████| 51kB 6.8MB/s eta 0:00:01
Requirement already up-to-date: tabulate in /usr/local/src/conda3_runtime.v40/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages (from watson-machine-learning-client)
Collecting urllib3 (from watson-machine-learning-client)
  Downloading https://files.pythonhosted.org/packages/bd/c9/6fdd990019071a4a32a5e7cb78a1d92c53851ef4f56f62a3486e6a7d8ffb/urllib3-1.23-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 6.3MB/s eta 0:00:01
Collecting certifi (from watson-machine-learning-client)
  Downloading https:/

In [86]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

In [33]:
!pip install --upgrade requests

Requirement already up-to-date: requests in /gpfs/global_fs01/sym_shared/YS1Prod/user/s882-69b8828e03df52-01f17dcd4c8c/.local/lib/python3.5/site-packages
Requirement already up-to-date: urllib3<1.24,>=1.21.1 in /gpfs/global_fs01/sym_shared/YS1Prod/user/s882-69b8828e03df52-01f17dcd4c8c/.local/lib/python3.5/site-packages (from requests)
Requirement already up-to-date: chardet<3.1.0,>=3.0.2 in /usr/local/src/conda3_runtime.v40/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages (from requests)
Requirement already up-to-date: certifi>=2017.4.17 in /gpfs/global_fs01/sym_shared/YS1Prod/user/s882-69b8828e03df52-01f17dcd4c8c/.local/lib/python3.5/site-packages (from requests)
Requirement already up-to-date: idna<2.8,>=2.5 in /gpfs/global_fs01/sym_shared/YS1Prod/user/s882-69b8828e03df52-01f17dcd4c8c/.local/lib/python3.5/site-packages (from requests)


In [87]:
# @hidden_cell

# YS1 
wml_credentials = {
    "apikey"    : "value",
    "instance_id" : "instance_id",
    "url"    : "url"
}

In [88]:
client = WatsonMachineLearningAPIClient(wml_credentials)

In [89]:
client.repository.list_models()

------------------------------------  ------------------  ------------------------  ---------
GUID                                  NAME                CREATED                   FRAMEWORK
4d9f1dfb-bf27-49a0-bedf-be9fd1021991  Car Rental Model D  2018-06-19T08:50:21.763Z  mllib-2.1
------------------------------------  ------------------  ------------------------  ---------


In [90]:
client.repository.get_model_details('4d9f1dfb-bf27-49a0-bedf-be9fd1021991')

{'entity': {'deployed_version': {'created_at': '2018-06-19T09:12:34.617Z',
   'guid': 'd37ef866-20c4-435c-87e2-de0700c8c0e0',
   'url': 'https://ibm-watson-ml.stage1.mybluemix.net/v3/ml_assets/models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/versions/d37ef866-20c4-435c-87e2-de0700c8c0e0'},
  'deployments': {'count': 1,
   'url': 'https://ibm-watson-ml.stage1.mybluemix.net/v3/wml_instances/4fcec734-03b8-48f9-a360-987fe2897838/published_models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/deployments'},
  'evaluation_metrics_url': 'https://ibm-watson-ml.stage1.mybluemix.net/v3/wml_instances/4fcec734-03b8-48f9-a360-987fe2897838/published_models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/evaluation_metrics',
  'feedback_url': 'https://ibm-watson-ml.stage1.mybluemix.net/v3/wml_instances/4fcec734-03b8-48f9-a360-987fe2897838/published_models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/feedback',
  'input_data_schema': {'fields': [{'metadata': {'name': 'ID', 'scale': 0},
     'name': 'ID',
     'nullable': True,
     't

### 4.2 Save the pipeline and model<a id="save"></a>

In [37]:
# @hidden_cell
db2_service_credentials = {
  "port": 50000,
  "db": "BLUDB",
  "username": "dash6973",
  "ssljdbcurl": "jdbc:db2://dashdb-entry-yp-dal10-01.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;",
  "host": "dashdb-entry-yp-dal10-01.services.dal.bluemix.net",
  "https_url": "https://dashdb-entry-yp-dal10-01.services.dal.bluemix.net:8443",
  "dsn": "DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal10-01.services.dal.bluemix.net;PORT=50000;PROTOCOL=TCPIP;UID=dash6973;PWD=<db2-password>;",
  "hostname": "dashdb-entry-yp-dal10-01.services.dal.bluemix.net",
  "jdbcurl": "jdbc:db2://dashdb-entry-yp-dal10-01.services.dal.bluemix.net:50000/BLUDB",
  "ssldsn": "DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal10-01.services.dal.bluemix.net;PORT=50001;PROTOCOL=TCPIP;UID=dash6973;PWD=<db2-password>;Security=SSL;",
  "uri": "db2://dash6973:<db2-password>@dashdb-entry-yp-dal10-01.services.dal.bluemix.net:50000/BLUDB",
  "password": "<db2-password>"
}

In [38]:
training_data_reference = {
 "name": "Car rental training data",
 "connection": db2_service_credentials,
 "source": {
  "tablename": table_name,
  "type": "dashdb"
 }
}


In [39]:
model_props = {
    client.repository.ModelMetaNames.NAME: "Car Rental Model D",
    client.repository.ModelMetaNames.TRAINING_DATA_REFERENCE: training_data_reference,
    client.repository.ModelMetaNames.EVALUATION_METHOD: "multiclass",
    client.repository.ModelMetaNames.EVALUATION_METRICS: [
        {
           "name": "accuracy",
           "value": accuracy,
           "threshold": 0.8
        }
    ]
}

In [40]:
published_model_details = client.repository.store_model(model=model, meta_props=model_props, training_data=train_data, pipeline=pipeline)

In [41]:
model_uid = client.repository.get_model_uid(published_model_details)
print(model_uid)

4d9f1dfb-bf27-49a0-bedf-be9fd1021991


List the deployments in your Watson Machine Learning instance.

**Tip**: Use `client.repository.ModelMetaNames.show()` to get the list of available props.

In [85]:
client.deployments.list()

------------------------------------  ---------------------  ------  --------------  ------------------------  ---------
GUID                                  NAME                   TYPE    STATE           CREATED                   FRAMEWORK
4843b7af-e6d8-4685-866c-4158b313e5bb  Car Rental Deployment  online  DEPLOY_SUCCESS  2018-06-19T08:50:39.006Z  mllib-2.1
------------------------------------  ---------------------  ------  --------------  ------------------------  ---------


<a id="scoring"></a>
## 5. Deploy and score in the cloud

In this section you will learn how to create an online deployment and score a new data record by using the `watson-machine-learning-client`.

**Note:** You can also use the REST API to deploy and score.
For more information about REST APIs, see the [Swagger Documentation](http://watson-ml-api.mybluemix.net/).

#### Create an online deployment for the published model.

In [42]:
deployment_details = client.deployments.create(model_uid=model_uid, name='Car Rental Deployment')



#######################################################################################

Synchronous deployment creation for uid: '4d9f1dfb-bf27-49a0-bedf-be9fd1021991' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='4843b7af-e6d8-4685-866c-4158b313e5bb'
------------------------------------------------------------------------------------------------




#### Get the online scoring endpoint.

In [43]:
scoring_url = client.deployments.get_scoring_url(deployment_details)
print(scoring_url)

https://ibm-watson-ml.stage1.mybluemix.net/v3/wml_instances/4fcec734-03b8-48f9-a360-987fe2897838/published_models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/deployments/4843b7af-e6d8-4685-866c-4158b313e5bb/online


Now, you can send new scoring records (new data) for which you would like to get predictions. To do that, run the following sample code: 

In [67]:
fields = ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status','Car_Owner', 'Customer_Service', 'Satisfaction']
values = [3785, 'Male', 'S', 1, 17, 'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 0]

In [68]:
payload_scoring = {"fields": fields,"values": [values]}

scoring = client.deployments.score(scoring_url, payload_scoring)
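The scoring payload holds two parallel structures: a `fields` list of column names and a `values` list with one row per record, in the same column order. A minimal sketch of building it from Python dicts, assuming records arrive as dicts keyed by column name; `build_scoring_payload` and `to_json_value` are hypothetical helpers, not part of `watson-machine-learning-client`. The `Decimal` conversion matters when records come from `df_data`, where `Age` is `decimal(6,2)` and the standard `json` module cannot serialize `Decimal`.

```python
import json
from decimal import Decimal


def to_json_value(value):
    # Spark decimal columns such as Age arrive as decimal.Decimal,
    # which json.dumps rejects, so send them as float instead
    if isinstance(value, Decimal):
        return float(value)
    return value


def build_scoring_payload(fields, records):
    """Return {"fields": [...], "values": [[...], ...]} with one values row per record dict."""
    return {
        "fields": list(fields),
        "values": [[to_json_value(record[name]) for name in fields] for record in records],
    }


record = {"ID": 587, "Gender": "Female", "Age": Decimal("42.39")}
payload = build_scoring_payload(["ID", "Gender", "Age"], [record])
print(json.dumps(payload))
```

The result has the same shape as `payload_scoring` above, so it could be passed to `client.deployments.score(scoring_url, payload)` in the same way.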


In [70]:
scoring

{'fields': ['ID',
  'Gender',
  'Status',
  'Children',
  'Age',
  'Customer_Status',
  'Car_Owner',
  'Customer_Service',
  'Satisfaction',
  'Business_Area',
  'Action',
  'words',
  'hash',
  'area_features',
  'area_label',
  'rawPrediction_area',
  'probability_area',
  'prediction_area',
  'predictedAreaLabel',
  'gender_ix',
  'customer_status_ix',
  'status_ix',
  'label',
  'owner_ix',
  'features',
  'rawPrediction',
  'probability',
  'prediction',
  'predictedActionLabel'],
 'values': [[3785,
   'Male',
   'S',
   1,
   17.0,
   'Inactive',
   'Yes',
   'The car should have been brought to us instead of us trying to find it in the lot.',
   0,
   'Service: Knowledge',
   'NA',
   ['the',
    'car',
    'should',
    'have',
    'been',
    'brought',
    'to',
    'us',
    'instead',
    'of',
    'us',
    'trying',
    'to',
    'find',
    'it',
    'in',
    'the',
    'lot.'],
   [262144,
    [9639,
     21872,
     74079,
     86175,
     91878,
     99585,
     1038
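The response mirrors the request: `fields` names every column the pipeline produced (including intermediate ones such as `words` and `hash`), and each entry in `values` is one scored record in that column order. A small sketch that keeps only the columns you care about; `rows_from_scoring` is a hypothetical helper, and the response used in the example is made up for illustration.

```python
def rows_from_scoring(response, columns):
    """Turn a {"fields": [...], "values": [[...], ...]} response into dicts holding only `columns`.

    Raises ValueError if a requested column is not in response["fields"].
    """
    positions = [response["fields"].index(name) for name in columns]
    return [dict(zip(columns, (row[i] for i in positions))) for row in response["values"]]


# Illustrative response with the same layout as the real one (values are invented)
example_response = {
    "fields": ["ID", "Gender", "predictedAreaLabel", "predictedActionLabel"],
    "values": [[1, "Male", "Product: Functioning", "Free Upgrade"],
               [2, "Female", "Service: Knowledge", "NA"]],
}
print(rows_from_scoring(example_response, ["ID", "predictedAreaLabel", "predictedActionLabel"]))
```

Applied to the real response, `rows_from_scoring(scoring, ['ID', 'predictedAreaLabel', 'predictedActionLabel'])` would return one dict per scored record.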

In [48]:
string_indexer_area.labels

['Service: Knowledge',
 'Product: Functioning',
 'Product: Availability/Variety/Size',
 'Service: Orders/Contracts',
 'Service: Accessibility',
 'Product: Pricing and Billing',
 'Service: Attitude',
 'Product: Information']

In [49]:
string_indexer_action.labels

['NA',
 'Free Upgrade',
 'On-demand pickup location',
 'Voucher',
 'Premium features']

---

## Payload logging

In [50]:
deployment_url = client.deployments.get_url(deployment_details)
print(deployment_url)
payload_logging_configuration_url = deployment_url + '/payload_logging_configuration'
print(payload_logging_configuration_url)

https://ibm-watson-ml.stage1.mybluemix.net/v3/wml_instances/4fcec734-03b8-48f9-a360-987fe2897838/published_models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/deployments/4843b7af-e6d8-4685-866c-4158b313e5bb
https://ibm-watson-ml.stage1.mybluemix.net/v3/wml_instances/4fcec734-03b8-48f9-a360-987fe2897838/published_models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/deployments/4843b7af-e6d8-4685-866c-4158b313e5bb/payload_logging_configuration


In [51]:
from pyspark.sql.types import *
import json

input_fields = filter(lambda f: f.name != "Action" and f.name != "Business_Area", train_data.schema.fields)

# Output schema = input fields (without the two labels) plus the two decoded predictions
output_schema = StructType(list(input_fields)). \
    add("predictedAreaLabel", StringType(), False). \
    add("predictedActionLabel", StringType(), False, {'modeling_role': 'decoded-target'})



In [52]:
print(json.dumps(output_schema.jsonValue(), indent=2))

{
  "fields": [
    {
      "metadata": {
        "name": "ID",
        "scale": 0
      },
      "type": "integer",
      "name": "ID",
      "nullable": true
    },
    {
      "metadata": {
        "name": "Gender",
        "scale": 0
      },
      "type": "string",
      "name": "Gender",
      "nullable": true
    },
    {
      "metadata": {
        "name": "Status",
        "scale": 0
      },
      "type": "string",
      "name": "Status",
      "nullable": true
    },
    {
      "metadata": {
        "name": "Children",
        "scale": 0
      },
      "type": "integer",
      "name": "Children",
      "nullable": true
    },
    {
      "metadata": {
        "name": "Age",
        "scale": 2
      },
      "type": "decimal(6,2)",
      "name": "Age",
      "nullable": true
    },
    {
      "metadata": {
        "name": "Customer_Status",
        "scale": 0
      },
      "type": "string",
      "name": "Customer_Status",
      "nullable": true
    },
    {
      "metadat

In [55]:
# enable payload logging
payload_logging_configuration = {
    "payload_store": {
        "type": "postgresql",
        "location": {
            "tablename": "car_rental_payload_D"
        },
        "connection": {
            "uri": "postgres://admin:<password>@sl-us-south-1-portal.21.dblayer.com:42618/compose"
        }
    },
    "labels": string_indexer_action.labels,
    "output_data_schema": output_schema.jsonValue()
}

In [57]:
import requests

payload_logging_configuration_response = requests.put(payload_logging_configuration_url, json=payload_logging_configuration, headers=client._get_headers())
print(payload_logging_configuration_response)
payload_logging_configuration_response.json()



<Response [200]>


{'labels': [{'probability_column': 'probability_na', 'value': 'NA'},
  {'probability_column': 'probability_free_upgrade', 'value': 'Free Upgrade'},
  {'probability_column': 'probability_on-demand_pickup_location',
   'value': 'On-demand pickup location'},
  {'probability_column': 'probability_voucher', 'value': 'Voucher'},
  {'probability_column': 'probability_premium_features',
   'value': 'Premium features'}],
 'output_data_schema': {'fields': [{'metadata': {'name': 'ID', 'scale': 0},
    'name': 'ID',
    'nullable': True,
    'type': 'integer'},
   {'metadata': {'name': 'Gender', 'scale': 0},
    'name': 'Gender',
    'nullable': True,
    'type': 'string'},
   {'metadata': {'name': 'Status', 'scale': 0},
    'name': 'Status',
    'nullable': True,
    'type': 'string'},
   {'metadata': {'name': 'Children', 'scale': 0},
    'name': 'Children',
    'nullable': True,
    'type': 'integer'},
   {'metadata': {'name': 'Age', 'scale': 2},
    'name': 'Age',
    'nullable': True,
    'typ
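The configuration response lists one probability column per `Action` label. Judging only from these five labels, the names appear to be `probability_` followed by the lower-cased label with spaces replaced by underscores. The sketch below reproduces that pattern; it is an inference from this response, not documented service behavior, and how other punctuation (such as `/`) is handled is unknown.

```python
import re


def probability_column(label):
    # Inferred pattern: "probability_" + lower-cased label, runs of whitespace -> "_"
    return "probability_" + re.sub(r"\s+", "_", label.strip().lower())


action_labels = ["NA", "Free Upgrade", "On-demand pickup location", "Voucher", "Premium features"]
for label in action_labels:
    print(label, "->", probability_column(label))
```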

## Let's score all records so that their payloads are logged to the database

In [91]:
df_data.printSchema()

root
 |-- ID: integer (nullable = true)
 |-- Gender: string (nullable = true)
 |-- Status: string (nullable = true)
 |-- Children: integer (nullable = true)
 |-- Age: decimal(6,2) (nullable = true)
 |-- Customer_Status: string (nullable = true)
 |-- Car_Owner: string (nullable = true)
 |-- Customer_Service: string (nullable = true)
 |-- Satisfaction: integer (nullable = true)
 |-- Business_Area: string (nullable = true)
 |-- Action: string (nullable = true)



In [92]:
pd_data = df_data.drop('Business_Area', 'Action').toPandas()

In [93]:
pd_data.columns

Index(['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status',
       'Car_Owner', 'Customer_Service', 'Satisfaction'],
      dtype='object')

In [94]:
fields = ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status','Car_Owner', 'Customer_Service', 'Satisfaction']

In [95]:
# Build one list of values per record, in pd_data column order. Age is a Decimal
# (decimal(6,2) in Spark), which is not JSON serializable, so convert it to float.
values_list = [
    [float(row[c]) if c == 'Age' else row[c] for c in pd_data.columns]
    for _, row in pd_data.iterrows()
]

print(len(values_list))

243


In [96]:
payload_scoring = {"fields": fields,"values": values_list}
# print(payload_scoring)

In [97]:
len(payload_scoring['values'])

243

In [99]:
scoring_url = client.deployments.get_scoring_url(client.deployments.get_details('4843b7af-e6d8-4685-866c-4158b313e5bb'))

In [100]:
print(scoring_url)

https://ibm-watson-ml.stage1.mybluemix.net/v3/wml_instances/4fcec734-03b8-48f9-a360-987fe2897838/published_models/4d9f1dfb-bf27-49a0-bedf-be9fd1021991/deployments/4843b7af-e6d8-4685-866c-4158b313e5bb/online


In [106]:
scoring = client.deployments.score(scoring_url, payload_scoring)

In [108]:
print(scoring)

{'fields': ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status', 'Car_Owner', 'Customer_Service', 'Satisfaction', 'Business_Area', 'Action', 'words', 'hash', 'area_features', 'area_label', 'rawPrediction_area', 'probability_area', 'prediction_area', 'predictedAreaLabel', 'gender_ix', 'customer_status_ix', 'status_ix', 'label', 'owner_ix', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedActionLabel'], 'values': [[587, 'Female', 'D', 1, 42.39, 'Active', 'No', 'I would like them to be more speedy.  Also I would like the rental company to check the car for previous damage while I am standing there.', 0, 'Service: Knowledge', 'NA', ['i', 'would', 'like', 'them', 'to', 'be', 'more', 'speedy.', '', 'also', 'i', 'would', 'like', 'the', 'rental', 'company', 'to', 'check', 'the', 'car', 'for', 'previous', 'damage', 'while', 'i', 'am', 'standing', 'there.'], [262144, [16332, 24417, 33182, 34116, 38765, 42742, 68443, 68867, 79323, 84798, 92734, 103838, 118308, 147136

