# A Hands-On AI Class for Managers
## Lab 1: A Python Quick-Start Guide to Editing Parameters and Running Code in Jupyter Notebook

<br>
<br><br>
<img style="float: right;" src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/2560px-IBM_logo.svg.png" width="200">
徐志煌$^{1}$, 阮公義$^{1}$<br>
<br>
$^1$Rookie Data Scientist, Cloud and Cognitive Software, IBM Taiwan<br>

2019/08/28, 恆逸 Education and Training Center, Taipei

<br>

---

<img style="float: right;" src="https://raw.githubusercontent.com/nghia1991ad/wsl-workshop-material/master/screenshot/data-science-element.png" width="450">

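# Data Science
<br><br>
## "Data science is a concept that unites statistics, data analysis,<br>machine learning, and related methods in order to understand and<br>analyze actual phenomena through data." - Hayashi (1998)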

<img style="float: right;" src="https://raw.githubusercontent.com/nghia1991ad/wsl-workshop-material/master/screenshot/methodology.png" width="600">

## Data Science Methodology


Reference: https://www.ibmbigdatahub.com/blog/why-we-need-methodology-data-science

<img style="float: right;" src="https://raw.githubusercontent.com/nghia1991ad/wsl-workshop-material/master/screenshot/logo.jpg" width="400">

<br>

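## Using Jupyter Notebook as an Interactive Data Science Environment

<br>
* Fits the "process, compute, analyze" workflow of data analysis<br>
* Supports many programming languages, including Python, R, and MATLAB<br>
* Easy to share as a web page, with export to many document formats<br>
* Watson Studio adds cloud data access and deployment

## Basic Jupyter Notebook Operations

1. Run code: select the cell you want to run, then click the __Run__ button in the toolbar or press __Shift+Enter__.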

---

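## Hands-On Case Study
In this lab we show how to build a simple AI prediction model with Watson Studio and Jupyter Notebook. We work through one business use case, from initial data collection to final model deployment.

## Table of Contents<a class="anchor" id="toc"></a>
* [1. Business Understanding](#business-understanding)
* [2. Data Collection](#data-collection)
* [3. Data Understanding](#data-understanding)
* [4. Data Preparation](#data-preparation)
* [5. Modeling](#modeling)
* [6. Model Evaluation](#evaluation)
* [7. Model Deployment](#deployment)
* [8. Accessing the Application](#app_accessing)
* [9. Closing Remarks](#concluding_remarks)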

---

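## 1. Business Understanding <a class="anchor" id="business-understanding"></a>
[Back to table of contents](#toc)

### Topic: Data-driven prediction of bank telemarketing outcomes
#### Summary: The data comes from the direct marketing (phone call) campaigns of a Portuguese banking institution. The goal of the project is to predict whether a client will subscribe to a term deposit (variable y).



Reference: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing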

---

## 2. Data Collection <a class="anchor" id="data-collection"></a>

[Back to table of contents](#toc)
<br>
<!--
### 2.1. Create a notebook
1. In Watson Studio, click __Add to project__.
2. Click __Notebook__.
3. Select __From file__ at the top.
4. Click __Choose file__ on the right, select the file, then click __Create Notebook__.
<br>(Screenshots: [steps 1, 2](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-add-notebook1.jpg?raw=true), [steps 3, 4](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-add-notebook.jpg?raw=true))
-->
### 2.1. Upload the dataset
1. In Watson Studio, click __Add to project__.
2. Click __Data__.
3. Drag the data file into the side panel to upload it.
<br>(Screenshot: [steps 1, 2, 3](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-add-data.jpg?raw=true))

### 2.2. Read the .csv file stored in Watson Studio (bank marketing data)
1. Click the cell where you want to insert the data.
2. Click the icon on the right side (shown in the screenshot).
3. Under bank.csv, click __Insert to code__.
4. Click __Insert pandas DataFrame__ to insert the data-loading code.
<br>(Screenshot: [steps 2, 3, 4](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-insert-data.jpg?raw=true))
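
The code that Watson Studio generates in step 4 depends on the project and its storage credentials, so it is not reproduced here. As a rough local equivalent, the sketch below loads a CSV into a pandas DataFrame with `pd.read_csv`. The three sample rows are made up, the comma delimiter is an assumption, and the name `df_data_1` is chosen to match the name used by later cells.

```python
import io
import pandas as pd

# Hypothetical three-row sample laid out like bank.csv (values are made up).
sample_csv = io.StringIO(
    "age,job,marital,balance,month,day,y\n"
    "40,management,married,1787,oct,19,no\n"
    "33,services,single,-120,may,5,yes\n"
    "58,retired,divorced,2500,jan,12,no\n"
)

# Later cells refer to the loaded table as df_data_1, so the sketch uses the same name.
df_data_1 = pd.read_csv(sample_csv)

print(df_data_1.shape)
print(df_data_1.dtypes)
```

With a real file, the `io.StringIO` object would be replaced by a file path, and `sep` may need adjusting if the file does not use commas.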

In [None]:
# Insert the dataset from Watson Studio here (see the steps above); later cells use it as df_data_1


---
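## 3. Data Understanding <a class="anchor" id="data-understanding"></a>

[Back to table of contents](#toc)

### 3.1. Data features and modeling target

#### Dataset information:
The data comes from the direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls. Often more than one contact with the same client was needed to determine whether the client would subscribe to the product.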

~~~

Bank client data:

1 - age: (numeric)
2 - job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - education: (categorical: primary, secondary, tertiary and unknown)
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')
8 - balance: Balance of the individual.

Related with the last contact of the current campaign:

9 - contact: contact communication type (categorical: 'cellular','telephone')
10 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
11 - day: last contact day 
12 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

Other attributes:

13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

Output variable (desired target):
17 - y - has the client subscribed a term deposit? (binary: 'yes','no')
~~~
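
The note on `duration` above says the attribute should be kept only for benchmarking and dropped for a realistic model. A minimal sketch of the two feature sets, assuming a DataFrame that uses the column names listed above (the toy values are made up):

```python
import pandas as pd

# Toy rows using a few of the column names listed above (values are made up).
toy = pd.DataFrame({
    "age": [40, 33, 58],
    "duration": [0, 185, 420],
    "campaign": [1, 2, 1],
    "y": ["no", "no", "yes"],
})

# Benchmark feature set: keeps duration, which is only known after the call ends.
benchmark_features = toy.drop(columns=["y"])

# Realistic feature set: drops duration so the model only sees pre-call information.
realistic_features = toy.drop(columns=["y", "duration"])

print(list(benchmark_features.columns))
print(list(realistic_features.columns))
```

This lab keeps `duration` among its numeric features, so its results are best read as benchmark figures.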

In [None]:
import numpy as np
import pandas as pd  # pd is used by later cells
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

# Work on a copy of the data without the 'day' column
df = df_data_1.drop('day',axis=1).copy()
# Convert month abbreviations to numbers
months = {'jan':1,'feb':2,'mar':3,'apr':4,'may':5,'jun':6,'jul':7,'aug':8,'sep':9,'oct':10,'nov':11,'dec':12}
df.month = df.month.map(months)

print("The dimension of the data is {} rows and {} columns".format(df.shape[0],df.shape[1]))


### 3.2. Exploring the data and its features
### 3.2.1. Descriptive statistics

In [None]:
df.describe(include="all")

### 3.2.2. Continuous features

In [None]:
# Replace the <placeholder> in the code below with the matching variable name

# Continuous features: 'age','balance','duration','campaign','pdays','previous'
# Enter the name of one continuous feature below to view its distribution
var_name = '<feature name>'

fig, (ax1, ax2) = plt.subplots(nrows = 1, ncols = 2, figsize = (12, 5))

# Histogram of the selected feature (distplot is deprecated in newer seaborn releases in favor of histplot)
sns.distplot(df[var_name],kde=False,color='green', ax = ax1)
sns.despine(ax = ax1)
ax1.set_title(var_name.capitalize() + ' Count Distribution', fontsize=15)
ax1.set_xlabel(var_name.capitalize(), fontsize=15)
ax1.set_ylabel('Count', fontsize=15)

# Vertical box plot: the feature goes on the y axis so that it matches the y label set below
sns.boxplot(y = var_name, data = df, ax = ax2)
sns.despine(ax = ax2)
ax2.set_title(var_name.capitalize() + ' Distribution', fontsize=15)
ax2.set_ylabel(var_name.capitalize(), fontsize=15)
plt.show()

### 3.2.3. Categorical features

In [None]:
# Replace the <placeholder> in the code below with the matching variable name

# Categorical features: 'job', 'marital', 'education', 'default','housing', 'loan', 'contact', 'month','poutcome'
# Enter the name of one categorical feature below to view its distribution
var_name = '<feature name>'

plt.figure(figsize=(10, 5))
sns.countplot(x = var_name, data = df)
sns.despine()
plt.title(var_name.capitalize() + ' Count Distribution', fontsize=15)
plt.xlabel(var_name.capitalize(), fontsize=15)
plt.ylabel('Count', fontsize=15)
# rotation takes a number of degrees, not a string
plt.xticks(rotation=45,horizontalalignment='right')
plt.tick_params(labelsize=15)
plt.show()

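### 3.2.4. Correlation coefficients

In [None]:
from sklearn.preprocessing import LabelEncoder

# Encode every column as integer codes so that categorical columns can enter the correlation matrix
df_encoder = df.apply(LabelEncoder().fit_transform)
plt.figure(figsize=(12,10))
# Heatmap of the pairwise correlations between the encoded columns
sns.heatmap(df_encoder.corr(),annot=True,fmt=".2f",cmap="seismic",vmin=-1,vmax=1)
plt.show()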


---
## 4. Data Preparation <a class="anchor" id="data-preparation"></a>

[Back to table of contents](#toc)
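
This section turns categorical features into 0/1 columns and standardizes the continuous ones. The sketch below shows both steps on a toy table so the effect is easy to inspect. The column values are made up, and only `OneHotEncoder` and `StandardScaler` from scikit-learn are used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with one categorical and one numeric column (values are made up).
toy = pd.DataFrame({
    "marital": ["married", "single", "married", "divorced"],
    "age": [30, 40, 50, 60],
})

# One-hot encoding: each category becomes its own 0/1 column.
encoder = OneHotEncoder()
X_cat = encoder.fit_transform(toy[["marital"]]).toarray()

# Standardization: subtract the column mean and divide by its standard deviation.
X_num = StandardScaler().fit_transform(toy[["age"]])

# Stack both blocks side by side into one feature matrix.
X_toy = np.concatenate((X_cat, X_num), axis=1)

print(encoder.categories_)
print(X_toy.shape)
```

The three `marital` categories become three columns, and together with the scaled `age` column the toy matrix has four columns.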

In [None]:
df.head()

In [None]:
from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder

# Convert the no/yes labels to 0/1
y = LabelEncoder().fit_transform(df['y'])

categorical_features = ['job', 'marital', 'education', 'default','housing',
                        'loan', 'contact', 'month','poutcome']
numeric_features = ['age','balance','duration','campaign','pdays','previous']

# Replace the <placeholder> in the code below with the matching variable name
# Expand each categorical feature into one 0/1 column per category
X_categorical = OneHotEncoder().fit_transform(df[<categorical feature list>]).toarray()
# Standardize the continuous features
X_numeric = StandardScaler().fit_transform(df[<continuous feature list>])

X = np.concatenate((X_categorical,X_numeric),axis=1)
X.shape

---
## 5. Modeling <a class="anchor" id="modeling"></a>
[Back to table of contents](#toc)
### 5.1. Split the data into training and test sets
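
To make the 3:1 split concrete, here is a small sketch on toy arrays. The `random_state` and `stratify` arguments are additions in this sketch, not part of the notebook's own cell: the first makes the split reproducible without a global seed, and the second keeps the class ratio the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples with 3 features and a balanced 0/1 label (values are made up).
X_toy = np.arange(300).reshape(100, 3)
y_toy = np.array([0, 1] * 50)

# test_size=0.25 gives a 3:1 train-to-test ratio.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=100, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
```

With 100 samples, 75 rows land in the training set and 25 in the test set.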

In [None]:
from sklearn.model_selection import train_test_split
np.random.seed(100)

# Split the data into training and test sets at a 3:1 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [None]:
print('Training Features Shape:', X_train.shape)
print('Training Labels Shape:', y_train.shape)
print('Testing Features Shape:', X_test.shape)
print('Testing Labels Shape:', y_test.shape)

### 5.2. Build the prediction models

With the data prepared, we can now train machine learning classification models.
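
Each model in this section is summarized with a confusion matrix plus precision and recall. As a small worked example, the sketch below computes these numbers for made-up labels, using scikit-learn's `confusion_matrix` and the same formulas, TP/(TP+FP) and TP/(TP+FN).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels (made up): 1 means "yes", 0 means "no".
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# For binary 0/1 labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of predicted "yes" that were actually "yes"
recall = tp / (tp + fn)     # share of actual "yes" that the model found

print(tn, fp, fn, tp)
print(precision, recall)

# The library helpers give the same values.
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```

Here three of the four predicted "yes" cases are correct and three of the four actual "yes" cases are found, so both precision and recall are 0.75.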

In [None]:
def plot_cm(cm):
    # Draw a 2x2 confusion matrix and print precision and recall
    plt.figure(figsize=(5,5))
    # For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
    tn, fp, fn, tp = cm.ravel()
    # Rows are predictions (Yes, No); columns are actual values (Yes, No)
    cm = [["{}\n (TP)".format(tp),"{}\n (FP)".format(fp)],
          ["{}\n (FN)".format(fn),"{}\n (TN)".format(tn)]]
    # The [[1,0],[0,1]] matrix only sets the colors: green for correct cells, pink for errors
    ax = sns.heatmap([[1,0],[0,1]], annot=np.array(cm), cmap=['pink','lightgreen'],
                fmt="s",cbar=False,xticklabels=['Yes',"No"],yticklabels=['Yes','No'],annot_kws={"size": 12})
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predict")
    ax.xaxis.set_label_position('top') 
    ax.tick_params(axis='both', which='major', labelsize=12, labelbottom = False, bottom=False, top = False, labeltop=True,left=False)
    plt.show()
    print("The precision = TP/(TP+FP) = {}/({}+{}) = {:.3f}".format(tp,tp,fp,tp/(tp+fp)))
    print("The recall = TP/(TP+FN) = {}/({}+{}) = {:.3f}".format(tp,tp,fn,tp/(tp+fn)))

In [None]:
# Logistic Regression
from sklearn.metrics import confusion_matrix, recall_score, precision_score
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression() 
logmodel.fit(X_train,y_train)
logpred = logmodel.predict(X_test)

plot_cm(confusion_matrix(y_test, logpred))

In [None]:
# Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
rfcpred = rfc.predict(X_test)

plot_cm(confusion_matrix(y_test,rfcpred))

In [None]:
# Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
dtreepred = dtree.predict(X_test)

plot_cm(confusion_matrix(y_test, dtreepred))

In [None]:
# Gradient Boosting Classifier
from sklearn.ensemble import GradientBoostingClassifier
gb = GradientBoostingClassifier()
gb.fit(X_train, y_train)
gbpred = gb.predict(X_test)

plot_cm(confusion_matrix(y_test,gbpred))

---
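## 6. Model Evaluation <a class="anchor" id="evaluation"></a>
[Back to table of contents](#toc)

We evaluate each prediction model with AUC ROC (note 1).<br>
We then use feature importance to analyze how strongly each feature influences the predictions.


Note 1: Area Under the Receiver Operating Characteristic Curve

### 6.1 AUC ROC

The area under the ROC curve measures how well a model separates the two classes; it is a different quantity from classification accuracy. An area of 1 represents a perfect test, and an area of 0.5 represents a test with no discriminating power.<br>A rough grading scale for the area is:

0.90-1.00 = excellent (A)<br>
0.80-0.90 = good (B)<br>
0.70-0.80 = fair (C)<br>
0.60-0.70 = poor (D)<br>
0.50-0.60 = fail (F)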
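
To show how the area is obtained, the sketch below computes it for four made-up samples with scikit-learn's `roc_curve` and `auc`. The labels and scores are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy labels and predicted probabilities of the positive class (values are made up).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold and returns one (FPR, TPR) point per threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# auc integrates the curve with the trapezoidal rule.
roc_auc = auc(fpr, tpr)
print(roc_auc)  # 0.75
```

The value 0.75 can be checked by hand: of the four positive-negative pairs, three have the positive sample scored higher than the negative one.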


In [None]:
from sklearn.metrics import roc_curve, auc

# Compute the AUC ROC of each model

#LOGMODEL
probs = logmodel.predict_proba(X_test)
preds = probs[:,1]
fprlog, tprlog, thresholdlog = roc_curve(y_test, preds)
roc_auclog = auc(fprlog, tprlog)

#RANDOM FOREST --------------------
probs = rfc.predict_proba(X_test)
preds = probs[:,1]
fprrfc, tprrfc, thresholdrfc = roc_curve(y_test, preds)
roc_aucrfc = auc(fprrfc, tprrfc)

#DECISION TREE ---------------------
probs = dtree.predict_proba(X_test)
preds = probs[:,1]
fprdtree, tprdtree, thresholddtree = roc_curve(y_test, preds)
roc_aucdtree = auc(fprdtree, tprdtree)

#GRADIENT BOOSTING -----------------
probs = gb.predict_proba(X_test)
preds = probs[:,1]
fprgb, tprgb, thresholdgb = roc_curve(y_test, preds)
roc_aucgb = auc(fprgb, tprgb)

# ALL PLOTS ----------------------------------
plt.figure(figsize = (8,8))
# Diagonal reference line: a model with no discriminating power
plt.plot([0,1], [0,1], 'r--',alpha=0.5)
# The line color is set once through the color keyword
plt.plot(fprdtree, tprdtree, label = 'Decision Tree (%.3f)'%roc_aucdtree, color='brown')
plt.plot(fprrfc, tprrfc, label = 'Random Forest (%.3f)'%roc_aucrfc, color='green')
plt.plot(fprlog, tprlog, label = 'Logistic (%.3f)'%roc_auclog, color='grey')
plt.plot(fprgb, tprgb, label = 'Gradient Boosting (%.3f)'%roc_aucgb, color='blue')
plt.title('Receiver Operating Characteristic Comparison',fontsize=20)
plt.ylabel('True Positive Rate',fontsize=15)
plt.xlabel('False Positive Rate',fontsize=15)
plt.legend(loc = 'lower right', prop={'size': 16})
plt.show()

In [None]:
# Summarize and compare the AUC ROC values of the models
models = pd.DataFrame({'Model':['Random Forest Classifier', 'Decision Tree Classifier','Logistic Model','Gradient Boosting'],
                       'AUC ROC':[roc_aucrfc,roc_aucdtree,roc_auclog,roc_aucgb]}) 
models.sort_values(by='AUC ROC', ascending=False)

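### 6.2 Feature Importance
Feature importance shows which variables have the most influence on the predictions of a trained model.
We then use cumulative feature importance to see how many features account for most of the total importance.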
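
As a small illustration of cumulative feature importance, the sketch below sorts a set of hypothetical normalized importances, accumulates them, and counts how many features are needed to reach 95% of the total. The numbers are made up and the 95% threshold is the one used in this section.

```python
import numpy as np

# Hypothetical normalized importances that sum to 1 (values are made up).
importances = np.array([0.06, 0.40, 0.04, 0.30, 0.20])

# Sort from most to least important, then accumulate.
sorted_importances = np.sort(importances)[::-1]
cumulative = np.cumsum(sorted_importances)

# Number of top features needed to reach at least 95% of the total importance.
n_features_95 = int(np.argmax(cumulative >= 0.95)) + 1

print(cumulative)
print(n_features_95)
```

In this toy case the top three features reach only about 0.90, so four features are needed to pass the 95% line.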

In [None]:
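# Collect the feature names in the same column order as X (one-hot columns first, then numeric)
# columns= forces the numeric 'month' column to be encoded and keeps its categories in numeric order,
# which matches OneHotEncoder; casting to str would sort them as text ('1','10','11',...)
list_features = pd.get_dummies(df[categorical_features], columns=categorical_features).columns.to_list()+numeric_features
feature_importances = pd.DataFrame(gb.feature_importances_,columns=["Importance"],index=list_features)
feature_importances.sort_values(by="Importance",ascending=False,inplace=True)

plt.figure(figsize=(10,10))
# Pass x and y by keyword; newer seaborn releases no longer accept them positionally
sns.barplot(x=feature_importances.Importance,y=feature_importances.index,palette='muted')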
plt.grid(axis='x',which='major')
plt.title('Feature Importance')
plt.xlabel('Normalized Importance')
plt.show()

In [None]:
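# Compute cumulative feature importance
cumulation_importance = feature_importances.Importance.cumsum()
n_features = len(cumulation_importance)
# Number of features needed to reach 95% cumulative importance (20 in the original run of this lab)
n_95 = int(np.argmax(cumulation_importance.values >= 0.95)) + 1

plt.figure(figsize=(9,6))
plt.step(np.arange(0,n_features+1,1), [0]+list(cumulation_importance),zorder=2)
# Draw the 95% cumulative importance reference line
plt.plot([0,n_features],[.95,.95],'r--',zorder=1)
# Mark the number of features at which cumulative importance reaches 95%
plt.plot([n_95,n_95],[0,1.1],'--',color='grey',zorder=1)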
plt.ylim(0,1.1)
plt.title('Cumulative Feature Importances')
plt.xlabel('Number of Features')
plt.ylabel('Cumulative Importance')
plt.show()

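### Conclusion

In the steps above, we first visualized the data and examined it with descriptive statistics, then trained several classification models to predict outcomes in this bank marketing dataset. Of the four models tried, Gradient Boosting achieved the highest AUC ROC (about 0.913), and we also examined its feature importances.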

---

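## 7. Model Deployment <a class="anchor" id="deployment"></a>
[Back to table of contents](#toc)
### 7.1. Package the model

Next, we package the "data preparation" steps together with the best model from "modeling" into a single object, which makes deployment easier.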
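
The benefit of packaging is that raw rows can be passed directly to the packaged object, which applies the same preparation before predicting. The sketch below shows this on toy data. It is not the notebook's model: `LogisticRegression` stands in for the gradient boosting classifier to keep the example small, and `handle_unknown="ignore"` is an addition so that unseen categories do not raise an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data (values are made up).
train = pd.DataFrame({
    "job": ["management", "services", "management", "retired", "services", "retired"],
    "age": [40, 33, 51, 64, 29, 70],
})
labels = [1, 0, 1, 0, 0, 1]

# Preprocessing: scale the numeric column, one-hot encode the categorical one.
preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["job"]),
])

# Preparation and model wrapped into a single object.
toy_clf = Pipeline(steps=[("preprocessor", preprocessor),
                          ("classifier", LogisticRegression())])
toy_clf.fit(train, labels)

# Raw, unprocessed rows go straight into predict.
new_rows = pd.DataFrame({"job": ["student", "management"], "age": [22, 45]})
predictions = toy_clf.predict(new_rows)
print(predictions)
```

The first new row uses a job category that never appeared in training; with `handle_unknown="ignore"` it is encoded as all zeros instead of causing a failure.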

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import GradientBoostingClassifier

categorical_features = ['job', 'marital', 'education', 'default','housing',
                        'loan', 'contact', 'month','poutcome']
numeric_features = ['age','balance','duration','campaign','pdays','previous']

categorical_transformer = Pipeline(steps=[('onehot', OneHotEncoder())])
numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])

# Wrap the preparation steps for each type of data
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])
# Wrap the data preparation and the best model into one pipeline
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', GradientBoostingClassifier())])
# Split the data with the same random seed as before
np.random.seed(100)
features = df.drop(['y'],axis=1)
labels = LabelEncoder().fit_transform(df['y'])
X2_train, X2_test, y2_train, y2_test = train_test_split(features, labels, test_size=0.25)
# Train the packaged model
clf.fit(X2_train,y2_train)

fpr, tpr, threshold = roc_curve(y2_test, clf.predict_proba(X2_test)[:,1])
roc_auc = auc(fpr, tpr)
print("model AUC ROC: %.3f" % roc_auc)

### 7.2. Create an authenticated client for the Watson Machine Learning service

__Get the credentials of the Watson Machine Learning service__
1. In the Watson Studio navigation menu on the left, select __Watson Services__ and open it in a new window.
2. Under __Machine Learning__, click the service you want to use. If there is none, click __Add__ __service__ at the top to create one, then open that service.
3. In the list on the right, click __Service credentials__. If no credentials exist yet, click __New credential__.
4. Click __View credentials__, copy the entire content, and store it in the __wml_credentials__ variable.
<br>(Screenshots: [step 1](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-add-deployment1.jpg?raw=true), [step 2](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-add-deployment2.jpg?raw=true), [step 3](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-add-wml-credentials.jpg?raw=true), [step 4](https://github.com/nghia1991ad/wsl-workshop-material/blob/master/screenshot/wsl-add-deployment4.JPG?raw=true))
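
Pasting credentials into a notebook makes them easy to leak when the notebook is shared. As one possible alternative, the sketch below builds the dictionary from environment variables and checks the two fields that later cells read explicitly, `apikey` and `instance_id`. The function name and the environment variable names are hypothetical, and the copied credentials contain further fields that the client may need, so this is only a partial sketch.

```python
import os

# Field names that later cells read from wml_credentials.
REQUIRED_KEYS = ("apikey", "instance_id")

def load_wml_credentials(env=os.environ):
    """Build a credentials dict from environment variables (hypothetical variable names)."""
    credentials = {
        "apikey": env.get("WML_APIKEY", ""),
        "instance_id": env.get("WML_INSTANCE_ID", ""),
    }
    # Fail early with a clear message if a field is missing or empty.
    missing = [key for key in REQUIRED_KEYS if not credentials[key]]
    if missing:
        raise ValueError("Missing credential fields: " + ", ".join(missing))
    return credentials

# Example with placeholder values instead of real secrets.
example = load_wml_credentials({"WML_APIKEY": "<your apikey>",
                                "WML_INSTANCE_ID": "<your instance id>"})
print(sorted(example.keys()))
```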

In [None]:
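# Create an authenticated client for the Watson Machine Learning service
# See the instructions above for how to obtain wml_credentials
wml_credentials = <paste the copied credentials here>

# Create the authenticated client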
from watson_machine_learning_client import WatsonMachineLearningAPIClient
client = WatsonMachineLearningAPIClient(wml_credentials)

### 7.3. Save the model as a project asset

In [None]:
# Model metadata
author_name = 'Taiwan_IBM_Cloud_Team'
framework_name = 'scikit-learn'
model_name = 'bank_marketing_gradient_boosting'
model_props = {client.repository.ModelMetaNames.AUTHOR_NAME: author_name, 
               client.repository.ModelMetaNames.FRAMEWORK_NAME: framework_name,
               client.repository.ModelMetaNames.NAME: model_name}

# Store the model in the project
stored_model = client.repository.store_model(model=clf, meta_props=model_props,
                                             training_data=features, training_target=labels)
model_uid = client.repository.get_model_uid(stored_model)
stored_model
# To delete the stored model:
# client.repository.delete(model_uid)

### 7.4. Deploy the model

In [None]:
# Deploy the model stored in the project
deployment_name = 'Bank Marketing Pipe Deployment'
deployment_details = client.deployments.create(model_uid, deployment_name)
deployment_uid = client.deployments.get_uid(deployment_details)
scoring_endpoint = client.deployments.get_scoring_url(deployment_details)
scoring_endpoint
# To delete the deployed model:
# client.deployments.delete(deployment_uid)

In [None]:
# Delete older models in the project that have the same name; this also deletes their deployments
uids = [m['metadata']['guid'] for m in client.repository.get_details()['models']['resources']]
# Keep the model that was just stored
uids.remove(model_uid)
for uid in uids:
    model_details = client.repository.get_model_details(uid)
    tmp_model = model_details['entity']['name']
    if tmp_model == model_name:
        client.repository.delete(uid)
        print('Old Model '+uid+' deleted.')

print(client.repository.list())
print(client.deployments.list())
# End of model deployment

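## 8. Accessing the Application <a class="anchor" id="app_accessing"></a>
[Back to table of contents](#toc)
### 8.1. Get the API URL

In [None]:
# This is the API URL saved in the previous step
scoring_endpoint
# If the deployment and access code live in different files, recreate the client to retrieve the API URL of the deployed model
# Get the uids of the deployed models; the most recently deployed model is last in the list
# uid = client.deployments.get_uids()[-1]
# scoring_endpoint = client.deployments.get_details(uid)['entity']['scoring_url']

### 8.2. Get ml_instance_id and the token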

In [None]:
import requests

# Read instance_id from the wml_credentials set earlier
ml_instance_id = wml_credentials["instance_id"]
# Read the Watson Machine Learning service apikey from the wml_credentials set earlier
apikey = wml_credentials["apikey"]
# Get an IAM token from IBM Cloud
url     = "https://iam.bluemix.net/oidc/token"
headers = { "Content-Type" : "application/x-www-form-urlencoded" }
data    = "apikey=" + apikey + "&grant_type=urn:ibm:params:oauth:grant-type:apikey"
IBM_cloud_IAM_uid = "bx"
IBM_cloud_IAM_pwd = "bx"
response  = requests.post( url, headers=headers, data=data, auth=( IBM_cloud_IAM_uid, IBM_cloud_IAM_pwd ) )
iam_token = response.json()["access_token"]
iam_token

In [None]:
# Alternative: get the token with the shell curl command
# x = !curl -X POST \
# 'https://iam.cloud.ibm.com/identity/token' \
# -H 'Content-Type: application/x-www-form-urlencoded' \
# -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey={apikey}'
# json.loads(x[5])['access_token']

### 8.3. Call the application API
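
The request body used in this section pairs a list of field names with a list of value rows. When several customers need to be scored in one request, a small helper can build that structure from a list of dictionaries. The helper name `build_payload` and the sample customers are hypothetical, and only a reduced set of columns is shown.

```python
import json

def build_payload(records):
    """Turn a list of dicts with identical keys into a fields/values scoring payload."""
    fields = list(records[0].keys())
    # One row of values per record, in the same order as fields.
    values = [[record[field] for field in fields] for record in records]
    return {"fields": fields, "values": values}

# Two hypothetical customers with a reduced set of columns (values are made up).
customers = [
    {"age": "40", "job": "management", "month": "10"},
    {"age": "29", "job": "student", "month": "5"},
]

payload = build_payload(customers)
print(json.dumps(payload, indent=2))
```

Looking values up by field name keeps every row aligned with `fields`, even if the dictionaries were created with their keys in a different order.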

In [None]:
import requests, json

# Build the API request headers
headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + iam_token, 'ML-Instance-ID': ml_instance_id}
# Test parameters
person = {      "age":        '40', # 'numeric'
                "job":'management', # 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown'
            "marital":   'married', # 'divorced','married','single','unknown'
          "education":  'tertiary', # 'primary', 'secondary', 'tertiary', 'unknown'
            "default":        'no', # 'no','yes','unknown'
            "balance":      '1787', # 'numeric'
            "housing":       'yes', # 'no','yes','unknown'
               "loan":        'no', # 'no','yes','unknown'
            "contact":  'cellular', # 'cellular','telephone'
              "month":        '10', # '1'-'12'
           "duration":        '10', # 'numeric'
           "campaign":         '1', # 'numeric'
              "pdays":        '-1', # 'numeric'
           "previous":         '0', # 'numeric'
           "poutcome":   'success'} # 'failure','nonexistent','success'
fields = list(person.keys())
values = [list(person.values()),]

# Build the API payload
payload = {"fields": fields, "values": values}
# Send the API request and read the response
response_scoring = requests.post(scoring_endpoint, json=payload, headers=headers)
print("Scoring response")
apires = json.loads(response_scoring.text)
# Show the test parameters
print(pd.DataFrame(person,index=['Person1']))
plt.figure(figsize=(8,8))
plt.pie(apires['values'][0][1],labels=['No',"Yes"],colors=["pink","lightgreen"],autopct='%1.1f%%',startangle=-145)
plt.title("Customer prediction")
print('\n')
print(apires)
plt.show()

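## 9. Closing Remarks <a class="anchor" id="concluding_remarks"></a>
[Back to table of contents](#toc)

In this lab, we walked step by step through the stages a data science analysis typically follows: business understanding, data collection, data understanding, data preparation, modeling, and model evaluation. We also used Watson Studio's cloud-native strengths: code can easily access data over the network, and a model can be deployed to the cloud so that data is quickly turned into a usable business model.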