![](https://raw.githubusercontent.com/j82887/Computational-Intelligence-Laboratory/main/Image/3-3.png)

## 01. Load the Iris dataset

In [None]:
from sklearn import datasets

iris = datasets.load_iris()
print(iris.keys())

In [None]:
from collections import Counter

features = iris.data
label = iris.target
features_name = iris.feature_names
label_name = iris.target_names

print("Samples per class:", Counter(label))
print("Feature names:", features_name)
print("Class names:", label_name)

## 02. Data visualization

In [None]:
import pandas as pd
df = pd.DataFrame(features, columns=features_name)
df['label'] = [label_name[i] for i in label.astype(int)]
df.head()

In [None]:
import seaborn as sns
sns.pairplot(df, hue='label')

## 03. Build a logistic regression model (Scikit-learn)

In [None]:
from sklearn.linear_model import LogisticRegression

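# max_iter is raised above the default (100) to give the solver room to converge
LR_Model = LogisticRegression(random_state=0, max_iter=1000)
LR_Model.fit(features, label)
# Note: predictions are made on the same data the model was trained on
predict = LR_Model.predict(features)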
predict_proba = LR_Model.predict_proba(features)
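
As a hedged illustration of what `predict_proba` returns: in recent scikit-learn versions, a multi-class `LogisticRegression` fit with the default solver is a multinomial model, so its class probabilities should equal a softmax over the linear scores `X @ coef_.T + intercept_`. The sketch below refits the same model so that it runs on its own; `softmax`, `scores`, and `manual_proba` are names introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(random_state=0, max_iter=1000).fit(X, y)

def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Linear score per class: shape (n_samples, n_classes)
scores = X @ model.coef_.T + model.intercept_
manual_proba = softmax(scores)

# Should agree with predict_proba up to floating-point error
# (assumes the fitted model is multinomial, not one-vs-rest)
matches = np.allclose(manual_proba, model.predict_proba(X))
print("manual softmax matches predict_proba:", matches)
```

Each row of `manual_proba` sums to 1, and `predict` simply returns the class with the largest probability in each row.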

## 04. Evaluation metrics

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(label, predict)
acc = accuracy_score(label, predict)
print("Confusion matrix:\n", cm)
print("Accuracy: %0.3f" % acc)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(5,5))
sns.heatmap(cm, annot=True, fmt='.0f', linewidths=1.0, square=True, cmap = 'Blues_r',annot_kws={"size": 20})
plt.ylabel('Actual label', size = 18)
plt.xlabel('Predicted label', size = 18)
plt.title('Accuracy: %0.4f' %(acc), size = 20)
plt.show()

## 05. Model interpretability: visualization

In [None]:
print("Model weights (coefficients):\n", LR_Model.coef_)
print("Model bias (intercept):", LR_Model.intercept_)

In [None]:
plt.figure(figsize=(6, 8))
plt.title('Feature importance',size = 20)
# One marker series per class: coef_ has shape (n_classes, n_features)
for class_coef, color in zip(LR_Model.coef_, ['blue', 'orange', 'green']):
    plt.plot(class_coef, 'o', markersize=12, color=color, alpha=0.5)
plt.xticks(range(len(features_name)), features_name, rotation=90, size=16)
plt.yticks(rotation=0, size=16)
plt.legend(loc='upper left', labels=label_name, fontsize='x-large')
plt.ylim(-3, 3)  
plt.grid() 
plt.xlabel("Feature",size = 18) 
plt.ylabel("Coefficient value", size=18)
plt.show()

## 06. Save and load the model (Scikit-learn)

In [None]:
import joblib
joblib.dump(LR_Model, 'Iris_LR.pkl') 
LR_Model = joblib.load('Iris_LR.pkl')

In [None]:
test_features = [[9.0, 3.2, 1.1, 0.1]]
predict = LR_Model.predict(test_features)
print("Predicted class:", label_name[predict])

# Homework 

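* Implement logistic regression on the Pima Indians Diabetes dataset [[link]](https://www.kaggle.com/uciml/pima-indians-diabetes-database). As in the course notebook, the work must include:
    * Data preprocessing (min-max normalization to [0, 1])
    * Data splitting (training set 70%, validation set 15%, test set 15%)
    * Data visualization
    * Model training
    * Model interpretability (visualization)
    * Saving and loading the model

* Do the features the model ranks as most important agree with what is known about diabetes? (For example, medicine recognizes high fasting blood glucose and obesity as major factors in diabetes; does the model show the same result?) (Use the Scikit-learn logistic regression implementation.)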
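
The normalization and 70/15/15 split steps listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not a homework solution: it uses synthetic data from `make_classification` as a stand-in for the diabetes CSV (the sample count, feature count, and all variable names are hypothetical), splits in two stages with `train_test_split`, and fits `MinMaxScaler` on the training set only so that validation and test statistics do not leak into preprocessing.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real dataset (hypothetical sizes)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stage 1: hold out 30% of the rows; stage 2: split that 30% in half
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=0)

# Fit the scaler on the training set only, then reuse it for the other splits
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

print(len(X_train), len(X_val), len(X_test))
```

Because the scaler only sees the training set, validation and test values may fall slightly outside [0, 1]; that is expected and usually harmless.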