**Steps to build a complete classification model**

1. Collect the data
2. Select suitable features
3. Choose the evaluation metrics
4. Choose and train a suitable base model
5. Optimize the model **Task6**
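The five steps can be sketched as a single scikit-learn workflow. This is a minimal illustration, not the author's method. Several pieces are assumptions made only for this example: `SelectKBest` for feature selection, accuracy as the metric, `LogisticRegression` as the base model, and the `GridSearchCV` parameter grid.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: collect data (here, the bundled iris dataset) and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Steps 2 and 4: feature selection plus a base model, chained in one pipeline
pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif),
    LogisticRegression(max_iter=1000),
)

# Step 3: accuracy as the metric; step 5: tune hyperparameters with 5-fold CV
param_grid = {
    "selectkbest__k": [2, 3, 4],
    "logisticregression__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```

The held-out test set is scored only once, after the search, so the reported accuracy is not used to choose hyperparameters.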

## 1. Load the data

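```python
from sklearn.datasets import load_iris
import pandas as pd

# Load the iris dataset: 150 samples, 4 numeric features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target
feature = iris.feature_names

# Put features and labels into one DataFrame for inspection
data = pd.DataFrame(X, columns=feature)
data['target'] = y
data.head()
```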

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


## 2. Build classification models
![Classification Models](images/classification_models.png)

### 2.1 Logistic Regression
![logistic](images/logistic.png)

```python
from sklearn.linear_model import LogisticRegression

logistic_clf = LogisticRegression()
logistic_clf.fit(X, y)
logistic_clf.score(X, y)  # accuracy on the training data
```

With the default settings the `lbfgs` solver stops at its iteration limit (`max_iter=100`) before converging, so scikit-learn prints a convergence warning:

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
```

The model still reaches this training accuracy:

```
0.9733333333333334
```
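One way to follow the warning's advice is to standardize the features in a pipeline and raise `max_iter`. This sketch assumes scikit-learn 0.22 or later, where `lbfgs` is the default solver and fits a multinomial model for multi-class targets. The last lines check that the predicted probabilities are the softmax of the per-class linear scores, which is how the multinomial model turns scores into probabilities.

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling the features lets lbfgs converge well within the iteration budget
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

lr = clf.named_steps["logisticregression"]
print("iterations used:", lr.n_iter_)
print("training accuracy:", clf.score(X, y))

# Multinomial model: P(class k | x) = softmax over k of (w_k . x + b_k)
linear_scores = clf.decision_function(X)  # shape (150, 3)
probs = softmax(linear_scores, axis=1)
print(np.allclose(probs, clf.predict_proba(X)))  # True
```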

### 2.2 Linear Discriminant Analysis (LDA)


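In short, LDA projects labeled data points onto a lower-dimensional space so that the projected points separate by class. The projection is chosen to maximize the between-class variance and minimize the within-class variance.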
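This objective can be made concrete with a from-scratch sketch of Fisher's criterion. Build the within-class scatter matrix `S_W` and the between-class scatter matrix `S_B`, then solve the generalized eigenproblem `S_B w = lambda * S_W w`. The top eigenvectors maximize the ratio `(w^T S_B w) / (w^T S_W w)`. This illustrates the idea only; it is not the code path scikit-learn uses (its default `'svd'` solver works differently). It assumes SciPy is available for `scipy.linalg.eigh`.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
overall_mean = X.mean(axis=0)
n_features = X.shape[1]

S_W = np.zeros((n_features, n_features))  # within-class scatter
S_B = np.zeros((n_features, n_features))  # between-class scatter
for c in classes:
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    centered = X_c - mean_c
    S_W += centered.T @ centered
    d = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(X_c) * (d @ d.T)

# Generalized symmetric eigenproblem S_B w = lambda * S_W w (eigenvalues ascending)
eigvals, eigvecs = eigh(S_B, S_W)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[: len(classes) - 1]]  # at most (n_classes - 1) useful directions

def fisher_ratio(w):
    """Between-class over within-class variance along direction w."""
    return (w @ S_B @ w) / (w @ S_W @ w)

X_proj = X @ W
print(X_proj.shape)  # (150, 2)
print("best direction:", fisher_ratio(W[:, 0]))
print("raw features:  ", [round(fisher_ratio(e), 3) for e in np.eye(n_features)])
```

The ratio along the first discriminant is at least as large as along any single original feature, which is exactly what "maximize between-class variance, minimize within-class variance" asks for.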


Parameters:
- `solver`: {'svd', 'lsqr', 'eigen'}, default='svd'
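A quick way to see the `solver` parameter in use is to fit the same data with each option. According to the scikit-learn documentation, `'svd'` is the default and does not compute the covariance matrix. `'lsqr'` and `'eigen'` support `shrinkage`, and `'lsqr'` cannot be used with `transform`. This sketch only compares training accuracy. Without shrinkage, the three solvers should fit essentially the same classifier.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

scores = {}
for solver in ("svd", "lsqr", "eigen"):
    clf = LinearDiscriminantAnalysis(solver=solver)
    clf.fit(X, y)
    scores[solver] = clf.score(X, y)  # training accuracy

print(scores)

# Shrinkage is only available with 'lsqr' or 'eigen'
shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
print("lsqr + shrinkage:", shrunk.score(X, y))
```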

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda_clf = LinearDiscriminantAnalysis()
lda_clf.fit(X, y)
lda_clf.score(X, y)
```

```
0.98
```

### 2.3 Naive Bayes
![naive bayes](images/naive_bayes.png)

```python
from sklearn.naive_bayes import GaussianNB

nb_clf = GaussianNB()
nb_clf.fit(X, y)
nb_clf.score(X, y)
```

```
0.96
```

### 2.4 Decision Tree
![decision tree](images/decision_tree.png)

```python
from sklearn.tree import DecisionTreeClassifier

dt_clf = DecisionTreeClassifier(min_samples_leaf=5, max_depth=5)
dt_clf.fit(X, y)
dt_clf.score(X, y)
```

```
0.9733333333333334
```
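`DecisionTreeClassifier` uses Gini impurity by default (`criterion='gini'`), defined as `1 - sum_k p_k^2`. At each node it picks the split that most reduces the weighted impurity of the children. The sketch below recomputes the root node's impurity and the weighted impurity after the root split, then prints the learned rules with `export_text`. The `gini` helper is written here for illustration, and `random_state=0` is added only to make the tree reproducible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a label array (illustrative helper)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p**2)

dt = DecisionTreeClassifier(min_samples_leaf=5, max_depth=5, random_state=0)
dt.fit(X, y)
tree = dt.tree_

# Root node: three balanced classes give 1 - 3 * (1/3)**2 = 2/3
print("root gini (manual): ", gini(y))
print("root gini (sklearn):", tree.impurity[0])

# Weighted impurity of the two children produced by the root split
f, t = tree.feature[0], tree.threshold[0]
goes_left = X[:, f] <= t
weighted = goes_left.mean() * gini(y[goes_left]) + (~goes_left).mean() * gini(y[~goes_left])
print("split:", iris.feature_names[f], "<=", round(t, 3))
print("weighted child gini:", weighted)

print(export_text(dt, feature_names=list(iris.feature_names)))
```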

### 2.5 Support Vector Machine
![support vector machine](images/svm.png)

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Standardize features before the RBF-kernel SVM
svc_iris = make_pipeline(StandardScaler(), SVC(gamma='auto'))
svc_iris.fit(X, y)
svc_iris.score(X, y)
```

```
0.9733333333333334
```
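All of the scores above are accuracies on the same data the models were trained on, so they are optimistic. A fairer comparison, sketched here, uses 5-fold stratified cross-validation with `cross_val_score`. Two details are choices made for this example: the fold setup (`shuffle=True, random_state=0`), and the `StandardScaler` plus larger `max_iter` added to logistic regression. For reference, `gamma='auto'` in the SVC above means `gamma = 1 / n_features`.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "lda": LinearDiscriminantAnalysis(),
    "naive_bayes": GaussianNB(),
    "tree": DecisionTreeClassifier(min_samples_leaf=5, max_depth=5, random_state=0),
    "svc": make_pipeline(StandardScaler(), SVC(gamma="auto")),
}

# Same folds for every model so the comparison is like-for-like
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:12s} {scores.mean():.3f} +/- {scores.std():.3f}")
```

Each model is refit from scratch on every training fold, so preprocessing inside a pipeline (such as the scaler) never sees the validation fold.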