## Using Default Parameters

1. Load in the dataset

- `sklearn.datasets.load_iris()`: Load and return the iris dataset (classification).

- The `iris dataset` is a classic and very easy multi-class classification dataset.
  - Its four measurements (sepal length, sepal width, petal length, and petal width) were used to create a linear discriminant model (linear discriminant analysis) to classify the species.

    | Property          | Value          |
    | ----------------- | -------------- |
    | Classes           | 3              |
    | Samples per class | 50             |
    | Samples total     | 150            |
    | Dimensionality    | 4              |
    | Features          | real, positive |
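
The figures in the table can be checked against the loaded data. The following is a minimal sketch; the variable names (`n_samples`, `n_features`, `class_counts`) are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris["data"], iris["target"]

# Rows are samples, columns are features
n_samples, n_features = X.shape

# Count how many samples carry each class label (0, 1, 2)
class_counts = np.bincount(y)

print("samples total:", n_samples)
print("dimensionality:", n_features)
print("classes:", len(iris["target_names"]))
print("samples per class:", class_counts)
print("all features positive:", bool((X > 0).all()))
print("feature names:", iris["feature_names"])
```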


In [2]:
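from sklearn import datasets

# Load the iris dataset as a dictionary-like object
iris = datasets.load_iris()

X = iris['data']    # feature matrix, shape (150, 4)
y = iris['target']  # class labels encoded as 0, 1, 2

print("X", X)
print("y", y)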

X [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 

### Create Prediction Model

- Use a logistic regression model to classify the iris flowers.
  - Set `max_iter` higher than the default so that the solver has enough iterations to converge.

- `max_iter`: int, default=100
  - Maximum number of iterations taken for the solvers to converge.
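
One way to see whether the iteration cap was actually reached is to read the fitted model's `n_iter_` attribute, which records how many iterations the solver used. This is a small sketch under the assumption that the default solver is used; `iterations_used` is an illustrative name, and the exact count may vary between scikit-learn versions.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)

max_iter = 10000
model = LogisticRegression(max_iter=max_iter)
model.fit(X, y)

# n_iter_ is an array; take the largest entry as the iterations actually used
iterations_used = int(model.n_iter_.max())

print("iterations used:", iterations_used)
print("stopped before the cap:", iterations_used < max_iter)
```

If the solver stops well short of `max_iter`, raising the cap further should not change the result.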


In [3]:
from sklearn.linear_model import LogisticRegression

predict_model = LogisticRegression(max_iter=10000)
predict_model.fit(X, y)


### Evaluate the model

- `LogisticRegression.score(X, y)`: Return the mean accuracy on the given test data and labels.
  - Returns: float

- With the default setting of `C = 1`, the model achieves a score of about 0.973 on the training data.

In [4]:
score = predict_model.score(X, y)

print("score", score)

score 0.9733333333333334
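
To make "mean accuracy" concrete, the score can be reproduced by hand as the fraction of predictions that match the true labels. A minimal sketch, with `manual_accuracy` as an illustrative name:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)

model = LogisticRegression(max_iter=10000)
model.fit(X, y)

# Accuracy = share of samples whose predicted label equals the true label
predictions = model.predict(X)
manual_accuracy = np.mean(predictions == y)

print("manual accuracy:", manual_accuracy)
print("model.score:    ", model.score(X, y))
```

The two printed values should agree, since `score()` computes the same quantity.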


### Implementing Grid Search

Since the default value for `C` is 1, we will try a range of values surrounding it.

1. Create a range of values around 1, the default value.
2. Loop over the range, refit the model for each value, and append its score to a list.


In [5]:

C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

scores = []
for choice in C:
    # set_params(): Set the parameters of this estimator.
    predict_model.set_params(C=choice)
    # fit data after setting parameters
    predict_model.fit(X, y)
    scores.append(predict_model.score(X, y))

print(scores)

[0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
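
For comparison, scikit-learn also ships a `GridSearchCV` class that automates this kind of loop. The sketch below is not the notebook's method: it scores each candidate with cross-validation (the choice of `cv=5` is an assumption) rather than on the training data, so its numbers will not match the list above.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

# Same candidate values as the manual loop
param_grid = {"C": [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]}

# cv=5 is an assumed setting: each candidate is scored on 5 held-out folds
grid = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
grid.fit(X, y)

print("best C:", grid.best_params_["C"])
print("best cross-validated accuracy:", grid.best_score_)
```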


### Results Explained

- The lower values of `C` (0.25 and 0.5) performed worse than the default of 1, while 0.75 matched it. As we increased `C` to 1.75, accuracy rose from 0.973 to 0.987.

Increasing `C` further, from 1.75 to 2, **does not help** increase model accuracy. Values above 2 were not tested.
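
The comparison can also be read off programmatically. The sketch below reuses the `C` values and the scores printed earlier as literal data; `baseline` and `best_C` are illustrative names.

```python
# Candidate values and the training-set scores reported above
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
scores = [0.9666666666666667, 0.9666666666666667, 0.9733333333333334,
          0.9733333333333334, 0.98, 0.98, 0.9866666666666667,
          0.9866666666666667]

# Score obtained with the default C = 1
baseline = scores[C.index(1)]

# list.index returns the first match, i.e. the smallest C with the top score
best_score = max(scores)
best_C = C[scores.index(best_score)]

for c, s in zip(C, scores):
    print(f"C={c:<5} score={s:.4f} change vs C=1: {s - baseline:+.4f}")

print("smallest C with the best score:", best_C)
```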

## Note on Best Practices

- We scored our logistic regression model on the same data that was used to train it. If the model fits that data too closely, it may predict unseen data poorly. This problem is known as overfitting.

To avoid being misled by the scores on the training data, we can put aside a portion of our data and use it specifically for the purpose of testing the model.
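
A held-out test set can be created with scikit-learn's `train_test_split`. This is a minimal sketch: the 80/20 split, the `random_state=0` seed, and the use of `stratify=y` are assumptions chosen for illustration, not settings from the original notebook.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Hold out 20% of the samples; stratify keeps class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)  # train only on the training portion

train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)  # accuracy on unseen data

print("train accuracy:", train_score)
print("test accuracy: ", test_score)
```

A test accuracy well below the training accuracy would be a sign of overfitting.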
