# Train an Iris classifier, save to file

Author: *Monique Beaulieu*

Steps are:
1. Load iris dataset from sklearn.
2. Use `train_test_split()` with `random_state=34` to create training and test sets.
3. Train a logistic regression model on the training data.
4. Print accuracy on the test dataset.
5. Save the model to file using joblib.

In [2]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [3]:
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=34)

In [4]:
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100]}

grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5,
                          return_train_score=True)
grid_search.fit(X_train, y_train)

print("Best parameters: {}".format(grid_search.best_params_))  # the best C matches the default (C=1)

Best parameters: {'C': 1}
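
The best value alone does not show how close the other candidates were. A minimal sketch of how the per-candidate scores could be inspected, assuming the same split and grid as above (the loop and print format are illustrative and not part of the original notebook):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=34)

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5,
                           return_train_score=True)
grid_search.fit(X_train, y_train)

# cv_results_ holds one entry per candidate, in the order they were tried
results = grid_search.cv_results_
for params, train_score, val_score in zip(results["params"],
                                          results["mean_train_score"],
                                          results["mean_test_score"]):
    print("C={:<7} train={:.3f}  validation={:.3f}".format(
        params["C"], train_score, val_score))
```

If several values of C score about the same on validation, the choice of C matters little for this dataset.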


In [5]:
model = LogisticRegression()
model.fit(X_train, y_train)

LogisticRegression()

In [6]:
print("Test accuracy: {}".format(model.score(X_test, y_test)))  # a perfect score makes me wonder whether it's overfitting

Test accuracy: 1.0
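
A perfect test score on its own does not indicate overfitting; overfitting usually shows up as a training score well above the held-out score. A minimal sketch of that check, assuming the same split as above (`max_iter=1000`, the 5-fold setting, and the variable names are assumptions chosen for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=34)

# Assumption: max_iter=1000 (as in the grid search) to avoid convergence warnings
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# 5-fold cross-validation on the training data gives a less split-dependent estimate
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)

print("Train accuracy: {:.3f}".format(train_acc))
print("Test accuracy:  {:.3f}".format(test_acc))
print("CV accuracy:    {:.3f} +/- {:.3f}".format(cv_scores.mean(), cv_scores.std()))
```

If the training and cross-validation scores are both high and close to the test score, the perfect result is more likely due to Iris being a small, easily separated dataset than to overfitting.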


In [7]:
from joblib import dump, load
dump(model, 'iris-classifier.joblib')

['iris-classifier.joblib']

In [8]:
# checking to see if this worked
clf = load('iris-classifier.joblib')
print("test-set score: {:.2f}".format(clf.score(X_test, y_test)))

test-set score: 1.00
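
Matching scores suggest the round trip worked, but two different models could score the same. A stricter sketch compares predictions and coefficients directly; it writes to a temporary directory so it does not touch the real `iris-classifier.joblib` (the temporary path and variable names are assumptions for illustration):

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=34)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Save and reload inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris-classifier.joblib")
    dump(model, path)
    clf = load(path)

# The reloaded model should behave identically to the one in memory
same_predictions = np.array_equal(model.predict(X_test), clf.predict(X_test))
same_coefficients = np.allclose(model.coef_, clf.coef_)
print("Same predictions:", same_predictions)
print("Same coefficients:", same_coefficients)
```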


In [10]:
# clf.predict_proba(X_train)

## Conclusion
1. Explain in words the sequence of actions (files and functions) to generate the predicted output. Start with a user visiting `http://localhost:5000`.
2. List three ways to modify, improve or extend this project.


1. 
- The trained model is loaded from the joblib file.
- When the user visits `http://localhost:5000`, the route `'/'` calls the `index` function in flask-ml-server.py, which returns `render_template` of the HTML file "prediction_input" (so the page is displayed in the browser).
- This prediction input template lets the user enter values for the iris features.
- After entering the values and clicking the "Get prediction" button, the browser sends a request to the route `'/iris_prediction'`.
- That request uses the GET method and calls the `get_iris_prediction` function in flask-ml-server.py.
- `get_iris_prediction` builds a 2D array from the submitted feature values.
- `.predict` is then called on the loaded model with that array of feature values.
- `.predict` returns the location (index value) of the predicted class in the target names array.
- Indexing the target names array with that value gives the predicted iris species as a string, which is assigned to the variable `pred_str`.
- Similarly, `.predict_proba` is called on the loaded model with the same array of feature values.
- This returns a 2D array holding the probability of each target class.
- The index value returned by `.predict` picks out the probability of the predicted class; that float is assigned to the variable `pred_proba`.
- Finally, `render_template` of the HTML file 'prediciton_response' is returned with the variables `pred_str` (the predicted species) and `pred_proba` (the predicted probability), which are displayed in the browser.
- The Flask server decides which HTML file to display when, and the .css file styles the page the way you want (font, background colour, etc.).
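
The sequence above can be sketched as a small Flask app. This is an illustration under stated assumptions, not the actual flask-ml-server.py: the model is trained inline instead of being loaded from the joblib file, inline template strings stand in for the HTML files, and the query parameter names in `FEATURES` are hypothetical.

```python
from flask import Flask, render_template_string, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# Assumption: trained inline so the sketch is self-contained;
# the project instead loads the saved model with joblib.
model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# Hypothetical form field names, one per iris feature
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Inline stand-ins for the input and response templates
INPUT_PAGE = """
<form action="/iris_prediction" method="get">
  {% for name in features %}
  <label>{{ name }} <input name="{{ name }}" type="number" step="any" required></label>
  {% endfor %}
  <button type="submit">Get prediction</button>
</form>
"""
RESPONSE_PAGE = "<p>Predicted species: {{ pred_str }} (probability {{ pred_proba }})</p>"

app = Flask(__name__)


@app.route("/")
def index():
    # Show the form where the user enters the four feature values
    return render_template_string(INPUT_PAGE, features=FEATURES)


@app.route("/iris_prediction", methods=["GET"])
def get_iris_prediction():
    # Read the submitted values from the query string; None if missing or not a number
    row = [request.args.get(name, type=float) for name in FEATURES]
    if any(value is None for value in row):
        return "All four feature values are required.", 400

    values = [row]  # 2D: one sample, four features
    class_idx = model.predict(values)[0]       # index of the predicted class
    pred_str = iris.target_names[class_idx]    # species name at that index
    # Column order of predict_proba follows model.classes_, here 0, 1, 2
    pred_proba = model.predict_proba(values)[0, class_idx]
    return render_template_string(
        RESPONSE_PAGE, pred_str=pred_str, pred_proba="{:.2f}".format(pred_proba)
    )
```

Calling `app.run(port=5000)` would serve the form at `http://localhost:5000`; it is left out here so that importing the sketch does not start a server.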

2. 
- We could check whether a different model performs better. Scaling the data is another option. (However, the test accuracy is already 100%, so I'm not sure this would count as an improvement.)
- We could apply PCA and add another page where the user only has to enter the principal component values, then see how the predictions compare.
- We could display an image of the predicted species.
- We could display the predicted probability of the other species as well. If the call is close (around 50%), the user would want to see which other species it could be.
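
As a rough sketch of the last idea, the full row returned by `predict_proba` can be paired with the target names and sorted. The helper name `species_probabilities` and the sample measurements are hypothetical, and the model is trained inline here only to keep the example self-contained:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# Assumption: trained inline; the project would use the loaded model instead
model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)


def species_probabilities(features):
    """Return (species, probability) pairs for one sample, most likely first."""
    proba = model.predict_proba([features])[0]  # one row, one column per class
    order = np.argsort(proba)[::-1]             # class indices, highest probability first
    return [(str(iris.target_names[i]), float(proba[i])) for i in order]


# Hypothetical measurements: sepal length, sepal width, petal length, petal width
for name, p in species_probabilities([6.0, 2.9, 4.8, 1.7]):
    print("{:<12} {:.2f}".format(name, p))
```

The response page could then list all three species with their probabilities instead of only the top one.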

## Reflection 
Include a sentence or two about:

what you liked or disliked,
found interesting, confusing, challenging, motivating while working on this assignment.



- I started off making this more complicated than it needed to be. I began building a pipeline because I thought we were supposed to work out which classifier model to use, etc.
- I liked how the assignment brought back some of what we learned about Flask last year. It was fun to refresh my memory.
- I definitely wouldn't have been able to figure out the whole indexing thing without help.
