**Classification** problems are a broad category of machine learning problems that involve predicting values taken from a discrete, finite set of possible cases (the *classes*).

In this example, we'll build a classifier to predict which species a flower belongs to.

## Reading data

In [None]:
import pandas as pd

iris = pd.read_csv('datasets/iris.csv')  # read the dataset from the CSV file

In [None]:
# Print some info about the dataset (column names, types, non-null counts)
iris.info()

In [None]:
# Print the unique classes present in the dataset using the method unique() on the Class column
print(iris['Class'].unique())

In [None]:
# Use the describe() method to print summary statistics about the dataset
iris.describe()

In [None]:
# Encode the classes to numeric values
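# Map each class name to an integer (0, 1, 2, ...) in order of first appearance
class_encodings = {name: code for code, name in enumerate(iris['Class'].unique())}

# Use the map() method to convert the class strings to numeric values
iris['Class'] = iris['Class'].map(class_encodings)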

In [None]:
iris['Class'].unique()

## Visualizing data

In [None]:
# Create a scatterplot for sepal length and sepal width
import matplotlib.pyplot as plt
%matplotlib inline

sl = iris['Sepal_length']
sw = iris['Sepal_width']

# Create a scatterplot of these two properties using plt.scatter(),
# coloring each data point according to the (numeric) class it belongs to
plt.scatter(sl, sw, c=iris['Class'])

# Specify labels for the X and Y axis
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

# Show graph
plt.show()


In [None]:
# Create a scatterplot for petal length and petal width
pl = iris['Petal_length']
pw = iris['Petal_width']

# Create a scatterplot of these two properties using plt.scatter(),
# coloring each data point according to the (numeric) class it belongs to
plt.scatter(pl, pw, c=iris['Class'])

# Specify labels for the X and Y axis
plt.xlabel('Petal length')
plt.ylabel('Petal width')

# Show graph
plt.show()


## Classifying species

We'll use [scikit-learn's LogisticRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) to build our classifier.

In [None]:
X = iris.drop(columns='Class')  # all the features: every column except 'Class'
t = iris['Class'].values        # the target classes
RANDOM_STATE = 4321

# Use sklearn's train_test_split() method to split our data into a training set and a test set
# (by default, 25% of the rows go to the test set).
from sklearn.model_selection import train_test_split

Xtr, Xts, ytr, yts = train_test_split(X, t, random_state=RANDOM_STATE)

In [None]:
# Use the training set to build a LogisticRegression model
from sklearn.linear_model import LogisticRegression

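# Create a logistic regression model; max_iter is raised so the default solver
# has room to converge on the unscaled features
lr = LogisticRegression(max_iter=1000)
# Fit the model to the training data using the fit() method
lr.fit(Xtr, ytr)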

In [None]:
# Use the LogisticRegression's score() method to assess the model accuracy on the test set
print(lr.score(Xts, yts))

## Inspecting classification results

Scores like the one calculated above are usually not what we want to assess. `score()` returns the mean accuracy: the fraction of examples in the given dataset whose predicted class matches the actual class.
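A quick way to see what `score()` computes is to compare it with the fraction of matching predictions. The sketch below is self-contained, so it assumes scikit-learn's bundled copy of the Iris data (`load_iris`) in place of `datasets/iris.csv`; the `max_iter=1000` setting is our own choice, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for datasets/iris.csv
X, y = load_iris(return_X_y=True)
Xtr, Xts, ytr, yts = train_test_split(X, y, random_state=4321)

# max_iter raised so the default solver can converge on the unscaled features
lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr)

# score() is the mean accuracy on the data it is given...
score = lr.score(Xts, yts)
# ...which is the same as the fraction of predictions that match the labels
manual = np.mean(lr.predict(Xts) == yts)

print(score, manual)
```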

Consider what happens, for instance, when you're training a model to classify whether someone has a disease and 99% of the people don't have that disease. What can go wrong if you use a score like the one above to evaluate your model? *Hint: What would be the score of a classifier that always returns zero (i.e. it always says that the person doesn't have the disease) in this case?*
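The hint can be checked numerically. This sketch uses made-up labels for 1,000 hypothetical patients, 10 of whom have the disease, and a trivial "classifier" that always predicts 0:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up labels: 1 = has the disease, 0 = does not (10 sick out of 1,000)
y_true = np.array([1] * 10 + [0] * 990)

# A trivial "classifier" that always predicts 0 (no disease)
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)  # 990 correct out of 1,000
recall = recall_score(y_true, y_pred)      # none of the 10 sick people is found

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

Accuracy looks excellent even though the model never detects a single case.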

Accuracy on its own is usually not a good way to evaluate classification models, especially when the classes are imbalanced. At least three other metrics are commonly used, depending on the context:
* **Precision**: This is the fraction of the classifier's positive predictions that are actually positive, TP / (TP + FP), where TP counts true positives and FP false positives. In the example of the disease classifier, this metric says how many of the people it flagged as having the disease _actually_ have that disease;
* **Recall**: This is the fraction of actual positives that the classifier finds, TP / (TP + FN), where FN counts false negatives. In the same example, this metric tells us how many of the people who actually have the disease were _found_ by the classifier;
* **F1-Score**: This is the harmonic mean of precision and recall, 2 * P * R / (P + R). Its value is not easy to interpret intuitively, but the idea is that the F1-score represents a compromise between precision and recall, and it is low whenever either of them is low;

<img src='images/Precisionrecall.svg'></img>
Source: https://en.wikipedia.org/wiki/Precision_and_recall
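The formulas for the three metrics can be checked by hand against scikit-learn's `precision_score`, `recall_score` and `f1_score`. The labels below are hypothetical screening results (1 = has the disease), chosen only to give a mix of true positives, false positives and false negatives:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical screening results: 1 = has the disease, 0 = healthy
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Count true positives, false positives and false negatives
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # 3
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # 2
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # 1

precision = tp / (tp + fp)  # 3 / 5
recall = tp / (tp + fn)     # 3 / 4
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, 2/3 here

print(precision, recall, f1)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```

Precision and recall are 0.6 and 0.75 here, so F1 lands between them but closer to the smaller one, which is the point of using a harmonic mean instead of a plain average.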

Some other common evaluation methods for classification models include [ROC chart analysis](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) and the related concept of [Area Under Curve (AUC)](https://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it).
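For a binary problem, scikit-learn's `roc_curve` and `roc_auc_score` work from the true labels and the classifier's scores (for example, probabilities from `predict_proba`) rather than hard predictions. This sketch uses hypothetical scores and checks the AUC against one common interpretation of it: the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counted as half.

```python
from itertools import product

from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical binary labels and classifier scores (higher = more likely positive)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Points of the ROC curve: false positive rate vs. true positive rate per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under that curve
auc = roc_auc_score(y_true, y_score)

# Same quantity by hand: fraction of (positive, negative) pairs in which
# the positive example gets the higher score (ties count as half)
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
score_pairs = list(product(pos, neg))
manual_auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in score_pairs) / len(score_pairs)

print(auc, manual_auc)  # both 0.75 for these scores
```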

*What metric would you prioritise in the case of the disease classifier described before? What are the costs of false positives and false negatives in this case?*

In [None]:
# scikit-learn provides a function called "classification_report" that summarizes the three metrics above
# for a given classification model on a dataset.
from sklearn.metrics import classification_report

# Use this function to print a classification metrics report for the trained classifier on the test set.
# See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
print(classification_report(yts, lr.predict(Xts)))

Another useful technique for inspecting the results of a classification model is to look at its *confusion matrix*. This is a K x K matrix (where K is the number of distinct classes) whose entry at position **(i, j)** counts how many examples belonging to class **i** were classified as belonging to class **j**. The diagonal therefore holds the correct classifications, and every off-diagonal entry is a mistake.

This shows us which pairs of classes the classifier confuses, and therefore which classes may require more attention.
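A tiny hypothetical three-class example makes the (i, j) convention concrete. In scikit-learn's `confusion_matrix`, rows correspond to the actual classes and columns to the predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for three classes (0, 1, 2)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]

# Entry (1, 2): examples of class 1 that were predicted as class 2
print(cm[1, 2])  # 1
```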

In [None]:
from sklearn.metrics import confusion_matrix

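# Use scikit-learn's confusion_matrix to understand which classes were misclassified.
# See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
# Rows are the actual classes and columns the predicted classes.
print(confusion_matrix(yts, lr.predict(Xts)))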


*In the example above, what would you investigate? Which classes is the classifier having difficulty discriminating?*