<a href="https://colab.research.google.com/github/rcarrata/deeplearning_tf_examples/blob/master/7_Classification_StepbyStep.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# THEORY 0 - Classification Definition

Classification is used to predict a discrete label. The outputs fall under a finite set of possible outcomes. Many situations have only two possible outcomes. This is called binary classification (True/False, 0 or 1, Hotdog / not Hotdog).

For example:

* Predict whether an email is spam or not
* Predict whether it will rain or not
* Predict whether a user is a power user or a casual user

There are also two other common types of classification: **multi-class classification** and **multi-label classification**.

**Multi-class classification** follows the same idea as binary classification, except that instead of **two possible outcomes**, there are three or more.

For example:

* Predict whether a photo contains a pear, apple, or peach
* Predict what letter of the alphabet a handwritten character is
* Predict whether a piece of fruit is small, medium, or large

An important note about binary and multi-class classification is that in both, each example is assigned exactly one label. In multi-label classification, however, each example can carry several labels at once. This is useful for customer segmentation, image categorization, and sentiment analysis for understanding text. To perform these classifications, we use models like **Naive Bayes, K-Nearest Neighbors, SVMs**, as well as various deep learning models.

# THEORY 1 - Cross-entropy

Before we continue loading the data and designing the model, we need to talk about cross-entropy, an important concept for evaluating classification model training. 

* **Cross-entropy** is a **score that summarizes the average difference between the actual and predicted probability distributions for all classes**. 

---
* **The goal** is to **minimize the score**; a ***perfect cross-entropy value is 0***.
---
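
For reference, the definition above can be written as a formula. The symbols here are assumptions introduced for illustration: $N$ is the number of examples, $C$ the number of classes, $p_{i,c}$ the true probability of class $c$ for example $i$ (1 for the correct class and 0 otherwise with one-hot labels), and $q_{i,c}$ the predicted probability. The logarithm is the natural logarithm.

$$
H(p, q) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} p_{i,c} \, \log q_{i,c}
$$

With one-hot labels, only the term for the correct class survives in the inner sum, so the score is the average of $-\log$ of the probability the model assigned to the correct class.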

For example, consider a problem with three classes and three examples in the data, belonging to class 1, class 2, and class 3, respectively. The labels are represented with one-hot encoding.

* Let the true distribution for each example be:

```python
#the first class is set to probability 1, all others are 0; this example belongs to class #1
ex_1_true = [1, 0, 0] 
#the second class is set to probability 1, all others are 0;this example belongs to class #2
ex_2_true = [0, 1, 0] 
#the third class is set to probability 1, all others are 0;this example belongs to class #3
ex_3_true = [0, 0, 1]
```

Now imagine a predictive model that gave us the following predictions:

```python
#the highest probability is given to class #1
ex_1_predicted = [0.7, 0.2, 0.1] 
#the highest probability is given to class #2
ex_2_predicted = [0.1, 0.8, 0.1] 
#the highest probability is given to class #3
ex_3_predicted = [0.2, 0.2, 0.6] 
```

If we compare the true and predicted distributions above, they seem to be rather different numbers, but there is a good pattern here: each example’s predicted distribution gives the highest probability to the label the example actually belongs to. This means the distributions are similar and the cross-entropy should be small. When we calculate cross-entropy for the example above, we get 0.364, which is rather good and close to 0.
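
To make the 0.364 figure less of a black box, here is a minimal NumPy sketch that computes the same average by hand. It assumes one-hot labels and the natural logarithm; the variable names are chosen for illustration only.

```python
import numpy as np

# one-hot true labels and the "good" predictions from the example above
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])

# per-example cross-entropy: only the true class contributes,
# because every other entry of y_true is 0
per_example = -np.sum(y_true * np.log(y_pred), axis=1)

# average over the three examples
cross_entropy = per_example.mean()
print('Cross-entropy: %.3f' % cross_entropy)  # Cross-entropy: 0.364
```

The three per-example values are -log(0.7), -log(0.8), and -log(0.6), so the more probability the model puts on the correct class, the smaller the score.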

* Now, consider a bad predictive model that gives the highest probability to a wrong label every time:

```python
#the highest probability is given to class #3, the true label is class #1
ex_1_predicted_bad = [0.1, 0.1, 0.7]
#the highest probability is given to class #1, the true label is class #2
ex_2_predicted_bad = [0.8, 0.1, 0.1] 
#the highest probability is given to class #1, the true label is class #3
ex_3_predicted_bad = [0.6, 0.2, 0.2]
```

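When we calculate the cross-entropy for these examples, we get 2.036, which is rather bad. (The first bad prediction sums to 0.9 rather than 1; the 2.036 figure corresponds to that row being rescaled to sum to 1 before the logarithm is taken.)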

If we take the cross-entropy between the true distribution and itself, the predictions are perfect and the cross-entropy equals 0.

Run the code below to see this in practice. To calculate cross-entropy between two distributions we use the log_loss() function in scikit-learn, which computes the average cross-entropy over all examples.

In [None]:
## Exercise 1 - Cross-Entropy

from sklearn.metrics import log_loss

#the first class is set to probability 1, all others are 0; this example belongs to class #1
ex_1_true = [1, 0, 0] 
#the second class is set to probability 1, all others are 0;this example belongs to class #2
ex_2_true = [0, 1, 0] 
#the third class is set to probability 1, all others are 0;this example belongs to class #3
ex_3_true = [0, 0, 1] 

#the highest probability is given to class #1
ex_1_predicted = [0.7, 0.2, 0.1] 
#the highest probability is given to class #2
ex_2_predicted = [0.1, 0.8, 0.1] 
#the highest probability is given to class #3
ex_3_predicted = [0.2, 0.2, 0.6] 

#the highest probability is given to class #3, the true label is class #1
ex_1_predicted_bad = [0.1, 0.1, 0.7]
#the highest probability is given to class #1, the true label is class #2
ex_2_predicted_bad = [0.8, 0.1, 0.1] 
#the highest probability is given to class #1, the true label is class #3
ex_3_predicted_bad = [0.6, 0.2, 0.2] 

true_labels = [ex_1_true, ex_2_true, ex_3_true]
predicted_labels = [ex_1_predicted, ex_2_predicted, ex_3_predicted]
predicted_labels_bad = [ex_1_predicted_bad, ex_2_predicted_bad, ex_3_predicted_bad]

ll = log_loss(true_labels, predicted_labels)
print('Average Log Loss (good prediction): %.3f' % ll)

ll = log_loss(true_labels, predicted_labels_bad)
print('Average Log Loss (bad prediction): %.3f' % ll)

#cross-entropy between the true distribution and itself
ll = log_loss(true_labels, true_labels)
print('Average Log Loss (true prediction): %.3f' % ll)


Average Log Loss (good prediction): 0.364
Average Log Loss (bad prediction): 2.036
Average Log Loss (true prediction): 0.000


# THEORY 2 - Loading and analyzing the data

Assume we have a dataset, stored in the train_glass.csv (training data) and test_glass.csv (test data) files, about various products made of glass. 

Using the train_glass.csv file, we want to learn a model that can **predict which glass item can be constructed** given the proportion of **various elements such as Aluminium (Al), Magnesium (Mg), and Iron (Fe)**. We then want to evaluate the model on the test data.

* To load the training data into a pandas DataFrame, we do the following:

```python
import pandas as pd
data_train = pd.read_csv("train_glass.csv")
```

* The following command lists all columns together with their types:

```python
print(data_train.info())
```

* The output looks something like this:

```bash
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   Al       300 non-null     float64
 1   Mg       300 non-null     float64
 2   Fe       300 non-null     float64
 3   item     300 non-null     object
```


We see that Al, Mg, and Fe are numeric columns, and item is an object column containing strings. We would like to predict the item column.

* The following commands show us which categories we have in the item column and what their distribution is:

```python
from collections import Counter
print('Classes and number of values in the dataset', Counter(data_train["item"]))
```

which gives something like the following output:

```python
{'lamps': 75, 'tableware': 125, 'containers': 100}
```

This tells us that we have three categories to predict: “lamps”, “tableware”, and “containers”, and how many samples we have in our training data for each.

Next, we need to split our data into features and labels by doing the following:

```python
#features: the numeric element proportions
train_x = data_train[['Al', 'Mg', 'Fe']]
#labels: the column we want to predict
train_y = data_train["item"]
```
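
Since train_glass.csv is not included here, the following is a self-contained sketch of the same steps on a tiny in-memory DataFrame. The four rows and their values are made up purely for illustration; only the column names come from the text above.

```python
import pandas as pd
from collections import Counter

# hypothetical stand-in for train_glass.csv; the numbers are invented
data_train = pd.DataFrame({
    'Al': [1.10, 1.36, 1.54, 1.29],
    'Mg': [4.49, 3.60, 3.55, 0.00],
    'Fe': [0.00, 0.00, 0.24, 0.10],
    'item': ['lamps', 'tableware', 'tableware', 'containers'],
})

# class distribution of the label column
class_counts = Counter(data_train['item'])
print(class_counts)

# features: the numeric columns; labels: the column we want to predict
train_x = data_train[['Al', 'Mg', 'Fe']]
train_y = data_train['item']
print(train_x.shape, train_y.shape)  # (4, 3) (4,)
```

Selecting with a list of column names returns a DataFrame (two-dimensional), while selecting with a single name returns a Series (one-dimensional), which is why the two shapes differ.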

# EXERCISES 2 - Loading and analyzing the data

1. Using pandas, load the air_quality_train.csv into a DataFrame instance called train_data, and load the air_quality_test.csv into a DataFrame instance called test_data.

2. Using DataFrame.info() print all columns with their respective types in the train_data DataFrame.

3. Use collections.Counter() to print the class distribution for the Air_Quality column in the train_data DataFrame.

4. Extract the features columns from train_data DataFrame where feature columns are ['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI'], and assign the result to x_train.

5. Extract the label column “Air_Quality” from the train_data DataFrame, and assign the result to y_train.

In [None]:
import pandas as pd
from collections import Counter

from google.colab import drive
drive.mount('/content/drive')

!ls "/content/drive/My Drive/Colab/Classification/air_quality_train.csv"
!ls "/content/drive/My Drive/Colab/Classification/air_quality_test.csv"

root_folder = "/content/drive/My Drive/Colab/"
project_folder = "Classification/"
csv_file_1 = "air_quality_train.csv"
csv_file_2 = "air_quality_test.csv"

csv_data_1 = root_folder + project_folder + csv_file_1
print(csv_data_1)

csv_data_2 = root_folder + project_folder + csv_file_2
print(csv_data_2)

train_data = pd.read_csv(csv_data_1)
test_data = pd.read_csv(csv_data_2)

from google.colab.data_table import DataTable
DataTable.max_columns = 60

#print columns and their respective types
print(train_data.info())
print(test_data.info())

#print the class distribution
print('Air_Quality',Counter(train_data["Air_Quality"]))
print('Air_Quality',Counter(test_data["Air_Quality"]))

#extract the features and the label column from the training data
x_train = train_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
y_train = train_data["Air_Quality"]

Mounted at /content/drive
'/content/drive/My Drive/Colab/Classification/air_quality_train.csv'
'/content/drive/My Drive/Colab/Classification/air_quality_test.csv'
/content/drive/My Drive/Colab/Classification/air_quality_train.csv
/content/drive/My Drive/Colab/Classification/air_quality_test.csv
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7782 entries, 0 to 7781
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PM2.5        7782 non-null   float64
 1   PM10         7782 non-null   float64
 2   NO           7782 non-null   float64
 3   NO2          7782 non-null   float64
 4   NOx          7782 non-null   float64
 5   NH3          7782 non-null   float64
 6   CO           7782 non-null   float64
 7   SO2          7782 non-null   float64
 8   O3           7782 non-null   float64
 9   Benzene      7782 non-null   float64
 10  Toluene      7782 non-null   float64
 11  Xylene       7782 non-null   float64
 12  AQI 

# THEORY 3 - Preparing the data

When using categorical cross-entropy (the standard loss function for multi-class classification problems) in TensorFlow with Keras, **one needs to convert all the categorical features and labels into one-hot encoding vectors**. 

Previously, when we had features encoded as strings, we used the pandas.get_dummies() function. This works well for features, but it’s **not very usable for labels**. The problem is that **get_dummies() creates a separate column for each category, and you cannot predict for multiple columns**.

* A **better approach** is to ***convert the label vectors to integers ranging from 0 to the number of classes minus 1 by using sklearn.preprocessing.LabelEncoder***:

```python
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
train_y=le.fit_transform(train_y.astype(str))
test_y=le.transform(test_y.astype(str))
```

* We **first** ***fit the transformer to the training data and encode it using the LabelEncoder.fit_transform() method***, 

* and **then** ***apply the already fitted transformer to the test data using the LabelEncoder.transform() method***, so that both sets share the same mapping.

* We can print the resulting mappings with:

```python
integer_mapping = {l: i for i, l in enumerate(le.classes_)}
print(integer_mapping)
```

* We get the following output:

```python
{'containers': 0, 'lamps': 1, 'tableware': 2}
```

* Each category is mapped to an integer, from 0 to 2 (because we have three categories).

* Now that we have labels as integers, we can use a **Keras function called to_categorical() to convert them into one-hot-encodings** — the format we need for our cross-entropy loss:

```python
import tensorflow
train_y = tensorflow.keras.utils.to_categorical(train_y, dtype='int64')
test_y = tensorflow.keras.utils.to_categorical(test_y, dtype='int64')
```
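
To see what the two encoding steps produce without needing TensorFlow, here is a small sketch that uses LabelEncoder for the integer step and a NumPy identity matrix to mimic the one-hot step. The label lists are hypothetical, and the np.eye() indexing is a stand-in for to_categorical(), not the Keras implementation.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# hypothetical label vectors for illustration
train_y = ['lamps', 'tableware', 'containers', 'tableware']
test_y = ['containers', 'lamps']

le = LabelEncoder()
train_y_int = le.fit_transform(train_y)  # learn the mapping on the training labels
test_y_int = le.transform(test_y)        # reuse the same mapping on the test labels

# LabelEncoder stores its classes in sorted order
integer_mapping = {str(label): index for index, label in enumerate(le.classes_)}
print(integer_mapping)  # {'containers': 0, 'lamps': 1, 'tableware': 2}

# one-hot encode by picking rows of an identity matrix
num_classes = len(le.classes_)
train_y_onehot = np.eye(num_classes, dtype='int64')[train_y_int]
print(train_y_onehot)
```

Each row of train_y_onehot contains a single 1 in the column given by the integer label, which is the format categorical cross-entropy expects.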

# EXERCISE 3 - Preparing the Data

1. Use the LabelEncoder.fit_transform() method to encode the label vector y_train into integers and assign the result back to the y_train variable.

2. Use the le.transform() method to encode the label vector y_test into integers, where le is the instance of LabelEncoder trained in the previous step, and assign the result back to y_test.

3. Using the tensorflow.keras.utils.to_categorical() function, convert the integer encoded label vector y_train into a one-hot encoding vector and assign the result back into the y_train variable.

4. Using the tensorflow.keras.utils.to_categorical() function, convert the integer encoded label vector y_test into a one-hot encoding vector and assign the result back into the y_test variable.

In [None]:
import pandas as pd
from collections import Counter
from sklearn.preprocessing import LabelEncoder
import tensorflow
#your code here

#train_data = pd.read_csv("air_quality_train.csv")
#test_data = pd.read_csv("air_quality_test.csv")

#print columns and their respective types
print(train_data.info())
#print the class distribution
print(Counter(train_data["Air_Quality"]))
#extract the features from the training data
x_train = train_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the training data
y_train = train_data["Air_Quality"]
#extract the features from the test data
x_test = test_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the test data
y_test = test_data["Air_Quality"]

#encode the labels into integers
le = LabelEncoder()

y_train = le.fit_transform(y_train.astype(str))
y_test = le.transform(y_test.astype(str))

#print the integer mappings
integer_mapping = {l: i for i, l in enumerate(le.classes_)}
print("The integer mapping:\n", integer_mapping)

#convert the integer encoded labels into binary vectors
y_train = tensorflow.keras.utils.to_categorical(y_train, dtype = 'int64')
y_test = tensorflow.keras.utils.to_categorical(y_test, dtype = 'int64')

print(y_train)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7782 entries, 0 to 7781
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PM2.5        7782 non-null   float64
 1   PM10         7782 non-null   float64
 2   NO           7782 non-null   float64
 3   NO2          7782 non-null   float64
 4   NOx          7782 non-null   float64
 5   NH3          7782 non-null   float64
 6   CO           7782 non-null   float64
 7   SO2          7782 non-null   float64
 8   O3           7782 non-null   float64
 9   Benzene      7782 non-null   float64
 10  Toluene      7782 non-null   float64
 11  Xylene       7782 non-null   float64
 12  AQI          7782 non-null   float64
 13  Air_Quality  7782 non-null   object 
dtypes: float64(13), object(1)
memory usage: 851.3+ KB
None
Counter({'Very Poor': 1297, 'Poor': 1297, 'Moderate': 1297, 'Satisfactory': 1297, 'Severe': 1297, 'Good': 1297})
The integer mapping:
 {'Good': 0, 'Moderate': 1,

# THEORY 4 - Designing a deep learning model for classification

To **initialize a Keras Sequential model in TensorFlow**, we do the following:

```python
from tensorflow.keras.models import Sequential
my_model = Sequential()
```

The process is the following:
 * set the input layer
 * set the hidden layers
 * set the output layer.

---
To **add the input layer**, we use **keras.layers.InputLayer** the following way:

```python
from tensorflow.keras.layers import  InputLayer
#the input size is the number of feature columns: every column except the label
my_model.add(InputLayer(input_shape=(data_train.shape[1] - 1,)))
```

---

For now, we will only add one hidden layer using keras.layers.Dense:

```python
from tensorflow.keras.layers import  Dense
my_model.add(Dense(8, activation='relu'))
```

This layer has eight hidden units and uses a rectified linear unit (relu) as the activation function.

---

Finally, we need to set the **output layer**. 

* For regression, we don’t use any activation function in the final layer because we need to predict a number without any transformations. 

* However, **for classification, the desired output is a vector of categorical probabilities**.


---
To have this vector as an output, we need to use the **softmax activation function** that ***outputs a vector with elements having values between 0 and 1 and that sum to 1*** (just as all the probabilities of all outcomes for a random variable must sum up to 1). 

* ***Softmax*** is a mathematical function that **converts a vector of numbers into a vector of probabilities**, where each probability is proportional to the exponential of the corresponding value in the vector.
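
As a rough illustration of what softmax does, here is a minimal NumPy sketch. The function name and the input scores are chosen for this example; Keras applies its own implementation inside the layer.

```python
import numpy as np

def softmax(scores):
    # subtracting the maximum improves numerical stability
    # and does not change the result
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # arbitrary raw outputs of a final layer
probabilities = softmax(scores)
print(probabilities)
print(probabilities.sum())
```

The largest score receives the largest probability, every value lies between 0 and 1, and the values sum to 1.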

---
In the case of a ***binary classification problem***, a **sigmoid activation function** can also be used in the **output layer but paired with the binary_crossentropy loss**.

***Binary classification*** refers to those classification tasks that have two class labels.

Examples include:

* Email spam detection (spam or not).
* Churn prediction (churn or not).
* Conversion prediction (buy or not).
---

Since we have 3 classes to predict in our glass production data, the final softmax layer must have 3 units:

```python
my_model.add(Dense(3, activation='softmax')) #the output layer is a softmax with 3 units
```
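
To show what the two Dense layers compute, here is a NumPy sketch of a single forward pass through a network of the same shape (3 inputs, 8 relu units, 3 softmax units). The input rows are hypothetical and the weights are random, so the printed probabilities are meaningless; a trained Keras model would have learned its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# two hypothetical samples with three features each (Al, Mg, Fe)
x = np.array([[1.10, 4.49, 0.00],
              [1.54, 3.55, 0.24]])

# randomly initialized weights and zero biases (assumption for illustration)
w_hidden, b_hidden = rng.normal(size=(3, 8)), np.zeros(8)
w_out, b_out = rng.normal(size=(8, 3)), np.zeros(3)

# equivalent of Dense(8, activation='relu')
hidden = np.maximum(0, x @ w_hidden + b_hidden)

# equivalent of Dense(3, activation='softmax'), applied row by row
logits = hidden @ w_out + b_out
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exps / exps.sum(axis=1, keepdims=True)

print(probs.shape)        # one row of 3 class probabilities per sample
print(probs.sum(axis=1))  # each row sums to 1
```

The number of units in the last layer fixes the length of each output row, which is why it has to match the number of classes.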

# EXERCISE 4 - Designing a deep learning model for classification

1. To your model instance model, add an input layer using tensorflow.keras.layers.InputLayer.

2. To your model instance model, add a hidden layer using tensorflow.keras.layers.Dense with 10 neurons and the relu activation function.

3. To your model, add an output layer as an instance of tensorflow.keras.layers.Dense with softmax as the activation function, and the number of units corresponding to the number of classes in the air quality data.

In [None]:
import pandas as pd
from collections import Counter
from sklearn.preprocessing import LabelEncoder
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import  InputLayer
from tensorflow.keras.layers import  Dense
#your code here

#train_data = pd.read_csv("air_quality_train.csv")
#test_data = pd.read_csv("air_quality_test.csv")

#print columns and their respective types
print(train_data.info())
#print the class distribution
print(Counter(train_data["Air_Quality"]))
#extract the features from the training data
x_train = train_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the training data
y_train = train_data["Air_Quality"]
#extract the features from the test data
x_test = test_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the test data
y_test = test_data["Air_Quality"]

#encode the labels into integers
le = LabelEncoder()
#fit on the training labels, then reuse the same mapping for the test labels
y_train=le.fit_transform(y_train.astype(str))
y_test=le.transform(y_test.astype(str))

integer_mapping = {l: i for i, l in enumerate(le.classes_)}
print("The integer mapping:\n", integer_mapping)

#convert the integer encoded labels into binary vectors
y_train = tensorflow.keras.utils.to_categorical(y_train, dtype = 'int64')
y_test = tensorflow.keras.utils.to_categorical(y_test, dtype = 'int64')

#design the model
model = Sequential()

#add the input layer
model.add(InputLayer(input_shape=(x_train.shape[1],)))

#add a hidden layer
model.add(Dense(10, activation='relu'))

#add an output layer
model.add(Dense(6, activation='softmax')) # That is how many classes we have in the Air Quality data (6 in total). Check the integer mapping of the label "Air Quality" that we want to classify.


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7782 entries, 0 to 7781
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PM2.5        7782 non-null   float64
 1   PM10         7782 non-null   float64
 2   NO           7782 non-null   float64
 3   NO2          7782 non-null   float64
 4   NOx          7782 non-null   float64
 5   NH3          7782 non-null   float64
 6   CO           7782 non-null   float64
 7   SO2          7782 non-null   float64
 8   O3           7782 non-null   float64
 9   Benzene      7782 non-null   float64
 10  Toluene      7782 non-null   float64
 11  Xylene       7782 non-null   float64
 12  AQI          7782 non-null   float64
 13  Air_Quality  7782 non-null   object 
dtypes: float64(13), object(1)
memory usage: 851.3+ KB
None
Counter({'Very Poor': 1297, 'Poor': 1297, 'Moderate': 1297, 'Satisfactory': 1297, 'Severe': 1297, 'Good': 1297})
The integer mapping:
 {'Good': 0, 'Moderate': 1,

# THEORY 5 - Setting the optimizer

Now that we’ve had a brief introduction to cross-entropy, we’ll see how to use it with our model.

1. First, to specify the use of cross-entropy when optimizing the model, we **need to set the loss parameter of the Model.compile() method to categorical_crossentropy**.

2. Second, we also need to **decide which metrics to use to evaluate our model**. For **classification**, we usually use **accuracy**. ***Accuracy calculates how often predictions equal labels; Keras reports it as a fraction between 0 and 1***. We will use this metric for our problem.

Finally, we will use Adam as our optimizer because it’s effective here and is commonly used.

To compile the model with all the specifications mentioned above we do the following:

```python
my_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```
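
The accuracy metric itself is simple to reproduce. The sketch below computes it with NumPy on four hypothetical predictions, one of which is wrong; it illustrates the idea and is not the Keras implementation.

```python
import numpy as np

# one-hot true labels and predicted probabilities (hypothetical values)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6],
                   [0.6, 0.2, 0.2]])  # highest probability on the wrong class

# the predicted class is the one with the highest probability
true_classes = np.argmax(y_true, axis=1)
predicted_classes = np.argmax(y_pred, axis=1)

# fraction of samples where the predicted class equals the true class
accuracy = np.mean(true_classes == predicted_classes)
print(accuracy)  # 0.75
```

Unlike cross-entropy, accuracy ignores how confident each prediction was; it only counts whether the most probable class was the right one.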

We are now ready to train our model.

# EXERCISE 5 - Setting the optimizer

1. Compile your model instance model using the categorical_crossentropy loss, adam optimizer, and accuracy as the metrics.

In [None]:
#compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# THEORY 6 - Train and evaluate the classification model

To **train a model instance my_model on the training data my_data and training labels my_labels** we do the following:

```python
my_model.fit(my_data, my_labels, epochs=10, batch_size=1, verbose=1)
```

With the command above, we set the number of epochs to 10 and the batch size to 1. 

To see the progress of the training we set verbose to true (1).
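
To get a feeling for what these two settings mean for training time, the sketch below counts weight updates. It assumes that one update happens per batch and that a final, smaller batch is still processed; the helper name is made up for this example, and 300 is the number of rows in the glass training data.

```python
import math

def weight_updates(n_samples, batch_size, epochs):
    # one update per batch; the last batch may be smaller than batch_size
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

print(weight_updates(300, 1, 10))   # (300, 3000)
print(weight_updates(300, 32, 10))  # (10, 100)
```

A batch size of 1 gives the most updates per epoch, which makes each epoch slow; larger batches run faster per epoch but update the weights less often.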

After the model is trained, we can evaluate it using the unseen test data my_test and test labels test_labels:

```python
loss, acc = my_model.evaluate(my_test, test_labels, verbose=0)
```

We take two outputs out of the .evaluate() function:

* **the value of the loss** (categorical_crossentropy)
* **accuracy** (as set in the metrics parameter of .compile()).

# EXERCISE 6 - Train and evaluate the classification model

1. Using the Model.fit() function, train your model instance model with the training data x_train and labels y_train, using 20 epochs, batch size of 4, and verbose set to 1.

Note: Running this will take almost a full minute!

In [None]:
model.fit(x_train, y_train, epochs=20, batch_size=4, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f1378a44bd0>

# THEORY 7 - Additional evaluation statistics

Sometimes having only accuracy reported is not enough or adequate. 

Accuracy is often used when data is balanced, meaning it contains an equal or almost equal number of samples from all the classes. 

However, **oftentimes data comes imbalanced**. For example in medicine, the rate of a disease is low. 

**In these cases**, ***reporting another metric such as F1-score is more adequate***.

False negatives and false positives often have different consequences, especially in medicine. A false negative means that we claim a patient doesn’t have a disease while they actually have it. The F1-score is the harmonic mean of precision and recall, so it penalizes both false positives and false negatives, which makes it a more informative way to evaluate a model on imbalanced data.
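
The following sketch illustrates the point on a small, hypothetical imbalanced dataset: two models reach the same accuracy, but only the F1-score reveals that one of them never detects the rare class. The labels and predictions are invented for this example.

```python
from sklearn.metrics import accuracy_score, f1_score

# hypothetical imbalanced labels: 8 healthy (0), 2 sick (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

y_pred_a = [0] * 10                        # always predicts "healthy"
y_pred_b = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # finds one of the two sick patients

for name, y_pred in [('A', y_pred_a), ('B', y_pred_b)]:
    acc = accuracy_score(y_true, y_pred)
    # zero_division=0 avoids a warning when a model predicts no positives
    f1 = f1_score(y_true, y_pred, zero_division=0)
    print('model %s: accuracy=%.2f, F1=%.2f' % (name, acc, f1))

# model A: accuracy=0.80, F1=0.00
# model B: accuracy=0.80, F1=0.50
```

Model B has one true positive, one false positive, and one false negative, so its precision and recall are both 0.5 and so is their harmonic mean.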

To observe the F1-score of a trained model instance my_model, amongst other metrics, we use sklearn.metrics.classification_report:

```python
import numpy as np
from sklearn.metrics import classification_report
yhat_classes = np.argmax(my_model.predict(my_test), axis = -1)
y_true = np.argmax(my_test_labels, axis=1)
print(classification_report(y_true, yhat_classes))
```

In the code above we do the following:

* ***predict class probabilities for all test cases my_test*** using the **.predict()** method, take the index of the most probable class with **.argmax()**, and assign the result to the yhat_classes variable.
* using ***.argmax(), convert the one-hot-encoded labels my_test_labels into the index of the class the sample belongs to***. The index corresponds to our class encoded as an integer.
* use the ***classification_report()*** function to **calculate all the metrics**.

# EXERCISES 7 - Additional evaluation statistics

1. Using the Model.predict() method, get the predictions for your test data x_test using the trained model instance model. Assign the result to a variable called y_estimate.

2. Using np.argmax(), convert the predicted probability vectors y_estimate into the index of the most probable class for each sample in the test data, with the axis parameter set to 1. Assign the result to y_estimate.

3. Using np.argmax() convert the one-hot encoded labels y_test into the index of the class each sample in the test data belongs to with the axis parameter set to 1. Assign the result to y_true.

4. Using sklearn.metrics.classification_report, print additional metrics, such as F1-score calculated between the true y_true and estimated test data labels y_estimate.


In [None]:
import numpy as np
from sklearn.metrics import classification_report

y_estimate = model.predict(x_test)

#get additional statistics
y_estimate = np.argmax(y_estimate, axis = 1)
y_true = np.argmax(y_test, axis = 1)

print(classification_report(y_true, y_estimate))

              precision    recall  f1-score   support

           0       0.86      0.94      0.90       100
           1       0.87      0.92      0.89       508
           2       0.73      0.63      0.68       172
           3       0.96      0.85      0.90       452
           4       0.59      0.78      0.67        37
           5       0.61      0.72      0.66       125

    accuracy                           0.84      1394
   macro avg       0.77      0.81      0.78      1394
weighted avg       0.85      0.84      0.84      1394



# THEORY 8 - Classification loss alternative: sparse crossentropy

As we saw before, categorical cross-entropy requires that we first integer-encode our categorical labels and then convert them to one-hot encodings using to_categorical(). 

**There is another type of loss** – **sparse categorical cross-entropy** – which is a computationally modified categorical cross-entropy loss **that allows you to leave the integer labels as they are and skip the entire procedure of encoding**.

**Sparse categorical cross-entropy** is mathematically identical to categorical cross-entropy but **introduces some computational shortcuts** that **save memory as well as computation time** because it **uses a single integer for a class**, rather than a whole vector. This is especially ***useful when we have data with many classes to predict***.

We can specify the use of the sparse categorical crossentropy in the .compile() method:

```python
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```

Note the following changes: we make sure that our labels are just integer encoded using the LabelEncoder() but not converted into one-hot-encodings using .to_categorical(). 

Hence, we comment out the code that uses .to_categorical().
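
The claim that the two losses are mathematically identical can be checked with a few lines of NumPy. This is a sketch of the arithmetic only, using the predictions from the cross-entropy example earlier; it does not reproduce how Keras implements either loss.

```python
import numpy as np

# predicted probabilities for three samples
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])

# the same labels in both formats
y_int = np.array([0, 1, 2])  # integer labels (sparse format)
y_onehot = np.eye(3)[y_int]  # one-hot labels (categorical format)

# categorical: multiply by the one-hot vector and sum over classes
categorical = -np.sum(y_onehot * np.log(y_pred), axis=1).mean()

# sparse: look up the probability of the true class directly
sparse = -np.log(y_pred[np.arange(len(y_int)), y_int]).mean()

print('%.3f %.3f' % (categorical, sparse))  # 0.364 0.364
```

The sparse version never builds the one-hot matrix, which is where the memory saving comes from when the number of classes is large.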

# EXERCISE 8 - Classification loss alternative: sparse crossentropy

Using the # symbol for comments, comment out the following lines of code (Line 36 and Line 37):

```python
y_train = tensorflow.keras.utils.to_categorical(y_train, dtype = 'int64')
y_test = tensorflow.keras.utils.to_categorical(y_test, dtype = 'int64')
```

In [None]:
import pandas as pd
from collections import Counter
from sklearn.preprocessing import LabelEncoder
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import  InputLayer
from tensorflow.keras.layers import  Dense
from sklearn.metrics import classification_report
import numpy as np
#your code here

#train_data = pd.read_csv("air_quality_train.csv")
#test_data = pd.read_csv("air_quality_test.csv")

#print columns and their respective types
print(train_data.info())
#print the class distribution
print(Counter(train_data["Air_Quality"]))
#extract the features from the training data
x_train = train_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the training data
y_train = train_data["Air_Quality"]
#extract the features from the test data
x_test = test_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the test data
y_test = test_data["Air_Quality"]

#encode the labels into integers
le = LabelEncoder()
#fit on the training labels, then reuse the same mapping for the test labels
y_train=le.fit_transform(y_train.astype(str))
y_test=le.transform(y_test.astype(str))
#convert the integer encoded labels into binary vectors
#we comment it here because we need only integer labels for
#sparse cross-entropy
#y_train = tensorflow.keras.utils.to_categorical(y_train, dtype = 'int64')
#y_test = tensorflow.keras.utils.to_categorical(y_test, dtype = 'int64')

#design the model
model = Sequential()
#add the input layer
model.add(InputLayer(input_shape=(x_train.shape[1],)))
#add a hidden layer
model.add(Dense(10, activation='relu'))
#add an output layer
model.add(Dense(6, activation='softmax'))

#compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#train and evaluate the model
model.fit(x_train, y_train, epochs = 20, batch_size = 16, verbose = 0)

#get additional statistics
y_estimate = model.predict(x_test, verbose=0)
y_estimate = np.argmax(y_estimate, axis=1)
print(classification_report(y_test, y_estimate))
# Remember that for the cross-entropy loss the best possible value is 0, so lower is better.




<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7782 entries, 0 to 7781
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PM2.5        7782 non-null   float64
 1   PM10         7782 non-null   float64
 2   NO           7782 non-null   float64
 3   NO2          7782 non-null   float64
 4   NOx          7782 non-null   float64
 5   NH3          7782 non-null   float64
 6   CO           7782 non-null   float64
 7   SO2          7782 non-null   float64
 8   O3           7782 non-null   float64
 9   Benzene      7782 non-null   float64
 10  Toluene      7782 non-null   float64
 11  Xylene       7782 non-null   float64
 12  AQI          7782 non-null   float64
 13  Air_Quality  7782 non-null   object 
dtypes: float64(13), object(1)
memory usage: 851.3+ KB
None
Counter({'Very Poor': 1297, 'Poor': 1297, 'Moderate': 1297, 'Satisfactory': 1297, 'Severe': 1297, 'Good': 1297})
              precision    recall  f1-score   su

# THEORY 10 - Tweak the model

Now that we have run our code several times, we might be wondering if the model can be further improved.

The first thing we can try is to increase the number of epochs. The 20 epochs we used previously may not be enough for the model to converge. Try changing the number of epochs, for example, to 40 and see what happens. Increasing the number of epochs naturally makes training take longer, but the results are often better, at least until the model starts to overfit the training data.
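One way to judge whether more epochs are helping is to look at the per-epoch loss that `fit()` returns. The following is a minimal sketch, not the lesson's own code: it uses small random stand-in data (`x_demo`, `y_demo` are assumptions, not the air quality dataset) and holds out 20% of it with `validation_split` so the training and validation loss can be compared.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Synthetic stand-in data (an assumption): 13 features and 6 classes,
# mirroring only the shape of the air quality dataset
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(300, 13)).astype("float32")
y_demo = rng.integers(0, 6, size=300)

# Same architecture as in the lesson: one hidden layer, softmax output
model_demo = Sequential([
    Input(shape=(13,)),
    Dense(10, activation="relu"),
    Dense(6, activation="softmax"),
])
model_demo.compile(loss="sparse_categorical_crossentropy",
                   optimizer="adam", metrics=["accuracy"])

# fit() returns a History object with one recorded value per epoch
history = model_demo.fit(x_demo, y_demo, epochs=5, batch_size=16,
                         validation_split=0.2, verbose=0)

# Print training and validation loss side by side for each epoch
for epoch, (loss, val_loss) in enumerate(
        zip(history.history["loss"], history.history["val_loss"]), start=1):
    print(f"epoch {epoch}: loss={loss:.4f}  val_loss={val_loss:.4f}")
```

If the training loss keeps falling while the validation loss stalls or rises, adding more epochs is unlikely to help on unseen data.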

Other **hyperparameters you might consider changing are**:

* the batch size
* the number of hidden layers
* the number of units per hidden layer
* the learning rate of the optimizer
* the optimizer

and so on.
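The hyperparameters listed above could be exposed as function arguments so that they are easy to change between runs. This is only a sketch under stated assumptions: `build_model` and its parameter names are hypothetical helpers introduced for illustration, and the values passed in (2 hidden layers, 32 units, learning rate 0.005) are arbitrary examples rather than recommended settings.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def build_model(num_features, num_classes, hidden_layers=1, units=10,
                learning_rate=0.001):
    """Hypothetical helper: build and compile a classifier whose depth,
    width, and learning rate can be tweaked."""
    model = Sequential()
    model.add(Input(shape=(num_features,)))
    # add the requested number of hidden layers
    for _ in range(hidden_layers):
        model.add(Dense(units, activation="relu"))
    # the output layer has one unit per class and a softmax activation
    model.add(Dense(num_classes, activation="softmax"))
    # pass an optimizer object instead of the string 'adam'
    # so that the learning rate can be set explicitly
    model.compile(loss="categorical_crossentropy",
                  optimizer=Adam(learning_rate=learning_rate),
                  metrics=["accuracy"])
    return model

# Example values only: 13 features and 6 classes as in the air quality data
tweaked_model = build_model(num_features=13, num_classes=6,
                            hidden_layers=2, units=32, learning_rate=0.005)
tweaked_model.summary()
```

The batch size is not part of the model itself; it is passed to `fit()`, for example `tweaked_model.fit(x_train, y_train, epochs=30, batch_size=32, verbose=0)` with one-hot encoded labels.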

# EXERCISE 10 - Tweak the model

1. Change the number of epochs from 20 to 30. Rerun the code and observe the results.

Note: Running this in the learning environment (LE) will take some time!

import pandas as pd
from collections import Counter
from sklearn.preprocessing import LabelEncoder
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense
from sklearn.metrics import classification_report
import numpy as np

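#train_data and test_data are assumed to be already loaded in an earlier cell;
#uncomment the two lines below to load them from local CSV files instead
#train_data = pd.read_csv("air_quality_train.csv")
#test_data = pd.read_csv("air_quality_test.csv")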

#print columns and their respective types
print(train_data.info())
#print the class distribution
print(Counter(train_data["Air_Quality"]))
#extract the features from the training data
x_train = train_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the training data
y_train = train_data["Air_Quality"]
#extract the features from the test data
x_test = test_data[['PM2.5', 'PM10', 'NO', 'NO2', 'NOx', 'NH3', 'CO', 'SO2', 'O3', 'Benzene', 'Toluene', 'Xylene', 'AQI']]
#extract the label column from the test data
y_test = test_data["Air_Quality"]

#encode the labels into integers
le = LabelEncoder()
#fit the encoder on the training labels, then reuse the same mapping for the test labels
y_train=le.fit_transform(y_train.astype(str))
y_test=le.transform(y_test.astype(str))
#convert the integer encoded labels into binary vectors
y_train = tensorflow.keras.utils.to_categorical(y_train, dtype = 'int64')
y_test = tensorflow.keras.utils.to_categorical(y_test, dtype = 'int64')

#design the model
model = Sequential()
#add the input layer
model.add(InputLayer(input_shape=(x_train.shape[1],)))
#add a hidden layer
model.add(Dense(10, activation='relu'))
#add an output layer
model.add(Dense(6, activation='softmax'))

#compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#train and evaluate the model
model.fit(x_train, y_train, epochs = 30, batch_size = 16, verbose = 0)

#get additional statistics
y_estimate = model.predict(x_test)
y_estimate = np.argmax(y_estimate, axis = 1)
y_true = np.argmax(y_test, axis = 1)
print(classification_report(y_true, y_estimate))


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7782 entries, 0 to 7781
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PM2.5        7782 non-null   float64
 1   PM10         7782 non-null   float64
 2   NO           7782 non-null   float64
 3   NO2          7782 non-null   float64
 4   NOx          7782 non-null   float64
 5   NH3          7782 non-null   float64
 6   CO           7782 non-null   float64
 7   SO2          7782 non-null   float64
 8   O3           7782 non-null   float64
 9   Benzene      7782 non-null   float64
 10  Toluene      7782 non-null   float64
 11  Xylene       7782 non-null   float64
 12  AQI          7782 non-null   float64
 13  Air_Quality  7782 non-null   object 
dtypes: float64(13), object(1)
memory usage: 851.3+ KB
None
Counter({'Very Poor': 1297, 'Poor': 1297, 'Moderate': 1297, 'Satisfactory': 1297, 'Severe': 1297, 'Good': 1297})
              precision    recall  f1-score   su

# Summary

Congrats! You just created your first classification model using tabular data. Moreover, you performed multi-class classification! In this lesson, you learned:

* What the task of classification is and what the main difference is between binary and multi-class classification.

* How to calculate cross-entropy in practice as well as how to interpret it and use it for classification.

* How to analyze your data using pandas, and how to see the distribution of the categories using collections.Counter(), which is useful for checking whether the data is imbalanced.

* How to prepare the data for classification by encoding the labels with sklearn.preprocessing.LabelEncoder() and converting them with tensorflow.keras.utils.to_categorical() to the one-hot encoding format that the categorical_crossentropy loss function requires.

* How to design a deep learning model with TensorFlow and Keras to perform classification, focusing on the final (output) layer, which needs a softmax activation function.

* How to compile the model with the categorical_crossentropy loss, the Adam optimizer, and accuracy as the evaluation metric.

* How to train and evaluate your model.

* How to use an alternative loss function, sparse_categorical_crossentropy, which lets you keep your labels integer encoded and skip converting them into one-hot encoding.

* How to tweak the model to see if the performance can be improved.
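
As a recap of the two label formats mentioned above, here is a small plain-Python sketch of the per-sample cross-entropy for a single prediction. The function names mirror the Keras loss names but are illustrative re-implementations, not the Keras API, and the probabilities are made-up example values. Both label formats lead to the same loss; only the encoding differs.

```python
import math

def categorical_crossentropy(one_hot, probs):
    # one_hot: list like [0, 1, 0]; probs: predicted class probabilities
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs):
    # label: integer class index; only the true class's probability matters
    return -math.log(probs[label])

# Made-up prediction for a 3-class problem where the true class is 1
probs = [0.1, 0.7, 0.2]
label = 1
one_hot = [0, 1, 0]

loss_one_hot = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(label, probs)

print(f"one-hot labels: {loss_one_hot:.4f}")
print(f"integer labels: {loss_sparse:.4f}")
```

During training, Keras averages this per-sample value over each batch, which is why a confident and correct model drives the reported loss toward 0.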