<a href="https://colab.research.google.com/github/fabnancyuhp/DEEP-LEARNING/blob/main/NOTEBOOKS/CLASSIFICATION_ANNs_ON_STRUTURED_DATA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example: IRIS DATA
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper *The use of multiple measurements in taxonomic problems* as an example of linear discriminant analysis.<br>
The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), 150 samples in total. Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
https://rubikscode.net/2021/08/03/introduction-to-tensorflow-with-python-example/amp/



In [4]:
import pandas as pd
iris = pd.read_csv("https://raw.githubusercontent.com/fabnancyuhp/DEEP-LEARNING/main/DATA/Iris.csv")

iris = iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species']]
iris.head(3)

|   | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |


The goal of the neural network we are going to create is to predict the class (species) of an Iris flower from its other attributes. In other words, it has to learn a model describing the relationship between the attribute values and the class.<br><br>
Now we can extract the feature values and the targets.

## Data preprocessing
First, we split the dataset into feature values and target values:
* X are the feature values
* Y are the target values

In [8]:
X = iris[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
Y = iris[['Species']]

We display the distinct species:

In [7]:
Y['Species'].unique()

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

We encode the species as integers (0, 1, 2) using LabelEncoder from sklearn.

In [11]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(Y['Species'].unique())

Y_encode = le.transform(Y['Species'])
le.inverse_transform([0,1,2])

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
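LabelEncoder numbers the classes in sorted order, which is why `inverse_transform([0,1,2])` returns the species alphabetically. The same mapping can be sketched without scikit-learn using NumPy's `np.unique` with `return_inverse=True`; this is an illustration on a few made-up labels, not sklearn's own code:

```python
import numpy as np

# A few labels in arbitrary order (toy data, not the full Iris column)
species = np.array(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa'])

# classes: sorted distinct labels; codes: position of each label in `classes`
classes, codes = np.unique(species, return_inverse=True)

print(classes)         # ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']
print(codes)           # [2 0 1 0]
print(classes[codes])  # decoding, like le.inverse_transform(codes)
```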

## Categorizing target values
When we deal with a multiclass classification problem, we use the cross-entropy loss function. There are two ways of using the cross-entropy loss in tensorflow.keras:
* tf.keras.losses.CategoricalCrossentropy
* tf.keras.losses.SparseCategoricalCrossentropy

When we use tf.keras.losses.CategoricalCrossentropy, we first have to convert the target values into categorical format with one-hot encoding, for example with to_categorical in Keras.<br><br>

When the targets are integers instead of categorical vectors, we can use sparse categorical crossentropy. It is an integer-based version of the categorical crossentropy loss function, so we don’t need to convert the targets into categorical format.

https://www.machinecurve.com/index.php/2019/10/06/how-to-use-sparse-categorical-crossentropy-in-keras/
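Both losses compute the same quantity, the negative log of the probability assigned to the true class; only the target format differs. A minimal NumPy sketch with made-up probabilities (Keras additionally guards against log(0) internally, which this sketch omits):

```python
import numpy as np

# Predicted class probabilities for 2 samples and 3 classes (made-up numbers)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

int_targets = np.array([0, 2])            # format for SparseCategoricalCrossentropy
onehot_targets = np.eye(3)[int_targets]   # format for CategoricalCrossentropy

# Categorical cross-entropy: -sum_k y_k * log(p_k), averaged over the samples
cce = -np.mean(np.sum(onehot_targets * np.log(probs), axis=1))

# Sparse version: read the log-probability of the true class directly
scce = -np.mean(np.log(probs[np.arange(len(int_targets)), int_targets]))

print(cce, scce)  # both about 0.4338
```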

In [16]:
from tensorflow.keras.utils import to_categorical

Y_vect_cat = to_categorical(Y_encode,num_classes=3)
Y_vect_cat[0:2]

array([[1., 0., 0.],
       [1., 0., 0.]], dtype=float32)
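`to_categorical` replaces each integer label k with a vector holding a single 1 at position k. Assuming 1-D integer labels, the same array can be sketched in NumPy by indexing the rows of an identity matrix (an equivalent construction, not the Keras source):

```python
import numpy as np

labels = np.array([0, 0, 1, 2])   # integer-encoded classes, like Y_encode
num_classes = 3

# Row k of the identity matrix is the one-hot vector of class k
one_hot = np.eye(num_classes, dtype='float32')[labels]
print(one_hot)
```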

## Split into a training set and a test set

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y_vect_cat, test_size=0.3, random_state=42)
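With `test_size=0.3`, scikit-learn rounds the number of test rows up, so the 150 Iris rows become 105 training rows and 45 test rows. Below is a simplified sketch of what the split does (shuffle the row indices, then cut off the test fraction). The helper `split_indices` is hypothetical and its shuffling differs from sklearn's, so the selected rows will not match `random_state=42`:

```python
import math
import numpy as np

def split_indices(n_samples, test_size, seed=0):
    """Hypothetical helper: shuffle row indices and cut off a test fraction."""
    n_test = math.ceil(test_size * n_samples)  # test size rounded up, as in sklearn
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return perm[n_test:], perm[:n_test]        # train indices, test indices

train_idx, test_idx = split_indices(150, 0.3)
print(len(train_idx), len(test_idx))  # 105 45
```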

In [None]:
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization 
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Dense(10,activation='relu',input_shape=(4,)))
model.add(BatchNormalization())
model.add(Dropout(0.1))
model.add(Dense(10, activation='relu'))
model.add(Dense(3,activation='softmax'))

model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(X_train, y_train, epochs=300, batch_size=10,verbose=0)

We evaluate the model on the test set:

In [None]:
scores = model.evaluate(X_test, y_test)
scores[1]
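`evaluate` returns the loss first, followed by the metrics listed in `compile`, so `scores[1]` is the test accuracy. With one-hot targets and this loss, Keras resolves `'accuracy'` to categorical accuracy: the index of the highest predicted probability is compared with the index of the 1 in the target. A NumPy sketch on made-up predictions:

```python
import numpy as np

# One-hot targets and softmax outputs for 4 made-up samples
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype='float32')
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.6, 0.3],   # wrong: predicts class 1, true class is 2
                   [0.3, 0.4, 0.3]])

# A prediction is correct when both argmax indices agree
correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
accuracy = correct.mean()
print(accuracy)  # 0.75
```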

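In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, show_shapes=True)  # Here, we plot the model (needs pydot and graphviz)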

Another example (binary classification on a surgical dataset):
https://www.kaggle.com/omnamahshivai/surgical-dataset-binary-classification?select=Surgical-deepnet.csv

# Red Wine Quality Regression

Here we handle a regression problem on the Red Wine Quality dataset, which we import from GitHub. We convert all columns to float32, the default floating-point type of TensorFlow/Keras layers, so that the data matches the dtype the model works with.

In [None]:
import pandas as pd
winequality = pd.read_csv("https://raw.githubusercontent.com/fabnancyuhp/DEEP-LEARNING/main/DATA/winequality-red.csv").astype('float32')


In [None]:
winequality.head() 

In [None]:
import tensorflow
type(winequality), tensorflow.size(winequality), winequality.dtypes

We want to predict the wine quality, so the quality column is the target. The other columns of the dataset are the features. We also split the dataset into a training set and a test set.

In [None]:
from sklearn.model_selection import train_test_split
x = winequality.drop(['quality'], axis=1)
y = winequality['quality']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

We import the layers from tensorflow.keras.layers and the Sequential model from tensorflow.keras.models:

In [None]:
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, InputLayer 
from tensorflow.keras.models import Sequential

Now, we build the neural network and compile the model. Note that we import the Adam optimizer explicitly so that we can choose its learning rate.

In [None]:
from tensorflow.keras.optimizers import Adam
model = Sequential([InputLayer(input_shape=(x.shape[1],))
                    ,BatchNormalization()
                    ,Dense(100,activation='relu')
                    ,Dropout(0.3)
                    ,Dense(100,activation='relu')
                    ,Dropout(0.3)
                    ,BatchNormalization()
                    ,Dense(100, activation='relu')
                    ,Dropout(0.3)
                    ,Dense(12,activation='relu')
                    ,Dropout(0.1)
                    ,Dense(1)])

model.compile(optimizer=Adam(learning_rate=0.001),loss='mae',metrics=['mae'])
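The learning rate scales every parameter update made by Adam. The sketch below is the textbook Adam update (Kingma and Ba) applied to the toy function f(x) = x², written in NumPy for illustration only; Keras's implementation may differ in details such as where epsilon enters, and the helper `adam_minimize` is hypothetical. With the same number of steps, a larger learning rate reaches the minimum while 0.001 has barely moved:

```python
import numpy as np

def adam_minimize(grad, x0, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Hypothetical helper: textbook Adam loop on a single scalar parameter."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of the gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of the squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias corrections for the zero init
        v_hat = v / (1 - beta2 ** t)
        x = x - learning_rate * m_hat / (np.sqrt(v_hat) + eps)  # step scaled by learning_rate
    return x

# Minimize f(x) = x**2 (gradient 2x) starting from x = 1
x_fast = adam_minimize(lambda x: 2 * x, 1.0, learning_rate=0.1, steps=300)
x_slow = adam_minimize(lambda x: 2 * x, 1.0, learning_rate=0.001, steps=300)
print(x_fast, x_slow)  # x_fast is close to 0, x_slow is still around 0.7
```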

Now, we launch the training stage with 700 epochs. The number of epochs is the number of times the whole training set is passed through the network.

In [None]:
history = model.fit(x_train, y_train, epochs=700, verbose=0)
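One epoch performs ceil(n_train / batch_size) weight updates, and `model.fit` uses `batch_size=32` when none is given, as in the cell above. A small arithmetic sketch; the 1119 training rows assume the file holds the 1,599 rows of the UCI red wine data (the 30% test split rounds up to 480 rows), and the helper `gradient_updates` is hypothetical:

```python
import math

def gradient_updates(n_train, batch_size, epochs):
    """Hypothetical helper: weight updates performed by model.fit."""
    steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be smaller
    return steps_per_epoch, steps_per_epoch * epochs

# Wine regression: assumed 1119 training rows, default batch_size=32, 700 epochs
print(gradient_updates(1119, 32, 700))  # (35, 24500)
# Iris classification above: 105 training rows, batch_size=10, 300 epochs
print(gradient_updates(105, 10, 300))   # (11, 3300)
```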

In [None]:
model.evaluate(x_test, y_test)

In [None]:
pd.DataFrame(history.history).plot()

# Red Wine Quality Classification
We rework the above example as a classification problem instead of a regression. First, we need to know how many classes the wine dataset has: it is the number of distinct values of the target y. In fact, there are 6 classes.

In [1]:
import pandas as pd
winequality = pd.read_csv("https://raw.githubusercontent.com/fabnancyuhp/DEEP-LEARNING/main/DATA/winequality-red.csv").astype('float32')

In [None]:
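import numpy as np
from sklearn.model_selection import train_test_split
x, y = winequality.drop(['quality'], axis=1), winequality['quality']  # features and target, redefined after the reload
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
nb_classes = len(np.unique(y))
print("The number of classes is: "+str(nb_classes))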

Since the 6 quality scores are the integers 3 to 8, we first map them to 0 to 5 with LabelEncoder, then apply the to_categorical function to y_train and y_test.

In [None]:
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder 

le = LabelEncoder()
le.fit(y_train)
y_train_cat = le.transform(y_train)
y_test_cat = le.transform(y_test)
y_train_cat_vect = to_categorical(y_train_cat ,num_classes=6)
y_test_cat_vect = to_categorical(y_test_cat,num_classes=6)
#y_test_cat[1],np.unique(y_test_cat),np.unique(y_train_cat)

Since we deal with a 6-class problem, we build a neural network such that:
* The loss function is categorical_crossentropy
* The output layer yields a probability vector of size 6
* The output layer activation function is the softmax function  
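The softmax activation turns the 6 raw outputs of the last layer into positive probabilities that sum to 1; the predicted class is the index of the largest one, which `le.inverse_transform` maps back to a quality score. A NumPy sketch with made-up logits:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Made-up raw outputs (logits) of a final 6-unit layer for one wine
logits = np.array([0.5, 1.0, 3.0, 2.0, -1.0, 0.0])
probs = softmax(logits)
print(probs.round(3), probs.sum())
print("predicted class index:", np.argmax(probs))  # 2
```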

In [None]:
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, InputLayer
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(InputLayer(input_shape=(x.shape[1],)))
model.add(BatchNormalization())
model.add(Dense(100,activation='tanh'))
model.add(Dropout(0.2))
model.add(BatchNormalization())          
model.add(Dense(100,activation='tanh'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(12,activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(6,activation='softmax'))


In [None]:
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
historybis = model.fit(x_train, y_train_cat_vect, epochs=700, verbose=0)

#history = model.fit(x_train, y_train, epochs=700, verbose=0)

In [None]:
model.evaluate(x_test, y_test_cat_vect)

# Exercise: Classification on Diabetes Data
https://www.analyticsvidhya.com/blog/2021/05/develop-your-first-deep-learning-model-in-python-with-keras/<br>
In this exercise, we deal with a diabetes dataset. First, run the cell below to import the dataset and view its first rows.

In [None]:
import pandas as pd
from tensorflow.keras.utils import get_file
#https://raw.githubusercontent.com/fabnancyuhp/DEEP-LEARNING/main/DATA/diabetes.csv
    
csv_file = get_file('diabetes.csv', 'https://raw.githubusercontent.com/fabnancyuhp/DEEP-LEARNING/main/DATA/diabetes.csv')

diabetes = pd.read_csv(csv_file)
diabetes.head()

1) Take a look at the data set displayed above. Which columns do you choose as features? What is the target? Write code to select the features and the target.


2) Split the dataset into a training set and a test set using train_test_split.

3) How many classes does the target have? Which loss function do you choose?

4) Build a neural network with tensorflow.keras and compile it to solve this classification problem.
You have to respect the following architecture: a BatchNormalization layer right after the network input, then 2 hidden layers with 12 units each and an l2 regularizer.

5) How many trainable parameters does your neural network have?
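For questions 3 and 4: with two classes (0 and 1) and a single sigmoid output unit, the usual choice is binary cross-entropy. An L2 regularizer such as `l2(0.01)` adds 0.01 times the sum of the squared kernel weights of its layer to the training loss. A NumPy sketch of both terms with made-up numbers; it shows the formulas, not how Keras batches or sums them across layers:

```python
import numpy as np

def binary_crossentropy(y_true, p):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    p = np.clip(p, 1e-7, 1 - 1e-7)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def l2_penalty(weights, factor=0.01):
    """Term added to the loss by l2(factor) for one kernel: factor * sum(w**2)."""
    return factor * np.sum(np.square(weights))

y_true = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.1])              # made-up sigmoid outputs
kernel = np.array([[0.5, -1.0], [2.0, 0.0]])    # toy kernel of a hidden layer

bce = binary_crossentropy(y_true, p)            # about 0.2362
penalty = l2_penalty(kernel)                    # 0.01 * (0.25 + 1 + 4 + 0) = 0.0525
print(bce, penalty, bce + penalty)
```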

In [None]:
#1)
X = diabetes.drop(['Outcome'],axis=1)
Y = diabetes['Outcome']
#2)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
#3)
import numpy as np
print("number of classes:"+str(len(np.unique(Y.values))))

In [None]:
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, InputLayer
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2,l1

model = Sequential()
model.add(InputLayer(input_shape=(X.shape[1],)))
model.add(BatchNormalization())
model.add(Dense(12,activation='relu',kernel_regularizer=l2(0.01)))
model.add(Dense(12,activation='relu',kernel_regularizer=l2(0.01)))
model.add(Dense(1,activation='sigmoid'))

model.compile(optimizer='adam',loss= 'binary_crossentropy',metrics=['accuracy'])
model.fit(X_train,y_train,epochs=600,verbose=0,batch_size=10)

model.evaluate(X_test, y_test)
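For question 5, `model.summary()` prints the parameter counts. They can also be checked by hand: a Dense layer has n_in × n_out kernel weights plus n_out biases, and BatchNormalization has 2 trainable parameters per feature (gamma, beta) plus 2 non-trainable ones (moving mean and moving variance); the regularizers add no parameters. A sketch assuming the 8 feature columns of the Pima Indians diabetes data; `count_params` is a hypothetical helper:

```python
def dense_params(n_in, n_out):
    """Kernel (n_in * n_out) plus one bias per unit."""
    return n_in * n_out + n_out

def count_params(n_features, hidden=(12, 12), n_out=1):
    """Hypothetical helper: (trainable, non-trainable) counts for BN + Dense stack."""
    # BatchNormalization: gamma and beta are trainable; moving mean/variance are not
    bn_trainable, bn_non_trainable = 2 * n_features, 2 * n_features
    dense, n_in = 0, n_features
    for units in list(hidden) + [n_out]:
        dense += dense_params(n_in, units)
        n_in = units
    return bn_trainable + dense, bn_non_trainable

# Assuming 8 features: 16 (BN) + 108 + 156 + 13 (Dense layers) trainable parameters
print(count_params(8))  # (293, 16)
```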

In [None]:
diab_features = diabetes.drop(['Outcome'], axis=1)
diab_target = diabetes['Outcome']

import tensorflow  as tf

dataset = tf.data.Dataset.from_tensor_slices((diab_features.values, diab_target.values))
#from_tensor_slices((df.values, target.values))

for feat, targ in dataset.take(5):
    print ('Features: {}, Target: {}'.format(feat, targ))

# Breast Cancer Categorical Dataset
https://machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/