<h1 style="text-align:center"><b><centre>Chromite Layer Classification</centre></b></h1><p></p>
<p style="font-size:115%;">In this end-to-end project tutorial, we will train an artificial neural network from scratch as a binary classifier for chromite layers prediction, while understanding the mathematics involved that allows the model to make predictions. Data used for the purpose of training the model can be accessed from <a href="https://www.kaggle.com/saurabhshahane/multivariate-geochemical-classification", target="_blank">here</a>.</p><br>
<p><b>NOTE:</b> Only the Following selected features are used for the purpose of training</p><br> <table padding="30px"><tr>
    <th>Features</th>
    <th>Description</th>
    </tr>
    <tr><td>Motherhole</td><td>Type of the Mother hole observed in image</td></tr>
    <tr><td>Holetype</td><td>Type of the hole observed</td></tr>
    <tr><td>DepthFrom</td><td>Depth from the Earth at which first observed</td></tr>
    <tr><td>DepthTo</td><td>Final depth at which ore is observed</td></tr>
    <tr><td>Cr2O3_%</td><td>Percentage of Cr2O3</td></tr>
    <tr><td>FeO_%</td><td>Percentage of FeO</td></tr>
    <tr><td>SiO2_%</td><td>Percentage of Si02</td></tr>
    <tr><td>MgO_%</td><td>Percentage of MgO</td></tr>
    <tr><td>Al2O3_%</td><td>Percentage of Al2O3</td></tr>
    <tr><td>CaO_%</td><td>Percentage of CaO</td></tr>
    <tr><td>P_%</td><td>Percentage of P</td></tr>
    <tr><td>Au_ICP_ppm</td><td>Inductive Coupled Plasma analysis of Au</td></tr>
    <tr><td>Pt_ICP_ppm</td><td>Inductive Coupled Plasma analysis of Pt</td></tr>
    <tr><td>Pd_ICP_ppm</td><td>Inductive Coupled Plasma analysis of Pd</td></tr>
    <tr><td>Rh_ICP_ppm</td><td>Inductive Coupled Plasma analysis of Rh</td></tr>
    <tr><td>Ir_ICP_ppm</td><td>Inductive Coupled Plasma analysis of Ir</td></tr>
    <tr><td>Ru_ICP_ppm</td><td>Inductive Coupled Plasma analysis of Ru</td></tr>
    <tr><td>Filter</td><td>Filter</td></tr>
    </table>

<font face="Comic sans MS"><h2><b>Importing Project Dependencies</b></h2><p></p>
<p style="font-size:115%;">Now let us start with importing all the necessary modules.</p></font>

In [1]:
import numpy as np
import pandas as pd
import scipy.special

<font face="Comic sans MS"><p style="font-size:115%;">Importing the dataset, for the purpose of analysis and prediction.</p></font></font>

In [2]:
data = pd.read_csv('DataSet_Thaba_Classification.csv', sep=';')
print(data.head())

  ProjectCode BH_ID Motherhole  HoleType  MaxDepth  DepthFrom  DepthTo  \
0       J1103  SC06       SC06  borehole    639.35     619.33   619.36   
1       J1098  SC11       SC11  borehole    460.20     397.47   397.53   
2       J1492  MD16       MD16  borehole    278.80     151.53   151.69   
3       J1474  MD09       MD09  borehole     52.23      48.38    48.78   
4       J1097  SC63       SC63  borehole    333.75     330.35   330.59   

         Date  Cr2O3_%  FeO_%  ...  CaO_%   P_%  Au_ICP_ppm  Pt_ICP_ppm  \
0  23. Dez 08    34.96  19.29  ...   0.89  0.00        0.01        0.53   
1  23. Feb 09    39.64  20.77  ...   0.46  0.01        0.01        1.56   
2  17. Sep 13    46.28  20.81  ...   0.80  0.00        0.01        0.04   
3  09. Nov 09    39.53  19.65  ...   2.20  0.01        0.01        0.10   
4  23. Feb 09    43.11  23.51  ...   0.43  0.00        0.01        0.55   

   Pd_ICP_ppm  Rh_ICP_ppm  Ir_ICP_ppm  Ru_ICP_ppm  Stratigraphy  Filter  
0        0.16        0.14     

<font face="Comic sans MS"><h2>Data Preprocessing</h2><p></p>
<p style="font-size:115%;">Data preprocessing plays a crucial role in any kind of Machine Learning, Deep Learning project. It becomes very necessary to shape and tune the data in order to fit it according to our training needs and criteria, it is the most important part of any project! If such unprocessed data is used for training the model, the unwanted noise from the data could result in the poor performace of the model in real world applications.</p><p style="font-size:115%;">Firstly we will analyze the basic structure of our data</p></font>

In [3]:
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1205 entries, 0 to 1204
Data columns (total 23 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProjectCode   1205 non-null   object 
 1   BH_ID         1205 non-null   object 
 2   Motherhole    1205 non-null   object 
 3   HoleType      1205 non-null   object 
 4   MaxDepth      1205 non-null   float64
 5   DepthFrom     1205 non-null   float64
 6   DepthTo       1205 non-null   float64
 7   Date          1205 non-null   object 
 8   Cr2O3_%       1126 non-null   float64
 9   FeO_%         1125 non-null   float64
 10  SiO2_%        1126 non-null   float64
 11  MgO_%         1126 non-null   float64
 12  Al2O3_%       1125 non-null   float64
 13  CaO_%         1126 non-null   float64
 14  P_%           1125 non-null   float64
 15  Au_ICP_ppm    1205 non-null   float64
 16  Pt_ICP_ppm    1205 non-null   float64
 17  Pd_ICP_ppm    1205 non-null   float64
 18  Rh_ICP_ppm    1205 non-null 

<font face="Comic sans MS"><p style="font-size:115%;">As we can see some of our chosen features are not mathematically computable, so now we will convert those data columns into mathematically computable features.</p><p style="font-size:115%;"> Null values in our dataset can be a potential threat for poor performance of the model, in order to treat null values, we will replace the null values with the mean of the repective feature column.</p></font>

In [4]:
#We will convert our target class into either 0 and 1
y = np.array(data['Stratigraphy'])
data['Stratigraphy'] = data['Stratigraphy'].str.startswith('L')
#checking for null values, and replacing it with mean of that features column
for i in data.columns:
    if (data[i].isnull()).any():
        data[i].replace(np.nan, data[i].mean(), inplace=True)

for i in data.columns:
    print(data[i].isnull().value_counts())

print(data.info())

False    1205
Name: ProjectCode, dtype: int64
False    1205
Name: BH_ID, dtype: int64
False    1205
Name: Motherhole, dtype: int64
False    1205
Name: HoleType, dtype: int64
False    1205
Name: MaxDepth, dtype: int64
False    1205
Name: DepthFrom, dtype: int64
False    1205
Name: DepthTo, dtype: int64
False    1205
Name: Date, dtype: int64
False    1205
Name: Cr2O3_%, dtype: int64
False    1205
Name: FeO_%, dtype: int64
False    1205
Name: SiO2_%, dtype: int64
False    1205
Name: MgO_%, dtype: int64
False    1205
Name: Al2O3_%, dtype: int64
False    1205
Name: CaO_%, dtype: int64
False    1205
Name: P_%, dtype: int64
False    1205
Name: Au_ICP_ppm, dtype: int64
False    1205
Name: Pt_ICP_ppm, dtype: int64
False    1205
Name: Pd_ICP_ppm, dtype: int64
False    1205
Name: Rh_ICP_ppm, dtype: int64
False    1205
Name: Ir_ICP_ppm, dtype: int64
False    1205
Name: Ru_ICP_ppm, dtype: int64
False    1205
Name: Stratigraphy, dtype: int64
False    1205
Name: Filter, dtype: int64
<class 'pandas.co

In [5]:
def cat2num(arr):
    """Returns a dictionary of categorical unique values and assigns a categorical integer value to unique categories
    
    args-
        arr = A numpy array of features to be converted into categorical values
        
    returns-
        di = A dictionary of unique features and categorical values associated with it
        
    """
    di = dict.fromkeys(arr)
    val = np.arange(0, len(di) - 1)
    i = 0
    for j in di:
        di[j] = i
        i += 1
    return di

In [6]:
cat2numdict = {'Motherhole': np.nan, 'HoleType': np.nan, 'Stratigraphy': np.nan}
dict1 = cat2num(data['Motherhole'])
dict2 = cat2num(data['HoleType'])
dict3 = cat2num(data['Stratigraphy'])
cat2numdict['Motherhole'] = dict1
cat2numdict['HoleType'] = dict2
cat2numdict['Stratigraphy'] = dict3
data.replace(cat2numdict, inplace=True)
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1205 entries, 0 to 1204
Data columns (total 23 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ProjectCode   1205 non-null   object 
 1   BH_ID         1205 non-null   object 
 2   Motherhole    1205 non-null   int64  
 3   HoleType      1205 non-null   int64  
 4   MaxDepth      1205 non-null   float64
 5   DepthFrom     1205 non-null   float64
 6   DepthTo       1205 non-null   float64
 7   Date          1205 non-null   object 
 8   Cr2O3_%       1205 non-null   float64
 9   FeO_%         1205 non-null   float64
 10  SiO2_%        1205 non-null   float64
 11  MgO_%         1205 non-null   float64
 12  Al2O3_%       1205 non-null   float64
 13  CaO_%         1205 non-null   float64
 14  P_%           1205 non-null   float64
 15  Au_ICP_ppm    1205 non-null   float64
 16  Pt_ICP_ppm    1205 non-null   float64
 17  Pd_ICP_ppm    1205 non-null   float64
 18  Rh_ICP_ppm    1205 non-null 

In [7]:
X = np.array(data.drop(['ProjectCode', 'BH_ID', 'Date', 'Stratigraphy'], axis=1)).astype('float32')
y = np.array(data['Stratigraphy'])

In [8]:
def StandardScalar(arr):
    """Scales all the features in the range of -1 to 1, i.e normalization of the features
    
    args-
        arr = A 1D or 2D numpy array of features
        
    returns-
        arr = Normalized feature values with mean == 0 and standard_deviation == 1, for each feature column
        
    """
    try:
        for i in range(arr.shape[1]):
            mean = arr[:, i].mean()
            std = arr[:, i].std()
            arr[:, i] = (arr[:, i] - mean) / std

    except IndexError:
        mean = arr.mean()
        std = arr.std()
        arr = (arr - mean) / std

    return arr


In [9]:
print(X[:, 0].mean(), X[:, 0].std())
X = StandardScalar(X)
print(X[:, 0].mean(), X[:, 0].std())

101.54689 62.549202
-4.4320135e-08 0.99999994


In [10]:
def train_test_split(X, y, testing_size=0.2):
    """Splits the data into training and testing sets
    
    args-
        X = A 1D or 2D numpy array of features
        y = A 1D numpy array of target classes
        testing_size(deflaut_param) = (default value = 0.2) Percent data to be splitted as testing data
        
    returns-
        X_train = A numpy array of feature training set, of the size (1 - testing_size)% of the actual data
        y_train = A numpy array of target classes training set, of the size (1 - testing_size)% of the actual data
        X_test = A numpy array of feature testing/validating set, of the size (testing_size)% of the actual data
        y_test = A numpy array of target classes testing/validating set, of the size (testing_size)% of the actual data
        
    """
    total_rows_no = X.shape[0]
    testin_rows_no = int(testing_size * total_rows_no)
    rand_row_no = np.random.randint(0, total_rows_no, testin_rows_no)

    X_train = np.array(X[rand_row_no])
    X_test = np.delete(X, rand_row_no, axis=0)

    y_train = np.array(y[rand_row_no])
    y_test = np.delete(y, rand_row_no, axis=0)

    return X_train, y_train, X_test, y_test

In [11]:
X_train, y_train, X_test, y_test = train_test_split(X, y, testing_size=0.2)

In [12]:
class DenseNeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes1, hidden_nodes2, output_nodes, learning_rate=0.001, epochs=1):
        """Initializes the basic architecture of the Neural Network
        
        args-
            input_nodes = Number of Input nodes in the 1st layer of the network
            hidden_nodes1 = Number of noeds in the 1st hidden layer
            hidden_nodes2 = Number of nodes in the 2nd hidden layer
            output_nodes = Number of target classes to be predicted
            learning_rate(defalut_param) = (default_value = 0.001) Learning rate used in backpropogation for 
                                           tweaking weights and biasess
            epochs(defalut_param) = (defalut_value = 1)Number of epochs
        
        """
        self.inodes = input_nodes
        self.onodes = output_nodes
        self.hnodes1 = hidden_nodes1
        self.hnodes2 = hidden_nodes2
        self.lr = learning_rate
        self.epochs = epochs
        self.wih1 = np.random.rand(self.hnodes1, self.inodes) - 0.5   # initializing weights associated with 1st hidden layer
        self.wh1h2 = np.random.rand(self.hnodes2, self.hnodes1) - 0.5 # initializing weights associated with 2nd hidden layer
        self.wh2o = np.random.rand(self.onodes, self.hnodes2) - 0.5   # initializing weights associated with output layer
        self.bih1 = np.zeros((self.hnodes1, 1)) + 0.01                # initializing biases associated with 1st hidden layer
        self.bh1h2 = np.zeros((self.hnodes2, 1)) + 0.01               # initializing biases associated with 2nd hidden layer
        self.bh2o = np.zeros((self.onodes, 1)) + 0.01                 # initializing biases associated with output layer
        self.activation_sigmoid = lambda x: scipy.special.expit(x)    # Sigmoid activation function:- 1 / 1 + e^(-x)
        self.activation_softmax = lambda x: scipy.special.softmax(x)  # Softmax activation function:- e^(xi) / Sigma(e^xi)
        self.loss = 0.0

    def forwardprop(self, sample_input):
        """Passes the sample input through the neural network and produces an output array of probabilities aka "Feed-forward"
        
        args-
            sample_input = numpy array of features
        
        """
        sample_input = sample_input.reshape(self.batch_size, -1, 1)

        hidden_inputs1 = np.matmul(self.wih1, sample_input) + self.bih1 # passing inputs through hidde layer1
        hidden_outputs1 = self.activation_sigmoid(hidden_inputs1)       

        hidden_inputs2 = np.matmul(self.wh1h2, hidden_outputs1) + self.bh1h2 # passing hidden1 outputs through hidden layer2
        hidden_outputs2 = self.activation_sigmoid(hidden_inputs2)

        classifier_inputs = np.matmul(self.wh2o, hidden_outputs2) + self.bh2o # passing through final output layer
        classifier_outputs = self.activation_softmax(classifier_inputs)

        self.outputs = classifier_outputs
        self.hidden_outputs1 = hidden_outputs1
        self.hidden_outputs2 = hidden_outputs2

    def CategoricalCrossEntropy(self, classifier_outputs, sample_output):
        """Calculates total loss during the training time by the method of "Categorical-Cross-Entropy" aka "Log-Loss"
        
        args-
            classifier_outputs = output array formulated by forwardprop method
            sample_output = actual output for respective sample_input
        
        formula-
            J = 1 / m sigma((yi)log(yi_hat)) where, 
            m = number of samples tested
            yi = actual output for the sample
            yi_hat = predicted output array from forwardprop method 
            J = total loss
            
        """
        one_hot_encoded_matrix = np.zeros((self.batch_size, self.onodes, 1)) + 0.01 
        one_hot_encoded_matrix[[i for i in range(self.batch_size)], sample_output] = 0.99 # creating one hot encoded matrix

        self.one_hot_encoded_matrix = one_hot_encoded_matrix

        self.error = np.sum(one_hot_encoded_matrix * np.log(classifier_outputs), axis=0) # Log-Loss
        self.error = np.sum(self.error)

        self.loss += self.error # Updating loss

    def Backprop(self, sample_input, sample_output):
        """Tweaks the weights and the biases of each layer by Stochastic Gradient Descent
        
        args-
            sample_input = numpy array of features
            sample_output = actual output for respective sample_input
        
        """
        sample_input = sample_input.reshape(self.batch_size, -1, 1)

        """gradint of cost function wrt weights of output layer"""
        dc_dwh2o = np.matmul((self.outputs - self.one_hot_encoded_matrix), self.hidden_outputs2.transpose(0, 2, 1))
        
        """gradint of cost function wrt weights of hidden layer2 layer"""
        ao_delta = (self.outputs - self.one_hot_encoded_matrix)


        dc_dho = np.matmul(self.wh2o.transpose(), ao_delta) # chain rule
        dho_dz = self.hidden_outputs2 * (1 - self.hidden_outputs2) # derivative of hidden_outputs2 with sigmoid as activation
        dc_dz = dc_dho * dho_dz
        dc_dwh1h2 = np.matmul(dc_dz, self.hidden_outputs1.transpose(0, 2, 1))
        
        """gradint of cost function wrt weights of hidden layer1 layer"""
        dc_dho1 = np.matmul(self.wh1h2.transpose(), dc_dho)
        dho_dz1 = self.hidden_outputs1 * (1 - self.hidden_outputs1) # derivative of hidden_outputs1 with sigmoid as activation
        dc_dz1 = dc_dho1 * dho_dz1
        dc_dwih1 = np.matmul(dc_dz1, sample_input.transpose(0, 2, 1))

        dc_dwh2o = np.mean(dc_dwh2o, axis=0)
        dc_dwh1h2 = np.mean(dc_dwh1h2, axis=0)
        dc_dwih1 = np.mean(dc_dwih1, axis=0)
        
        """Updating weights and biases"""
        self.wh2o -= self.lr * dc_dwh2o
        self.wh1h2 -= self.lr * dc_dwh1h2
        self.wih1 -= self.lr * dc_dwih1

        self.bh2o -= self.lr * np.mean(ao_delta, axis=0)
        self.bh1h2 -= self.lr * np.mean(dc_dz, axis=0)
        self.bih1 -= self.lr * np.mean(dc_dz1, axis=0)

    def fit(self, X_train, y_train):
        """trains the model and prints loss and accuracy achieved in every epoch
        
        args-
            X_train = training set of features
            y_train = training set of target classes
            
        """
        self.batch_size = 1
        for i in range(self.epochs):
            correct1 = 0
            for j in range(X_train.shape[0]):
                self.forwardprop(X_train[j])
                correct1 += self.score(y_train[j])
                self.CategoricalCrossEntropy(self.outputs, y_train[j])
                self.Backprop(X_train[j], y_train[j])
                #print(j)
            acc = correct1 / X_train.shape[0]
            print(f'Epoch: {i + 1}/{self.epochs}\n loss: {-(self.loss / X_train.shape[0])} accuracy: {acc}')
            self.loss = 0

    def score(self, output_batch):
        """Calculates the accuracy of the predictions
        
        args-
            output_batch = Actual output batch NumPy array
            
        returns-
            np.sum(correct.astype(int)) = vectorized comparison for correct outputs and converting the array into int array
            and taking sum of the array
        
        """
        correct_predicted = np.argmax(self.outputs, axis=1).reshape(-1, )
        #print(correct_predicted, output_batch)
        correct = (correct_predicted == output_batch)
        return np.sum(correct.astype(int))

    def evaluate(self, X_test, y_test):
        """Testing the model on the test set after fitting the model
        
        args-
            X_test = Testing set of features
            y_test = Tesing set of target classes
            
        """
        correct1 = 0
        for i in range(X_test.shape[0]):
            self.forwardprop(X_test[i])
            correct1 += self.score(y_test[i])
        self.Score = correct1 / X_test.shape[0]

In [13]:
model = DenseNeuralNetwork(19, 1024, 512, 2, learning_rate=0.001, epochs=5)
model.fit(X_train, y_train)

Epoch: 1/5
 loss: 0.4076499040298143 accuracy: 0.8464730290456431
Epoch: 2/5
 loss: 0.16110872475353785 accuracy: 0.9626556016597511
Epoch: 3/5
 loss: 0.13264514478807155 accuracy: 0.979253112033195
Epoch: 4/5
 loss: 0.11875690171132428 accuracy: 0.983402489626556
Epoch: 5/5
 loss: 0.11026658347217935 accuracy: 0.991701244813278


In [14]:
model.evaluate(X_test, y_test)
print(model.Score)

0.9735772357723578
