# <center> Application for Admission in Nursery School System</center>
## <center> Nursery Dataset </center>
---

### Problem Statement: 
This is a classification problem. The dataset contains information about occupation of parents and child's nursery, family structure and financial standing, and social and health picture of the family. We need to train the model and given a new dataset we need to classify whether the child is not_recommended, recommended, very_recommended, priority and spec_priority. It solves the problem of figuring out what prerequisites are needed for a child's application to be accepted by the Nursery school. The advantage of using my trained model will help parents realize what are the problems that must avoided when applying for a Nursery school.

Here's the reference for the dataset : [Nursery Dataset](https://archive.ics.uci.edu/ml/datasets/nursery)
***

### Understanding the dataset
This is a classification problem. Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools. The dataset contains information about occupation of parents and child's nursery, family structure and financial standing, and social and health picture of the family.

The hierarchical model ranks nursery-school applications according to the following concept structure:

       NURSERY            Evaluation of applications for nursery schools
       . EMPLOY           Employment of parents and child's nursery
       . . parents        Parents' occupation
       . . has_nurs       Child's nursery
       . STRUCT_FINAN     Family structure and financial standings
       . . STRUCTURE      Family structure
       . . . form         Form of the family
       . . . children     Number of children
       . . housing        Housing conditions
       . . finance        Financial standing of the family
       . SOC_HEALTH       Social and health picture of the family
       . . social         Social conditions
       . . health         Health conditions
       
Total Number of Instances: 12960

Total Number of Attributes: 8

Attribute Values:

       parents        usual, pretentious, great_pret
       has_nurs       proper, less_proper, improper, critical, very_crit
       form           complete, completed, incomplete, foster
       children       1, 2, 3, more
       housing        convenient, less_conv, critical
       finance        convenient, inconv
       social         non-prob, slightly_prob, problematic
       health         recommended, priority, not_recom

We need to train the model and given a new data we need to classify whether the child is: 

        not_recommended
        recommended
        very_recommended
        priority
        spec_priority
Alright let's start.

In [54]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

In [55]:
d = pd.read_csv('nursery_data.csv', names=["parents","has_nurs","form","children","housing","finance","social","health","class"])
d

Unnamed: 0,parents,has_nurs,form,children,housing,finance,social,health,class
0,usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend
1,usual,proper,complete,1,convenient,convenient,nonprob,priority,priority
2,usual,proper,complete,1,convenient,convenient,nonprob,not_recom,not_recom
3,usual,proper,complete,1,convenient,convenient,slightly_prob,recommended,recommend
4,usual,proper,complete,1,convenient,convenient,slightly_prob,priority,priority
5,usual,proper,complete,1,convenient,convenient,slightly_prob,not_recom,not_recom
6,usual,proper,complete,1,convenient,convenient,problematic,recommended,priority
7,usual,proper,complete,1,convenient,convenient,problematic,priority,priority
8,usual,proper,complete,1,convenient,convenient,problematic,not_recom,not_recom
9,usual,proper,complete,1,convenient,inconv,nonprob,recommended,very_recom


### Challenges with preparing the data for training
Here is how the dataset looks.

The entire dataset is in the form of strings. Also the dataset is completely classified. The challenge with the data is we need to convert the dataset which is in the form of strings into a format where we can feed it into a model. 


### Deriving insights from the dataset
This is a classificcation problem. We need to classify whether a child belongs to any of the classes as mentioned in the dataset.
### Preprocessing the data for the model
In order to do any training with the dataset, we need to first preprocess the data.
Data preprocessing plays a major role before applying any algorithm. So we need to convert the dataset which is in the form of strings to numbers.

We cannot find any correlation since the entire dataset is classifed. So we need to manually convert the strings to numbers. 
We are converting the strings to numbers or binary codes in two ways : by using categorical codes and by using dummy values which is done as follows:

**Categorical:**
```python
data[features].astype("category").cat.codes
```

**Binary values using get_dummies**
```python
pd.get_dummies(data[features])
```

In [56]:
#Categorical
features = ["has_nurs","children","housing","social"]
for column in features:
    d[column] = d[column].astype("category").cat.codes

In [57]:
features1 = ['parents','form','finance','health','class']
x1 = d.drop(features1, axis = 1)
d.head(5)

Unnamed: 0,parents,has_nurs,form,children,housing,finance,social,health,class
0,usual,3,complete,0,0,convenient,0,recommended,recommend
1,usual,3,complete,0,0,convenient,0,priority,priority
2,usual,3,complete,0,0,convenient,0,not_recom,not_recom
3,usual,3,complete,0,0,convenient,2,recommended,recommend
4,usual,3,complete,0,0,convenient,2,priority,priority


In [58]:
# Get Dummies
x2 = d.drop(features, axis = 1)

x2 = pd.get_dummies(x2)
x2.head(5)

Unnamed: 0,parents_great_pret,parents_pretentious,parents_usual,form_complete,form_completed,form_foster,form_incomplete,finance_convenient,finance_inconv,health_not_recom,health_priority,health_recommended,class_not_recom,class_priority,class_recommend,class_spec_prior,class_very_recom
0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0
1,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,0,0
2,0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,0,0
3,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0
4,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,0,0


In [59]:
#Concatinating into a dataframe
x = pd.concat([x1, x2], axis = 1)
x.head(5)

Unnamed: 0,has_nurs,children,housing,social,parents_great_pret,parents_pretentious,parents_usual,form_complete,form_completed,form_foster,...,finance_convenient,finance_inconv,health_not_recom,health_priority,health_recommended,class_not_recom,class_priority,class_recommend,class_spec_prior,class_very_recom
0,3,0,0,0,0,0,1,1,0,0,...,1,0,0,0,1,0,0,1,0,0
1,3,0,0,0,0,0,1,1,0,0,...,1,0,0,1,0,0,1,0,0,0
2,3,0,0,0,0,0,1,1,0,0,...,1,0,1,0,0,1,0,0,0,0
3,3,0,0,2,0,0,1,1,0,0,...,1,0,0,0,1,0,0,1,0,0
4,3,0,0,2,0,0,1,1,0,0,...,1,0,0,1,0,0,1,0,0,0


We manually divided some features into categorical values and some features into binary values. We divided them such that those features which have equal proirity are divided using get_dummies and the rest are divided using categorical values.
Also we split the target feature from the dataset.

In [60]:
target = ["class_not_recom", "class_priority", "class_recommend", "class_spec_prior", "class_very_recom"]
X = x.drop(target, axis = 1)

In [61]:
target = x[target]
Y = pd.DataFrame(target)
Y = np.array(Y)

In [62]:
X.shape

(12960, 16)

In [63]:
Y.shape

(12960, 5)

#### Principal Component Analysis(PCA)
It is a linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space. Basically, it is a dimension-reduction tool that can be used to reduce a large set of variables to a small set that still contains most of the information in the large set.
Since we have 16 dimensions in the data, we use Principal Component Analysis(PCA) in order to do dimensionality reduction as follows:

In [64]:
from sklearn.decomposition import PCA

In [65]:
pca = PCA(n_components=8)
xtr = pca.fit_transform(X)

In [66]:
xtr.shape

(12960, 8)

Now, xtr have been reduced to 8 dimensions which will be easier to use in any model that we use.

### Picking a model based on the insights
The model that we are going to use is Artificial Neural Networks.
Our Neural Network consists of just 3 layers : Input layer, 1 Hidden layer and a Output layer.
Input layer consists of 8 nodes.  Hiddden layer consists of 31 nodes and output nodes consists of 5 nodes.
We use 2 activation functions : tanh for hidden layer(for the value to lie between -1 & +1) and sigmoid for the output layer(for the value to lie between 0 & 1).

We split our dataset into training set containing 80%(10368) and test set containing the remaining 20%(2592).
Since the training dataset contains approximately 10000 datas, we need to divide it into batches and then feed that into a Neural Network. Here, we divide it into 10 batches, each batch containing 1036 data points.
We feed all the batches to the Neural Network with epochs as 10.

Alright, let's train our model.

In [120]:
# All Activation Functions and their Transfer Derivatives

# 1. Sigmoid / Logistic Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dsigmoid(x):
    return x * (1-x)

# 2. Rectified Linear Unit Function
def relu(x):
     return abs(x) * (x > 0)

def drelu(x):
     return 1. * (x > 0.)

# 3. Leaky-Relu Functions
def lrelu(x):
    return np.where(x > 0., x, x * 0.01)

def dlrelu(x):
    dx = np.ones_like(x)
    dx[x < 0.] = 0.01
    return dx

# 4. Hyperbolic Tan Function
def tanh(x):
    return np.tanh(x)

def dtanh(x):
    return 1.0 - (np.power(np.tanh(x),2))

In [121]:
def feed_forward(data_in, w0,w1, b0,b1):
    '''
    The Feed-forward considers 5 layers including input and output layer.
    
    The output layer/neuron is a classification node.
    
    returns: state of each layer
    '''
    layer0 = data_in
    layer1 = tanh(np.dot(layer0, w0)+b0)
    layer2 = sigmoid(np.dot(layer1, w1)+b1)

    return layer0, layer1, layer2

In [122]:
def backpropogate(i, layer0, layer1, layer2, actual_y, w0,w1,b0,b1, learning_rate):
    '''
    This backpropogate is only slightly different from a regular classifier
    in ways in which the output layer gradient is calculated.
    
    Since the output layer is not a function of any activation function,
    the delta doesn't need to be multiplied with the transfer derivative of the
    output layer.
    
    The rest is all the same.
    
    returns: weights and bias matrices
    '''
    l2_error = layer2 - actual_y
    l2_delta = l2_error * dsigmoid(layer2)
    dh2 = np.dot(layer1.T, l2_delta)
 
    l1_error = l2_delta.dot(w1.T)
    l1_delta = l1_error * dtanh(layer1)
    dh1 = np.dot(layer0.T, l1_delta)
    
    w1 = w1 - (learning_rate * dh2)
    w0 = w0 - (learning_rate * dh1)

    b1 = b1 - (learning_rate * np.mean(l2_delta))
    b0 = b0 - (learning_rate * np.mean(l1_delta))    
   
    if i%1==0 and (i!=0):
        loss = np.mean(np.power(layer2-actual_y, 2))
        loss_curve.append(loss)
        iters.append(int(i))
        
        if i%1 == 0:
            print("\n", int(i), loss)

        
    return w0, w1,b0,b1

In [123]:
def accuracy(testx, testy):
    correct = 0
    layer0, layer1, layer2 = feed_forward(testx,w0, w1, b0,b1)
    for i in range(len(testx)):
        if np.argmax(layer2[i]) == np.argmax(testy[i]):
            correct +=1 
            
    return f"Accuracy: {correct*100/len(testy)}%"

In [124]:
xtrain, xtest, ytrain, ytest = train_test_split(xtr,Y, test_size=0.2)

In [125]:
xtrain.shape

(10368, 8)

In [126]:
xtest.shape

(2592, 8)

In [127]:
xbatch = np.array_split(xtrain, 10)
ybatch = np.array_split(ytrain, 10)

In [128]:
np.random.seed(3)

w0 = np.random.random((8,31))
w1 = np.random.random((31,5))


b0 = np.random.random((1,1))-1
b1 = np.random.random((1,1))-1


epochs = 10

In [129]:
# Initialising variables to track loss vs iterations so we can plot the changes
loss_curve = []
iters = []

In [283]:
for j in range(len(xbatch)):
    print(f"Batch No :{j+1}")
    for i in range(epochs):
        layer0, layer1, layer2 = feed_forward(xbatch[j], w0,w1, b0,b1)
        w0,w1,b0,b1 = backpropogate(i,layer0, layer1, layer2, ybatch[j], w0,w1, b0,b1, 0.0001 )


Batch No :1

 1 0.02214744719644913

 2 0.02213842283095222

 3 0.0221296589012209

 4 0.022121142562262358

 5 0.022112861810159503

 6 0.022104805415638935

 7 0.022096962863500602

 8 0.0220893242973561

 9 0.022081880469177598
Batch No :2

 1 0.025999409132386967

 2 0.025987452636579752

 3 0.025976047011839837

 4 0.025965156666247127

 5 0.025954748592284683

 6 0.025944792163526385

 7 0.025935258948182136

 8 0.025926122538057744

 9 0.0259173583916115
Batch No :3

 1 0.023298979742698303

 2 0.023283538214900763

 3 0.023268793326801582

 4 0.02325470144583049

 5 0.023241221946335825

 6 0.023228316991473742

 7 0.0232159513315069

 8 0.023204092117215816

 9 0.023192708727238658
Batch No :4

 1 0.0263254573608811

 2 0.02631890148008365

 3 0.026312492960202293

 4 0.026306224335547378

 5 0.026300088570581467

 6 0.026294079034135182

 7 0.026288189475207047

 8 0.026282414000248305

 9 0.026276747051840098
Batch No :5

 1 0.024385403844133964

 2 0.024374604571413213

 3 

In [284]:
accuracy(xtrain, ytrain)

'Accuracy: 93.10378086419753%'

In [285]:
accuracy(xtest, ytest)

'Accuracy: 92.93981481481481%'

### Insights from the trained model and Final Conclusion based on the choosen model
After training the model for several times as mentioned above, the final accuracy we got is as follows:
For Training Data : 93.10% 
For Testing Data  : 92.93%

The final accuracy tells that given a new data about a child, it will predict correctly 90% of the time whether the child belongs to any of the classes as mentioned above.