<p style="color:#153462; 
          font-weight: bold; 
          font-size: 30px; 
          font-family: Gill Sans, sans-serif; 
          text-align: center;">
          Building an ANN</p>

### Importing Required Modules

In [33]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import (LabelEncoder,
                                   OneHotEncoder,
                                   StandardScaler)
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split

In [3]:
tf.__version__

'2.12.0'

### Data Processing

In [4]:
dataset = pd.read_csv("data/Churn_Modelling.csv")
dataset.head()

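| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |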


In [5]:
X = dataset.loc[:, "CreditScore":"EstimatedSalary"].values
y = dataset.iloc[:, -1].values

In [6]:
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


In [7]:
print(y)

[1 0 1 ... 1 1 0]


#### Encoding the data

<p style="text-align: justify; text-justify: inter-word;">
   <font size=3>
       <b>Label Encoding:</b>
Label Encoding converts a categorical variable into numerical form by assigning a unique integer to each category. The integers implicitly impose an order on the values even when the categories have none, so the technique is best suited to ordinal or binary variables. For example, given the three categories "red," "green," and "blue," scikit-learn's LabelEncoder numbers them in sorted order: "blue," "green," and "red" become 0, 1, and 2, respectively.
   </font>
</p>
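
To make the mapping concrete, here is a minimal sketch on a toy list of colors. The names `colors`, `le_demo`, and `codes` are illustrative and not part of the notebook's pipeline; the point is only that `LabelEncoder` stores the sorted categories in `classes_` and uses each category's position there as its integer code.

```python
from sklearn.preprocessing import LabelEncoder

# Toy data, assumed for illustration only
colors = ["red", "green", "blue", "green"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(colors)

# classes_ holds the categories in sorted order; a category's code is its index there
print(le_demo.classes_.tolist())  # ['blue', 'green', 'red']
print(codes.tolist())             # [2, 1, 0, 1]

# inverse_transform maps the integer codes back to the original labels
print(le_demo.inverse_transform(codes).tolist())
```

Because the codes carry an order (0 < 1 < 2), a model may read "red" as larger than "blue". That is harmless for a two-valued column but can mislead for a nominal column with more categories.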

In [18]:
# Column 2 of X is Gender; replace its string labels with integer codes
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

<p style="text-align: justify; text-justify: inter-word;">
   <font size=3>
       <b>One-Hot Encoding</b>, on the other hand, converts a categorical variable into binary vectors. It creates one new binary column per category, where 1 marks the presence of that category and 0 its absence, so each category becomes a separate feature and no order is implied. For example, with the columns ordered as red, green, blue, the three categories "red," "green," and "blue" would be represented as [1, 0, 0], [0, 1, 0], and [0, 0, 1], respectively.
   </font>
</p>
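
The same idea can be sketched with scikit-learn on a toy column. The names `colors_col`, `ohe_demo`, and `encoded` are illustrative only. Note that `OneHotEncoder` orders its output columns by sorted category, so here the columns are blue, green, red, and the position of the 1 for each color follows that order.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy single-column input, assumed for illustration; the encoder expects 2D data
colors_col = np.array([["red"], ["green"], ["blue"]])

ohe_demo = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense for printing
encoded = ohe_demo.fit_transform(colors_col).toarray()

# Output columns follow the sorted categories: blue, green, red
print(ohe_demo.categories_[0].tolist())
print(encoded)
# [[0. 0. 1.]   <- red
#  [0. 1. 0.]   <- green
#  [1. 0. 0.]]  <- blue
```

This matches the encoded `Geography` output in this notebook, where the first three columns correspond to France, Germany, and Spain in alphabetical order.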


In [25]:
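# https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html
# One-hot encode column 1 (Geography); all other columns pass through unchanged.
# The encoded columns are placed first in the output, followed by the rest.
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],
    remainder="passthrough",
)
X = ct.fit_transform(X)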

In [32]:
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]


In [34]:
# Splitting the data into train and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [35]:
# Applying feature scaling: fit on the training set only,
# then reuse the same mean and standard deviation for the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
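
As a sanity check on what the scaling step does, the sketch below uses synthetic data (the `demo_*` names and the random values are assumptions, not the churn dataset). It shows that `StandardScaler` computes z = (x - mean) / std per column from the training data, and that the test set is transformed with those training statistics rather than its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed for illustration only
rng = np.random.default_rng(0)
demo_train = rng.normal(loc=50.0, scale=10.0, size=(80, 3))
demo_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

demo_sc = StandardScaler()
demo_train_scaled = demo_sc.fit_transform(demo_train)  # learns mean/std from train
demo_test_scaled = demo_sc.transform(demo_test)        # reuses the train mean/std

# Manual z-score using the training statistics (population std, ddof=0)
manual_test = (demo_test - demo_train.mean(axis=0)) / demo_train.std(axis=0)

print(np.allclose(demo_test_scaled, manual_test))
print(demo_train_scaled.mean(axis=0).round(6), demo_train_scaled.std(axis=0).round(6))
```

Fitting on the training split only keeps information about the test set out of preprocessing, which is why the scaled test columns will generally not have exactly zero mean and unit variance.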