# Artificial Neural Network (ANN) 
---
## Overview of ANN
### What is ANN?

An Artificial Neural Network (ANN) is a computational model inspired by how the human brain works. It consists of layers of interconnected nodes (neurons), where each connection has a weight that is adjusted during training.

- Input Layer → Receives raw data.
- Hidden Layers → Extract patterns through weighted transformations + activation functions.
- Output Layer → Produces predictions (classification/regression).
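
To make the three layer roles concrete, here is a minimal NumPy sketch of one forward pass. The layer sizes (4 inputs, 3 hidden neurons, 1 output) and the random weights are assumptions chosen for illustration only; they are not the network built later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer: one sample with 4 raw features (sizes are illustrative).
x = rng.normal(size=(1, 4))

# Hidden layer: weighted transformation followed by a ReLU activation.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
hidden = np.maximum(0.0, x @ W1 + b1)

# Output layer: one neuron with a sigmoid, giving a value between 0 and 1.
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
output = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))

print(hidden.shape, output.shape)
```

Training consists of adjusting `W1`, `b1`, `W2`, and `b2` so that `output` moves closer to the known labels.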

### Why ANN?

- Can detect non-linear relationships between features.
- Useful in classification problems (e.g., customer churn, fraud detection).
- Often achieves higher accuracy than simpler ML algorithms on large & complex datasets, although this is not guaranteed.

### Technical Problems ANN Solves:

1. Linearity limitation → Linear models (like Logistic Regression) can fail on data that is not linearly separable. ANN addresses this with hidden layers & non-linear activations.
2. High-dimensional data → ANN accepts many input features at once; each input gets its own weight into every neuron of the first hidden layer.
3. Feature interactions → ANN can learn interactions between features during training, which reduces (but does not eliminate) the need for manual feature engineering.
---
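
The linearity limitation can be illustrated with XOR, a classic pattern that no single linear threshold on the two inputs can separate. The sketch below uses hand-picked weights (an assumption for illustration, not learned values) to show that one hidden ReLU layer is enough to represent it.

```python
import numpy as np

# The four XOR inputs; the target pattern is 0, 1, 1, 0.
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1).
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
hidden = np.maximum(0.0, X_xor @ W1 + b1)

# Output: h1 - 2 * h2 (no activation needed for this illustration).
W2 = np.array([[1.0], [-2.0]])
xor_out = (hidden @ W2).ravel()

print(xor_out)  # [0. 1. 1. 0.]
```

In practice the weights are found by training rather than chosen by hand; the point is only that the hidden layer makes such a mapping representable.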

## Part 1 – Data Preprocessing
### 1. Importing libraries

In [8]:
%pip install tensorflow

import numpy as np
import pandas as pd
import tensorflow as tf
tf.__version__

'2.20.0'

- numpy → for mathematical operations.
- pandas → for data handling.
- tensorflow → deep learning framework.

**Theoretical Terms:**

- Raw data often contains categorical (text) features and numeric features on very different scales.
- ANN needs numerical, standardized input.
- Two problems occur:
  1. Categorical variables (like Geography, Gender) → ANN can’t process text.
  2. Different scales (salary vs. age) → features with larger magnitudes dominate the weighted sums and gradients unless scaled.

### 2. Importing the dataset

In [10]:
dataset = pd.read_csv(r'P:\Batch\MOAZ\MLP\Deep Learning\ANN\Churn_Modelling.csv')  # raw string keeps the backslashes literal
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values

- Load dataset.
- X: Features (independent variables), taken from column index 3 up to, but not including, the last column.
- y: Target, the last column (whether customer left → 0/1).
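
The slicing can be checked on a tiny frame. The column names and values below are hypothetical and only mimic the layout (three leading columns to skip, features in the middle, target last); they are not taken from the real CSV.

```python
import pandas as pd

# Hypothetical toy frame: 3 leading columns to skip, 2 features, 1 target.
toy = pd.DataFrame({
    "id": [1, 2],
    "code": [101, 102],
    "name": ["a", "b"],
    "score": [600, 700],
    "age": [40, 35],
    "left": [0, 1],
})

X_toy = toy.iloc[:, 3:-1].values  # columns from index 3 up to (not including) the last
y_toy = toy.iloc[:, -1].values    # last column only

print(X_toy)
print(y_toy)
```

Here `X_toy` keeps only `score` and `age`, and `y_toy` holds the `left` column.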

### 3. Encoding categorical data

In [12]:
%pip install scikit-learn

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

- Encodes Gender as integers. `LabelEncoder` assigns codes in sorted order of the labels, so Female → 0 and Male → 1.
- Problem solved: ANN requires numbers, not strings.
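
A quick way to confirm which label gets which code is to inspect `classes_` after fitting. This sketch uses a short made-up list of labels rather than the real column.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of the Gender column, for illustration only.
genders = ["Male", "Female", "Female", "Male"]

toy_le = LabelEncoder()
codes = toy_le.fit_transform(genders)

print(list(toy_le.classes_))  # position in this list is the integer code
print(codes.tolist())
```

Because the classes are sorted, `"Female"` sits at index 0 and `"Male"` at index 1.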

In [13]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

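- One Hot Encoding Geography (France/Germany/Spain) → [1,0,0], [0,1,0], [0,0,1].
- Avoids a false ordering: integer codes such as France = 0, Germany = 1, Spain = 2 would imply France < Germany < Spain, which has no meaning.
- The one-hot columns are placed first in the output, followed by the remaining (passthrough) columns in their original order.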
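
The effect on column order is easy to miss, so here is a small sketch on a made-up two-column array. `sparse_threshold=0` is added only so this tiny example always returns a dense array; the notebook cell above does not set it.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up data: a numeric column followed by a Geography column.
X_geo = np.array([[600, "France"], [700, "Spain"], [650, "Germany"]], dtype=object)

ct_toy = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [1])],
    remainder="passthrough",
    sparse_threshold=0,  # force dense output for this small example
)
encoded = np.asarray(ct_toy.fit_transform(X_geo), dtype=float)

print(encoded)
```

Geography was the second input column, yet its three one-hot columns come first in `encoded`, with the numeric column moved to the end.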

### 4. Splitting dataset

In [14]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

- Splits into Training (80%) + Test (20%).
- Training → learn weights, Testing → evaluate generalization.
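
As a small check of the 80/20 split and of what `random_state` does, the sketch below splits ten made-up rows twice with the same seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10) % 2

first = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
second = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

X_tr, X_te, y_tr, y_te = first
print(X_tr.shape, X_te.shape)

# Same seed, same split: every returned array matches.
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)
```

Fixing `random_state` makes the split reproducible, so later accuracy figures refer to the same test rows on every run.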

### 5. Feature Scaling

In [15]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

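- Scales each feature to mean = 0, variance = 1, using statistics computed on the training set.
- The scaler is fit on X_train only and then applied to X_test, so no information from the test set leaks into training.
- Problem solved: ANN converges faster (gradient descent is sensitive to feature scale).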
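
The sketch below uses made-up age and salary values to show what `fit_transform` and `transform` compute. The test row is scaled with the training mean and standard deviation, not its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up (age, salary) rows on very different scales.
train = np.array([[30.0, 40000.0], [40.0, 60000.0], [50.0, 80000.0]])
test = np.array([[40.0, 50000.0]])

sc_toy = StandardScaler()
train_scaled = sc_toy.fit_transform(train)  # learns mean/std from train, then scales it
test_scaled = sc_toy.transform(test)        # reuses the training mean/std

# The same result computed by hand from the training statistics.
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled, manual)
```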

## Part 2 – Building the ANN
**Theoretical Terms:**

- Sequential Model → Stack layers in order.
- Dense Layer → Fully connected layer.
- Activation functions:
  - relu → hidden layers (helps mitigate the vanishing gradient problem).
  - sigmoid → output layer for binary classification.
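
A short NumPy sketch of both activations and their derivatives may help explain the choice. The sample inputs are arbitrary, and the five-layer product is only a rough illustration of how gradients shrink when they are multiplied through sigmoid layers.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Gradient is 1 for positive inputs and 0 otherwise.
    return (z > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-4.0, 0.0, 4.0])
print(relu(z), relu_grad(z))
print(sigmoid(z), sigmoid_grad(z))

# The sigmoid gradient never exceeds 0.25, so a chain of 5 such factors is at most:
upper_bound = 0.25 ** 5
print(upper_bound)
```

ReLU passes a gradient of 1 for active neurons, which is one reason it is a common default for hidden layers, while sigmoid remains convenient at the output because it maps to a 0 to 1 range.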

### Code Explanation:

In [16]:
# Initializing the ANN
ann = tf.keras.models.Sequential()

- Initialize ANN model.

In [17]:
# Adding the input layer and the first hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

- First hidden layer with 6 neurons.
- Why 6? A common heuristic (not a strict rule): roughly the average of the number of input and output neurons. Here that is (12 + 1) / 2 ≈ 6.

In [18]:
# Adding the second hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

- Second hidden layer, also with 6 neurons; stacking layers lets the network combine the features learned by the first layer.

In [19]:
# Adding the output layer
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

- Output layer with 1 neuron.
- Sigmoid → outputs probability (0 to 1).
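
To get a feel for the size of this network, the trainable parameters can be counted by hand: each Dense layer has one weight per input-output pair plus one bias per neuron. The input width of 12 is an assumption based on the 12 values passed to `predict` later in this notebook; the model code itself does not state it.

```python
# Layer widths: assumed 12 inputs, then the 6-6-1 Dense layers defined above.
layer_sizes = [12, 6, 6, 1]

# weights (n_in * n_out) + biases (n_out) for each Dense layer
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)  # [78, 42, 7] 127
```

Once the model has been trained (or otherwise built), `ann.summary()` should report the same totals if the input width assumption holds.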

## Part 3 – Training the ANN
**Theoretical Terms:**

- Optimizer (Adam) → efficient gradient descent algorithm.
- Loss Function (Binary Crossentropy) → measures error for binary classification.
- Batch Size → number of samples per gradient update.
- Epochs → how many times the model sees the full dataset.
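
Binary crossentropy can be written out in a few lines of NumPy. This is a sketch of the formula, not the Keras implementation; the clipping value `eps` is an assumption added to avoid `log(0)`, and the probabilities are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to keep log() finite, then average the per-sample losses.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
good_probs = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct
bad_probs = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong

loss_good = binary_crossentropy(y_true, good_probs)
loss_bad = binary_crossentropy(y_true, bad_probs)
print(loss_good, loss_bad)
```

Confident wrong predictions are penalized far more heavily than confident correct ones, which is the signal the optimizer uses to adjust the weights.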

### Code Explanation:

In [20]:
# Compiling the ANN
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

- Compile ANN with optimizer + loss + evaluation metric.

In [21]:
# Training the ANN on the Training set
ann.fit(X_train, y_train, batch_size=32, epochs=100)

Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7352 - loss: 0.5808
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8046 - loss: 0.4737
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8086 - loss: 0.4470
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8112 - loss: 0.4365
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8155 - loss: 0.4292
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8185 - loss: 0.4235
Epoch 7/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8202 - loss: 0.4184
Epoch 8/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8234 - loss: 0.4128
Epoch 9/100
250/250 ...

<keras.src.callbacks.history.History at 0x2709370e5f0>

- Train for 100 epochs.
- Weights are updated after every batch; in the log above the loss drops quickly in the first epochs and then decreases more slowly.
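
The `250/250` counter in the training log is the number of batches per epoch. The arithmetic below assumes a training set of 8000 rows, which is consistent with 250 steps at a batch size of 32 but is not stated explicitly in this notebook.

```python
import math

n_train = 8000   # assumed training-set size (not printed in the notebook)
batch_size = 32
epochs = 100

steps_per_epoch = math.ceil(n_train / batch_size)  # gradient updates per epoch
total_updates = steps_per_epoch * epochs           # gradient updates over the whole run

print(steps_per_epoch, total_updates)  # 250 25000
```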

## Part 4 – Predictions & Evaluation
**Predicting a single observation:**

In [22]:
"""
Homework:
Use our ANN model to predict if the customer with the following information will leave the bank:
Geography: France
Credit Score: 600
Gender: Male
Age: 40 years old
Tenure: 3 years
Balance: $ 60000
Number of Products: 2
Does this customer have a credit card? Yes
Is this customer an Active Member: Yes
Estimated Salary: $ 50000
So, should we say goodbye to that customer?

Solution:
"""
# Predicting the result of a single observation
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 76ms/step
[[False]]


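- Passes the new customer's data as a 2D array (one row), which is the shape `predict` expects.
- The row must match the training layout: one-hot Geography first (France = 1, 0, 0), then the remaining features in their original order, with Gender encoded as Male = 1.
- The row is scaled with the already fitted `sc` (`transform`, not `fit_transform`) before prediction.
- Probability greater than 0.5 → True (predicted to leave); otherwise False (predicted to stay). Here the result is False, so the model predicts this customer stays.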
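
Building that 12-value row by hand is error-prone, so a small helper can make the ordering explicit. `build_row` and the two lookup tables are hypothetical names introduced here for illustration; they hard-code the encodings described earlier in this notebook instead of reusing the fitted encoders.

```python
# Hypothetical lookup tables mirroring the encodings used in preprocessing.
GEOGRAPHY_ONE_HOT = {"France": [1, 0, 0], "Germany": [0, 1, 0], "Spain": [0, 0, 1]}
GENDER_CODE = {"Female": 0, "Male": 1}

def build_row(geography, credit_score, gender, age, tenure,
              balance, n_products, has_card, is_active, salary):
    """Return one feature row in the order the trained model expects."""
    return GEOGRAPHY_ONE_HOT[geography] + [
        credit_score, GENDER_CODE[gender], age, tenure,
        balance, n_products, int(has_card), int(is_active), salary,
    ]

row = build_row("France", 600, "Male", 40, 3, 60000, 2, True, True, 50000)
print(row)  # [1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]
```

The resulting list could then be passed on as `sc.transform([row])`, exactly like the literal row in the cell above.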

**Predicting Test Set:**

In [23]:
# Predicting the Test set results
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step


- Predicts a churn probability for every customer in the test set, then converts it to True/False with the 0.5 threshold.

**Confusion Matrix & Accuracy:**

In [24]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
accuracy_score(y_test, y_pred)

0.8615

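- Confusion Matrix → counts of TN, FP, FN, TP (rows = actual class, columns = predicted class).
- Accuracy → fraction of predictions that were correct (here 0.8615, i.e. 86.15% of the test set).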
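
The layout of the matrix and its link to accuracy can be checked on a handful of made-up labels. The values below are illustrative and unrelated to the churn results above.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Made-up labels: 5 actual negatives, 3 actual positives.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()  # row-major order: [[TN, FP], [FN, TP]]

acc = accuracy_score(y_true_demo, y_pred_demo)
recall = tp / (tp + fn)  # share of actual positives that were caught

print(cm_demo)
print(acc, recall)
```

Accuracy equals `(tn + tp)` divided by the total count, while recall shows how many actual positives were caught, which accuracy alone does not reveal.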

### Technical Problems Solved by ANN Steps

1. Categorical Encoding → Solves string → numeric issue.
2. Feature Scaling → Prevents bias due to different feature ranges.
3. Activation Functions → Introduce non-linearity; ReLU also helps mitigate vanishing gradients.
4. Binary Crossentropy → Proper error measurement for classification.
5. Confusion Matrix → Identifies misclassifications.