# Churn Prediction using Artificial Neural Networks (ANN)

Churn refers to the phenomenon where customers or users of a product or service discontinue their usage or subscription. It is a critical metric for businesses, as it directly impacts revenue and customer retention.

Artificial Neural Networks (ANNs) are a type of machine learning model inspired by the structure and function of the human brain. An ANN can be used to predict churn by learning patterns and relationships in large customer datasets.

## Steps for Churn Prediction using ANN:

1. **Data Preprocessing**: This step involves cleaning and preparing the dataset for analysis. It includes handling missing values, encoding categorical variables, and scaling numerical features.

2. **Building the ANN Model**: In this step, we define the architecture of the ANN model. It typically consists of an input layer, one or more hidden layers, and an output layer. Each layer contains multiple neurons that perform computations.

3. **Training the ANN Model**: The model is trained using a training dataset, where the input features are used to predict the churn outcome. During training, the model adjusts its weights and biases to minimize the prediction error.

4. **Evaluating the Model**: After training, the model is evaluated using a separate validation or test dataset. Various evaluation metrics such as accuracy, precision, recall, and F1 score can be used to assess the performance of the model.

5. **Making Predictions**: Once the model is trained and evaluated, it can be used to make predictions on new, unseen data. The model takes the input features and predicts the likelihood of churn for each customer.

6. **Taking Action**: Based on the predictions, businesses can take proactive measures to retain customers who are at high risk of churn. This can include targeted marketing campaigns, personalized offers, or improved customer service.

Churn prediction using ANN is a powerful technique that can help businesses identify and mitigate customer churn. By understanding the factors that contribute to churn and leveraging the predictive capabilities of ANN, businesses can take proactive steps to retain valuable customers and improve overall customer satisfaction.
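To make the training step concrete, here is a minimal sketch of how a model "adjusts its weights and biases to minimize the prediction error". It trains a single sigmoid neuron with plain gradient descent on binary cross-entropy. The toy data, the learning rate, and the number of iterations are all assumptions chosen for illustration; the notebook itself uses a multi-layer Keras model with the Adam optimizer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-12):
    # Clip probabilities so log() never sees exactly 0 or 1.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Toy data (made up): one feature, label is 1 when the feature is positive.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = (x[:, 0] > 0).astype(float)

w = np.zeros(1)        # weight, starts at zero
b = 0.0                # bias, starts at zero
learning_rate = 0.5    # illustrative value

initial_loss = binary_cross_entropy(y, sigmoid(x @ w + b))
for _ in range(200):
    p = sigmoid(x @ w + b)
    # Gradients of the mean binary cross-entropy for a sigmoid output.
    grad_w = x.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
final_loss = binary_cross_entropy(y, sigmoid(x @ w + b))

print(initial_loss, final_loss)
```

The loss starts at ln 2 (every prediction is 0.5) and falls as the weight grows in the direction that separates the two classes.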


## Part 1 - Data Preprocessing

In this code cell, we are performing data preprocessing steps on a dataset for churn prediction using artificial neural networks (ANN).

### Importing the libraries

We start by importing the necessary libraries for data preprocessing and analysis: `numpy`, `matplotlib.pyplot`, and `pandas`.

### Importing the dataset

Next, we load the dataset with the `pd.read_csv()` function from the `pandas` library. The dataset is stored in a CSV file named "Churn_Modelling.csv". Columns 3 through 12 (`dataset.iloc[:, 3:13]`) become the feature matrix `X`, and column 13 (`dataset.iloc[:, 13]`) becomes the target vector `y`.
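Positional slices such as `3:13` exclude the end index, which is easy to misread. The sketch below shows the same slicing on a plain NumPy array standing in for the dataset; the 2 x 14 array `table` is a made-up placeholder, not the real data.

```python
import numpy as np

# Placeholder table with 14 columns whose values equal their column index
# in the first row, so the selected columns are easy to read off.
table = np.arange(2 * 14).reshape(2, 14)

features = table[:, 3:13]   # columns 3, 4, ..., 12 (13 is excluded)
target = table[:, 13]       # column 13 only

print(features.shape, target.shape)
```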

### Encoding categorical data

To handle categorical data, we use the `OneHotEncoder` and `LabelEncoder` classes from the `sklearn.preprocessing` module together with `ColumnTransformer` from `sklearn.compose`. We create a `ColumnTransformer` named `ct` that applies `OneHotEncoder` to the column with index 1, the "Geography" feature, and passes the remaining columns through unchanged. The one-hot columns are placed first in the output, so "Gender" ends up at index 4, where `LabelEncoder` converts it to 0/1. Finally, the first one-hot column is dropped because it is redundant given the other two. The transformed data is stored back in `X`.
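As a rough illustration of what the two encoders produce, the sketch below builds the same kind of encoding by hand with NumPy. The four sample rows are made up, and this is not how scikit-learn implements the encoders internally.

```python
import numpy as np

# Made-up sample values for the two categorical columns.
geography = np.array(["France", "Spain", "France", "Germany"])
gender = np.array(["Female", "Female", "Male", "Male"])

# np.unique returns the sorted categories and, for each row, the index
# of its category within that sorted list.
geo_categories, geo_codes = np.unique(geography, return_inverse=True)

# One-hot encoding: pick row `code` of the identity matrix for each sample.
one_hot = np.eye(len(geo_categories))[geo_codes]

# Drop the first column: it is implied when all remaining columns are zero.
dummies = one_hot[:, 1:]

# Label encoding: replace each category with its integer index.
gender_categories, gender_codes = np.unique(gender, return_inverse=True)

print(geo_categories)
print(dummies)
print(gender_codes)
```

With the categories sorted alphabetically, France is the dropped column, so France rows become `[0, 0]`, Germany rows `[1, 0]`, and Spain rows `[0, 1]`.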

### Splitting the dataset into the Training set and Test set

We split the dataset into training and test sets using the `train_test_split()` function from the `sklearn.model_selection` module. The `X` and `y` variables are split into `X_train`, `X_test`, `y_train`, and `y_test` with a test size of 0.2 (20% of the data) and a random state of 0.
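The idea behind the split can be sketched in a few lines of NumPy: shuffle the row indices once, then give the first 20% to the test set and the rest to the training set. The helper `simple_train_test_split` is hypothetical and will not reproduce the exact rows that `train_test_split(..., random_state=0)` selects; it only shows the mechanism.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle row indices, then cut once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(np.ceil(test_size * len(X)))  # rounding up is an assumption
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same indices so rows and labels stay aligned.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Made-up data: row i of X_demo is [2*i, 2*i + 1] and its label is i.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = simple_train_test_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```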

### Feature Scaling

To ensure that all features are on the same scale, we perform feature scaling using the `StandardScaler` class from the `sklearn.preprocessing` module. We fit the scaler on the training set (`X_train`) and transform both the training and test sets (`X_train` and `X_test`) using the `fit_transform()` and `transform()` methods, respectively. The scaled test set is stored in the variable `X_test`.
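The reason for fitting on the training set only is that the test set must be scaled with the training statistics, not its own. A minimal NumPy sketch of that computation, using made-up numbers, looks like this:

```python
import numpy as np

# Made-up training and test rows with two features on very different scales.
X_train_demo = np.array([[1.0, 100.0],
                         [2.0, 200.0],
                         [3.0, 300.0]])
X_test_demo = np.array([[2.0, 400.0]])

# "fit": learn per-column mean and standard deviation from the training set.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the training statistics to both sets.
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print(X_train_scaled)
print(X_test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1. The test row is expressed in the same units, so a value far from the training mean shows up as a large positive or negative number.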

The code cell prints the raw `X` and `y`, the encoded `X`, and the scaled `X_test`.


In [2]:
# Part 1 - Data Preprocessing

# Importing the libraries

"""
Note: recent versions of scikit-learn removed the categorical_features
parameter of the OneHotEncoder class. Use the ColumnTransformer class for
categorical columns instead, for example:

    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
    X = np.array(columnTransformer.fit_transform(X), dtype=str)

Refer to the official scikit-learn documentation for further clarification.
"""
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv("E:\\2021\\notes 2021\\CSA501 Deep learning_amity\\data set\\Churn_Modelling.csv")
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
print (X)
print (y)


# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
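ct = ColumnTransformer([("Geography", OneHotEncoder(), [1])], remainder='passthrough')
X = ct.fit_transform(X)  # the one-hot Geography columns come first
labelencoder_X2 = LabelEncoder()
X[:, 4] = labelencoder_X2.fit_transform(X[:, 4])  # Gender is now at index 4
X = X[:, 1:]  # drop the first Geography dummy column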
print (X)

"""
Optional extra line to convert X from an array of objects to an array of floats:

X = np.array(X, dtype=float)
"""

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_test)




[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]
[1 0 1 ... 1 1 0]
[[0.0 0.0 619 ... 1 1 101348.88]
 [0.0 1.0 608 ... 0 1 112542.58]
 [0.0 0.0 502 ... 1 0 113931.57]
 ...
 [0.0 0.0 709 ... 0 1 42085.58]
 [1.0 0.0 772 ... 1 0 92888.52]
 [0.0 0.0 792 ... 1 0 38190.78]]
[[ 1.75486502 -0.57369368 -0.55204276 ...  0.64259497  0.9687384
   1.61085707]
 [-0.5698444  -0.57369368 -1.31490297 ...  0.64259497 -1.03227043
   0.49587037]
 [-0.5698444   1.74309049  0.57162971 ...  0.64259497  0.9687384
  -0.42478674]
 ...
 [-0.5698444   1.74309049 -0.74791227 ...  0.64259497 -1.03227043
   0.71888467]
 [ 1.75486502 -0.57369368 -0.00566991 ...  0.64259497  0.9687384
  -1.54507805]
 [ 1.75486502 -0.57369368 -0.79945688 ...  0.64259497 -1.03227043
   1.61255917]]

## Artificial Neural Network (ANN) Documentation

This documentation provides an overview of the code that builds an Artificial Neural Network (ANN) for predicting churn. It explains the libraries imported and their usage in building the ANN.

### Libraries Used

The following libraries are imported in the code:

1. `keras`: Keras is a high-level neural networks API written in Python. It provides a user-friendly interface for building and training deep learning models.

2. `Sequential`: Sequential is a class from the Keras library that allows us to build a neural network model layer by layer.

3. `Dense`: Dense is a class from the Keras library that represents a fully connected layer in a neural network. It is used to add layers to the neural network model.

### Code Explanation

The code is divided into three parts: data preprocessing, building the ANN, and making predictions.

#### Part 1 - Data Preprocessing

This part is done in the above code cell.

#### Part 2 - Building the ANN

The code starts by initializing the ANN using the `Sequential` class. Then, it adds layers to the ANN using the `Dense` class.

1. Adding the input layer and the first hidden layer:
    - `classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))`
    - This line adds the first hidden layer to the ANN. It has 6 neurons, uses the 'relu' activation function, and expects an input dimension of 11.

2. Adding the second hidden layer:
    - `classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))`
    - This line adds the second hidden layer to the ANN. It has 6 neurons and uses the 'relu' activation function.

3. Adding the output layer:
    - `classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))`
    - This line adds the output layer to the ANN. It has 1 neuron and uses the 'sigmoid' activation function.

4. Compiling the ANN:
    - `classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])`
    - This line compiles the ANN by specifying the optimizer, loss function, and metrics to be used during training.
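To show what the three `Dense` layers compute, here is a NumPy sketch of a forward pass through an 11-6-6-1 network with the same activations. It is not Keras code. The weights are drawn from a small uniform range with zero biases; the range of ±0.05 and the random input batch are assumptions for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

layer_sizes = [11, 6, 6, 1]  # inputs, hidden 1, hidden 2, output
rng = np.random.default_rng(0)

# One weight matrix and one bias vector per Dense layer.
weights = [rng.uniform(-0.05, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(X):
    a = X
    # Hidden layers: affine transform followed by ReLU.
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # Output layer: affine transform followed by sigmoid, giving a probability.
    return sigmoid(a @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, 11))  # four made-up customers, 11 features each
probabilities = forward(batch)
n_parameters = sum(W.size + b.size for W, b in zip(weights, biases))

print(probabilities.shape, n_parameters)
```

The parameter count is (11·6 + 6) + (6·6 + 6) + (6·1 + 1) = 121, which is what a model of this shape has to learn during training.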

#### Part 3 - Making Predictions and Evaluating the Model

1. Fitting the ANN to the Training set:
    - `classifier.fit(X_train, y_train, batch_size=10, epochs=100)`
    - This line trains the ANN on the training set. It specifies the batch size and number of epochs for training.

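2. Predicting the Test set results:
    - `y_pred = classifier.predict(X_test)`
    - `y_pred = (y_pred > 0.5)`
    - The first line predicts a churn probability for each customer in the test set using the trained ANN. The second converts each probability into a True/False label with a 0.5 threshold, because the confusion matrix needs class labels rather than probabilities.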

3. Making the Confusion Matrix:
    - `from sklearn.metrics import confusion_matrix`
    - `cm = confusion_matrix(y_test, y_pred)`
    - This code calculates the confusion matrix to evaluate the performance of the model.
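The evaluation steps above can be sketched without any library support. The example below thresholds made-up probabilities, builds a 2 x 2 confusion matrix with rows for the true class and columns for the predicted class (the same layout `confusion_matrix` uses), and derives the usual metrics from it. The helper `binary_confusion_matrix` and the sample values are hypothetical.

```python
import numpy as np

# Made-up true labels and predicted probabilities, shaped (n, 1) like the
# output of a network with a single sigmoid unit.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([[0.1], [0.7], [0.8], [0.4], [0.2], [0.9]])

# Threshold at 0.5 and flatten to a 1-D vector of 0/1 labels.
y_hat = (y_prob > 0.5).astype(int).ravel()

def binary_confusion_matrix(y_true, y_hat):
    """Hypothetical helper: rows are true classes, columns are predictions."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_hat):
        cm[t, p] += 1
    return cm

cm_demo = binary_confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tp + tn) / cm_demo.sum()
precision = tp / (tp + fp)   # of the predicted churners, how many churned
recall = tp / (tp + fn)      # of the actual churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(cm_demo)
print(accuracy, precision, recall, f1)
```

For churn, recall is often the metric to watch, since a missed churner (false negative) usually costs more than an unnecessary retention offer.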




In [None]:
# Part 2 - Now let's make the ANN!

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense


# Notes on the Keras API used below:
# 1. Older Keras versions wrote the first layer as
#    model.add(Dense(12, input_dim=8, init='uniform', activation='relu')),
#    which means 8 input features feeding 12 neurons in the FIRST hidden layer.
# 2. In Keras, "Dense" refers to a single fully connected layer, whereas
#    "Sequential" refers to an entire model, built layer by layer with the
#    Sequential API.
# 3. output_dim (now called units) is the number of neurons in the Dense layer.
#    The choice of 128 in classifier.add(Dense(output_dim = 128, activation = 'relu'))
#    is fairly arbitrary; it just sets the size of the fully connected layer.
#    'uniform' draws the initial weights from a uniform distribution, and
#    input_dim is the number of input features (one dimension for flattened arrays).
# 4. Adam optimization is an extension of stochastic gradient descent and can
#    be used in place of classical stochastic gradient descent to update the
#    network weights more efficiently.


# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
# (older Keras versions used output_dim and init instead of units and kernel_initializer)
#classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu', input_dim = 11))
#classifier.add(Dense(6, init = 'uniform', activation = 'relu', input_dim = 11))
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer
#classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

# Part 3 - Making the predictions and evaluating the model

# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print (cm)

In [4]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
?OneHotEncoder


In [4]:
?LabelEncoder

In [12]:
from sklearn.compose import ColumnTransformer
?ColumnTransformer


In [11]:
?pd.DataFrame.iloc


In [13]:
from sklearn.preprocessing import StandardScaler
?StandardScaler


In [15]:
import keras
from keras.models import Sequential
from keras.layers import Dense
?Sequential

In [17]:
?Dense

In [19]:
?Sequential.add



In [3]:
from sklearn.metrics import confusion_matrix
?confusion_matrix

In [11]:
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("Geography",OneHotEncoder(),[1])], remainder= 'passthrough')
ct
print(ct)
X = ct.fit_transform(X)
labelencoder_X2 = LabelEncoder()
X[:, 4] = labelencoder_X2.fit_transform(X[:, 4])
X = X[: , 1:]
print (X)

ColumnTransformer(remainder='passthrough',
                  transformers=[('Geography', OneHotEncoder(), [1])])
[[0.0 0.0 619 ... 1 1 101348.88]
 [1.0 0.0 608 ... 0 1 112542.58]
 [0.0 0.0 502 ... 1 0 113931.57]
 ...
 [0.0 0.0 709 ... 0 1 42085.58]
 [0.0 1.0 772 ... 1 0 92888.52]
 [0.0 0.0 792 ... 1 0 38190.78]]


In [12]:
#LabelEncoder
#OneHotEncoder
#ColumnTransformer

In [13]:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer
ct = ColumnTransformer(
     [("norm1", Normalizer(norm='l1'), [0, 1]),
      ("norm2", Normalizer(norm='l1'), slice(2, 4))])
X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])
# Normalizer scales each row of X to unit norm. A separate scaling
# is applied for the two first and two last elements of each
# row independently.
ct.fit_transform(X)

array([[0. , 1. , 0.5, 0.5],
       [0.5, 0.5, 0. , 1. ]])
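The output above can be checked by hand. The sketch below applies L1 row normalization separately to columns 0 to 1 and columns 2 to 3, then places the two blocks side by side, which is the effect the `ColumnTransformer` example describes. The helper `l1_normalize_rows` is hypothetical and written only for this check.

```python
import numpy as np

def l1_normalize_rows(block):
    """Hypothetical helper: divide each row by the sum of its absolute values."""
    norms = np.abs(block).sum(axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return block / norms

X_demo = np.array([[0., 1., 2., 2.],
                   [1., 1., 0., 1.]])

# Normalize the first two and the last two columns independently,
# then concatenate the results column-wise.
result = np.hstack([l1_normalize_rows(X_demo[:, 0:2]),
                    l1_normalize_rows(X_demo[:, 2:4])])
print(result)
```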