8. Given a bank customer, build a neural-network-based classifier that can
determine whether they will leave or not in the next 6 months.
Dataset description: The case study uses an open-source dataset from Kaggle. The
dataset contains 10,000 sample points with 14 distinct features such as
CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc.
Link to the Kaggle project: https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling
Perform the following steps:
1. Read the dataset.
2. Distinguish the feature and target sets and divide the dataset into training and test sets.
3. Normalize the train and test data.
4. Initialize and build the model. Identify the points of improvement and implement them.
5. Print the accuracy score and confusion matrix (5 points).
&
Write a program to implement Huffman Encoding using a greedy strategy.

In [2]:
#1 
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

In [3]:
data = pd.read_csv('/Users/lokeshkhabiya/FourthYearStuff/practicals/lp-3/ML/Dataset/Churn_Modelling.csv')
data.head()

,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [4]:
X = data.drop(columns=['Exited', 'CustomerId', 'Surname', 'RowNumber'])  # Features: everything except the target and the identifier columns
y = data['Exited']  # Target

In [5]:
# Drop the identifier columns, which carry no predictive signal
# (X in the previous cell was already built without them)
data = data.drop(['CustomerId', 'Surname', 'RowNumber'], axis=1)
print(data.columns)

Index(['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance',
       'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
       'Exited'],
      dtype='object')
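The dataset can also be checked for missing values before modelling. A minimal sketch of the usual pandas pattern (`isnull().sum()` to count gaps per column, `dropna()` to drop incomplete rows), run on a made-up toy frame rather than the real CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the churn data (values are illustrative)
toy = pd.DataFrame({
    "CreditScore": [619, 608, np.nan, 699],
    "Geography": ["France", "Spain", "France", None],
    "Exited": [1, 0, 1, 0],
})

# Count the missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Drop every row that has at least one missing value
clean = toy.dropna().reset_index(drop=True)
print(clean)
```

On the real data the same two calls would be applied to `data` before splitting into features and target.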


In [6]:
# Verify that the categorical columns 'Geography' and 'Gender' exist in X before encoding them
columns_to_encode = ['Geography', 'Gender']
for column in columns_to_encode:
    if column not in X.columns:
        raise ValueError(f"Column '{column}' not found in the DataFrame X.")

# One-hot encode the categorical columns; drop_first=True drops one redundant level per column
X = pd.get_dummies(X, columns=columns_to_encode, drop_first=True)
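To see what `drop_first=True` does, here is a small sketch on three made-up rows using the same category values as the churn data. Each categorical column is replaced by one indicator column per level, minus the first level in sorted order, which becomes the implicit baseline:

```python
import pandas as pd

# Hypothetical rows using the same categories as the churn data
toy = pd.DataFrame({
    "Age": [42, 41, 39],
    "Geography": ["France", "Spain", "Germany"],
    "Gender": ["Female", "Male", "Female"],
})

# 'France' (Geography) and 'Female' (Gender) are dropped as baseline levels;
# a row with all-zero Geography indicators is therefore a French customer
encoded = pd.get_dummies(toy, columns=["Geography", "Gender"], drop_first=True)
print(encoded.columns.tolist())
```

Non-encoded columns stay first and the indicator columns are appended after them, so the churn features end with `Geography_Germany`, `Geography_Spain` and `Gender_Male`.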

In [7]:
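# Step 3: Normalize the data with min-max scaling.
# Fit the scaler on the training rows only so that test-set statistics do not leak
# into training. This split uses the same test_size and random_state as the split
# in the training cell, so it selects exactly the same training rows.
X_train_rows, _ = train_test_split(X, test_size=0.2, random_state=42)
scaler = MinMaxScaler()
scaler.fit(X_train_rows)
X = scaler.transform(X)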
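`MinMaxScaler` learns each feature's minimum and maximum during `fit` and maps a value with x' = (x - min) / (max - min). A small sketch on made-up numbers shows that when the statistics come from the training rows only, test values are transformed with the training min/max and can fall outside [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature data (e.g. ages); values are illustrative only
X_train = np.array([[20.0], [30.0], [40.0]])
X_test = np.array([[25.0], [50.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns min=20, max=40 from training rows only
X_test_scaled = scaler.transform(X_test)        # reuses the training min/max

# The same transformation by hand: (x - min) / (max - min)
manual = (X_test - 20.0) / (40.0 - 20.0)
print(X_train_scaled.ravel())
print(X_test_scaled.ravel())
```

Here the test value 50 maps to 1.5, which is expected and harmless: the model simply sees a value beyond the training range.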

In [8]:
# Step 4: Initialize and build the model
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(X.shape[1],)),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

In [9]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [10]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x15ac8e7d0>
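One point of improvement the task asks for concerns class imbalance: in the test split, 393 of the 2,000 customers churned (the second row of the confusion matrix printed further down). Weighting the minority class more heavily during training is one common remedy. A minimal sketch using scikit-learn's `compute_class_weight`; in the notebook you would pass `y_train`, but here a toy label vector with the same 1,607/393 split stands in:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels standing in for y_train (assumption: same class ratio as the test split)
y_train_toy = np.array([0] * 1607 + [1] * 393)

classes = np.array([0, 1])
# 'balanced' weight for class c = n_samples / (n_classes * count(c))
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train_toy)

class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
```

Keras' `Model.fit` accepts a `class_weight` dictionary mapping class index to weight, so the result could be passed as `model.fit(X_train, y_train, epochs=20, batch_size=32, class_weight=class_weight)`. Whether this actually raises churn recall on this dataset has not been measured here.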

In [11]:
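# Step 5: Evaluate the model
y_pred = model.predict(X_test)
# Convert predicted probabilities to 0/1 labels and flatten the (n, 1) output to shape (n,)
y_pred = (y_pred > 0.5).astype(int).ravel()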
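The fixed 0.5 threshold is itself a tunable choice. Lowering it catches more churners at the cost of more false alarms. A sketch on made-up probabilities (not output of the model above) showing the trade-off with scikit-learn's `precision_score` and `recall_score`:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical predicted churn probabilities and true labels (illustrative only)
probs = np.array([0.9, 0.6, 0.45, 0.35, 0.4, 0.1])
y_true = np.array([1, 1, 1, 1, 0, 0])

results = {}
for threshold in (0.5, 0.3):
    y_hat = (probs > threshold).astype(int)
    results[threshold] = (precision_score(y_true, y_hat), recall_score(y_true, y_hat))
    print(f"threshold={threshold}: precision={results[threshold][0]:.2f}, "
          f"recall={results[threshold][1]:.2f}")
```

With these toy numbers, moving from 0.5 to 0.3 lifts recall from 0.50 to 1.00 while precision drops from 1.00 to 0.80. On real data the threshold would be chosen on a validation split, not on the test set.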



In [12]:
accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(confusion)

Accuracy: 0.861
Confusion Matrix:
[[1548   59]
 [ 219  174]]
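The accuracy of 0.861 is partly a product of class imbalance: predicting "stays" for every customer would already score 1607/2000 = 0.8035. The confusion matrix (rows are the true class, columns the predicted class, as scikit-learn lays it out) shows that fewer than half of the actual churners are caught. A short sketch that derives the usual per-class metrics from the printed matrix:

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
confusion = np.array([[1548, 59],
                      [219, 174]])
tn, fp, fn, tp = confusion.ravel()

accuracy = (tn + tp) / confusion.sum()          # 1722 / 2000
baseline = (tn + fp) / confusion.sum()          # always predicting "stays": 1607 / 2000
precision = tp / (tp + fp)                      # 174 / 233
recall = tp / (tp + fn)                         # 174 / 393
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f} baseline={baseline:.4f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Recall on churners is about 0.44, so improving it (for example through class weights or a lower decision threshold) is the most visible point of improvement.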


In [13]:
#2 
import heapq 

In [14]:
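class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char    # character for a leaf, None for an internal (merged) node
        self.freq = freq    # character frequency, or the sum of the children's frequencies
        self.left = None
        self.right = None
    
    def __lt__(self, other):
        # heapq compares nodes with "<", so the lowest-frequency node is popped first
        return self.freq < other.freq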
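Because `__lt__` compares frequencies only, ties are resolved by whatever order `heapq` happens to hold the nodes in, so which of two equally frequent subtrees ends up on the left is not fixed by any explicit rule (the total encoded length is optimal either way). A common alternative, suggested in the Python `heapq` documentation for priority queues, is to push `(priority, counter, item)` entries. A sketch of that variant with hypothetical helpers `build_tree_with_tiebreak` and `codes_from_tree`, using plain tuples instead of `HuffmanNode` and assuming at least one symbol:

```python
import heapq
from itertools import count

def build_tree_with_tiebreak(frequency):
    """Huffman tree whose shape does not depend on how heapq orders ties.

    Heap entries are (freq, insertion_order, subtree). A leaf is the 1-tuple
    (char,) and an internal node is the 2-tuple (left, right).
    """
    order = count()  # unique, increasing tie-breaker, so subtrees are never compared
    heap = [(freq, next(order), (char,)) for char, freq in frequency.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    return heap[0][2]

def codes_from_tree(tree, prefix="", codes=None):
    """Left edges add '0', right edges add '1'; a lone leaf gets the code '0'."""
    if codes is None:
        codes = {}
    if len(tree) == 1:  # leaf
        codes[tree[0]] = prefix or "0"
    else:
        codes_from_tree(tree[0], prefix + "0", codes)
        codes_from_tree(tree[1], prefix + "1", codes)
    return codes

print(codes_from_tree(build_tree_with_tiebreak({"a": 2, "b": 1})))
```

Among equal frequencies, the symbol inserted first is merged first, which makes the generated codes reproducible from the input order alone.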

In [15]:
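def generate_codes(root, current_code, codes):
    # Depth-first walk: a left edge appends "0", a right edge appends "1"
    if root is None:
        return
    if root.char is not None:
        # Only leaves carry a character: record the path taken to reach it
        codes[root.char] = current_code
    generate_codes(root.left, current_code + "0", codes)
    generate_codes(root.right, current_code + "1", codes)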
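Codes read off the leaves of a binary tree are prefix-free (no code word is the beginning of another), which is what lets the decoder restart at the root after every leaf without separators. A small checker, with the hypothetical name `is_prefix_free`, that can be run on the `codes` dictionary:

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # In lexicographic order, if a word is a prefix of any later word,
    # it is also a prefix of its immediate successor, so adjacent checks suffice
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

print(is_prefix_free({"a": "0", "b": "10", "c": "11"}))
print(is_prefix_free({"a": "0", "b": "01"}))
```

The first call prints `True`; the second prints `False`, because "0" is the start of "01" and a decoder could not tell where one symbol ends.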

In [16]:
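def build_huffman_tree(frequency):
    # Start with one leaf per distinct character in a min-heap ordered by frequency
    heap = []
    for char, freq in frequency.items():
        heapq.heappush(heap, HuffmanNode(char, freq))
    # Greedy step: repeatedly merge the two least-frequent subtrees into one
    while len(heap) > 1:
        node1 = heapq.heappop(heap)
        node2 = heapq.heappop(heap)
        merged = HuffmanNode(None, node1.freq + node2.freq)
        merged.left = node1
        merged.right = node2
        heapq.heappush(heap, merged)
    # The single remaining node is the root of the Huffman tree
    return heapq.heappop(heap)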
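Why the greedy merge order matters in bits: every merge pushes all symbols in the two merged subtrees one level deeper, adding one bit to each of their occurrences. The total length of the encoded string therefore equals the sum of the weights of all merged nodes. A sketch with the hypothetical helper `huffman_cost` that computes this from frequencies alone (assumption: a single distinct symbol is given a one-bit code):

```python
import heapq
from collections import Counter

def huffman_cost(frequency):
    """Total Huffman-encoded length in bits, from symbol frequencies only."""
    heap = list(frequency.values())
    heapq.heapify(heap)
    if len(heap) == 1:
        return heap[0]  # assumption: one symbol -> one bit per occurrence
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b  # every occurrence under this merge gains one bit
        heapq.heappush(heap, a + b)
    return total

print(huffman_cost({"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}))
print(huffman_cost(Counter("My Name is Lokesh")))
```

For the sample text used at the end of this notebook the cost is 61, which matches the length of the encoded string printed there.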

In [17]:
def calculate_frequency(data):
    frequency = {}

    for char in data:
        if char not in frequency:
            frequency[char] = 0
        frequency[char] += 1
    return frequency
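The same frequency table can be built with `collections.Counter` from the standard library. A sketch with the hypothetical name `calculate_frequency_counter`:

```python
from collections import Counter

def calculate_frequency_counter(data):
    """Map each character of data to its number of occurrences."""
    return dict(Counter(data))

print(calculate_frequency_counter("banana"))
```

This prints `{'b': 1, 'a': 3, 'n': 2}`; keys appear in first-occurrence order, as with the loop-based version.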

In [18]:
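def huffman_encoding(data):
    # Empty input: nothing to encode and no tree to build
    if not data:
        return "", None
    frequency = calculate_frequency(data)
    huffman_tree_root = build_huffman_tree(frequency)
    codes = {}
    if huffman_tree_root.left is None and huffman_tree_root.right is None:
        # Only one distinct character: the root is a leaf, so give it the code "0"
        codes[huffman_tree_root.char] = "0"
    else:
        generate_codes(huffman_tree_root, "", codes)
    # Encode the input by concatenating the code of every character
    encoded_data = "".join(codes[char] for char in data)
    return encoded_data, huffman_tree_root

def huffman_decoding(encoded_data, huffman_tree_root):
    if huffman_tree_root is None:
        return ""
    # Single-leaf tree: every bit stands for the one distinct character
    if huffman_tree_root.left is None and huffman_tree_root.right is None:
        return huffman_tree_root.char * len(encoded_data)
    decoded_chars = []
    current_node = huffman_tree_root
    for bit in encoded_data:
        # Follow the edge for this bit: '0' goes left, '1' goes right
        current_node = current_node.left if bit == '0' else current_node.right
        if current_node.left is None and current_node.right is None:
            # Reached a leaf: emit its character and restart from the root
            decoded_chars.append(current_node.char)
            current_node = huffman_tree_root
    return "".join(decoded_chars)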

In [19]:
if __name__ == "__main__":
    data = input("Enter the text for Huffman Encoding: ")
    encoded_data, huffman_tree_root = huffman_encoding(data)
    print(f"Encoded Data: {encoded_data}")
    decoded_data = huffman_decoding(encoded_data, huffman_tree_root)
    print(f"Decoded Data: {decoded_data}")

Enter the text for Huffman Encoding:  My Name is Lokesh


Encoded Data: 0110110011110101101011101011110111001110010001100000101000001
Decoded Data: My Name is Lokesh
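To put the output in perspective, a sketch with the hypothetical helper `compression_stats` compares the length of the Huffman bit string with 8 bits per character and with the shortest fixed-length code for the same alphabet:

```python
import math

def compression_stats(text, encoded_bits):
    """Compare a Huffman bit string with 8-bit and minimal fixed-length encodings."""
    n = len(text)
    distinct = len(set(text))
    # Bits per symbol for a fixed-length code over this alphabet (at least 1)
    fixed_width = max(1, math.ceil(math.log2(distinct)))
    return {
        "huffman_bits": len(encoded_bits),
        "ascii_bits": 8 * n,
        "fixed_length_bits": fixed_width * n,
        "ratio_vs_ascii": len(encoded_bits) / (8 * n),
    }

# Input and output from the run above
text = "My Name is Lokesh"
encoded = "0110110011110101101011101011110111001110010001100000101000001"
print(compression_stats(text, encoded))
```

For this 17-character input with 13 distinct characters, Huffman coding uses 61 bits, versus 136 bits at 8 bits per character and 68 bits for a 4-bit fixed-length code. The tree itself must also be stored or transmitted for decoding, which this comparison ignores.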
