## Data Cleaning and Preprocessing:

- Missing Values: Imputes missing values with the column mean for numeric columns and the mode for categorical columns.
- Categorical Encoding: Encodes Sex and CustomerStatus as binary (0/1) values.
- Normalization: Standardizes Age, AccountDuration, and AverageMonthlySpending to zero mean and unit variance using StandardScaler.
- Feature Transformation: Creates a TotalInvestments feature by summing the eleven investment-related columns.

## Feature Engineering:

- Interaction Features: Creates Age_AvgMonthlySpending as the product of Age and AverageMonthlySpending.
- Polynomial Features: Generates degree-2 polynomial features for AccountType.
- Feature Binning: Bins AverageMonthlySpending into three categories (Low, Medium, High).

## Dimensionality Reduction:

- PCA: Applies PCA to reduce the feature set to 2 components for visualization.
- t-SNE: Applies t-SNE to visualize the high-dimensional data in 2D.
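
Two components are convenient for plotting, but how much variance they retain depends on the data. The following is a minimal sketch, run on synthetic data rather than the bank dataset, of checking the cumulative explained variance before fixing the number of components. The 95% threshold, the array shapes, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (not the bank data):
# 200 rows, 6 columns driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# Standardize so that no column dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance ratios.
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components that reaches the (assumed) threshold.
threshold = 0.95
n_components = int(np.searchsorted(cumulative, threshold) + 1)

print("cumulative explained variance:", np.round(cumulative, 3))
print("components needed for 95%:", n_components)
```

If the first two components retain only a small share of the variance, the 2D scatter plot should be read as a rough view rather than a faithful summary of the data.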

## Visualization and Saving Results:

- Generates scatter plots of the PCA and t-SNE results.
- Saves the cleaned dataset, PCA results, and t-SNE results to CSV files for submission.
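
The cleaning steps described above can be tried on a toy frame before running them on the real data. The sketch below is illustrative only: the values are made up, and it assumes Sex holds the labels Male/Female and CustomerStatus holds Active/Inactive. It uses an explicit `map` in place of LabelEncoder so that the code assigned to each label is visible.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values (not the bank data).
toy = pd.DataFrame({
    "Age": [25.0, np.nan, 40.0, 31.0],
    "Sex": ["Male", "Female", None, "Female"],
    "CustomerStatus": ["Active", "Inactive", "Active", None],
})

# Numeric columns: fill gaps with the column mean.
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())

# Categorical columns: fill gaps with the most frequent value (mode).
categorical_cols = toy.columns.difference(numeric_cols)
toy[categorical_cols] = toy[categorical_cols].fillna(toy[categorical_cols].mode().iloc[0])

# Explicit mappings make the 0/1 encoding visible (label sets are assumed).
toy["Sex"] = toy["Sex"].map({"Female": 0, "Male": 1})
toy["CustomerStatus"] = toy["CustomerStatus"].map({"Active": 0, "Inactive": 1})

print(toy)
```

Selecting the numeric columns first keeps the mean from being computed over text columns, and a label that is missing from the mapping would surface as NaN instead of being silently assigned a code.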

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder, PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
# Assuming 'bank_data.csv' is the provided dataset
df = pd.read_csv('bank_data.csv')

# Task 1: Data Cleaning and Preprocessing

# 1. Handle Missing Values
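# Numeric columns: fill with the column mean
# (numeric_only=True avoids a TypeError on text columns in pandas >= 2.0)
df.fillna(df.mean(numeric_only=True), inplace=True)
# Remaining (categorical) columns: fill with the most frequent value
df.fillna(df.mode().iloc[0], inplace=True)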

# 2. Convert Categorical Variables
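# LabelEncoder assigns codes in sorted label order; the mappings noted below
# assume the labels are exactly Female/Male and Active/Inactive
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])  # Female=0, Male=1
df['CustomerStatus'] = LabelEncoder().fit_transform(df['CustomerStatus'])  # Active=0, Inactive=1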

# 3. Normalize Numerical Features
scaler = StandardScaler()
df[['Age', 'AccountDuration', 'AverageMonthlySpending']] = scaler.fit_transform(
    df[['Age', 'AccountDuration', 'AverageMonthlySpending']])

# 4. Feature Transformation
df['TotalInvestments'] = df[['InvestmentAccounts', 'FixedDeposits', 'MutualFunds', 'StockInvestments',
                             'BondInvestments', 'BalancedFunds', 'TaxSavings', 'ManagedAccounts',
                             'TradingAccounts', 'SpecialtyFunds', 'PreciousMetals']].sum(axis=1)

# Task 2: Feature Engineering

# 1. Create Interaction Features
df['Age_AvgMonthlySpending'] = df['Age'] * df['AverageMonthlySpending']

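# 2. Generate Polynomial Features
# Assumes AccountType is already numeric; PolynomialFeatures cannot take string categories
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(df[['AccountType']])
poly_df = pd.DataFrame(poly_features, columns=poly.get_feature_names_out(['AccountType']),
                       index=df.index)
# Drop the degree-1 term, which would duplicate the existing AccountType column
df = pd.concat([df, poly_df.drop(columns=['AccountType'])], axis=1)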

# 3. Feature Binning
# AverageMonthlySpending was standardized above, so the bin edges are z-scores
df['SpendingCategory'] = pd.cut(df['AverageMonthlySpending'], bins=[-np.inf, -0.5, 0.5, np.inf],
                                labels=['Low', 'Medium', 'High'])

# Task 3: Dimensionality Reduction
# n_components=2 is only an example; the appropriate number depends on the results

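# Columns used for both PCA and t-SNE
# Note: only the first three were standardized above; the investment columns keep their original scale
reduction_features = ['Age', 'AccountDuration', 'AverageMonthlySpending', 'InvestmentAccounts',
                      'FixedDeposits', 'MutualFunds', 'StockInvestments', 'BondInvestments',
                      'BalancedFunds', 'TaxSavings', 'ManagedAccounts', 'TradingAccounts',
                      'SpecialtyFunds', 'PreciousMetals']

# Apply PCA
pca = PCA(n_components=2)
pca_result = pca.fit_transform(df[reduction_features])
df_pca = pd.DataFrame(pca_result, columns=['PC1', 'PC2'])

# Apply t-SNE (random_state is an arbitrary choice, set for reproducibility)
tsne = TSNE(n_components=2, random_state=42)
tsne_result = tsne.fit_transform(df[reduction_features])
df_tsne = pd.DataFrame(tsne_result, columns=['TSNE1', 'TSNE2'])

# Scatter plots of the PCA and t-SNE results
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
sns.scatterplot(data=df_pca, x='PC1', y='PC2', ax=axes[0])
axes[0].set_title('PCA')
sns.scatterplot(data=df_tsne, x='TSNE1', y='TSNE2', ax=axes[1])
axes[1].set_title('t-SNE')
plt.tight_layout()
plt.show()

# Save the cleaned dataset and reports
df.to_csv('cleaned_bank_data.csv', index=False)
df_pca.to_csv('pca_results.csv', index=False)
df_tsne.to_csv('tsne_results.csv', index=False)  # file name assumed by analogy with pca_results.csv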

print("Data cleaning, feature engineering, and dimensionality reduction completed. Results saved.")
