# **A3: Credit Score Prediction using Machine Learning Models**
This notebook guides you through the process of building, training, and evaluating machine learning models to predict customer credit scores.

## 1. Import Necessary Libraries

In [None]:
# Install necessary libraries (if not already installed)
!pip install pandas numpy scikit-learn xgboost tensorflow matplotlib seaborn

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score, classification_report)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

### 1.1 Interpretation

## 2. Load and Explore Datasets

In [None]:
# Load train and validation datasets
train = pd.read_csv('train.csv')
vald = pd.read_csv('vald.csv')

# Display the first few rows
print(train.head())
print(vald.head())

# Check for missing values
print(train.isnull().sum())
print(train.info())

### 2.1 Interpretation

## 3. EDA (Exploratory Data Analysis)

In [None]:
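# Heatmap to show correlations between numeric columns
# (numeric_only=True skips text columns, which otherwise raise an error in pandas 2.x)
sns.heatmap(train.corr(numeric_only=True), annot=True, cmap='coolwarm')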
plt.title("Correlation Heatmap")
plt.show()

# Scatter plot example: Income vs Outstanding Debt
plt.scatter(train['A_Annual_Income'], train['O_Outstanding_Debt'])
plt.xlabel("Annual Income")
plt.ylabel("Outstanding Debt")
plt.title("Income vs Debt")
plt.show()
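
Before modelling, it can also help to check how the target classes are distributed, since an imbalanced target changes how accuracy should be read. The following is a minimal sketch, assuming the target column is `C_Credit_Score` as used later in this notebook. The `class_balance` helper and the class labels in the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd

def class_balance(df, target="C_Credit_Score"):
    """Return the count and share of each target class, largest class first."""
    counts = df[target].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})

# Toy data standing in for train.csv (labels are made up for illustration)
toy = pd.DataFrame({"C_Credit_Score": ["Good", "Poor", "Good", "Standard", "Good"]})
balance = class_balance(toy)
print(balance)
```

On the real data this would be called as `class_balance(train)`.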

### 3.1 Interpretation

## 4. Data Cleaning and Feature Engineering

In [None]:
# Handle missing values (Example: impute numeric columns with their mean)
train.fillna(train.mean(numeric_only=True), inplace=True)

# Feature Engineering: Create Debt-to-Income Ratio
train['Debt_Income_Ratio'] = train['O_Outstanding_Debt'] / train['A_Annual_Income']

# One-Hot Encoding of categorical variables (e.g., Occupation)
train = pd.get_dummies(train, columns=['O_Occupation'], drop_first=True)
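
The same cleaning steps have to be applied to any data the model later predicts on, otherwise the columns will not match. One way to keep the steps consistent is to wrap them in a function. The sketch below is an assumption about how this could be organised, not part of the original notebook: `preprocess`, `fill_values`, and `feature_columns` are hypothetical names, the toy values are made up, and the zero-income guard is an added safety step.

```python
import numpy as np
import pandas as pd

def preprocess(df, fill_values, feature_columns=None):
    """Impute, add the debt-to-income ratio, and one-hot encode occupation."""
    out = df.fillna(fill_values)
    # Guard against division by zero: a zero income yields NaN instead of inf
    income = out["A_Annual_Income"].replace(0, np.nan)
    out["Debt_Income_Ratio"] = out["O_Outstanding_Debt"] / income
    # Drop the baseline category only when defining the training layout;
    # for new data, encode every category and let reindex pick the columns
    out = pd.get_dummies(out, columns=["O_Occupation"],
                         drop_first=feature_columns is None)
    if feature_columns is not None:
        # Align to the training layout; columns not seen in training are dropped
        out = out.reindex(columns=feature_columns, fill_value=0)
    return out

# Toy frames standing in for train.csv and vald.csv (made-up values)
toy_train = pd.DataFrame({
    "A_Annual_Income": [50000.0, None, 80000.0],
    "O_Outstanding_Debt": [5000.0, 2000.0, 4000.0],
    "O_Occupation": ["Engineer", "Teacher", "Doctor"],
})
toy_vald = pd.DataFrame({
    "A_Annual_Income": [60000.0],
    "O_Outstanding_Debt": [3000.0],
    "O_Occupation": ["Teacher"],
})

means = toy_train.mean(numeric_only=True)  # statistics come from training data only
train_ready = preprocess(toy_train, means)
vald_ready = preprocess(toy_vald, means, feature_columns=train_ready.columns)
print(vald_ready)
```

Computing the fill values from the training data alone avoids leaking information from the data being predicted into the preprocessing.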

### 4.1 Interpretation

## 5. Train-Test Split

In [None]:
# Split train.csv into training and testing sets (80/20 split)
X = train.drop(['C_Credit_Score'], axis=1)
y = train['C_Credit_Score']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
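
If the credit score classes are imbalanced, a plain random split can leave the test set with a different class mix than the training set. The sketch below shows the `stratify` argument of `train_test_split` on synthetic labels; the arrays `X_demo` and `y_demo` are made-up stand-ins, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels standing in for C_Credit_Score (80% class 0, 20% class 1)
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify, both subsets keep the 80/20 class ratio
print("train share of class 1:", y_tr.mean())
print("test share of class 1:", y_te.mean())
```

In the notebook itself, the equivalent change would be passing `stratify=y` to the existing call.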


### 5.1 Interpretation

## 6. Model Training and Testing

In [None]:
# Example: Logistic Regression
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)

# Predict on test data
y_pred = lr_model.predict(X_test)

# Evaluate the model
print(classification_report(y_test, y_pred))
print("AUC-ROC:", roc_auc_score(y_test, lr_model.predict_proba(X_test), multi_class='ovr'))
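
The imports above include `StandardScaler` and `GridSearchCV`, which are not yet used. A possible way to combine them is sketched below on synthetic data, assuming a three-class target; the data from `make_classification` and the grid of `C` values are illustrative assumptions, not tuned or recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the credit features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scaling lives inside the pipeline so it is re-fit on each cross-validation fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1_weighted")
search.fit(X_tr, y_tr)

print("Best C:", search.best_params_["clf__C"])
# score() reports the metric chosen in `scoring`, here weighted F1
print("Held-out weighted F1:", search.score(X_te, y_te))
```

Keeping the scaler inside the pipeline means the test fold never influences the scaling statistics during cross-validation.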

### 6.1 Interpretation

## 7. Validation on vald.csv

In [None]:
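# Apply the same preprocessing to vald.csv (without labels), then predict with the trained model
vald_X = vald.fillna(train.mean(numeric_only=True))
vald_X['Debt_Income_Ratio'] = vald_X['O_Outstanding_Debt'] / vald_X['A_Annual_Income']
vald_X = pd.get_dummies(vald_X, columns=['O_Occupation']).reindex(columns=X.columns, fill_value=0)
vald_predictions = lr_model.predict(vald_X)
print("Predictions on vald.csv:", vald_predictions[:10])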

### 7.1 Interpretation

## 8. Model Comparison Table

In [None]:
# Example of a model comparison table
results = pd.DataFrame({
    'Model': ['Logistic Regression'],
    'Accuracy': [accuracy_score(y_test, y_pred)],
    'Precision': [precision_score(y_test, y_pred, average='weighted')],
    'Recall': [recall_score(y_test, y_pred, average='weighted')],
    'F1-Score': [f1_score(y_test, y_pred, average='weighted')]
})
print(results)
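
Once more than one model has been trained, the table can be built in a loop rather than by hand. The sketch below is one possible layout, run on synthetic data so it is self-contained; the choice of models, their settings, and the sort order are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the credit features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_te, pred),
        "Precision": precision_score(y_te, pred, average="weighted", zero_division=0),
        "Recall": recall_score(y_te, pred, average="weighted", zero_division=0),
        "F1-Score": f1_score(y_te, pred, average="weighted", zero_division=0),
    })

# Sort so the strongest model by weighted F1 appears first
comparison = pd.DataFrame(rows).sort_values("F1-Score", ascending=False)
print(comparison.to_string(index=False))
```

Every model is scored on the same held-out split, so the rows are directly comparable.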


### 8.1 Interpretation

## 9. Conclusion and Recommendations
This section should provide a comprehensive summary of the project’s overall results. Detailed interpretations of the model performances should be included, highlighting which model performed the best and explaining the reasons based on key metrics such as accuracy, precision, recall, F1-score, and AUC-ROC. Important insights from the feature analysis, such as which variables had the strongest influence on credit score predictions, should also be discussed.

Recommendations should focus on ways to enhance model performance, such as through additional hyperparameter tuning, feature engineering, or testing other algorithms. Suggestions on how customers can improve their credit scores, based on the findings (e.g., lowering debt or avoiding delayed payments), should also be included. Consider proposing potential future improvements, like exploring new features or integrating more data, to increase the models' accuracy and robustness.

Ensure that the section ties together the key outcomes of the project and reflects on the lessons learned throughout the process.
