## Data Loading
In this step, I load the loan dataset and inspect its structure to understand the features and target variable.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("loan_data.csv")

# EDA Process

In [None]:
df.shape

In [None]:
df.info()

## Data Preprocessing
In this step, I clean the dataset by handling missing values and encoding categorical variables so that the data can be used by the machine learning model.

In [None]:
df.isnull().sum()
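
The check above only counts missing values. One possible way to fill them is sketched below, assuming median imputation for numeric columns and most-frequent-value imputation for categorical ones. The toy frame, its values, and the `LoanAmount` column are hypothetical stand-ins, not taken from `loan_data.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the loan data; values are made up.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "LoanAmount": [120.0, np.nan, 150.0, 90.0],
})

# Split columns by dtype so each group gets an appropriate fill value.
numeric_cols = toy.select_dtypes(include="number").columns
categorical_cols = toy.select_dtypes(exclude="number").columns

# Numeric gaps: column median. Categorical gaps: most frequent value.
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].median())
for col in categorical_cols:
    toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy.isnull().sum())
```

Whether median and mode are the right fill values depends on the data; dropping rows or adding a "missing" category are equally reasonable alternatives.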

In [None]:
df.describe()

In [None]:
df.columns

# Data Cleaning

In [None]:
df = df.drop("CustomerID", axis=1)

In [None]:
df.head()

# Encode Categorical Columns Into Numbers

In [None]:
from sklearn.preprocessing import LabelEncoder

In [None]:
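le = LabelEncoder()

# Categorical columns to convert into integer codes
cat_cols = ["Gender", "Married", "Dependents", "Education",
            "Self_Employed", "Property_Area"]

# fit_transform refits the encoder on every column, so after the loop
# `le` only remembers the label mapping of the last column.
for col in cat_cols:
    df[col] = le.fit_transform(df[col])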
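
Because a single `LabelEncoder` is refit for each column, the integer-to-label mapping of earlier columns is lost. A minimal sketch of one way to keep the mappings, assuming one encoder per column stored in a dictionary (the `encoders` and `mapping` names and the sample values are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values; the real categories depend on loan_data.csv.
toy = pd.DataFrame({
    "Education": ["Graduate", "Not Graduate", "Graduate"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
})

encoders = {}  # one fitted encoder per column, so mappings stay recoverable
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# classes_ is sorted; the position of each label is its integer code.
mapping = {
    col: {str(label): code for code, label in enumerate(enc.classes_)}
    for col, enc in encoders.items()
}
print(mapping)
```

Keeping the encoders also makes it possible to call `inverse_transform` later when reading predictions or coefficients back in terms of the original labels.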


In [None]:
df.head()

# Define Features (X) and Target (y)

In [None]:
X = df.drop("Loan_Default", axis=1)
y = df["Loan_Default"]

In [None]:
from sklearn.model_selection import train_test_split


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
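
Loan default data is often imbalanced, and a plain random split can leave the test set with a different default rate than the training set. The sketch below shows the `stratify` argument of `train_test_split` on a synthetic target; the 80/20 class ratio and the `X_toy`/`y_toy` names are assumptions for illustration only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 80 non-defaults (0) and 20 defaults (1).
X_toy = pd.DataFrame({"feature": range(100)})
y_toy = pd.Series([0] * 80 + [1] * 20, name="Loan_Default")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Both splits should keep roughly the original share of defaults.
print(y_tr.mean(), y_te.mean())
```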

# Train Logistic Regression Model

## Model Training
In this step, I train a Logistic Regression model to predict whether a loan applicant is likely to default. Logistic Regression is chosen because it is simple, interpretable, and well-suited for binary classification problems.


In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)


In [None]:
y_pred = log_reg.predict(X_test)
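
`predict` returns hard class labels. For a default-risk problem it can be useful to look at the estimated probabilities instead and choose the cutoff deliberately. The sketch below uses synthetic data from `make_classification` rather than the loan dataset, and the 0.3 cutoff is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the loan features.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Column 1 of predict_proba is the estimated probability of class 1.
proba_default = clf.predict_proba(X_demo)[:, 1]

# predict() corresponds to a 0.5 cutoff; a lower cutoff flags more cases.
default_cut = (proba_default > 0.5).astype(int)
cautious_cut = (proba_default > 0.3).astype(int)
print(default_cut.sum(), cautious_cut.sum())
```

A lower cutoff trades more false alarms for fewer missed defaults, which may or may not suit the lending context.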

# Evaluate the Model

## Model Evaluation
In this step, I evaluate the trained model using standard classification metrics such as accuracy and the confusion matrix to understand how well the model performs on unseen data.

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


In [None]:
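import matplotlib.pyplot as plt
import seaborn as sns

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()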
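
To read the heatmap numerically, the four cells of a binary confusion matrix can be unpacked and turned into precision and recall by hand. This is a small sketch on made-up labels (`y_true` and `y_hat` are hypothetical), assuming 1 means default and 0 means no default.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = default, 0 = no default.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0]

# For binary labels {0, 1}, ravel() yields the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of predicted defaults, how many were real
recall = tp / (tp + fn)     # of real defaults, how many were caught
print(tn, fp, fn, tp, precision, recall)
```

Accuracy alone can look good on imbalanced data even when recall on the default class is poor, which is why the per-class view matters here.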


In [None]:
print(classification_report(y_test, y_pred))

# Feature Importance

In [None]:
importance = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": log_reg.coef_[0]
}).sort_values(by="Coefficient", ascending=False)

importance

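# Positive coefficient → increases the predicted probability of default

# Negative coefficient → decreases the predicted probability of default

# The features are not scaled here, so coefficient magnitudes are not directly comparable across features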
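
Raw coefficients depend on each feature's units, so ranking them says little when features live on different scales. One common remedy, sketched here on synthetic data, is to standardize the features inside a pipeline and report odds ratios next to the coefficients. The `income` and `credit_score` feature names and the data-generating rule are assumptions made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical feature names on very different scales.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, 500),
    "credit_score": rng.normal(650, 50, 500),
})
# Default is made more likely by low income and low credit score.
logit = -(X_toy["income"] - 50_000) / 15_000 - (X_toy["credit_score"] - 650) / 50
y_toy = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(int)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_toy, y_toy)

coefs = pipe.named_steps["logisticregression"].coef_[0]
summary = pd.DataFrame({
    "Feature": X_toy.columns,
    "Coefficient": coefs,         # change in log-odds per one standard deviation
    "Odds_Ratio": np.exp(coefs),  # multiplicative change in the odds
}).sort_values(by="Coefficient", ascending=False)
print(summary)
```

An odds ratio below 1 means the feature lowers the odds of default as it increases; above 1 means it raises them.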