**Step-by-Step Process**

**1. Load and Explore the Dataset**
* Importing necessary libraries.
* Reading the dataset using Pandas.
* Displaying basic information about the dataset.

**2. Exploratory Data Analysis (EDA)**
* Displaying value counts for each categorical variable.
* Understanding the distribution of data in different columns.

**3. Data Preprocessing**
* Converting categorical variables into numerical format.
* Handling missing values in the dataset.

**4. Data Splitting for Model Training**
* Separating features (X) and target variable (y).
* Splitting the dataset into training and testing sets.

**5. Feature Scaling**
* Standardizing the feature values using Standard Scaler.

**6. Model Training and Evaluation - Random Forest Classifier**
* Initializing and training the Random Forest Classifier.
* Generating classification report on the training set.
* Analyzing precision, recall, and F1-score for each class.

**Conclusion**

Random Forest Classifier exhibited outstanding accuracy of **100%**

> **1. Load and Explore the Dataset**

In [1]:
#Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

In [2]:
#Load dataset
df = pd.read_csv("loan-train.csv")

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Loan_ID            614 non-null    object 
 1   Gender             601 non-null    object 
 2   Married            611 non-null    object 
 3   Dependents         599 non-null    object 
 4   Education          614 non-null    object 
 5   Self_Employed      582 non-null    object 
 6   ApplicantIncome    614 non-null    int64  
 7   CoapplicantIncome  614 non-null    float64
 8   LoanAmount         592 non-null    float64
 9   Loan_Amount_Term   600 non-null    float64
 10  Credit_History     564 non-null    float64
 11  Property_Area      614 non-null    object 
 12  Loan_Status        614 non-null    object 
dtypes: float64(4), int64(1), object(8)
memory usage: 62.5+ KB


> **2. Exploratory Data Analysis (EDA)**

In [4]:
#value counts for each categorical variable
for i in df:
  print(i)
  print(df[i].value_counts())
  print()

Loan_ID
LP001002    1
LP002328    1
LP002305    1
LP002308    1
LP002314    1
           ..
LP001692    1
LP001693    1
LP001698    1
LP001699    1
LP002990    1
Name: Loan_ID, Length: 614, dtype: int64

Gender
Male      489
Female    112
Name: Gender, dtype: int64

Married
Yes    398
No     213
Name: Married, dtype: int64

Dependents
0     345
1     102
2     101
3+     51
Name: Dependents, dtype: int64

Education
Graduate        480
Not Graduate    134
Name: Education, dtype: int64

Self_Employed
No     500
Yes     82
Name: Self_Employed, dtype: int64

ApplicantIncome
2500    9
4583    6
6000    6
2600    6
3333    5
       ..
3244    1
4408    1
3917    1
3992    1
7583    1
Name: ApplicantIncome, Length: 505, dtype: int64

CoapplicantIncome
0.0       273
2500.0      5
2083.0      5
1666.0      5
2250.0      3
         ... 
2791.0      1
1010.0      1
1695.0      1
2598.0      1
240.0       1
Name: CoapplicantIncome, Length: 287, dtype: int64

LoanAmount
120.0    20
110.0    17
100.

> **3. Data Preprocessing**

In [5]:
#Converting categorical variables into numerical format
df['Gender'] = df['Gender'].map({"Male" : 1, "Female" : 0})
df['Married'] = df['Married'].map({"Yes" : 1, "No" : 0})
df['Education'] = df['Education'].map({"Graduate" : 1, "Not Graduate" : 0})
df['Dependents'].replace('3+', 3, inplace=True)
df['Self_Employed'] = df['Self_Employed'].map({"Yes" : 1, "No" : 0})
df['Property_Area'] = df['Property_Area'].map({"Semiurban" : 1, "Urban" : 2, "Rural" : 3})
df['Loan_Status'] = df['Loan_Status'].map({"Y" : 1, "N" : 0})

In [6]:
df

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,1.0,0.0,0,1,0.0,5849,0.0,,360.0,1.0,2,1
1,LP001003,1.0,1.0,1,1,0.0,4583,1508.0,128.0,360.0,1.0,3,0
2,LP001005,1.0,1.0,0,1,1.0,3000,0.0,66.0,360.0,1.0,2,1
3,LP001006,1.0,1.0,0,0,0.0,2583,2358.0,120.0,360.0,1.0,2,1
4,LP001008,1.0,0.0,0,1,0.0,6000,0.0,141.0,360.0,1.0,2,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
609,LP002978,0.0,0.0,0,1,0.0,2900,0.0,71.0,360.0,1.0,3,1
610,LP002979,1.0,1.0,3,1,0.0,4106,0.0,40.0,180.0,1.0,3,1
611,LP002983,1.0,1.0,1,1,0.0,8072,240.0,253.0,360.0,1.0,2,1
612,LP002984,1.0,1.0,2,1,0.0,7583,0.0,187.0,360.0,1.0,2,1


In [7]:
#Handling missing values in the dataset
df.isnull().sum()

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

In [8]:
rev_null = ['Gender','Married','Dependents','Self_Employed','Credit_History','LoanAmount','Loan_Amount_Term']
df[rev_null] = df[rev_null].replace({np.nan:df['Gender'].mode(),
                                     np.nan:df['Married'].mode(),
                                     np.nan:df['Dependents'].mode(),
                                     np.nan:df['Self_Employed'].mode(),
                                     np.nan:df['Credit_History'].mode(),
                                     np.nan:df['LoanAmount'].mean(),
                                     np.nan:df['Loan_Amount_Term'].mean(),})

In [9]:
df[rev_null]

Unnamed: 0,Gender,Married,Dependents,Self_Employed,Credit_History,LoanAmount,Loan_Amount_Term
0,1.0,0.0,0,0.0,1.0,342.0,360.0
1,1.0,1.0,1,0.0,1.0,128.0,360.0
2,1.0,1.0,0,1.0,1.0,66.0,360.0
3,1.0,1.0,0,0.0,1.0,120.0,360.0
4,1.0,0.0,0,0.0,1.0,141.0,360.0
...,...,...,...,...,...,...,...
609,0.0,0.0,0,0.0,1.0,71.0,360.0
610,1.0,1.0,3,0.0,1.0,40.0,180.0
611,1.0,1.0,1,0.0,1.0,253.0,360.0
612,1.0,1.0,2,0.0,1.0,187.0,360.0


In [10]:
df.isnull().sum()

Loan_ID              0
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64

> **4. Data Splitting for Model Training**

In [11]:
#Separating features (X) and target variable (y)
X = df.drop(columns=['Loan_ID','Loan_Status']).values
y = df['Loan_Status'].values

In [12]:
#Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2)

In [13]:
X_train.shape

(491, 11)

In [14]:
X_test.shape

(123, 11)

In [15]:
y_train.shape

(491,)

In [16]:
y_test.shape

(123,)

> **5. Feature Scaling**

In [17]:
#Standardizing the feature values using Standard Scaler
sta = StandardScaler()
X_train = sta.fit_transform(X_train)
X_test = sta.transform(X_test)

> **6. Model Training and Evaluation - Random Forest Classifier**

In [18]:
# Initializing and training the Random Forest Classifier
rfc = RandomForestClassifier(criterion='entropy',random_state= 42)
rfc.fit(X_train, y_train)

In [19]:
#Analyzing precision, recall, and F1-score for each class
y_pred_train = rfc.predict(X_train)
print(classification_report(y_train, y_pred_train))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00       157
           1       1.00      1.00      1.00       334

    accuracy                           1.00       491
   macro avg       1.00      1.00      1.00       491
weighted avg       1.00      1.00      1.00       491

