This project is a machine learning model that predicts whether a loan application will be approved or rejected, based on the applicant's personal information and credit history. The model is trained and validated on a dataset of historical loan applications.
The dataset used for training and testing the model contains the following columns:
- Personal information such as gender, marital status, dependents, education, self-employed, and property area.
- Financial information such as applicant income, co-applicant income, and credit history.
- Loan-specific details such as loan amount and loan term.
- Target variable indicating whether the loan was approved or rejected.
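As a rough illustration of this layout, the sketch below loads a few rows and separates the input columns from the target. The column names (`Gender`, `ApplicantIncome`, `Loan_Status`, and so on), the `Y`/`N` encoding, and the inline sample rows are all assumptions made for the example; the real dataset's headers and values may differ.

```python
import io

import pandas as pd

# Hypothetical sample rows; column names and values are illustrative assumptions.
csv_text = """Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N
Female,No,0,Graduate,No,3000,0,66,360,1,Urban,Y
Male,Yes,0,Not Graduate,Yes,2583,2358,120,360,1,Urban,Y
"""

df = pd.read_csv(io.StringIO(csv_text))

# Separate the target column from the input features.
X = df.drop(columns=["Loan_Status"])
y = df["Loan_Status"].map({"Y": 1, "N": 0})  # assumed encoding: 1 = approved, 0 = rejected

print(X.shape)
print(y.tolist())
```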
Libraries used: NumPy, pandas, Matplotlib, missingno, seaborn, SciPy, scikit-learn (sklearn).
To gain insights into the dataset before building the model, I visualized the data using the following plots and tables:
BarPlot(): Plotted the crosstab of each class against the loan status (the output class).
Histplot(): Shows the distribution of the attributes in the dataset.
Heatmap(): Checked whether any attributes are strongly correlated with one another. If they were, we could have kept just one of them, but no such pairs were found, so all attributes were retained.
Crosstab(): Loan status does not appear to depend on gender, since the approval ratio is almost equal for male and female applicants.
Boxplot(): Revealed outliers that need to be handled.
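The crosstab and heatmap checks above can be sketched numerically as follows. The toy data and column names are hypothetical, chosen only so that both genders have the same approval ratio; they are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Loan_Status": ["Y", "Y", "N", "N", "Y", "Y"],
    "ApplicantIncome": [5000, 3000, 4000, 2500, 6000, 3500],
    "LoanAmount": [130, 70, 120, 60, 150, 90],
})

# Row-normalised crosstab: share of approved/rejected loans within each gender.
ct = pd.crosstab(df["Gender"], df["Loan_Status"], normalize="index")
print(ct)

# Pairwise Pearson correlations of the numeric attributes
# (this is the matrix a correlation heatmap would colour).
corr = df[["ApplicantIncome", "LoanAmount"]].corr()
print(corr)
```

Calling `ct.plot(kind="bar")` or `seaborn.heatmap(corr, annot=True)` would render these tables as the corresponding bar plot and heatmap. Note that a correlation heatmap only measures linear relationships between numeric attributes.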
Treating NaN values: For categorical attributes, I filled the empty cells with the mode; for numerical attributes, I used the mean.
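A minimal sketch of this imputation rule, assuming a small hypothetical frame with one categorical column (`Gender`) and one numerical column (`LoanAmount`):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({
    "Gender": ["Male", None, "Male", "Female"],
    "LoanAmount": [100.0, 200.0, np.nan, 300.0],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numerical attribute: fill with the column mean.
        df[col] = df[col].fillna(df[col].mean())
    else:
        # Categorical attribute: fill with the most frequent value.
        df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```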
Treating Outliers: I simply dropped the rows whose values fell outside the accepted range.
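The text does not say how the range was defined. One common choice is the 1.5 × IQR rule, sketched below; this rule, the column name `ApplicantIncome`, and the sample values are assumptions for illustration, not necessarily what the project used.

```python
import pandas as pd

# Hypothetical income values with one obvious outlier.
df = pd.DataFrame({"ApplicantIncome": [2500, 3000, 3200, 3500, 4000, 4200, 81000]})

# Assumed rule: keep values within 1.5 * IQR of the quartiles.
q1 = df["ApplicantIncome"].quantile(0.25)
q3 = df["ApplicantIncome"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Drop the rows whose value falls outside [lower, upper].
df_clean = df[df["ApplicantIncome"].between(lower, upper)]

print(lower, upper)
print(len(df), "->", len(df_clean))
```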
SMOTE(): Oversamples the minority class by generating synthetic samples, to reduce the class imbalance.
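To show the idea behind SMOTE, here is a simplified hand-rolled sketch in NumPy: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch` and its parameters are hypothetical; in practice a library implementation such as `imblearn.over_sampling.SMOTE` (with `fit_resample`) would normally be used instead.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between the minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))            # random minority sample
        neigh = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                         # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return synthetic


# Hypothetical minority-class points in a two-feature space.
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
X_new = smote_sketch(X_minority, n_new=6)
print(X_new.shape)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies.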
MinMaxScaler: Linearly scales each feature into a fixed range ([0, 1] by default).
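With the default range, each feature is transformed as `(x - min) / (max - min)`, using the minimum and maximum seen during fitting. The sketch below uses hypothetical values and fits the scaler on the training split only, which is an assumption about the workflow rather than something stated here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical two-feature data (e.g. income and loan amount).
X_train = np.array([[2000.0, 100.0], [4000.0, 150.0], [6000.0, 300.0]])
X_test = np.array([[5000.0, 200.0]])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from the training data
X_test_scaled = scaler.transform(X_test)        # reuse the same min/max

print(X_train_scaled)
print(X_test_scaled)
```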
- Linear Regression: Accuracy: 92.59%
- KNN: Best accuracy: 90.74%
- SVC: Accuracy: 92.59%
- Decision Tree: Best accuracy: 92.59%
- Random Forest: Best accuracy: 92.59%
- Gradient Boosting: Best accuracy: 85.19%
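A comparison like the one above could be produced along the following lines. This is only a sketch under several assumptions: it runs on synthetic data from `make_classification` rather than the loan dataset, uses default hyperparameters, and uses `LogisticRegression` as the linear classifier. The accuracies it prints will therefore not match the figures listed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed loan data (an assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```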