This project focuses on building an internal credit scoring system based on historical banking data. The primary objective is to predict whether a company will be able to repay a loan, aiding in risk assessment and decision-making.
Data Preprocessing:
The dataset of historical banking information is processed and cleaned. Exploratory data analysis (EDA) is then conducted to understand the variables, their distributions, and any data quality issues before modeling.
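The cleaning and EDA steps can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names (`company_id`, `turnover`, `sector`, `default`) and the imputation choices (median for numbers, an `"unknown"` category for text) are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the historical banking data;
# column names are illustrative, not taken from the real dataset.
df = pd.DataFrame({
    "company_id": [1, 2, 2, 3, 4],
    "turnover": [120_000.0, 85_000.0, 85_000.0, np.nan, 40_000.0],
    "sector": ["retail", "industry", "industry", "services", None],
    "default": [0, 1, 1, 0, 0],
})

# Cleaning: drop exact duplicate rows, then impute missing values.
df = df.drop_duplicates().reset_index(drop=True)
df["turnover"] = df["turnover"].fillna(df["turnover"].median())
df["sector"] = df["sector"].fillna("unknown")

# Quick EDA: missing-value counts, numeric summary, and class balance.
print(df.isna().sum())
print(df.describe())
print(df["default"].value_counts(normalize=True))
```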
Feature Engineering:
Key features are identified, and new features are created to enhance model performance. Date-time features are analyzed, cleaned, and transformed as needed.
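Date-time handling of this kind might look like the sketch below. The column names `creation_date` and `loan_date`, the date format, and the derived features (company age at loan time, loan month, a missing-date flag) are hypothetical examples chosen for illustration; the project's real transformations may differ.

```python
import pandas as pd

# Hypothetical raw columns: dates stored as strings, one of them malformed.
df = pd.DataFrame({
    "creation_date": ["2015-03-01", "2018-11-20", "not a date"],
    "loan_date": ["2021-06-15", "2021-01-10", "2020-09-30"],
})

# Parse strings to datetimes; unparseable values become NaT instead of raising.
for col in ["creation_date", "loan_date"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d", errors="coerce")

# Derived features: company age at loan time (in years) and loan month.
df["company_age_years"] = (df["loan_date"] - df["creation_date"]).dt.days / 365.25
df["loan_month"] = df["loan_date"].dt.month

# Flag rows whose creation date could not be parsed, then impute the age.
df["creation_date_missing"] = df["creation_date"].isna().astype(int)
df["company_age_years"] = df["company_age_years"].fillna(df["company_age_years"].median())
```

Keeping an explicit missing-date flag lets the model learn from the fact that a date was unusable, which imputation alone would hide.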
Modeling:
The models selected for credit default prediction are logistic regression, a decision tree, and linear discriminant analysis (LDA). Feature selection is performed with stepwise selection and scikit-learn's `SequentialFeatureSelector`.
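A rough sketch of combining the three models with scikit-learn's `SequentialFeatureSelector` is shown below. It runs on synthetic data from `make_classification`, and the settings (forward direction, 5 features kept, ROC AUC scoring, 5-fold CV, tree depth 4) are assumptions rather than the notebook's configuration. p-value-based stepwise selection is not part of scikit-learn and is not shown here.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered feature matrix (12 candidate features).
X, y = make_classification(n_samples=500, n_features=12, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "lda": LinearDiscriminantAnalysis(),
}

fitted = {}
for name, model in models.items():
    # Forward selection adds one feature at a time, keeping the 5 that
    # maximise cross-validated ROC AUC for this particular model.
    selector = SequentialFeatureSelector(
        clone(model), n_features_to_select=5, direction="forward",
        scoring="roc_auc", cv=5,
    )
    pipe = make_pipeline(StandardScaler(), selector, model)
    pipe.fit(X_train, y_train)
    fitted[name] = pipe
    kept = pipe.named_steps["sequentialfeatureselector"].get_support(indices=True)
    print(f"{name}: features {kept.tolist()}, test accuracy {pipe.score(X_test, y_test):.3f}")
```

Putting the selector inside a pipeline means feature selection is learned on the training split only, so the test score is not inflated by selection leakage.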
Handling Imbalanced Data:
Techniques for addressing imbalanced classes are applied to improve model robustness.
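The README does not name the specific techniques used, so the sketch below shows two common options as an assumption: class reweighting via `class_weight="balanced"` and random oversampling of the minority class with `sklearn.utils.resample`. Treating class 1 as "default" is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Synthetic imbalanced data: roughly 10% of rows in class 1 (assumed "default").
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0)

# Option 1: weight each class inversely to its frequency in the loss.
weighted_model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# Option 2: random oversampling of the minority class.
# In practice, apply this to the training split only, never to the test set.
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int), np.ones(len(X_min_up), dtype=int)])
oversampled_model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

print("original class counts:", np.bincount(y))
print("balanced class counts:", np.bincount(y_bal))
```

Reweighting leaves the data untouched and is cheap; oversampling duplicates rare rows, which can help models without a `class_weight` option but raises the risk of overfitting to those duplicates.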
Model Evaluation:
Model performance is assessed with accuracy, F1-score, the confusion matrix, and the ROC curve together with its area under the curve (AUC-ROC).
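The evaluation metrics above map directly onto `sklearn.metrics` functions, as in this minimal sketch. It uses synthetic data and a plain logistic regression purely for illustration and assumes class 1 is the default class. Plotting the ROC curve with Matplotlib is omitted, but `fpr` and `tpr` are exactly the arrays one would plot.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced data standing in for the real features.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard labels at the default 0.5 threshold
y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)                # F1 of the positive class (class 1)
cm = confusion_matrix(y_test, y_pred)        # rows: true class, columns: predicted class
auc = roc_auc_score(y_test, y_score)         # threshold-independent ranking quality
fpr, tpr, thresholds = roc_curve(y_test, y_score)  # points for plotting the ROC curve

print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC={auc:.3f}")
print(cm)
```

On imbalanced data, accuracy alone can look good for a model that rarely predicts default, which is why F1 and AUC are reported alongside it.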
Results and Insights:
Detailed analysis of model performance for each chosen algorithm, with interpretations of the results and key takeaways.
Future Perspectives:
Recommendations for further improvements, including in-depth variable exploration and data enrichment. Suggestions for advanced modeling techniques, such as neural networks.
Conclusion:
Summary of the project findings, showcasing the model's ability to discriminate between loan repayments and defaults and establishing a foundation for decision-making in the financial domain.
Usage:
- Clone the repository to your local machine.
- Open the Jupyter notebook "SNI.ipynb" in Jupyter Notebook, JupyterLab, or another environment that supports .ipynb files.
- Execute the cells sequentially to reproduce the analysis and model evaluations.
Requirements:
Ensure you have the following Python libraries installed:
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn