In many real-world regression problems, target distributions are highly skewed, with extreme values occurring rarely but carrying significant importance. Traditional regression models often perform poorly in these regions due to global optimization objectives.
This repository presents an analytical study of imbalanced regression problems through exploratory data analysis and evaluation perspectives.
The core modeling methodology developed as part of my Honours Research Project is intentionally not included for intellectual property reasons.
This project was conducted as part of an Honours Research Project in Computer Science and Mathematics at the University of Ottawa.
The objective of the research was to study the limitations of standard regression models under target imbalance and to design methods improving predictive reliability on rare and extreme cases.
Experiments are based on datasets from the Imbalanced Regression Benchmark (Branco et al., 2019).
Example dataset:
- Bank8FM
- 4,198 observations
- 8 continuous predictors
- Strongly right-skewed target distribution
The analysis focuses on:
- Target distribution characterization
- Identification of heavy tails and skewness
- Q–Q plot diagnostics
- Visualization of rare-event regions
- Residual behavior under global regression models
These analyses highlight why standard regression approaches fail to generalize well to extreme target values.
Rather than evaluating models solely using global metrics, this project emphasizes:
- Region-specific performance analysis
- Comparison between central and tail target regions
- Limitations of aggregate error metrics
- Importance of interpretability in rare-event prediction
- Target imbalance leads to systematic underestimation of extreme values
- Global models bias predictions toward dense regions
- Aggregate metrics hide poor tail-region behavior
- Evaluation must consider distributional structure
The segmentation-based regression framework developed during this research is not included in this repository.
High-level methodological discussions are presented without implementation details.
- Python
- pandas
- NumPy
- matplotlib
- seaborn
- Jupyter Notebook
Branco, P., Torgo, L., & Ribeiro, R. (2019).
Pre-processing approaches for imbalanced regression.