Skip to content

srash007/imbalanced-regression

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Imbalanced Regression Analysis

Overview

In many real-world regression problems, target distributions are highly skewed, with extreme values occurring rarely but carrying significant importance. Traditional regression models often perform poorly in these regions due to global optimization objectives.

This repository presents an analytical study of imbalanced regression problems through exploratory data analysis and evaluation perspectives.

The core modeling methodology developed as part of my Honours Research Project is intentionally not included for intellectual property reasons.


Research Context

This project was conducted as part of an Honours Research Project in Computer Science and Mathematics at the University of Ottawa.

The objective of the research was to study the limitations of standard regression models under target imbalance and to design methods improving predictive reliability on rare and extreme cases.


Dataset

Experiments are based on datasets from the Imbalanced Regression Benchmark (Branco et al., 2019).

Example dataset:

  • Bank8FM
  • 4,198 observations
  • 8 continuous predictors
  • Strongly right-skewed target distribution

Exploratory Data Analysis

The analysis focuses on:

  • Target distribution characterization
  • Identification of heavy tails and skewness
  • Q–Q plot diagnostics
  • Visualization of rare-event regions
  • Residual behavior under global regression models

These analyses highlight why standard regression approaches fail to generalize well to extreme target values.


Evaluation Perspective

Rather than evaluating models solely using global metrics, this project emphasizes:

  • Region-specific performance analysis
  • Comparison between central and tail target regions
  • Limitations of aggregate error metrics
  • Importance of interpretability in rare-event prediction

Key Observations

  • Target imbalance leads to systematic underestimation of extreme values
  • Global models bias predictions toward dense regions
  • Aggregate metrics hide poor tail-region behavior
  • Evaluation must consider distributional structure

Notes on Methodology

The segmentation-based regression framework developed during this research is not included in this repository.

High-level methodological discussions are presented without implementation details.


Tools Used

  • Python
  • pandas
  • NumPy
  • matplotlib
  • seaborn
  • Jupyter Notebook

References

Branco, P., Torgo, L., & Ribeiro, R. (2019).
Pre-processing approaches for imbalanced regression.

About

Research-oriented exploratory analysis of imbalanced regression problems, focusing on skewed target distributions, tail behavior, and the limitations of global evaluation metrics in rare and extreme value prediction.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors