harshleen001/sampling

sampling

Sampling Techniques for Imbalanced Credit Card Dataset

Objective

To study why sampling techniques matter when handling imbalanced datasets, and to analyze how different sampling strategies affect the performance of machine learning models.

Dataset

The credit card dataset was obtained from a public GitHub repository and has a highly imbalanced class distribution.

Methodology

  • Dataset imbalance was analyzed
  • Minority class was oversampled to balance the dataset
  • Five different samples were created
  • Five sampling techniques were applied
  • Five machine learning models were trained and evaluated
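The balancing step can be sketched as follows. This is a minimal illustration only: the README does not say which oversampling method was used, so plain random oversampling (duplicating minority rows with replacement) is an assumption here, and the helper name `oversample_minority` and the toy `(id, label)` rows are hypothetical stand-ins for the real dataset.

```python
import random
from collections import Counter


def oversample_minority(rows, label_of, seed=42):
    """Duplicate rows of smaller classes at random until all classes match the largest one."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw with replacement to fill the gap up to the majority-class size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 95 rows of class 0 and 5 rows of class 1.
toy = [(i, 0) for i in range(95)] + [(i, 1) for i in range(95, 100)]
balanced = oversample_minority(toy, label_of=lambda row: row[1])
counts = Counter(label for _, label in balanced)
print(counts[0], counts[1])
```

After balancing, both classes have the same number of rows, so later samples drawn from `balanced` are not dominated by the majority class.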

Sampling Techniques

  • Simple Random Sampling
  • Systematic Sampling
  • Stratified Sampling
  • Bootstrap Sampling
  • Cross-Validation Sampling
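The five techniques listed above can be sketched with the Python standard library alone. This is an illustrative sketch, not the repository's code: the function names, the sample size of 40, and the toy `(id, label)` rows are assumptions, and "Cross-Validation Sampling" is interpreted here as k-fold train/test splits, which the README does not confirm.

```python
import random


def simple_random_sample(rows, n, rng):
    # Every row has the same chance of selection; no row is picked twice.
    return rng.sample(rows, n)


def systematic_sample(rows, n, rng):
    # Pick a random start, then take every step-th row.
    step = len(rows) // n
    start = rng.randrange(step)
    return [rows[start + i * step] for i in range(n)]


def stratified_sample(rows, n, label_of, rng):
    # Sample each class in proportion to its share of the data.
    # Rounding may make the total differ slightly from n on uneven data.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    sample = []
    for group in by_class.values():
        k = round(n * len(group) / len(rows))
        sample.extend(rng.sample(group, k))
    return sample


def bootstrap_sample(rows, n, rng):
    # Sample with replacement, so the same row may appear more than once.
    return rng.choices(rows, k=n)


def cross_validation_folds(rows, k, rng):
    # Shuffle once, split into k folds, and yield (train, test) pairs.
    shuffled = rows[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, folds[i]


rng = random.Random(0)
data = [(i, i % 2) for i in range(200)]  # toy balanced data: (id, label)

samples = {
    "simple_random": simple_random_sample(data, 40, rng),
    "systematic": systematic_sample(data, 40, rng),
    "stratified": stratified_sample(data, 40, lambda row: row[1], rng),
    "bootstrap": bootstrap_sample(data, 40, rng),
}
folds = list(cross_validation_folds(data, 5, rng))

for name, sample in samples.items():
    print(name, len(sample))
print("cross_validation", len(folds), "folds")
```

Each entry in `samples` (and each fold pair) can then be used to train and evaluate the models, so that differences in the scores can be traced back to how the sample was drawn.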

Machine Learning Models

  • Logistic Regression
  • K-Nearest Neighbors (KNN)
  • Decision Tree
  • Random Forest
  • Support Vector Machine (SVM)

Results

Each model was evaluated using accuracy. The results show that the choice of sampling technique affects model performance, with the tree-based models (Decision Tree and Random Forest) achieving higher accuracy after balancing.
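As a small worked example of the metric, the sketch below computes accuracy by hand on hypothetical labels (not results from this repository). It also shows why balancing matters when accuracy is the metric: on a 95/5 split, a model that always predicts the majority class scores high accuracy while finding none of the minority class. The `recall` helper is added only for that comparison.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of actual positives that were predicted as positive.
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    hits = sum(p == positive for p in predictions_for_positives)
    return hits / len(predictions_for_positives)


# Hypothetical imbalanced labels: 95 of class 0, 5 of class 1.
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```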

Conclusion

Sampling plays a crucial role in improving the performance of machine learning models on imbalanced datasets. Choosing an appropriate sampling technique leads to better and more reliable predictions.
