This project studies the importance of sampling techniques in handling imbalanced datasets and analyzes how different sampling strategies affect the performance of machine learning models.

The credit card dataset was obtained from a public GitHub repository and has a highly imbalanced class distribution.
- Dataset imbalance was analyzed
- Minority class was oversampled to balance the dataset
- Five different samples were created
- Five sampling techniques were applied
- Five machine learning models were trained and evaluated
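
The balancing step above can be sketched as random oversampling: minority-class rows are duplicated at random until every class is as large as the largest one. This is a minimal, dependency-free illustration and not necessarily the method used in the project. The function name `oversample_minority`, the toy rows, and the 0/1 labels are all assumptions made for the example.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Duplicate rows of the smaller classes at random until all classes match the largest."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Extra rows are drawn with replacement from the class's own rows.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up): 8 majority-class rows (label 0) and 2 minority-class rows (label 1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(Counter(labels))           # class counts before balancing
print(Counter(balanced_labels))  # class counts after balancing
```

Because the extra rows are exact copies, oversampling before a train/test split can place the same row on both sides of the split, which is worth keeping in mind when reading accuracy figures.
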

The five sampling techniques were:
- Simple Random Sampling
- Systematic Sampling
- Stratified Sampling
- Bootstrap Sampling
- Cross-Validation Sampling
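
The five techniques listed above could each be expressed as a function that returns row indices. The sketch below uses only the standard library; the function names, the sample size of 20, and the reading of "cross-validation sampling" as shuffled k-fold splits are assumptions for illustration, not details taken from the project.

```python
import random


def simple_random_sample(n_rows, size, rng):
    """Draw `size` distinct row indices uniformly at random."""
    return sorted(rng.sample(range(n_rows), size))


def systematic_sample(n_rows, size, rng):
    """Take every k-th row after a random start, with k = n_rows // size.

    Assumes 0 < size <= n_rows.
    """
    step = n_rows // size
    start = rng.randrange(step)
    return list(range(start, n_rows, step))[:size]


def stratified_sample(labels, size, rng):
    """Draw from each class in proportion to its share of the data.

    Per-class quotas are rounded, so the total can differ slightly from `size`.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    picked = []
    for members in by_class.values():
        quota = round(size * len(members) / len(labels))
        picked.extend(rng.sample(members, quota))
    return sorted(picked)


def bootstrap_sample(n_rows, size, rng):
    """Draw `size` row indices with replacement, so duplicates are expected."""
    return [rng.randrange(n_rows) for _ in range(size)]


def k_fold_splits(n_rows, k, rng):
    """Shuffle the indices and yield one (train, test) pair per fold."""
    order = list(range(n_rows))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


rng = random.Random(42)
labels = [0] * 50 + [1] * 50  # toy, already balanced label vector
n_rows = len(labels)

samples = {
    "simple_random": simple_random_sample(n_rows, 20, rng),
    "systematic": systematic_sample(n_rows, 20, rng),
    "stratified": stratified_sample(labels, 20, rng),
    "bootstrap": bootstrap_sample(n_rows, 20, rng),
}
folds = list(k_fold_splits(n_rows, 5, rng))

for name, idx in samples.items():
    print(name, len(idx), "rows")
print("k-fold:", len(folds), "train/test splits")
```

Returning indices rather than copies of the data keeps each sampler independent of how the dataset is stored, so the same functions could index a list, a NumPy array, or a DataFrame.
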

The five models were:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
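
Training and scoring the five models listed above on one sample might look like the following scikit-learn sketch. It runs on synthetic data from `make_classification` as a stand-in for a balanced sample, since the real dataset is not reproduced here. The hyperparameters (`max_iter`, `n_neighbors`, `n_estimators`, the RBF kernel) and the 75/25 split are assumptions, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one balanced sample (assumption: the real data is not used here).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five model families named in the list above, with illustrative settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

# Fit each model on the training split and record accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in accuracy.items():
    print(f"{name}: {score:.3f}")
```

Repeating this loop once per sample would give a five-by-five grid of accuracies, one cell per combination of sampling technique and model.
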

Each model was evaluated using accuracy. The results show that the choice of sampling technique significantly influences model performance, with the tree-based models (Decision Tree and Random Forest) achieving higher accuracy after balancing.

Sampling plays a crucial role in improving the performance of machine learning models on imbalanced datasets. Choosing an appropriate sampling technique leads to better and more reliable predictions.