A collection of machine learning classification projects covering medical diagnosis (Parkinson's disease), waveform classification, heart failure prediction, and gene mutation analysis.
This repository demonstrates fundamental and advanced machine learning classification techniques across four practical domains. Each project covers different parts of the ML workflow: data exploration, feature engineering, hyperparameter tuning, model comparison, and ensemble methods, applied to medical, biological, and synthetic datasets.
Decision Tree - Parkinson's Disease: medical diagnosis classification using decision trees and random forests.
- Dataset: Parkinson's disease patient measurements
- Task: Binary classification (Parkinson's vs healthy)
- Methods: Decision trees, Random Forest, hyperparameter optimization
- Tools: Cross-validation, learning curves, tree complexity analysis
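
The workflow listed above (tune a decision tree with cross-validation, then compare it with a random forest) might look roughly like the following. This is a minimal sketch rather than the notebooks' actual code: the data is generated with `make_classification` as a stand-in for the Parkinson's measurements, and the parameter grid is an assumption chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: the real CSV columns are not
# documented in this README, so generated features are used instead).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune tree complexity with 5-fold cross-validation.
# The grid values are illustrative, not the ones used in the project.
param_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
best_tree = search.best_estimator_

# Evaluate a random forest with the same cross-validation protocol.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest_cv = cross_val_score(
    forest, X_train, y_train, cv=5, scoring="accuracy"
).mean()

print("best tree params:", search.best_params_)
print("tree CV accuracy:", round(search.best_score_, 3))
print("forest CV accuracy:", round(forest_cv, 3))
print("tree test accuracy:", round(best_tree.score(X_test, y_test), 3))
```

The held-out test split is scored only once, after tuning, so that the reported test accuracy is not influenced by the hyperparameter search.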
Linear Discriminant & SVM - Waveform Classification: multi-class waveform pattern classification with discriminant analysis and support vector machines.
- Dataset: Synthetic waveform patterns with noise
- Task: Multi-class classification (3 waveform types)
- Methods: Linear Discriminant Analysis (LDA), SVM (linear, polynomial, RBF, sigmoid kernels)
- Tools: Kernel comparison, feature importance, model performance evaluation
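
A kernel comparison of the kind listed above can be sketched as follows. The three-class data here is generated with `make_classification` and is only a hypothetical placeholder for the waveform dataset; the feature count, the scaling step, and the default SVM settings are likewise assumptions, not the notebooks' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical 3-class data standing in for the waveform patterns.
X, y = make_classification(
    n_samples=600,
    n_features=21,
    n_informative=8,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# LDA plus one SVM per kernel; SVMs are scaled first because
# kernel methods are sensitive to feature magnitudes.
models = {"LDA": LinearDiscriminantAnalysis()}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    models[f"SVM ({kernel})"] = make_pipeline(StandardScaler(), SVC(kernel=kernel))

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {score:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics inside each cross-validation fold, which avoids leaking information from the validation part of the fold.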
Heart Failure Classification: cardiovascular risk prediction using several classification algorithms.
- Dataset: Heart failure clinical records
- Task: Binary classification (survival prediction)
- Methods: Decision trees, statistical analysis, model comparison
- Tools: Feature analysis, class imbalance handling, performance metrics
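
To illustrate why class imbalance matters when choosing a performance metric, the sketch below compares plain accuracy with balanced accuracy (the mean of per-class recall, which is also what scikit-learn's `balanced_accuracy_score` computes). The labels and predictions are made up for the example and do not come from the heart failure records.

```python
from collections import Counter


def recall_per_class(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
y_true = [0] * 90 + [1] * 10

# A useless model that always predicts the majority class.
always_majority = [0] * 100

# A model that finds 8 of the 10 minority cases at the cost of 9 false alarms.
more_useful = [0] * 81 + [1] * 9 + [1] * 8 + [0] * 2

print(accuracy(y_true, always_majority))                      # 0.9
print(balanced_accuracy(y_true, always_majority))             # 0.5
print(round(accuracy(y_true, more_useful), 2))                # 0.89
print(round(balanced_accuracy(y_true, more_useful), 2))       # 0.85
```

Plain accuracy ranks the majority-class predictor slightly higher, while balanced accuracy correctly favors the model that actually detects the minority class.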
Gene Mutation Prediction: advanced ensemble learning for predicting gene mutation activity status.
- Dataset: Gene expression and mutation data
- Task: Binary classification (active vs inactive mutations)
- Methods: Stacking ensemble, nested cross-validation, advanced feature selection
- Tools: PCA, feature engineering, balanced accuracy optimization
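
The combination of stacking, PCA, and nested cross-validation scored by balanced accuracy could be wired together roughly as shown below. This is a sketch under stated assumptions, not the contents of `complete_pipeline.py`: the base learners, the PCA grid, the fold counts, and the synthetic data are all chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced stand-in for the gene data (assumption).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=8, weights=[0.7, 0.3], random_state=0
)

# Stacking: base learners feed their predictions to a logistic regression.
# The choice of base learners is hypothetical.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
pipeline = Pipeline(
    [("scale", StandardScaler()), ("pca", PCA()), ("stack", stack)]
)

# Inner loop selects hyperparameters; outer loop estimates generalization.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    pipeline,
    param_grid={"pca__n_components": [5, 10]},
    cv=inner_cv,
    scoring="balanced_accuracy",
)
nested_scores = cross_val_score(
    search, X, y, cv=outer_cv, scoring="balanced_accuracy"
)

print("balanced accuracy per outer fold:", nested_scores.round(3))
print("mean nested balanced accuracy:", round(nested_scores.mean(), 3))
```

Because the grid search is re-run inside every outer fold, the outer scores are not biased by the hyperparameter selection, which is the point of nesting the two loops.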
Each project is self-contained, with its own notebooks or scripts and its own data. To get started, run the commands for the project of interest:
# Decision Tree - Parkinson's Disease
cd decision-tree-parkinson
jupyter notebook data-exploration.ipynb
# Linear Discriminant & SVM - Waveform Classification
cd linear-discriminant-svm-waveform
jupyter notebook linear-discriminant-analysis.ipynb
# Heart Failure Classification
cd heart-failure-classification
jupyter notebook data-exploration.ipynb
# Gene Mutation Prediction
cd gene-mutation-prediction/model
python complete_pipeline.py

| Technology | Purpose |
|---|---|
| Python 3.8+ | Core implementation language |
| Jupyter Notebook | Interactive analysis and visualization |
| Pandas | Data manipulation and analysis |
| NumPy | Numerical computing |
| Matplotlib/Seaborn | Data visualization |
| Scikit-learn | Machine learning algorithms |
| SciPy | Statistical computations |
ML-Classification/
├── decision-tree-parkinson/             # Parkinson's disease classification
│   ├── data-exploration.ipynb
│   ├── hyperparameter-tuning.ipynb
│   ├── random-forest-comparison.ipynb
│   ├── data/
│   │   ├── parkinson_train.csv
│   │   └── parkinson_test.csv
│   └── results/
├── linear-discriminant-svm-waveform/    # Waveform classification
│   ├── linear-discriminant-analysis.ipynb
│   ├── svm-classification.ipynb
│   └── data/
│       ├── waveform_train.csv
│       └── waveform_test.csv
├── heart-failure-classification/        # Cardiovascular risk prediction
│   ├── data-exploration.ipynb
│   ├── classification-models.ipynb
│   └── data/
│       ├── HeartFailure_train.csv
│       └── HeartFailure_test.csv
├── gene-mutation-prediction/            # Gene mutation activity prediction
│   ├── complete_pipeline.py
│   ├── data/
│   └── results/
│       ├── nested_cv_results.csv
│       └── final_test_predictions.csv
└── README.md
Each project folder contains:
- Jupyter notebooks with the full analysis (a Python script, `complete_pipeline.py`, for gene mutation prediction)
- The datasets used by the project
- Results and visualizations
- Model evaluations
This project was developed for academic purposes as part of university coursework.
Built for LINFO2262 - Machine Learning: Classification and Evaluation at UCLouvain (Université catholique de Louvain).