Each project is organized in its own folder with:
- README.md – Project overview & key insights
- Jupyter Notebook / Python Scripts – Code with step-by-step analysis
- Data – Link to or sample datasets used
- Visuals / Output – Plots, model evaluations, dashboards (if any)

| Project | Machine Learning Technique Used | Description |
|---|---|---|
| Breast Cancer SVM Classification | Support Vector Machine (SVM) | Predict breast cancer diagnosis using SVM. |
| COVID-19 Dashboard | Data Visualization, API Integration | Interactive global tracker for COVID-19 trends using Plotly Dash and real-time data. |
| Credit Card Fraud Detection | Random Forest, XGBoost | Detect fraudulent transactions using Random Forest and XGBoost with an emphasis on class imbalance. |
| Customer Segmentation | K-Means Clustering | Segment customers based on purchasing behavior using unsupervised learning (K-Means). |
| Customer Churn Prediction | Logistic Regression, Random Forest | Predict customer churn using classification algorithms like Random Forest and Logistic Regression. |
| Diabetes Prediction with Naive Bayes | Naive Bayes Classifier | Predict the likelihood of diabetes using Naive Bayes with health metrics. |
| Diabetes Prediction with Neural Networks | Neural Networks (Deep Learning) | Predict diabetes using deep learning models and compare with traditional methods. |
| Gene Expression Analysis with PCA, t-SNE, UMAP | PCA, t-SNE, UMAP | Visualize gene expression data in low dimensions using three dimensionality reduction techniques. |
| House Price Prediction | Linear Regression, XGBoost | Predict house prices using regression models and advanced techniques like XGBoost. |
| Hugging Face Bioinformatics | NLP, Transformers (Hugging Face) | Apply NLP models (Hugging Face transformers) for bioinformatics tasks like protein sequence analysis. |
| Machine Learning Model Evaluation | Model Evaluation Metrics (e.g., ROC-AUC, Confusion Matrix) | Visualize and compare model evaluation metrics for classification models. |
| Metagenomics Taxonomy Classification | Multi-Class Classification, Random Forest | Classify taxonomic groups from metagenomic sequence data using Random Forest. |
| ML Model from Scratch | Linear Regression, Logistic Regression | Build machine learning algorithms like linear and logistic regression from scratch using Python. |
| ML Project Template | Project Organization | Reusable template for organizing machine learning projects in a structured and reproducible way. |
| NLP Spam Classifier | Naive Bayes, TF-IDF | Classify spam emails using Naive Bayes and TF-IDF for text classification. |
| Parkinson's Disease Prediction | Binary Classification, Random Forest | Predict Parkinson's disease from voice measurements using classification algorithms like Random Forest. |
| Resume Screening using GPT & LangChain | NLP, GPT, LangChain | Automate resume screening using GPT and LangChain integration for information extraction. |
| Sepsis Detection from ICU Data | Time-Series Analysis, XGBoost | Detect early signs of sepsis using ICU patient time-series data with XGBoost. |
| SQL-based ML Feature Engineering | SQL, Feature Engineering | Generate ML features directly from SQL queries for structured databases. |
| Statistical Analysis Notebook | Statistical Inference, Hypothesis Testing | Explore core statistical concepts like hypothesis testing and distributions using Python libraries. |
| Survival Analysis | Kaplan-Meier, Cox Regression | Analyze time-to-event data using survival analysis techniques like Kaplan-Meier and Cox regression. |
| Transfer Learning with ResNet | Transfer Learning, CNNs | Use ResNet for transfer learning on image classification tasks with pre-trained models. |
| XGBoost Binary Classification | XGBoost, Binary Classification | Perform binary classification using XGBoost with a focus on hyperparameter tuning. |
| XGBoost Regression | XGBoost Regression | Predict continuous values using XGBoost regression models with feature selection and regularization. |
| MXNet CNN Image Classification | Convolutional Neural Networks (CNNs), MXNet | Use MXNet to build CNN-based models for image classification tasks. |
| Real-Time Image Classification CNN | Convolutional Neural Networks (CNNs) | Build a real-time image classification system using CNNs for visual recognition tasks. |
| COVID-19 Bayesian Forecasting | Bayesian Statistics, Time-Series Forecasting | Use Bayesian techniques for forecasting COVID-19 trends and uncertainties. |
| Gene Expression Forecasting ARIMA SARIMA | ARIMA, SARIMA | Forecast gene expression levels using time-series models like ARIMA and SARIMA. |
| Protein Secondary Structure Prediction (Transformer) | Transformer Networks | Predict protein secondary structures using Transformer-based deep learning models. |
| Titanic Decision Tree | Decision Tree | Predict passenger survival on the Titanic using a decision tree classifier. |
| Titanic Fare Prediction | Linear Regression, Random Forest | Predict the fares paid by Titanic passengers from passenger features using regression models. |
| Titanic Survival Prediction Logistic Regression | Logistic Regression | Predict Titanic passenger survival using logistic regression based on personal and travel information. |
| Customer Segmentation K-Means | K-Means Clustering | Segment customers using unsupervised learning with K-Means clustering based on purchasing behaviors. |
| LightGBM Text Classification | LightGBM, Text Classification | Build a text classification model using LightGBM for fast and efficient training on large datasets. |
| Random Forest ML Classification | Random Forest, Classification | Classify data into categories using Random Forest with various feature engineering techniques. |
| Random Forest Regression | Random Forest, Regression | Predict continuous outputs using Random Forest regression models. |
| Deep Learning Protein Structure Prediction | Deep Learning, CNN, LSTM | Use deep learning models like CNN and LSTM for predicting protein structures from sequences. |
| MLflow XGBoost | XGBoost, MLflow | Build and track XGBoost models with MLflow for scalable machine learning experiments. |
Supervised Learning
- Logistic Regression, Random Forest, Decision Tree, XGBoost, Ridge, Lasso
Unsupervised Learning
- K-Means, DBSCAN, PCA
Natural Language Processing (NLP)
- Tokenization, TF-IDF, Naive Bayes
Model Evaluation
- Confusion Matrix, ROC-AUC, F1-Score, Cross-validation
Hyperparameter Tuning
- GridSearchCV, RandomizedSearchCV
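
As a rough illustration of how several of the techniques listed above fit together, here is a minimal sketch that tunes a Logistic Regression model with `GridSearchCV` (5-fold cross-validation, F1 scoring) and then reports a confusion matrix and F1-score on held-out data. It is not taken from any project in this repo: the synthetic dataset, the parameter grid, and the split sizes are all assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; not one of the repo's datasets).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid over the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Each candidate C is scored by F1 across 5 cross-validation folds;
# the best one is then refit on the full training split.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refit best model on data it has not seen.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)

print("Best params:", search.best_params_)
print("Confusion matrix:\n", cm)
print("Test F1:", round(test_f1, 3))
```

`RandomizedSearchCV` can be swapped in when the grid is too large to search exhaustively; it takes `param_distributions` and an `n_iter` budget in place of `param_grid`.
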
Languages
- Python, SQL
Libraries
- pandas, numpy, scikit-learn, matplotlib, seaborn, XGBoost, NLTK
Visualization
- Plotly, Dash, Tableau
Pipelines
- Jupyter, Airflow, WDL, Azure Data Factory
Others
- Git, VSCode, GitHub Actions
I'm Manish Kumar, a data professional with 16+ years of experience in bioinformatics software development and data engineering. I'm now applying that experience to broader data science and machine learning domains, with a passion for solving real-world problems using data-driven solutions.
- Email: mandecent.gupta@gmail.com
- LinkedIn: linkedin.com/in/manish-kumar-0160837
If you like this repo, please consider ⭐ starring it and connecting with me on LinkedIn!