
A curated collection of applied AI, machine learning, and data science projects — integrating predictive modeling, NLP, bioinformatics, and data engineering workflows.


man4ish/applied-ai-data-science-lab


🧠 Data Science & Machine Learning Portfolio

📁 Repository Structure

Each project is organized in its own folder with:

  • README.md – Project overview & key insights
  • Jupyter Notebook / Python Scripts – Code with step-by-step analysis
  • Data – Links to the datasets used, or sample data
  • Visuals / Output – Plots, model evaluations, dashboards (if any)

🌟 Featured Projects

| Project | Technique(s) Used | Description |
|---|---|---|
| Breast Cancer SVM Classification | Support Vector Machine (SVM) | Predict breast cancer diagnosis with an SVM classifier. |
| COVID-19 Dashboard | Data Visualization, API Integration | Interactive global tracker for COVID-19 trends, built with Plotly Dash on real-time data. |
| Credit Card Fraud Detection | Random Forest, XGBoost | Detect fraudulent transactions with Random Forest and XGBoost, with an emphasis on handling class imbalance. |
| Customer Segmentation | K-Means Clustering | Segment customers by purchasing behavior using unsupervised learning (K-Means). |
| Customer Churn Prediction | Logistic Regression, Random Forest | Predict customer churn with Logistic Regression and Random Forest classifiers. |
| Diabetes Prediction with Naive Bayes | Naive Bayes Classifier | Predict the likelihood of diabetes from health metrics using Naive Bayes. |
| Diabetes Prediction with Neural Networks | Neural Networks (Deep Learning) | Predict diabetes with deep learning models and compare them with traditional methods. |
| Gene Expression Analysis with PCA, t-SNE, UMAP | PCA, t-SNE, UMAP | Visualize gene expression data with dimensionality reduction techniques (PCA, t-SNE, UMAP). |
| House Price Prediction | Linear Regression, XGBoost | Predict house prices with linear regression and XGBoost. |
| Hugging Face Bioinformatics | NLP, Transformers (Hugging Face) | Apply Hugging Face transformer models to bioinformatics tasks such as protein sequence analysis. |
| Machine Learning Model Evaluation | Evaluation Metrics (e.g., ROC-AUC, Confusion Matrix) | Visualize and compare evaluation metrics for classification models. |
| Metagenomics Taxonomy Classification | Multi-Class Classification, Random Forest | Classify taxonomic groups from metagenomic sequence data with Random Forest. |
| ML Model from Scratch | Linear Regression, Logistic Regression | Implement linear and logistic regression from scratch in Python. |
| ML Project Template | Project Organization | Reusable template for organizing machine learning projects in a structured, reproducible way. |
| NLP Spam Classifier | Naive Bayes, TF-IDF | Classify spam emails using TF-IDF features and a Naive Bayes classifier. |
| Parkinson's Disease Prediction | Binary Classification, Random Forest | Predict Parkinson's disease from voice data with classifiers such as Random Forest. |
| Resume Screening using GPT & LangChain | NLP, GPT, LangChain | Automate resume screening, using GPT with LangChain to extract information. |
| Sepsis Detection from ICU Data | Time-Series Analysis, XGBoost | Detect early signs of sepsis in ICU patient time-series data with XGBoost. |
| SQL-based ML Feature Engineering | SQL, Feature Engineering | Generate ML features directly with SQL queries on structured databases. |
| Statistical Analysis Notebook | Statistical Inference, Hypothesis Testing | Explore core statistical concepts such as hypothesis testing and distributions with Python libraries. |
| Survival Analysis | Kaplan-Meier, Cox Regression | Analyze time-to-event data with Kaplan-Meier estimation and Cox regression. |
| Transfer Learning with ResNet | Transfer Learning, CNNs | Apply transfer learning with a pre-trained ResNet to image classification tasks. |
| XGBoost Binary Classification | XGBoost, Binary Classification | Perform binary classification with XGBoost, with a focus on hyperparameter tuning. |
| XGBoost Regression | XGBoost, Regression | Predict continuous values with XGBoost regression models, using feature selection and regularization. |
| MXNet CNN Image Classification | Convolutional Neural Networks (CNNs), MXNet | Build CNN-based image classification models with MXNet. |
| Real-Time Image Classification CNN | Convolutional Neural Networks (CNNs) | Build a real-time image classification system using CNNs. |
| COVID-19 Bayesian Forecasting | Bayesian Statistics, Time-Series Forecasting | Forecast COVID-19 trends, and the uncertainty around them, with Bayesian techniques. |
| Gene Expression Forecasting ARIMA SARIMA | ARIMA, SARIMA | Forecast gene expression levels with the ARIMA and SARIMA time-series models. |
| Protein Secondary Structure Prediction (Transformer) | Transformer Networks | Predict protein secondary structure with Transformer-based deep learning models. |
| Titanic Decision Tree | Decision Tree | Predict passenger survival on the Titanic with a decision tree classifier. |
| Titanic Fare Prediction | Linear Regression, Random Forest | Predict Titanic passenger fares from passenger features using regression models. |
| Titanic Survival Prediction Logistic Regression | Logistic Regression | Predict Titanic passenger survival with logistic regression on personal and travel information. |
| Customer Segmentation K-Means | K-Means Clustering | Segment customers by purchasing behavior with K-Means clustering. |
| LightGBM Text Classification | LightGBM, Text Classification | Build a text classification model with LightGBM for fast, efficient training on large datasets. |
| Random Forest ML Classification | Random Forest, Classification | Classify data into categories with Random Forest, using various feature engineering techniques. |
| Random Forest Regression | Random Forest, Regression | Predict continuous outputs with Random Forest regression models. |
| Deep Learning Protein Structure Prediction | Deep Learning, CNN, LSTM | Predict protein structure from sequence with deep learning models such as CNNs and LSTMs. |
| MLflow XGBoost | XGBoost, MLflow | Build XGBoost models and track the experiments with MLflow. |
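The Survival Analysis project listed above relies on the Kaplan-Meier estimator: S(t) is the product, over event times t_i <= t, of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of subjects still at risk just before t_i. As a rough, dependency-free sketch (not code taken from the project; the sample data is made up), the estimator can be written as:

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve.

    durations: follow-up time for each subject
    events:    1 if the event was observed at that time, 0 if censored
    Returns a list of (time, S(time)) pairs, one per distinct event time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # subjects exiting the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # product-limit step: S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # censored subjects tied with events at t still count as at risk at t
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: six subjects, two of them censored.
curve = kaplan_meier([5, 6, 6, 2, 4, 4], [1, 0, 1, 1, 1, 0])
```

Here the curve steps down at times 2, 4, 5 and 6 (to 5/6, 2/3, 4/9 and 2/9). A dedicated library, for example lifelines' `KaplanMeierFitter`, additionally reports confidence intervals.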

🔍 Machine Learning Techniques

Supervised Learning

  • Logistic Regression, Random Forest, Decision Tree, XGBoost, Ridge, Lasso
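Most of these supervised models are available ready-made in scikit-learn (for example `sklearn.linear_model.LogisticRegression`). To show what such a model does internally, here is a minimal from-scratch sketch, in the spirit of the ML Model from Scratch project but not taken from it: logistic regression trained by batch gradient descent on the mean log-loss, with an optional ridge-style L2 penalty. The function names, learning rate and epoch count are illustrative assumptions.

```python
import math


def sigmoid(z):
    # numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2.

    X: list of feature rows, y: list of 0/1 labels. Returns (weights, bias).
    The bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # d(log-loss)/d(score) = predicted probability - label
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one feature, label 1 when x >= 3.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
```

Setting `l2` above zero shrinks the weights toward zero, which is the same idea Ridge applies to linear regression.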

Unsupervised Learning

  • K-Means, DBSCAN, PCA
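K-Means, used in the Customer Segmentation projects, alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its points (Lloyd's algorithm). A minimal pure-Python sketch, assuming random initialization from the data points (scikit-learn's `KMeans` defaults to k-means++ seeding instead); the names, defaults and toy data are illustrative:

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels).

    Initial centroids are k points drawn at random from the data.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for every point
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable too
        labels = new_labels
        # update step: each centroid moves to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical 2-D data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Because the result depends on the starting centroids, practical implementations usually run several initializations and keep the one with the lowest within-cluster sum of squares.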

Natural Language Processing (NLP)

  • Tokenization, TF-IDF, Naive Bayes
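For a spam classifier like the NLP Spam Classifier project, these three pieces chain together: tokenize, weight terms by TF-IDF, then classify with Naive Bayes. In scikit-learn this is typically `TfidfVectorizer` followed by `MultinomialNB`. The sketch below spells out the same idea in plain Python, assuming a simple regex tokenizer and the smoothed IDF idf(t) = ln((1 + N) / (1 + df(t))) + 1 (scikit-learn's default with `smooth_idf=True`, here without its L2 row normalization). The toy corpus is made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # lowercase, then keep runs of letters/digits as tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(text, idf):
    """TF-IDF weights (raw term count * idf); out-of-vocabulary terms are dropped."""
    counts = Counter(tokenize(text))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with additive (Laplace) smoothing."""
    vocab = {t for v in vectors for t in v}
    class_counts = Counter(labels)
    weights = defaultdict(Counter)  # class -> term -> summed TF-IDF weight
    for v, label in zip(vectors, labels):
        weights[label].update(v)
    model = {}
    for c, n_c in class_counts.items():
        denom = sum(weights[c].values()) + alpha * len(vocab)
        log_lik = {t: math.log((weights[c][t] + alpha) / denom) for t in vocab}
        model[c] = (math.log(n_c / len(labels)), log_lik)
    return model


def predict_nb(model, vector):
    def log_posterior(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(w * log_lik[t] for t, w in vector.items() if t in log_lik)
    return max(model, key=log_posterior)


# Hypothetical four-message training corpus.
docs = ["Win a FREE prize now", "Free money, click now",
        "Meeting at noon tomorrow", "Lunch tomorrow with the team"]
labels = ["spam", "spam", "ham", "ham"]
idf = fit_idf(docs)
model = train_nb([vectorize(d, idf) for d in docs], labels)
```

Scores are kept as log-probabilities so that long messages do not underflow to zero.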

Model Evaluation

  • Confusion Matrix, ROC-AUC, F1-Score, Cross-validation
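All four of these can be computed directly from labels, predictions and scores. The plain-Python sketch below implements a binary confusion matrix, the F1-score, ROC-AUC (as the probability that a randomly chosen positive is scored above a randomly chosen negative, ties counting one half) and contiguous k-fold splits. The function names are assumptions for illustration; in practice `sklearn.metrics.confusion_matrix`, `f1_score`, `roc_auc_score` and `sklearn.model_selection.KFold` cover the same ground.

```python
def confusion_matrix(y_true, y_pred):
    """Binary confusion counts (tn, fp, fn, tp); the positive class is 1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (0.0 if both are zero or undefined)."""
    _, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds of n samples."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


# Example: hypothetical scores for two negatives followed by two positives.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here `auc` is 0.75: of the four positive/negative pairs, only one is mis-ordered (the positive scored 0.35 against the negative scored 0.4).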

Hyperparameter Tuning

  • GridSearchCV, RandomizedSearchCV
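GridSearchCV and RandomizedSearchCV differ mainly in which parameter combinations they try: every combination in a grid, versus a fixed number of randomly drawn ones, each scored by cross-validation. The search loop underneath can be sketched as below; `score_fn` stands in for a cross-validated score, and the `max_depth`/`learning_rate` grid and `toy_score` objective are hypothetical.

```python
import itertools
import random


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid; return (best_params, best_score).

    score_fn(params) -> float, higher is better (e.g. a mean cross-validated F1).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(score_fn, param_lists, n_iter=10, seed=0):
    """Try n_iter combinations, drawing each value uniformly from its candidate list."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_lists.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical objective standing in for a cross-validated score;
# it peaks at max_depth=4, learning_rate=0.1.
def toy_score(params):
    return -((params["max_depth"] - 4) ** 2) - (params["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best_grid = grid_search(toy_score, grid)
best_random = random_search(toy_score, grid, n_iter=5)
```

Grid search costs one evaluation per combination (9 here), which grows multiplicatively with each added parameter; random search caps the cost at `n_iter` evaluations at the risk of missing the best combination.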

🛠️ Tools & Technologies

Languages

  • Python, SQL
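Python and SQL meet in the SQL-based ML Feature Engineering project, where features are aggregated inside the database before modeling. A minimal sketch using Python's built-in `sqlite3` module; the `transactions` table, its columns and its rows are hypothetical, chosen only to show per-customer aggregates (count, total, average, maximum, active span in days) computed in SQL:

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE transactions (customer_id INTEGER, amount REAL, ts TEXT);
    INSERT INTO transactions VALUES
        (1, 20.0, '2024-01-03'), (1, 35.0, '2024-01-10'), (1, 5.0, '2024-02-01'),
        (2, 100.0, '2024-01-15'), (2, 80.0, '2024-03-20');
    """
)

# Per-customer features are aggregated inside the database; Python only
# receives one feature row per customer.
FEATURE_QUERY = """
SELECT
    customer_id,
    COUNT(*)                                AS n_transactions,
    SUM(amount)                             AS total_spend,
    AVG(amount)                             AS avg_amount,
    MAX(amount)                             AS max_amount,
    julianday(MAX(ts)) - julianday(MIN(ts)) AS active_days
FROM transactions
GROUP BY customer_id
ORDER BY customer_id
"""

cursor = conn.execute(FEATURE_QUERY)
columns = [col[0] for col in cursor.description]
features = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()
```

Pushing the aggregation into SQL keeps large raw tables in the database and hands the modeling code one compact row per entity.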

Libraries

  • pandas, numpy, scikit-learn, matplotlib, seaborn, XGBoost, NLTK

Visualization

  • Plotly, Dash, Tableau

Pipelines

  • Jupyter, Airflow, WDL, Azure Data Factory

Others

  • Git, VS Code, GitHub Actions

👨‍💻 About Me

I'm Manish Kumar, a data professional with 16+ years of experience in bioinformatics software development and data engineering. I'm now applying that experience to data science and machine learning more broadly, with a focus on solving real-world problems with data.


📬 Contact

If you like this repo, please consider ⭐ starring it and connecting with me on LinkedIn!
