Data Scientist | AI Researcher | M.S. in Computer Science (Data Science) - University of Cincinnati | LinkedIn
Updated: 11/11/2024
Welcome to my GitHub! I’m Sameer Jadhav, a data scientist with a passion for building scalable, data-driven solutions that make a real impact. Currently, I’m an AI Research Intern at Cincinnati Children’s Hospital Medical Center, where I am building cutting-edge AI tools to transform healthcare research and clinical insights.
My journey in data science and AI spans healthcare, consumer goods, and insurance. I have extensive experience developing knowledge graphs, building multi-agent systems, engineering deep learning models for image and signal processing, and optimizing predictive algorithms. My technical skills range from statistical modeling and natural language processing to deploying production-grade applications on cloud platforms.
- Knowledge Graphs and AI Systems for Healthcare
  At Cincinnati Children's, I am developing a knowledge graph-powered, multi-agent Retrieval-Augmented Generation (RAG) system. This system uses GenAI techniques like instruction fine-tuning and vector search, enabling researchers to derive insights faster and more accurately from scientific data.
- Advanced Image Analytics in Consumer Goods
  During my time with Procter & Gamble, I built image segmentation pipelines using deep learning models (ResNet, U-Net) on Azure Data Factory to streamline high-volume data processing, providing product designers with insights that enhance pre-market product quality and consumer satisfaction.
- Insurance Technology Solutions
  At Accenture, I worked as an Application Development Analyst, transforming the Duck Creek Claims process. I engineered automation processes that improved data integrity, reduced data transmission errors, and improved claim processing speed by 93%, earning me the Accenture Excellence Award.
- Healthpedia AI: Your Wellness Companion
  Healthpedia is an LLM-powered medical chatbot leveraging LangChain, Meta Llama 2, and Pinecone, designed to answer complex healthcare-related queries with precision. By integrating advanced NLP techniques with RetrievalQA, it provides meaningful, context-aware responses for medical inquiries.
- Sensor-Sentinel: ML-Powered Quality Assurance
  Sensor-Sentinel is a robust ML pipeline for quality assurance in wafer sensor manufacturing. It uses KMeans clustering for segmentation and classification models (Random Forest, XGBoost) to ensure accurate and scalable sensor quality predictions.
- Languages: Python, R, SQL
- Tools & Platforms: Microsoft Azure, Databricks, Neo4j, Power BI, Streamlit, MLflow, Docker, Flask
- Libraries & Frameworks: TensorFlow, PyTorch, scikit-learn, LangChain, Pinecone, SciPy, Pandas, NumPy
- Certifications: Microsoft Certified: Azure Data Scientist Associate (DP-100), Azure Data Fundamentals (DP-900), GenAI with Large Language Models
I’m passionate about applying AI to solve real-world problems, particularly in healthcare and consumer insights. My interests include:
- Large Language Models (LLMs) and Natural Language Processing
- Knowledge Graphs and Multi-Agent Systems
- Machine Learning Interpretability & Explainability
- Cloud & Scalable AI Deployments
- Image Segmentation & Computer Vision
Feel free to explore my repositories for more detailed insights into my work. I regularly update my GitHub with new projects, insights, and innovations in data science. Connect with me on LinkedIn or reach out via email to discuss potential collaborations or opportunities!