Hi! I'm Vinay Varshigan, a Master's student in Computer Science at Northeastern University with a perfect 4.0 GPA, currently serving as a Teaching Assistant for Algorithms (CS5800). I'm passionate about Data Science, Machine Learning, and Data Engineering, with hands-on experience in building scalable data pipelines, ML models, and cloud infrastructure.
🔭 Currently working on: Advanced ML models, NLP projects, and scalable data engineering solutions.
👯 Looking to collaborate on: Innovative projects in AI/ML, data engineering, and cloud-based analytics.
🤝 Looking for help with: Exploring cutting-edge deep learning architectures and optimizing large-scale data systems.
🌱 Currently learning: Advanced NLP techniques, Transformer architectures, and cloud-native data engineering.
💬 Ask me about: Machine Learning pipelines, ETL workflows, AWS deployment, or Power BI dashboards.
⚡ Fun fact: I automated data validation workflows so well that I saved 60% of the processing time. Now I automate everything, even my coffee breaks!
An intelligent smartphone recommendation engine built with Java, MVC architecture, and hybrid sentiment analysis (VADER + BERT). The system filters phones based on budget, usage type, and OS preference, while analyzing real customer reviews to provide confidence scores and personalized add-on recommendations.
Tech Stack: Java, MVC Pattern, Strategy Pattern, NLP, VADER, BERT, Spark Java API, Apache Commons CSV
Key Features: Smart filtering, sentiment analysis (89%+ accuracy), personalized add-ons, batch navigation, responsive web UI
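The filter-then-score flow described above can be sketched roughly as follows. The project itself is written in Java; this sketch uses Python for brevity. The `Phone` fields, the 0.4/0.6 weighting, and the function names are illustrative assumptions, not the project's actual code, and the two sentiment scores are assumed to be precomputed in the range [-1, 1].

```python
from dataclasses import dataclass


@dataclass
class Phone:
    name: str
    price: float
    os: str
    usage_tags: frozenset  # e.g. frozenset({"gaming", "camera"})
    vader_score: float     # assumed precomputed lexicon score in [-1, 1]
    bert_score: float      # assumed precomputed model score in [-1, 1]


def hybrid_confidence(phone, vader_weight=0.4):
    """Blend the two sentiment scores and rescale to a 0-100 confidence."""
    blended = vader_weight * phone.vader_score + (1 - vader_weight) * phone.bert_score
    return round((blended + 1) / 2 * 100, 1)


def recommend(phones, budget, usage, os_pref):
    """Filter by budget, usage type, and OS, then rank by confidence."""
    matches = [
        p for p in phones
        if p.price <= budget and usage in p.usage_tags and p.os == os_pref
    ]
    return sorted(matches, key=hybrid_confidence, reverse=True)


# Hypothetical catalog used only to exercise the sketch.
catalog = [
    Phone("Alpha", 499.0, "Android", frozenset({"gaming"}), 0.6, 0.8),
    Phone("Beta", 450.0, "Android", frozenset({"gaming", "camera"}), 0.2, 0.4),
    Phone("Gamma", 899.0, "iOS", frozenset({"camera"}), 0.9, 0.9),
]
ranked = recommend(catalog, budget=500, usage="gaming", os_pref="Android")
print([(p.name, hybrid_confidence(p)) for p in ranked])
```

Weighting the model score above the lexicon score is one possible design choice; the right balance would depend on how each scorer performs on the review data.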
A hybrid multi-stage log classification pipeline that combines regex, BERT embeddings, and semantic clustering to automatically categorize system logs from various sources. The pipeline achieved 99% accuracy using Logistic Regression on SentenceTransformer embeddings.
Tech Stack: Python, BERT, SentenceTransformers, Scikit-learn, Regex, DBSCAN, Logistic Regression
Categories (9 classes in total), including: User Actions, HTTP Status, Security Alerts, Errors, Resource Usage, System Notifications
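One plausible arrangement of the multi-stage idea is to try cheap regex rules first and fall back to a learned classifier only when no rule matches. The sketch below assumes that ordering. The patterns, the confidence threshold, and the `embed_and_predict` callable (standing in for SentenceTransformer embeddings plus Logistic Regression) are hypothetical, not the project's actual code.

```python
import re

# Stage 1: hand-written rules for logs with a fixed shape.
# These patterns are illustrative only.
REGEX_RULES = [
    (re.compile(r"user \S+ logged (in|out)", re.IGNORECASE), "User Actions"),
    (re.compile(r"\bstatus(?: code)?[:= ]+\d{3}\b", re.IGNORECASE), "HTTP Status"),
    (re.compile(r"\b(cpu|memory|disk) usage\b", re.IGNORECASE), "Resource Usage"),
]


def classify_with_regex(message):
    """Return the label of the first matching rule, or None."""
    for pattern, label in REGEX_RULES:
        if pattern.search(message):
            return label
    return None


def classify_log(message, embed_and_predict, min_confidence=0.5):
    """Try regex first, then fall back to the learned classifier.

    `embed_and_predict` is a hypothetical callable that returns a
    (label, probability) pair for a log message.
    """
    label = classify_with_regex(message)
    if label is not None:
        return label, "regex"
    label, probability = embed_and_predict(message)
    if probability >= min_confidence:
        return label, "model"
    return "Unclassified", "none"


def fake_model(message):
    # Stand-in so the sketch runs without any ML dependencies.
    if "denied" in message.lower():
        return "Security Alerts", 0.93
    return "Errors", 0.30


for line in [
    "User alice logged in",
    "GET /api returned status: 404",
    "Access denied for IP 10.0.0.7",
    "something odd happened",
]:
    print(line, "->", classify_log(line, fake_model))
```

Returning the stage name alongside the label makes it easy to measure how much traffic each stage handles.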
Real-time wildlife detection pipeline using ResNet50 and Transformer models integrated with IoT sensors for tribal safety in remote regions. The system uses RF/mesh networks to deliver alerts even without internet connectivity, classifying animals and assigning danger levels.
Tech Stack: Python, ResNet50, Transformers, TensorFlow, PyTorch, OpenCV, ESP32-CAM, IoT Sensors, RF Modules
Key Features: Carnivorous vs. non-carnivorous classification, danger level prediction, offline alerts, SOS integration
Scalable text classification system for e-commerce product data using Facebook's FastText library. Leverages word and subword embeddings for robust handling of misspelled and out-of-vocabulary words, achieving strong performance with minimal training time.
Tech Stack: Python, FastText, NLP, Pandas, NumPy, Subword Embeddings
Key Features: Fast training, robust to typos, multi-class classification, production-ready scalability
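To illustrate why subword embeddings tolerate typos, the sketch below enumerates character n-grams the way fastText conceptually does, wrapping each word in `<` and `>` boundary markers. It is a simplified illustration, not the library's implementation: real fastText hashes n-grams into a fixed number of buckets, and the `min_n`/`max_n` values here are assumptions rather than the project's settings.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in < and > markers."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.add(wrapped[start:start + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard similarity between the n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(sorted(char_ngrams("phone", 3, 3)))
print("typo overlap:", ngram_overlap("headphones", "headphnes"))
print("unrelated overlap:", ngram_overlap("headphones", "keyboard"))
```

A misspelled word still shares many n-grams with its correct form, so its vector (built from those n-grams) stays close to the original, while an unrelated word shares few or none.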
Data Science & Machine Learning:
Data Analytics & Visualization:
- AWS Cloud Essentials - Amazon Web Services
- Introduction to Internet of Things - NPTEL IIT
- Foundations of Software Engineering and Data Management - Northeastern University
- Conversational Intelligence Analytics Engineer: Building conversational AI analytics solutions to enhance workplace intelligence and user engagement.
- AI Engineer, Data for Social Good Club - Northeastern University: Developing AI-driven solutions for social impact projects, applying machine learning to address real-world community challenges.
- Teaching Assistant (Algorithms - CS5800), Northeastern University: Supporting graduate-level algorithms course, helping students master advanced computational problem-solving techniques.
- Data Scientist Intern, Besant Technologies: Automated data validation pipelines (reducing processing time by 60%), performed EDA and SQL optimization (reducing latency by 20%), and built interactive Power BI dashboards.
- Secretary and Creative Group Head, SVCE CyberHub: Led cross-functional teams for hackathons and technical events, driving 3x increase in social media engagement through data-driven creative strategies.
- Software Developer Intern, Bluebase Software Solutions: Designed scalable AWS cloud infrastructure (VPC, EC2, Auto Scaling), automated deployment workflows using Linux scripts, and improved team documentation processes.
- Published Routledge Book Chapter: Blockchain, smart contracts, and AI-based fraud detection.
- Best Paper Award: Heart disease prediction using Decision Tree, Naive Bayes, and Random Forest models (83% accuracy).
- Patent Under Review: AWS-deployed IoT water monitoring system.
