shubhamgogri/README.md

Hi there!

I'm Shubham, a Data Scientist and Generative AI enthusiast with a Master's degree from the University of Surrey, UK. Welcome to my GitHub portfolio! I'm an AI researcher specializing in NLP, computer vision, and speech processing, with a particular focus on large language models (LLMs). I build solutions in Python, TensorFlow, and PyTorch, implementing state-of-the-art algorithms and models, and my MLOps experience covers deploying and managing machine learning workflows. Explore my projects across NLP, computer vision, and speech processing, and let's collaborate on turning ideas into impactful solutions! See my full resume here: Link

Jump to Projects

🚀 Skills & Expertise:

Python Java MySQL Android

Python-Libraries & Frameworks:

TensorFlow PyTorch Keras OpenCV Flask Pandas scikit-learn NumPy

Deployment Tools:

Docker Amazon AWS Microsoft Azure Git GitHub Hugging Face Gradio

Tools:

PyCharm Google Colab Android Studio Jupyter Postman Firebase Ubuntu Azure Data Studio Unity

💡 Projects:

The projects span multiple modalities and are grouped into NLP, computer vision, speech, and machine learning.

🖼️ Computer Vision

  • Advanced Sparse-View CT Denoising: Applied MIRNet and GAN-based image-to-image translation to correct sparse-view CT scans, running on the University’s HPC (Condor system).
  • Comparative Study: Trained Pix2Pix GANs with varied training methods and compared the image generation techniques through quantitative and qualitative analysis.
  • Publication Recognition: Abstract accepted at ICMLMI (International Conference of Machine Learning in Medical Imaging), London, 2023. Link
  • Currency Prediction App: Engineered a TensorFlow-based application that recognizes Indian currency (85%) with a pre-trained EfficientDet-Lite0 model and cloth patterns (76%) with Cloth Recognition models. Integrated the TF Lite models into an Android app.
  • API Integration & Publication: Integrated ML-Kit's Object Detection, Handwritten Text Recognition APIs, and Google’s TextToSpeech API. Research on the project published in Dickensian Journal.

Vehicle Re-Identification: (TensorFlow)

  • Fine-tuned pre-trained models such as EfficientNet, ResNet-50, and MobileNet for vehicle re-identification via transfer learning.
  • Conducted extensive data pre-processing and augmentation, including digital image warping and rotations, to enhance dataset quality.
  • Carried out hyperparameter tuning and experimental design.
  • Supervised Scene Classification: Trained a tailored ResNet-34 CNN on the "Places2 simp" dataset (40,000 128x128 images, 40 categories), achieving ≥45% validation accuracy and ≥75% top-5 accuracy.
  • Training and Validation Optimization: Tuned hyperparameters and validated the results with confusion matrices and top-5 scores on the test set.
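
Top-5 accuracy, as reported above, counts a prediction as correct when the true class is among the five highest-scoring classes. The sketch below shows one way that metric could be computed; the function name, scores, and labels are illustrative assumptions and are not taken from the project code.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical scores for 3 samples over 4 classes (illustrative values only).
example_scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.10, 0.30, 0.10],
    [0.30, 0.15, 0.05, 0.50],
]
example_labels = [1, 2, 0]

top1 = top_k_accuracy(example_scores, example_labels, k=1)
top2 = top_k_accuracy(example_scores, example_labels, k=2)
```

With a 40-class problem, a large gap between top-1 and top-5 accuracy usually means the model confuses visually similar categories rather than failing outright.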

📖 Natural Language Processing

  • Constructed a multi-class classifier prototype utilizing the GoEmotions dataset consisting of 197,847 labeled Reddit comments.
  • Ran four experiments covering preprocessing, n-gram analysis, and sentiment analysis with bidirectional LSTM models, with test accuracies ranging from 39.58% to 59.26%. A CNN with LSTM architecture reached 41.09% test accuracy after 20 epochs.
  • Deployed the prototype on HuggingFace, developed a web app using Gradio, and conducted extensive API testing, integrating a CI/CD pipeline for continuous deployment and delivery.

📄 PDF Retrieval using OpenAI LLM: (RAG)

  • Used an OpenAI large language model (LLM) and the LangChain library to build a PDF retrieval system.
  • Integrated the Pinecone vector database for document storage and retrieval, improving search efficiency and accuracy.
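
The retrieval step of a RAG system typically ranks stored document chunks by the similarity of their embeddings to the query embedding. The sketch below illustrates that idea with plain Python lists standing in for a vector database; it does not use the Pinecone, LangChain, or OpenAI APIs, and the chunk texts and three-dimensional vectors are made-up assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the texts of the top_k chunks most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Hypothetical (text, embedding) pairs; real embeddings would come from a model.
toy_index = [
    ("chunk about invoices", [0.9, 0.1, 0.0]),
    ("chunk about contracts", [0.1, 0.9, 0.0]),
    ("chunk about shipping", [0.0, 0.2, 0.9]),
]

results = retrieve([0.8, 0.2, 0.0], toy_index, top_k=2)
```

In a full pipeline the retrieved chunks would be passed to the language model as context for answering the question.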

💬 Q&A Chatbot based on LLaMA:

  • Fine-tuned the 7B-parameter LLaMA model with supervised learning for a Q&A chatbot.
  • Implemented PEFT (Parameter-Efficient Fine-Tuning) with a LoRA-based approach to speed up training, improve results, and reduce computational demands.
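
LoRA reduces the cost of fine-tuning by freezing a weight matrix W and training only a low-rank update, so the layer output becomes W x + (alpha / r) B (A x). The sketch below works through that arithmetic in plain Python; the layer size, rank, and toy matrices are illustrative assumptions and do not reflect the configuration used in this project or the PEFT library's API.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for full fine-tuning vs. a rank-`rank` LoRA update."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)  # B is d_out x rank, A is rank x d_in
    return full, lora


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the number of rows of A."""
    rank = len(A)
    base = matvec(W, x)               # frozen pre-trained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes: one 4096 x 4096 projection with rank 8.
full_params, lora_params = lora_param_counts(4096, 4096, 8)

# Tiny worked example with rank 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [2.0]]
output = lora_forward(W, A, B, [1.0, 2.0], alpha=1.0)
```

For the sizes assumed here, the LoRA update trains 65,536 parameters instead of 16,777,216, which is where the savings in memory and training time come from.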

Machine Learning & End-to-End Pipelines

  • ML Pipeline Development: Engineered machine learning pipelines encompassing feature engineering and hyperparameter tuning across multiple algorithms.
  • Azure Deployment: Deployed the pipeline on Microsoft Azure using a CI/CD workflow, with the Flask web application packaged as a Docker container.

👩 Sport Celebrity Classification:

  • Technologies: web scraping; OpenCV for image processing; NumPy, Pandas, and PyWavelets for data manipulation; and Matplotlib for visualizing data and model performance.
  • Model Development & Deployment: Experimented with various models and parameters using GridSearchCV, achieving accuracies of 78% (logistic regression), 76% (SVM), and 70% (random forest). Developed a Flask web app to showcase the classification model.

〽️ Stock Market Analysis (Time Series)

  • Applied time-series techniques to the stocks of top companies, i.e. Apple, Microsoft, and Amazon.
  • Used an LSTM-based network alongside other machine learning algorithms for this regression problem, with qualitative analysis of the results.
  • Processed the continuous dataset and applied regression-based machine learning models.
  • Carried out feature engineering, selection, and extraction on the dataset.
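
Framing price forecasting as a regression problem usually means turning the series into fixed-length input windows, each paired with the next value as the target. The sketch below shows one common way to do this; the window length and prices are made-up assumptions rather than values from the project.

```python
def make_windows(series, window):
    """Split a series into (input window, next value) pairs for regression."""
    if window <= 0 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # past `window` values
        targets.append(series[start + window])       # value to predict
    return inputs, targets


# Hypothetical closing prices (illustrative values only).
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
X, y = make_windows(prices, window=3)
```

The resulting pairs can feed either an LSTM, which consumes each window as a sequence, or a classical regressor, which treats each window as a flat feature vector.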

🗣️ Speech (Audio)

❤️ Heart Murmur Detection:

  • Heart Murmur Detection: Detected heart murmurs from real patient audio samples using Digital Signal Processing techniques via the Librosa library.
  • EDA and Machine Learning: Applied Exploratory Data Analysis (EDA) techniques, feature engineering, and selection methods on Mel-Spectrograms and other physiological data. Utilized various machine learning models such as SVM, Random Forest Classifier, and Naïve Bayes Classifier.

📱 Android Projects

  • Currency Prediction App: Engineered a TensorFlow-based application that recognizes Indian currency (85%) with a pre-trained EfficientDet-Lite0 model and cloth patterns (76%) with Cloth Recognition models. Integrated the TF Lite models into an Android app.
  • API Integration & Publication: Integrated ML-Kit's Object Detection, Handwritten Text Recognition APIs, and Google’s TextToSpeech API. Research on the project published in Dickensian Journal.
  • A marketplace application for customizing and shopping for fashion wear.
  • Integrates Firestore authentication, Firebase Database, and Storage.
  • Uses the Razorpay payment gateway API for digital payments in India.
  • Task Scheduler App: an Android application in which tasks can be scheduled and handled according to their priority.
  • Uses a Room database to store all tasks and their due dates.
  • In essence, the application is a to-do app that saves user input to the Room database.
  • The project focuses on the Room database, handling listener events across activities and fragments, enum classes, and the ViewModel class for transferring data across activities. It implements the basic CRUD operations (create, read, update, delete).

Data Analytics Project

  • Created a Power BI dashboard with analytics of current trends and an attendance report for hybrid work culture analysis.

📚 Certificates & Awards:

  • Master’s dissertation project nominated for the Electronic Engineering Industrial Advisory Board MSc Project Prize by the University of Surrey (results awaited).
  • "Excellence in AI Collaboration" Award: Recognized within SetSquare Surrey's Entrepreneurship programme (IKEEP and ITeK) for exceptional industry collaboration, effective completion of duties within the cohort, and excellent communication.
  • Runner-up in the GenAI hackathon for a solution addressing the use of Generative AI in assessments.

📫 Get in Touch:

LinkedIn Gmail

🔭 Always exploring new technologies and excited to collaborate on innovative projects!

Pinned

  1. Image_Correction_in_SparseView_CT (Jupyter Notebook)
  2. EmotionClassification (Jupyter Notebook): GoEmotions emotion classification with hyperparameter tuning, deployed on Hugging Face with a Gradio UI.
  3. mlproj (Jupyter Notebook)
  4. Assistive-Vission (Java): An application integrating various ML models.