mohitkatta01/mohitkatta01.github.io

Applied Data Analytics Graduate

Technical Skills: Python, R, SQL, Bash Scripting, AWS, GCP, Apache Spark, Apache Kafka, Power BI, Linux Programming

Education

  • M.S., Applied Data Analytics | Boston University (Jan 2024)
  • B.S. (Hons.), Computer Science | Heriot-Watt University (May 2022)

Work Experience

Graduate Data Science Research Assistant (Dec 2022 - Present)
Boston University Henry M. Goldman School of Dental Medicine

  • Examined 40,000+ tweets from Americans about vaccines and fluoride to identify trends in beliefs.
  • Reduced manual sentiment labeling of tweets by 40% by building an ML pipeline that preprocesses the tweets and fine-tunes a transformer model to assign sentiments automatically.
  • Fine-tuned an existing transformer model to predict the sentiment of more than 40,000 tweets with an accuracy of 85%.
  • Developed correlation matrices and bar plots to assess trends over 3 time periods: pre-pandemic, during the pandemic, and post-pandemic.
  • Skills: Python, Machine Learning, NLP, Predictive Modelling, Research Skills

Information Technology Help Desk Technician (Sept 2022 - Present)
Boston University Metropolitan College

  • Resolved 200+ tickets within a span of 1 year with over 90% client satisfaction rate.
  • Troubleshot hardware and software issues involving Power BI, VMware, and Citrix Virtual Labs via the ServiceNow ticketing system, phone calls, and the Bomgar remote desktop assistant.
  • Trained a team of 4 new hires to assist clients with various troubleshooting tasks and to handle clients face-to-face.
  • Skills: Team leading, Attention to detail, Customer-facing role, Management

Data Science Intern (Jun 2022 - Aug 2022)
Apparel Group

  • Optimized inventory for the Charles & Keith brand by developing a sales prediction algorithm, reducing unused inventory by an average of 15% per store.
  • Analyzed diverse data factors, such as customer demographics, store locations, seasonal trends, store sizes, and fashion cycles, to enhance the algorithm's performance.
  • Deployed the sales prediction model, which had an accuracy of 75% at the time of deployment, to provide inventory insights and optimization recommendations for the brand.
  • Skills: Time Series Forecasting, Qualitative Analysis, Natural Language Processing (NLP), MicroStrategy, Microsoft Excel, Power BI

Undergraduate Teaching Assistant (Sep 2021 - Jun 2022)
Heriot-Watt University

  • Provided teaching and support to students in introductory computer science courses, including Java, Python and R.
  • Graded student assignments and exams, and provided feedback to help students improve their learning.
  • Led weekly discussion sections and helped students understand complex concepts and solve problems.
  • Skills: Management, Research Skills, Git, Problem Solving, Python

Data Consultant (Nov 2020 - May 2022)
COGOS Technologies

  • Analyzed data and created a platform for integrating operations post-acquisition at Cogos.
  • Led machine learning projects for route and capacity optimization after the acquisition.
  • Demonstrated expertise in data-driven decision-making and machine learning.
  • Created a data warehouse and platform pre-acquisition.
  • Skills: Project Management, MS Excel, GitHub, Open Source Software

Projects

Generating customer service responses using Hugging Face LLMs

  • Automated responses to customer support messages, with the goal of improving customer satisfaction and reducing the time spent on customer support.
  • Designed a framework to connect a front-end chat system to a Hugging Face LLM using Streamlit.
  • Fine-tuned an LLM on a large dataset (15 GB) and converted its responses into audio files using Amazon Polly.
  • Scaled the system with Apache Kafka, AWS S3, and Redis for faster access and lower latency.
  • Fine-tuned the model on Google's TPUs and hosted the rest of the framework on AWS, GitHub, and Redis Cloud.

Apache Kafka - Real-time Hate Speech analysis on Discord servers

  • Designed a framework to listen to messages on subscribed Discord servers and analyze the type of hate speech.
  • Scaled the system using Apache Kafka and Streamlit to accommodate multiple Discord servers without changing any backend code.
  • Hosted the code on GitHub and Streamlit for ease of access and usage.

Trends in Americans' Beliefs about Fluoride from Twitter

  • Conducted an in-depth analysis of public sentiment regarding water fluoridation, extracting 80,000 relevant tweets with a web scraping tool run on a DigitalOcean VPC.
  • Collected a subset of 1,000 tweets for manual labeling, later used to fine-tune a transformer model (RoBERTa).
  • Created an ML pipeline to preprocess and normalize the data and remove irrelevant tweets before fine-tuning the transformer model.
  • Analyzed the sentiment of 40,000+ tweets with the fine-tuned RoBERTa model, enabling predictions on a large volume of unlabeled data.
  • Developed correlation matrices by clustering topics derived from unstructured data sources.
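
The preprocessing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function names `normalize_tweet` and `filter_tweets`, the specific cleaning rules (dropping URLs, @mentions, and retweets), and the `min_words` threshold are all hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and extra whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip().lower()


def filter_tweets(tweets: list[str], min_words: int = 3) -> list[str]:
    """Normalize tweets, then drop retweets, very short texts, and duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in tweets:
        if raw.startswith("RT "):  # assumed rule: retweets add no new text
            continue
        clean = normalize_tweet(raw)
        if len(clean.split()) < min_words or clean in seen:
            continue
        seen.add(clean)
        kept.append(clean)
    return kept


sample = [
    "Fluoride in water is SAFE https://example.com @cdc",
    "RT @someone: fluoride in water is safe",
    "fluoride in water is safe",
    "ok",
]
cleaned = filter_tweets(sample)
print(cleaned)
```

Only the first sample tweet survives: the retweet is skipped, the third tweet normalizes to the same text as the first, and the last one is too short.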

Size Profile Optimization

  • Forecast size-level demand at each store for any set of store-option pairs.
  • Used AI/ML methodologies to determine true demand and optimize inventory.
  • The forecast was created by consuming BI reports and SQL dumps.
  • A decision tree regressor was trained on historical sales data to understand trends and predict future sales.
  • A frontend was developed using Streamlit to host the deployed model.
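
To make the idea of a size profile concrete, here is a small share-based sketch. It is not the decision tree regressor used in the project; it is a simpler baseline that assumes a size profile is each size's share of historical units sold for a store-option pair. The record layout, the store and option names, and the functions `size_profile` and `split_forecast` are hypothetical.

```python
from collections import defaultdict


def size_profile(sales):
    """Map each (store, option) pair to {size: share of units sold}."""
    totals = defaultdict(lambda: defaultdict(int))
    for store, option, size, units in sales:
        totals[(store, option)][size] += units
    profiles = {}
    for key, by_size in totals.items():
        total_units = sum(by_size.values())
        if total_units == 0:
            continue  # no sales history, so no profile can be derived
        profiles[key] = {size: units / total_units for size, units in by_size.items()}
    return profiles


def split_forecast(total_forecast, profile):
    """Split a forecast total across sizes in proportion to the profile."""
    return {size: total_forecast * share for size, share in profile.items()}


# Illustrative records: (store, option, size, units sold)
history = [
    ("store_1", "sandal_A", "36", 10),
    ("store_1", "sandal_A", "37", 30),
    ("store_1", "sandal_A", "38", 60),
    ("store_2", "sandal_A", "37", 5),
    ("store_2", "sandal_A", "38", 5),
]
profiles = size_profile(history)
plan = split_forecast(200, profiles[("store_1", "sandal_A")])
print(plan)
```

A trained regressor would replace the fixed total of 200 with a predicted demand figure; the split across sizes works the same way.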

Stock price prediction using past stock prices and tweets

  • Collected more than 5 years' worth of stock prices for the Alphabet stock ticker. Conducted predictive analysis on historical stock prices and public sentiment data to project future stock prices.
  • Explored and evaluated the performance of 9 different machine learning models and three word embedding techniques, including TF-IDF and "Bag of Words".
  • Implemented a data pipeline for daily Twitter data processing, normalization, and sentiment analysis.
  • Identified "Bag of Words" as the optimal word embedding model and Support Vector Classifier (SVC) as the most effective classifier.
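
The two text representations named above can be illustrated with a few lines of standard-library Python. This is a textbook sketch, not the project's code: the helper names `bag_of_words` and `tf_idf`, the whitespace tokenizer, and the sample tweets are assumptions, and the unsmoothed `tf * log(N / df)` weighting differs slightly from the smoothed formulas that libraries such as scikit-learn apply by default.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * log(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    vectors = []
    for row in counts:
        total = sum(row) or 1  # guard against empty documents
        vectors.append(
            [(row[j] / total) * math.log(n_docs / df[j]) for j in range(len(vocab))]
        )
    return vocab, vectors


tweets = ["stock up today", "stock down today", "great earnings"]
vocab, bow = bag_of_words(tweets)
_, weighted = tf_idf(tweets)
print(vocab)
print(bow[0])
print(weighted[0])
```

Bag of Words keeps raw counts, while TF-IDF down-weights terms such as "stock" that appear in many documents and so carry less signal for a classifier.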

Ensemble Machine Learning Model for sentiment analysis

  • Developed an ensemble machine learning model by combining two deep learning models built with TensorFlow and OpenAI.
  • Used DigitalOcean VPCs to connect to multiple data centers worldwide, allowing terabytes of tweets to be extracted in parallel.
  • Built a data pipeline using the spaCy and NLTK libraries to preprocess and prepare the extracted data for model training.
  • Evaluated the model with accuracy, the Matthews correlation coefficient (MCC), and other ML metrics; the ensemble model reached 77% accuracy, compared with approximately 85% for the existing state-of-the-art model.
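
As a rough illustration of the evaluation described above, the sketch below combines two models' predicted probabilities and scores the result with accuracy and MCC. How the two models were actually combined is not stated here, so the averaging rule (soft voting), the 0.5 threshold, the function names, and the toy probabilities are all assumptions.

```python
import math


def soft_vote(probs_a, probs_b, threshold=0.5):
    """Average two models' positive-class probabilities and threshold them."""
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(probs_a, probs_b)]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy probabilities from two hypothetical models on four tweets.
probs_a = [0.9, 0.2, 0.6, 0.4]
probs_b = [0.7, 0.4, 0.3, 0.8]
y_true = [1, 0, 1, 1]

y_pred = soft_vote(probs_a, probs_b)
print(y_pred, accuracy(y_true, y_pred), round(mcc(y_true, y_pred), 3))
```

MCC is worth reporting alongside accuracy because it stays informative when the sentiment classes are imbalanced.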

Data Mining and Machine Learning - Portfolio

  • Implemented various data mining and machine learning techniques such as Naive Bayes, k-means clustering, hierarchical clustering, decision trees, and linear classifiers, all evaluated using a 10-fold cross-validation approach.
  • Employed a comprehensive set of evaluation metrics, including accuracy, F-score, ROC curve and AUC, precision, and recall, to assess the effectiveness of each model.
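
The evaluation setup above can be sketched in plain Python. This is an illustrative outline rather than the portfolio's code: `k_fold_indices` and `precision_recall_f1` are hypothetical helpers, and the folds are contiguous, so real data would normally be shuffled first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])

scores = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(scores)
```

In a full run, a model is trained on each `train_idx`, scored on the matching `test_idx`, and the ten scores are averaged.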

Volunteering & Student Clubs

HW Tech Club
Design Director (May 2021 - Jun 2022)

  • Managed a team of 5 students, providing guidance and mentorship on UI/UX design principles and tools.
  • Organized and executed a university-wide competition for the best UI/UX frontend website, promoting creativity and innovation among students.
  • Stayed up-to-date on the latest UI/UX trends and technologies, and shared knowledge with the team through regular presentations and discussions.
  • Fostered a positive and supportive team environment, encouraging collaboration and creativity.


Cyber Security Analyst (Oct 2020 - Jun 2022)

  • Developed and maintained a virus scanner application in Python that checks for potential scam websites via a Chrome extension.
  • Hosted and delivered educational YouTube talks to promote awareness of different types of wireless hacking and OSINT technologies.
  • Collaborated with other Tech Club members to organize and execute cybersecurity workshops and events.
  • Kept up-to-date with the latest cybersecurity trends and technologies, and shared this knowledge with the Tech Club community.
  • Provided cybersecurity support and guidance to Tech Club members and other students.

Creatives and Video Editor (Jul 2020 - May 2021)

  • Created and edited engaging videos to promote the Tech Club's events and initiatives.
  • Collaborated with other creatives to develop and produce video content that was both informative and entertaining.
  • Managed the Tech Club's social media accounts and used video to create a strong online presence.
  • Analyzed video performance data to identify areas for improvement.
  • Stayed up-to-date on the latest video editing trends and technologies.

The Uplift Foundation (Jul 2020 - Jun 2021)
Graphic Designer

  • Created brand guidelines for posting consistent content on social media accounts.
  • Spearheaded a team of 4 people to convert text into interactive and eye-catching content.

Blog

  1. Predicting stock prices - a sentiment analysis approach
  2. Apache Kafka - Real-time Hate Speech analysis on Discord servers
