Passionate Data Engineer & Machine Learning Engineer with a Master's in Analytics from Northeastern. Proficient in Python, Java, and ML. Led impactful teams at PowerSchool. Certified problem solver.
Developed with the software and tools below.
Some repositories are dedicated to Data Analytics projects in R, and others focus on Data Science projects in Python. Additional repositories use frameworks such as Spring and Hibernate, along with testing tools, to build a well-organized Java web project.
IPEDS LLM
Location | Summary |
---|---|
https://github.com/enggabhishek/ipedsllm | Improved Text-to-SQL generation with a RAG pipeline built on Transformer-based large language models (LLMs). Developed efficient and accurate Text-to-SQL infrastructure, verified through rigorous testing. Used LangChain and LlamaIndex to create lightweight, scalable LLM applications for instant information retrieval and academic support. |
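
The retrieval step of a RAG-style Text-to-SQL prompt can be sketched roughly as follows. This is only an illustration, not the repository's code: the table names, columns, and the `retrieve` and `build_prompt` helpers are hypothetical, and simple token overlap stands in for the embedding-based retrieval that LangChain or LlamaIndex would normally provide.

```python
import re

# Hypothetical schema snippets; the real IPEDS tables and columns may differ.
SCHEMA_DOCS = {
    "institutions": "institutions(unit_id, name, state, sector): one row per college or university",
    "enrollment": "enrollment(unit_id, year, total_students): yearly student enrollment counts",
    "tuition": "tuition(unit_id, year, in_state, out_of_state): published tuition and fees",
}


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return the names of the k schema snippets sharing the most tokens with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(
        docs,
        key=lambda name: len(q_tokens & tokenize(docs[name])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Assemble an LLM prompt that contains only the retrieved schema context."""
    context = "\n".join(docs[name] for name in retrieve(question, docs, k))
    return (
        "Given the following tables:\n"
        f"{context}\n"
        f"Write a SQL query that answers: {question}\n"
    )


prompt = build_prompt("What was the in-state tuition by year?", SCHEMA_DOCS, k=1)
print(prompt)
```

Passing only the retrieved schema snippets keeps the prompt short, which is one reason retrieval is commonly paired with Text-to-SQL on databases with many tables.
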
Out-of-Pattern-Detection
Location | Summary |
---|---|
https://github.com/enggabhishek/Out-of-Pattern-Detection | Developed and deployed predictive models to detect anomalies in 16 GB of HTTP request log data, enhancing Docdigitizer's cybersecurity. Tasks included setting up Azure Data Lake Storage Gen2 and Managed Identities in Azure, and configuring an Apache Kafka client, Elasticsearch Cloud for OLAP, and an Apache Airflow DAG on the Astronomer platform. The work also covered data extraction, cleaning, feature engineering, and streaming to Elasticsearch Cloud. Kibana dashboards were set up, and Azure Function Apps and Data Factory pipelines were created to trigger the Airflow DAGs. A GitHub repository and GitHub Actions were set up for CI/CD deployment. The models achieved 93% accuracy in load factor estimation and anomaly detection. |
Message-Distribution-Analysis
Location | Summary |
---|---|
https://github.com/enggabhishek/Message-Distribution-Analysis | Over the past two years, a medium-sized retail company has skillfully engaged customers through diverse communication channels, including emails, web push notifications, mobile alerts, and SMS. The project began with storing campaign and message data in Azure Data Lake, which was then extracted and transferred to Snowflake for data warehousing. The data was merged, filtered, and consolidated into a "message_extended" table. Using Power BI, the company visualized this data to gain insights. For predictive analysis, the data was cleaned, transformed, and balanced with SMOTE. Multiple machine learning models were evaluated, with Gradient Boosting performing the best in predicting customer engagement. This end-to-end data pipeline enabled data-driven decisions and enhanced customer interaction strategies. |
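
The SMOTE balancing step mentioned above can be illustrated with a small standard-library sketch. It is a simplified version of the idea (interpolating between a minority sample and one of its nearest minority neighbors), not the project's code, which more likely used a library implementation. The toy data, the `smote` helper, and its parameters are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding the point itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy data: 4 minority points versus 10 majority points (values are illustrative).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 10

new_points = smote(minority, n_synthetic=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies instead of duplicating existing rows.
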
NYC-Taxi-Trip-Data-Analysis
Location | Summary |
---|---|
https://github.com/enggabhishek/NYC-Taxi-Trip-Data-Analysis | Large taxi trip datasets were analyzed with Python, Hadoop, PySpark, and Tableau. Interactive dashboards revealed high-tipping zones, fee-generating areas, and patterns relating surcharges to trip duration. Taxi fares were forecast with predictive models that account for trip characteristics and temporal features. The results support informed decisions about optimizing taxi services and evaluating market trends. |
R Scripts Overview
Location | Summary |
---|---|
https://github.com/enggabhishek/Analytics | Projects cover Chi-Square tests and ANOVA, Linear Regression with fish and housing datasets, Hypothesis Testing on education/sleep data, and Regularization Techniques applied to Netflix data. |
Boston Housing
Location | Summary |
---|---|
https://github.com/enggabhishek/Boston-Housing | This project analyzes Boston's real estate market using the Boston Assessment 2021 dataset (177,091 records, 63 columns). It employs various machine learning models to predict housing prices, with LightGBM achieving 80.87% accuracy. Key features were selected using VIF to reduce multicollinearity. The analysis aims to provide insights into factors influencing Boston's housing market. |
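
VIF-based feature selection can be sketched as below. The variance inflation factor of feature j is 1 / (1 - R²_j), which equals the j-th diagonal entry of the inverse correlation matrix. The feature names, the toy values, the threshold of 10 (a common rule of thumb), and the iterative drop strategy are all assumptions for illustration. The project's actual procedure is not described in the summary.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length feature columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    return [
        [
            sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
            for cj, nj in zip(centered, norms)
        ]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per feature: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def drop_collinear(features, threshold=10.0):
    """Repeatedly drop the feature with the highest VIF until none exceeds the threshold."""
    kept = dict(features)
    while len(kept) > 1:
        scores = dict(zip(kept, vif(list(kept.values()))))
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


# Hypothetical features; living_area and rooms are nearly collinear by construction.
features = {
    "living_area": [1200, 1500, 1700, 2100, 2500, 3000],
    "rooms": [5, 6, 7, 8, 10, 12],
    "year_built": [1990, 1925, 2005, 1960, 1980, 1940],
}

print(dict(zip(features, vif(list(features.values())))))
print(drop_collinear(features))
```

Dropping one feature at a time and recomputing matters here: removing every feature above the threshold in a single pass would discard both members of a collinear pair, even though removing either one resolves the problem.
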
VirtualClassroom
Location | Summary |
---|---|
https://github.com/enggabhishek/VirtualClassroom | The Virtual Classroom project is built with Java, J2EE, HTML, JSP, MySQL, Hibernate, and Spring. It provides a flexible and accessible learning environment that supports dynamic student-teacher interaction. |