enggabhishek/README.md

Data Engineer | Data Scientist | Machine Learning Engineer | Data Analyst | AI Software Engineer

Passionate Data Engineer and Machine Learning Engineer with a Master's in Analytics from Northeastern. Proficient in Python, Java, and machine learning. Led impactful teams at PowerSchool. Certified problem solver.

Developed with the software and tools below.

Python, Java, R
pandas, NumPy, scikit-learn
TensorFlow, Qiskit
MySQL, PostgreSQL, SQL Server
Kafka, Jupyter, Eclipse, Airflow, Elastic, Azure


🔗 Quick Links


📍 Overview

These repositories include Data Analytics projects written in R and Data Science projects written in Python. Other repositories use frameworks such as Spring and Hibernate, along with testing tools, to build a well-organized Java web project.


🧩 Project_Modules

IPEDS LLM
Location: https://github.com/enggabhishek/ipedsllm
Summary: Improves Text-to-SQL generation by using Transformer-based large language models (LLMs) in a retrieval-augmented generation (RAG) pipeline. Built an efficient and accurate Text-to-SQL infrastructure, verified through rigorous testing. Used LangChain and LlamaIndex to create lightweight, scalable LLM applications for instant information retrieval and academic support.
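The retrieval step of a RAG-style Text-to-SQL pipeline can be illustrated with a minimal sketch. This is not the repository's code and it does not call LangChain or LlamaIndex: the table descriptions, the `retrieve_tables` and `build_prompt` helpers, and the word-overlap scoring (a stand-in for embedding search) are all assumptions made for illustration.

```python
import re

# Hypothetical schema descriptions; the real IPEDS tables are not listed in this README.
SCHEMA_DOCS = {
    "institutions": "institutions(unit_id, name, state, city): one row per college or university",
    "enrollment": "enrollment(unit_id, year, total_students): yearly student enrollment counts",
    "tuition": "tuition(unit_id, year, in_state_fee, out_state_fee): yearly tuition fees",
}


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_tables(question, docs, top_k=2):
    """Rank table descriptions by word overlap with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(
        docs.items(),
        key=lambda item: len(q_tokens & tokenize(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


def build_prompt(question, docs, top_k=2):
    """Assemble the text an LLM would receive: retrieved schema context plus the question."""
    context = "\n".join(docs[name] for name in retrieve_tables(question, docs, top_k))
    return f"Schema:\n{context}\n\nQuestion: {question}\nSQL:"


question = "What was the total enrollment of students in 2021?"
print(build_prompt(question, SCHEMA_DOCS, top_k=1))
```

Restricting the prompt to the retrieved tables keeps it short, which is the main reason to retrieve schema context before asking the model for SQL.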
Out-of-Pattern-Detection
Location: https://github.com/enggabhishek/Out-of-Pattern-Detection
Summary: Developed and deployed predictive models to detect anomalies in 16 GB of HTTP request log data, enhancing Docdigitizer's cybersecurity. Infrastructure work included building Azure Data Lake Storage Gen2, setting up Managed Identities in Azure, and configuring an Apache Kafka client, Elasticsearch Cloud for OLAP, and an Apache Airflow DAG on the Astronomer platform. The data was extracted, cleaned, feature-engineered, and streamed to Elasticsearch Cloud. Kibana dashboards were set up, and Azure Function Apps and Data Factory pipelines were created to trigger the Airflow DAGs. A GitHub repository with GitHub Actions handles CI/CD deployment. The models achieved 93% accuracy in load factor estimation and anomaly detection.
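As a rough illustration of anomaly detection on request logs, the sketch below flags time windows whose request count is far from the average. It uses a simple z-score rule, which is a swapped-in technique and not the project's predictive models; the `flag_anomalies` helper, the threshold of 3, and the sample counts are assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(counts, threshold=3.0):
    """Return indices whose count lies more than `threshold` sample
    standard deviations away from the mean of the series."""
    if len(counts) < 2:
        return []
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical requests-per-minute series with one burst at index 10.
requests_per_minute = [
    100, 102, 98, 101, 99, 103, 97, 100, 102, 98,
    500, 101, 99, 100, 102, 98, 101, 99, 100, 103,
]
print(flag_anomalies(requests_per_minute))
```

A rule like this is only a baseline: a single large burst inflates the standard deviation, so short series or several simultaneous bursts may need a more robust statistic.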
Message-Distribution-Analysis
Location: https://github.com/enggabhishek/Message-Distribution-Analysis
Summary: Over the past two years, a medium-sized retail company engaged customers through several communication channels: emails, web push notifications, mobile alerts, and SMS. Campaign and message data were stored in Azure Data Lake, then extracted and loaded into Snowflake for data warehousing. The data was merged, filtered, and consolidated into a "message_extended" table and visualized in Power BI. For predictive analysis, the data was cleaned, transformed, and balanced with SMOTE. Of the machine learning models evaluated, Gradient Boosting performed best at predicting customer engagement. This end-to-end data pipeline supported data-driven decisions and improved customer interaction strategies.
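The balancing step can be sketched as follows. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The README does not say which implementation the project used, so this pure-Python `smote` function, its parameters, and the sample points are assumptions for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority neighbors.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbors of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority class: 4 engaged customers described by two scaled features.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6)
print(len(minority) + len(new_points))  # minority class size after oversampling
```

Oversampling is normally applied to the training split only, so that synthetic points do not leak into the data used to evaluate a model such as Gradient Boosting.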
NYC-Taxi-Trip-Data-Analysis
Location: https://github.com/enggabhishek/NYC-Taxi-Trip-Data-Analysis
Summary: Large taxi-trip datasets were analyzed with Python, Hadoop, PySpark, and Tableau. Interactive dashboards revealed high-tipping zones, fee-generating areas, and patterns between surcharges and trip duration. Taxi fares were forecast with predictive models that account for trip characteristics and time-related features. The results support informed decisions, optimization of taxi services, and evaluation of market trends.
R Scripts Overview
Location: https://github.com/enggabhishek/Analytics
Summary: Projects cover Chi-Square and ANOVA tests, Linear Regression on fish and housing datasets, Hypothesis Testing on education and sleep data, and Regularization Techniques on Netflix data.
Boston Housing
Location: https://github.com/enggabhishek/Boston-Housing
Summary: This project analyzes Boston's real estate market using the Boston Assessment 2021 dataset (177,091 records, 63 columns). It applies several machine learning models to predict housing prices, with LightGBM achieving 80.87% accuracy. Key features were selected using the variance inflation factor (VIF) to reduce multicollinearity. The analysis aims to provide insight into the factors influencing Boston's housing market.
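The VIF screening can be sketched in a few lines. The VIF of a feature equals 1 / (1 - R²), where R² comes from regressing that feature on the others; equivalently, it is the matching diagonal entry of the inverse correlation matrix, which is what this sketch computes. The helper names, the feature names, and the values are assumptions, not the project's code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {feature: VIF}, read off the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical features; the two columns have a correlation of 0.8.
features = {"living_area": [1, 2, 3, 4, 5], "rooms": [2, 1, 4, 3, 5]}
print(vif(features))
```

Features whose VIF exceeds a chosen cutoff (5 and 10 are common rules of thumb) are candidates for removal. The cutoff used in the project is not stated here.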
VirtualClassroom
Location: https://github.com/enggabhishek/VirtualClassroom
Summary: The Virtual Classroom project is built with Java, J2EE, HTML, JSP, MySQL, Hibernate, and Spring. It provides a flexible and accessible learning environment that supports dynamic student-teacher interaction.



Pinned

  1. ipedsllm (Public)

    Forked from hemadataworksai/ipedsllm

    HTML

  2. Out-of-Pattern-Detection (Public)

    The Project report presents the development of an integrated dashboard and the application of predictive analytics techniques. The dashboard comprises visualizations that provide insights into requ…

    Jupyter Notebook

  3. Boston-Housing (Public)

    Jupyter Notebook

  4. Message-Distribution-Analysis (Public)

    Jupyter Notebook

  5. NYC-Taxi-Trip-Data-Analysis (Public)

    Jupyter Notebook

  6. Analytics (Public)

    Includes all the analytical projects written in R.

    R