- Anomaly Detection
- Contributors
- Company Logo
- Description
- Technology Stack
- Badges
- Installation
- Initial Idea
- Initial PPT
- Final PPT
- How to Use the Project
- How to Contribute to the Project
- License
- Security
This project was created for Hack-O-Hire, a hackathon held by Barclays. The problem statement was to develop an Anomaly Detection Framework that helps identify potential issues and irregularities in data when compared with regular submissions.
- ETL - Apache Spark
- Python
- MongoDB
- Open-source encryption/decryption algorithm
- ML - Isolation Forest
- Tableau
- Apache Airflow
In the realm of financial transactions, ensuring accuracy and reliability is paramount. However, trade data's vast volume and complexity pose significant challenges in detecting anomalies, which could result in erroneous calculations and payments. This project addresses this critical issue by proposing an Anomaly Detection Framework designed to identify irregularities and potential issues in trade data submissions. Leveraging advanced technologies such as Apache Spark, Airflow, and Tableau, combined with robust data engineering practices and machine learning algorithms, our solution aims to enhance accuracy, reduce manual effort, and foster self-learning across banking functions.
This project aims to develop an Anomaly Detection Framework capable of efficiently identifying irregularities and potential issues within trade data submissions. By integrating cutting-edge technologies and best practices in data engineering and machine learning, our solution seeks to enhance the accuracy of payments, reduce manual effort, and promote self-learning across banking functions.
Our solution revolves around a robust system architecture designed to handle the challenges of detecting anomalies in large, heterogeneous datasets. Data is retrieved from diverse sources such as Yahoo Finance, Upstox, and Tradefeed, undergoes extraction, transformation, and loading (ETL), and is stored in a suitable database system. A scheduling pipeline optimizes overall processing time by splitting the data into segments and moving them through the pipeline as a queue, so that while one segment is in the ETL stage, another is already being loaded, and so on. Apache Spark forms the backbone of the system, processing data in parallel across a distributed cluster to ensure scalability and performance. Pre-processing steps, including feature engineering, improve the accuracy of anomaly detection. For anomaly detection itself, we use the ADTK library, which leverages historical trends in stock prices to identify abnormal patterns such as outlier data points, level spikes, and volatility shifts. The results are visualized in Tableau, giving users intuitive insights to investigate and address anomalies promptly. Airflow serves as the workflow management system, automating the stages of the data pipeline to ensure reliability and efficiency.
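The snippet below is a minimal sketch of the kind of detection ADTK supports (`pip install adtk`). The synthetic price series, the choice of `LevelShiftAD` and `VolatilityShiftAD`, and the parameter values are illustrative assumptions, not the project's exact configuration.

```python
import numpy as np
import pandas as pd
from adtk.data import validate_series
from adtk.detector import LevelShiftAD, VolatilityShiftAD

# Synthetic daily closing prices (stand-in for data pulled from Yahoo Finance, etc.)
rng = np.random.default_rng(42)
idx = pd.date_range("2024-01-01", periods=200, freq="D")
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, 200)), index=idx, name="close")
prices.iloc[120:] += 15                     # inject an abrupt level shift
prices.iloc[160:] += rng.normal(0, 5, 40)   # inject a volatility shift

series = validate_series(prices)  # ADTK expects a regularly indexed time series

# Flag abrupt jumps in the price level (window and c are illustrative values)
level_shift_ad = LevelShiftAD(c=6.0, side="both", window=5)
level_anomalies = level_shift_ad.fit_detect(series)

# Flag sudden increases in volatility
vol_shift_ad = VolatilityShiftAD(c=6.0, side="positive", window=30)
vol_anomalies = vol_shift_ad.fit_detect(series)

print("Level-shift anomalies:", int(level_anomalies.fillna(False).sum()))
print("Volatility-shift anomalies:", int(vol_anomalies.fillna(False).sum()))
```

In the full pipeline, the detected anomaly flags would be written back to the database and surfaced in the Tableau dashboards rather than printed.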
In conclusion, our Anomaly Detection Framework offers a robust and scalable solution to the challenges of identifying irregularities in trade data submissions. By leveraging advanced technologies and best practices in data engineering and machine learning, we empower organizations to enhance accuracy, reduce manual effort, and foster self-learning across banking functions. Our solution aims to safeguard data integrity and promote operational excellence in the financial domain through continuous innovation and refinement.
To install and run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/parthpetkar/Barcleys_Hack_O_Hire_Project.git
- Navigate to the project directory:
  cd Barcleys_Hack_O_Hire_Project
- Install dependencies: create and activate a virtual environment, then install the project's Python dependencies into it
  python -m venv Barcleys_Hack_O_Hire_Project
- Create MongoDB collections (a pymongo sketch follows these steps):
  - DB name - Hackathon
    - Collection_1 - Live-Stock-Data
    - Collection_2 - Stock-Data-Final
    - Collection_3 - Anomalies
- Set up Docker images (run each build from the directory containing the corresponding Dockerfile):
  docker build -t etlimage .
  docker build -t mlimage .
- Run Docker Compose:
  docker-compose up -d
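If you prefer to create the database and collections from step 4 programmatically, the sketch below uses pymongo. The connection URI and this helper script are assumptions for convenience, not files in the repository; adjust the URI to match your MongoDB deployment.

```python
from pymongo import MongoClient

# Assumes a MongoDB instance on the default localhost port.
client = MongoClient("mongodb://localhost:27017/")

db = client["Hackathon"]  # DB name from the installation steps

# MongoDB creates databases and collections lazily, so create them explicitly
# with the names the pipeline expects.
for name in ["Live-Stock-Data", "Stock-Data-Final", "Anomalies"]:
    if name not in db.list_collection_names():
        db.create_collection(name)

print("Collections:", db.list_collection_names())
```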
To use the project, follow these steps:
- Launch the application.
- Create a new invoice by filling in the required details.
- Save or print the generated invoice.
We welcome contributions from the community! To contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/contribution`).
- Make your changes and commit them (`git commit -am 'Add new feature'`).
- Push to the branch (`git push origin feature/contribution`).
- Create a new Pull Request.
This project is licensed under the MIT License.
🔒 If you discover any security-related issues, please email parth.petkar221@vit.edu instead of using the issue tracker.