- Anomaly Detection
- Contributors
- Company Logo
- Description
- Technology Stack
- Badges
- Installation
- Initial Idea
- Initial PPT
- Final PPT
- How to Use the Project
- How to Contribute to the Project
- License
- Security
This project was created for Hack-O-Hire, a hackathon held by Barclays. The problem statement was to develop an Anomaly Detection Framework that helps identify potential issues and irregularities in data when compared with regular submissions.
- ETL - Apache Spark
- Python
- MongoDB
- Open-source encryption/decryption algorithm
- ML - Isolation Forest
- Tableau
- Apache Airflow
In the realm of financial transactions, ensuring accuracy and reliability is paramount. However, trade data's vast volume and complexity pose significant challenges in detecting anomalies, which could result in erroneous calculations and payments. This project addresses this critical issue by proposing an Anomaly Detection Framework designed to identify irregularities and potential issues in trade data submissions. Leveraging advanced technologies such as Apache Spark, Airflow, and Tableau, combined with robust data engineering practices and machine learning algorithms, our solution aims to enhance accuracy, reduce manual effort, and foster self-learning across banking functions.
This project aims to develop an Anomaly Detection Framework capable of efficiently identifying irregularities and potential issues within trade data submissions. By integrating cutting-edge technologies and best practices in data engineering and machine learning, our solution seeks to enhance the accuracy of payments, reduce manual effort, and promote self-learning across banking functions.
Our solution revolves around a robust system architecture designed to handle the challenges of detecting anomalies in large, heterogeneous datasets. Data is retrieved from diverse sources such as Yahoo Finance, Upstox, and Tradefeed, undergoes extraction, transformation, and loading (ETL), and is stored in a suitable database system. A scheduling pipeline optimizes overall processing time by splitting the data into segments and moving them through the pipeline as a queue, so that while one segment is in the ETL stage, another is already being loaded, and so on. Apache Spark forms the backbone of the system, processing data in parallel across a distributed cluster to ensure scalability and performance. Pre-processing steps, including feature engineering, improve the accuracy of anomaly detection. For anomaly detection itself, we use the ADTK library, which leverages historical trends in stock prices to identify abnormal patterns such as outlier data points, level spikes, and volatility shifts. The results are visualized in Tableau, giving users intuitive insights to investigate and address anomalies promptly. Airflow serves as the workflow management system, automating the stages of the data pipeline to ensure reliability and efficiency.
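The snippet below is a minimal sketch of the kind of detection ADTK supports (`pip install adtk`). The synthetic price series, the choice of `LevelShiftAD` and `VolatilityShiftAD`, and the parameter values are illustrative assumptions, not the project's exact configuration.

```python
import numpy as np
import pandas as pd
from adtk.data import validate_series
from adtk.detector import LevelShiftAD, VolatilityShiftAD

# Synthetic daily closing prices (stand-in for data pulled from Yahoo Finance, etc.)
rng = np.random.default_rng(42)
idx = pd.date_range("2024-01-01", periods=200, freq="D")
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, 200)), index=idx, name="close")
prices.iloc[120:] += 15                     # inject an abrupt level shift
prices.iloc[160:] += rng.normal(0, 5, 40)   # inject a volatility shift

series = validate_series(prices)  # ADTK expects a regularly indexed time series

# Flag abrupt jumps in the price level (window and c are illustrative values)
level_shift_ad = LevelShiftAD(c=6.0, side="both", window=5)
level_anomalies = level_shift_ad.fit_detect(series)

# Flag sudden increases in volatility
vol_shift_ad = VolatilityShiftAD(c=6.0, side="positive", window=30)
vol_anomalies = vol_shift_ad.fit_detect(series)

print("Level-shift anomalies:", int(level_anomalies.fillna(False).sum()))
print("Volatility-shift anomalies:", int(vol_anomalies.fillna(False).sum()))
```

In the full pipeline, the detected anomaly flags would be written back to the database and surfaced in the Tableau dashboards rather than printed.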
In conclusion, our Anomaly Detection Framework offers a robust and scalable solution to the challenges of identifying irregularities in trade data submissions. By leveraging advanced technologies and best practices in data engineering and machine learning, we empower organizations to enhance accuracy, reduce manual effort, and foster self-learning across banking functions. Our solution aims to safeguard data integrity and promote operational excellence in the financial domain through continuous innovation and refinement.
To install and run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/parthpetkar/Barcleys_Hack_O_Hire_Project.git
- Navigate to the project directory:
  cd Barcleys_Hack_O_Hire_Project
- Install dependencies: create and activate a virtual environment, then install the project's Python dependencies into it
  python -m venv Barcleys_Hack_O_Hire_Project
- Create MongoDB collections (a pymongo sketch follows these steps):
  - DB name - Hackathon
    - Collection_1 - Live-Stock-Data
    - Collection_2 - Stock-Data-Final
    - Collection_3 - Anomalies
- Set up Docker images (run each build from the directory containing the corresponding Dockerfile):
  docker build -t etlimage .
  docker build -t mlimage .
- Run Docker Compose:
  docker-compose up -d
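If you prefer to create the database and collections from step 4 programmatically, the sketch below uses pymongo. The connection URI and this helper script are assumptions for convenience, not files in the repository; adjust the URI to match your MongoDB deployment.

```python
from pymongo import MongoClient

# Assumes a MongoDB instance on the default localhost port.
client = MongoClient("mongodb://localhost:27017/")

db = client["Hackathon"]  # DB name from the installation steps

# MongoDB creates databases and collections lazily, so create them explicitly
# with the names the pipeline expects.
for name in ["Live-Stock-Data", "Stock-Data-Final", "Anomalies"]:
    if name not in db.list_collection_names():
        db.create_collection(name)

print("Collections:", db.list_collection_names())
```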
To use the project, follow these steps:
- Launch the application.
- Create a new invoice by filling in the required details.
- Save or print the generated invoice.
We welcome contributions from the community! To contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/contribution`).
- Make your changes and commit them (`git commit -am 'Add new feature'`).
- Push to the branch (`git push origin feature/contribution`).
- Create a new Pull Request.
This project is licensed under the MIT License.
🔒 If you discover any security-related issues, please email parth.petkar221@vit.edu instead of using the issue tracker.