Hi there!
Welcome to my Undergrad Project Repository. This is a storehouse of all the projects I carried out during my bachelor's years at VIT University (Vellore, TN, IND).
This repo should be useful to anyone (especially sophomores, juniors, and seniors) who has just started getting their hands dirty and wants to learn by doing.
NOTE : The source code for most of the projects is not available, because I realized too late that I should have saved it for future reference. Apologies for the inconvenience! However, you can still learn a lot from each project's documentation, which is concise and should help you understand the work done.
These projects will give you an idea of some key sub-fields of Information Technology, as well as the use of some important tools and platforms. Below is the list of projects in this repository, with a summary of each 🔽
Credit Card Fraud Detection System - A Comparison Study on various Machine Learning and Deep Learning Algorithms - Soft Computing🔻
This project aims to address the rising issue of credit card fraud using advanced soft computing and machine learning methods. As digital payments become more prevalent, so does the vulnerability to fraud, which calls for robust detection systems.
Key components of the project include :
- Objective : To compare the effectiveness of different machine learning and deep learning techniques in detecting credit card fraud using datasets (such as those from Kaggle).
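- Techniques Used : The study evaluates 7 machine learning and deep learning techniques, listed below together with the Confusion Matrix used to summarize their results🔻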
• Convolutional Neural Network (CNN)
• Decision Tree Algorithm
• K Nearest Neighbour (KNN)
• Support Vector Machine (SVM)
• Naïve Bayes Classifier
• Random Forest Algorithm
• Artificial Neural Network (ANN)
• Confusion Matrix
NOTE : Confusion Matrix is not an algorithm - it's just a technique for summarizing the performance of a classification algorithm.
- Evaluation Metrics : Performance is measured using metrics such as Accuracy, Precision, Recall (Sensitivity), and F1-score to assess the models' effectiveness in fraud detection.
- Methodology : The project involves training models on datasets, evaluating their performance using specified metrics, and comparing results with existing studies to determine the most efficient techniques.
- Significance : Given the significant increase in credit card fraud cases globally, this research is crucial for identifying the most reliable methods for fraud detection, thereby enhancing security measures in digital transactions.
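All four evaluation metrics listed above are derived from the counts in a confusion matrix. The following minimal sketch (the function name `classification_metrics` is hypothetical, not taken from the project) computes them from true/false positive and negative counts, and shows why Accuracy alone is misleading on a heavily imbalanced dataset such as one with 492 frauds among 284,807 transactions:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute Accuracy, Precision, Recall and F1 from confusion-matrix counts.

    Precision, Recall and F1 fall back to 0.0 when their denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# A trivial "model" that labels every transaction as valid (Class 0)
# on a dataset with 492 frauds out of 284,807 transactions.
frauds, total = 492, 284_807
baseline = classification_metrics(tp=0, fp=0, fn=frauds, tn=total - frauds)
print(f"accuracy={baseline['accuracy']:.2%} recall={baseline['recall']:.0%}")
```

This prints `accuracy=99.83% recall=0%`: a classifier that never flags fraud still looks highly "accurate", which is why Precision, Recall and F1 are reported alongside Accuracy.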
This project used the Credit Card Fraud Detection dataset sourced from Kaggle (a Google-owned platform). The dataset contains 284,807 transactions made across European countries, of which only 492 (0.17%) were identified as fraudulent, highlighting the severe imbalance between fraudulent and valid transactions. All features in the dataset are numerical, which makes it straightforward to analyze with various machine learning and deep learning techniques. The 'Amount' feature is the transaction amount and can be important for cost-sensitive learning, while the 'Class' feature is the response variable, taking the value 1 for fraud and 0 otherwise. The entire project was implemented in Python 3.9.4 in Jupyter Notebooks within VS Code, using an Anaconda virtual environment.
The columns V1 to V28 in the dataset hold values produced by a Principal Component Analysis (PCA) transformation. PCA is a dimensionality-reduction method rooted in linear algebra. It transforms the original variables into weighted linear combinations of those variables (the principal components), which capture the maximum variance while being mutually uncorrelated. The components are built from the eigenvectors and eigenvalues of the covariance matrix of the original data and are orthogonal to one another. An eigenvector of a linear transformation is a non-zero vector that changes only by a scalar factor when the transformation is applied; that scalar factor is the corresponding eigenvalue.
Among the models examined, the Artificial Neural Network (ANN) achieved the highest Accuracy (99.95%), while the Convolutional Neural Network (CNN) had the lowest (93.65%). SVM achieved the highest Precision (100.00%), and ANN excelled in Recall (99.98%) and F1 Score (99.97%). Overall, ANN proved to be the most effective method, showing superior performance in training, testing, and application. Future enhancements could focus on expanding the datasets to further improve model accuracy, potentially reducing false positives and significantly improving fraud detection.
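The PCA description above (centre the data, take the covariance matrix, keep the eigenvectors with the largest eigenvalues) can be sketched in a few lines of NumPy. This is an illustration of the technique on hypothetical random data, not how the Kaggle dataset's V1 to V28 columns were actually produced (that transformation was applied by the dataset's publishers before release):

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project X (rows = samples, columns = features) onto its top principal components."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)           # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]        # sort by explained variance, descending
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, eigvals[order]


rng = np.random.default_rng(0)
# Hypothetical data: 500 samples of 3 features, the first two strongly correlated
base = rng.normal(size=(500, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 1))])

scores, variances = pca(X, n_components=2)
print(scores.shape)  # (500, 2)
# Off-diagonal correlations are ~0: the principal components are uncorrelated
print(np.round(np.corrcoef(scores, rowvar=False), 6))
```

The variance of each projected column equals its eigenvalue, which is why sorting by eigenvalue keeps the directions that capture the most variance.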
Stock Market Prediction - Data Mining Techniques🔻
This project focuses on predicting Tata Steel stock prices using linear regression and decision tree regression, implemented in Python with libraries such as NumPy, Matplotlib, Pandas, and scikit-learn on Google Colab. The primary goal is to forecast stock prices from historical data, helping investors make informed decisions about buying or selling stocks despite the many factors that influence prices, such as a company's financial status and national policies. The project treats prediction as a regression task, highlighting the effectiveness of data mining techniques in producing accurate stock price forecasts. The methodology involves importing the necessary modules, reading and cleaning the dataset, visualizing the data, splitting the dataset, predicting future values, and evaluating the predictions using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Squared Error (MSE). The two primary algorithms were linear regression, which finds the best-fit line through the data points, and the decision tree regressor, which uses a flowchart-like structure of decisions. In the evaluation, linear regression had an RMSE of 9.3406, an MAE of 87.2485, and an MSE of 12061.9769, while the decision tree regressor had an RMSE of 9.3444, an MAE of 87.3190, and an MSE of 12078.8146. Linear regression was therefore slightly better, by a margin of 0.0038 in RMSE (about 0.04%). The project concludes that linear regression is marginally more accurate for this dataset, and the work gave the authors valuable insights into implementing and evaluating these algorithms for stock market prediction.
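A minimal sketch of the linear-regression branch and the three error metrics, assuming a hypothetical synthetic price series (not the Tata Steel data) and NumPy's least-squares `polyfit` in place of scikit-learn's `LinearRegression`. By definition RMSE = √MSE, and RMSE can never be smaller than MAE for the same predictions, which is a handy sanity check when comparing reported error figures:

```python
import numpy as np


def regression_errors(y_true, y_pred) -> dict:
    """Return MSE, RMSE and MAE for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(err)))}


# Hypothetical closing prices: an upward trend plus noise
rng = np.random.default_rng(42)
days = np.arange(200, dtype=float)
prices = 100 + 0.5 * days + rng.normal(scale=5, size=days.size)

# Chronological split: fit on the first 80% of days, evaluate on the last 20%
split = int(0.8 * days.size)
slope, intercept = np.polyfit(days[:split], prices[:split], deg=1)  # least-squares best-fit line
predictions = slope * days[split:] + intercept

metrics = regression_errors(prices[split:], predictions)
print(metrics)
```

A chronological split (rather than a random shuffle) is used here because, for price forecasting, the model should only learn from data that precedes the period it predicts.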
COVID-19 : Self Health Analysis - Web Technologies🔻
The project is a web application that provides a pre-testing platform where users can assess their health for potential COVID-19 symptoms before seeking further medical testing. The application aims to help manage healthcare needs by offering an initial self-assessment tool for a large population. Users can register for free to receive updates and notifications; registered users receive an email if their health status indicates a need for immediate testing and medical attention. Technically, the project uses HTML, CSS, JavaScript, jQuery, and AngularJS for the frontend, NodeJS and ExpressJS for the backend, and MongoDB as the database. The application includes four main pages:
- Page 1 : Provides pandemic updates and guidelines on social distancing, hand sanitizing, and mask-wearing, and includes a link for unregistered users to analyze their health status.
- Page 2 : A registration form with validation using JavaScript, jQuery, and AngularJS, plus buttons for submitting the form and navigating to the analysis page.
- Page 3 : The login page for existing users, with validation and an alert displaying the username on entry.
- Page 4 : The main assessment page, where users answer questions about their health, recent activities, and potential exposure to the virus; their responses are recorded and they receive feedback.
The application provides a smooth user experience through form validation and feedback mechanisms, such as options changing color when selected, and responses can be reset by refreshing the page. On the backend, a server.js file handles server operations and a MongoDB collection named UserDetails stores user data. This web application aids in the early detection of COVID-19 symptoms, contributing to public health efforts during the pandemic.
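The assessment page and the email alert imply a simple decision rule: score the answers, then advise testing (and notify registered users) above some cut-off. A minimal sketch of such a rule, written in Python for consistency with the other examples here; the question names, weights, and threshold are all hypothetical, since the project's actual questionnaire and scoring logic (implemented in AngularJS/NodeJS) are not published:

```python
from dataclasses import dataclass

# Hypothetical yes/no questions and weights
WEIGHTS = {
    "fever": 3,
    "dry_cough": 2,
    "loss_of_taste_or_smell": 3,
    "breathing_difficulty": 4,
    "contact_with_confirmed_case": 4,
    "recent_travel": 1,
}
TEST_THRESHOLD = 6  # hypothetical cut-off for recommending immediate testing


@dataclass
class Assessment:
    score: int
    needs_testing: bool


def assess(answers: dict) -> Assessment:
    """Sum the weights of every question answered 'yes'; unanswered counts as 'no'."""
    score = sum(w for question, w in WEIGHTS.items() if answers.get(question, False))
    return Assessment(score, score >= TEST_THRESHOLD)


def should_email(result: Assessment, is_registered: bool) -> bool:
    # Only registered users receive an email, and only when testing is advised.
    return is_registered and result.needs_testing


result = assess({"fever": True, "contact_with_confirmed_case": True})
print(result, should_email(result, is_registered=True))
```

Keeping the rule as a pure function of the answers makes it easy to unit-test and to tune the weights without touching the form or the email code.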
Facial Image Suppression to Maintain Data Privacy - Network and Information Security🔻
This project focuses on developing a method to anonymize facial images for privacy protection, particularly in video surveillance. Traditional de-identification techniques, such as blurring or black-boxing, cannot preserve privacy without compromising image quality. The proposed method uses blended facial composites to obscure identities while retaining key facial features. The project highlights the dramatic increase in online data sharing and the need to protect facial images against unauthorized access and misuse. The proposed model de-identifies images using a modified averaging technique and stores them in a database, using facial landmark detection to map key facial features. The implementation uses Python and the dlib library to extract facial landmarks and apply the de-identification process, including specific functions for transforming images, color-coding features, and recognizing facial features. The results and discussion section compares the proposed method with traditional black-boxing and shows superior performance in terms of non-detection by face recognition software: the proposed method achieves an 80% non-detection rate, compared to 20% for black-boxing. In addition, the recovery function reconstructs the original images with a confidence level of 85.08%, as measured with the Kairos tool. The project concludes that the modified averaging technique effectively de-identifies and recovers images, outperforming traditional methods. Future work includes developing a transmission medium for secure image suppression and recovery. The project showcases an innovative approach to balancing privacy and image quality, with potential applications in surveillance and other areas that require secure handling of facial images.
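To illustrate why an averaging-based suppression can also be recovered, here is a minimal sketch using a plain weighted average (alpha blending) of two aligned face images. This is not the project's "modified averaging technique", whose details are not given; the landmark-based alignment step (for example with dlib's landmark predictor) is omitted, and random arrays stand in for aligned face crops:

```python
import numpy as np


def blend(face: np.ndarray, donor: np.ndarray, alpha: float) -> np.ndarray:
    """Weighted average of two aligned face images (float arrays of the same shape)."""
    return alpha * face + (1.0 - alpha) * donor


def recover(blended: np.ndarray, donor: np.ndarray, alpha: float) -> np.ndarray:
    """Invert blend() when the donor image and alpha are known (they act as the 'key')."""
    return (blended - (1.0 - alpha) * donor) / alpha


rng = np.random.default_rng(7)
face = rng.uniform(0, 255, size=(64, 64, 3))   # stand-in for an aligned face crop
donor = rng.uniform(0, 255, size=(64, 64, 3))  # stand-in for a composite/donor face
alpha = 0.3                                     # low weight on the real face hides identity

suppressed = blend(face, donor, alpha)
restored = recover(suppressed, donor, alpha)
print(np.allclose(restored, face))  # True
```

Recovery is exact here only because everything stays in floating point; saving the suppressed image as 8-bit pixels would round values and make the reconstruction approximate, which is one reason a measured confidence (rather than a perfect match) is the realistic outcome.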
Library Management System - C Programming🔻
This project is an advanced C programming application aimed at simplifying library operations. The system uses key C programming concepts such as structures, file handling, strings, arrays, pointers, functions, loops, and conditional statements. The application is menu-driven and offers several functions: displaying a welcome message, user authentication, adding books to the database, searching for books, viewing book information, deleting books, and updating user credentials. On launch, users see a welcome screen and proceed to a login page; entering incorrect credentials three times results in a login failure message. The main menu provides options for managing books and user credentials: adding book details (Book ID, Book Name, Author Name, Student Name, and date added), searching for books by name, viewing all books with full details, deleting books by Book ID, and updating the user password. Users can exit the application when they have finished. The system keeps library management efficient and organized, reducing the complexity and confusion of traditional methods.
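The three-attempt login described above is a small loop with a counter. A minimal sketch of that control flow, written in Python for consistency with the other examples (the original program is in C); the stored username and password are hypothetical, since the program's credential storage is not documented. The function takes an iterable of attempts so it can be tested without a keyboard:

```python
# Hypothetical stored credentials
STORED_USERNAME = "admin"
STORED_PASSWORD = "library123"
MAX_ATTEMPTS = 3


def login(attempts) -> bool:
    """Return True if a correct (username, password) pair arrives within MAX_ATTEMPTS tries."""
    for tries, (user, password) in enumerate(attempts, start=1):
        if user == STORED_USERNAME and password == STORED_PASSWORD:
            print("Login successful")
            return True
        print(f"Wrong credentials ({tries}/{MAX_ATTEMPTS})")
        if tries == MAX_ATTEMPTS:
            break  # stop reading attempts after the third failure
    print("Login failed")
    return False


login([("admin", "wrong"), ("admin", "library123")])
```

For interactive use, `attempts` could be a generator that yields `(input("Username: "), input("Password: "))` pairs; in the C version the same logic is typically a `while` loop around `scanf`/`strcmp`.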
Library Management System - Database Management System (DBMS)🔻
This project aims to modernize traditional library systems by designing and implementing a comprehensive database that streamlines library operations. The system addresses the limitations of manual record-keeping by ensuring data safety, ease of access, and efficient management of books and user records. It supports features such as real-time book availability, overdue fine tracking, and user management. The database includes entities for libraries, branches, books, authors, publishers, staff, borrowers, and visitors, with specified relationships and constraints. The project outlines scenarios for data removal, updates, and retrievals, ensuring robust database interactions. It also details the SQL queries and PL/SQL functions and procedures written to demonstrate the system's ability to handle complex data operations, using the Oracle Database 11g platform.
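Two of the features above, real-time availability and overdue fines, come down to queries over a loans table. A minimal sketch using Python's built-in `sqlite3` module in place of Oracle 11g and PL/SQL; the three-table schema is a hypothetical, heavily simplified subset of the project's entities, and the fine rate is an assumed value:

```python
import sqlite3

FINE_PER_DAY = 2.0  # hypothetical fine per overdue day

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE borrower (
    borrower_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE loan (
    loan_id     INTEGER PRIMARY KEY,
    book_id     INTEGER NOT NULL REFERENCES book(book_id),
    borrower_id INTEGER NOT NULL REFERENCES borrower(borrower_id),
    due_date    TEXT NOT NULL,   -- ISO-8601 date
    returned_on TEXT             -- NULL while the book is still out
);
""")
conn.executemany("INSERT INTO book VALUES (?, ?)",
                 [(1, "Database Systems"), (2, "Operating Systems")])
conn.executemany("INSERT INTO borrower VALUES (?, ?)", [(10, "Asha"), (11, "Ravi")])
conn.executemany("INSERT INTO loan VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 10, "2024-03-01", None),          # still out
    (101, 2, 11, "2024-03-10", "2024-03-09"),  # returned on time
])

# Available books: books with no open (unreturned) loan
available = conn.execute("""
    SELECT b.title FROM book b
    WHERE NOT EXISTS (SELECT 1 FROM loan l
                      WHERE l.book_id = b.book_id AND l.returned_on IS NULL)
""").fetchall()

# Fines as of a given date for open loans past their due date
fines = conn.execute("""
    SELECT br.name,
           CAST(julianday(:today) - julianday(l.due_date) AS INTEGER) * :rate
    FROM loan l JOIN borrower br ON br.borrower_id = l.borrower_id
    WHERE l.returned_on IS NULL AND julianday(:today) > julianday(l.due_date)
""", {"today": "2024-03-11", "rate": FINE_PER_DAY}).fetchall()

print(available, fines)
```

In Oracle the fine calculation would more naturally live in a PL/SQL function, but the core query logic (open loans, days past due, multiplied by a rate) is the same.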
Masquerading Website Recognition Using Machine Learning Technique(s) Via Web-Browser - Information Security Management🔻
This project addresses the rising threat of phishing attacks that target heavily used websites, such as banking and cloud storage sites, by developing an intelligent, browser-based solution to protect user credentials. Using a Random Forest classifier trained with scikit-learn on a dataset of vulnerable websites, the system detects phishing sites quickly and privately, directly within the browser, avoiding network delays and server dependencies. The aim is a lightweight browser plugin that runs in JavaScript and does not rely on heavyweight machine learning libraries, providing rapid detection and alerting users to phishing threats. Phishing is a prevalent social engineering attack that deceives users into disclosing sensitive information through fraudulent websites that mimic legitimate ones. Existing detection systems often rely on network-dependent features or incomplete blacklists. This project instead proposes a machine learning approach based on URL characteristics, implemented on the client side for speed and privacy. Because JavaScript has limited machine learning library support, the solution must be efficient: the system trains the Random Forest classifier on features that can be extracted client-side, splits the dataset into training and testing sets, and exports the trained model to JSON. The browser plugin then uses this model to evaluate websites in real time, alerting users when a site is classified as phishing. The implementation includes modules for data preprocessing, training, and model export, together with the frontend JavaScript files (Manifest.json, Background.js, Content.js).
The system was tested on several sites (Codetantra.com, GitHub.com, and a phishing site, blockchain-tech.com) and delivered an accuracy of 94.66%. A comparative study against existing systems showed our system achieving the best accuracy, 97.36%, with an F-measure of 0.974 and a ROC area (AUC) of 0.996. Future enhancements could include using more features without compromising speed, caching frequent results, using Worker Threads for faster classification, and ensuring that stored results do not increase susceptibility to pharming attacks.
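The key idea is that a trained forest, once exported to JSON, is just nested decision nodes that any runtime can walk without an ML library. A minimal sketch of that evaluation step, in Python for consistency with the other examples (the project's plugin does this in JavaScript). The URL features, the JSON layout, and the toy three-tree forest are all hypothetical; a real scikit-learn export would carry numeric thresholds taken from each tree's internal arrays:

```python
import ipaddress
import json
from urllib.parse import urlparse


def url_features(url: str) -> list:
    """Hypothetical client-side URL features, each 0 or 1."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        uses_ip = 1
    except ValueError:
        uses_ip = 0
    return [
        uses_ip,                        # f0: host is a raw IP address
        int("@" in url),                # f1: '@' anywhere in the URL
        int(len(url) > 75),             # f2: very long URL
        int("-" in host),               # f3: hyphen in the host name
        int(parsed.scheme != "https"),  # f4: not served over HTTPS
    ]


# Toy forest "exported" as JSON: internal nodes test one binary feature
# (go right if it is 1); leaves hold a label (1 = phishing, 0 = legitimate).
FOREST_JSON = """[
  {"feature": 0, "left": {"label": 0}, "right": {"label": 1}},
  {"feature": 3,
   "left":  {"feature": 4, "left": {"label": 0}, "right": {"label": 1}},
   "right": {"label": 1}},
  {"feature": 1,
   "left":  {"feature": 2, "left": {"label": 0}, "right": {"label": 1}},
   "right": {"label": 1}}
]"""


def predict_tree(node: dict, x: list) -> int:
    while "label" not in node:
        node = node["right"] if x[node["feature"]] else node["left"]
    return node["label"]


def predict_forest(forest: list, x: list) -> int:
    votes = sum(predict_tree(tree, x) for tree in forest)
    return int(votes * 2 > len(forest))  # strict majority vote


forest = json.loads(FOREST_JSON)
for url in ["https://github.com/login", "http://192.168.4.20/secure-bank-login@verify"]:
    verdict = "phishing" if predict_forest(forest, url_features(url)) else "legitimate"
    print(url, "->", verdict)
```

Because evaluation is only dictionary lookups and comparisons, it stays fast inside a content script, which is the property the plugin design relies on.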
Online Examination System - Open Source Programming 🔻
This project addresses the challenges of conducting fair online exams during the COVID-19 pandemic. It aims to curb malpractice through features such as frequent webcam snapshots and robust proctoring, in contrast to existing platforms such as VIT LMS and Code Tantra, which have limitations in proctoring and accessibility. The proposed system offers dynamic assessment methods, including text, audio, and video submissions, and supports small, frequent quizzes to enhance learning. The platform includes user-friendly interfaces for login, dashboards, test commencement, and immediate scoring, with detailed proctoring reports available to teachers. Despite these comprehensive features, the project acknowledges that no system is foolproof and emphasizes the importance of educating students on ethical exam practices.
Securing ATM Transactions - Information Security Assessment & Audit🔻
This project focuses on enhancing the security of Automated Teller Machines (ATMs) through the integration of advanced technologies. ATMs, ubiquitous in modern banking, support financial transactions such as withdrawals, deposits, balance inquiries, and fund transfers. Despite their convenience, ATMs face significant security challenges, including theft and fraud enabled by technological advances. This project proposes a hybrid security model that combines fingerprint scanners, QR code scanning, 3D facial recognition, and OTP authentication via GSM modules to mitigate these risks. The introduction traces the evolution of ATMs from basic cash dispensers to sophisticated devices with voice-enabled interfaces and satellite connectivity, which nonetheless remain vulnerable to cybercrimes such as card cloning and PIN theft through spy devices. Traditional security measures such as PINs and magnetic stripe or chip cards are considered insufficient against modern threats. The authors advocate multifaceted security enhancements, starting with fingerprint scanners, praising their uniqueness but cautioning against vulnerabilities from fingerprints replicated on materials such as acetate sheets. Facial recognition, despite advances such as 3D imaging, faces challenges such as ambient light requirements and susceptibility to spoofing with masks. OTP authentication via GSM modules emerges as a robust alternative, securing transactions with temporary passwords sent directly to users' mobile phones. Dynamic QR codes further enhance security by removing the need to insert a physical card, thus preventing card cloning. The study emphasizes a hybrid approach that combines these technologies to strengthen ATM security, while acknowledging potential drawbacks such as synchronization issues with centralized databases.
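The OTP step boils down to issuing a short-lived random code, sending it to the user's phone, and verifying it before the transaction proceeds. A minimal sketch of the issue/verify logic, assuming a random 6-digit code with a 120-second validity window (both values hypothetical; the project's actual OTP scheme is not specified). The GSM/SMS delivery step is out of scope here:

```python
import hmac
import secrets
import time

OTP_DIGITS = 6
OTP_TTL_SECONDS = 120  # hypothetical validity window


def issue_otp(now=None):
    """Generate a cryptographically random numeric OTP and its expiry timestamp."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10 ** OTP_DIGITS):0{OTP_DIGITS}d}"  # zero-padded
    return code, now + OTP_TTL_SECONDS


def verify_otp(expected: str, expires_at: float, submitted: str, now=None) -> bool:
    """Accept only an unexpired code, compared in constant time."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    return hmac.compare_digest(expected, submitted)


code, expires_at = issue_otp()
# ... the code would be sent to the user's phone via the GSM module here ...
print(verify_otp(code, expires_at, code))  # True while the code is still valid
```

`secrets` is used instead of `random` because OTPs must be unpredictable, and `hmac.compare_digest` avoids leaking how many leading digits matched through timing. A production system would also invalidate a code after its first successful use.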
The experimental setup and result analysis section details the implementation and testing phases of these technologies, highlighting challenges and improvements. It includes discussions on the operational logistics of fingerprint scanners, facial recognition systems, OTP generation, and QR code scanners, culminating in recommendations for future enhancements. The conclusion underscores the project's contributions towards making ATM transactions safer and more efficient, paving the way for further research and practical implementations integrating these technologies into real-world banking environments.
Overall, the project underscores the imperative of adapting ATM security to contemporary threats through innovative technological integrations, ensuring robust protection against evolving cyber risks while maintaining user convenience and operational efficiency in financial transactions.
Smart Fridge / Refrigerator (Mobile Application) - Human Computer Interaction🔻
The project focuses on evaluating and enhancing the user interface of a Smart Fridge/Refrigerator system using Nielsen’s Heuristics for Human-Computer Interaction. The system employs AI and IoT technologies to monitor and manage food stored inside, ensuring optimal conditions and alerting users about food quality and expiration dates. Key features include a mobile application and a built-in display for user interaction. The evaluation against Nielsen’s Heuristics reveals strengths such as clear system status visibility, real-world language use, user control with undo options, and consistency in interface design. However, challenges like security vulnerabilities due to software bugs are highlighted, necessitating rigorous testing and maintenance. Overall, the project aims to provide a user-friendly experience for a diverse user base, emphasizing ease of use, functionality, and system reliability in smart appliance interaction.
SQL Injection Prevention System using Machine Learning Algorithms - Artificial Intelligence🔻
This project addresses the critical issue of securing web applications against SQL injection attacks, which are among the most prevalent vulnerabilities in web systems. The system combines PHP, JavaScript, and regular expressions to implement a defensive mechanism that protects web resources from SQL injection. The approach involves developing an application that lets users secure their web applications with SQL injection prevention techniques such as input validation, parameterized queries, escaping, and stored procedures. To enhance detection, the methodology integrates machine learning, specifically a Gradient Boosting Classifier. This involves building a dataset through tokenization and feature extraction from SQL injection attack code, either generated or imported from reliable sources such as Kaggle or the UCI Machine Learning Repository. The dataset is split into training and testing sets, with the majority used to train the Gradient Boosting Classifier, which learns to distinguish SQL injection code from normal/valid code using features obtained through tokenization and categorical feature encoding. In testing, the trained model successfully classified a dataset of 13,780 code samples, identifying 9,731 instances of SQL injection and 4,049 normal/valid samples. The results demonstrate the effectiveness of the proposed approach in classifying and preventing SQL injection attacks, strengthening the security posture of web applications. Overall, the project helps mitigate one of the most significant threats to web application security by combining traditional security practices with machine learning techniques.
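The tokenization and feature-extraction step turns each input string into numbers a classifier can learn from. A minimal sketch of what such features could look like; the token pattern, keyword list, and feature names are hypothetical, since the project's exact feature set is not documented:

```python
import re

# Hypothetical tokenizer: SQL comment markers, words/numbers, comparison
# operators, and any other single punctuation character (quotes, ';', '(' ...).
TOKEN_RE = re.compile(r"--|/\*|\*/|\w+|[=<>!]+|[^\s\w]")

# Hypothetical keyword list used for one of the features
SQL_KEYWORDS = {"select", "union", "insert", "update", "delete", "drop",
                "or", "and", "from", "where", "sleep"}

# Tautologies such as 1=1 or 'a'='a' (the same value on both sides of '=')
TAUTOLOGY_RE = re.compile(r"\b(\d+)\s*=\s*\1\b|'(\w*)'\s*=\s*'\2'")


def tokenize(text: str) -> list:
    return TOKEN_RE.findall(text.lower())


def extract_features(text: str) -> dict:
    tokens = tokenize(text)
    return {
        "n_tokens": len(tokens),
        "n_keywords": sum(t in SQL_KEYWORDS for t in tokens),
        "n_quotes": tokens.count("'"),
        "has_comment": int(any(t in ("--", "#", "/*") for t in tokens)),
        "has_tautology": int(TAUTOLOGY_RE.search(text) is not None),
    }


for sample in ["admin' OR 1=1 --", "john.smith@example.com"]:
    print(sample, extract_features(sample))
```

Each feature dictionary can then be turned into a fixed-order vector and fed to a classifier such as scikit-learn's `GradientBoostingClassifier`. Detection complements, rather than replaces, parameterized queries, which remain the primary defence.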
Student Behavioral Analysis via Emotional Speech Recognition - Machine Learning 🔻
This project aims to analyze and predict the emotions students express during their interactions with teachers in online classes. The goal is to gauge how well students understand the topics being taught from the tone of their speech. The project focuses on recognizing emotions such as neutral, anger, joy, and sorrow in the students' spoken replies.
Data augmentation is used to generate fresh training examples by adding small perturbations to the original dataset. Four data augmentation strategies are employed: adding noise, shifting, pitch shifting, and time stretching. These techniques are applied to the audio data before Mel Frequency Cepstral Coefficient (MFCC) features are extracted and used to train the deep learning model. MFCC extraction involves pre-emphasis, framing, windowing, the Fast Fourier Transform (FFT), a Mel-scale filter bank, log-energy computation, and the Discrete Cosine Transform (DCT); together, these steps transform the audio recordings into a format the model can use.
The project uses a Convolutional Neural Network (CNN) model for speech emotion recognition. The training data, including the emotion labels, undergoes intensity normalization and is then split into training and test sets; the CNN model is built and trained on the training set. The trained model is then used to predict the emotions expressed in the students' speech. The project also references several research studies on similar topics, including the use of hybrid neural networks, deep neural networks, and CNN algorithms for speech emotion recognition. These studies highlight the use of different datasets, data augmentation techniques, and feature extraction methods to improve the accuracy of emotion recognition.
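The MFCC steps listed above can be sketched directly in NumPy, one line of code per step. This is a textbook version under assumed parameters (25 ms frames, 10 ms step, 512-point FFT, 26 Mel filters, 13 coefficients, unnormalized DCT-II), not the project's code; libraries such as librosa (`librosa.feature.mfcc`) use different defaults and scaling, so absolute values will differ. A synthetic tone stands in for a speech clip:

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sr, n_mfcc=13, n_filters=26, frame_len=0.025,
         frame_step=0.010, n_fft=512, pre_emph=0.97):
    # 1. Pre-emphasis: boost high frequencies
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing: overlapping short frames, zero-padding the tail
    flen, fstep = int(round(frame_len * sr)), int(round(frame_step * sr))
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - flen) / fstep)))
    pad = max(0, (n_frames - 1) * fstep + flen - len(emphasized))
    padded = np.append(emphasized, np.zeros(pad))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    # 3. Windowing with a Hamming window
    frames = padded[idx] * np.hamming(flen)
    # 4. FFT -> power spectrum of each frame
    power = (np.abs(np.fft.rfft(frames, n=n_fft)) ** 2) / n_fft
    # 5. Mel-scale triangular filter bank
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / (right - centre)
    energies = power @ fbank.T
    # 6. Log energies (guarding against log(0))
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    # 7. DCT-II of the log energies, keeping the first n_mfcc coefficients
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_energies @ dct_basis.T


sr = 16_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone instead of real speech
features = mfcc(tone, sr)
print(features.shape)  # (99, 13): one 13-coefficient vector per 10 ms frame
```

The resulting frames-by-coefficients matrix is the kind of 2-D input a CNN can treat like a small single-channel image.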
This project includes a code implementation and analysis covering data visualization, data augmentation, MFCC extraction, model training, and evaluation. The results of testing the trained model on the test dataset are summarized in a table showing the testing accuracy for each emotion. In conclusion, this project demonstrates the effectiveness of the CNN model in recognizing emotions from students' speech: the model's overall accuracy is 87.80%, with higher accuracy for female speakers' emotions than for male speakers'. The project highlights the potential of machine learning techniques for analyzing student behavior and emotions in online learning environments.