
spoorthy1-reddy/CS5342-project


Welcome to Network Security Bot

Project Description

This project develops an interactive quiz bot aimed at enhancing learning in network security. It features a user-friendly interface for seamless interaction, an Embedding Model that converts text into vectors for efficient comparison, and a Vector Database (Chroma DB) that matches similar knowledge documents. The bot uses Large Language Models (LLMs) to process user inputs and generate accurate, contextual responses. Drawing on sources such as textbooks, lecture slides, and online materials, the system provides personalized feedback, correct answers, and citations, ensuring an engaging and effective learning experience in network security. The quiz includes multiple-choice, true/false, and open-ended questions. Finally, the bot tells the user whether each answer is correct and cites the title of the source document it drew on.

System Architecture

User Input:

The user enters a prompt in the user interface.

Embedding Generation:

The system translates the user query into a numerical embedding, encapsulating its semantic meaning.
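To make this step concrete, the sketch below maps text to a fixed-length vector. It is a toy stand-in, not the project's Embedding Model: the hashing trick, the toy_embed name, and the 64-dimension size are all assumptions for illustration, and the resulting vectors capture word overlap rather than true semantic meaning.

```python
import math
import re
import zlib

DIM = 64  # hypothetical vector size; real embedding models use hundreds of dimensions


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a fixed-length, L2-normalised vector by hashing its words.

    Hypothetical stand-in for an embedding model: it reflects shared words,
    not semantic similarity.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs (Python's built-in hash() is salted per process)
        bucket = zlib.crc32(word.encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalising lets a plain dot product act as cosine similarity later on
    return [v / norm for v in vec] if norm else vec


print(len(toy_embed("What does a stateful firewall track?")))
```

A real embedding model would place "firewall" and "packet filter" close together even though they share no words; this toy version cannot, which is why the project relies on a trained model.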

Vector Database:

This embedding is then sent to a vector database, where it’s matched against precomputed embeddings of various documents. The database returns a list of documents most relevant to the user's query based on similarity.
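The matching step can be illustrated with a tiny in-memory store that ranks stored vectors by cosine similarity to the query vector. This is a hypothetical sketch of the idea only; Chroma DB's real API, indexing, and persistence are different, and the class name and sample vectors are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: keeps precomputed
    document vectors and returns the k most similar to a query vector."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc: str, vector: list[float]) -> None:
        self._items.append((doc, vector))

    def query(self, vector: list[float], k: int = 2) -> list[tuple[str, float]]:
        # Brute-force scan; real vector databases use approximate indexes instead
        scored = [(doc, cosine(vector, v)) for doc, v in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("Firewalls filter traffic", [1.0, 0.0, 0.2])
store.add("TLS encrypts sessions", [0.0, 1.0, 0.1])
store.add("Packet filtering rules", [0.9, 0.1, 0.0])
print(store.query([1.0, 0.0, 0.0], k=2))
```

The brute-force scan is fine for a handful of documents; a database such as Chroma exists precisely so this lookup stays fast for thousands of chunks.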

Contextual Prompt Generation:

The system creates a new prompt by combining the user's input with relevant documents as context. This added context enhances the prompt, equipping the Language Model with additional information.
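A minimal sketch of this step, assuming the retrieved documents arrive as plain strings: they are numbered and placed ahead of the user's request so the model can refer to them. The build_prompt name and the template wording are hypothetical; the project's notebook may phrase its prompt differently.

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Combine retrieved documents and the user's input into one prompt.

    Hypothetical template; numbering the documents makes it easier to ask
    the model which source it used.
    """
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1)
    )
    return (
        "You are a network security quiz bot. Use only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"User request: {question}\n"
        "Answer:"
    )


print(build_prompt(
    "Generate a true/false question about firewalls.",
    ["Firewalls filter traffic.", "Stateful firewalls track connections."],
))
```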

Local Large Language Model (LLM) Processing:

The contextualized prompt is sent to the local Language Model, which generates a response that considers both the user query and relevant background documents.
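The call to the local model can be sketched with Ollama's REST API, used directly through the standard library rather than through the notebook's own code (an assumption; the notebook may use a LangChain wrapper instead). The sketch assumes Ollama is listening on its default address, http://localhost:11434, and that llama3.2 has been pulled. Setting "stream" to false makes Ollama return the whole reply in one JSON object whose "response" field holds the generated text.

```python
import json
import urllib.request

# Assumed default Ollama endpoint on the local machine
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2", timeout: float = 120.0) -> str:
    """Send the contextualized prompt to Ollama and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False the full completion is in the "response" field
    return body["response"]
```

Calling generate("Generate a true/false network security question.") returns the model's reply as a string; if the Ollama server is not running, urlopen raises an error such as urllib.error.URLError.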

User Interface Display:

The system displays the response, along with citations or references to the original context documents. This transparency allows users to verify the sources and builds credibility.

Video

QuizbotRecording.1.mp4

Prerequisite

Install python3
Create virtual environment
python3 -m venv venv
source venv/bin/activate - for macOS/Linux users
venv\Scripts\activate - for Windows users
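To confirm that the virtual environment is actually active before installing packages, a short check like the following can help. It relies on the fact that inside a venv, sys.prefix points at the environment directory while sys.base_prefix still points at the system Python; the in_virtualenv name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    # venv changes sys.prefix; sys.base_prefix keeps the original installation
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints False, packages installed with pip will go into the system Python rather than into venv.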

Requirements

pip install langchain==0.3.6
pip install chromadb==0.5.11
pip install llama_models==0.0.47
pip install urllib3==2.2.3
pip install pypdf==5.1.0
pip install python-dotenv==1.0.1
pip install requests==2.32.3
pip install google-auth==2.35.0
pip install nltk==3.9.1
pip install pillow==10.4.0
pip install ipython==8.28.0
pip install PyPDF2==3.0.1
pip install zipp==3.20.2
pip install dataclasses-json==0.6.7
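After installing, the pinned versions can be checked against what is actually installed using importlib.metadata from the standard library. This is a sketch: the PINNED dictionary copies only a subset of the list above, and check_pins is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pins listed above (extend as needed)
PINNED = {
    "langchain": "0.3.6",
    "chromadb": "0.5.11",
    "pypdf": "5.1.0",
    "python-dotenv": "1.0.1",
    "requests": "2.32.3",
}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Report 'ok', 'missing', or the version mismatch for each pinned package."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else f"installed {installed}, expected {wanted}"
    return report


for pkg, status in check_pins(PINNED).items():
    print(f"{pkg}: {status}")
```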

Step-by-step instructions for execution

Step 1: Clone the repo: git clone https://github.com/nlyson/CS5342Project

Step 2: Create a virtual environment: python3 -m venv venv

Step 3: Install dependencies: pip install -r requirements.txt

Step 4: Pull the model (make sure Ollama is running locally; 'ollama serve' starts the server if it is not already running): ollama pull llama3.2
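Before opening the notebook, a quick check can confirm that the Ollama server is up and that the model has been pulled. This sketch assumes Ollama's default port (11434) and queries its /api/tags endpoint, which lists locally available models; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed default Ollama endpoint that lists locally pulled models
TAGS_URL = "http://localhost:11434/api/tags"


def model_available(tags: dict, model: str = "llama3.2") -> bool:
    """Check an /api/tags response for the model.

    Pulled models are listed with a tag, e.g. 'llama3.2:latest'.
    """
    names = [m.get("name", "") for m in tags.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


def ollama_ready(model: str = "llama3.2", timeout: float = 5.0) -> bool:
    """True if the Ollama server answers and the model has been pulled."""
    try:
        with urllib.request.urlopen(TAGS_URL, timeout=timeout) as resp:
            return model_available(json.load(resp), model)
    except (OSError, ValueError):
        # OSError covers URLError and timeouts; ValueError covers bad JSON
        return False
```

If ollama_ready() returns False, start the server with 'ollama serve' and rerun 'ollama pull llama3.2'.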

Step 5: Open the project in VS Code: code .

Step 6: Open the src/llama_train.ipynb notebook

Step 7: Run the 'Dependencies' cell.

Step 8: Run the 'Embed Documents' cell. If you want to save your embeddings locally, use vectorstore = Chroma.from_documents(persist_directory=<path_to_local_dir>, documents=splits, embedding=embeddings).
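The splits passed to Chroma.from_documents are chunks of the source documents. The sketch below shows one simple way to produce such chunks: fixed-size character windows with overlap, each tagged with its source file and page so citations can be recovered later. It is a simplified, hypothetical stand-in for whatever text splitter the notebook uses; the Chunk type, the 500/50 sizes, and the metadata keys are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def split_pages(pages: list[str], source: str, size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Split each page into overlapping windows tagged with source and page.

    Page numbers here are 1-based (an assumption for readability).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        if not text:
            continue  # skip empty pages (e.g. image-only slides)
        start = 0
        while True:
            chunks.append(Chunk(text[start:start + size], {"source": source, "page": page_no}))
            if start + size >= len(text):
                break  # this window already reaches the end of the page
            start += step
    return chunks


print(len(split_pages(["x" * 1000], "lecture01.pdf")))
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.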

Step 9: Run the 'Initialize Llama locally' cell.

Step 10: Run the 'Generate prompt with GUI using gradio' cell.

Step 11: Browse to http://127.0.0.1:7860 in your browser.

Step 12: In the text box, enter one of the following:

  1. Generate a multiple choice network security question and provide feedback to my answer.
  2. Generate a true/false network security question and provide feedback to my answer.
  3. Generate a short answer network security question and provide feedback to my answer.

Step 13: You can also be creative. Try asking the chatbot to give you questions, grade your answers, and give you feedback.

Identifying Issues and Implementing Solutions

Issue 1: We initially embedded the preprocessed documents and stored them in a vector database. We used similarity matching to retrieve results, but the results were not in question/answer format.

Solution 1: We passed the retrieved results to the llama3.2 model, which uses them as context to generate questions and answers, so the raw matches are augmented with generated content.

Issue 2: After giving questions and answers, the quiz bot did not include citations.

Solution 2: We used the metadata stored with the embeddings in the vector database. After performing a similarity search and passing the results to the LLM, we saved the metadata (document and page number) and appended it to the response.
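Appending the saved metadata to the model's answer can be sketched as follows. The sketch assumes each retrieved chunk carries a metadata dictionary with "source" and "page" keys (assumed key names) and removes duplicates so a page cited by several chunks appears only once; format_with_citations is a hypothetical helper.

```python
def format_with_citations(answer: str, retrieved: list[dict]) -> str:
    """Append a de-duplicated 'Sources' list built from chunk metadata.

    Assumes metadata dicts with 'source' and (optionally) 'page' keys.
    """
    seen: list[tuple[str, object]] = []
    for meta in retrieved:
        ref = (meta.get("source", "unknown"), meta.get("page"))
        if ref not in seen:  # keep first-seen order, drop repeats
            seen.append(ref)
    if not seen:
        return answer
    lines = [
        f"- {src}, page {page}" if page is not None else f"- {src}"
        for src, page in seen
    ]
    return answer + "\n\nSources:\n" + "\n".join(lines)


print(format_with_citations(
    "Correct! TLS provides confidentiality.",
    [{"source": "ch1.pdf", "page": 3}, {"source": "ch1.pdf", "page": 3}],
))
```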

Issue 3: Overfitting in quiz logic. Overfitting can make the quiz bot perform well on familiar questions from its training data but struggle with new or varied quiz scenarios.

Solution 3: Improving the quiz bot’s generalization can be achieved by using a diverse and representative dataset, including a wide range of real-world question types, and regularly retraining the model with fresh data.

Features

The bot generates quizzes on both random and specific topics, based on user preferences, drawing on its training data. It includes various question types, such as multiple-choice, true/false, and open-ended questions. After the quiz, the bot gives the user feedback along with a score. The bot functions as both a quiz bot and a chatbot, allowing for a versatile and interactive experience.
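In the project, the LLM itself grades answers; the sketch below only illustrates how a score and per-question feedback could be computed deterministically for the closed question types (multiple-choice and true/false). The Question type and grade function are hypothetical, and open-ended answers are left out because they need the model to judge them.

```python
from dataclasses import dataclass


@dataclass
class Question:
    kind: str    # "multiple_choice" or "true_false" (hypothetical labels)
    prompt: str
    answer: str  # expected answer, e.g. "B" or "True"


def grade(questions: list[Question], responses: list[str]) -> tuple[int, list[str]]:
    """Compare responses to expected answers (case-insensitive) and build feedback."""
    score = 0
    feedback = []
    for q, r in zip(questions, responses):
        if r.strip().lower() == q.answer.strip().lower():
            score += 1
            feedback.append(f"Correct: {q.prompt}")
        else:
            feedback.append(f"Incorrect: {q.prompt} (expected {q.answer})")
    return score, feedback


quiz = [
    Question("true_false", "TLS encrypts traffic between client and server.", "True"),
    Question("multiple_choice", "Default HTTPS port? A) 80 B) 443", "B"),
]
score, notes = grade(quiz, ["true", "A"])
print(f"Score: {score}/{len(quiz)}")
print("\n".join(notes))
```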

Training Data and Data Formats

Our bot is trained on lecture slides and the Network Security textbook slides (Network Security Essentials: Applications and Standards, Sixth Edition, by William Stallings).

Suggestion:

To enhance the project's scalability, consider integrating real-time data updates for quizzes, ensuring that the bot stays current with new network security trends and topics. Additionally, implementing user performance tracking over time could provide valuable insights for personalized learning paths.

Feedback:

The proposal is clear and well-structured, with strong use of LLMs and vector databases. To improve, consider adding adaptive learning based on user performance. User testing during development will also help ensure the bot is effective and user-friendly.
