Welcome to Network Security Bot

Project Description

This project develops an interactive quiz bot aimed at enhancing learning in network security. It features a user-friendly interface, an embedding model that converts text into vectors for efficient comparison, and a vector database (Chroma DB) that matches queries to similar knowledge documents. The bot uses a Large Language Model (LLM) to process user inputs and generate contextual responses. Drawing on textbooks, lecture slides, and online materials, the system provides personalized feedback, correct answers, and citations. The quiz includes multiple-choice, true/false, and open-ended questions. For each answer, the bot tells the user whether it is correct and gives the title of the reference source document.

System Architecture

User Input:

The user enters a prompt in the user interface.

Embedding Generation:

The system translates the user query into a numerical embedding, encapsulating its semantic meaning.
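To illustrate the idea of turning text into a vector, here is a minimal sketch. It assumes a toy hashed bag-of-words function (`toy_embed`, a hypothetical name) in place of the project's actual embedding model, so unlike a real model it captures word overlap rather than semantic meaning.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list:
    """Hypothetical stand-in for a real embedding model (hashed bag of words)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # A stable hash keeps each word in the same slot across runs.
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Scale to unit length so that a dot product equals cosine similarity.
    return [x / norm for x in vec] if norm else vec


query_vec = toy_embed("What does a firewall do?")
print(len(query_vec))  # one fixed-size vector per query
```

A real embedding model produces vectors in the same spirit (fixed length, comparable by similarity), but places texts with related meaning close together even when they share no words.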

Vector Database:

This embedding is then sent to a vector database, where it’s matched against precomputed embeddings of various documents. The database returns a list of documents most relevant to the user's query based on similarity.
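The matching step can be sketched as a top-k search by cosine similarity. This is an in-memory illustration only, not how Chroma DB is implemented; the `store` contents, the three-dimensional vectors, and the function names are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k stored items whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, item["embedding"]), item) for item in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


# Hypothetical precomputed embeddings for three document chunks.
store = [
    {"text": "Firewalls filter traffic between networks.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "RSA is a public-key cryptosystem.", "embedding": [0.1, 0.9, 0.1]},
    {"text": "Packet filtering rules inspect headers.", "embedding": [0.8, 0.2, 0.1]},
]

results = top_k([1.0, 0.0, 0.0], store, k=2)
for item in results:
    print(item["text"])
```

With these example vectors, the two firewall-related chunks rank above the RSA chunk because their embeddings point in nearly the same direction as the query.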

Contextual Prompt Generation:

The system creates a new prompt by combining the user's input with relevant documents as context. This added context enhances the prompt, equipping the Language Model with additional information.
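One way to combine the user's input with the retrieved documents is plain string templating. The sketch below is an assumption about the general shape of such a prompt; the wording of the template and the name `build_prompt` are illustrative and not taken from the project notebook.

```python
def build_prompt(user_input: str, documents: list) -> str:
    """Combine retrieved document text and the user's request into one prompt."""
    # Number each chunk so the model (and the reader) can refer back to it.
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Use only the context below to respond to the request.\n\n"
        f"Context:\n{context}\n\n"
        f"Request: {user_input}"
    )


prompt = build_prompt(
    "Generate a true/false network security question.",
    ["Firewalls filter traffic between networks.",
     "Packet filtering rules inspect headers."],
)
print(prompt)
```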

Local Language Model Processing (LLM):

The contextualized prompt is sent to the local Language Model, which generates a response that considers both the user query and relevant background documents.
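As a rough sketch of this step, the prompt can be posted to a locally running Ollama server over HTTP. This assumes Ollama's default endpoint on port 11434 and the llama3.2 model pulled in the setup steps; the notebook itself may reach the model through LangChain instead, and the helper names here are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Prepare a non-streaming generate request for the local model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt to a running Ollama server and return its reply text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `generate(contextual_prompt)` requires the Ollama server to be running; `build_request` alone can be inspected without any network access.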

User Interface Display:

The system displays the response, along with citations or references to the original context documents. This transparency allows users to verify the sources and builds credibility.

Video

QuizbotRecording.1.mp4

Prerequisite

Install Python 3.
Create a virtual environment:
python3 -m venv venv
Activate it:
source ./venv/bin/activate - for macOS users
.\venv\Scripts\activate - for Windows users

Requirements

* pip install langchain==0.3.6
* pip install chromadb==0.5.11
* pip install llama_models==0.0.47
* pip install urllib3==2.2.3
* pip install pypdf==5.1.0
* pip install python-dotenv==1.0.1
* pip install requests==2.32.3
* pip install google-auth==2.35.0
* pip install nltk==3.9.1
* pip install pillow==10.4.0
* pip install ipython==8.28.0
* pip install PyPDF2==3.0.1
* pip install zipp==3.20.2
* pip install dataclasses-json==0.6.7

Step-by-step instructions for execution

Step 1: Clone the repo:
git clone https://github.com/nlyson/CS5342Project

Step 2: Create a virtual environment:
python3 -m venv venv

Step 3: Install dependencies:
pip install -r requirements.txt

Step 4: Pull the model. Ollama must be running first; search for 'how to run ollama locally' if you need help.
ollama pull llama3.2

Step 5: Open VS Code:
code .

Step 6: Open the src/llama_train.ipynb notebook

Step 7: Run 'Dependencies' cell.

Step 8: Run 'Embed Documents' cell. To save your embeddings locally, use vectorstore = Chroma.from_documents(persist_directory=<path_to_local_dir>, documents=splits, embedding=embeddings).

Step 9: Run 'Initialize Llama locally' cell.

Step 10: Run 'Generate prompt with GUI using gradio' cell.

Step 11: Open http://127.0.0.1:7860 in your browser.

Step 12: In the text box, enter one of the following:

Generate a multiple choice network security question and provide feedback to my answer.
Generate a true/false network security question and provide feedback to my answer.
Generate a short answer network security question and provide feedback to my answer.

Step 13: You can also be creative. Try telling the Chatbot to give you questions, grade them, and give you feedback.

Identifying issues and Implementing Solutions

Issue 1: We initially embedded the preprocessed documents and stored them in a vector database. We used similarity matching to retrieve results, but the results did not come back in question/answer format.

Solution 1: We augmented the retrieved results with content generated by the llama3.2 model, which turns them into questions and answers.

Issue 2: After giving questions and answers, the quizbot would not give citations.

Solution 2: We used the metadata stored with the embeddings in the vector database. After performing a similarity match and passing the results to the LLM, we saved the metadata (document and page number) and appended it to the response.
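The citation step can be sketched as follows. The metadata keys `source` and `page`, the sample file name, and the function names are assumptions for illustration; the keys actually stored depend on how the documents were loaded.

```python
def format_citations(retrieved: list) -> list:
    """Build one citation string per distinct (document, page) pair."""
    citations = []
    for doc in retrieved:
        meta = doc.get("metadata", {})
        cite = f'{meta.get("source", "unknown source")}, page {meta.get("page", "?")}'
        if cite not in citations:  # skip duplicates, keep retrieval order
            citations.append(cite)
    return citations


def append_citations(answer: str, retrieved: list) -> str:
    """Append a source list to the model's answer, if any sources exist."""
    citations = format_citations(retrieved)
    if not citations:
        return answer
    lines = "\n".join(f"- {c}" for c in citations)
    return f"{answer}\n\nSources:\n{lines}"


# Hypothetical retrieval results; two chunks come from the same page.
retrieved = [
    {"text": "chunk 1", "metadata": {"source": "lecture03.pdf", "page": 12}},
    {"text": "chunk 2", "metadata": {"source": "lecture03.pdf", "page": 12}},
    {"text": "chunk 3", "metadata": {"source": "lecture07.pdf", "page": 4}},
]
print(append_citations("True. A firewall filters traffic.", retrieved))
```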

Issue 3: Overfitting in Quiz Logic: Overfitting can lead the quiz bot to perform well on familiar questions from its training data but may hinder its ability to handle new or varied quiz scenarios.

Solution 3: Improving the quiz bot’s generalization can be achieved by using a diverse and representative dataset, including a wide range of real-world question types, and regularly retraining the model with fresh data.

Features

The bot generates quizzes on both random and specific topics based on user preferences, using its trained datasets. It includes various question types, such as multiple-choice, true or false, and open-ended questions. After the quiz, the bot provides feedback to the user with a score. The bot functions as both a quiz bot and a chatbot, allowing for a versatile and interactive experience.
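For question types with a fixed answer (multiple-choice and true/false), the score could be computed with a simple comparison like the sketch below. This is an assumption for illustration: in the project the LLM produces the feedback, open-ended answers cannot be graded this way, and `score_quiz` is a hypothetical helper.

```python
def score_quiz(user_answers: list, answer_key: list):
    """Return (correct, total, percent) for fixed-answer questions."""
    correct = 0
    # zip stops at the shorter list, so unanswered questions count as wrong.
    for given, expected in zip(user_answers, answer_key):
        if given.strip().lower() == expected.strip().lower():
            correct += 1
    total = len(answer_key)
    percent = 100.0 * correct / total if total else 0.0
    return correct, total, percent


correct, total, percent = score_quiz(["B", "true", "a"], ["b", "False", "A"])
print(f"Score: {correct}/{total} ({percent:.0f}%)")
```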

Describe training data and data formats

Our bot is trained using lecture slides and the Network Security slides (Network Security Essentials: Applications and Standards, Sixth Edition, by William Stallings).

Suggestion:

To enhance the project's scalability, consider integrating real-time data updates for quizzes, ensuring that the bot stays current with new network security trends and topics. Additionally, implementing user performance tracking over time could provide valuable insights for personalized learning paths.

Feedback:

The proposal is clear and well-structured, with strong use of LLMs and vector databases. To improve, consider adding adaptive learning based on user performance. User testing during development will also help ensure the bot is effective and user-friendly.
