This project develops an interactive quiz bot aimed at enhancing learning in network security. It features a user-friendly interface for seamless interaction, an embedding model to convert text into vectors for efficient comparison, and a vector database (Chroma DB) to match similar knowledge documents. The bot uses a Large Language Model (LLM) to process user inputs and generate accurate, contextual responses. Drawing from sources such as textbooks, lecture slides, and online materials, the system provides personalized feedback, correct answers, and citations, ensuring an engaging and effective learning experience in network security. The quiz includes multiple-choice, true/false, and open-ended questions. Finally, the bot tells the user whether each answer is correct, along with the title of the reference source document.
The user enters a prompt in the user interface. The system translates the user query into a numerical embedding, encapsulating its semantic meaning. This embedding is then sent to a vector database, where it’s matched against precomputed embeddings of various documents. The database returns a list of documents most relevant to the user's query based on similarity.
The system creates a new prompt by combining the user's input with relevant documents as context. This added context enhances the prompt, equipping the Language Model with additional information. The contextualized prompt is sent to the local Language Model, which generates a response that considers both the user query and relevant background documents. The system displays the response, along with citations or references to the original context documents. This transparency allows users to verify the sources and builds credibility.
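The retrieval-and-augmentation flow described above can be sketched in a few lines of plain Python. This is a toy illustration only: a bag-of-words "embedding" and cosine similarity stand in for the real embedding model and Chroma DB, and all names here are illustrative rather than taken from the project code.

```python
import math
from collections import Counter

# Toy "embedding": word counts over a fixed vocabulary. A real deployment
# uses a learned embedding model that captures semantic meaning.
VOCAB = ["firewall", "packet", "encryption", "key", "tls", "filter"]

def embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for the vector database: (title, embedding, body) tuples
# with embeddings precomputed once at index time.
DOCS = [
    ("Firewalls", "a firewall applies a packet filter to traffic"),
    ("TLS", "tls uses encryption and a session key"),
]
DB = [(title, embed(body), body) for title, body in DOCS]

def retrieve(query: str, k: int = 1):
    # Rank stored documents by similarity to the query embedding.
    qv = embed(query)
    return sorted(DB, key=lambda d: cosine(qv, d[1]), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Combine the user's query with the retrieved documents as context,
    # which is what gets sent to the local LLM.
    context = "\n".join(f"[{t}] {body}" for t, _, body in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("how does a firewall filter a packet")
```

Because the query shares words only with the firewall document, that document (and its title, usable later as a citation) is the one injected into the prompt.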
QuizbotRecording.1.mp4
Create virtual environment
python3 -m venv venv
source venv/bin/activate - for MAC/Linux users
venv\Scripts\activate - for WINDOWS users
pip install langchain==0.3.6
pip install chromadb==0.5.11
pip install llama_models==0.0.47
pip install urllib3==2.2.3
pip install pypdf==5.1.0
pip install python-dotenv==1.0.1
pip install requests==2.32.3
pip install google-auth==2.35.0
pip install nltk==3.9.1
pip install pillow==10.4.0
pip install ipython==8.28.0
pip install PyPDF2==3.0.1
pip install zipp==3.20.2
pip install dataclasses-json==0.6.7
Step 1: Clone the repo git clone https://github.com/nlyson/CS5342Project
Step 2: Create virtual environment python3 -m venv venv
Step 3: Install dependencies pip install -r requirements.txt
Step 4: Pull the model (make sure Ollama is installed and running locally... Google 'how to run ollama locally' for help) ollama pull llama3.2
Step 5: Open VS Code code .
Step 6: Open the src/llama_train.ipynb notebook
Step 7: Run 'Dependencies' cell.
Step 8: Run 'Embed Documents' cell. (Use vectorstore = Chroma.from_documents(persist_directory=<path_to_local_dir>, documents=splits, embedding=embeddings) if you want to save your embeddings locally.)
Step 9: Run 'Initialize Llama locally' cell.
Step 10: Run 'Generate prompt with GUI using gradio' cell.
Step 11: Browse to http://127.0.0.1:7860 in your browser.
Step 12: In the text box, enter one of the following:
- Generate a multiple choice network security question and provide feedback to my answer.
- Generate a true/false network security question and provide feedback to my answer.
- Generate a short answer network security question and provide feedback to my answer.
Step 13: You can also be creative. Try telling the chatbot to give you questions, grade your answers, and give you feedback.
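Behind the GUI in steps 10-12, the notebook talks to the local Ollama server pulled in step 4. As a reference, here is a minimal sketch of the request payload for Ollama's /api/generate endpoint (the endpoint, default port 11434, and the model/prompt/stream fields follow Ollama's documented REST API; the helper function name is our own):

```python
import json

# Ollama serves a REST API on http://localhost:11434 by default.
# /api/generate accepts {"model", "prompt", "stream"}; with stream=False
# the whole completion comes back in a single JSON response.
def ollama_payload(prompt: str, model: str = "llama3.2") -> bytes:
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

payload = ollama_payload("Generate a true/false network security question.")
# Send with e.g.:
# requests.post("http://localhost:11434/api/generate", data=payload)
```

In the project itself this plumbing is handled by the notebook cells, but the sketch shows what a single question-generation request boils down to.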
Issue 1: We initially embedded and stored the preprocessed documents in a vector database. We used similarity matching to retrieve the results, but the results didn't come back in question/answer format.
Solution 1: We augmented the retrieved results with the llama3.2 model, which generates properly formatted question/answer content from them.
Issue 2: After giving questions and answers, the quizbot would not give citations.
Solution 2: We used the metadata stored with the embeddings in the vector database. After performing a similarity match and passing the results to the LLM, we saved the metadata (document and page number) and appended it to the results.
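Solution 2 can be sketched as follows. The data and helper names here are hypothetical; in the project, the metadata comes back from Chroma alongside each matched chunk rather than from a hard-coded list.

```python
# Each stored chunk carries metadata recorded at embedding time
# (source document title and page number).
chunks = [
    {"text": "A firewall filters packets.",
     "metadata": {"source": "Lecture 4", "page": 12}},
    {"text": "TLS provides encryption.",
     "metadata": {"source": "Stallings ch. 6", "page": 201}},
]

def answer_with_citations(llm_answer: str, retrieved: list[dict]) -> str:
    # Append the (document, page) metadata of the retrieved chunks
    # to the LLM's answer so the user can verify the sources.
    cites = ", ".join(
        f"{c['metadata']['source']} p.{c['metadata']['page']}" for c in retrieved
    )
    return f"{llm_answer}\n\nSources: {cites}"

out = answer_with_citations("Firewalls filter packets by rule.", chunks[:1])
```

The key point is that citations never pass through the LLM, so they cannot be hallucinated: they are copied verbatim from the vector database metadata.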
Issue 3: Overfitting in Quiz Logic: Overfitting can lead the quiz bot to perform well on familiar questions from its training data but may hinder its ability to handle new or varied quiz scenarios.
Solution 3: Improving the quiz bot’s generalization can be achieved by using a diverse and representative dataset, including a wide range of real-world question types, and regularly retraining the model with fresh data.
The bot generates quizzes on both random and specific topics based on user preferences, using its trained datasets. It includes various question types, such as multiple-choice, true or false, and open-ended questions. After the quiz, the bot provides feedback to the user with a score. The bot functions as both a quiz bot and a chatbot, allowing for a versatile and interactive experience.
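The scoring step at the end of a quiz can be sketched as below. This is a toy, exact-match grader with made-up question IDs; in the project the LLM itself judges answers, which is what lets it grade open-ended responses too.

```python
# Toy quiz grading: compare user answers against an answer key and
# produce per-question feedback plus an overall score in [0, 1].
def grade(answer_key: dict[str, str], user_answers: dict[str, str]):
    feedback = {}
    correct = 0
    for qid, expected in answer_key.items():
        given = user_answers.get(qid, "").strip().lower()
        ok = given == expected.lower()
        correct += ok
        feedback[qid] = "Correct" if ok else f"Incorrect (expected {expected})"
    return feedback, correct / len(answer_key)

fb, score = grade({"q1": "B", "q2": "True"}, {"q1": "b", "q2": "False"})
```

Here q1 is marked correct (case-insensitive match) and q2 incorrect, for a score of 0.5.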
Our bot is trained using lecture slides and the Network Security textbook (Network Security Essentials: Applications and Standards, Sixth Edition, by William Stallings).
To enhance the project's scalability, consider integrating real-time data updates for quizzes, ensuring that the bot stays current with new network security trends and topics. Additionally, implementing user performance tracking over time could provide valuable insights for personalized learning paths.
The proposal is clear and well-structured, with strong use of LLMs and vector databases. To improve, consider adding adaptive learning based on user performance. User testing during development will also help ensure the bot is effective and user-friendly.
