<a href="https://colab.research.google.com/github/muilyang12/database-design-class-quiz-generator/blob/main/Database_Design_Class_Quiz_Generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install langchain langchain-community langchainhub langchain-chroma langchain-openai pypdf

In [20]:
from google.colab import userdata

In [21]:
import os
from langchain_openai import ChatOpenAI

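# Read the API keys from Colab's secret store instead of hard-coding them.
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
# Turn on LangSmith tracing for the chain runs below.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')

# Chat model that writes the quiz questions.
llm = ChatOpenAI(model="gpt-4o-mini")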

In [22]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

pdf_paths = [
    "/content/Database Design Chapter 01.pdf",
    "/content/Database Design Chapter 02.pdf",
    "/content/Database Design Chapter 03.pdf",
    "/content/Database Design Chapter 04.pdf",
]

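# Load every page of every chapter PDF into a single list of documents.
all_documents = []
for path in pdf_paths:
    all_documents.extend(PyPDFLoader(path).load())

# Split the pages into overlapping chunks so each one fits easily in the prompt.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(all_documents)

# Embed the chunks and index them in an in-memory Chroma vector store.
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Expose the store as a retriever for the RAG chain.
retriever = vectorstore.as_retriever()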
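
To make the `chunk_size=1000` and `chunk_overlap=200` settings above more concrete, here is a minimal sketch of overlapping chunking using plain Python. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line boundaries). The helper name `sliding_window_chunks` and the sample text are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters.

    Simplified illustration only: the real splitter also respects separators.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 2500-character sample standing in for a PDF page.
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample, chunk_size=1000, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.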

In [24]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

system_prompt = (
    """
Your task is to write a well-crafted set of questions and answers for a test on a specific topic. The questions will be used to assess students' understanding of the material covered in a database design class. The retrieved context below contains the content covered in class; derive every question from this material.

<Question Examples>
Q: Which of the following is the navigational data model?
1. Network Model
2. Object Oriented Model
3. ER Model
4. None of these


<Context>
{context}
"""
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# Stuff the retrieved chunks into {context}, then answer with the LLM.
question_answer_chain = create_stuff_documents_chain(llm, prompt)
# Retrieve chunks relevant to {input} and pass them to the chain above.
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

inputs = [
    "Databases and Database Users",
    "Please make questions about Database System Concepts and Architecture",
    "Please make questions about Entity-Relationship (ER) Model.",
    "Please make questions about Enhanced Entity-Relationship (EER) Model.",
]

# Generate and print one quiz per topic.
for user_input in inputs:
    response = rag_chain.invoke({"input": user_input})
    print(response["answer"])
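
The answers printed by this cell are plain Markdown text. If the quiz needs to be stored or graded, it may help to parse that text into structured records. The sketch below assumes the model keeps the `**Question N:**` / `**Answer N:**` layout seen in the recorded output, which an LLM does not guarantee. `parse_quiz` is a hypothetical helper, not part of LangChain.

```python
import re

# Assumed header and option formats, taken from the recorded output.
HEADER = re.compile(r"^\*\*(Question|Answer) (\d+):\*\*$")
OPTION = re.compile(r"^\d+\.\s+(.*)$")


def parse_quiz(markdown: str) -> list[dict]:
    """Turn quiz Markdown into dicts with number, question, options, answer."""
    items: dict[int, dict] = {}
    section = None  # "Question" or "Answer"
    current = None  # record currently being filled
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line or line == "---":
            continue  # skip blank lines and separators
        header = HEADER.match(line)
        if header:
            section, number = header.group(1), int(header.group(2))
            current = items.setdefault(
                number,
                {"number": number, "question": "", "options": [], "answer": ""},
            )
            continue
        if current is None:
            continue  # introductory text before the first question
        if section == "Question":
            option = OPTION.match(line)
            if option:
                current["options"].append(option.group(1))
            else:
                current["question"] = (current["question"] + " " + line).strip()
        else:
            current["answer"] = (current["answer"] + " " + line).strip()
    return [items[n] for n in sorted(items)]


# Sample text copied from the first question of the recorded output.
sample = """Here is a set of questions and answers:

**Question 1:**
What are the two main categories of database users?
1. Actors on the Scene and Workers Behind the Scene
2. Database Administrators and End Users
3. Developers and Analysts
4. Data Scientists and Data Engineers

**Answer 1:**
1. Actors on the Scene and Workers Behind the Scene

---
"""
quiz = parse_quiz(sample)
print(quiz[0]["question"], len(quiz[0]["options"]))
```

In the notebook this would be called as `parse_quiz(response["answer"])`; output that departs from the assumed layout simply yields fewer or emptier records, so the result is worth checking before use.
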

Here is a set of questions and answers based on the topic of "Databases and Database Users" from your DB design class:

**Question 1:**
What are the two main categories of database users?
1. Actors on the Scene and Workers Behind the Scene
2. Database Administrators and End Users
3. Developers and Analysts
4. Data Scientists and Data Engineers

**Answer 1:**
1. Actors on the Scene and Workers Behind the Scene

---

**Question 2:**
Who belongs to the category of “Actors on the Scene” in the context of database users?
1. Database Administrators
2. Software Developers
3. Users who execute queries and control the database content
4. System Operators

**Answer 2:**
3. Users who execute queries and control the database content

---

**Question 3:**
Which of the following describes the role of “Workers Behind the Scene”?
1. They design and develop the applications that interact with the database.
2. They use the database to perform data analysis.
3. They maintain the hardware and software tha