
-----

# **`Interview Question Creator`**

----


### **Import Libraries**

In [83]:
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.text_splitter import TokenTextSplitter
from langchain.docstore.document import Document
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

### **Set Up OpenAI API Key**

In [54]:
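# Load variables from a local .env file into the process environment
load_dotenv()

# Re-export the key explicitly; note that this raises a TypeError if the key is not set
os.environ['OPENAI_API_KEY'] = os.getenv("OPENAI_API_KEY")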
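
If `OPENAI_API_KEY` is missing from both the shell environment and the `.env` file, `os.getenv` returns `None`, and assigning `None` into `os.environ` fails with a `TypeError` that does not name the real problem. A minimal sketch of a more explicit check, using only the standard library; `require_env` is a hypothetical helper name introduced for this notebook, not part of `python-dotenv` or LangChain:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper written for this notebook.
    """
    value = os.environ.get(name)
    if not value:
        # Treat both "unset" and "empty string" as missing
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value
```

One way to use it would be to call `require_env("OPENAI_API_KEY")` right after `load_dotenv()`, so a missing key is reported before any model is created.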

### **Load the Data**

In [55]:
file_path = r"E:\Practice python\Generative AI Materials\Langchain\Interview_question_creater_web_app\data\SDG.pdf"

loader = PyPDFLoader(file_path)

data = loader.load()

In [56]:
data

[Document(metadata={'source': 'E:\\Practice python\\Generative AI Materials\\Langchain\\Interview_question_creater_web_app\\data\\SDG.pdf', 'page': 0}, page_content=''),
 Document(metadata={'source': 'E:\\Practice python\\Generative AI Materials\\Langchain\\Interview_question_creater_web_app\\data\\SDG.pdf', 'page': 1}, page_content=''),
 Document(metadata={'source': 'E:\\Practice python\\Generative AI Materials\\Langchain\\Interview_question_creater_web_app\\data\\SDG.pdf', 'page': 2}, page_content='IN THE YEAR 2015, LEADERS FROM 193 COUNTRIES OF THE WORLD \nCAME TOGETHER TO FACE THE FUTURE.\nAnd what they saw was daunting. Famines. Drought. Wars. Plagues. Poverty. \nNot just in some faraway place, but in their own cities and towns and villages.\nThey knew things didn’t have to be this way. They knew we had enough  \nfood to feed the world, but that it wasn’t getting shared. They knew there \nwere medicines for HIV and other diseases, but they cost a lot. They knew  \nthat earthquakes

In [57]:
# Let's check Length of data
len(data)

24

### **Concatenate the content from multiple pages into a single string**

In [58]:
# Initialize an empty string to hold the source text used for question generation
question_gen = ""

# Iterate through each page in the data collection
for page in data:
    # Append the content of the current page to the question_gen string
    question_gen += page.page_content

# Output the final concatenated string of page contents
print(question_gen)

IN THE YEAR 2015, LEADERS FROM 193 COUNTRIES OF THE WORLD 
CAME TOGETHER TO FACE THE FUTURE.
And what they saw was daunting. Famines. Drought. Wars. Plagues. Poverty. 
Not just in some faraway place, but in their own cities and towns and villages.
They knew things didn’t have to be this way. They knew we had enough  
food to feed the world, but that it wasn’t getting shared. They knew there 
were medicines for HIV and other diseases, but they cost a lot. They knew  
that earthquakes and floods were inevitable, but that the high death  
tolls were not. 
They also knew that billions of people worldwide shared their hope for a 
better future.
So leaders from these countries created a plan called the Sustainable 
Development Goals (SDGs). This set of 17 goals imagines a future just 15 years 
off that would be rid of poverty and hunger, and safe from the worst effects of 
climate change. It’s an ambitious plan. 
But there’s ample evidence that we can succeed. In the past 15 years, the 
inte
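
The loop above joins pages with no separator, so the last word of one page can run into the first word of the next. A small sketch of an alternative, assuming a blank line between pages is acceptable for the downstream splitter; `join_pages` is a hypothetical helper, and `SimpleNamespace` objects stand in for LangChain `Document` objects so the example runs without any dependencies:

```python
from types import SimpleNamespace


def join_pages(pages, separator="\n\n"):
    """Join non-empty page texts with a separator so page boundaries stay visible.

    Works with any objects that expose a `page_content` string attribute.
    """
    texts = (page.page_content.strip() for page in pages)
    return separator.join(text for text in texts if text)


if __name__ == "__main__":
    # Stand-ins for loaded pages: one empty page and two pages with text
    demo_pages = [
        SimpleNamespace(page_content=""),
        SimpleNamespace(page_content="First page ends here"),
        SimpleNamespace(page_content="Second page starts here"),
    ]
    print(join_pages(demo_pages))
```

Skipping empty pages changes nothing for the model, but it keeps the separator from piling up where the PDF has blank or image-only pages.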

### **Create Chunks of Data**

In [59]:
# Create an instance of TokenTextSplitter for generating question splits
splitter_ques_gen = TokenTextSplitter(
    model_name="gpt-3.5-turbo",  # Specify the model to use for tokenizing text
    chunk_size=10000,              # Set the maximum size of each text chunk to 10,000 tokens
    chunk_overlap=200             # Allow an overlap of 200 tokens between consecutive chunks
)

In [60]:
# Use the splitter_ques_gen instance to split the concatenated question text into chunks
chunk_ques_gen = splitter_ques_gen.split_text(question_gen)

In [61]:
type(chunk_ques_gen)  # split_text returns a list of strings, so each chunk still has to be wrapped in a Document

list

In [62]:
# Create a list of Document objects, each initialized with a chunk of text
document_ques_gen = [Document(page_content=chunk) for chunk in chunk_ques_gen]

In [63]:
document_ques_gen

[Document(metadata={}, page_content='IN THE YEAR 2015, LEADERS FROM 193 COUNTRIES OF THE WORLD \nCAME TOGETHER TO FACE THE FUTURE.\nAnd what they saw was daunting. Famines. Drought. Wars. Plagues. Poverty. \nNot just in some faraway place, but in their own cities and towns and villages.\nThey knew things didn’t have to be this way. They knew we had enough  \nfood to feed the world, but that it wasn’t getting shared. They knew there \nwere medicines for HIV and other diseases, but they cost a lot. They knew  \nthat earthquakes and floods were inevitable, but that the high death  \ntolls were not. \nThey also knew that billions of people worldwide shared their hope for a \nbetter future.\nSo leaders from these countries created a plan called the Sustainable \nDevelopment Goals (SDGs). This set of 17 goals imagines a future just 15 years \noff that would be rid of poverty and hunger, and safe from the worst effects of \nclimate change. It’s an ambitious plan. \nBut there’s ample evidence 

In [64]:
type(document_ques_gen[0])

langchain_core.documents.base.Document

In [65]:
# Create an instance of TokenTextSplitter for generating answer splits
splitter_ans_gen = TokenTextSplitter(
    model_name="gpt-3.5-turbo",  # Specify the model to use for tokenizing text
    chunk_size=1000,              # Set the maximum size of each text chunk to 1000 tokens
    chunk_overlap=100             # Allow an overlap of 100 tokens between consecutive chunks
)
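
Both splitters rely on the same idea: a window of `chunk_size` tokens slides over the text, and each new window starts `chunk_size - chunk_overlap` tokens after the previous one. The sketch below illustrates that windowing on a plain Python list. `split_with_overlap` is a hypothetical helper written for this explanation, not LangChain's implementation, and it operates on any sequence rather than on real model tokens.

```python
from typing import List, Sequence


def split_with_overlap(tokens: Sequence, chunk_size: int, chunk_overlap: int) -> List[list]:
    """Split a token sequence into windows that share `chunk_overlap` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(list(tokens[start:end]))
        if end == len(tokens):
            break  # the last window reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    # Ten fake "tokens", windows of four, one token shared between neighbours
    for chunk in split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1):
        print(chunk)
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of embedding or prompting a few tokens twice.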

In [66]:
# Use the splitter_ans_gen instance to split the list of Document objects into smaller segments
document_ans_gen = splitter_ans_gen.split_documents(document_ques_gen)

In [67]:
document_ans_gen

[Document(metadata={}, page_content='IN THE YEAR 2015, LEADERS FROM 193 COUNTRIES OF THE WORLD \nCAME TOGETHER TO FACE THE FUTURE.\nAnd what they saw was daunting. Famines. Drought. Wars. Plagues. Poverty. \nNot just in some faraway place, but in their own cities and towns and villages.\nThey knew things didn’t have to be this way. They knew we had enough  \nfood to feed the world, but that it wasn’t getting shared. They knew there \nwere medicines for HIV and other diseases, but they cost a lot. They knew  \nthat earthquakes and floods were inevitable, but that the high death  \ntolls were not. \nThey also knew that billions of people worldwide shared their hope for a \nbetter future.\nSo leaders from these countries created a plan called the Sustainable \nDevelopment Goals (SDGs). This set of 17 goals imagines a future just 15 years \noff that would be rid of poverty and hunger, and safe from the worst effects of \nclimate change. It’s an ambitious plan. \nBut there’s ample evidence 

### **Define LLM**

In [68]:
# Create an instance of ChatOpenAI for question generation
llm_ques_gen_pipeline = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # Specify the model to use for generating questions
    temperature=0.3,              # Set the randomness of the output; lower values make the output more deterministic
)

### **Define Prompt Template**

In [69]:
# Define a prompt template for generating questions based on coding materials
prompt_template = '''You are an expert at creating questions based on coding materials and documentation. 
Your goal is to prepare a coder or programmer for their exam and coding interview by asking questions about the text below:

------------------------------------------------------------------------------------------------
{text}  
------------------------------------------------------------------------------------------------

Create questions that will prepare the coders or programmers for their exam, ensuring they do not lose any important information.
Questions:
'''

In [70]:
# Create an instance of PromptTemplate for generating questions using the defined prompt_template
PROMPT_QUESTION = PromptTemplate(
    template=prompt_template,        # Use the previously defined prompt template
    input_variables=["text"],        # Specify the input variable that will be replaced in the template
)

In [71]:
# Define a template for refining practice questions based on coding materials and documentation.
# {existing_answer} holds the questions generated so far; {text} holds the next chunk of context.
# Everything inside the string is sent to the model, so explanatory comments are kept outside it.
refine_template = ("""
You are an expert at creating practice questions based on coding material and documentation.
Your goal is to help a coder or programmer prepare for a coding test.
We have received some practice questions to a certain extent: {existing_answer}.
We have the option to refine the existing questions or add new ones
(only if necessary) with some more context below.
------------
{text}
------------

Given the new context, refine the original questions in English.
If the context is not helpful, please provide the original questions.
QUESTIONS:
"""
)

In [72]:
# Create an instance of PromptTemplate for refining practice questions using the defined refine_template
REFINE_PROMPT_QUESTION = PromptTemplate(
    input_variables=["existing_answer", "text"],  # Specify the input variables to be used in the template
    template=refine_template,                       # Use the previously defined refine template for question refinement
)

In [73]:
# Load a summarization chain for generating questions and refining them
ques_gen_chain = load_summarize_chain(
    llm=llm_ques_gen_pipeline,            # Specify the language model pipeline for question generation
    chain_type="refine",                  # Set the type of chain to "refine" for improving existing questions
    verbose=True,                         # Enable verbose output for detailed logging during execution
    question_prompt=PROMPT_QUESTION,      # Use the defined prompt template for generating questions
    refine_prompt=REFINE_PROMPT_QUESTION   # Use the defined prompt template for refining questions
)
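
The idea behind `chain_type="refine"` can be shown without any LLM: the first chunk is formatted into the question prompt, and every later chunk is formatted into the refine prompt together with the answer produced so far. The following is a simplified sketch of that control flow, not LangChain's implementation; `refine_over_chunks` and the `llm` callable are hypothetical names, and the templates in the demo are shortened placeholders.

```python
from typing import Callable, List


def refine_over_chunks(
    chunks: List[str],
    llm: Callable[[str], str],
    question_template: str,
    refine_template: str,
) -> str:
    """Run an initial prompt on the first chunk, then refine the result chunk by chunk."""
    if not chunks:
        raise ValueError("need at least one chunk")

    # First pass: only {text} is available
    answer = llm(question_template.format(text=chunks[0]))

    # Later passes: feed the running answer back in together with the next chunk
    for chunk in chunks[1:]:
        answer = llm(refine_template.format(existing_answer=answer, text=chunk))
    return answer


if __name__ == "__main__":
    # A fake "LLM" that only reports the prompt length, so the example runs offline
    def fake_llm(prompt: str) -> str:
        return f"<model saw {len(prompt)} characters>"

    print(
        refine_over_chunks(
            ["chunk one", "chunk two"],
            fake_llm,
            "Write questions about: {text}",
            "Existing: {existing_answer}\nNew context: {text}\nRefined questions:",
        )
    )
```

Because each step depends on the previous answer, the calls run strictly in sequence: one model call per chunk.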

### **Generate Questions**

In [75]:
# Execute the question generation and refinement process on the list of documents
ques = ques_gen_chain.run(document_ques_gen)

# Print the generated questions
print(ques)

> Entering new RefineDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
You are an expert at creating questions based on coding materials and documentation. 
Your goal is to prepare a coder or programmer for their exam and coding interview by asking questions about the text below:

------------------------------------------------------------------------------------------------
IN THE YEAR 2015, LEADERS FROM 193 COUNTRIES OF THE WORLD 
CAME TOGETHER TO FACE THE FUTURE.
And what they saw was daunting. Famines. Drought. Wars. Plagues. Poverty. 
Not just in some faraway place, but in their own cities and towns and villages.
They knew things didn’t have to be this way. They knew we had enough  
food to feed the world, but that it wasn’t getting shared. They knew there 
were medicines for HIV and other diseases, but they cost a lot. They knew  
that earthquakes and floods were inevitable, but that the high death  
tolls were not

### **Load OpenAI model to Create Embeddings**

In [77]:
embeddings = OpenAIEmbeddings()

### **Set Up FAISS Vector Database**

- Vectors will be stored in memory.

In [79]:
# Create a FAISS vector store from the answer-generation chunks, embedding each chunk with the OpenAI embeddings model
vector_store = FAISS.from_documents(document_ans_gen, embeddings)
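
Conceptually, an in-memory vector store keeps one vector per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The toy sketch below shows that lookup with a brute-force Euclidean (L2) search over made-up two-dimensional vectors. It is an illustration only: `nearest` and `l2_distance` are hypothetical helpers, real embeddings have far more dimensions, and FAISS uses its own index structures rather than this loop.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    texts: Sequence[str],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (distance, text) pairs closest to the query vector."""
    scored = sorted(
        ((l2_distance(query, vec), text) for vec, text in zip(vectors, texts)),
        key=lambda pair: pair[0],
    )
    return scored[:k]


if __name__ == "__main__":
    # Made-up vectors standing in for embeddings of three chunks
    vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
    texts = ["chunk about poverty", "chunk about hunger", "chunk about energy"]
    for distance, text in nearest([0.9, 0.0], vectors, texts, k=2):
        print(round(distance, 3), text)
```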

### **Set Up OpenAI Model for Generating Answer**

In [80]:
# Create an instance of ChatOpenAI for generating answers with specific parameters
llm_answer_gen = ChatOpenAI(
    temperature=0.1,                # Set the randomness of the output; lower values make the output more deterministic
    model="gpt-3.5-turbo"           # Specify the model to use for generating answers
)

In [82]:
question_list = ques.split("\n") # Questions are separated by new lines
question_list

['1. What is the name of the plan created by leaders from 193 countries in 2015 to address global challenges such as poverty, hunger, and climate change?',
 '2. How many Sustainable Development Goals (SDGs) are outlined in the plan?',
 '3. What is the goal of the SDG related to ending extreme poverty by 2030?',
 '4. How has hunger decreased globally in the past 20 years, and what actions can be taken to further reduce hunger and malnutrition?',
 '5. What progress has been made in achieving gender equality and empowering women and girls, and what are the remaining challenges?',
 '6. Why is it important to ensure access to clean drinking water and sanitation for all by 2030?',
 '7. How has access to electricity improved globally between 1990 and 2010, and what are the challenges associated with meeting energy needs sustainably?',
 '8. What are some key strategies outlined in the SDGs to promote economic growth, full employment, and decent work for all?',
 '9. How can technological progre
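
Splitting on `"\n"` keeps every line, including blank ones, and each blank entry would later be sent to the answer chain as if it were a question. A minimal sketch of a stricter parser, assuming the model returns one question per line; `parse_questions` is a hypothetical helper name:

```python
from typing import List


def parse_questions(raw: str) -> List[str]:
    """Split model output into questions, dropping blank or whitespace-only lines."""
    lines = (line.strip() for line in raw.splitlines())
    return [line for line in lines if line]


if __name__ == "__main__":
    sample = "1. What are the SDGs?\n\n2. How many goals are there?\n   \n"
    print(parse_questions(sample))
```

With this helper, `question_list = parse_questions(ques)` could replace the plain `split("\n")` call.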

### **Define Retrieval Q/A Chain**

In [84]:
# Create a RetrievalQA chain for generating answers using a specified language model and retrieval method
answer_generation_chain = RetrievalQA.from_chain_type(
    llm=llm_answer_gen,                       # Specify the language model to use for generating answers
    chain_type="stuff",                       # "stuff" places all retrieved documents into a single prompt
    retriever=vector_store.as_retriever()     # Use the FAISS vector store as the retriever for fetching relevant documents
)
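
The "stuff" strategy is the simplest way to combine retrieved chunks: they are concatenated and placed into one prompt alongside the question. A rough sketch of that step is shown below. `build_stuff_prompt` and `STUFF_TEMPLATE` are hypothetical names, and the template text is a placeholder written for illustration; LangChain ships its own default prompt.

```python
from typing import Sequence

# Hypothetical template for illustration; not the prompt LangChain actually uses
STUFF_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(
    question: str,
    contexts: Sequence[str],
    template: str = STUFF_TEMPLATE,
) -> str:
    """Place every retrieved chunk into one prompt, separated by blank lines."""
    context = "\n\n".join(contexts)
    return template.format(context=context, question=question)


if __name__ == "__main__":
    print(
        build_stuff_prompt(
            "How many SDGs are there?",
            ["This set of 17 goals imagines a future...", "Leaders from 193 countries..."],
        )
    )
```

This approach needs only one model call per question, but the retrieved chunks must fit in the model's context window together, which is one reason the answer chunks are kept much smaller than the question-generation chunks.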

### **Answer each question from the list and save the results to a file**

In [85]:
for question in question_list:
    print("Question: ", question)  # Print the current question to the console
    answer = answer_generation_chain.run(question)  # Generate an answer for the current question
    print("Answer: ", answer)  # Print the generated answer to the console
    print("--------------------------------------------------\n\n")  # Print a separator followed by blank lines

    # Save the question and answer to a file
    with open("answers.txt", "a") as f:  # Open the file in append mode
        f.write("Question: " + question + "\n")  # Write the question to the file
        f.write("Answer: " + answer + "\n")  # Write the answer to the file
        f.write("--------------------------------------------------\n\n")  # Write a separator in the file

Question:  1. What is the name of the plan created by leaders from 193 countries in 2015 to address global challenges such as poverty, hunger, and climate change?
Answer:  The plan created by leaders from 193 countries in 2015 to address global challenges such as poverty, hunger, and climate change is called the Sustainable Development Goals (SDGs).
--------------------------------------------------


Question:  2. How many Sustainable Development Goals (SDGs) are outlined in the plan?
Answer:  There are 17 Sustainable Development Goals (SDGs) outlined in the plan.
--------------------------------------------------


Question:  3. What is the goal of the SDG related to ending extreme poverty by 2030?
Answer:  The goal related to ending extreme poverty by 2030 is to "End extreme poverty in all forms by 2030."
--------------------------------------------------


Question:  4. How has hunger decreased globally in the past 20 years, and what actions can be taken to further reduce hun
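
The file-writing part of the loop can also be pulled into a small function that opens the file once and writes real newline characters. This is a sketch under the assumption that question and answer pairs are collected first; `save_qa_pairs` is a hypothetical helper, and the demo writes to a temporary directory rather than to `answers.txt`.

```python
import os
import tempfile
from typing import Iterable, Tuple


def save_qa_pairs(
    pairs: Iterable[Tuple[str, str]],
    path: str,
    separator: str = "-" * 50,
) -> None:
    """Append question/answer pairs to a text file, one block per pair."""
    with open(path, "a", encoding="utf-8") as f:  # append mode keeps earlier results
        for question, answer in pairs:
            f.write(f"Question: {question}\n")
            f.write(f"Answer: {answer}\n")
            f.write(separator + "\n\n")


if __name__ == "__main__":
    # Write a single made-up pair to a temporary file and show the result
    demo_path = os.path.join(tempfile.mkdtemp(), "answers_demo.txt")
    save_qa_pairs([("What does SDG stand for?", "Sustainable Development Goals")], demo_path)
    with open(demo_path, encoding="utf-8") as f:
        print(f.read())
```

Passing an explicit `encoding="utf-8"` avoids platform-dependent defaults, which matters here because the source text contains typographic apostrophes.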

----

**The End**

-------