
### Checked other resources #21403

Closed · 4 tasks done
hubblehox opened this issue May 8, 2024 · 0 comments
Labels
Ɑ: doc loader Related to document loader module (not documentation) 🔌: openai Primarily related to OpenAI integrations Ɑ: parsing Related to output parser module

Checked other resources

  • I added a very descriptive title to this question.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

from langchain_community.document_loaders import PDFMinerLoader
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
import json

class MCQGenerator:
    def __init__(self, pdf_path, model_name, num_questions):
        self.loader = PDFMinerLoader(pdf_path)
        self.model_name = model_name
        self.num_questions = num_questions

    def load_and_clean_document(self):
        # Join the text of every loaded page and collapse newlines into spaces.
        # (The original iterated over the characters of the first page only.)
        data = self.loader.load()
        cleaned_pages = [doc.page_content.replace('\n', ' ') for doc in data]
        self.cleaned_docs = " ".join(cleaned_pages)
        print("...........PDF data extracted...........")
        print(self.cleaned_docs)
        print("...........PDF data extracted...........")

    def create_mcq_model(self):
        class Mcq(BaseModel):
            strand: str
            sub_strand: str
            topic: str
            learning_objective_1: str
            learning_objective_2: str
            learning_objective_3: str
            question: str
            options_a: str
            options_b: str
            options_c: str
            options_d: str
            correct_answer: str
            answer_explanation: str
            blooms_taxonomy_level: str

        self.parser = JsonOutputParser(pydantic_object=Mcq)
        self.model = ChatOpenAI(model_name=self.model_name, temperature=0)

    def define_prompt_template(self):
        system_message = f"""I'll help you generate {self.num_questions} multiple-choice questions (MCQs) with specific criteria. Here's the task breakdown for clarity:

            1. Question Criteria:
            i. Each MCQ will have four options, including one correct answer. The options "None of the above" and "All of the above" are not to be used.
            ii. An explanation will be provided for why the selected answer is correct.

            2. Content Requirements:
            i. The questions should assess a teacher's analytical, computational, and logical thinking skills alongside their knowledge. Each question must integrate these components.
            ii. The questions should be distinct and cover different concepts without repetition.

            3. Learning Objectives:
            i. Each question will include multiple learning objectives derived from the question and its options.

            4. Taxonomy Levels:
            i. Questions will be aligned with specific levels of Bloom's Taxonomy: Understand, Apply, and Analyze.

            The output must be formatted as JSON with the keys: strand, sub_strand, topic, learning_objective_1, learning_objective_2, learning_objective_3,
            question, options_a, options_b, options_c, options_d, correct_answer, answer_explanation, blooms_taxonomy_level
            """
         
        chat_template = ChatPromptTemplate.from_messages(
            [
                SystemMessage(content=system_message),
                HumanMessagePromptTemplate.from_template("You must generate {num_questions} multiple-choice questions from the following text: {text}"),
            ]
        )
        self.chat_template = chat_template

    def generate_mcqs(self):
        chain = self.chat_template | self.model | self.parser
        print("..................Chain is Running...........")
        results = chain.invoke({"num_questions": self.num_questions, "text": self.cleaned_docs})
        return results

    def save_results_to_json(self, results, file_path):
        print("Json printing")
        json_string = json.dumps(results, skipkeys=True, allow_nan=True, indent=4)
        with open(file_path, "w") as outfile:
            outfile.write(json_string)

# Example usage
if __name__ == "__main__":
    pdf_path = "FDT_C1_M1_SU1.pdf"
    file_path = r'F:\Company_Data\15_teacher_tagging\Tagging\Json\lang_out_13.json'
    model_name = "gpt-4-turbo-2024-04-09"
    num_questions = 13

    generator = MCQGenerator(pdf_path, model_name, num_questions)
    generator.load_and_clean_document()
    generator.create_mcq_model()
    generator.define_prompt_template()
    results = generator.generate_mcqs()
    generator.save_results_to_json(results, file_path)
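Since the chain's JSON output is written straight to disk, a small validation step can catch truncated or malformed questions before saving. This is a sketch, not part of the original code: `validate_mcqs` is a hypothetical helper, and it assumes the chain returns a list of dicts whose keys mirror the `Mcq` model fields.

```python
# Sketch: validate parsed MCQ dicts before writing them to a JSON file.
# EXPECTED_KEYS mirrors the fields of the Mcq Pydantic model above.
EXPECTED_KEYS = {
    "strand", "sub_strand", "topic",
    "learning_objective_1", "learning_objective_2", "learning_objective_3",
    "question", "options_a", "options_b", "options_c", "options_d",
    "correct_answer", "answer_explanation", "blooms_taxonomy_level",
}

def validate_mcqs(results):
    """Return only the items that contain every expected key."""
    if isinstance(results, dict):  # a single question may come back as one dict
        results = [results]
    return [item for item in results if EXPECTED_KEYS <= set(item)]
```

Dropping incomplete items (rather than raising) keeps a long generation run from failing because one question was cut off mid-response.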

Description

a) I want to generate more than 20 MCQs from the provided PDF:
FDT_C1_M1_SU1.pdf

b) The code is able to generate 12 MCQs from the PDF, but I want to generate more than 25:
lang_out_13.json

c) I have attached the code above for reference.
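One likely reason the chain stops around 12 questions is the model's completion-token limit: every MCQ must fit inside a single response, and a JSON object per question is token-heavy. A common workaround is to request the questions in several smaller calls and merge the results. The sketch below is a hypothetical batching loop, not LangChain API: `generate_batch` stands in for a wrapper around `chain.invoke` that passes a per-call `num_questions`.

```python
def batch_sizes(total, batch_size=10):
    """Split a requested question count into per-call batch sizes."""
    sizes = [batch_size] * (total // batch_size)
    if total % batch_size:
        sizes.append(total % batch_size)
    return sizes

def generate_in_batches(generate_batch, total, batch_size=10):
    """Call generate_batch(n) once per batch and concatenate the results.

    generate_batch is a hypothetical callable, e.g.
    lambda n: chain.invoke({"num_questions": n, "text": cleaned_docs})
    """
    results = []
    for n in batch_sizes(total, batch_size):
        results.extend(generate_batch(n))
    return results
```

For example, `generate_in_batches(generate_batch, 25)` would make three calls (10, 10, and 5 questions), each comfortably inside the output-token budget. Deduplicating questions across batches may still be needed, since each call is independent.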

System Info

pdfminer.six
langchain_community
langchain_openai
langchain_core
ipykernel
openpyxl

OS: Windows
Python version: 3.11

aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
asttokens==2.4.1
attrs==23.2.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
comm==0.2.2
cryptography==42.0.5
dataclasses-json==0.6.4
debugpy==1.8.1
decorator==5.1.1
distro==1.9.0
et-xmlfile==1.1.0
executing==2.0.1
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
ipykernel==6.29.4
ipython==8.23.0
jedi==0.19.1
jsonpatch==1.33
jsonpointer==2.4
jupyter_client==8.6.1
jupyter_core==5.7.2
langchain-community==0.0.34
langchain-core==0.1.46
langchain-openai==0.1.3
langsmith==0.1.51
marshmallow==3.21.1
matplotlib-inline==0.1.7
multidict==6.0.5
mypy-extensions==1.0.0
nest-asyncio==1.6.0
numpy==1.26.4
openai==1.23.6
openpyxl==3.1.2
orjson==3.10.1
packaging==23.2
pandas==2.2.2
parso==0.8.4
pdfminer.six==20231228
platformdirs==4.2.1
prompt-toolkit==3.0.43
psutil==5.9.8
pure-eval==0.2.2
pycparser==2.22
pydantic==2.7.1
pydantic_core==2.18.2
Pygments==2.17.2
python-dateutil==2.9.0.post0
pytz==2024.1
pywin32==306
PyYAML==6.0.1
pyzmq==26.0.2
regex==2024.4.16
requests==2.31.0
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.29
stack-data==0.6.3
tenacity==8.2.3
tiktoken==0.6.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.3
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
wcwidth==0.2.13
yarl==1.9.4

Originally posted by @Umeshbalande in #21013

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) Ɑ: parsing Related to output parser module 🔌: openai Primarily related to OpenAI integrations labels May 8, 2024
@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 7, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 14, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 14, 2024