
### Checked other resources #21403

Closed · 4 tasks done
hubblehox opened this issue May 8, 2024 · 0 comments
Labels
Ɑ: doc loader Related to document loader module (not documentation) 🔌: openai Primarily related to OpenAI integrations Ɑ: parsing Related to output parser module

Checked other resources

  • I added a very descriptive title to this question.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

from langchain_community.document_loaders import PDFMinerLoader
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
import json

class MCQGenerator:
    def __init__(self, pdf_path, model_name, num_questions):
        self.loader = PDFMinerLoader(pdf_path)
        self.model_name = model_name
        self.num_questions = num_questions

    def load_and_clean_document(self):
        # Join the text of every loaded page and collapse newlines into spaces.
        # (The original iterated over the characters of the first page only.)
        data = self.loader.load()
        cleaned_pages = [doc.page_content.replace('\n', ' ') for doc in data]
        self.cleaned_docs = " ".join(cleaned_pages)
        print("...........PDF data extracted...........")
        print(self.cleaned_docs)
        print("...........PDF data extracted...........")

    def create_mcq_model(self):
        class Mcq(BaseModel):
            strand: str
            sub_strand: str
            topic: str
            learning_objective_1: str
            learning_objective_2: str
            learning_objective_3: str
            question: str
            options_a: str
            options_b: str
            options_c: str
            options_d: str
            correct_answer: str
            answer_explanation: str
            blooms_taxonomy_level: str

        self.parser = JsonOutputParser(pydantic_object=Mcq)
        self.model = ChatOpenAI(model_name=self.model_name, temperature=0)

    def define_prompt_template(self):
        system_message = f"""I'll help you generate {self.num_questions} multiple-choice questions (MCQs) with specific criteria. Here's the task breakdown for clarity:

            1. Question Criteria:
            i. Each MCQ will have four options, including one correct answer. The options "None of the above" and "All of the above" are not to be used.
            ii. An explanation will be provided for why the selected answer is correct.

            2. Content Requirements:
            i. The questions should assess a teacher's analytical, computational, and logical thinking skills alongside their knowledge. Each question must integrate these components.
            ii. The questions should be distinct and cover different concepts without repetition.

            3. Learning Objectives:
            i. Each question will include multiple learning objectives derived from the question and its options.

            4. Taxonomy Levels:
            i. Questions will be aligned with specific levels of Bloom's Taxonomy: Understand, Apply, and Analyze.

            The output must be formatted as JSON with the keys: strand, sub_strand, topic, learning_objective_1, learning_objective_2, learning_objective_3,
            question, options_a, options_b, options_c, options_d, correct_answer, answer_explanation, blooms_taxonomy_level
            """
         
        chat_template = ChatPromptTemplate.from_messages(
            [
                SystemMessage(content=system_message),
                HumanMessagePromptTemplate.from_template("You must generate {num_questions} multiple-choice questions from the following text: {text}"),
            ]
        )
        self.chat_template = chat_template

    def generate_mcqs(self):
        chain = self.chat_template | self.model | self.parser
        print("..................Chain is Running...........")
        results = chain.invoke({"num_questions": self.num_questions, "text": self.cleaned_docs})
        return results

    def save_results_to_json(self, results, file_path):
        print("Json printing")
        json_string = json.dumps(results, skipkeys=True, allow_nan=True, indent=4)
        with open(file_path, "w") as outfile:
            outfile.write(json_string)

# Example usage
if __name__ == "__main__":
    pdf_path = "FDT_C1_M1_SU1.pdf"
    file_path = r'F:\Company_Data\15_teacher_tagging\Tagging\Json\lang_out_13.json'
    model_name = "gpt-4-turbo-2024-04-09"
    num_questions = 13

    generator = MCQGenerator(pdf_path, model_name, num_questions)
    generator.load_and_clean_document()
    generator.create_mcq_model()
    generator.define_prompt_template()
    results = generator.generate_mcqs()
    generator.save_results_to_json(results, file_path)
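Since the chain's JSON output is written straight to disk, a small validation step can catch truncated or malformed questions before saving. This is a sketch, not part of the original code: `validate_mcqs` is a hypothetical helper, and it assumes the chain returns a list of dicts whose keys mirror the `Mcq` model fields.

```python
# Sketch: validate parsed MCQ dicts before writing them to a JSON file.
# EXPECTED_KEYS mirrors the fields of the Mcq Pydantic model above.
EXPECTED_KEYS = {
    "strand", "sub_strand", "topic",
    "learning_objective_1", "learning_objective_2", "learning_objective_3",
    "question", "options_a", "options_b", "options_c", "options_d",
    "correct_answer", "answer_explanation", "blooms_taxonomy_level",
}

def validate_mcqs(results):
    """Return only the items that contain every expected key."""
    if isinstance(results, dict):  # a single question may come back as one dict
        results = [results]
    return [item for item in results if EXPECTED_KEYS <= set(item)]
```

Dropping incomplete items (rather than raising) keeps a long generation run from failing because one question was cut off mid-response.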

Description

a) I want to generate more than 20 MCQs from the provided PDF:
FDT_C1_M1_SU1.pdf

b) The code is able to generate 12 MCQs from the PDF, but I want to generate more than 25:
lang_out_13.json

c) I have attached the code above for reference.
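One likely reason the chain stops around 12 questions is the model's completion-token limit: every MCQ must fit inside a single response, and a JSON object per question is token-heavy. A common workaround is to request the questions in several smaller calls and merge the results. The sketch below is a hypothetical batching loop, not LangChain API: `generate_batch` stands in for a wrapper around `chain.invoke` that passes a per-call `num_questions`.

```python
def batch_sizes(total, batch_size=10):
    """Split a requested question count into per-call batch sizes."""
    sizes = [batch_size] * (total // batch_size)
    if total % batch_size:
        sizes.append(total % batch_size)
    return sizes

def generate_in_batches(generate_batch, total, batch_size=10):
    """Call generate_batch(n) once per batch and concatenate the results.

    generate_batch is a hypothetical callable, e.g.
    lambda n: chain.invoke({"num_questions": n, "text": cleaned_docs})
    """
    results = []
    for n in batch_sizes(total, batch_size):
        results.extend(generate_batch(n))
    return results
```

For example, `generate_in_batches(generate_batch, 25)` would make three calls (10, 10, and 5 questions), each comfortably inside the output-token budget. Deduplicating questions across batches may still be needed, since each call is independent.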

System Info

pdfminer.six
langchain_community
langchain_openai
langchain_core
ipykernel
openpyxl

OS: Windows
Python version: 3.11

aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
asttokens==2.4.1
attrs==23.2.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
comm==0.2.2
cryptography==42.0.5
dataclasses-json==0.6.4
debugpy==1.8.1
decorator==5.1.1
distro==1.9.0
et-xmlfile==1.1.0
executing==2.0.1
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
ipykernel==6.29.4
ipython==8.23.0
jedi==0.19.1
jsonpatch==1.33
jsonpointer==2.4
jupyter_client==8.6.1
jupyter_core==5.7.2
langchain-community==0.0.34
langchain-core==0.1.46
langchain-openai==0.1.3
langsmith==0.1.51
marshmallow==3.21.1
matplotlib-inline==0.1.7
multidict==6.0.5
mypy-extensions==1.0.0
nest-asyncio==1.6.0
numpy==1.26.4
openai==1.23.6
openpyxl==3.1.2
orjson==3.10.1
packaging==23.2
pandas==2.2.2
parso==0.8.4
pdfminer.six==20231228
platformdirs==4.2.1
prompt-toolkit==3.0.43
psutil==5.9.8
pure-eval==0.2.2
pycparser==2.22
pydantic==2.7.1
pydantic_core==2.18.2
Pygments==2.17.2
python-dateutil==2.9.0.post0
pytz==2024.1
pywin32==306
PyYAML==6.0.1
pyzmq==26.0.2
regex==2024.4.16
requests==2.31.0
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.29
stack-data==0.6.3
tenacity==8.2.3
tiktoken==0.6.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.3
typing-inspect==0.9.0
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
wcwidth==0.2.13
yarl==1.9.4

Originally posted by @Umeshbalande in #21013

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) Ɑ: parsing Related to output parser module 🔌: openai Primarily related to OpenAI integrations labels May 8, 2024
@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 7, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 14, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 14, 2024