<img src="https://github.com/Shubhwithai/GRE_Geometry_quiz/blob/main/Group%2042.png?raw=true" width="" height="50">

Educhain is a powerful Python package that leverages Generative AI to create
engaging and personalized educational content. From generating multiple-choice questions to crafting comprehensive lesson plans, Educhain makes it easy to apply AI in various educational scenarios.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1JjMkDqxsi9lfdEn_Rptn3NTT45z6mTu1?usp=sharing)




## **Bulk Generation Questions using Educhain**  

Bulk Generation Questions using Educhain is a powerful feature that allows educators to quickly create large sets of high-quality questions for exams, quizzes, and practice sessions. With automated generation based on subject, topic, and difficulty level, it helps streamline content creation, saving time and ensuring comprehensive coverage of learning objectives.

###**Setup and Installation**

In [None]:
!pip install educhain langchain_anthropic

###**Setup API Keys**

In [2]:
from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["ANTHROPIC_API_KEY"] = userdata.get("ANTHROPIC_API_KEY")
os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

###**Generate Bulk Questions with Educhain**

In [7]:
from langchain_openai import ChatOpenAI
from educhain import Educhain, LLMConfig
import json
import os


openai = ChatOpenAI(model="gpt-4o") #For Best Quality Use gpt-4o Model

openai_config = LLMConfig(custom_model=openai)

client = Educhain(openai_config)


# Define your Topics Data in Json Format
example_topics_data = [
    {
        "topic": "Mathematics",
        "subtopics": [
            {
                "name": "Fractions",
                "learning_objectives": [
                    "Convert proper fractions to improper fractions and mixed numbers",
                    "Add and subtract fractions with like denominators",
                    "Find equivalent fractions using multiplication and division"
                ]
            },
        ]
    }
]

topic_json_path = "topics.json"

# Save example data to file
with open(topic_json_path, 'w') as f:
  json.dump(example_topics_data, f, indent=4)

# Generate questions with total questions specified
result, output_file, total_generated, failed_batches = client.qna_engine.bulk_generate_questions(
        topic=topic_json_path,
        # total_questions=10,
        questions_per_objective=10,
        max_workers=5,
        output_format="json",
        max_retries=2,
        difficulty="medium"
    )

print(f"\nGeneration completed!")
print(result.json)

Created CSV file for continuous saving: questions_20250416_113815.csv


Generating Multiple Choice questions: 100%|██████████| 3/3 [00:27<00:00,  9.30s/it]

Questions saved to JSON: questions_20250416_113815.json

--- Generation Summary ---
Total Learning Objectives: 3
Target Total Questions: 30
Base Questions per Objective: 10 (plus 0 objectives with +1)
Total Questions Generated: 30
Duplicate Questions Detected: 0
Failed Batches: 0
Partial Success Batches: 0
Average Questions per Successful Batch: 10.00
Questions continuously saved to: questions_20250416_113815.csv

Generation completed!
<bound method BaseModel.json of BulkMCQList(questions=[BulkMCQ(question='What is the value of the expression 5 + 3 × 2?', options=[Option(text='11', correct='true'), Option(text='16', correct='false'), Option(text='13', correct='false'), Option(text='10', correct='false')], explanation='According to the order of operations (PEMDAS/BODMAS), multiplication comes before addition. Thus, 3 × 2 = 6, and then 5 + 6 = 11.', difficulty='easy', metadata={'topic': 'Arithmetic', 'subtopic': 'Order of Operations', 'learning_objective': 'Understand and apply the order




###**Generate Bulk Questions With Diffrent Question Types**



- ✅ Multiple Choice

- ✅ Fill in the blanks

- ✅ Short Answer

- ✅ True/False Questions


In [None]:
import json
import os
from educhain import Educhain

client = Educhain()

example_topics_data = [
      {
        "topic": "Indian History",
        "subtopics": [
            {
                "name": "Ancient India",
                "learning_objectives": [
                    "Understand the Indus Valley Civilization.",
                    "Learn about the Vedic Period.",
                    "Study the Mauryan Empire and its administration.",
                    "Know about the Gupta Empire and its contributions."
                ]
            },
            {
                "name": "Modern India",
                "learning_objectives": [
                    "Study the arrival of Europeans in India.",
                    "Learn about the British Raj and its policies.",
                    "Understand the Indian National Movement.",
                    "Know about the Indian Independence and Partition."
                ]
            }
        ]
    },
]

topic_json_path = "topics.json"

# Save example data to file
with open(topic_json_path, 'w') as f:
  json.dump(example_topics_data, f, indent=4)

# Generate questions with total questions specified
result, output_file, total_generated, failed_batches = client.qna_engine.bulk_generate_questions(
        topic=topic_json_path,
        questions_per_objective=10,
        max_workers=5,
        output_format="pdf",
        max_retries=2,
        difficulty="medium",
        batch_size=5,
        question_type="True/False", # #supported types : "Multiple Choice", "Short Answer", "True/False", "Fill in the Blank"
)
print(f"\nGeneration completed!")
result.dict()

###**Generate Bulk Questions Using Json File Input**


In [None]:
from langchain_openai import ChatOpenAI
from educhain import Educhain, LLMConfig
import json
import os

client = Educhain()

topic_json_path = "/content/topics_Neet.json" # Enter Your File Path

# Generate questions with total questions specified
result, output_file, total_generated, failed_batches = client.qna_engine.bulk_generate_questions(
        topic=topic_json_path,
        questions_per_objective=5,
        max_workers=5,
        output_format="json", ## Supoorted Format CSV,PDF
        max_retries=2,
        difficulty="medium"
    )

print(f"\nGeneration completed!")

ValueError: Topic must be a path to a JSON file with the required structure.

##**Generate Bulk Questions with Diffrent Models**


####Educhain Model Configuration

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI
from educhain import Educhain, LLMConfig
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from google.colab import userdata


gemini_flash = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=userdata.get("GOOGLE_API_KEY")
    )


llama3_groq = ChatOpenAI(
    model="deepseek-r1-distill-llama-70b",
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=userdata.get("GROQ_API_KEY")
)


claude = ChatAnthropic(model='claude-3-5-sonnet-20240620')

###Bulk Question Generation Using Gemini

In [None]:
import json
import os
from educhain import Educhain

Gemini_config = LLMConfig(custom_model=gemini_flash) ##Config Gemini Model Using Educhain

client = Educhain(Gemini_config)

example_topics_data = [
      {
        "topic": "Indian History",
        "subtopics": [
            {
                "name": "Ancient India",
                "learning_objectives": [
                    "Understand the Indus Valley Civilization.",
                    "Learn about the Vedic Period.",
                    "Study the Mauryan Empire and its administration.",
                    "Know about the Gupta Empire and its contributions."
                ]
            },
            {
                "name": "Modern India",
                "learning_objectives": [
                    "Study the arrival of Europeans in India.",
                    "Learn about the British Raj and its policies.",
                    "Understand the Indian National Movement.",
                    "Know about the Indian Independence and Partition."
                ]
            }
        ]
    },
]

topic_json_path = "topics.json"

# Save example data to file
with open(topic_json_path, 'w') as f:
  json.dump(example_topics_data, f, indent=4)

# Generate questions with total questions specified
result, output_file, total_generated, failed_batches = client.qna_engine.bulk_generate_questions(
        topic=topic_json_path,
        total_questions=10,
        # questions_per_objective=10,
        max_workers=5,
        output_format="json",
        max_retries=2,
        custom_instructions="Generate clear, grade-appropriate multiple choice questions",
        difficulty="medium"
    )

print(f"\nGeneration completed!")
result.dict()

###**Bulk Question Generation Using Claude With Custum Response Model & Prompt Template**


In [None]:
from pydantic import BaseModel, Field
from typing import List, Dict, Any
import json
import os
from pathlib import Path
from dotenv import load_dotenv
from educhain import Educhain
from google.colab import userdata

Claude_config = LLMConfig(custom_model=claude)

client = Educhain(Gemini_config)

# Define custom models
class Option(BaseModel):
    text: str = Field(description="The text of the option")
    correct: bool = Field(description="Whether this option is correct")

class DataSufficiencyQuestion(BaseModel):
    question_text: str = Field(description="The text of the question")
    statement_1: str = Field(description="Statement 1")
    statement_2: str = Field(description="Statement 2")
    options: List[Option] = Field(description="List of options for the question")
    explanation: str = Field(description="Explanation of the correct answer")
    metadata: Dict[str, Any] = Field(description="Additional metadata including section, subsection, topic, and subtopic.")
    difficulty_level: str = Field(description="Difficulty level of the question (e.g., Easy, Medium, Hard)")
    difficulty_rating: float = Field(description="Difficulty rating of the question (e.g., 3.5/5)")
    estimated_time: int = Field(description="Estimated time to solve the question in seconds")

class DataSufficiencyQuestionList(BaseModel):
    questions: List[DataSufficiencyQuestion] = Field(description="List of Data Sufficiency questions")

# GMAT Data Sufficiency Prompt Template
GMAT_DATA_SUFFICIENCY_PROMPT_TEMPLATE = """
Generate {num} GMAT-style Data Sufficiency questions following these specifications:

Section: Data Insights
Subsection: Data Sufficiency
Topic: Arithmetic DS
Subtopic: {subtopic}
Difficulty: {difficulty_level} (Easy, Medium, Hard)

**Learning Objectives:**
{learning_objective}

**Question Format Requirements:**
1. Ensure the question is clear and concise.
2. Include all necessary information for solving the problem.
3. Provide two statements (Statement 1 and Statement 2) to evaluate sufficiency.
4. Ensure the difficulty level matches the specified value (Easy, Medium, Hard).
5. Provide a difficulty rating (e.g., 3.5/5) based on the complexity of the question.
6. Provide an estimated time to solve the question in seconds:
   - Easy: 60-90 seconds
   - Medium: 120-150 seconds
   - Hard: 180-210 seconds
7. Ensure that no two questions follow the same pattern or structure.
8. Questions should mimic the style and complexity of real GMAT questions.

**Important Notes:**
- Avoid generating variations of the same question (e.g., changing only numbers or variables).
- Ensure diversity in question patterns by varying the operations, contexts, and problem structures.
- For **Easy** questions, focus on basic concepts and straightforward calculations.
- For **Medium** questions, include multi-step problems or require application of concepts.
- For **Hard** questions, incorporate complex problem-solving, abstract reasoning, or real-world scenarios.

The response MUST return a list of questions in the following JSON format:
{{
  "questions": [
    {{
      "question_text": "Clear question statement",
      "statement_1": "First statement",
      "statement_2": "Second statement",
      "options": [
        {{
          "text": "Statement (1) ALONE is sufficient, but statement (2) alone is not sufficient.",
          "correct": true/false
        }},
        {{
          "text": "Statement (2) ALONE is sufficient, but statement (1) alone is not sufficient.",
          "correct": true/false
        }},
        {{
          "text": "BOTH statements TOGETHER are sufficient, but NEITHER statement ALONE is sufficient.",
          "correct": true/false
        }},
        {{
          "text": "EACH statement ALONE is sufficient.",
          "correct": true/false
        }},
        {{
          "text": "Statements (1) and (2) TOGETHER are NOT sufficient.",
          "correct": true/false
        }}
      ],
      "explanation": "Detailed explanation of why the selected option is correct",
      "metadata": {{
        "section": "Data Insights",
        "subsection": "Data Sufficiency",
        "topic": "Arithmetic DS",
        "subtopic": "{subtopic}"
      }},
      "difficulty_level": "Easy/Medium/Hard",
      "difficulty_rating": 3.5,
      "estimated_time": 90
    }}
  ]
}}
"""

# Example topic structure for GMAT Data Sufficiency
topics_data = [
    {
        "topic": "Arithmetic DS",
        "subtopics": [
            {
                "name": "Fractions (DS)",
                "learning_objectives": [
                    "Understand how to interpret and manipulate fractions in data sufficiency problems.",
                    "Solve GMAT-style data sufficiency problems involving fraction operations.",
                    "Analyze data sufficiency questions to determine if the given information is sufficient to solve problems involving fractions."
                ]
            },
            {
                "name": "Decimals (DS)",
                "learning_objectives": [
                    "Understand how to interpret and manipulate decimals in data sufficiency problems.",
                    "Solve GMAT-style data sufficiency problems involving decimal operations.",
                    "Analyze data sufficiency questions to determine if the given information is sufficient to solve problems involving decimals."
                ]
            }
        ]
    }
]

client = Educhain()

# Save topics data to a JSON file
topics_file = "gmat_ds_topics.json"
with open(topics_file, "w") as f:
    json.dump(topics_data, f, indent=2)

# Generate questions using bulk generation
result, output_file, total_generated, failed_batches = client.qna_engine.bulk_generate_questions(
    topic=topics_file,
    # total_questions=30,
    questions_per_objective=10,
    max_workers=5,
    output_format="json",
    max_retries=3,
    prompt_template=GMAT_DATA_SUFFICIENCY_PROMPT_TEMPLATE,
    question_model=DataSufficiencyQuestion,
    question_list_model=DataSufficiencyQuestionList,
    difficulty_level="Medium"
)
print(f"\nGeneration completed!")
result.model_dump()

Generating questions: 100%|██████████| 6/6 [01:34<00:00, 15.82s/it]

Questions saved to: questions_20250325_144018.json

--- Generation Summary ---
Total Learning Objectives: 6
Target Total Questions: 60
Base Questions per Objective: 10 (plus 0 objectives with +1)
Total Questions Generated: 60
Failed Batches: 0
Partial Success Batches: 0
Average Questions per Successful Batch: 10.00

Generation completed!





{'questions': [{'question_text': 'What is the value of x if x is a decimal number?',
   'statement_1': 'x is greater than 0.5.',
   'statement_2': 'x is less than 1.0.',
   'options': [{'text': 'Statement (1) ALONE is sufficient, but statement (2) alone is not sufficient.',
     'correct': False},
    {'text': 'Statement (2) ALONE is sufficient, but statement (1) alone is not sufficient.',
     'correct': False},
    {'text': 'BOTH statements TOGETHER are sufficient, but NEITHER statement ALONE is sufficient.',
     'correct': True},
    {'text': 'EACH statement ALONE is sufficient.', 'correct': False},
    {'text': 'Statements (1) and (2) TOGETHER are NOT sufficient.',
     'correct': False}],
   'explanation': 'Both statements together indicate that x is between 0.5 and 1.0, which does not determine a unique value for x.',
   'metadata': {'section': 'Data Insights',
    'subsection': 'Data Sufficiency',
    'topic': 'Arithmetic DS',
    'subtopic': 'Decimals (DS)'},
   'difficulty_le

###Bulk question Generation with CSV Format

In [None]:
from langchain_openai import ChatOpenAI
from educhain import Educhain, LLMConfig
import json
import os

client = Educhain()

example_topics_data = [
     {
        "topic": "Chemistry",
        "subtopics": [
            {
                "name": "Organic Chemistry",
                "learning_objectives": [
                    "Understand reaction mechanisms.",
                    "Analyze different types of organic reactions.",
                    "Identify and name organic compounds.",
                    "Understand stereochemistry."
                ]
            },
            {
                "name": "Physical Chemistry",
                "learning_objectives": [
                    "Understand chemical kinetics and thermodynamics.",
                    "Apply the concepts of electrochemistry.",
                    "Solve problems related to chemical equilibrium.",
                    "Understand surface chemistry."
                ]
            }
        ]
    }
]

topic_json_path = "topics.json"


with open(topics_file, "w") as f:
    json.dump(topics_data, f, indent=2)

# Generate questions with total questions specified
result, output_file, total_generated, failed_batches = client.qna_engine.bulk_generate_questions(
        topic=topic_json_path,
        # total_questions=10,
        questions_per_objective=5,
        max_workers=5,
        output_format="csv", ## You can Define as PDF Format Also
        max_retries=2,
        custom_instructions="Generate clear, grade-appropriate multiple choice questions",
        difficulty="medium"
)

print(f"\nGeneration completed!")

Generating questions: 100%|██████████| 8/8 [00:29<00:00,  3.75s/it]

Questions saved to: questions_20250325_144048.csv

--- Generation Summary ---
Total Learning Objectives: 8
Target Total Questions: 40
Base Questions per Objective: 5 (plus 0 objectives with +1)
Total Questions Generated: 40
Failed Batches: 0
Partial Success Batches: 0
Average Questions per Successful Batch: 5.00

Generation completed!





###Read Csv File From Colab Files

In [None]:
import pandas as pd

# Path to the CSV file
csv_file_path = "/content/questions_20250311_083344.csv"  # Replace with your CSV file path

# Read the CSV file into a DataFrame
df = pd.read_csv(csv_file_path)

# Display the first few rows of the DataFrame
print("\nFirst 5 Rows:")
print(df.head())

# Display basic statistics (for numeric columns)
print("\nBasic Statistics:")
print(df.describe())

###Bulk question Generation with PDF Format

In [None]:
from langchain_openai import ChatOpenAI
from educhain import Educhain, LLMConfig
import json
import os

client = Educhain()

example_topics_data = [
  {
        "topic": "Mathematics",
        "subtopics": [
            {
                "name": "Calculus",
                "learning_objectives": [
                    "Understand the concepts of limits and continuity.",
                    "Apply differentiation techniques to various functions.",
                    "Apply integration techniques to solve definite and indefinite integrals.",
                    "Solve differential equations."
                ]
            },
            {
                "name": "Coordinate Geometry",
                "learning_objectives": [
                    "Understand the equations of lines, circles, parabolas, ellipses, and hyperbolas.",
                    "Find the intersection of lines and conic sections.",
                    "Apply coordinate geometry to solve geometric problems."
                ]
            },
             {
                "name": "Trigonometry",
                "learning_objectives": [
                    "Master trigonometric identities and equations.",
                    "Solve problems involving heights and distances.",
                    "Understand inverse trigonometric functions."
                ]
            }
        ]
    },
]

topic_json_path = "topics.json"

# Save example data to file
with open(topic_json_path, 'w') as f:
  json.dump(example_topics_data, f, indent=4)

# Generate questions with total questions specified
result, output_file, total_generated, failed_batches = client.qna_engine.bulk_generate_questions(
        topic=topic_json_path,
        # total_questions=10,
        questions_per_objective=5,
        max_workers=5,
        output_format="pdf", ## You can Define as pdf Format
        max_retries=2,
        custom_instructions="Generate clear, grade-appropriate multiple choice questions",
        difficulty="medium"
)

print(f"\nGeneration completed!")

###Read Pdf  From Colab Files

In [None]:
!pip install pdfminer.six

In [None]:
from pdfminer.high_level import extract_text

# Path to the PDF file
pdf_file_path = "/content/questions_20250311_084636.pdf"  # Replace with your PDF file path

# Extract text from the PDF
text = extract_text(pdf_file_path)

# Display the extracted text
print(text)