<a href="https://colab.research.google.com/github/ee-adii/Machine-Learning-Projects/blob/main/mcq_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install PyPDF2
import PyPDF2
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

# Ensure the necessary NLTK data packages are downloaded
nltk.download('punkt')
nltk.download('stopwords')

def extract_text_from_pdf(pdf_path):
    """Extracts text from a PDF file."""
    pdf_text = ""
    with open(pdf_path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        for page in pdf_reader.pages:
            # Fall back to an empty string in case a page yields no text
            pdf_text += page.extract_text() or ""
    return pdf_text

def summarize_text(text, word_limit=1000):
    """Summarizes the text to a specified word limit."""
    stop_words = set(stopwords.words("english"))
    words = word_tokenize(text)
    freq_table = {}

    # Frequency table of lowercased tokens, excluding stopwords
    for word in words:
        word = word.lower()
        if word in stop_words:
            continue
        freq_table[word] = freq_table.get(word, 0) + 1

    sentences = sent_tokenize(text)
    sentence_value = {}

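    # Score each sentence: add the frequency of every table entry that
    # appears in it. Note that `in` is a substring test here, so "art"
    # also matches inside "part".
    for sentence in sentences:
        sentence_lower = sentence.lower()
        for word, freq in freq_table.items():
            if word in sentence_lower:
                sentence_value[sentence] = sentence_value.get(sentence, 0) + freq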

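    # Nothing was scored (e.g. empty text): return an empty summary
    # instead of dividing by zero below
    if not sentence_value:
        return ""

    # Average sentence score (truncated to an integer)
    sum_values = sum(sentence_value.values())
    average_value = int(sum_values / len(sentence_value))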

    summary = ""

    # Keep the sentences that score more than 1.2x the average
    for sentence in sentences:
        if sentence in sentence_value and sentence_value[sentence] > (1.2 * average_value):
            summary += " " + sentence

    summary_words = word_tokenize(summary)

    # If the summary is too long, truncate it to the word limit. The result is
    # rebuilt from tokens, so punctuation ends up space-separated ("wasn ’ t").
    if len(summary_words) > word_limit:
        summary = " ".join(summary_words[:word_limit])

    return summary

def compress_pdf_to_summary(pdf_path, word_limit=1000):
    """Extracts text from a PDF and summarizes it into a specified word limit."""
    text = extract_text_from_pdf(pdf_path)
    summary = summarize_text(text, word_limit=word_limit)
    return summary

# Example usage (assumes Google Drive is already mounted at /content/drive)
pdf_path = '/content/drive/MyDrive/SDG.pdf'
summary = compress_pdf_to_summary(pdf_path)
print(summary)



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


IN THE YEAR 2015 , LEADERS FROM 193 COUNTRIES OF THE WORLD CAME TOGETHER TO FACE THE FUTURE . They knew we had enough food to feed the world , but that it wasn ’ t getting shared . This set of 17 goals imagines a future just 15 years off that would be rid of poverty and hunger , and safe from the worst effects of climate change . Present in nearly 170 countries and territories , we help nations make the Goals a reality . Yes , it ’ s an ambitious goal—but we believe it can be done . In 2000 , the world committed to halving the number of people living in extreme poverty by the year 2015 and we met this goal . However , more than 800 million people around the world still live on less than $ 1.25 a day— that ’ s about the equivalent of the entire population of Europe living in extreme poverty . END HUNGER , ACHIEVE FOOD SECURITY AND IMPROVED NUTRITION AND PROMOTE SUSTAINABLE AGRICUL TURE In the past 20 years , hunger has dropped by almost half . But for the sake of the nearly 1 out of eve
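
The scoring in `summarize_text` matches words as substrings and counts punctuation tokens, and the truncation step re-joins tokens with spaces, which is why the output above shows `wasn ’ t`. One possible variant, sketched below, scores whole tokens only and cuts the summary at sentence boundaries. To keep the sketch free of downloads it swaps NLTK for the standard library: `STOP_WORDS`, `split_sentences`, `tokenize`, and `summarize_by_tokens` are hypothetical names, and the regex-based splitting is much cruder than `sent_tokenize`.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stand-in for NLTK's English stopword list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "we", "but"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Return lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def summarize_by_tokens(text, word_limit=1000, threshold=1.2):
    """Frequency-based extractive summary using whole-token matching."""
    sentences = split_sentences(text)
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)

    # Score = sum of the frequencies of the distinct content words in a sentence.
    scores = {}
    for sentence in sentences:
        words = set(tokenize(sentence)) - STOP_WORDS
        if words:
            scores[sentence] = sum(freq[w] for w in words)
    if not scores:
        return ""

    average = sum(scores.values()) / len(scores)

    # Add whole sentences, in document order, until the next one would
    # push the summary past the word limit.
    kept, used = [], 0
    for sentence in sentences:
        if scores.get(sentence, 0) > threshold * average:
            length = len(sentence.split())
            if used + length > word_limit:
                break
            kept.append(sentence)
            used += length
    return " ".join(kept)
```

Because sentences are kept intact, the summary may come in under `word_limit` rather than hitting it exactly, and the original punctuation and spacing are preserved.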

In [None]:
import spacy
from collections import Counter
import random


In [None]:
# Load spaCy's small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load("en_core_web_sm")


def generate_mcqs(text, num_questions=5):
    # Nothing to do for missing or empty input
    if not text:
        return []

    # Process the text with spaCy
    doc = nlp(text)

    # Extract sentences from the text
    sentences = [sent.text for sent in doc.sents]

    # Cap the number of questions at the number of sentences. Sentences with
    # fewer than two nouns are skipped below, so fewer MCQs may be returned.
    num_questions = min(num_questions, len(sentences))

    # Randomly select sentences to form questions
    selected_sentences = random.sample(sentences, num_questions)

    # Initialize list to store generated MCQs
    mcqs = []

    # Generate MCQs for each selected sentence
    for sentence in selected_sentences:
        # Process the sentence with spaCy
        sent_doc = nlp(sentence)

        # Extract the nouns from the sentence (by part-of-speech tag)
        nouns = [token.text for token in sent_doc if token.pos_ == "NOUN"]

        # Ensure there are enough nouns to generate MCQs
        if len(nouns) < 2:
            continue

        # Count the occurrence of each noun
        noun_counts = Counter(nouns)

        # Select the most common noun as the answer to blank out
        subject = noun_counts.most_common(1)[0][0]

        # Generate the question stem (replaces every occurrence of the subject)
        question_stem = sentence.replace(subject, "______")

        # Use the other nouns from the same sentence as distractors
        distractors = list(set(nouns) - {subject})

        # Pad with placeholders if there are fewer than three distractors
        while len(distractors) < 3:
            distractors.append("[Distractor]")

        # Pick three distractors at random and shuffle them with the answer
        random.shuffle(distractors)
        answer_choices = [subject] + distractors[:3]
        random.shuffle(answer_choices)

        # Record the correct answer as a letter (A-D)
        correct_answer = chr(65 + answer_choices.index(subject))
        mcqs.append((question_stem, answer_choices, correct_answer))

    return mcqs
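
`generate_mcqs` builds the question stem with `str.replace`, which also blanks the subject when it occurs inside a longer word (for example `way` inside `always`). A minimal sketch of a whole-word alternative follows. `blank_out` is a hypothetical helper, not part of the notebook, and treating hyphenated compounds such as `water-treatment` as single words is an assumption made here.

```python
import re


def blank_out(sentence, subject, blank="______"):
    """Replace standalone occurrences of `subject` with a blank.

    The lookarounds reject matches that touch another word character or a
    hyphen, so substrings and hyphenated compounds are left untouched.
    """
    pattern = rf"(?<![\w-]){re.escape(subject)}(?![\w-])"
    return re.sub(pattern, blank, sentence)


# Usage: only the standalone word is blanked.
print(blank_out("We always find a way.", "way"))
```

With `str.replace` the same call would also turn `always` into `al______s`.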


In [None]:
tech_text = summary
mcqs = generate_mcqs(tech_text, num_questions=5)  # Pass the selected number of questions
# Each MCQ is a (question_stem, answer_choices, correct_answer) tuple
for number, (question_stem, answer_choices, correct_answer) in enumerate(mcqs, start=1):
    print("Question", number, ":", question_stem)
    print("Options:")
    for i, option in enumerate(answer_choices):
        print(f"{chr(97 + i)}) {option}")
    print("Correct Answer:", correct_answer)
    print("\n")

Question 1 : In the 25 ______ before the SDGs , we made big strides—preventable child deaths dropped by more than half , and maternal mortality went down by almost as much .
Options:
a) strides
b) years
c) mortality
d) deaths
Correct Answer: B


Question 2 : And yet four billion ______ have no way of getting online , the vast majority of them in developing countries .
Options:
a) way
b) countries
c) majority
d) people
Correct Answer: D


Question 3 : ______ scarcity affects more than 40 percent of people around the world , and that number is projected to go even higher as a result of climate change .
Options:
a) climate
b) scarcity
c) Water
d) world
Correct Answer: C


Question 4 : But we can take a new ______—more international cooperation , protecting wetlands and rivers , sharing water-treatment technologies—that leads to accomplishing this Goal .
Options:
a) water
b) rivers
c) path
d) technologies
Correct Answer: C


Question 5 : ENSURE ______ AND SUSTAINABLE MANAGEMENT OF WATER AN
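
In the output above the options are labelled with lowercase letters while the correct answer is reported in uppercase. A small formatting helper could keep the two consistent. The sketch below assumes the `(question_stem, answer_choices, correct_answer)` tuple layout returned by `generate_mcqs`; `format_mcq` is a hypothetical name, and the sample tuple is a shortened version of Question 1 used only for illustration.

```python
def format_mcq(number, mcq):
    """Render one MCQ tuple as text, labelling options A, B, C, ..."""
    question_stem, answer_choices, correct_answer = mcq
    lines = [f"Question {number}: {question_stem}", "Options:"]
    for i, choice in enumerate(answer_choices):
        # chr(65) is "A", matching the uppercase letter stored in the tuple
        lines.append(f"{chr(65 + i)}) {choice}")
    lines.append(f"Correct Answer: {correct_answer}")
    return "\n".join(lines)


# Illustrative sample, shortened from Question 1 above.
sample = ("In the 25 ______ before the SDGs", ["strides", "years", "mortality", "deaths"], "B")
print(format_mcq(1, sample))
```

Returning a string instead of printing directly also makes the formatting easy to check or to write to a file.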