<a href="https://colab.research.google.com/github/gulyasbence03/EssayGradingChatGPT/blob/main/EssayGrading.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 23_05d Automated Essay Scorer

Design a system that scores essays on factors such as grammar, coherence, and vocabulary.
Use Natural Language Processing techniques, relying on libraries such as NLTK or spaCy.
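In the notebook below, each of the three factors becomes a grade from 1 to 5 by comparing a ratio against fixed cut-offs, and the overall grade is the rounded mean of the three. The following is a minimal sketch of that pattern. It assumes the cut-offs used in the notebook: 0.75/0.60/0.45/0.30 for the vocabulary ratio, 0.06/0.04/0.02/0.01 for the coherence ratio, and 0.01/0.02/0.03/0.04 grammar issues per word. The helper names `grade_higher_is_better`, `grade_lower_is_better` and `overall` are hypothetical and do not appear in the notebook.

```python
from bisect import bisect_left, bisect_right

# Cut-offs copied from the notebook's if/elif ladders, in ascending order.
VOCAB_CUTOFFS = [0.30, 0.45, 0.60, 0.75]      # unique-word ratio, higher is better
COHERENCE_CUTOFFS = [0.01, 0.02, 0.04, 0.06]  # related adjacent pairs per pair, higher is better
GRAMMAR_CUTOFFS = [0.01, 0.02, 0.03, 0.04]    # grammar issues per word, lower is better


def grade_higher_is_better(ratio: float, cutoffs: list[float]) -> int:
    """Return 1..5, where reaching cutoffs[k] (inclusive) earns grade k + 2."""
    # bisect_right counts the cut-offs that are <= ratio, i.e. the bands reached.
    return 1 + bisect_right(cutoffs, ratio)


def grade_lower_is_better(ratio: float, cutoffs: list[float]) -> int:
    """Return 5 when ratio <= cutoffs[0], down to 1 when ratio > cutoffs[-1]."""
    # bisect_left counts the cut-offs that are strictly below ratio.
    return 5 - bisect_left(cutoffs, ratio)


def overall(grades: list[int]) -> int:
    """Rounded mean of the sub-grades, mirroring calculate_overall_grade."""
    return round(sum(grades) / len(grades))


print(grade_higher_is_better(0.80, VOCAB_CUTOFFS))    # 5
print(grade_lower_is_better(0.025, GRAMMAR_CUTOFFS))  # 3
print(overall([5, 5, 3]))                             # 4
```

Python's `round` rounds halves to even, but the mean of three integers always ends in .0, .33 or .67. That rounding rule therefore never affects the overall grade here.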

In [None]:
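import nltk

# Tokenizer, stop-word list, POS tagger and WordNet data used by the grader.
nltk.download('punkt')
nltk.download('punkt_tab')  # required by word_tokenize in newer NLTK releases
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')  # required by pos_tag in newer NLTK releases
nltk.download('wordnet')
!pip install language-tool-python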
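The vocabulary grade in the next cell uses the ratio of distinct content words to all content words, which is a type-token ratio computed after stop words are removed. Below is a dependency-free sketch of that ratio. It assumes a simple regex tokenizer and a tiny hand-picked stop-word set in place of NLTK's `word_tokenize` and English stop-word list. `STOP_WORDS` and `content_word_ratio` are hypothetical names.

```python
import re

# Hypothetical stand-in for nltk.corpus.stopwords.words('english').
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "in", "it", "our"}


def content_word_ratio(text: str) -> float:
    """Distinct content words divided by total content words (0.0 if there are none)."""
    # Lower-case the text and treat runs of letters, digits and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    words = [t for t in tokens if t not in STOP_WORDS]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# 4 distinct content words out of 6 -> about 0.667
print(content_word_ratio("Pollution harms rivers, and pollution harms seas."))
```

Words repeat more often as a text grows, so this ratio tends to fall with essay length. A fixed cut-off such as 0.75 is therefore easier for short essays to reach than for long ones.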

In [9]:
import nltk
from nltk.corpus import stopwords
from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
import language_tool_python

def tokenize_and_filter_punctuation(text):
    tokens = nltk.word_tokenize(text.lower())
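    # Keep only tokens containing at least one letter or digit; this drops
    # single- and multi-character punctuation such as ",", "``" or "...".
    tokens = [token for token in tokens if any(ch.isalnum() for ch in token)]
    # word_tokenize splits possessives into a separate "'s" token; strip it.
    tokens = [token[:-2] if token.endswith("'s") else token for token in tokens]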
    return tokens

def filter_common_words(words):
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word not in stop_words]
    return filtered_words

def calculate_unique_word_count(text):
    print("Checking Vocabulary...")
    words = tokenize_and_filter_punctuation(text)
    filtered_words = filter_common_words(words)
    filtered_words = [word for word in filtered_words if word != '']
    unique_words = set(filtered_words)
    total_word_count = len(filtered_words)

    if total_word_count == 0:
        unique_word_count_grade = 1
    else:
        # Type-token ratio: share of content words that are distinct
        unique_word_count_ratio = len(unique_words) / total_word_count
        if unique_word_count_ratio >= 0.75:
            unique_word_count_grade = 5
        elif unique_word_count_ratio >= 0.60:
            unique_word_count_grade = 4
        elif unique_word_count_ratio >= 0.45:
            unique_word_count_grade = 3
        elif unique_word_count_ratio >= 0.30:
            unique_word_count_grade = 2
        else:
            unique_word_count_grade = 1

    print("Vocabulary Check Complete✔️")
    return unique_word_count_grade

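def is_related(word1, word2):
    """Return True if any WordNet senses of the two words have Wu-Palmer similarity > 0.6."""
    synsets1 = wordnet.synsets(word1)
    synsets2 = wordnet.synsets(word2)

    for synset1 in synsets1:
        for synset2 in synsets2:
            # wup_similarity can return None (e.g. no common ancestor), so compute it once and check
            similarity = synset1.wup_similarity(synset2)
            if similarity is not None and similarity > 0.6:
                return True
    return False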

def check_coherence(text):
    print("Checking Coherence...")
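    tokens = nltk.word_tokenize(text.lower())
    # Build the stop-word set once instead of re-reading the list for every token
    stop_words = set(stopwords.words('english'))
    tokens = [token for token in tokens if len(token) > 1 and token not in stop_words]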
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(token) for token in tokens]
    pos_tags = pos_tag(tokens)
    coherence_score = 0

    # Calculate the maximum possible coherence score based on the number of adjacent word pairs
    max_coherence_score = max(len(tokens) - 1, 1)

    for i in range(len(pos_tags) - 1):
        word1, pos1 = pos_tags[i]
        word2, pos2 = pos_tags[i + 1]

        # Compare POS tags first so WordNet is only queried for same-POS pairs
        if pos1 == pos2 and is_related(word1, word2):
            coherence_score += 1

    # Calculate coherence grade relative to the essay length
    coherence_ratio = coherence_score / max_coherence_score

    if coherence_ratio >= 0.06:
        coherence_grade = 5
    elif coherence_ratio >= 0.04:
        coherence_grade = 4
    elif coherence_ratio >= 0.02:
        coherence_grade = 3
    elif coherence_ratio >= 0.01:
        coherence_grade = 2
    else:
        coherence_grade = 1

    print("Coherence Check Complete✔️")
    return coherence_grade


def provide_feedback(vocab_grade, coherence_grade, grammar_grade):
    if vocab_grade < 5:
        print("⚠️ The essay could be improved with a richer vocabulary. ⚠️")

    if coherence_grade < 5:
        print("⚠️ The essay lacks coherence. Consider improving the flow between sentences and paragraphs. ⚠️")

    if grammar_grade < 5:
        print("⚠️ The essay contains grammar issues. Consider reviewing and correcting them. ⚠️")

def check_grammar(text):
    print("Checking Grammar...")
    tool = language_tool_python.LanguageTool('en-US')
    try:
        matches = tool.check(text)
    finally:
        # Shut down the local LanguageTool server started by the constructor
        tool.close()
    print("Grammar Check Complete✔️")
    return matches

def assign_grammar_grade(grammar_issues, essay_length):
    total_issues = len(grammar_issues)

    # An empty essay gets the lowest grade (and avoids dividing by zero)
    if essay_length == 0:
        return 1

    # Ratio of grammar issues (LanguageTool matches) to the number of words
    issues_ratio = total_issues / essay_length

    # Lower ratios earn higher grades
    if issues_ratio <= 0.01:
        return 5  # Near-perfect grammar
    elif issues_ratio <= 0.02:
        return 4  # Minor issues
    elif issues_ratio <= 0.03:
        return 3  # Moderate issues
    elif issues_ratio <= 0.04:
        return 2  # Substantial issues
    else:
        return 1  # Numerous issues

def calculate_overall_grade(grammar_grade, vocab_grade, coherence_grade):
    overall_grade = (grammar_grade + vocab_grade + coherence_grade) / 3
    return round(overall_grade)

essay = """
To begin with pollution and damage to the environment is the most serious and difficult problem for countries of all over the world. Scientists of different countries predict a global ecocatastrophe if people won’t change their attitude to our planet.

First of all a huge damage to the environment brings a transport. People can’t imagine their living without cars, buses, trains, ships and planes. But it’s an open secret that one of disadvantage of these accustomed things is harmful exhaust. Needless to say that use of environment friendly engines helps us to save atmosphere from pollution.

In addition to this our rivers and seas are in not less danger situation. It’s a fact of common knowledge that numerous factories and plants pour off their waste to ponds. Obviously that cleaning manufacturing water helps to avoid extinction of ocean residents.

Apart from this I’m inclined to believe that every person can and must contribute to solving this important problem. Doing a little steps for protection our environment every day we will be able to save our Earth. And it’s a task of each of us.
"""

vocab_grade = calculate_unique_word_count(essay)
print(f"Vocab: {vocab_grade}\n")

coherence_grade = check_coherence(essay)
print(f"Coherence: {coherence_grade}\n")

grammar_issues = check_grammar(essay)
essay_length = len(essay.split())
grammar_grade = assign_grammar_grade(grammar_issues, essay_length)
print(f"Grammar: {grammar_grade}\n")

overall_grade = calculate_overall_grade(grammar_grade, vocab_grade, coherence_grade)
print(f"\nOverall essay grade: {overall_grade}\n")

provide_feedback(vocab_grade, coherence_grade, grammar_grade)

# Sample essay taken from: https://engxam.com/handbook/essays-sample-answers-comments-b2-first-fce/


Checking Vocabulary...
Vocabulary Check Complete✔️
Vocab: 5

Checking Coherence...
Coherence Check Complete✔️
Coherence: 5

Checking Grammar...
Grammar Check Complete✔️
Grammar: 3


Overall essay grade: 4

⚠️ The essay contains grammar issues. Consider reviewing and correcting them. ⚠️
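The printed result can be checked by hand. The overall grade is the rounded mean of the three sub-grades shown above. A grammar grade of 3 means that `assign_grammar_grade` found more than 0.02 but at most 0.03 LanguageTool matches per whitespace-separated word. The sketch below works through that arithmetic. The 200-word length is a hypothetical value for illustration and was not measured from the sample essay.

```python
# Sub-grades printed by the notebook run above.
vocab, coherence, grammar = 5, 5, 3

mean = (grammar + vocab + coherence) / 3
print(f"{mean:.2f}", round(mean))  # 4.33 4

# Grammar grade 3 corresponds to 0.02 < issues / words <= 0.03.
# For a hypothetical 200-word essay, that allows 5 or 6 issues.
words = 200
band = [n for n in range(20) if 0.02 < n / words <= 0.03]
print(band)  # [5, 6]
```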
