# IELTSTutor - Empower your IELTS Writing
IELTS Tutor is an AI agent that grades IELTS Academic Writing Task 1 and Task 2 responses and gives detailed feedback on them, built with LangGraph and Google's Gemini. The agent guides users through uploading a task description (image, PDF, or text), submitting a written response, and receiving structured feedback and band scores based on the official IELTS band descriptors. It classifies the task (Task 1 vs. Task 2), identifies the specific format (e.g., bar chart, opinion essay), and uses task-specific instructions to guide the evaluation.

## Significance of the project
A minimum IELTS score of 6.5 is typically required for university admission, yet academic writing presents the greatest challenge, with an average score of only 6.0 (according to [ielts.org](https://ielts.org/researchers/our-research/test-statistics#Test_performance)). For that reason, I intend to develop a free-to-use, AI-powered IELTS Writing Assistant. The tool will offer comprehensive feedback and band score estimates to help students improve their writing for the IELTS exam.

## Overview of the project workflow
* The user enters the task description. It can be Task 1 or Task 2 of any type; the model detects this automatically. The description can be plain text, a PDF, or an image, since Task 1 prompts usually include some kind of graph.
* The user submits their answer in the chat.
* The agent determines whether the task description is for Task 1 or Task 2 and identifies the specific task type.
* The agent returns a band score and detailed feedback based on the official IELTS band descriptors and the suggested structure for the detected task type.

## Details of the project

### Get started
Start by installing and importing the LangGraph SDK and LangChain support for the Gemini API.

In [1]:
# Remove conflicting packages from the Kaggle base environment.
!pip uninstall -qqy kfp jupyterlab libpysal thinc spacy fastai ydata-profiling google-cloud-bigquery google-generativeai
# Install langgraph and the packages used in this lab.
!pip install -qU 'langgraph==0.3.21' 'langchain-google-genai==2.1.2' 'langgraph-prebuilt==0.1.7'

### Set up the API key
The `GOOGLE_API_KEY` environment variable can be set to automatically configure the underlying API. This works for both the official Gemini Python SDK and for LangChain/LangGraph. 

To run the following cell, your API key must be stored in a [Kaggle secret](https://www.kaggle.com/discussions/product-feedback/114053) named `GOOGLE_API_KEY`.

If you don't already have an API key, you can grab one from [AI Studio](https://aistudio.google.com/app/apikey). You can find [detailed instructions in the docs](https://ai.google.dev/gemini-api/docs/api-key).

To make the key available through Kaggle secrets, choose `Secrets` from the `Add-ons` menu and follow the instructions to add your key or enable it for this notebook.

In [2]:
import os
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
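
Outside Kaggle, the `kaggle_secrets` module is not available. The following is a minimal sketch of a fallback, assuming the key has already been exported as an environment variable. `resolve_api_key` is a hypothetical helper and is not part of the original notebook.

```python
import os


def resolve_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from Kaggle secrets, falling back to the environment.

    Hypothetical helper for illustration only.
    """
    try:
        # Only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except Exception:
        # Not on Kaggle, or the secret is not attached to this notebook.
        key = os.environ.get(name, "")
        if not key:
            raise RuntimeError(f"{name} not found in Kaggle secrets or the environment.")
        return key
```

With this helper, the cell above could set `os.environ["GOOGLE_API_KEY"] = resolve_api_key()` and behave the same way on Kaggle and on a local machine.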

### Import necessary libraries

In [3]:
import os
import base64
import io
import json
from typing import TypedDict, Optional, Any, Dict, List, Literal
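from PIL import Image
import pypdf  # Used by load_task_description for PDF text extraction (run `pip install pypdf` if it is missing)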
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage, BaseMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages 

### Define the state
`IeltsGraderState` is a `TypedDict` that acts as the agent's memory: it holds all the information passed between the steps (nodes) of the graph.

In [4]:
class IeltsGraderState(TypedDict):
    """Represents the state of our IELTS grading graph."""

    # Raw Inputs (Set by interactive nodes)
    task_description_input: Any # User-provided text or path
    user_answer_input: Any    # User-provided text or path

    # Processed Inputs
    task_description_content: Optional[List[Dict[str, Any]]] # Langchain multimodal format (text/image data)
    user_answer_text: Optional[str] # Plain text of the user's answer

    # Task Identification Results
    task_number: Optional[Literal[1, 2]] # Detected Task (1 or 2)
    task_1_type: Optional[Literal[ # If Task 1, the specific type
        'bar_chart', 'line_graph', 'pie_chart', 'table_chart',
        'map', 'process_diagram', 'multiple_charts'
    ]]
    task_2_type: Optional[Literal[ # If Task 2, the specific type
        'opinion', 'discussion', 'problem_solution',
        'advantages_disadvantages', 'double_question'
    ]]

    # Intermediate Analysis / Tool Outputs
    intermediate_results: Dict[str, Any] # e.g., storing visual summary

    # Evaluation Output
    evaluation_criteria: Optional[Dict[str, Dict[str, Any]]] # Detailed scores & feedback per criterion
    overall_score: Optional[float] # Final calculated overall band score
    final_feedback_report: Optional[str] # User-facing formatted report

    # Control Flow & Error Handling
    error_message: Optional[str] # Stores error messages if any step fails
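
To illustrate how a state like this is used, here is a small sketch in plain Python. It assumes that a node returns only the keys it wants to change and that keys without a reducer are simply overwritten, which is how the nodes later in this notebook are written. `MiniState`, `store_answer`, and `apply_update` are hypothetical names used only for this example.

```python
from typing import Any, Dict, Optional, TypedDict


class MiniState(TypedDict, total=False):
    """Cut-down stand-in for IeltsGraderState (illustration only)."""
    task_description_input: Any
    user_answer_input: Any
    error_message: Optional[str]


def store_answer(state: MiniState, raw: str) -> Dict[str, Any]:
    """Node-style function: returns only the keys it wants to change."""
    if state.get("error_message"):
        # Pass an earlier failure along untouched.
        return {"error_message": state["error_message"]}
    if not raw.strip():
        return {"error_message": "User answer input cannot be empty."}
    return {"user_answer_input": raw.strip(), "error_message": None}


def apply_update(state: MiniState, update: Dict[str, Any]) -> MiniState:
    """Overwrite-merge: keys in the update replace the same keys in the state."""
    return {**state, **update}


state: MiniState = {"task_description_input": "Describe the chart.", "error_message": None}
state = apply_update(state, store_answer(state, "  The chart shows...  "))
print(state["user_answer_input"])
```

Keeping `error_message` in the state lets every later step check it first and skip its own work when an earlier step has already failed.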

### Official IELTS band descriptors and recommended structures
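This cell defines the core knowledge base: the detailed IELTS band descriptors (`IELTS_RUBRICS`) for Task 1 and Task 2, covering Task Achievement/Task Response (TA/TR), Coherence and Cohesion (CC), Lexical Resource (LR), and Grammatical Range and Accuracy (GRA), plus the suggested paragraph structures (`TASK_STRUCTURES`) for each task type.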

In [5]:
# Define Task 1 Rubrics
task1_rubrics = {
    "TA": { # Task Achievement
        9: """All the requirements of the task are fully and appropriately satisfied.
There may be extremely rare lapses in content.""",
        8: """The response covers all the requirements of the task appropriately, relevantly and sufficiently.
(Academic) Key features are skilfully selected, and clearly presented, highlighted and illustrated.
(General Training) All bullet points are clearly presented, and appropriately illustrated or extended.
There may be occasional omissions or lapses in content.""",
        7: """The response covers the requirements of the task.
The content is relevant and accurate - there may be a few omissions or lapses. The format is appropriate.
(Academic) Key features which are selected are covered and clearly highlighted but could be more fully or more appropriately illustrated or extended.
(Academic) It presents a clear overview, the data are appropriately categorised, and main trends or differences are identified.
(General Training) All bullet points are covered and clearly highlighted but could be more fully or more appropriately illustrated or extended. It presents a clear purpose. The tone is consistent and appropriate to the task. Any lapses are minimal.""",
        6: """The response focuses on the requirements of the task and an appropriate format is used.
(Academic) Key features which are selected are covered and adequately highlighted. A relevant overview is attempted. Information is appropriately selected and supported using figures/data.
(General Training) All bullet points are covered and adequately highlighted. The purpose is generally clear. There may be minor inconsistencies in tone.
Some irrelevant, inappropriate or inaccurate information may occur in areas of detail or when illustrating or extending the main points.
Some details may be missing (or excessive) and further extension or illustration may be needed.""",
        5: """The response generally addresses the requirements of the task. The format may be inappropriate in places.
(Academic) Key features which are selected are not adequately covered. The recounting of detail is mainly mechanical. There may be no data to support the description.
(General Training) All bullet points are presented but one or more may not be adequately covered. The purpose may be unclear at times. The tone may be variable and sometimes inappropriate.
There may be a tendency to focus on details (without referring to the bigger picture).
The inclusion of irrelevant, inappropriate or inaccurate material in key areas detracts from the task achievement.
There is limited detail when extending and illustrating the main points.""",
        4: """The response is an attempt to address the task.
(Academic) Few key features have been selected.
(General Training) Not all bullet points are presented.
(General Training) The purpose of the letter is not clearly explained and may be confused. The tone may be inappropriate.
The format may be inappropriate.""",
        3: """The response does not address the requirements of the task (possibly because of misunderstanding of the data/diagram/situation).
Key features/bullet points which are presented may be largely irrelevant.
Limited information is presented, and this may be used repetitively.""",
        2: """The content barely relates to the task.""",
        1: """The content is wholly unrelated to the task.
Responses of 20 words or fewer are rated at Band 1.
Any copied rubric must be discounted.""",
        0: """Should only be used where a candidate did not attend or attempt the question in any way, used a language other than English throughout, or where there is proof that a candidate's answer has been totally memorised."""
    },
    "CC": { # Coherence and Cohesion
        9: """The message can be followed effortlessly.
Cohesion is used in such a way that it very rarely attracts attention.
Any lapses in coherence or cohesion are minimal.
Paragraphing is skilfully managed.""",
        8: """The message can be followed with ease.
Information and ideas are logically sequenced, and cohesion is well managed.
Occasional lapses in coherence or cohesion may occur.
Paragraphing is used sufficiently and appropriately.""",
        7: """Information and ideas are logically organised and there is a clear progression throughout the response. A few lapses may occur.
A range of cohesive devices including reference and substitution is used flexibly but with some inaccuracies or some over/under use.""",
        6: """Information and ideas are generally arranged coherently and there is a clear overall progression.
Cohesive devices are used to some good effect but cohesion within and/or between sentences may be faulty or mechanical due to misuse, overuse or omission.
The use of reference and substitution may lack flexibility or clarity and result in some repetition or error.""",
        5: """Organisation is evident but is not wholly logical and there may be a lack of overall progression. Nevertheless, there is a sense of underlying coherence to the response.
The relationship of ideas can be followed but the sentences are not fluently linked to each other.
There may be limited/overuse of cohesive devices with some inaccuracy.
The writing may be repetitive due to inadequate and/or inaccurate use of reference and substitution.""",
        4: """Information and ideas are evident but not arranged coherently, and there is no clear progression within the response.
Relationships between ideas can be unclear and/or inadequately marked. There is some use of basic cohesive devices, which may be inaccurate or repetitive.
There is inaccurate use or a lack of substitution or referencing.""",
        3: """There is no apparent logical organisation. Ideas are discernible but difficult to relate to each other.
Minimal use of sequencers or cohesive devices. Those used do not necessarily indicate a logical relationship between ideas.
There is difficulty in identifying referencing.""",
        2: """There is little relevant message, or the entire response may be off-topic.
There is little evidence of control of organisational features.""",
        1: """The writing fails to communicate any message and appears to be by a virtual non-writer.
Responses of 20 words or fewer are rated at Band 1.""",
        0: """Should only be used where a candidate did not attend or attempt the question in any way, used a language other than English throughout, or where there is proof that a candidate's answer has been totally memorised.""" # Assuming 0 applies similarly
    },
    "LR": { # Lexical Resource
         9: """Full flexibility and precise use are evident within the scope of the task.
A wide range of vocabulary is used accurately and appropriately with very natural and sophisticated control of lexical features.
Minor errors in spelling and word formation are extremely rare and have minimal impact on communication.""",
         8: """A wide resource is fluently and flexibly used to convey precise meanings within the scope of the task.
There is skilful use of uncommon and/or idiomatic items when appropriate, despite occasional inaccuracies in word choice and collocation.
Occasional errors in spelling and/or word formation may occur, but have minimal impact on communication.""",
         7: """The resource is sufficient to allow some flexibility and precision.
There is some ability to use less common and/or idiomatic items.
An awareness of style and collocation is evident, though inappropriacies occur.
There are only a few errors in spelling and/or word formation, and they do not detract from overall clarity.""",
         6: """The resource is generally adequate and appropriate for the task.
The meaning is generally clear in spite of a rather restricted range or a lack of precision in word choice.
If the writer is a risk-taker, there will be a wider range of vocabulary used but higher degrees of inaccuracy or inappropriacy.
There are some errors in spelling and/or word formation, but these do not impede communication.""",
         5: """The resource is limited but minimally adequate for the task.
Simple vocabulary may be used accurately but the range does not permit much variation in expression.
There may be frequent lapses in the appropriacy of word choice, and a lack of flexibility is apparent in frequent simplifications and/or repetitions.
Errors in spelling and/or word formation may be noticeable and may cause some difficulty for the reader.""",
         4: """The resource is limited and inadequate for or unrelated to the task. Vocabulary is basic and may be used repetitively.
There may be inappropriate use of lexical chunks (e.g. memorised phrases, formulaic language and/or language from the input material).
Inappropriate word choice and/or errors in word formation and/or in spelling may impede meaning.""",
         3: """The resource is inadequate (which may be due to the response being significantly underlength).
Possible over-dependence on input material or memorised language.
Control of word choice and/or spelling is very limited, and errors predominate. These errors may severely impede meaning.""",
         2: """The resource is extremely limited with few recognisable strings, apart from memorised phrases.
There is no apparent control of word formation and/or spelling.""",
         1: """No resource is apparent, except for a few isolated words.
Responses of 20 words or fewer are rated at Band 1.""",
         0: """Should only be used where a candidate did not attend or attempt the question in any way, used a language other than English throughout, or where there is proof that a candidate's answer has been totally memorised."""
    },
    "GRA": { # Grammatical Range and Accuracy
         9: """A wide range of structures within the scope of the task is used with full flexibility and control.
Punctuation and grammar are used appropriately throughout.
Minor errors are extremely rare and have minimal impact on communication.""",
         8: """A wide range of structures within the scope of the task is flexibly and accurately used.
The majority of sentences are error-free, and punctuation is well managed.
Occasional, non-systematic errors and inappropriacies occur, but have minimal impact on communication.""",
         7: """A variety of complex structures is used with some flexibility and accuracy.
Grammar and punctuation are generally well controlled, and error-free sentences are frequent.
A few errors in grammar may persist, but these do not impede communication.""",
         6: """A mix of simple and complex sentence forms is used but flexibility is limited.
Examples of more complex structures are not marked by the same level of accuracy as in simple structures.
Errors in grammar and punctuation occur, but rarely impede communication.""",
         5: """The range of structures is limited and rather repetitive.
Although complex sentences are attempted, they tend to be faulty, and the greatest accuracy is achieved on simple sentences.
Grammatical errors may be frequent and cause some difficulty for the reader.
Punctuation may be faulty.""",
         4: """A very limited range of structures is used.
Subordinate clauses are rare and simple sentences predominate.
Some structures are produced accurately but grammatical errors are frequent and may impede meaning.
Punctuation is often faulty or inadequate.""",
         3: """Sentence forms are attempted, but errors in grammar and punctuation predominate (except in memorised phrases or those taken from the input material). This prevents most meaning from coming through.
Length may be insufficient to provide evidence of control of sentence forms.""",
         2: """There is little or no evidence of sentence forms (except in memorised phrases).""",
         1: """No rateable language is evident.
Responses of 20 words or fewer are rated at Band 1.""",
         0: """Should only be used where a candidate did not attend or attempt the question in any way, used a language other than English throughout, or where there is proof that a candidate's answer has been totally memorised."""
    }
}

# Define Task 2 Rubrics, Reusing Task 1 definitions where appropriate
task2_rubrics = {
    "TR": { # Task Response
        9: """The prompt is appropriately addressed and explored in depth.
A clear and fully developed position is presented which directly answers the question/s.
Ideas are relevant, fully extended and well supported.
Any lapses in content or support are extremely rare.""",
        8: """The prompt is appropriately and sufficiently addressed.
A clear and well-developed position is presented in response to the question/s.
Ideas are relevant, well extended and supported.
There may be occasional omissions or lapses in content.""",
        7: """The main parts of the prompt are appropriately addressed.
A clear and developed position is presented.
Main ideas are extended and supported but there may be a tendency to over-generalise or there may be a lack of focus and precision in supporting ideas/material.""",
        6: """The main parts of the prompt are addressed (though some may be more fully covered than others). An appropriate format is used.
A position is presented that is directly relevant to the prompt, although the conclusions drawn may be unclear, unjustified or repetitive.
Main ideas are relevant, but some may be insufficiently developed or may lack clarity, while some supporting arguments and evidence may be less relevant or inadequate.""",
        5: """The main parts of the prompt are incompletely addressed. The format may be inappropriate in places.
The writer expresses a position, but the development is not always clear.
Some main ideas are put forward, but they are limited and are not sufficiently developed and/or there may be irrelevant detail.
There may be some repetition.""",
        4: """The prompt is tackled in a minimal way, or the answer is tangential, possibly due to some misunderstanding of the prompt. The format may be inappropriate.
A position is discernible, but the reader has to read carefully to find it.
Main ideas are difficult to identify and such ideas that are identifiable may lack relevance, clarity and/or support.
Large parts of the response may be repetitive.""",
        3: """No part of the prompt is adequately addressed, or the prompt has been misunderstood.
No relevant position can be identified, and/or there is little direct response to the question/s.
There are few ideas, and these may be irrelevant or insufficiently developed.""",
        2: """The content is barely related to the prompt.
No position can be identified.
There may be glimpses of one or two ideas without development.""",
        1: """The content is wholly unrelated to the prompt.
Responses of 20 words or fewer are rated at Band 1.
Any copied rubric must be discounted.""",
        0: """Should only be used where a candidate did not attend or attempt the question in any way, used a language other than English throughout, or where there is proof that a candidate's answer has been totally memorised."""
    },
    # Reuse CC, LR, GRA from Task 1 as they are identical
    "CC": task1_rubrics["CC"],
    "LR": task1_rubrics["LR"],
    "GRA": task1_rubrics["GRA"]
}

# Combine them into the main IELTS_RUBRICS dictionary
IELTS_RUBRICS = {
    "Task 1": task1_rubrics,
    "Task 2": task2_rubrics
}



# TASK STRUCTURES Definition
TASK_STRUCTURES = {
    "Task 1": {
        'bar_chart': "1. Intro: Paraphrase task, mention chart type, axes, units, overall trend/highest/lowest. 2. Overview: Summarize main groupings or comparisons. 3. Detail Para 1: Describe significant categories/bars. 4. Detail Para 2: Describe other categories/bars, making comparisons.",
        'line_graph': "1. Intro: Paraphrase task, mention graph type, axes, units, overall trend. 2. Overview: Summarize main trend(s) or key features (start/end points, peaks). 3. Detail Para 1: Describe trends for one/two lines or from start to mid-point. 4. Detail Para 2: Describe trends for remaining lines or from mid-point to end, making comparisons.",
        'pie_chart': "1. Intro: Paraphrase task, mention chart type, what it represents overall. 2. Overview: Summarize largest/smallest segments, maybe combined categories. 3. Detail Para 1: Describe largest segments with data. 4. Detail Para 2: Describe smaller/other significant segments, making comparisons.",
        'table_chart': "1. Intro: Paraphrase task, mention table content. 2. Overview: Summarize main patterns, highest/lowest values overall or by key category. 3. Detail Para 1: Describe data for key rows/columns. 4. Detail Para 2: Describe other significant data points, making comparisons.",
        'map': "1. Intro: Paraphrase task, state map locations/time periods. 2. Overview: Summarize the main change(s) overall (e.g., more residential, industrialization). 3. Detail Para 1: Describe changes in one key area/time period (e.g., north side, first map). 4. Detail Para 2: Describe changes in another key area/time period, noting comparisons.",
        'process_diagram': "1. Intro: Paraphrase task, state process name/purpose. 2. Overview: State number of stages, start/end points. 3. Detail Para 1: Describe initial stages sequentially using appropriate sequencing language. 4. Detail Para 2: Describe later stages sequentially until the end.",
        'multiple_charts': "1. Intro: Paraphrase task, mention types and topics of charts. 2. Overview: Summarize main point from each chart OR the main relationship between them. 3. Detail Para 1: Describe key features of Chart 1. 4. Detail Para 2: Describe key features of Chart 2, linking to Chart 1 where relevant.",
        'unknown': "Standard Task 1 structure: Intro (Paraphrase, mention visual type if possible), Overview (Summarize main trends/features), Detail Paragraphs (Describe specific data/elements with comparisons)."
    },
    "Task 2": {
        'opinion': "1. Intro: Paraphrase question, clearly state your opinion (agree/disagree/extent). 2. Body Para 1: Main reason supporting your opinion + explanation/example. 3. Body Para 2: Second reason supporting your opinion + explanation/example. (Alternative: Discuss opposing view briefly then refute). 4. Conclusion: Summarize main points and restate opinion in different words.",
        'discussion': "1. Intro: Paraphrase question, introduce both views, state your own opinion (if asked, or can be in conclusion). 2. Body Para 1: Discuss the first view + reasons/examples. 3. Body Para 2: Discuss the second view + reasons/examples. 4. Conclusion: Summarize both views, give a balanced concluding thought or state your clear opinion.",
        'problem_solution': "1. Intro: Paraphrase topic, state the problem(s) and that solutions will be discussed. 2. Body Para 1: Explain the main problem(s) or cause(s) + effects/examples. (Can be 1-2 paragraphs). 3. Body Para 2: Propose solution(s) + explain how they work/why they are effective. (Can be 1-2 paragraphs). 4. Conclusion: Summarize problem(s) and solution(s), give final recommendation/outlook.",
        'advantages_disadvantages': "1. Intro: Paraphrase topic, state that advantages and disadvantages will be discussed. (State your opinion if asked: 'Do advantages outweigh...?') 2. Body Para 1: Discuss advantage(s) + explanation/examples. 3. Body Para 2: Discuss disadvantage(s) + explanation/examples. 4. Conclusion: Summarize points, give balanced view or state opinion on whether advantages outweigh disadvantages (if asked).",
        'double_question': "1. Intro: Paraphrase topic, briefly introduce both questions. 2. Body Para 1: Answer the first question + reasons/examples. 3. Body Para 2: Answer the second question + reasons/examples. 4. Conclusion: Briefly summarize answers to both questions.",
        'unknown': "Standard Task 2 essay structure: Intro (Hook, Background, Thesis/Outline). Body Paragraphs (Topic sentence, Explanation, Example). Conclusion (Summarize, Restate thesis, Final thought)."
    }
}

print("✅ Rubrics and Task Structures defined correctly.")

✅ Rubrics and Task Structures defined correctly.
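
The rubric dictionary is nested as task, then criterion, then band. As a rough sketch of how descriptors could be pulled out of it (for example, to paste a few neighbouring bands into a prompt), the snippet below uses a tiny stand-in with the same nesting. `SAMPLE_RUBRICS` and `format_descriptors` are hypothetical names and are not defined in the notebook.

```python
from typing import Dict, Iterable

# Tiny stand-in with the same nesting as IELTS_RUBRICS: task -> criterion -> band -> text.
SAMPLE_RUBRICS: Dict[str, Dict[str, Dict[int, str]]] = {
    "Task 2": {
        "TR": {
            7: "The main parts of the prompt are appropriately addressed.",
            6: "The main parts of the prompt are addressed (though some may be more fully covered than others).",
        },
    },
}


def format_descriptors(
    rubrics: Dict[str, Dict[str, Dict[int, str]]],
    task: str,
    criterion: str,
    bands: Iterable[int],
) -> str:
    """Render the descriptors for the requested bands, highest band first.

    Bands that are missing from the rubric are skipped silently.
    """
    by_band = rubrics[task][criterion]
    lines = [
        f"Band {band}: {by_band[band]}"
        for band in sorted(bands, reverse=True)
        if band in by_band
    ]
    return "\n".join(lines)


print(format_descriptors(SAMPLE_RUBRICS, "Task 2", "TR", [6, 7, 8]))
```

The same call should work against the full dictionary, e.g. `format_descriptors(IELTS_RUBRICS, "Task 1", "TA", range(5, 10))`.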


### LLM initialization
This cell sets up the connections to the Gemini models. It creates two clients: `llm` for simpler text tasks such as classification, and `multimodal_llm` for image inputs and the more complex evaluation step. Both currently point to the same model, `gemini-2.0-flash`; they differ only in temperature and timeout.

In [6]:
# Initialize the standard model for text-based tasks (classification, etc.)
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.1,
    timeout=120  # Request timeout in seconds
)

# Initialize the multimodal model for tasks involving images or complex evaluation
multimodal_llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0.2,
    timeout=300  # Longer timeout for image inputs and the evaluation step
)

### Input and loading
This section contains the interactive functions (`ask_for_*`), which prompt the user for input, and the processing functions (`load_*`), which take that input (inline text, or a file path to an image, PDF, or TXT file) and convert it into the format the agent needs: text strings and base64-encoded image data. Basic file-loading errors are handled.
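
The two core ideas here, deciding what kind of input was given and turning image bytes into a base64 data URL, can be sketched with the standard library alone. `classify_input` and `to_data_url` are hypothetical helpers for illustration, and the `unsupported_file` label is an assumption about how other extensions might be treated.

```python
import base64
import os

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def classify_input(raw: str) -> str:
    """Return 'image', 'pdf', 'text_file', 'unsupported_file', or 'inline_text'."""
    if os.path.isfile(raw):
        ext = os.path.splitext(raw)[1].lower()
        if ext in IMAGE_EXTS:
            return "image"
        if ext == ".pdf":
            return "pdf"
        if ext == ".txt":
            return "text_file"
        # Assumption: any other existing file is reported rather than read.
        return "unsupported_file"
    # Anything that is not an existing file is treated as the text itself.
    return "inline_text"


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL for an image_url content part."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


print(classify_input("Summarise the information in the chart."))  # inline_text
print(to_data_url(b"abc"))  # data:image/jpeg;base64,YWJj
```

Treating any string that is not an existing file as inline text keeps the prompt simple, at the cost that a mistyped path is silently graded as if it were the task description.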

In [7]:
def ask_for_task_description(state: IeltsGraderState) -> Dict[str, Any]:
    """Prompts the user to input the task description (text, path to PDF, or path to image)."""
    print("\n--- Task Description Input ---")
    try:
        user_input = input("➡️ Enter Task Description text OR full path to PDF/Image file (e.g., /kaggle/input/.../task1.pdf or /kaggle/working/chart.png): \n")
        if not user_input or not user_input.strip():
            return {"error_message": "Task description input cannot be empty."}
        print(f"   Received task description input: '{user_input.strip()[:100]}...'")
        # Store the raw input; loading/validation happens in the next node
        return {"task_description_input": user_input.strip(), "error_message": None}
    except EOFError: # Handles script execution ending unexpectedly during input
        return {"error_message": "Input stream closed. Cannot get task description."}
    except Exception as e:
        return {"error_message": f"Error getting task description input: {e}"}

def ask_for_user_answer(state: IeltsGraderState) -> Dict[str, Any]:
    """Prompts the user to input their answer (text or path to .txt file)."""
    # Check for errors from previous steps before proceeding
    if state.get("error_message"): return {"error_message": state["error_message"]}

    print("\n--- User Answer Input ---")
    try:
        user_input = input("➡️ Enter your Answer text OR full path to a .txt file containing your answer: \n")
        if not user_input or not user_input.strip():
            return {"error_message": "User answer input cannot be empty."}
        print(f"   Received user answer input: '{user_input.strip()[:100]}...'")
        # Store the raw input
        return {"user_answer_input": user_input.strip(), "error_message": None}
    except EOFError:
         return {"error_message": "Input stream closed. Cannot get user answer."}
    except Exception as e:
        return {"error_message": f"Error getting user answer input: {e}"}
print("✅ Input node functions defined.")

✅ Input node functions defined.


In [8]:
def load_task_description(state: IeltsGraderState) -> Dict[str, Any]:
    """Loads and processes the task description input (text, PDF, image) into Langchain format."""
    print("--- Loading Task Description ---")
    if state.get("error_message"): return {"error_message": state["error_message"]} # Propagate previous errors

    task_input = state.get('task_description_input')
    content_list = [] # To store Langchain formatted content [{type: 'text', text: '...'}, {type: 'image_url', ...}]
    error_message = None

    if not task_input:
         return {"error_message": "Task description input missing in state."}

    try:
        # Check if the input string is a valid, existing file path
        if isinstance(task_input, str) and os.path.exists(task_input) and os.path.isfile(task_input):
            print(f"   Processing file path: {task_input}")
            file_ext = os.path.splitext(task_input)[1].lower()

            # Handle Images
            if file_ext in ['.png', '.jpg', '.jpeg', '.webp', '.gif']:
                try:
                    print(f"   Encoding image file ({file_ext})...")
                    image = Image.open(task_input)
                    buffer = io.BytesIO()
                    # JPEG cannot store alpha or palette data, so convert any non-RGB mode
                    if image.mode != 'RGB':
                        image = image.convert('RGB')
                    image.save(buffer, format="JPEG") # Standardize to JPEG
                    encoded_string = base64.b64encode(buffer.getvalue()).decode("utf-8")
                    # Add a text placeholder and the image data
                    content_list.append({"type": "text", "text": f"[Image Provided: {os.path.basename(task_input)}]"})
                    content_list.append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded_string}"}})
                    print("   Image encoded successfully.")
                except Exception as img_e:
                    print(f"   ⚠️ Error processing image file: {img_e}")
                    error_message = f"Failed to load/encode image file '{os.path.basename(task_input)}': {img_e}"
                    # Add error text to content list for context
                    content_list.append({"type": "text", "text": f"[Error loading image: {error_message}]"})

            # Handle PDFs
            elif file_ext == '.pdf':
                try:
                    print("   Extracting text from PDF...")
                    pdf_text = ""
                    reader = pypdf.PdfReader(task_input)
                    for page_num, page in enumerate(reader.pages):
                        pdf_text += page.extract_text() or "" # Add text from each page
                        if page_num < len(reader.pages) - 1:
                             pdf_text += "\n\n--- Page Break ---\n\n" # Indicate page breaks
                    if pdf_text.strip():
                        content_list.append({"type": "text", "text": pdf_text})
                        print(f"   Extracted {len(reader.pages)} page(s) of text from PDF.")
                    else:
                        print("   ⚠️ PDF contained no extractable text.")
                        content_list.append({"type": "text", "text": f"[PDF file provided ({os.path.basename(task_input)}) but contained no extractable text]"})
                except Exception as pdf_e:
                    print(f"   ⚠️ Error processing PDF file: {pdf_e}")
                    error_message = f"Failed to read PDF file '{os.path.basename(task_input)}': {pdf_e}"
                    content_list.append({"type": "text", "text": f"[Error loading PDF: {error_message}]"})

            # Handle Plain Text Files
            elif file_ext == '.txt':
                 try:
                     print("   Reading text file...")
                     with open(task_input, 'r', encoding='utf-8') as f:
                         text_content = f.read()
                         content_list.append({"type": "text", "text": text_content})
                         print(f"   Loaded text from '{os.path.basename(task_input)}'.")
                 except Exception as txt_e:
                     print(f"   ⚠️ Error reading text file: {txt_e}")
                     error_message = f"Failed to read text file '{os.path.basename(task_input)}': {txt_e}"
                     content_list.append({"type": "text", "text": f"[Error loading text file: {error_message}]"})

            # Unsupported file types
            else:
                error_message = f"Unsupported file type provided for task description: '{file_ext}'"
                print(f"   ⚠️ {error_message}")
                content_list.append({"type": "text", "text": f"[Unsupported file type: {os.path.basename(task_input)}]"})

        # Handle direct text input
        elif isinstance(task_input, str):
            print("   Processing raw text description.")
            content_list.append({"type": "text", "text": task_input})

        # Handle other unexpected input types
        else:
            error_message = f"Unexpected input type for task description: {type(task_input).__name__}. Expected text or file path."
            print(f"   ⚠️ {error_message}")


        # Final check: if content is empty and no specific error was caught, raise a general one.
        if not content_list and not error_message:
            error_message = "Could not process the provided task description input."
            print(f"   ⚠️ {error_message}")


    except Exception as e:
        print(f"   ⚠️ Unexpected error in load_task_description: {e}")
        error_message = f"Failed to process task description input: {e}"

    # Return processed content and any accumulated error message
    return {"task_description_content": content_list, "error_message": error_message}

def load_user_answer(state: IeltsGraderState) -> Dict[str, Any]:
    """Loads the user's answer (text or from .txt file) into plain text."""
    print("--- Loading User Answer ---")
    if state.get("error_message"): return {"error_message": state["error_message"]} # Propagate errors

    answer_input = state.get('user_answer_input')
    answer_text = None
    error_message = None

    if not answer_input:
        return {"error_message": "User answer input missing in state."}

    try:
        # Check if it's a file path
        if isinstance(answer_input, str) and os.path.exists(answer_input) and os.path.isfile(answer_input):
            print(f"   Processing file path: {answer_input}")
            file_ext = os.path.splitext(answer_input)[1].lower()
            if file_ext == '.txt':
                try:
                    print("   Reading text file...")
                    with open(answer_input, 'r', encoding='utf-8') as f:
                        answer_text = f.read()
                    if not answer_text or not answer_text.strip():
                         error_message = f"Answer file '{os.path.basename(answer_input)}' is empty or contains only whitespace."
                         print(f"   ⚠️ {error_message}")
                         answer_text = None # Treat as no answer provided
                    else:
                         print(f"   Loaded answer text from '{os.path.basename(answer_input)}'.")
                except Exception as txt_e:
                    print(f"   ⚠️ Error reading answer file: {txt_e}")
                    error_message = f"Failed to read answer file '{os.path.basename(answer_input)}': {txt_e}"
            # Add support for other formats like .docx if needed using libraries like 'python-docx'
            # elif file_ext == '.docx': ...
            else:
                error_message = f"Unsupported file type for answer: '{file_ext}'. Please provide a .txt file or paste the text directly."
                print(f"   ⚠️ {error_message}")

        # Handle direct text input
        elif isinstance(answer_input, str):
            print("   Processing raw answer text.")
            if not answer_input.strip():
                 error_message = "Provided answer text is empty or contains only whitespace."
                 print(f"   ⚠️ {error_message}")
            else:
                 answer_text = answer_input

        # Handle other unexpected input types
        else:
            error_message = f"Unexpected input type for user answer: {type(answer_input).__name__}. Expected text or file path."
            print(f"   ⚠️ {error_message}")

        # If processing failed but resulted in no text and no specific error
        if not answer_text and not error_message:
             error_message = "Could not process the provided user answer input."
             print(f"   ⚠️ {error_message}")

    except Exception as e:
        print(f"   ⚠️ Unexpected error in load_user_answer: {e}")
        error_message = f"Failed to process user answer input: {e}"

    # Return loaded text and any error
    return {"user_answer_text": answer_text, "error_message": error_message}

print("✅ Loading node functions defined.")

✅ Loading node functions defined.
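
To make the returned structure concrete, here is a minimal, standard-library-only sketch of the two-part content list that load_task_description builds for an image input. The helper name build_image_content and the sample bytes are hypothetical and exist only for this illustration; the real node also re-encodes the image to JPEG with Pillow before the Base64 step.

```python
import base64
from typing import Any, Dict, List

def build_image_content(image_bytes: bytes, filename: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: wrap already-JPEG-encoded bytes in a text + image_url content list."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return [
        # Text placeholder that tells the model an image was supplied
        {"type": "text", "text": f"[Image Provided: {filename}]"},
        # The image itself, embedded as a Base64 data URL
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
    ]

# Placeholder bytes stand in for real JPEG data (assumption for illustration only).
sample_content = build_image_content(b"\xff\xd8\xff\xe0 fake jpeg payload", "bar_chart.jpg")
for part in sample_content:
    print(part["type"])
```

Downstream nodes can then detect an image by checking for an item whose type is image_url, which is how has_image is derived in the classification functions.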


### Task identification and classification
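This section defines the functions that analyze the loaded task description with the standard LLM (llm): identify_task_number determines whether the task is Task 1 or Task 2, and classify_task_1_type or classify_task_2_type then identifies the specific type (classify_task_1_type switches to multimodal_llm when an image is present). Each function falls back to keyword matching if the LLM call fails or returns an unexpected answer.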


In [9]:
def identify_task_number(state: IeltsGraderState) -> Dict[str, Any]:
    """Identifies whether the task is Task 1 or Task 2 using LLM and heuristics."""
    print("--- Identifying Task Number ---")
    if state.get("error_message"): return {"error_message": state["error_message"]}

    description_content = state.get("task_description_content")
    if not description_content: return {"error_message": "Task description content missing."}

    text_parts = [item['text'] for item in description_content if item['type'] == 'text']
    description_text = "\n".join(text_parts)
    has_image = any(item['type'] == 'image_url' for item in description_content)
    print(f"   Analyzing description (has_image={has_image})...")

    prompt = f"""Analyze the IELTS task description to determine if it is Task 1 (describes visual info like charts, maps, processes) or Task 2 (essay question).
Keywords Task 1: shows, illustrates, depicts, diagram, table, graph, chart, map. Image implies Task 1.
Keywords Task 2: discuss, opinion, agree, disagree, advantages, problem, solution, causes, effects, extent.
Image provided: {'Yes' if has_image else 'No'}.
Description Text:
---
{description_text}
---
Respond ONLY with '1' or '2'.
"""
    task_number = None
    try:
        response = llm.invoke([HumanMessage(content=prompt)])
        task_number_str = response.content.strip().replace("'", "").replace("\"", "")
        if task_number_str == '1': task_number = 1
        elif task_number_str == '2': task_number = 2
        else: print(f"   LLM classification unclear ('{task_number_str}'). Using fallback.")
    except Exception as e:
        print(f"   ⚠️ LLM Error during task number identification: {e}. Using fallback.")

    # Fallback
    if task_number is None:
        t1_keywords = ['show', 'illustrate', 'depict', 'diagram', 'table', 'graph', 'chart', 'map', 'plan', 'process']
        t2_keywords = ['discuss', 'opinion', 'agree', 'disagree', 'solution', 'cause', 'effect', 'advantage', 'disadvantage', 'extent', 'problem']
        desc_lower = description_text.lower()
        if has_image: task_number = 1; print("   Fallback: Task 1 (Image detected).")
        elif any(kw in desc_lower for kw in t2_keywords): task_number = 2; print("   Fallback: Task 2 (Essay keywords detected).")
        elif any(kw in desc_lower for kw in t1_keywords): task_number = 1; print("   Fallback: Task 1 (Visual keywords detected).")
        else: task_number = 2; print("   Fallback: Defaulting to Task 2.") # Default assumption

    print(f"   ==> Identified Task Number: {task_number}")
    return {"task_number": task_number, "error_message": None}

def classify_task_1_type(state: IeltsGraderState) -> Dict[str, Any]:
    """Classifies the specific type of Task 1."""
    print("--- Classifying Task 1 Type ---")
    if state.get("error_message"): return {"error_message": state["error_message"]}
    description_content = state.get("task_description_content")
    if not description_content: return {"error_message": "Task description content missing."}

    llm_input_content = []
    text_parts = []
    has_image = False
    for item in description_content:
        llm_input_content.append(item)
        if item['type'] == 'text': text_parts.append(item['text'])
        if item['type'] == 'image_url': has_image = True
    prompt_text = "\n".join(text_parts)
    print(f"   Analyzing Task 1 description/visual (has_image={has_image})...")

    classification_prompt = f"""Analyze the provided IELTS Task 1 description and image (if present). Identify the specific visual type.
Description Text: --- {prompt_text} --- Image Provided: {'Yes' if has_image else 'No'}
Choose ONE type: 'bar_chart', 'line_graph', 'pie_chart', 'table_chart', 'map', 'process_diagram', 'multiple_charts'. Respond 'unknown' if unclear. Respond ONLY with the type name.
"""
    combined_input_for_llm = [{"type": "text", "text": classification_prompt}] + llm_input_content
    task_type = None
    allowed_types = ['bar_chart', 'line_graph', 'pie_chart', 'table_chart', 'map', 'process_diagram', 'multiple_charts', 'unknown']

    try:
        llm_to_use = multimodal_llm if has_image else llm
        response = llm_to_use.invoke([HumanMessage(content=combined_input_for_llm)])
        task_type_str = response.content.strip().lower().replace("'", "").replace("\"", "")
        if task_type_str in allowed_types: task_type = task_type_str
        else: print(f"   LLM classification invalid ('{task_type_str}'). Using fallback.")
    except Exception as e:
        print(f"   ⚠️ LLM Error during Task 1 type classification: {e}. Using fallback.")

    # Fallback
    if task_type is None:
        desc_lower = prompt_text.lower()
        if 'process' in desc_lower or 'how something is made' in desc_lower or 'stages of' in desc_lower: task_type = 'process_diagram'
        elif 'map' in desc_lower or 'plan of' in desc_lower: task_type = 'map'
        elif 'table' in desc_lower: task_type = 'table_chart'
        elif 'pie chart' in desc_lower: task_type = 'pie_chart'
        elif 'bar chart' in desc_lower or 'bar graph' in desc_lower: task_type = 'bar_chart'
        elif 'line graph' in desc_lower or 'line chart' in desc_lower: task_type = 'line_graph'
        elif 'charts show' in desc_lower or 'graphs show' in desc_lower or ('chart' in desc_lower and 'graph' in desc_lower): task_type = 'multiple_charts'
        else: task_type = 'unknown'
        print(f"   Fallback classification: {task_type}")

    print(f"   ==> Identified Task 1 Type: {task_type}")
    return {"task_1_type": task_type, "error_message": None}

def classify_task_2_type(state: IeltsGraderState) -> Dict[str, Any]:
    """Classifies the specific type of Task 2 essay."""
    print("--- Classifying Task 2 Type ---")
    if state.get("error_message"): return {"error_message": state["error_message"]}
    description_content = state.get("task_description_content")
    if not description_content: return {"error_message": "Task description content missing."}

    text_parts = [item['text'] for item in description_content if item['type'] == 'text']
    description_text = "\n".join(text_parts)
    print(f"   Analyzing Task 2 question: '{description_text[:150]}...'")

    prompt = f"""Analyze the IELTS Task 2 essay question. Identify the specific type.
Question: --- {description_text} ---
Types & Keywords:
'opinion': agree/disagree, opinion, extent
'discussion': discuss both views/sides
'problem_solution': causes, problems, solutions, solve
'advantages_disadvantages': advantages, disadvantages, outweigh
'double_question': Two distinct questions
Choose ONE type: 'opinion', 'discussion', 'problem_solution', 'advantages_disadvantages', 'double_question'. Respond 'unknown' if unclear. Respond ONLY with the type name.
"""
    task_type = None
    allowed_types = ['opinion', 'discussion', 'problem_solution', 'advantages_disadvantages', 'double_question', 'unknown']

    try:
        response = llm.invoke([HumanMessage(content=prompt)])
        task_type_str = response.content.strip().lower().replace("'", "").replace("\"", "")
        if task_type_str in allowed_types: task_type = task_type_str
        else: print(f"   LLM classification invalid ('{task_type_str}'). Using fallback.")
    except Exception as e:
        print(f"   ⚠️ LLM Error during Task 2 type classification: {e}. Using fallback.")

    # Fallback
    if task_type is None:
        desc_lower = description_text.lower()
        q_count = description_text.count('?')
        if 'to what extent' in desc_lower or 'do you agree or disagree' in desc_lower or "what is your opinion" in desc_lower: task_type = 'opinion'
        elif 'discuss both' in desc_lower and ('and give your' in desc_lower or 'and give opinion' in desc_lower): task_type = 'discussion'
        elif 'advantages' in desc_lower and 'disadvantages' in desc_lower: task_type = 'advantages_disadvantages'
        elif ('problem' in desc_lower or 'cause' in desc_lower) and ('solution' in desc_lower or 'solve' in desc_lower): task_type = 'problem_solution'
        elif q_count >= 2 and ('why' in desc_lower or 'what' in desc_lower or 'how' in desc_lower): task_type = 'double_question'
        else: task_type = 'unknown'
        print(f"   Fallback classification: {task_type}")

    print(f"   ==> Identified Task 2 Type: {task_type}")
    return {"task_2_type": task_type, "error_message": None}

print("✅ Identification and Classification node functions defined.")

✅ Identification and Classification node functions defined.
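
The keyword fallbacks rely on plain substring checks (kw in desc_lower), so a keyword such as 'cause' also matches inside 'because'. The sketch below is not part of the notebook. It shows one possible stricter variant that matches a keyword only at the start of a word, using a hypothetical helper named contains_keyword and a shortened keyword list.

```python
import re
from typing import Iterable

def contains_keyword(text: str, keywords: Iterable[str]) -> bool:
    """Hypothetical helper: True if any keyword starts a word in text.

    'cause' matches 'causes' but not 'because'.
    """
    pattern = r"\b(?:" + "|".join(re.escape(kw) for kw in keywords) + r")"
    return re.search(pattern, text.lower()) is not None

# Shortened version of the Task 2 keyword list, for illustration only.
t2_keywords = ["discuss", "opinion", "agree", "disagree", "cause", "effect"]

sentence = "The chart shows sales, which fell because prices rose."
print(any(kw in sentence.lower() for kw in t2_keywords))  # True: 'cause' is found inside 'because'
print(contains_keyword(sentence, t2_keywords))            # False: no keyword starts a word
```

Whether the stricter match is worth adopting depends on how often the LLM call fails, since the fallback only runs in that case.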


### Summarize graph and evaluation
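This section defines extract_visual_summary, which uses the multimodal LLM (multimodal_llm) to identify the key features of a Task 1 visual. It also defines the main evaluate_writing function, which sends all gathered context (task type, description, answer, rubrics, suggested structure, visual summary) to the multimodal LLM when the task includes an image, or to the standard LLM otherwise, and requests a score and detailed feedback for each criterion. The response is plain text in a fixed format, which the function parses into a per-criterion dictionary before calculating the overall band score.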

In [10]:
def extract_visual_summary(state: IeltsGraderState) -> Dict[str, Any]:
    """Extracts key features from the visual for Task 1 using multimodal LLM."""
    print("--- Extracting Visual Summary (Task 1) ---")
    if state.get("error_message"): return {"error_message": state["error_message"]}

    description_content = state.get("task_description_content", [])
    task_1_type = state.get("task_1_type", "unknown")

    llm_input_content = []
    text_parts = []
    has_valid_image = False
    image_load_error = None
    for item in description_content:
        if item['type'] == 'image_url' and item['image_url']['url'].startswith('data:image'):
            llm_input_content.append(item)
            has_valid_image = True
        elif item['type'] == 'text':
            llm_input_content.append(item)
            text_parts.append(item['text'])
            if "[Error loading image:" in item['text']: image_load_error = item['text']

    prompt_text = "\n".join(text_parts)

    if not has_valid_image:
         summary = "No visual provided or visual could not be processed."
         if image_load_error: summary = f"Visual summary skipped due to image loading error noted: {image_load_error}"
         print(f"   Skipping visual summary: {summary}")
         return {"intermediate_results": {**state.get("intermediate_results", {}), "visual_summary": summary}, "error_message": None}

    print(f"   Extracting summary for visual type: {task_1_type}")
    summary_prompt = f"""As an expert IELTS Task 1 analyst, examine the visual (type: {task_1_type}) and its description.
Task Description Text: --- {prompt_text} --- Image Data is Provided.
Identify and list the *essential key features, main trends, significant data points, and necessary comparisons* that MUST be included in a high-scoring (Band 7+) Task 1 report. Be specific with values/labels. Do NOT write the report. Output ONLY a bulleted list.
Key Features List:
"""
    combined_input_for_llm = [{"type": "text", "text": summary_prompt}] + llm_input_content
    visual_summary = "Error: Summary extraction failed."
    try:
        print("   Invoking multimodal LLM for summary...")
        response = multimodal_llm.invoke([HumanMessage(content=combined_input_for_llm)])
        visual_summary = response.content.strip()
        print(f"   ==> Extracted Visual Summary:\n{visual_summary}")
    except Exception as e:
        visual_summary = f"Error extracting summary via LLM: {e}"
        print(f"   ⚠️ LLM Error during visual summary: {e}")

    intermediate_results = {**state.get("intermediate_results", {}), "visual_summary": visual_summary}
    return {"intermediate_results": intermediate_results, "error_message": None}

print("✅ Visual Summary node function defined.") 

✅ Visual Summary node function defined.


Helper function for rounding the averaged criterion scores to the nearest half band.


In [11]:
import math

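def round_to_nearest_half(number: float) -> float:
    """Rounds a number to the nearest 0.5, with ties rounding up (e.g., 6.1 -> 6.0, 6.25 -> 6.5, 6.75 -> 7.0)."""
    # Built-in round() rounds ties to even (round(12.5) == 12), which would turn 6.25 into 6.0.
    return math.floor(number * 2 + 0.5) / 2.0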

print("✅ Helper function defined.")

✅ Helper function defined.
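
The average of four criterion scores can end in .25 or .75, so the handling of ties matters. Python's built-in round() rounds ties to the nearest even integer, which the sketch below contrasts with a floor-based variant that always rounds ties up. The name round_half_band_up is hypothetical and used only for this illustration.

```python
import math

def round_half_band_up(number: float) -> float:
    """Hypothetical illustration: round to the nearest 0.5, with ties going up."""
    return math.floor(number * 2 + 0.5) / 2.0

scores = [6.0, 6.5, 6.0, 6.5]        # four criterion scores
average = sum(scores) / len(scores)   # 6.25
print(round(average * 2) / 2.0)       # 6.0 (tie rounded to even)
print(round_half_band_up(average))    # 6.5 (tie rounded up)
```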


In [12]:
def evaluate_writing(state: IeltsGraderState) -> Dict[str, Any]:
    """Performs evaluation, asks LLM for text output, parses it robustly, and calculates overall score."""
    print("--- Evaluating Writing ---")
    if state.get("error_message"): return {"error_message": state["error_message"]}

    # --- Gather context from the state ---
    task_number = state.get("task_number")
    task_1_type = state.get("task_1_type")
    task_2_type = state.get("task_2_type")
    user_answer = state.get("user_answer_text")
    description_content = state.get("task_description_content", [])
    visual_summary = state.get("intermediate_results", {}).get("visual_summary")

    if not task_number or not user_answer:
        return {"error_message": "Cannot evaluate: Missing task number or user answer."}

    if task_number == 1:
        task_type_name = task_1_type or 'unknown'; rubric_key = "Task 1"; response_criterion = "TA"
    else:
        task_type_name = task_2_type or 'unknown'; rubric_key = "Task 2"; response_criterion = "TR"
    criteria_names = [response_criterion, "CC", "LR", "GRA"]
    print(f"   Evaluating Task {task_number} ({task_type_name}) answer...")

    try: # Prepare rubrics/structure for prompt
        rubrics_for_task = IELTS_RUBRICS[rubric_key]
        structure_guideline = TASK_STRUCTURES[rubric_key].get(task_type_name, TASK_STRUCTURES[rubric_key]['unknown'])
        rubric_prompt_text = f"Key points from IELTS {rubric_key} Band Descriptors (Focus on Bands 9, 8, 7, 6, 5):\n"
        for crit in criteria_names:
             rubric_prompt_text += f"\n{crit}:\n"
             for band in [9, 8, 7, 6, 5]:
                 desc = rubrics_for_task[crit].get(band, f"Band {band} N/A")
                 rubric_prompt_text += f"  Band {band}: {desc[:150].strip()}...\n"
    except KeyError as e:
        return {"error_message": f"Configuration error: Missing rubrics/structure for {e}."}

    description_text = "\n".join([item['text'] for item in description_content if item['type'] == 'text'])

    # --- Evaluation Prompt ---
    eval_prompt_parts = [
        f"You are a meticulous, fair, and highly detailed IELTS Writing examiner. Your goal is to provide an accurate band score and exceptionally thorough, constructive feedback.",
        f"Evaluate the user's answer for IELTS Writing Task {task_number} ({task_type_name}).",
        f"\nTask Description Context:\n---\n{description_text}\n---"
    ]
    if task_number == 1 and visual_summary and "Error" not in visual_summary and "No visual" not in visual_summary and "not be processed" not in visual_summary:
        eval_prompt_parts.append(f"\nExpected Key Features (from visual analysis):\n---\n{visual_summary}\n---")
    elif task_number == 1:
        eval_prompt_parts.append(f"\nNote Regarding Visual: {visual_summary or 'Not generated.'}\n---")

    eval_prompt_parts.extend([
        f"\nUser's Answer:\n---\n{user_answer}\n---",
        f"\nSuggested Structure Guide for '{task_type_name}':\n---\n{structure_guideline}\n---",
        f"\nReference IELTS Band Descriptors ({rubric_key}):\n---\n{rubric_prompt_text}\n---",
        f"\nEvaluation Instructions:",
        f"1. Accuracy is paramount. Evaluate *strictly* based on the IELTS criteria: {response_criterion}, CC, LR, GRA.",
        f"2. **Assign a score for EACH criterion using the IELTS scale (1.0 to 9.0) in increments of 0.5** (e.g., 6.0, 6.5, 7.0, 7.5).",
        f"3. **Assign a '.5' score (e.g., 7.5) if performance surpasses the lower band but doesn't consistently meet the next whole band.**",
        f"4. **Provide VERY DETAILED, SPECIFIC, and CONSTRUCTIVE feedback for EACH criterion.**",
        f"    - Justify the score with clear explanations referencing the band descriptors.",
        f"    - **Cite specific examples or patterns from the user's text.** Use brief quotes if helpful (e.g., 'The phrase \"increaseMENT\" shows a word formation error.').",
        f"    - Explicitly list **Strengths** (what the user did well according to the criteria).",
        f"    - Explicitly list **Weaknesses/Areas for Improvement** (specific issues and *actionable* advice on how to improve). Be precise (e.g., 'Improve topic sentences by...', 'Vary sentence structures by including complex sentences like...', 'Check for subject-verb agreement errors such as...').",
        f"    - Ensure feedback addresses all aspects of the criterion (e.g., for CC: organization, paragraphing, cohesion; for LR: range, accuracy, collocation; for GRA: range, accuracy, punctuation).",
        f"    - For {response_criterion}: Be specific about which task requirements were met/missed (e.g., 'Key feature X was well-described, but the comparison between Y and Z was missing.' or 'The first part of the question was addressed, but the second part lacked sufficient development.').",
        f"5. **Output the results as plain text, following this EXACT format for EACH criterion:**",
        f"--- CRITERION: {response_criterion} ---",
        f"Score: [SCORE_FLOAT_1.0_to_9.0_in_0.5_steps]",
        f"Feedback:",
        f"Strengths: [List specific strengths here, citing examples/patterns]",
        f"Weaknesses/Areas for Improvement: [List specific weaknesses and actionable advice, citing examples/patterns]",
        f"--- CRITERION: CC ---",
        f"Score: [SCORE_FLOAT_1.0_to_9.0_in_0.5_steps]",
        f"Feedback:",
        f"Strengths: [List specific strengths here]",
        f"Weaknesses/Areas for Improvement: [List specific weaknesses and actionable advice]",
        f"--- CRITERION: LR ---",
        f"Score: [SCORE_FLOAT_1.0_to_9.0_in_0.5_steps]",
        f"Feedback:",
        f"Strengths: [List specific strengths here]",
        f"Weaknesses/Areas for Improvement: [List specific weaknesses and actionable advice]",
        f"--- CRITERION: GRA ---",
        f"Score: [SCORE_FLOAT_1.0_to_9.0_in_0.5_steps]",
        f"Feedback:",
        f"Strengths: [List specific strengths here]",
        f"Weaknesses/Areas for Improvement: [List specific weaknesses and actionable advice]",
        f"--- END EVALUATION ---",
        f"6. **Do NOT include any other text, apologies, or formatting.** Ensure feedback under each criterion is substantial."
    ])

    
    final_eval_prompt = "\n".join(eval_prompt_parts)
    llm_eval_input_messages = [HumanMessage(content=description_content + [{"type": "text", "text": final_eval_prompt}])]
    eval_llm = multimodal_llm if any(item.get('type') == 'image_url' for item in description_content) else llm

    parsed_evaluation_criteria = {}
    calculated_overall_score = None
    error_message = None
    possible_scores_set = {i / 2.0 for i in range(2, 19)} # 1.0, 1.5, ..., 9.0

    try:
        print("   Invoking LLM for evaluation (text output)...")
        # Increase timeout slightly more if needed, evaluation can be long
        # eval_llm.request_options = {"timeout": 360}
        response = eval_llm.invoke(llm_eval_input_messages)
        response_content = response.content.strip()
        print("   LLM evaluation response received.")
        print(f"   Raw LLM Response (up to 2000 chars):\n---\n{response_content[:2000]}\n---")

        # --- Text parsing logic: walk the response line by line, one criterion section at a time ---
        current_criterion = None
        current_score = None # Use this variable to hold score for the current section
        current_feedback_lines = []

        response_to_parse = response_content + "\n--- END EVALUATION ---" # Append end marker

        for line in response_to_parse.splitlines():
            line_stripped = line.strip()

            # Check for new criterion marker OR end marker
            is_marker = line_stripped.startswith("--- CRITERION:")
            is_end_marker = line_stripped.startswith("--- END EVALUATION ---")

            if is_marker or is_end_marker:
                # Finalize the PREVIOUS criterion before starting new or ending
                if current_criterion and current_criterion in criteria_names:
                    feedback_text = "\n".join(current_feedback_lines).strip()
                    # A missing score means the "Score:" line was absent or invalid for this section
                    if current_score is None:
                         print(f"   ⚠️ Critical Error: Score not found/parsed for {current_criterion}.")
                         # Store None as the score so the failure is caught during validation below
                         parsed_evaluation_criteria[current_criterion] = {"score": None, "feedback": feedback_text or "(Feedback extracted, but score parsing failed)"}
                    elif not feedback_text:
                         print(f"   ⚠️ Warning: Feedback seems empty for {current_criterion}.")
                         parsed_evaluation_criteria[current_criterion] = {"score": current_score, "feedback": "(Feedback not extracted or provided by LLM)"}
                    else:
                        # Store valid parsed data
                        parsed_evaluation_criteria[current_criterion] = {
                            "score": current_score,
                            "feedback": feedback_text
                        }
                    # Reset for next criterion
                    current_score = None
                    current_feedback_lines = []

                # Extract the new criterion name (if it's a criterion marker)
                if is_marker:
                    current_criterion = line_stripped.replace("--- CRITERION:", "").replace("---", "").strip()
                    if current_criterion not in criteria_names:
                         print(f"   ⚠️ Warning: Found unexpected criterion marker '{current_criterion}'.")
                         current_criterion = None # Ignore this section
                    else:
                         print(f"   Parsing section for: {current_criterion}")
                else: # It's the end marker
                    current_criterion = None
                continue # Move to the next line

            # If we are inside a valid criterion section being processed
            if current_criterion:
                line_lower_stripped = line_stripped.lower()
                # Find score line (only once per section)
                if current_score is None and line_lower_stripped.startswith("score:"):
                    try:
                        score_str = line_stripped.split(":", 1)[1].strip()
                        parsed_score = float(score_str)
                        if parsed_score in possible_scores_set:
                            current_score = parsed_score # Store the valid score
                            print(f"      Found score: {current_score}")
                        else:
                            print(f"   ⚠️ Warning: Invalid score value '{parsed_score}' for {current_criterion}.")
                    except Exception as score_e:
                        print(f"   ⚠️ Error parsing score for {current_criterion} from '{line_stripped}': {score_e}")
                    continue # Don't add score line to feedback

                # Skip the literal "Feedback:" line
                elif line_lower_stripped == "feedback:":
                    continue

                # Otherwise, add to feedback lines
                elif line_stripped: # Add non-empty lines
                    current_feedback_lines.append(line_stripped) # Collect feedback lines

        # --- Validation and Score Calculation ---
        num_parsed = len(parsed_evaluation_criteria)
        print(f"   Finished parsing. Found data for {num_parsed} criteria sections.")
        if num_parsed != 4:
            missing = [c for c in criteria_names if c not in parsed_evaluation_criteria]
            error_message = f"Parsing Error: Expected 4 criteria, but only found {num_parsed}. Missing or failed to parse: {missing}."
            print(f"   ⚠️ {error_message}")
        else:
            valid_scores = []
            failed_scores = []
            for crit, data in parsed_evaluation_criteria.items():
                if data.get('score') is not None: valid_scores.append(data['score'])
                else: failed_scores.append(crit)

            if failed_scores:
                error_message = f"Parsing Error: Failed to extract valid scores for criteria: {failed_scores}."
                print(f"   ⚠️ {error_message}")
            else:
                avg_score = sum(valid_scores) / 4.0
                calculated_overall_score = round_to_nearest_half(avg_score)
                print(f"   Evaluation parsed successfully. Calculated Overall Score: {calculated_overall_score}")

    except Exception as e:
        error_message = f"Error during evaluation processing: {e}"
        print(f"   ⚠️ {error_message}")
        import traceback
        traceback.print_exc()

    if calculated_overall_score is not None and not error_message:
        return { "evaluation_criteria": parsed_evaluation_criteria, "overall_score": calculated_overall_score, "error_message": None }
    else:
        final_error = error_message or "Evaluation failed: Could not parse all results."
        return {"evaluation_criteria": parsed_evaluation_criteria, "error_message": final_error} # Pass partial results if error occurred late

print("✅ Evaluation node function updated for Text Output parsing.")

✅ Evaluation node function updated for Text Output parsing.


### Report compilation and error handling
This section defines two nodes. compile_final_report takes the parsed evaluation results from the state (the per-criterion scores and feedback, plus the overall score) and formats them into a user-friendly Markdown report. handle_error is a safety-net node: any earlier step that sets error_message is routed to it, and it produces a short error report instead of a grade.

In [13]:
def handle_error(state: IeltsGraderState) -> Dict[str, Any]:
    """Catches errors and formats an error report."""
    # print(f"--- Error Handler Node ---") # Suppressed
    # The key can be present with a value of None, so use `or` rather than a .get() default.
    error_message = state.get("error_message") or "Unknown error."
    # print(f"   Error caught: {error_message}") # Suppressed
    final_report = f"## IELTS Writing Grading Failed\n\nAn error occurred:\n\n```\n{error_message}\n```\n\nPlease review inputs or check logs."
    return {"final_feedback_report": final_report, "error_message": error_message}


def compile_final_report(state: IeltsGraderState) -> Dict[str, Any]:
    """Compiles the final report using full criterion names."""
    # print("--- Compiling Final Report ---") # Suppressed
    error_message = state.get("error_message")
    if error_message:
        # print(f"   Skipping report due to previous error: {error_message}") # Suppressed
        # Errors from earlier nodes are normally routed to handle_error; this branch is a fallback.
        # Past this point, the node assumes successful evaluation data is present.
        final_report = f"## Grading Incomplete\n\nAn error occurred before final report compilation:\n```\n{error_message}\n```"
        return {"final_feedback_report": final_report}


    evaluation = state.get("evaluation_criteria")
    overall_score = state.get("overall_score")
    task_number = state.get("task_number")
    task_1_type = state.get("task_1_type")
    task_2_type = state.get("task_2_type")

    if not evaluation or overall_score is None or not task_number:
        err = "Cannot compile report: Essential evaluation results missing."
        # print(f"   ⚠️ {err}") # Suppressed
        return {"final_feedback_report": f"## Grading Incomplete\n\n{err}"}

    criterion_full_names = {
        "TA": "Task Achievement",
        "TR": "Task Response",
        "CC": "Coherence and Cohesion",
        "LR": "Lexical Resource",
        "GRA": "Grammatical Range and Accuracy"
    }

    task_type_name = task_1_type if task_number == 1 else task_2_type
    task_type_name = (task_type_name or f"Task {task_number}").replace('_', ' ').title()
    response_criterion_abbr = "TA" if task_number == 1 else "TR"
    criteria_order = [response_criterion_abbr, "CC", "LR", "GRA"]

    report_parts = [
        f"# IELTS Writing Task {task_number} ({task_type_name}) - Feedback Report",
        f"\n## Overall Estimated Band Score: {overall_score}\n"
        f"{'-'*40}"
    ]

    all_criteria_present = True
    for crit_abbr in criteria_order:
        crit_data = evaluation.get(crit_abbr)
        crit_full_name = criterion_full_names.get(crit_abbr, crit_abbr)

        if isinstance(crit_data, dict):
            score = crit_data.get("score")
            # Guard against a feedback value of None before calling .strip().
            feedback = (crit_data.get("feedback") or "_No specific feedback provided._").strip()
            score_display = score if score is not None else "N/A (Parsing Error)"
            report_parts.append(f"\n### {crit_full_name} (Score: {score_display})\n")
            report_parts.append(f"{feedback}\n")
            if score is None:
                all_criteria_present = False
        else:
            all_criteria_present = False
            # Use full name in the error message
            report_parts.append(f"\n### {crit_full_name} (Score: N/A)\n")
            report_parts.append("_Evaluation data missing for this criterion._\n")

    # if not all_criteria_present: print("   Warning: Some criteria data missing/parsing failed.") # Suppressed

    final_report = "\n".join(report_parts)
    # print("   ✅ Final report compiled.") # Suppressed
    return {"final_feedback_report": final_report, "error_message": None}
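
A quick way to sanity-check the report node is to call it with a hand-built state. The sketch below assumes the cell above has already been run in the same session. Every score, every feedback string, and the task type value bar_chart are made-up placeholders rather than real grader output. The sample overall score of 6.5 is the average of the four scores (6.375) rounded to the nearest half band.

```python
# Hypothetical sample state: all values below are illustrative placeholders.
sample_state = {
    "task_number": 1,
    "task_1_type": "bar_chart",   # assumed type label, not confirmed by the notebook
    "task_2_type": None,
    "evaluation_criteria": {
        "TA": {"score": 6.0, "feedback": "Covers the main trends but lacks a clear overview."},
        "CC": {"score": 7.0, "feedback": "Logical paragraphing with varied linking devices."},
        "LR": {"score": 6.5, "feedback": "Adequate range, with some repeated vocabulary."},
        "GRA": {"score": 6.0, "feedback": "Mix of simple and complex sentences with some errors."},
    },
    "overall_score": 6.5,
    "error_message": None,
}

# Only call the node if compile_final_report is defined in this session.
if "compile_final_report" in globals():
    result = compile_final_report(sample_state)
    print(result["final_feedback_report"])
```

Setting error_message to a non-empty string in the same state should exercise the "Grading Incomplete" fallback instead.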

### Graph definition
This section uses LangGraph (StateGraph) to connect all the previously defined node functions into a coherent workflow. It sets the entry point (ask_for_task_description) and defines the conditional logic (using functions like check_for_errors, decide_task_path) that directs the flow of information and control between nodes based on the current state (e.g., routing to Task 1 vs Task 2 path, handling errors). 
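
To make the routing idea concrete, here is a minimal plain-Python sketch that does not use LangGraph at all. It only mimics how a routing function returns a label and how a mapping turns that label into the next node name. The ROUTES table and the trace_path helper are hypothetical names introduced for illustration, the routing functions are simplified stand-ins, and only a trimmed subset of the nodes is included.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]

def check_for_errors(state: State) -> str:
    """Simplified stand-in: route to the error handler if an error is recorded."""
    return "error" if state.get("error_message") else "continue"

def decide_task_path(state: State) -> str:
    """Simplified stand-in: pick a branch from the task number."""
    if state.get("error_message"):
        return "error"
    branches = {1: "task_1_branch", 2: "task_2_branch"}
    return branches.get(state.get("task_number"), "error")

# node name -> (routing function, {label: next node}); trimmed subset of the real graph
ROUTES: Dict[str, Tuple[Callable[[State], str], Dict[str, str]]] = {
    "identify_task_number": (decide_task_path, {
        "task_1_branch": "classify_task_1_type",
        "task_2_branch": "classify_task_2_type",
        "error": "handle_error",
    }),
    "classify_task_2_type": (check_for_errors, {
        "continue": "evaluate_writing", "error": "handle_error"}),
    "evaluate_writing": (check_for_errors, {
        "continue": "compile_final_report", "error": "handle_error"}),
}

def trace_path(start: str, state: State) -> List[str]:
    """Follow the routing table from `start` without running any node logic."""
    path = [start]
    node = start
    while node in ROUTES:
        router, mapping = ROUTES[node]
        node = mapping[router(state)]
        path.append(node)
    return path

print(trace_path("identify_task_number", {"task_number": 2, "error_message": None}))
# ['identify_task_number', 'classify_task_2_type', 'evaluate_writing', 'compile_final_report']
print(trace_path("identify_task_number", {"task_number": None, "error_message": None}))
# ['identify_task_number', 'handle_error']
```

Because the sketch never executes the nodes, the state stays fixed along the path. In the real graph each node can update the state before the next routing decision is made.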

In [14]:
print("\n--- Defining LangGraph Workflow ---")
workflow = StateGraph(IeltsGraderState)

# Add nodes
print("   Adding nodes...")
# Register each node function under the name that the edges below refer to.
workflow.add_node("ask_for_task_description", ask_for_task_description)
workflow.add_node("load_task_description", load_task_description)
workflow.add_node("ask_for_user_answer", ask_for_user_answer)
workflow.add_node("load_user_answer", load_user_answer)
workflow.add_node("identify_task_number", identify_task_number)
workflow.add_node("classify_task_1_type", classify_task_1_type)
workflow.add_node("classify_task_2_type", classify_task_2_type)
workflow.add_node("extract_visual_summary", extract_visual_summary)
workflow.add_node("evaluate_writing", evaluate_writing)
workflow.add_node("compile_final_report", compile_final_report)
workflow.add_node("handle_error", handle_error)


# Set entry point
workflow.set_entry_point("ask_for_task_description")
print("   Entry point set.")

# Conditional routing functions: each returns a label that the edge mappings below translate into the next node.
def check_for_errors(state: IeltsGraderState) -> Literal["error", "continue"]:
    return "error" if state.get("error_message") else "continue"

def decide_task_path(state: IeltsGraderState) -> Literal["error", "task_1_branch", "task_2_branch"]:
    if state.get("error_message"):
        return "error"
    task_num = state.get("task_number")
    if task_num == 1:
        return "task_1_branch"
    if task_num == 2:
        return "task_2_branch"
    print("   Warning: Task number invalid for routing.")
    # error_message can be None here, so fall back to "" before concatenating.
    # Caveat: this is a routing function, not a node, so the mutation may not persist in the graph state.
    state['error_message'] = ((state.get('error_message') or '') + " Task number identification failed for routing.").strip()
    return "error"

def decide_task_1_visual_path(state: IeltsGraderState) -> Literal["error", "summarize_visual", "evaluate"]:
    if state.get("error_message"): return "error"
    # task_description_content can be None in the state, so fall back to an empty list.
    has_image_data = any(item.get('type') == 'image_url' and item['image_url']['url'].startswith('data:image')
                         for item in (state.get("task_description_content") or []))
    task_1_type = state.get("task_1_type")
    if has_image_data and task_1_type != 'unknown':
        print("   Routing Task 1 to: extract_visual_summary")
        return "summarize_visual"
    else:
        print("   Routing Task 1 directly to: evaluate_writing")
        return "evaluate"

# Add edges
print("   Adding edges...")
workflow.add_conditional_edges("ask_for_task_description", check_for_errors, {"continue": "load_task_description", "error": "handle_error"})
workflow.add_conditional_edges("load_task_description", check_for_errors, {"continue": "ask_for_user_answer", "error": "handle_error"})
workflow.add_conditional_edges("ask_for_user_answer", check_for_errors, {"continue": "load_user_answer", "error": "handle_error"})
workflow.add_conditional_edges("load_user_answer", check_for_errors, {"continue": "identify_task_number", "error": "handle_error"})
workflow.add_conditional_edges("identify_task_number", decide_task_path, {"task_1_branch": "classify_task_1_type", "task_2_branch": "classify_task_2_type", "error": "handle_error"})
workflow.add_conditional_edges("classify_task_1_type", decide_task_1_visual_path, {"summarize_visual": "extract_visual_summary", "evaluate": "evaluate_writing", "error": "handle_error"})
workflow.add_conditional_edges("classify_task_2_type", check_for_errors, {"continue": "evaluate_writing", "error": "handle_error"})
workflow.add_conditional_edges("extract_visual_summary", check_for_errors, {"continue": "evaluate_writing", "error": "handle_error"})
workflow.add_conditional_edges("evaluate_writing", check_for_errors, {"continue": "compile_final_report", "error": "handle_error"})
workflow.add_edge("compile_final_report", END)
workflow.add_edge("handle_error", END)

print("   Edges defined.")
app = workflow.compile()


--- Defining LangGraph Workflow ---
   Adding nodes...
   Entry point set.
   Adding edges...
   Edges defined.


In [15]:
print("\n" + "="*60)
print("🚀 Starting Interactive IELTS Grader 🚀")
print("="*60)
print("ℹ️ You will be prompted for Task Description and Answer.")
print("   Provide text directly or the FULL path to a file (Image/PDF/TXT).")

# Define the initial empty state for the graph
initial_state = IeltsGraderState(
    task_description_input=None, user_answer_input=None,
    task_description_content=None, user_answer_text=None,
    task_number=None, task_1_type=None, task_2_type=None,
    intermediate_results={}, evaluation_criteria=None,
    overall_score=None, final_feedback_report=None, error_message=None
)

# Configuration for the graph execution
config = {"recursion_limit": 25} # Max steps allowed in the graph

# === Run the Agent ===
# This call starts the graph and will pause for user input()
final_state = None
try:
    # The main execution call
    final_state = app.invoke(initial_state, config=config)

    # --- Display Final Report ---
    print("\n" + "="*60)
    print("🏁 Grading Process Finished 🏁")
    print("="*60)
    # Retrieve and print the final report (or error message)
    # Use a default message if the key is somehow missing
    report = final_state.get("final_feedback_report", "Error: Final state incomplete, no report generated.")
    print("\n" + report) # Print the Markdown formatted report

except KeyboardInterrupt:
    print("\n\n❌ Process interrupted by user (Ctrl+C).")
except Exception as e:
    # Catch unexpected errors during the execution flow
    print(f"\n\n❌ An unexpected error occurred during graph execution: {e}")
    import traceback
    traceback.print_exc() # Print detailed traceback for debugging
    # Optionally print the state at the time of error.
    # Note: final_state is still None if app.invoke() itself raised, so the summary is skipped in that case.
    if final_state:
        print("\n--- State at time of error ---")
        # Avoid printing large potentially sensitive data like encoded images
        state_summary = {k: (type(v).__name__ + (f" len={len(v)}" if hasattr(v, '__len__') and k != 'task_description_content' else ''))
                         for k, v in final_state.items()}
        try:
            print(json.dumps(state_summary, indent=2))
        except TypeError:
            print(str(state_summary)) # Fallback to string

# === End of Invocation ===
print("\n--- Script execution attempt complete ---")


🚀 Starting Interactive IELTS Grader 🚀
ℹ️ You will be prompted for Task Description and Answer.
   Provide text directly or the FULL path to a file (Image/PDF/TXT).

--- Task Description Input ---

🏁 Grading Process Finished 🏁

## IELTS Writing Grading Failed

An error occurred:

```
Error getting task description input: raw_input was called, but this frontend does not support input requests.
```

Please review inputs or check logs.

--- Script execution attempt complete ---
