Skip to content

FSA Correction, overall backend pipeline#12

Merged
JohnnyWan1123 merged 18 commits intomainfrom
correction
Jan 16, 2026
Merged

FSA Correction, overall backend pipeline#12
JohnnyWan1123 merged 18 commits intomainfrom
correction

Conversation

@Ren-creater
Copy link
Copy Markdown
Collaborator

see README.md in the correction folder

Copilot AI review requested due to automatic review settings January 15, 2026 19:51
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new FSA Correction Module that compares student FSAs against expected FSAs by leveraging existing validation functions. The module provides detailed feedback about structural errors, language equivalence, and optionally minimality checking.

Changes:

  • Added new correction module (correction.py and __init__.py) with pipeline for FSA comparison
  • Removed unused Answer schema (answer.py) and related imports from schemas module
  • Added comprehensive test suite for the correction module (test_correction.py)
  • Updated schema documentation to remove Answer schema references

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
evaluation_function/correction/correction.py Implements 4-step correction pipeline: validation, minimality check, structural analysis, and language equivalence
evaluation_function/correction/__init__.py Exports correction module functions and classes
evaluation_function/correction/README.md Documents correction module architecture and usage
evaluation_function/test/test_correction.py Comprehensive test suite for correction functionality
evaluation_function/schemas/answer.py Removed unused Answer schema definitions
evaluation_function/schemas/__init__.py Removed Answer schema exports
evaluation_function/schemas/README.md Removed Answer schema documentation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +120 to +121
if is_valid_fsa(fsa):
return False, []
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is inverted. is_valid_fsa(fsa) returns an empty list when valid and a list of errors when invalid. An empty list is falsy in Python, so this check returns False when the FSA is valid, which is incorrect. Should be if is_valid_fsa(fsa): to check if there are any errors (truthy non-empty list).

Copilot uses AI. Check for mistakes.
if len(minimized.states) < len(fsa.states):
return False, ValidationError(
message=f"FSA is not minimal: has {len(fsa.states)} states but can be reduced to {len(minimized.states)}",
code=ErrorCode.NOT_COMPLETE,
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong error code for minimality check. The error is about minimality, not completeness. Should use a more appropriate error code or consider adding a NOT_MINIMAL error code if one doesn't exist.

Suggested change
code=ErrorCode.NOT_COMPLETE,
code=ErrorCode.NOT_MINIMAL,

Copilot uses AI. Check for mistakes.
Comment on lines +350 to +351
if is_valid_fsa(student_fsa) or is_valid_fsa(expected_fsa):
return False, None, None
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same inverted logic issue as in trace_string. Since is_valid_fsa() returns an empty list for valid FSAs (falsy), this condition will be False when both FSAs are valid, which is incorrect. The logic should check for errors: if is_valid_fsa(student_fsa) or is_valid_fsa(expected_fsa) should be checking if there ARE errors (truthy non-empty list).

Copilot uses AI. Check for mistakes.
Comment on lines +227 to +230
def test_quick_equivalence_check_different(self, dfa_accepts_a, dfa_accepts_a_or_b):
is_equiv, hint, hint_type = quick_equivalence_check(dfa_accepts_a, dfa_accepts_a_or_b)
assert is_equiv is False
assert hint is not None
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will likely fail due to the bug in quick_equivalence_check line 350-351 where the validation logic is inverted. Both test FSAs are valid, so the function will incorrectly return (False, None, None) immediately, but the test expects a hint to be present.

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +141
def test_trace_accepted(self, dfa_accepts_a):
accepted, trace = trace_string(dfa_accepts_a, "a")
assert accepted is True
assert trace == ["q0", "q1"]
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will fail due to the bug in trace_string line 120 where the validation check is inverted. Since dfa_accepts_a is valid, is_valid_fsa will return an empty list (falsy), making the condition truthy, causing the function to incorrectly return (False, []).

Copilot uses AI. Check for mistakes.
Comment on lines +143 to +146
def test_trace_rejected(self, dfa_accepts_a):
accepted, trace = trace_string(dfa_accepts_a, "b")
assert accepted is False
assert trace == ["q0", "q2"]
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will also fail due to the same inverted validation logic bug in trace_string at line 120.

Copilot uses AI. Check for mistakes.
Comment on lines +148 to +151
def test_fsa_accepts(self, dfa_accepts_a):
assert fsa_accepts(dfa_accepts_a, "a") is True
assert fsa_accepts(dfa_accepts_a, "b") is False
assert fsa_accepts(dfa_accepts_a, "aa") is False
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While fsa_accepts correctly uses accepts_string from the validation module, these tests verify the function works correctly, which is good. However, it's worth noting that this function could be considered redundant since it's just a thin wrapper around accepts_string.

Copilot uses AI. Check for mistakes.
Comment on lines +153 to +156
def test_empty_string(self, dfa_accepts_a):
accepted, trace = trace_string(dfa_accepts_a, "")
assert accepted is False
assert trace == ["q0"]
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will fail due to the inverted validation logic bug in trace_string at line 120.

Copilot uses AI. Check for mistakes.
"""

import pytest
from evaluation_function.schemas import FSA, ValidationError, ErrorCode
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'FSA' is not used.

Suggested change
from evaluation_function.schemas import FSA, ValidationError, ErrorCode
from evaluation_function.schemas import ValidationError, ErrorCode

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if len(minimized.states) < len(fsa.states):
return False, ValidationError(
message=f"FSA is not minimal: has {len(fsa.states)} states but can be reduced to {len(minimized.states)}",
code=ErrorCode.NOT_COMPLETE,
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ErrorCode.NOT_COMPLETE is semantically incorrect for minimality errors. This error code is intended for completeness validation (missing transitions), not for FSA minimality. Consider using a more appropriate error code or adding a NOT_MINIMAL code to the ErrorCode enum.

Suggested change
code=ErrorCode.NOT_COMPLETE,
code=ErrorCode.NOT_MINIMAL,

Copilot uses AI. Check for mistakes.
def analyze_fsa_correction(
student_fsa: FSA,
expected_fsa: FSA,
check_minimality: bool = False
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name 'check_minimality' shadows the function 'check_minimality' defined at line 132. This creates ambiguity and can lead to confusion. Consider renaming the parameter to 'require_minimality' or 'verify_minimality'.

Copilot uses AI. Check for mistakes.
Returns:
(are_equivalent, counterexample_hint, hint_type)
"""
if is_valid_fsa(student_fsa) or is_valid_fsa(expected_fsa):
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition should return False when validation errors exist (truthy list), but the logic is inverted. The function is_valid_fsa returns a list of errors (empty list when valid). The condition should be 'if is_valid_fsa(student_fsa) or is_valid_fsa(expected_fsa):' but with a check like 'if len(is_valid_fsa(student_fsa)) > 0 or len(is_valid_fsa(expected_fsa)) > 0:' to correctly identify invalid FSAs.

Suggested change
if is_valid_fsa(student_fsa) or is_valid_fsa(expected_fsa):
student_validation_errors = is_valid_fsa(student_fsa)
expected_validation_errors = is_valid_fsa(expected_fsa)
if len(student_validation_errors) > 0 or len(expected_validation_errors) > 0:

Copilot uses AI. Check for mistakes.
@JohnnyWan1123 JohnnyWan1123 merged commit b071a80 into main Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants