Add grammatical form equivalence checking for cross-linguistic case aliases by powera · Pull Request #362 · powera/greenland

powera · 2026-02-25T02:49:58Z

Summary

This PR introduces a new module for normalizing and comparing grammatical form strings, with support for language-specific case name aliases. This enables the scoring system to recognize when different linguistic terminology refers to the same grammatical form (e.g., Lithuanian locative vs. cross-linguistic inessive).

Key Changes

New module: langtools/form_equivalences.py
- normalize_grammatical_form(): Resolves language-specific case aliases and reorders components to a canonical order (case → number → gender)
- are_grammatical_forms_equivalent(): Compares two form strings accounting for aliases and component ordering
- Handles parsing of form strings in the role/lang_component_component format
New module: langtools/lt/case_equivalences.py
- Defines Lithuanian case aliases mapping "inessive" → "locative"
- Includes documentation explaining the linguistic rationale (Lithuanian locative is functionally equivalent to the cross-linguistic inessive case)
Updated: benchmarks/lib/runners/sentence_decomposition_runner.py
- Integrated are_grammatical_forms_equivalent() into _grammatical_form_similarity() method
- Now returns full credit (1.0) when forms are equivalent according to language-specific rules
- Applied code formatting improvements (line length, consistency)
Updated: tests/benchmarks/test_sentence_decomposition_scoring.py
- Added test_0062_lt_inessive_scores_same_as_locative() to verify Lithuanian case equivalence scoring
- Applied code formatting improvements for consistency
New test file: tests/langtools/test_form_equivalences.py
- Comprehensive test coverage for normalization and equivalence checking
- Tests alias resolution, component reordering, case-insensitivity, and language-specific behavior

Implementation Details

- Form normalization uses regex parsing to extract the role, language code, and components
- Components are categorized (case/other, number, gender) and reordered into a consistent canonical form
- Language-specific aliases are pluggable via the _LANG_CASE_ALIASES dictionary
- All form string comparisons are case-insensitive
- Forms without language prefixes (e.g., "preposition/base") are returned unchanged
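
For reviewers who want the behavior in one place, here is a minimal sketch of the normalization described above. It is not the code in this PR: the regex, the number and gender vocabularies, and the two-letter language-code assumption are illustrative guesses, and only the function names, the _LANG_CASE_ALIASES name, and the "inessive" → "locative" alias come from the description.

```python
import re
from typing import Dict

# Alias table keyed by language code. The Lithuanian entry mirrors the PR;
# the in-line layout here is an assumption for the sake of a runnable sketch.
_LANG_CASE_ALIASES: Dict[str, Dict[str, str]] = {
    "lt": {"inessive": "locative"},
}

# Assumed vocabularies used to categorize components (not taken from the PR).
_NUMBERS = {"singular", "plural", "dual"}
_GENDERS = {"masculine", "feminine", "neuter"}

# role/lang_component_component, e.g. "noun/lt_locative_singular".
# Assumes a two-letter language code followed by an underscore.
_FORM_RE = re.compile(r"^(?P<role>[a-z]+)/(?P<lang>[a-z]{2})_(?P<rest>[a-z_]+)$")


def _rank(component: str) -> int:
    """Canonical position: case/other (0), then number (1), then gender (2)."""
    if component in _NUMBERS:
        return 1
    if component in _GENDERS:
        return 2
    return 0


def normalize_grammatical_form(form: str) -> str:
    """Resolve case aliases and reorder components to case, number, gender."""
    match = _FORM_RE.match(form.strip().lower())
    if match is None:
        # No language prefix (e.g. "preposition/base"): return unchanged.
        return form
    role, lang = match["role"], match["lang"]
    aliases = _LANG_CASE_ALIASES.get(lang, {})
    components = [aliases.get(c, c) for c in match["rest"].split("_") if c]
    # sorted() is stable, so components of equal rank keep their order.
    ordered = sorted(components, key=_rank)
    return f"{role}/{lang}_{'_'.join(ordered)}"


def are_grammatical_forms_equivalent(a: str, b: str) -> bool:
    """Case-insensitive comparison after alias resolution and reordering."""
    return normalize_grammatical_form(a).lower() == normalize_grammatical_form(b).lower()
```

Under these assumptions, noun/lt_singular_inessive normalizes to noun/lt_locative_singular, while the same alias is not applied to a language without an entry in the table.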

https://claude.ai/code/session_01XLApSPLmn4brnG74uriPkQ

Introduces langtools.form_equivalences with normalize_grammatical_form() and are_grammatical_forms_equivalent() to handle cases where different terminological conventions describe the same grammatical form. The first equivalence defined is Lithuanian locative = inessive: the traditional Lithuanian grammar term "locative" (vietininkas) names the same case that cross-linguistic typologists, and LLMs familiar with Finnish/Estonian/Hungarian, often call "inessive". The 0062 sentence-decomposition benchmark scorer now calls are_grammatical_forms_equivalent() inside _grammatical_form_similarity(), so noun/lt_locative_singular and noun/lt_inessive_singular (or any reordering such as noun/lt_singular_inessive) all receive full credit. https://claude.ai/code/session_01XLApSPLmn4brnG74uriPkQ
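
The scorer change can be pictured roughly as follows. This is a hedged sketch rather than the runner's actual code: _toy_equivalent is a hypothetical stand-in for are_grammatical_forms_equivalent(), and the token-overlap fallback is an assumption, since the commit message only states that equivalent forms receive full credit (1.0).

```python
import re
from typing import Callable, List, Tuple


def _toy_equivalent(expected: str, actual: str) -> bool:
    """Hypothetical stand-in for are_grammatical_forms_equivalent().

    Only knows the Lithuanian inessive -> locative alias and ignores
    component order.
    """
    def canon(form: str) -> Tuple[str, str, List[str]]:
        role, _, rest = form.lower().partition("/")
        lang, *components = rest.split("_")
        if lang == "lt":
            components = ["locative" if c == "inessive" else c for c in components]
        return role, lang, sorted(components)

    return canon(expected) == canon(actual)


def grammatical_form_similarity(
    expected: str,
    actual: str,
    equivalent: Callable[[str, str], bool] = _toy_equivalent,
) -> float:
    """Return 1.0 for equivalent forms, otherwise partial credit."""
    if equivalent(expected, actual):
        return 1.0
    # Assumed fallback (not from the PR): Jaccard overlap of form tokens.
    exp_tokens = set(re.split(r"[/_]", expected.lower()))
    act_tokens = set(re.split(r"[/_]", actual.lower()))
    union = exp_tokens | act_tokens
    return len(exp_tokens & act_tokens) / len(union) if union else 0.0
```

With this shape, the equivalence check short-circuits before any partial-credit logic runs, so alias and ordering differences never cost points, while a genuinely different case still scores below 1.0.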

powera · 2026-02-25T02:51:44Z

@codex for review - is this needed? We could have per-language context to avoid needing to convert output.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07525d9898

chatgpt-codex-connector · 2026-02-25T02:55:57Z

src/langtools/form_equivalences.py

+import re
+from typing import Dict, List, Optional, Tuple
+
+from langtools.lt.case_equivalences import LT_CASE_ALIASES


Avoid importing LT package for static alias map

Importing LT_CASE_ALIASES via langtools.lt.case_equivalences forces Python to execute langtools/lt/__init__.py first, which eagerly imports Wiktionary-related modules (langtools.lt.types and then clients.wiktionary.client) and pulls in extra runtime dependencies unrelated to form comparison. As a result, using langtools.form_equivalences (and now sentence decomposition scoring) can fail at import time in environments that only need benchmark scoring but not the Wiktionary stack; this alias table should live in a dependency-light module outside the langtools.lt package init path.

form_equivalences.py now reads langtools/<lang>/case_equivalences.py on first use via importlib.util.spec_from_file_location, bypassing the language package's __init__.py entirely. This means benchmark scoring environments that only need form comparison never pay the cost of the Wiktionary/SQLAlchemy stack that langtools.lt.__init__ pulls in. langtools/lt/case_equivalences.py is re-introduced as a plain, import-free data file exposing CASE_ALIASES = {"inessive": "locative"}. Any language can add its own case_equivalences.py without editing form_equivalences.py. https://claude.ai/code/session_01XLApSPLmn4brnG74uriPkQ
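
A minimal sketch of the loading approach described in this commit message, assuming a helper named load_case_aliases and a root directory argument (both hypothetical; the real function name, cache, and path resolution in form_equivalences.py may differ). Only importlib.util.spec_from_file_location, the case_equivalences.py file name, and the CASE_ALIASES attribute come from the message.

```python
import importlib.util
from functools import lru_cache
from pathlib import Path
from typing import Dict


@lru_cache(maxsize=None)
def load_case_aliases(lang: str, root: Path) -> Dict[str, str]:
    """Load <root>/<lang>/case_equivalences.py without importing the package.

    Executing the file directly from its path means <root>/<lang>/__init__.py
    never runs, so its heavier imports are not triggered.
    """
    path = root / lang / "case_equivalences.py"
    if not path.is_file():
        # Languages without an alias file simply have no aliases.
        return {}
    # The module name is arbitrary; it is not registered in sys.modules.
    spec = importlib.util.spec_from_file_location(f"_case_equivalences_{lang}", path)
    if spec is None or spec.loader is None:
        return {}
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Copy so callers never hold a reference to the module's own dict.
    return dict(getattr(module, "CASE_ALIASES", {}))
```

The lru_cache decorator gives the "on first use" behavior: the file is read once per (lang, root) pair and later calls return the cached mapping. Because the cached dict is shared between callers, it should be treated as read-only.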

powera wants to merge 2 commits into main from claude/lithuanian-case-support-IQuyQ
