## How to calculate a compliance score for each rule

### Given an output sentence (the simplified version), does it still violate the rule?

- Reverse every transformation rule into a fast “is this violated?” predicate.

- Aggregate predicates with interpretable weights → a single 0-1 compliance score.

- Unit-test each predicate and the global scorer before you feed rewards into PPO training with TRL.

- Monitor per-rule violation rates so you know exactly what to fix when the score drops.
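The monitoring idea in the last bullet can be sketched as a small counter over a batch of outputs. This is only an illustration: the two string-based predicates and the `PREDICATES` table are hypothetical stand-ins for the spaCy-based checkers defined later in this notebook.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical stand-in predicates: True means "rule violated".
def has_number_word(text: str) -> bool:
    return any(w in {"zwei", "drei", "zehn"} for w in text.lower().split())

def has_long_word(text: str) -> bool:
    return any(len(w) > 15 for w in text.split())

PREDICATES: Dict[str, Callable[[str], bool]] = {
    "numbers_converted": has_number_word,
    "no_compounds": has_long_word,
}

def violation_rates(outputs: Iterable[str]) -> Dict[str, float]:
    """Fraction of outputs that violate each rule."""
    counts: Counter = Counter()
    total = 0
    for text in outputs:
        total += 1
        for name, predicate in PREDICATES.items():
            counts[name] += int(predicate(text))
    if total == 0:
        return {name: 0.0 for name in PREDICATES}
    return {name: counts[name] / total for name in PREDICATES}

batch = [
    "Wir haben drei Äpfel.",
    "Wir haben 3 Äpfel.",
    "Die Donaudampfschifffahrtsgesellschaft hat 2 Schiffe.",
    "Das Haus ist alt.",
]
rates = violation_rates(batch)
print(rates)
```

Logging these rates per training step makes it visible which rule is responsible when the aggregate score drops.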

#### Step-by-step

##### Sanity Check

1. Create a unit-test set of complex ↔ gold-simplified pairs.

2. Compute rewards for:
- the gold simplification (should be high),
- the original sentence (should be low, especially on rule compliance),
- clearly bad outputs (random word order, content dropped – should be low).

If these qualitative expectations hold, your scalar function is probably informative enough for RL.
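The qualitative expectations above can be turned into an automated check. The sketch below assumes a reward function with the signature `reward_fn(source, output)`; `toy_reward` is a hypothetical placeholder that exists only to make the example runnable and is not the reward built in this notebook.

```python
import re
from typing import Callable, Dict, List

def sanity_check(
    reward_fn: Callable[[str, str], float],
    cases: List[Dict[str, str]],
) -> List[str]:
    """Return a description of every case whose reward ordering is wrong."""
    failures = []
    for case in cases:
        src = case["source"]
        r_gold = reward_fn(src, case["gold"])
        r_orig = reward_fn(src, src)  # the unsimplified input used as "output"
        r_bad = reward_fn(src, case["bad"])
        if not r_gold > r_orig:
            failures.append(f"gold <= original for: {src!r}")
        if not r_gold > r_bad:
            failures.append(f"gold <= bad output for: {src!r}")
    return failures

# Hypothetical toy reward: word overlap with the source plus a brevity bonus.
def toy_reward(src: str, simpl: str) -> float:
    src_words = set(re.findall(r"\w+", src.lower()))
    out_words = re.findall(r"\w+", simpl.lower())
    if not out_words:
        return 0.0
    overlap = len(src_words & set(out_words)) / len(set(out_words))
    brevity = 1.0 / (1.0 + max(0, len(out_words) - 6))
    return 0.5 * overlap + 0.5 * brevity

cases = [{
    "source": "Der Ausflug nach Potsdam, der am vergangenen Wochenende "
              "stattfand, war ein voller Erfolg.",
    "gold": "Der Ausflug nach Potsdam war ein Erfolg.",
    "bad": "Katze blau schnell Potsdam.",
}]
failures = sanity_check(toy_reward, cases)
print(failures or "all orderings hold")
```

An empty failure list is necessary but not sufficient: it only shows that the reward ranks these hand-picked cases in the expected order.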

### 1. Create a mirror function for every rule that returns a compliance flag
- For now each function returns a binary value; later, weights can be assigned and the scores averaged.
- 1 = compliant, 0 = non-compliant

In [1]:
import os
import spacy
import regex as re

from typing import List, Set, Generator, Dict, Any
from spacy.tokens import Doc, Token
from word2num_de import word_to_number
from sentence_transformers import SentenceTransformer, util

from helper import SUBORDINATE_MARKERS, COORD_CONJ


In [2]:
nlp = spacy.load("de_core_news_lg")
from german_compound_splitter import comp_split

utf8_file = os.path.join("german_dict", "german_utf8.dic")
ahoc = comp_split.read_dictionary_from_file(utf8_file) #activate the compound_spliter

Loading data file - german_dict/german_utf8.dic


## Generate checker functions for compounds and converted numbers

In [3]:
def ok_numbers_converted(doc: Doc) -> float:
    """
    A single, self-contained function to check for unconverted numbers.
    This has NO external dependencies to ensure the correct logic is always executed.
    """
    # All necessary constants are defined INSIDE this function
    NUMBER_DICT = {
    # Ordinals
    "erster": "1.", "zweiter": "2.", "dritter": "3.", "vierter": "4.", "fünfter": "5.", "sechster": "6.", "siebter": "7.",
    "achter": "8.", "neunter": "9.", "zehnter": "10.", "elfter": "11.", "zwölfter": "12.",
    # Fractions
    "halb": "0.5", "eineinhalb": "1.5", "zweieinhalb": "2.5", "dreieinhalb": "3.5", "viereinhalb": "4.5",
    "fünfeinhalb": "5.5", "sechseinhalb": "6.5", "siebeneinhalb": "7.5", "achteinhalb": "8.5", "neuneinhalb": "9.5", "zehneinhalb": "10.5",
}
    RE_NUMERIC = re.compile(r"^\d+([.,]\d+)?$")
    RE_ORDINAL = re.compile(r"^\d+\.$")

    # --- Internal helper to check each token ---

    def _is_unconverted_internal(token: Token) -> bool:
        """Internal helper to check a single token."""
        # This is the 'like_num' logic, using token attributes correctly
        text, lemma = token.text, token.lemma_
        text_lower, lemma_lower = text.lower(), lemma.lower()
        is_like_num = False
        if lemma_lower in NUMBER_DICT or text_lower in NUMBER_DICT: is_like_num = True
        elif RE_NUMERIC.match(text) or RE_ORDINAL.match(text): is_like_num = True
        elif text.isdigit(): is_like_num = True
        else:
            try:
                word_to_number(lemma_lower)
                is_like_num = True
            except Exception:
                try:
                    word_to_number(text_lower)
                    is_like_num = True
                except Exception:
                    is_like_num = False
        
        # 'is_number' logic
        is_a_number = False
        if token.text.lower() == "ein" and token.pos_ != "NUM":
            is_a_number = False
        else:
            is_a_number = is_like_num or token.pos_ == "NUM"

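        # 'is_number_word_that_should_be_converted' logic
        if not is_a_number:
            return False
        # Digit forms such as "12", "12,50" or "16." are already converted.
        if RE_NUMERIC.match(text) or RE_ORDINAL.match(text):
            return False
        return not text.isdigit()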

    # --- Main calculation ---
    violating_tokens = [token for token in doc if _is_unconverted_internal(token)]
    violation_count = len(violating_tokens)
    total_tokens = len(doc)
    
    score = 1.0 # Default score is perfect (1.0)
    if total_tokens > 0: # Calculate penalty based on violations
        penalty = min(1.0, violation_count / total_tokens) # Normalize the penalty
        score = 1.0 - penalty

    return score # if I want to return all for tracking/debugging violation_count, total_tokens, violating_tokens

In [4]:
def has_unsplit_compound(doc: spacy.tokens.Doc, ahoc: set) -> bool:
    """
    Checks if a document contains any unsplit compound nouns
    that should be simplified according to the given rules.

    This function iterates through each token in a spaCy Doc object and
    applies a set of heuristics to determine if it is a compound that
    should have been split but wasn't.

    Args:
        doc (spacy.tokens.Doc): The spaCy document object to check.
        ahoc (set): A lexicon or set of valid German words for checking
                    the validity of split parts.

    Returns:
        bool: True if at least one unsplit compound is found, False otherwise.
    """
    for token in doc:
        # Step 1: Only common nouns are candidates; skip named entities.
        if token.pos_ != "NOUN" or token.ent_type_ in {"PER", "LOC", "ORG"}:
            continue

        # Step 2: Attempt to split the token with the compound splitter.
        parts = comp_split.dissect(token.text, ahoc)

        # Step 3: Fewer than two parts means the token is not a compound.
        if len(parts) < 2:
            continue

        # Step 4: "Konvens" rule: if both the first and the last part are
        # short (<= 4 characters), the compound is not considered hard to
        # read and is not flagged.
        if len(parts[0]) <= 4 and len(parts[-1]) <= 4:
            continue

        # Step 5: Count how many parts are valid words in the lexicon. This
        # avoids flagging non-compounds or proper nouns that are not marked
        # as named entities.
        valid_parts_count = sum(p.lower() in ahoc for p in parts)

        # Step 6: If at least 80% of the parts are valid, the token is a
        # compound that should have been split: a violation.
        if valid_parts_count / len(parts) >= 0.8:
            print(f"Violation detected: Found unsplit compound '{token.text}'")
            return True

    # No violation found: the rule is followed.
    return False

In [5]:
def has_apposition(doc: spacy.tokens.Doc) -> bool: #regex finds likely comma apposition
    if any(tok.dep_ == "app" for tok in doc):
        return True
    # Fallback: regex check for ', ... ,'
    # Only trigger if pattern matches (not followed by "die", "der", etc.)
    match = re.search(r', (?!die |der |das |und |aber |weil |obwohl )[^,]+,', doc.text)
    return bool(match)

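def has_subordinate_clause(doc: spacy.tokens.Doc) -> bool:
    """Check for a subordinating marker tagged as complementizer ("cp") or SCONJ."""
    for tok in doc:
        # Compare with ==; `tok.dep_ in "cp"` would be a substring test that
        # also matches "c", "p" and the empty string.
        if (tok.text.lower() in SUBORDINATE_MARKERS and
            (tok.dep_ == "cp" or tok.pos_ == "SCONJ")):
            return True
    return False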

def has_coordinate_clause(doc: spacy.tokens.Doc) -> bool:
    """Check if the document contains a coordinate clause."""
    return any(tok.dep_ == "cd" and tok.text.lower() in COORD_CONJ for tok in doc)

def has_disallowed_tense(doc: spacy.tokens.Doc) -> bool:
    for tok in doc:
        if tok.pos_ in ("VERB", "AUX"):
            tense = tok.morph.get("Tense")
            form = tok.morph.get("VerbForm")
            mood = tok.morph.get("Mood")
            if ("Pres" not in tense and "Part" not in form) or ("Sub" in mood):
                return True
    return False

def is_passive(doc: spacy.tokens.Doc) -> bool:
    # Werden + participle: Vorgangspassiv (event passive)
    has_werden = any(tok.lemma_ == "werden" and tok.pos_ == "AUX" for tok in doc)
    has_participle = any(tok.pos_ == "VERB" and "Part" in tok.morph.get("VerbForm", []) for tok in doc)
    if has_werden and has_participle:
        return True
    # Sein + participle: Zustandspassiv (state passive), only for transitive verbs!
    has_sein = any(tok.lemma_ == "sein" and tok.pos_ == "AUX" for tok in doc)
    if has_sein and has_participle:
        # Check: is the main verb transitive (does it take an object)?
        if any(tok.dep_ in {"oa", "obj"} for tok in doc):  # object present
            return True
    return False

# Collect the main functions to compute reward function

### Example Sentence

- has_apposition(doc) returns True/False
- float(has_apposition(doc)) returns 1.0 (True, violated), 0.0 (False, compliant)
- what does the ok counterpart do?
  - flips the perspective

| original text flag | has_xy -> float | ok_no_xy | meaning |
| --- | --- | --- | --- |
| apposition present | 1.0 | 0.0 | not okay |
| no apposition | 0.0 | 1.0 | okay |

| Function | Returns | Meaning |
| --- | --- | --- |
| float(has_xy(...)) | 1.0 if violation, 0.0 if compliant | "Badness Score"|
| ok_no_xy | 1.0 if compliant, 0.0 if violation | "Goodness Score" |
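The flip from a "badness" predicate to a "goodness" score shown in the tables can also be written once as a small factory instead of one wrapper per rule. This is a sketch of that pattern only; `as_ok` and `has_semicolon` are hypothetical names that do not appear elsewhere in this notebook.

```python
from typing import Callable

def as_ok(has_violation: Callable[..., bool]) -> Callable[..., float]:
    """Turn a 'has_xy' predicate into an 'ok_no_xy' score (1.0 = compliant)."""
    def ok(*args, **kwargs) -> float:
        return 1.0 - float(has_violation(*args, **kwargs))
    ok.__name__ = f"ok_not_{has_violation.__name__}"
    return ok

# Hypothetical string-based predicate, used only for illustration.
def has_semicolon(text: str) -> bool:
    return ";" in text

ok_no_semicolon = as_ok(has_semicolon)
print(ok_no_semicolon("Das ist gut; das auch."))  # violation present
print(ok_no_semicolon("Das ist gut."))            # compliant
```

Explicit one-line wrappers are just as valid; the factory mainly helps once the number of rules grows.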

In [6]:
def ok_no_apposition(doc)           -> float: return 1.0 - float(has_apposition(doc))
def ok_no_subordinate_clause(doc)   -> float: return 1.0 - float(has_subordinate_clause(doc))
#def ok_no_coordinate_clause(doc)    -> float: return 1.0 - float(has_coordinate_clause(doc))
def ok_active_voice(doc)            -> float: return 1.0 - float(is_passive(doc))
def ok_allowed_verb_tense(doc)      -> float: return 1.0 - float(has_disallowed_tense(doc))

In [None]:
# 1. Create a single "source of truth" generator function
def _find_unsplit_compounds_gen(doc: spacy.tokens.Doc, ahoc: Set[str]) -> Generator[spacy.tokens.Token, None, None]:
    """
    A generator that yields each token that is an unsplit compound violation.
    This contains the core, shared logic.
    """
    for token in doc:
        # Initial checks: must be a NOUN and not a named entity
        if token.pos_ != "NOUN" or token.ent_type_ in {"PER", "LOC", "ORG"}:
            continue
        if not token.text or not token.text.strip() or not any(char.isalpha() for char in token.text): #adding fix
            continue

        try:
            # comp_split.dissect can fail with an IndexError on some tokens,
            # so the call is guarded and the offending token is skipped.
            parts = comp_split.dissect(token.text, ahoc)
        except IndexError:
            continue
        
        if not parts or len(parts) < 2:
            continue
        if len(parts[0]) <= 4 and len(parts[-1]) <= 4:
            continue

        valid_parts_count = sum(p.lower() in ahoc for p in parts)
        
        if valid_parts_count / len(parts) >= 0.8:
            yield token # Yield the violating token and continue the loop

def count_unsplit_compounds(doc: spacy.tokens.Doc, ahoc: Set[str]) -> int:
    """Counts ALL unsplit compounds"""
    return sum(1 for _ in _find_unsplit_compounds_gen(doc, ahoc))

def find_all_unsplit_compounds(doc: spacy.tokens.Doc, ahoc: Set[str]) -> list[spacy.tokens.Token]:
    """Gets a list of all violating tokens. Useful for debugging."""
    return list(_find_unsplit_compounds_gen(doc, ahoc))

def ok_no_compounds(doc: spacy.tokens.Doc) -> float:
    """
    Computes a compliance score (0-1) for compound splitting.
    """
    violation_count = count_unsplit_compounds(doc, ahoc)
    noun_count = len([token for token in doc if token.pos_ == "NOUN"])
    
    if noun_count == 0:
        return 1.0
    
    penalty = min(1.0, violation_count / noun_count)
    return 1.0 - penalty

In [8]:
# def count_unconverted_numbers(doc: spacy.tokens.Doc) -> int:
#     """
#     Counts number words that should have been converted but weren't.
#     This function is a helper for `ok_numbers_converted`.
#     """
#     violation_count = 0
#     for token in doc:
#         # Check if the token is a number word that should have been converted
#         if is_number_word_that_should_be_converted(token):
#             violation_count += 1
#     return violation_count

# def ok_numbers_converted(doc: spacy.tokens.Doc) -> float:
#     """
#     Computes a compliance score (0-1) for number word conversion.
#     """
#     violation_count = count_unconverted_numbers(doc)
#     total_tokens = len(doc)
    
#     if total_tokens == 0:
#         return 1.0
    
#     # Penalize based on the ratio of unconverted numbers to total tokens.
#     penalty = min(1.0, violation_count / total_tokens)
#     return 1.0 - penalty

In [9]:
s1 = nlp("Herr Müller, der Projektleiter, kam gestern.")   # has apposition
s2 = nlp("Der Projektleiter Herr Müller kam gestern.")     # no apposition

print(ok_no_apposition(s1))   # 0.0  ← rule broken
print(ok_no_apposition(s2))   # 1.0  ← rule satisfied

0.0
1.0


In [10]:
# Example sentences to test
sentence_with_compound = "Der Riesenräder und das Riesenrad oder die Riesenräder sind sehr speziell."
sentence_without_compound = "Die Hütte steht in der Sonne."

# Process the sentences with spaCy
doc1 = nlp(sentence_with_compound)
doc2 = nlp(sentence_without_compound)

# Test the function
print(f"Checking sentence: '{sentence_with_compound}'")
has_compound_violation = has_unsplit_compound(doc1, ahoc)
print(f"Has unsplit compound violation: {has_compound_violation}\n")

print(f"Checking sentence: '{sentence_without_compound}'")
has_compound_violation = has_unsplit_compound(doc2, ahoc)
print(f"Has unsplit compound violation: {has_compound_violation}\n")

Checking sentence: 'Der Riesenräder und das Riesenrad oder die Riesenräder sind sehr speziell.'
Dissect compound:  Riesenrad
Violation detected: Found unsplit compound 'Riesenrad'
Has unsplit compound violation: True

Checking sentence: 'Die Hütte steht in der Sonne.'
Dissect compound:  Hütte
Dissect compound:  Sonne
Has unsplit compound violation: False



In [11]:
# Example with a sentence from a table
sentence1 = "Der Donaudampfschifffahrtskapitänsmützenabzeichen und zweihundert Riesenrad sind sehr speziell."
doc1 = nlp(sentence1)

# Test the functions
print(f"Checking sentence: '{sentence1}'")
print("-" * 20)
print(f"Compound violation score: {ok_no_compounds(doc1):.2f}")
print(f"Number violation score: {ok_numbers_converted(doc1):.2f}")
print("-" * 20)

sentence2 = "Die Hütte steht in der Sonne."
doc2 = nlp(sentence2)

print(f"\nChecking sentence: '{sentence2}'")
print("-" * 20)
print(f"Compound violation score: {ok_no_compounds(doc2):.2f}")
print(f"Number violation score: {ok_numbers_converted(doc2):.2f}")
print("-" * 20)

sent3 = nlp("Das zweite Haus wurde drei Mal verkauft und kostet hundert Euro.")

print(f"\nChecking sentence: '{sent3.text}'")
print("-" * 20)
print(f"Compound violation score: {ok_no_compounds(sent3):.2f}")
print(f"Number violation score: {ok_numbers_converted(sent3):.2f}")
print("-" * 20)

Checking sentence: 'Der Donaudampfschifffahrtskapitänsmützenabzeichen und zweihundert Riesenrad sind sehr speziell.'
--------------------
Dissect compound:  Donaudampfschifffahrtskapitänsmützenabzeichen
Dissect compound:  Riesenrad
Compound violation score: 0.00
Number violation score: 0.00
--------------------

Checking sentence: 'Die Hütte steht in der Sonne.'
--------------------
Dissect compound:  Hütte
Dissect compound:  Sonne
Compound violation score: 1.00
Number violation score: 0.00
--------------------

Checking sentence: 'Das zweite Haus wurde drei Mal verkauft und kostet hundert Euro.'
--------------------
Dissect compound:  Haus
Dissect compound:  Mal
Dissect compound:  Euro
Compound violation score: 1.00
Number violation score: 0.00
--------------------
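The values printed above come from the `ok_*` functions, so 1.00 means compliant despite the "violation score" labels. The number score is 0.00 even for "Die Hütte steht in der Sonne.", which contains no number words; that means every token was counted as a violation. One possible cause, not verified here, is that the `word_to_number` fallback does not raise for ordinary words. A stricter check based on an explicit lexicon avoids that dependency. The sketch below is an illustration under assumptions: it works on plain strings instead of spaCy tokens, and `CARDINALS`, `ORDINAL_RE` and both function names are hypothetical.

```python
import re
from typing import List

# Hypothetical, deliberately small lexicon; a real one would be far larger.
CARDINALS = {
    "zwei", "drei", "vier", "fünf", "sechs", "acht", "neun", "zehn",
    "elf", "zwölf", "zwanzig", "hundert", "zweihundert", "tausend",
}
# Ordinal stems plus common adjective endings, e.g. "zweite", "dritten".
ORDINAL_RE = re.compile(r"^(erst|zweit|dritt|viert|fünft)(e|er|en|es|em)$")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware

def unconverted_number_words(text: str) -> List[str]:
    """Return the number words that were left spelled out."""
    words = WORD_RE.findall(text.lower())
    return [w for w in words if w in CARDINALS or ORDINAL_RE.match(w)]

def ok_numbers_converted_strict(text: str) -> float:
    """1.0 if no spelled-out number word remains, lower otherwise."""
    tokens = text.split()
    if not tokens:
        return 1.0
    return 1.0 - min(1.0, len(unconverted_number_words(text)) / len(tokens))

print(ok_numbers_converted_strict("Die Hütte steht in der Sonne."))
print(unconverted_number_words(
    "Das zweite Haus wurde drei Mal verkauft und kostet hundert Euro."))
```

"ein" is left out on purpose because it is usually an article; ambiguous forms like that are where the POS tag from spaCy remains useful.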


### 2. Collect them all in a single rule wrapper
- adjust the weights to reflect the rule's importance
- Normalise everything to [0, 1] before weighting.
- Keep the weights in a config file; you will probably retune them a few times.
- Evaluate each feature per sentence only if you need finer control.
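Keeping the weights in a config file could look roughly like this. It is a sketch under assumptions: the JSON layout, the `load_weights` helper and the example values are hypothetical, and the config is read from a string here so that the example is self-contained (a real setup would read a file).

```python
import json
from typing import Dict, Set

def load_weights(config_text: str, rule_names: Set[str]) -> Dict[str, float]:
    """Parse rule weights from JSON, validate them, and normalize to sum 1.0."""
    weights = {k: float(v) for k, v in json.loads(config_text).items()}
    missing = rule_names - set(weights)
    unknown = set(weights) - rule_names
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must not all be zero")
    return {k: w / total for k, w in weights.items()}

# Hypothetical config content; relative importance only, normalized on load.
CONFIG = '{"apposition": 2, "voice_active": 2, "no_compounds": 4}'
rule_weights = load_weights(CONFIG, {"apposition", "voice_active", "no_compounds"})
print(rule_weights)
```

Validating the key set on load catches the common mistake of adding a rule checker without giving it a weight.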

In [12]:
RULE_CHECKS = {
    "apposition"        : ok_no_apposition,
    "subord_clause"     : ok_no_subordinate_clause,
    #"coord_clause"      : ok_no_coordinate_clause,
    "voice_active"      : ok_active_voice,
    "verb_tense"        : ok_allowed_verb_tense,
    "no_compounds"      : ok_no_compounds,
    "numbers_converted" : ok_numbers_converted,
}

RULE_WEIGHTS = {           # must sum to 1.0
    "apposition"        : 0.20,
    "subord_clause"     : 0.10,
    #"coord_clause"      : 0.10,
    "voice_active"      : 0.20,
    "verb_tense"        : 0.15,
    "no_compounds"      : 0.25,
    "numbers_converted" : 0.10,
}


In [13]:
# Normalize weights so that they sum to 1.0 (compare with a tolerance,
# since exact float equality is fragile)
total_weight = sum(RULE_WEIGHTS.values())
if abs(total_weight - 1.0) > 1e-9:
    for key in RULE_WEIGHTS:
        RULE_WEIGHTS[key] /= total_weight
    print("Warning: RULE_WEIGHTS did not sum to 1.0. They have been normalized.")

In [14]:
print(total_weight) #should print 1.0

1.0


### 3. Create the scorer

In [15]:
def rule_compliance_score(text: str) -> float:
    """
    Computes a weighted 0-1 compliance score for a simplified sentence.
    This is the core function for rule-based rewards.
    """
    doc = nlp(text)
    total_score = 0.0
    
    # Iterate through each rule and its corresponding checker function
    for name, check_func in RULE_CHECKS.items():
        # Call the checker function and get the score
        score = check_func(doc)
        
        # Add the weighted score to the total
        total_score += RULE_WEIGHTS[name] * score
        
    return total_score

### 4. Compute a meaning-preservation score
- embedding-level similarity, which works cross-lingually without references
- https://sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html

In [16]:
from sentence_transformers import SentenceTransformer, SimilarityFunction

In [17]:
print("Loading SBERT model...")
# model = SentenceTransformer("sentence-transformers/distiluse-base-multilingual-cased-v2")
# model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2') #smaller & faster model
model = SentenceTransformer('deepset/gbert-base') #-large
model.similarity_fn_name = SimilarityFunction.DOT_PRODUCT
print(f"Model similarity function set to: '{model.similarity_fn_name}'")
print("Model loaded.")

Loading SBERT model...


No sentence-transformers model found with name deepset/gbert-base. Creating a new one with mean pooling.


Model similarity function set to: 'dot'
Model loaded.


In [18]:
def calculate_semantic_similarity(original_sentence: str, simplified_sentence: str) -> float:
    """
    Calculates the meaning preservation score using SBERT embeddings.
    Cosine similarity lies in [-1, 1]; the result is clamped to [0.0, 1.0]
    so that it can be mixed with the other 0-1 scores.
    """
    # 1. Encode both sentences as L2-normalized embeddings.
    emb_original = model.encode(original_sentence, normalize_embeddings=True)
    emb_simplified = model.encode(simplified_sentence, normalize_embeddings=True)

    # 2. Compare the two vectors. The model's similarity function is set to
    # dot product; on normalized embeddings this equals cosine similarity
    # and skips the redundant re-normalization. This returns a tensor.
    similarity_score = model.similarity(emb_original, emb_simplified)

    # 3. Convert the tensor to a float and clamp it to [0, 1].
    return max(0.0, min(1.0, similarity_score.item()))
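The comment in the function above states that a dot product on normalized embeddings equals cosine similarity. A small dependency-free check makes this concrete, using hand-picked vectors rather than real embeddings; it also shows that cosine similarity can be negative, which matters if the score is expected to lie in [0, 1].

```python
import math
from typing import List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors chosen for easy arithmetic, not real sentence embeddings.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
opposite = [-3.0, -4.0, 0.0]

cos_ab = cosine(a, b)
dot_ab = dot(l2_normalize(a), l2_normalize(b))
print(math.isclose(cos_ab, dot_ab))  # same value either way
print(cosine(a, opposite))           # negative for opposing vectors
```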

In [19]:
# # --- EXAMPLE USAGE ---
# original = "Der Ausflug nach Potsdam, der am vergangenen Wochenende stattfand, war ein voller Erfolg."

# # Test with a good simplification
# simplified_good = "Der Ausflug nach Potsdam am Wochenende war ein Erfolg."
# score_good = calculate_semantic_similarity(original, simplified_good)
# print(f"\nOriginal: '{original}'")
# print(f"Simplified (Good): '{simplified_good}'")
# print(f"SBERT Meaning Preservation Score: {score_good:.4f}") # Expect a high score (e.g., > 0.9)

# # Test with a simplification that loses nuance
# simplified_lossy = "Der Ausflug war ein Erfolg."
# score_lossy = calculate_semantic_similarity(original, simplified_lossy)
# print(f"\nSimplified (Lossy): '{simplified_lossy}'")
# print(f"SBERT Meaning Preservation Score: {score_lossy:.4f}") # Expect a lower score

# # Test with a simplification that changes the meaning
# simplified_bad = "Der Ausflug nach Potsdam war ein Misserfolg."
# score_bad = calculate_semantic_similarity(original, simplified_bad)
# print(f"\nSimplified (Bad): '{simplified_bad}'")
# print(f"SBERT Meaning Preservation Score: {score_bad:.4f}") # Expect a much lower score|

# simplified_worst = "der Ausflug war Weihnachtsmann."
# score_worst = calculate_semantic_similarity(original, simplified_worst)
# print(f"\nSimplified (Worst): '{simplified_worst}'")
# print(f"SBERT Meaning Preservation Score: {score_worst:.4f}") # Expect the worst score

### Compute a grammar score

In [20]:
import language_tool_python

In [21]:
tool = language_tool_python.LanguageTool('de')

In [22]:
def calculate_grammar_score(text: str) -> float:
    """
    Computes a grammatical compliance score (0-1) for a sentence.
    A score of 1.0 means no errors of category GRAMMAR were found.
    """
    if not text.strip():
        return 1.0  # Empty input: nothing to check

    # Check the text and keep only matches in the GRAMMAR category
    matches = tool.check(text)
    grammar_errors = [match for match in matches if match.category == 'GRAMMAR']
    error_count = len(grammar_errors)

    # Normalization base: number of whitespace-separated tokens
    # (non-zero here, because empty input returned early)
    token_count = len(text.split())

    # Soft penalty: grows with the error count and approaches 1.0
    # asymptotically, so short sentences are not punished too harshly.
    # Linear alternative: penalty = min(1.0, error_count / token_count)
    penalty = error_count / (error_count + token_count)

    # Return the compliance score
    return 1.0 - penalty
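To see how the soft penalty `error_count / (error_count + token_count)` differs from the linear one, both can be tabulated for a few error and token counts. The helper names below are hypothetical; the two formulas are the ones that appear in the function above.

```python
def linear_penalty(errors: int, tokens: int) -> float:
    """Errors per token, capped at 1.0."""
    return min(1.0, errors / tokens)

def soft_penalty(errors: int, tokens: int) -> float:
    """Approaches 1.0 asymptotically as errors grow."""
    return errors / (errors + tokens)

print("errors tokens linear_score soft_score")
for errors, tokens in [(1, 8), (1, 6), (3, 6), (6, 6), (12, 6)]:
    linear_score = 1.0 - linear_penalty(errors, tokens)
    soft_score = 1.0 - soft_penalty(errors, tokens)
    print(f"{errors:>6} {tokens:>6} {linear_score:>12.2f} {soft_score:>10.2f}")
```

With the linear penalty, six errors in a six-token sentence already give a score of 0.0, while the soft penalty still distinguishes six errors from twelve.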

### Testing the grammar score function

In [23]:
text = "Man sollte das vermeiden.."
text_bad = "Dieser Satz sind falsch. Man sollte das vermeiden."

print(f"Grammar score (good): {calculate_grammar_score(text):.2f}")       # Expect 1.0
print(f"Grammar score (bad): {calculate_grammar_score(text_bad):.2f}")  # Expect < 1.0

Grammar score (good): 1.00
Grammar score (bad): 0.89


In [24]:
# LanguageTool reports no GRAMMAR match for this sentence (see the output
# below), although the word order is wrong.
text_bad_1 = "Diese Sätze sind gut nicht schreiben." 
# With a singular subject and a plural verb ("Dieser Satz sind"), the tool
# reports one GRAMMAR match.
text_bad_2 = "Dieser Satz sind gut nicht schreiben."

print(f"Sentence: '{text_bad_1}'")
score1 = calculate_grammar_score(text_bad_1)
print(f"Grammar score: {score1:.2f}")  # No GRAMMAR match reported: 1.00

print(f"\nSentence: '{text_bad_2}'")
score2 = calculate_grammar_score(text_bad_2)
print(f"Grammar score: {score2:.2f}")  # 1 error, 6 tokens: 1 - (1 / (1 + 6)) = 0.86

Sentence: 'Diese Sätze sind gut nicht schreiben.'
Grammar score: 1.00

Sentence: 'Dieser Satz sind gut nicht schreiben.'
Grammar score: 0.86


In [25]:
##testing

matches = tool.check(text_bad)

print(f"Found {len(matches)} errors in the sentence: '{text_bad}'\n")

# Let's print each error individually to see them clearly
for i, match in enumerate(matches):
    print(f"--- Error {i+1} ---")
    print(f"Message: {match.message}")
    print(f"Category: {match.category}")
    print(f"Rule ID: {match.ruleId}")
    print(f"Context: '{match.context}'")
    print(f"Suggested replacements: {match.replacements}\n")

Found 1 errors in the sentence: 'Dieser Satz sind falsch. Man sollte das vermeiden.'

--- Error 1 ---
Message: Bitte prüfen, ob hier „ist“ stehen sollte.
Category: GRAMMAR
Rule ID: DE_SUBJECT_VERB_AGREEMENT
Context: 'Dieser Satz sind falsch. Man sollte das vermeiden.'
Suggested replacements: ['ist']



### 5. Plug into the global reward function

Fix a few candidate weight distributions that each sum to 1.0, in the order (rules, meaning, grammar). For example:

- Baseline (0.5, 0.3, 0.2)
- Rules-heavy (0.6, 0.2, 0.2)
- Meaning-heavy (0.3, 0.6, 0.1)
- Grammar-sensitive (0.4, 0.2, 0.4)
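One way to compare these candidates is to apply each of them to the same component scores and look at how the final reward shifts. The sketch below uses the four distributions listed above; the component scores are hypothetical values chosen for easy arithmetic, and `combine` is an illustrative helper.

```python
from typing import Dict

CANDIDATES: Dict[str, Dict[str, float]] = {
    "baseline":          {"rules_score": 0.5, "meaning_score": 0.3, "grammar_score": 0.2},
    "rules_heavy":       {"rules_score": 0.6, "meaning_score": 0.2, "grammar_score": 0.2},
    "meaning_heavy":     {"rules_score": 0.3, "meaning_score": 0.6, "grammar_score": 0.1},
    "grammar_sensitive": {"rules_score": 0.4, "meaning_score": 0.2, "grammar_score": 0.4},
}

def combine(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of component scores; weights must sum to 1.0."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical component scores for a single output, for illustration only.
scores = {"rules_score": 0.5, "meaning_score": 0.9, "grammar_score": 1.0}

rewards = {name: combine(scores, w) for name, w in CANDIDATES.items()}
for name, reward in rewards.items():
    print(f"{name:<18} {reward:.2f}")
```

Running the same comparison over a held-out set of gold, original and bad outputs shows which weighting separates them best.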

In [None]:
# weights = [
#     {"rules_score": 0.5, "meaning_score": 0.3, "grammar_score": 0.2},
#     {"rules_score": 0.6, "meaning_score": 0.2, "grammar_score": 0.2},
#     {"rules_score": 0.3, "meaning_score": 0.6, "grammar_score": 0.1},
#     {"rules_score": 0.4, "meaning_score": 0.2, "grammar_score": 0.4},
# ]

# weights = {
#     "rules_score":   0.5,
#     "meaning_score": 0.3,
#     "grammar_score": 0.2,
# }


In [None]:
def compute_reward(src: str, simpl: str, weights: dict) -> float:
    """
    Combines the rule compliance, meaning preservation and grammar scores into one.
    
    Args:
        src (str): The original, complex sentence.
        simpl (str): The simplified output sentence.
        weights (dict): Weights for the keys "rules_score", "meaning_score"
            and "grammar_score"; they must sum to 1.0.
    
    Returns:
        float: The final reward score.
    """

    # Check the weight sum with a tolerance to avoid floating-point issues
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError(f"Reward weights must sum to 1.0, but they sum to {sum(weights.values())}")
    
    
    r_rules   = rule_compliance_score(simpl)
    r_meaning = calculate_semantic_similarity(src, simpl)
    r_grammar = calculate_grammar_score(simpl)

    reward = (weights["rules_score"]   * r_rules +
              weights["meaning_score"] * r_meaning +
              weights["grammar_score"] * r_grammar)
    
    return reward 

## Test

Add a gold simplification set and check that those examples always score >= 0.9 while the originals score <= 0.5. This sanity check shows whether any rule weight is mistuned.
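The threshold check can be reported as pass rates rather than a single pass/fail, which makes it easier to see how far off the weights are. This is a sketch with hypothetical reward values; in practice the two lists would hold the rewards computed for the gold simplifications and for the unsimplified originals.

```python
from typing import Dict, Sequence

def threshold_report(
    gold_rewards: Sequence[float],
    original_rewards: Sequence[float],
    gold_min: float = 0.9,
    original_max: float = 0.5,
) -> Dict[str, float]:
    """Share of gold rewards >= gold_min and of original rewards <= original_max."""
    gold_ok = sum(r >= gold_min for r in gold_rewards)
    orig_ok = sum(r <= original_max for r in original_rewards)
    return {
        "gold_pass_rate": gold_ok / len(gold_rewards) if gold_rewards else 1.0,
        "original_pass_rate": orig_ok / len(original_rewards) if original_rewards else 1.0,
    }

# Hypothetical reward values, for illustration only.
report = threshold_report([0.95, 0.91, 0.84, 0.97], [0.42, 0.55])
print(report)
```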

In [None]:
# # --- Example 2: A bad, non-compliant sentence ---
# src_sentence_2 = "Der zweite Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den ersten Alliierten gewonnen."
# bad_simplified_sentence = "Weihnachtsmann. Der 2te Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den ersten Alliierten gewonnen."
    
# print("--- Sanity Check 2: Bad Simplification ---")
# print(f"Original:   '{src_sentence_2}'")
# print(f"Simplified: '{bad_simplified_sentence}'")
    
#     # This should show a low compliance score due to unconverted numbers
# compliance_score_2 = rule_compliance_score(bad_simplified_sentence)
# print(f"\nRule Compliance Score: {compliance_score_2:.2f}")
    
# reward_2 = compute_reward(src_sentence_2, bad_simplified_sentence, weights)
# print(f"Final Reward Score:    {reward_2:.2f}")
# print("-" * 30)

#     # --- Example 3: Offline Diagnostics ---
# print("--- Offline Diagnostics Example ---")
# simplified_doc = nlp(bad_simplified_sentence)
    
# violations_by_rule: Dict[str, Any] = {}
# for name, check_func in RULE_CHECKS.items():
#     score = check_func(simplified_doc)
#     violations_by_rule[name] = 1.0 - score

# for name, violation in violations_by_rule.items():
#     print(f"Rule: {name:<20} | Violations: {violation:.2f} (0=good, 1=bad)")

--- Sanity Check 2: Bad Simplification ---
Original:   'Der zweite Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den ersten Alliierten gewonnen.'
Simplified: 'Weihnachtsmann. Der 2te Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den ersten Alliierten gewonnen.'
Dissect compound:  Weltkrieg
Dissect compound:  Alliierten

Rule Compliance Score: 0.56
Dissect compound:  Weltkrieg
Dissect compound:  Alliierten
Final Reward Score:    0.77
------------------------------
--- Offline Diagnostics Example ---
Dissect compound:  Weltkrieg
Dissect compound:  Alliierten
Rule: apposition           | Violations: 0.00 (0=good, 1=bad)
Rule: subord_clause        | Violations: 0.00 (0=good, 1=bad)
Rule: voice_active         | Violations: 1.00 (0=good, 1=bad)
Rule: verb_tense           | Violations: 1.00 (0=good, 1=bad)
Rule: no_compounds         | Violations: 0.00 (0=good, 1=bad)
Rule: numbers_converted    | Violations: 0.91 (0=good, 1=bad)


In [None]:
# # --- Example 2: A bad, non-compliant sentence ---
# src_sentence_2 = "Der zweite Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den ersten Alliierten gewonnen."
# bad_simplified_sentence = "Der 2te Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den erster Alliierten gewonnen."
    
# print("--- Sanity Check 2: Bad Simplification ---")
# print(f"Original:   '{src_sentence_2}'")
# print(f"Simplified: '{bad_simplified_sentence}'")
    
#     # This should show a low compliance score due to unconverted numbers
# compliance_score_2 = rule_compliance_score(bad_simplified_sentence)
# print(f"\nRule Compliance Score: {compliance_score_2:.2f}")
    
# reward_2 = compute_reward(src_sentence_2, bad_simplified_sentence, weights)
# print(f"Final Reward Score:    {reward_2:.2f}")
# print("-" * 30)

#     # --- Example 3: Offline Diagnostics ---
# print("--- Offline Diagnostics Example ---")
# simplified_doc = nlp(bad_simplified_sentence)
    
# violations_by_rule: Dict[str, Any] = {}
# for name, check_func in RULE_CHECKS.items():
#     score = check_func(simplified_doc)
#     violations_by_rule[name] = 1.0 - score

# for name, violation in violations_by_rule.items():
#     print(f"Rule: {name:<20} | Violations: {violation:.2f} (0=good, 1=bad)")

--- Sanity Check 2: Bad Simplification ---
Original:   'Der zweite Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den ersten Alliierten gewonnen.'
Simplified: 'Der 2te Weltkrieg, der in Europa von 1939 bis 1945 dauerte, wurde von den erster Alliierten gewonnen.'
Dissect compound:  Weltkrieg
Dissect compound:  Alliierten

Rule Compliance Score: 0.56
Dissect compound:  Weltkrieg
Dissect compound:  Alliierten
Final Reward Score:    0.76
------------------------------
--- Offline Diagnostics Example ---
Dissect compound:  Weltkrieg
Dissect compound:  Alliierten
Rule: apposition           | Violations: 0.00 (0=good, 1=bad)
Rule: subord_clause        | Violations: 0.00 (0=good, 1=bad)
Rule: voice_active         | Violations: 1.00 (0=good, 1=bad)
Rule: verb_tense           | Violations: 1.00 (0=good, 1=bad)
Rule: no_compounds         | Violations: 0.00 (0=good, 1=bad)
Rule: numbers_converted    | Violations: 0.90 (0=good, 1=bad)


In [None]:
# # --- Test Execution ---
# test_text = """
# Ein kurzer Bericht von meinem Ausflug in Potsdam heute, am 16. August 2025.
# Am ersten Tag war das Wetter super. Wir nahmen die zweite Straßenbahn und die Fahrt dauerte nur eine halbe Stunde.
# Es waren zwanzig Leute an Bord und ein Ticket kostete fünf Euro. Für den Eintritt zum dritten Turm zahlten wir 12,50 Euro.
# Ich sah nur ein Fahrrad, aber es war ein schönes Modell. Der ganze Ausflug war ein voller Erfolg.
# """

# doc = nlp(test_text)

# # Run the specific check
# numbers_check_function = RULE_CHECKS["numbers_converted"]
# score = numbers_check_function(doc)

# # Let's see the details
# #violation_count = count_unconverted_numbers(doc)
# total_tokens = len(doc)

# print(f"--- Rule Check: 'numbers_converted' ---")
# print(f"Text: '{test_text.strip()}'")
# print("-" * 20)
# print(f"Total tokens in doc: {total_tokens}")
# #print(f"Unconverted number words found: {violation_count}")
# print(f"Compliance Score: {score:.2f}")

--- Rule Check: 'numbers_converted' ---
Text: 'Ein kurzer Bericht von meinem Ausflug in Potsdam heute, am 16. August 2025.
Am ersten Tag war das Wetter super. Wir nahmen die zweite Straßenbahn und die Fahrt dauerte nur eine halbe Stunde.
Es waren zwanzig Leute an Bord und ein Ticket kostete fünf Euro. Für den Eintritt zum dritten Turm zahlten wir 12,50 Euro.
Ich sah nur ein Fahrrad, aber es war ein schönes Modell. Der ganze Ausflug war ein voller Erfolg.'
--------------------
Total tokens in doc: 86
Compliance Score: 0.06
