Skip to content

Add FHIR Java ResourceBundle Translator#273

Merged
jy95 merged 8 commits intomainfrom
bundle-translator
Jan 1, 2026
Merged

Add FHIR Java ResourceBundle Translator#273
jy95 merged 8 commits intomainfrom
bundle-translator

Conversation

@jy95
Copy link
Owner

@jy95 jy95 commented Jan 1, 2026

Implements a translator for FHIR Java ResourceBundles, handling text translation while preserving ICU syntax and variables. It includes regex patterns for bundle entry, class names, and Javadoc, and generates new Java class files with translated content.

Summary by CodeRabbit

Release Notes

  • Chores
    • Migrated localization translation system from properties files to Java resource bundles, streamlining the translation workflow and improving application maintainability.
    • Updated automated translation infrastructure with modernized tooling and refined language support configuration to optimize localization coverage and translation efficiency.

✏️ Tip: You can customize this high-level summary in your review settings.

Implements a translator for FHIR Java ResourceBundles, handling text translation while preserving ICU syntax and variables. It includes regex patterns for bundle entry, class names, and Javadoc, and generates new Java class files with translated content.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 1, 2026

Warning

Rate limit exceeded

@jy95 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 51 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 8ab7c79 and 3f73df6.

📒 Files selected for processing (2)
  • .github/workflows/translate_bundles.yml
  • scripts/translate_bundles.py
📝 Walkthrough

Walkthrough

Replaces a properties-file translation script with a new Java resource bundle translator. Adds FHIRBundleTranslator class that masks ICU/variable tokens, translates content via Argos Translate, and generates target-language bundle files. Updates workflow to orchestrate new bundle translation process.

Changes

Cohort / File(s) Summary
New Translation Script
scripts/translate_bundles.py
Introduces FHIRBundleTranslator class with token masking (ICU patterns, placeholders, variables), translation via Argos Translate, and Java class generation with language-suffixed naming. Includes CLI entry point for directory-based bundle discovery and file generation with _<tgt>.java suffix.
Removed Legacy Script
scripts/translate_properties.py
Deletes entire properties-file translation workflow including model management, per-value translation, and properties file I/O.
Updated Workflow
.github/workflows/translate_bundles.yml
Renames workflow; updates permissions (contents: read→write); removes pt-BR/zh-Hans/zh-Hant locales; adds argostranslate installation step; replaces properties translation with new Java bundle translation step via translate_bundles.py; updates PR creation logic with new branch naming scheme and body text.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / Workflow
    participant Trans as FHIRBundleTranslator
    participant Argos as Argos Translate
    participant FS as File System

    CLI->>Trans: __init__(src_lang='en', tgt_lang='es')
    Trans->>Argos: _setup_translator()
    Argos->>Argos: Load/install model (en→es)
    Argos-->>Trans: Model ready

    CLI->>FS: Discover *_en.java files in l10n/
    FS-->>CLI: File list

    loop For each eligible bundle file
        CLI->>Trans: create_translated_java_class(source_path)
        
        Note over Trans: Update class name<br/>with language suffix
        
        Trans->>FS: Read source content
        FS-->>Trans: Java class content
        
        loop For each bundle entry
            Trans->>Trans: Mask ICU/variable tokens
            Trans->>Argos: Translate masked text
            Argos-->>Trans: Translated text
            Trans->>Trans: Restore masked tokens
        end
        
        Trans->>FS: Write to _<tgt>.java file
        FS-->>Trans: File written
    end

    Trans-->>CLI: Translation complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Bundles hop from tongue to tongue,
Tokens masked, their songs are sung,
Argos whispers, translations bloom,
Java classes fill each room—
Properties fade, new bundles reign,
Language flows through code's domain!

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title directly matches the primary change: introducing a FHIR Java ResourceBundle Translator (the new FHIRBundleTranslator class and scripts/translate_bundles.py).

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link

codacy-production bot commented Jan 1, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.00% (target: -1.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (b0e5008) 1563 1563 100.00%
Head commit (3f73df6) 3126 (+1563) 3126 (+1563) 100.00% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#273) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (4)
scripts/translate_bundles.py (4)

14-14: Optional: Use non-greedy matching in JAVADOC_PATTERN.

The greedy .* could match too much if there are multiple similar patterns on the same line (unlikely but possible).

🔎 Proposed refinement
-JAVADOC_PATTERN = re.compile(r'\* (.*) \(([a-z]{2})\) resource bundle')
+JAVADOC_PATTERN = re.compile(r'\* (.*?) \(([a-z]{2})\) resource bundle')

50-54: Recommended: Make exception handling more specific.

The broad Exception catch makes debugging difficult. Consider catching specific translation-related exceptions and logging more context.

🔎 Proposed refinement for better error diagnostics
     try:
         translated = argostranslate.translate.translate(masked_text, self.src, self.tgt)
-    except Exception as e:
-        print(f"Translation error: {e}")
+    except Exception as e:
+        print(f"Translation error for text '{text[:50]}...': {type(e).__name__}: {e}")
         return text

This provides the exception type and shows the text that failed (truncated for readability).

Based on learnings, the static analysis hint flagging BLE001 is valid here.


104-106: Recommended: Refine file filtering logic.

The filter "_" not in os.path.basename(f) is overly broad and could exclude legitimate base bundle files that happen to have underscores in their names (e.g., My_Base_Bundle.java). Consider filtering specifically for language suffixes.

🔎 Proposed refinement for more precise filtering
     search_pattern = os.path.join(args.dir, "**/l10n/*.java")
-    source_files = [f for f in glob.glob(search_pattern, recursive=True) 
-                    if "_" not in os.path.basename(f) and "package-info" not in f]
+    # Match files like MyBundle_es.java, MyBundle_pt.java but not MyBundle.java
+    lang_suffix_pattern = re.compile(r'_[a-z]{2}(-[A-Z][a-z]+)?\.java$')
+    source_files = [f for f in glob.glob(search_pattern, recursive=True) 
+                    if not lang_suffix_pattern.search(os.path.basename(f)) 
+                    and "package-info" not in f]

This specifically looks for language suffixes like _es, _pt, _zh-Hans rather than any underscore.


112-113: Optional: Add error handling for individual file processing.

If one file fails to process, the entire script continues without indication. Consider adding try-except to log failures while processing remaining files.

🔎 Proposed enhancement for resilient processing
+    failed = []
     for file_path in source_files:
-        translator.create_translated_java_class(file_path)
+        try:
+            translator.create_translated_java_class(file_path)
+        except Exception as e:
+            print(f"   [ERROR] Failed to process {file_path}: {e}")
+            failed.append(file_path)
+    
+    if failed:
+        print(f"\n⚠️  {len(failed)} file(s) failed to process:")
+        for f in failed:
+            print(f"  - {f}")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0e5008 and ab4c921.

📒 Files selected for processing (3)
  • .github/workflows/translate_properties.yml
  • scripts/translate_bundles.py
  • scripts/translate_properties.py
💤 Files with no reviewable changes (1)
  • scripts/translate_properties.py
🧰 Additional context used
🪛 Ruff (0.14.10)
scripts/translate_bundles.py

52-52: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: submit-pypi
  • GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (1)
.github/workflows/translate_properties.yml (1)

83-83: LGTM!

The workflow correctly integrates with the new translate_bundles.py script. The argument change from --tgt-lang to --tgt matches the updated argument parser in the new script.

# Captures Javadoc to update target language
JAVADOC_PATTERN = re.compile(r'\* (.*) \(([a-z]{2})\) resource bundle')
# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...}
ICU_PATTERN = re.compile(r'\{[^{}]+\}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n scripts/translate_bundles.py | head -50

Repository: jy95/fds

Length of output: 2225


🏁 Script executed:

# Search for nested braces in Java bundle files
find . -name "*.java" -type f | xargs grep -l "plural\|choice\|select" 2>/dev/null | head -20

Repository: jy95/fds

Length of output: 1530


🏁 Script executed:

# Look for ICU MessageFormat patterns with nested braces in the codebase
rg '{[^}]*{[^}]*}[^}]*}' --type=java -C2 2>/dev/null

Repository: jy95/fds

Length of output: 34


🏁 Script executed:

# Examine the actual DosageFields bundle files for ICU MessageFormat patterns
cat ./common/src/main/java/io/github/jy95/fds/common/l10n/DosageFields.java | head -100

Repository: jy95/fds

Length of output: 3304


🏁 Script executed:

# Search more broadly for ICU MessageFormat patterns in Java files
rg 'plural|choice|select' ./common/src/main/java/io/github/jy95/fds/common/l10n/ -A2 -B2

Repository: jy95/fds

Length of output: 19279


🏁 Script executed:

# Test regex behavior: see what ICU_PATTERN actually matches
python3 << 'EOF'
import re

ICU_PATTERN = re.compile(r'\{[^{}]+\}')

# Test cases
test_cases = [
    "{0}",  # Simple variable
    "{minValue}",  # Named variable
    "{count, plural, one {# item} other {# items}}",  # Nested structure
    "{gender, select, male {He} female {She} other {They}}",  # Nested select
    "{time, number, integer}",  # Simple format
]

print("Testing ICU_PATTERN: r'\\{[^{}]+\\}'")
print("=" * 80)

for test in test_cases:
    matches = ICU_PATTERN.findall(test)
    print(f"\nInput:   {test}")
    print(f"Matches: {matches}")
    if not matches:
        print("  ⚠️  NO MATCH - This would NOT be masked!")

EOF

Repository: jy95/fds

Length of output: 506


🏁 Script executed:

# Check if the codebase actually uses nested ICU MessageFormat patterns
find . -path ./node_modules -prune -o -name "*.properties" -o -name "*Messages*.java" | xargs grep -E '\{[^}]*,\s*(plural|choice|select)' 2>/dev/null | head -20

Repository: jy95/fds

Length of output: 34


Critical: ICU_PATTERN cannot handle nested braces in actual bundle content.

The pattern r'\{[^{}]+\}' uses a negated character class that cannot match nested brace structures. The codebase contains actual nested ICU MessageFormat patterns throughout the resource bundles:

  • {0, plural, one{{0} time} other{{0} times}}
  • {condition, select, 0{{minValue} to {maxValue} {unit}} 1{up to {maxValue} {unit}} other{at least {minValue} {unit}}}
  • {period, plural, one{every {periodUnit}} other{every {period} {periodUnit}}}

The regex will only match innermost braces, leaving outer structures unmasked. When translation occurs, the structure becomes malformed and unrecoverable on unmask, corrupting translations.

🔎 Consider using a character-by-character stack-based parser

Replace the regex-based masking with a depth-aware parser:

-# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...}
-ICU_PATTERN = re.compile(r'\{[^{}]+\}')
+# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...}
+def find_icu_variables(text):
+    """Find all ICU variable spans, handling nested braces."""
+    results = []
+    depth = 0
+    start = None
+    for i, char in enumerate(text):
+        if char == '{':
+            if depth == 0:
+                start = i
+            depth += 1
+        elif char == '}':
+            depth -= 1
+            if depth == 0 and start is not None:
+                results.append((start, i + 1, text[start:i + 1]))
+                start = None
+    return results

Then update line 48 in process_text_content to iterate over results and mask each span.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In scripts/translate_bundles.py around line 16, the current ICU_PATTERN =
re.compile(r'\{[^{}]+\}') fails on nested braces and corrupts ICU MessageFormat
strings; replace the regex approach with a small depth-aware,
character-by-character parser function that scans the text, tracks brace depth,
records start/end indexes for each top-level brace span (handling nested braces
correctly), and returns a list of spans; then remove or ignore ICU_PATTERN and
update the call at line 48 in process_text_content to iterate over the returned
spans and mask each span (instead of using the regex matches), ensuring all
nested ICU structures are fully masked and later unmasked intact.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/translate_bundles.py (1)

52-56: Consider more specific exception handling.

The bare except Exception: catches all exceptions broadly. If argostranslate defines more specific exception types for translation failures, catching those specifically would make error handling more precise.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab4c921 and ff3145e.

📒 Files selected for processing (1)
  • scripts/translate_bundles.py
🧰 Additional context used
🪛 Ruff (0.14.10)
scripts/translate_bundles.py

33-33: Avoid specifying long messages outside the exception class

(TRY003)


54-54: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: submit-pypi
  • GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (4)
scripts/translate_bundles.py (4)

10-10: Good fix for escaped quote handling.

The updated pattern (?:[^"\\]|\\.)* correctly handles escaped characters in bundle values, addressing the previous review concern.


76-76: Good fix for Javadoc preservation.

The pattern now correctly preserves the original descriptive text (group 1) while updating the language code, addressing the previous review concern.


93-94: Good addition of file existence check.

The warning message helpfully informs users when files are being overwritten, addressing the previous review concern about silent overwrites.


100-121: LGTM! Well-structured main function.

The file discovery logic correctly filters translated files (containing "_") and package-info files, and provides clear feedback when no source files are found.

jy95 added 2 commits January 1, 2026 11:25
Updated GitHub Actions workflow to use newer action versions and added a step for translating Java test files.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/translate_bundles.yml (1)

83-87: Consider parameterizing the hardcoded directory path.

The hardcoded path common/src/main/java/io/github/jy95/fds/common/l10n reduces flexibility. If the directory structure changes, the workflow will need manual updates.

🔎 Suggested refactor to use workflow input

Add a workflow input for the source directory:

     inputs:
       tgt_lang:
         description: 'Target language (must be supported by argostranslate)'
         required: true
         type: choice
         options:
           - ar
           # ... (other languages)
+      src_dir:
+        description: 'Source directory containing l10n bundles'
+        required: false
+        default: 'common/src/main/java/io/github/jy95/fds/common/l10n'
+        type: string

Then reference it in the translation step:

       - name: 🤖 Translate Java Resource Bundles
         run: |
           # We point to the directory containing your l10n classes
-          python scripts/translate_bundles.py --tgt "${{ github.event.inputs.tgt_lang }}" --dir "common/src/main/java/io/github/jy95/fds/common/l10n"
+          python scripts/translate_bundles.py --tgt "${{ github.event.inputs.tgt_lang }}" --dir "${{ github.event.inputs.src_dir }}"
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff3145e and 8ab7c79.

📒 Files selected for processing (1)
  • .github/workflows/translate_bundles.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: submit-pypi
  • GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (5)
.github/workflows/translate_bundles.yml (5)

1-6: LGTM! Permissions correctly configured for PR creation.

The contents: write permission is necessary for the create-pull-request action to create commits and push changes.


68-76: LGTM! Modern action versions and proper caching configured.

The workflow uses up-to-date GitHub Actions and enables pip caching for improved performance.


94-104: LGTM! PR creation properly configured.

The PR creation step uses the latest action version, properly references secrets, and provides clear, descriptive metadata including the automation tool used.


88-93: Remove unused SRC_DIR environment variable.

The translate_java_test_strings.py script does not use the SRC_DIR environment variable. The script accepts --test-dir as a command-line argument (defaulting to common/src/test/java/io/github/jy95/fds) and does not read any environment variables. Remove the env section to clean up the workflow.

Likely an incorrect or invalid review comment.


14-63: No action needed. The language codes listed are all supported by argostranslate and conform to its expected ISO 639-1 format. The use of "zh" (rather than zh-Hans/zh-Hant variants) is correct for argostranslate's API. The implementation includes runtime validation that will fail with a clear error if an unsupported language is requested.

jy95 added 2 commits January 1, 2026 11:29
Updated dependency installation and translation script execution. Changed the version of create-pull-request action.
@jy95 jy95 merged commit 8571e44 into main Jan 1, 2026
20 checks passed
@jy95 jy95 deleted the bundle-translator branch January 1, 2026 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant