Add FHIR Java ResourceBundle Translator by jy95 · Pull Request #273 · jy95/fds

jy95 · 2026-01-01T09:51:15Z

Implements a translator for FHIR Java ResourceBundles, handling text translation while preserving ICU syntax and variables. It includes regex patterns for bundle entry, class names, and Javadoc, and generates new Java class files with translated content.

Summary by CodeRabbit

Release Notes

Chores
- Migrated localization translation system from properties files to Java resource bundles, streamlining the translation workflow and improving application maintainability.
- Updated automated translation infrastructure with modernized tooling and refined language support configuration to optimize localization coverage and translation efficiency.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Implements a translator for FHIR Java ResourceBundles, handling text translation while preserving ICU syntax and variables. It includes regex patterns for bundle entry, class names, and Javadoc, and generates new Java class files with translated content.

coderabbitai · 2026-01-01T09:51:25Z

Warning

Rate limit exceeded

@jy95 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 51 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 8ab7c79 and 3f73df6.

📒 Files selected for processing (2)

.github/workflows/translate_bundles.yml
scripts/translate_bundles.py

📝 Walkthrough

Walkthrough

Replaces a properties-file translation script with a new Java resource bundle translator. Adds FHIRBundleTranslator class that masks ICU/variable tokens, translates content via Argos Translate, and generates target-language bundle files. Updates workflow to orchestrate new bundle translation process.

Changes

Cohort / File(s)	Summary
New Translation Script `scripts/translate_bundles.py`	Introduces `FHIRBundleTranslator` class with token masking (ICU patterns, placeholders, variables), translation via Argos Translate, and Java class generation with language-suffixed naming. Includes CLI entry point for directory-based bundle discovery and file generation with `_<tgt>.java` suffix.
Removed Legacy Script `scripts/translate_properties.py`	Deletes entire properties-file translation workflow including model management, per-value translation, and properties file I/O.
Updated Workflow `.github/workflows/translate_bundles.yml`	Renames workflow; updates permissions (contents: read→write); removes pt-BR/zh-Hans/zh-Hant locales; adds argostranslate installation step; replaces properties translation with new Java bundle translation step via `translate_bundles.py`; updates PR creation logic with new branch naming scheme and body text.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / Workflow
    participant Trans as FHIRBundleTranslator
    participant Argos as Argos Translate
    participant FS as File System

    CLI->>Trans: __init__(src_lang='en', tgt_lang='es')
    Trans->>Argos: _setup_translator()
    Argos->>Argos: Load/install model (en→es)
    Argos-->>Trans: Model ready

    CLI->>FS: Discover *_en.java files in l10n/
    FS-->>CLI: File list

    loop For each eligible bundle file
        CLI->>Trans: create_translated_java_class(source_path)
        
        Note over Trans: Update class name<br/>with language suffix
        
        Trans->>FS: Read source content
        FS-->>Trans: Java class content
        
        loop For each bundle entry
            Trans->>Trans: Mask ICU/variable tokens
            Trans->>Argos: Translate masked text
            Argos-->>Trans: Translated text
            Trans->>Trans: Restore masked tokens
        end
        
        Trans->>FS: Write to _<tgt>.java file
        FS-->>Trans: File written
    end

    Trans-->>CLI: Translation complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Bundles hop from tongue to tongue,
Tokens masked, their songs are sung,
Argos whispers, translations bloom,
Java classes fill each room—
Properties fade, new bundles reign,
Language flows through code's domain!

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title directly matches the primary change: introducing a FHIR Java ResourceBundle Translator (the new FHIRBundleTranslator class and scripts/translate_bundles.py).

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codacy-production · 2026-01-01T09:52:07Z

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation	Diff coverage
✅ +0.00% (target: -1.00%)	✅ ∅

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`b0e5008`)	1563	1563	100.00%
Head commit (`3f73df6`)	3126 (+1563)	3126 (+1563)	100.00% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#273)	0	0	∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings Change summary preferences

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (4)

scripts/translate_bundles.py (4)
14-14: Optional: Use non-greedy matching in JAVADOC_PATTERN.

The greedy .* could match too much if there are multiple similar patterns on the same line (unlikely but possible).
🔎 Proposed refinement
-JAVADOC_PATTERN = re.compile(r'\* (.*) $([a-z]{2})$ resource bundle')
+JAVADOC_PATTERN = re.compile(r'\* (.*?) $([a-z]{2})$ resource bundle')
50-54: Recommended: Make exception handling more specific.

The broad Exception catch makes debugging difficult. Consider catching specific translation-related exceptions and logging more context.
🔎 Proposed refinement for better error diagnostics
     try:
         translated = argostranslate.translate.translate(masked_text, self.src, self.tgt)
-    except Exception as e:
-        print(f"Translation error: {e}")
+    except Exception as e:
+        print(f"Translation error for text '{text[:50]}...': {type(e).__name__}: {e}")
         return text
This provides the exception type and shows the text that failed (truncated for readability).
Based on learnings, the static analysis hint flagging BLE001 is valid here.

104-106: Recommended: Refine file filtering logic.

The filter "_" not in os.path.basename(f) is overly broad and could exclude legitimate base bundle files that happen to have underscores in their names (e.g., My_Base_Bundle.java). Consider filtering specifically for language suffixes.
🔎 Proposed refinement for more precise filtering
     search_pattern = os.path.join(args.dir, "**/l10n/*.java")
-    source_files = [f for f in glob.glob(search_pattern, recursive=True) 
-                    if "_" not in os.path.basename(f) and "package-info" not in f]
+    # Match files like MyBundle_es.java, MyBundle_pt.java but not MyBundle.java
+    lang_suffix_pattern = re.compile(r'_[a-z]{2}(-[A-Z][a-z]+)?\.java$')
+    source_files = [f for f in glob.glob(search_pattern, recursive=True) 
+                    if not lang_suffix_pattern.search(os.path.basename(f)) 
+                    and "package-info" not in f]
This specifically looks for language suffixes like _es, _pt, _zh-Hans rather than any underscore.
112-113: Optional: Add error handling for individual file processing.

If one file fails to process, the entire script continues without indication. Consider adding try-except to log failures while processing remaining files.
🔎 Proposed enhancement for resilient processing
+    failed = []
     for file_path in source_files:
-        translator.create_translated_java_class(file_path)
+        try:
+            translator.create_translated_java_class(file_path)
+        except Exception as e:
+            print(f"   [ERROR] Failed to process {file_path}: {e}")
+            failed.append(file_path)
+    
+    if failed:
+        print(f"\n⚠️  {len(failed)} file(s) failed to process:")
+        for f in failed:
+            print(f"  - {f}")

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0e5008 and ab4c921.

📒 Files selected for processing (3)

.github/workflows/translate_properties.yml
scripts/translate_bundles.py
scripts/translate_properties.py

💤 Files with no reviewable changes (1)

scripts/translate_properties.py

🧰 Additional context used

🪛 Ruff (0.14.10)

scripts/translate_bundles.py

52-52: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: submit-pypi
GitHub Check: Analyze (java-kotlin)

🔇 Additional comments (1)

.github/workflows/translate_properties.yml (1)

83-83: LGTM!

The workflow correctly integrates with the new translate_bundles.py script. The argument change from --tgt-lang to --tgt matches the updated argument parser in the new script.

scripts/translate_bundles.py

coderabbitai · 2026-01-01T10:00:18Z

scripts/translate_bundles.py

+# Captures Javadoc to update target language
+JAVADOC_PATTERN = re.compile(r'\* (.*) \(([a-z]{2})\) resource bundle')
+# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...}
+ICU_PATTERN = re.compile(r'\{[^{}]+\}')


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n scripts/translate_bundles.py | head -50

Repository: jy95/fds

Length of output: 2225

🏁 Script executed:

# Search for nested braces in Java bundle files find . -name "*.java" -type f | xargs grep -l "plural\|choice\|select" 2>/dev/null | head -20

Repository: jy95/fds

Length of output: 1530

🏁 Script executed:

# Look for ICU MessageFormat patterns with nested braces in the codebase rg '{[^}]*{[^}]*}[^}]*}' --type=java -C2 2>/dev/null

Repository: jy95/fds

Length of output: 34

🏁 Script executed:

# Examine the actual DosageFields bundle files for ICU MessageFormat patterns cat ./common/src/main/java/io/github/jy95/fds/common/l10n/DosageFields.java | head -100

Repository: jy95/fds

Length of output: 3304

🏁 Script executed:

# Search more broadly for ICU MessageFormat patterns in Java files rg 'plural|choice|select' ./common/src/main/java/io/github/jy95/fds/common/l10n/ -A2 -B2

Repository: jy95/fds

Length of output: 19279

🏁 Script executed:

# Test regex behavior: see what ICU_PATTERN actually matches python3 << 'EOF' import re ICU_PATTERN = re.compile(r'\{[^{}]+\}') # Test cases test_cases = [ "{0}", # Simple variable "{minValue}", # Named variable "{count, plural, one {# item} other {# items}}", # Nested structure "{gender, select, male {He} female {She} other {They}}", # Nested select "{time, number, integer}", # Simple format ] print("Testing ICU_PATTERN: r'\\{[^{}]+\\}'") print("=" * 80) for test in test_cases: matches = ICU_PATTERN.findall(test) print(f"\nInput: {test}") print(f"Matches: {matches}") if not matches: print(" ⚠️ NO MATCH - This would NOT be masked!") EOF

Repository: jy95/fds

Length of output: 506

🏁 Script executed:

# Check if the codebase actually uses nested ICU MessageFormat patterns find . -path ./node_modules -prune -o -name "*.properties" -o -name "*Messages*.java" | xargs grep -E '\{[^}]*,\s*(plural|choice|select)' 2>/dev/null | head -20

Repository: jy95/fds

Length of output: 34

Critical: ICU_PATTERN cannot handle nested braces in actual bundle content.

The pattern r'\{[^{}]+\}' uses a negated character class that cannot match nested brace structures. The codebase contains actual nested ICU MessageFormat patterns throughout the resource bundles:

{0, plural, one{{0} time} other{{0} times}}

{condition, select, 0{{minValue} to {maxValue} {unit}} 1{up to {maxValue} {unit}} other{at least {minValue} {unit}}}

{period, plural, one{every {periodUnit}} other{every {period} {periodUnit}}}

The regex will only match innermost braces, leaving outer structures unmasked. When translation occurs, the structure becomes malformed and unrecoverable on unmask, corrupting translations.

🔎 Consider using a character-by-character stack-based parser

Replace the regex-based masking with a depth-aware parser:

-# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...} -ICU_PATTERN = re.compile(r'\{[^{}]+\}') +# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...} +def find_icu_variables(text): + """Find all ICU variable spans, handling nested braces.""" + results = [] + depth = 0 + start = None + for i, char in enumerate(text): + if char == '{': + if depth == 0: + start = i + depth += 1 + elif char == '}': + depth -= 1 + if depth == 0 and start is not None: + results.append((start, i + 1, text[start:i + 1])) + start = None + return results

Then update line 48 in process_text_content to iterate over results and mask each span.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In scripts/translate_bundles.py around line 16, the current ICU_PATTERN = re.compile(r'\{[^{}]+\}') fails on nested braces and corrupts ICU MessageFormat strings; replace the regex approach with a small depth-aware, character-by-character parser function that scans the text, tracks brace depth, records start/end indexes for each top-level brace span (handling nested braces correctly), and returns a list of spans; then remove or ignore ICU_PATTERN and update the call at line 48 in process_text_content to iterate over the returned spans and mask each span (instead of using the regex matches), ensuring all nested ICU structures are fully masked and later unmasked intact.

scripts/translate_bundles.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

scripts/translate_bundles.py (1)

52-56: Consider more specific exception handling.

The bare except Exception: catches all exceptions broadly. If argostranslate defines more specific exception types for translation failures, catching those specifically would make error handling more precise.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab4c921 and ff3145e.

📒 Files selected for processing (1)

scripts/translate_bundles.py

🧰 Additional context used

🪛 Ruff (0.14.10)

scripts/translate_bundles.py

33-33: Avoid specifying long messages outside the exception class

(TRY003)

54-54: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Codacy Static Code Analysis
GitHub Check: submit-pypi
GitHub Check: Analyze (java-kotlin)

🔇 Additional comments (4)

scripts/translate_bundles.py (4)

10-10: Good fix for escaped quote handling.

The updated pattern (?:[^"\\]|\\.)* correctly handles escaped characters in bundle values, addressing the previous review concern.

76-76: Good fix for Javadoc preservation.

The pattern now correctly preserves the original descriptive text (group 1) while updating the language code, addressing the previous review concern.

93-94: Good addition of file existence check.

The warning message helpfully informs users when files are being overwritten, addressing the previous review concern about silent overwrites.

100-121: LGTM! Well-structured main function.

The file discovery logic correctly filters translated files (containing "_") and package-info files, and provides clear feedback when no source files are found.

scripts/translate_bundles.py

Updated GitHub Actions workflow to use newer action versions and added a step for translating Java test files.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

.github/workflows/translate_bundles.yml (1)

83-87: Consider parameterizing the hardcoded directory path.

The hardcoded path common/src/main/java/io/github/jy95/fds/common/l10n reduces flexibility. If the directory structure changes, the workflow will need manual updates.

🔎 Suggested refactor to use workflow input

Add a workflow input for the source directory:

     inputs:
       tgt_lang:
         description: 'Target language (must be supported by argostranslate)'
         required: true
         type: choice
         options:
           - ar
           # ... (other languages)
+      src_dir:
+        description: 'Source directory containing l10n bundles'
+        required: false
+        default: 'common/src/main/java/io/github/jy95/fds/common/l10n'
+        type: string

Then reference it in the translation step:

       - name: 🤖 Translate Java Resource Bundles
         run: |
           # We point to the directory containing your l10n classes
-          python scripts/translate_bundles.py --tgt "${{ github.event.inputs.tgt_lang }}" --dir "common/src/main/java/io/github/jy95/fds/common/l10n"
+          python scripts/translate_bundles.py --tgt "${{ github.event.inputs.tgt_lang }}" --dir "${{ github.event.inputs.src_dir }}"

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff3145e and 8ab7c79.

📒 Files selected for processing (1)

.github/workflows/translate_bundles.yml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Codacy Static Code Analysis
GitHub Check: submit-pypi
GitHub Check: Analyze (java-kotlin)

🔇 Additional comments (5)

.github/workflows/translate_bundles.yml (5)

1-6: LGTM! Permissions correctly configured for PR creation.

The contents: write permission is necessary for the create-pull-request action to create commits and push changes.

68-76: LGTM! Modern action versions and proper caching configured.

The workflow uses up-to-date GitHub Actions and enables pip caching for improved performance.

94-104: LGTM! PR creation properly configured.

The PR creation step uses the latest action version, properly references secrets, and provides clear, descriptive metadata including the automation tool used.

88-93: Remove unused SRC_DIR environment variable.

The translate_java_test_strings.py script does not use the SRC_DIR environment variable. The script accepts --test-dir as a command-line argument (defaulting to common/src/test/java/io/github/jy95/fds) and does not read any environment variables. Remove the env section to clean up the workflow.

Likely an incorrect or invalid review comment.

14-63: No action needed. The language codes listed are all supported by argostranslate and conform to its expected ISO 639-1 format. The use of "zh" (rather than zh-Hans/zh-Hant variants) is correct for argostranslate's API. The implementation includes runtime validation that will fail with a clear error if an unsupported language is requested.

.github/workflows/translate_bundles.yml

Updated dependency installation and translation script execution. Changed the version of create-pull-request action.

Delete scripts/translate_properties.py

b1d4011

Replace translate_properties.py with translate_bundles.py

ab4c921

coderabbitai bot reviewed Jan 1, 2026

View reviewed changes

Enhance regex patterns for bundle translation

ff3145e

coderabbitai bot reviewed Jan 1, 2026

View reviewed changes

scripts/translate_bundles.py Show resolved Hide resolved

jy95 added 2 commits January 1, 2026 11:25

Update translate_bundles.yml workflow

8ab7c79

Updated GitHub Actions workflow to use newer action versions and added a step for translating Java test files.

Update default directory for Java files search

e2c1df7

coderabbitai bot reviewed Jan 1, 2026

View reviewed changes

.github/workflows/translate_bundles.yml Show resolved Hide resolved

jy95 added 2 commits January 1, 2026 11:29

Update default directory to use environment variable

5db917a

Refactor translation workflow and update dependencies

3f73df6

Updated dependency installation and translation script execution. Changed the version of create-pull-request action.

jy95 merged commit 8571e44 into main Jan 1, 2026
20 checks passed

jy95 deleted the bundle-translator branch January 1, 2026 10:35

Uh oh!

Conversation

jy95 commented Jan 1, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

codacy-production bot commented Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage summary from Codacy

See diff coverage on Codacy

See your quality gate settings Change summary preferences

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jy95 commented Jan 1, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 1, 2026 •

edited

Loading

codacy-production bot commented Jan 1, 2026 •

edited

Loading