Conversation
Implements a translator for FHIR Java ResourceBundles, handling text translation while preserving ICU syntax and variables. It includes regex patterns for bundle entry, class names, and Javadoc, and generates new Java class files with translated content.
|
Warning Rate limit exceeded@jy95 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 51 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughReplaces a properties-file translation script with a new Java resource bundle translator. Adds Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI / Workflow
participant Trans as FHIRBundleTranslator
participant Argos as Argos Translate
participant FS as File System
CLI->>Trans: __init__(src_lang='en', tgt_lang='es')
Trans->>Argos: _setup_translator()
Argos->>Argos: Load/install model (en→es)
Argos-->>Trans: Model ready
CLI->>FS: Discover *_en.java files in l10n/
FS-->>CLI: File list
loop For each eligible bundle file
CLI->>Trans: create_translated_java_class(source_path)
Note over Trans: Update class name<br/>with language suffix
Trans->>FS: Read source content
FS-->>Trans: Java class content
loop For each bundle entry
Trans->>Trans: Mask ICU/variable tokens
Trans->>Argos: Translate masked text
Argos-->>Trans: Translated text
Trans->>Trans: Restore masked tokens
end
Trans->>FS: Write to _<tgt>.java file
FS-->>Trans: File written
end
Trans-->>CLI: Translation complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (4)
scripts/translate_bundles.py (4)
14-14: Optional: Use non-greedy matching in JAVADOC_PATTERN.The greedy
.*could match too much if there are multiple similar patterns on the same line (unlikely but possible).🔎 Proposed refinement
-JAVADOC_PATTERN = re.compile(r'\* (.*) \(([a-z]{2})\) resource bundle') +JAVADOC_PATTERN = re.compile(r'\* (.*?) \(([a-z]{2})\) resource bundle')
50-54: Recommended: Make exception handling more specific.The broad
Exceptioncatch makes debugging difficult. Consider catching specific translation-related exceptions and logging more context.🔎 Proposed refinement for better error diagnostics
try: translated = argostranslate.translate.translate(masked_text, self.src, self.tgt) - except Exception as e: - print(f"Translation error: {e}") + except Exception as e: + print(f"Translation error for text '{text[:50]}...': {type(e).__name__}: {e}") return textThis provides the exception type and shows the text that failed (truncated for readability).
Based on learnings, the static analysis hint flagging
BLE001is valid here.
104-106: Recommended: Refine file filtering logic.The filter
"_" not in os.path.basename(f)is overly broad and could exclude legitimate base bundle files that happen to have underscores in their names (e.g.,My_Base_Bundle.java). Consider filtering specifically for language suffixes.🔎 Proposed refinement for more precise filtering
search_pattern = os.path.join(args.dir, "**/l10n/*.java") - source_files = [f for f in glob.glob(search_pattern, recursive=True) - if "_" not in os.path.basename(f) and "package-info" not in f] + # Match files like MyBundle_es.java, MyBundle_pt.java but not MyBundle.java + lang_suffix_pattern = re.compile(r'_[a-z]{2}(-[A-Z][a-z]+)?\.java$') + source_files = [f for f in glob.glob(search_pattern, recursive=True) + if not lang_suffix_pattern.search(os.path.basename(f)) + and "package-info" not in f]This specifically looks for language suffixes like
_es,_pt,_zh-Hansrather than any underscore.
112-113: Optional: Add error handling for individual file processing.If one file fails to process, the entire script continues without indication. Consider adding try-except to log failures while processing remaining files.
🔎 Proposed enhancement for resilient processing
+ failed = [] for file_path in source_files: - translator.create_translated_java_class(file_path) + try: + translator.create_translated_java_class(file_path) + except Exception as e: + print(f" [ERROR] Failed to process {file_path}: {e}") + failed.append(file_path) + + if failed: + print(f"\n⚠️ {len(failed)} file(s) failed to process:") + for f in failed: + print(f" - {f}")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/translate_properties.ymlscripts/translate_bundles.pyscripts/translate_properties.py
💤 Files with no reviewable changes (1)
- scripts/translate_properties.py
🧰 Additional context used
🪛 Ruff (0.14.10)
scripts/translate_bundles.py
52-52: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: submit-pypi
- GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (1)
.github/workflows/translate_properties.yml (1)
83-83: LGTM!The workflow correctly integrates with the new
translate_bundles.pyscript. The argument change from--tgt-langto--tgtmatches the updated argument parser in the new script.
| # Captures Javadoc to update target language | ||
| JAVADOC_PATTERN = re.compile(r'\* (.*) \(([a-z]{2})\) resource bundle') | ||
| # Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...} | ||
| ICU_PATTERN = re.compile(r'\{[^{}]+\}') |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n scripts/translate_bundles.py | head -50Repository: jy95/fds
Length of output: 2225
🏁 Script executed:
# Search for nested braces in Java bundle files
find . -name "*.java" -type f | xargs grep -l "plural\|choice\|select" 2>/dev/null | head -20Repository: jy95/fds
Length of output: 1530
🏁 Script executed:
# Look for ICU MessageFormat patterns with nested braces in the codebase
rg '{[^}]*{[^}]*}[^}]*}' --type=java -C2 2>/dev/nullRepository: jy95/fds
Length of output: 34
🏁 Script executed:
# Examine the actual DosageFields bundle files for ICU MessageFormat patterns
cat ./common/src/main/java/io/github/jy95/fds/common/l10n/DosageFields.java | head -100Repository: jy95/fds
Length of output: 3304
🏁 Script executed:
# Search more broadly for ICU MessageFormat patterns in Java files
rg 'plural|choice|select' ./common/src/main/java/io/github/jy95/fds/common/l10n/ -A2 -B2Repository: jy95/fds
Length of output: 19279
🏁 Script executed:
# Test regex behavior: see what ICU_PATTERN actually matches
python3 << 'EOF'
import re
ICU_PATTERN = re.compile(r'\{[^{}]+\}')
# Test cases
test_cases = [
"{0}", # Simple variable
"{minValue}", # Named variable
"{count, plural, one {# item} other {# items}}", # Nested structure
"{gender, select, male {He} female {She} other {They}}", # Nested select
"{time, number, integer}", # Simple format
]
print("Testing ICU_PATTERN: r'\\{[^{}]+\\}'")
print("=" * 80)
for test in test_cases:
matches = ICU_PATTERN.findall(test)
print(f"\nInput: {test}")
print(f"Matches: {matches}")
if not matches:
print(" ⚠️ NO MATCH - This would NOT be masked!")
EOFRepository: jy95/fds
Length of output: 506
🏁 Script executed:
# Check if the codebase actually uses nested ICU MessageFormat patterns
find . -path ./node_modules -prune -o -name "*.properties" -o -name "*Messages*.java" | xargs grep -E '\{[^}]*,\s*(plural|choice|select)' 2>/dev/null | head -20Repository: jy95/fds
Length of output: 34
Critical: ICU_PATTERN cannot handle nested braces in actual bundle content.
The pattern r'\{[^{}]+\}' uses a negated character class that cannot match nested brace structures. The codebase contains actual nested ICU MessageFormat patterns throughout the resource bundles:
{0, plural, one{{0} time} other{{0} times}}{condition, select, 0{{minValue} to {maxValue} {unit}} 1{up to {maxValue} {unit}} other{at least {minValue} {unit}}}{period, plural, one{every {periodUnit}} other{every {period} {periodUnit}}}
The regex will only match innermost braces, leaving outer structures unmasked. When translation occurs, the structure becomes malformed and unrecoverable on unmask, corrupting translations.
🔎 Consider using a character-by-character stack-based parser
Replace the regex-based masking with a depth-aware parser:
-# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...}
-ICU_PATTERN = re.compile(r'\{[^{}]+\}')
+# Protection for ICU variables: {0}, {period}, {maxFrequency, plural, ...}
+def find_icu_variables(text):
+ """Find all ICU variable spans, handling nested braces."""
+ results = []
+ depth = 0
+ start = None
+ for i, char in enumerate(text):
+ if char == '{':
+ if depth == 0:
+ start = i
+ depth += 1
+ elif char == '}':
+ depth -= 1
+ if depth == 0 and start is not None:
+ results.append((start, i + 1, text[start:i + 1]))
+ start = None
+ return resultsThen update line 48 in process_text_content to iterate over results and mask each span.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In scripts/translate_bundles.py around line 16, the current ICU_PATTERN =
re.compile(r'\{[^{}]+\}') fails on nested braces and corrupts ICU MessageFormat
strings; replace the regex approach with a small depth-aware,
character-by-character parser function that scans the text, tracks brace depth,
records start/end indexes for each top-level brace span (handling nested braces
correctly), and returns a list of spans; then remove or ignore ICU_PATTERN and
update the call at line 48 in process_text_content to iterate over the returned
spans and mask each span (instead of using the regex matches), ensuring all
nested ICU structures are fully masked and later unmasked intact.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/translate_bundles.py (1)
52-56: Consider more specific exception handling.The bare
except Exception:catches all exceptions broadly. If argostranslate defines more specific exception types for translation failures, catching those specifically would make error handling more precise.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/translate_bundles.py
🧰 Additional context used
🪛 Ruff (0.14.10)
scripts/translate_bundles.py
33-33: Avoid specifying long messages outside the exception class
(TRY003)
54-54: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: submit-pypi
- GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (4)
scripts/translate_bundles.py (4)
10-10: Good fix for escaped quote handling.The updated pattern
(?:[^"\\]|\\.)*correctly handles escaped characters in bundle values, addressing the previous review concern.
76-76: Good fix for Javadoc preservation.The pattern now correctly preserves the original descriptive text (group 1) while updating the language code, addressing the previous review concern.
93-94: Good addition of file existence check.The warning message helpfully informs users when files are being overwritten, addressing the previous review concern about silent overwrites.
100-121: LGTM! Well-structured main function.The file discovery logic correctly filters translated files (containing "_") and package-info files, and provides clear feedback when no source files are found.
Updated GitHub Actions workflow to use newer action versions and added a step for translating Java test files.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/translate_bundles.yml (1)
83-87: Consider parameterizing the hardcoded directory path.The hardcoded path
common/src/main/java/io/github/jy95/fds/common/l10nreduces flexibility. If the directory structure changes, the workflow will need manual updates.🔎 Suggested refactor to use workflow input
Add a workflow input for the source directory:
inputs: tgt_lang: description: 'Target language (must be supported by argostranslate)' required: true type: choice options: - ar # ... (other languages) + src_dir: + description: 'Source directory containing l10n bundles' + required: false + default: 'common/src/main/java/io/github/jy95/fds/common/l10n' + type: stringThen reference it in the translation step:
- name: 🤖 Translate Java Resource Bundles run: | # We point to the directory containing your l10n classes - python scripts/translate_bundles.py --tgt "${{ github.event.inputs.tgt_lang }}" --dir "common/src/main/java/io/github/jy95/fds/common/l10n" + python scripts/translate_bundles.py --tgt "${{ github.event.inputs.tgt_lang }}" --dir "${{ github.event.inputs.src_dir }}"
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/translate_bundles.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: submit-pypi
- GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (5)
.github/workflows/translate_bundles.yml (5)
1-6: LGTM! Permissions correctly configured for PR creation.The
contents: writepermission is necessary for thecreate-pull-requestaction to create commits and push changes.
68-76: LGTM! Modern action versions and proper caching configured.The workflow uses up-to-date GitHub Actions and enables pip caching for improved performance.
94-104: LGTM! PR creation properly configured.The PR creation step uses the latest action version, properly references secrets, and provides clear, descriptive metadata including the automation tool used.
88-93: Remove unusedSRC_DIRenvironment variable.The
translate_java_test_strings.pyscript does not use theSRC_DIRenvironment variable. The script accepts--test-diras a command-line argument (defaulting tocommon/src/test/java/io/github/jy95/fds) and does not read any environment variables. Remove theenvsection to clean up the workflow.Likely an incorrect or invalid review comment.
14-63: No action needed. The language codes listed are all supported by argostranslate and conform to its expected ISO 639-1 format. The use of "zh" (rather than zh-Hans/zh-Hant variants) is correct for argostranslate's API. The implementation includes runtime validation that will fail with a clear error if an unsupported language is requested.
Updated dependency installation and translation script execution. Changed the version of create-pull-request action.
Implements a translator for FHIR Java ResourceBundles, handling text translation while preserving ICU syntax and variables. It includes regex patterns for bundle entry, class names, and Javadoc, and generates new Java class files with translated content.
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.