fix: add support for additional languages/dialects (closes #9) by Deepak8858 · Pull Request #24 · master-wayne7/safe_text

Deepak8858 · 2026-04-07T09:38:20Z

This PR adds support for Bengali (bn), Gujarati (gu), Punjabi (pa), Swahili (sw), and Urdu (ur). These languages are now part of the Language enum and have corresponding bad word lists in assets/data/.

Summary by CodeRabbit

New Features
- Expanded language support: Bengali, Gujarati, Punjabi, Swahili, and Urdu, selectable in app settings for a localized experience.

Content
- Added corresponding language data files (word lists) to improve localized handling and coverage across the newly supported languages.


coderabbitai · 2026-04-07T09:38:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2cbf355b-78cc-4c53-bec8-e53f64f88b5e

📥 Commits

Reviewing files that changed from the base of the PR and between 6220865 and 692f24c.

📒 Files selected for processing (1)

assets/data/bn.txt

✅ Files skipped from review due to trivial changes (1)

assets/data/bn.txt

📝 Walkthrough


Adds five new languages (Bengali, Gujarati, Punjabi, Swahili, Urdu): new text asset files for each language and corresponding enum values and file-code mappings in lib/src/models/language.dart. No other code or public APIs changed.

Changes

Cohort / File(s)	Summary
Language Data Assets `assets/data/bn.txt`, `assets/data/gu.txt`, `assets/data/pa.txt`, `assets/data/sw.txt`, `assets/data/ur.txt`	Added five plain-text data files containing newline-delimited token lists: Bengali (`bn.txt`, 11 lines), Gujarati (`gu.txt`, 11 lines), Punjabi (`pa.txt`, 16 lines), Swahili (`sw.txt`, 21 lines), Urdu (`ur.txt`, 19 lines). No code changes in these files.
Language Model `lib/src/models/language.dart`	Added enum members `bengali`, `gujarati`, `punjabi`, `swahili`, `urdu`; extended `LanguageExtension.fromString(String)` to accept `bn`/`bengali`, `gu`/`gujarati`, `pa`/`punjabi`, `sw`/`swahili`, `ur`/`urdu`; updated `fileCode` getter to return `bn`, `gu`, `pa`, `sw`, `ur` for the new enums.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

"updated the version to 2.0.0, introduced multi lingual filtering" (#1): Adds new language enum values and corresponding asset files, so it is directly related to these multilingual additions.

Suggested labels

enhancement

Poem

🐰 A hop, a nibble, five new tongues to say,
Bengali, Gujarati — bright as day,
Punjabi, Swahili, Urdu join the run,
Tokens tucked in files, one by one,
I nibble code and dance — hooray, well done!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding support for five new languages (Bengali, Gujarati, Punjabi, Swahili, Urdu) to the Language enum with corresponding data files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

🧹 Nitpick comments (2)

assets/data/bn.txt (1)

1-13: Remove duplicate tokens to reduce redundant matching.

khanki (Line 1, Line 11) and ghu (Line 5, Line 13) are duplicated.

Proposed cleanup

 khanki
 madarchod
 bal
 choda
 ghu
 khankir
 magi
 saala
 bhenchod
 shala
-khanki
 tor
-ghu
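
The cleanup above amounts to an order-preserving deduplication: keep the first occurrence of each token and drop any later repeat. A minimal Python sketch of that step follows. The helper name `dedupe_tokens` and the placeholder tokens are illustrative assumptions, not part of the safe_text package; the real list lives in `assets/data/bn.txt`.

```python
def dedupe_tokens(lines):
    """Return tokens with later duplicates removed, keeping first-occurrence order."""
    seen = set()
    unique = []
    for raw in lines:
        token = raw.strip()
        # Skip blank lines and tokens that already appeared earlier in the list.
        if not token or token in seen:
            continue
        seen.add(token)
        unique.append(token)
    return unique


# Placeholder tokens standing in for the real word list (hypothetical data).
sample = ["alpha", "beta", "gamma", "alpha", "delta", "gamma"]
cleaned = dedupe_tokens(sample)
print(cleaned)
```

Tracking seen tokens in a set while appending to a list keeps the surviving tokens in their original relative order, which matches the request to preserve the ordering of first occurrences.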

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@assets/data/bn.txt` around lines 1 - 13, Remove the duplicate offensive
tokens by keeping a single instance of "khanki" and a single instance of "ghu"
in the token list (remove the second "khanki" and the second "ghu"), ensuring
the file contains only unique tokens; preserve the original ordering of first
occurrences when deduplicating and save the cleaned list back to the same file.

lib/src/models/language.dart (1)

318-332: Add tests for the new language mappings and data loading.

The new fromString/fileCode mappings are correct, but there’s no coverage for Bengali/Gujarati/Punjabi/Swahili/Urdu in test/language_data_test.dart (see test/language_data_test.dart:1-96). Please add round-trip mapping tests plus at least one bad-word detection check per new language.

Also applies to: 494-503
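
The round-trip property requested here can be stated compactly: every accepted alias resolves to a language, and that language's file code resolves back to the same language. The sketch below models this in Python purely for illustration. The actual tests would be Dart tests in `test/language_data_test.dart`, and `FILE_CODES`, `ALIASES`, `from_string`, and `file_code` are hypothetical stand-ins built from the alias pairs listed in the walkthrough, not the package's API.

```python
# Hypothetical model of the mappings described in the walkthrough:
# each new language accepts its two-letter code and its full name.
FILE_CODES = {
    "bengali": "bn",
    "gujarati": "gu",
    "punjabi": "pa",
    "swahili": "sw",
    "urdu": "ur",
}

# Build the alias table: both the code and the full name map to the language.
ALIASES = {}
for name, code in FILE_CODES.items():
    ALIASES[code] = name
    ALIASES[name] = name


def from_string(value):
    """Resolve an alias to a language name.

    Raising KeyError on unknown input is a choice of this model only;
    the real Dart method may behave differently.
    """
    return ALIASES[value]


def file_code(language):
    """Return the data-file code for a language name."""
    return FILE_CODES[language]


def check_round_trip():
    """Assert both mapping directions agree for every modelled language."""
    for name, code in FILE_CODES.items():
        assert from_string(code) == name
        assert from_string(name) == name
        assert from_string(file_code(name)) == name
    return True


print(check_round_trip())
```

Driving the assertions from a single table means a language added to the table is covered automatically; a Dart test could follow the same pattern by iterating over the new enum values.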

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/src/models/language.dart` around lines 318 - 332, Add unit tests to cover
the new Language mappings and data loading for Bengali, Gujarati, Punjabi,
Swahili, and Urdu: in the language data test file add round-trip mapping
assertions that Language.fromString(...) returns the expected enum (e.g.,
Language.fromString('bn') -> Language.bengali) and that the enum returns the
correct fileCode/string representation (via the existing fileCode/toString
helper) for each new language, and add at least one bad-word detection assertion
per language using the existing bad-word lookup helper used by other tests
(reuse the same pattern/assertions as existing tests for other languages so they
load language data and detect a known bad word). Ensure you reference and
exercise Language.fromString, the enum values (Language.bengali,
Language.gujarati, Language.punjabi, Language.swahili, Language.urdu), and the
fileCode accessor so the tests assert both mapping directions and bad-word
detection.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dfb5ec04-dd84-48ed-8e37-3ab26d635071

📥 Commits

Reviewing files that changed from the base of the PR and between 0a57026 and 6220865.

📒 Files selected for processing (6)

assets/data/bn.txt
assets/data/gu.txt
assets/data/pa.txt
assets/data/sw.txt
assets/data/ur.txt
lib/src/models/language.dart


master-wayne7 · 2026-04-08T05:37:59Z

Hi @Deepak8858, I accidentally merged and reverted this earlier. I’ve recreated the PR from my side, thanks again for the contribution 🙌

fix: add support for additional languages/dialects (closes master-wayne7#9)

6220865

coderabbitai bot reviewed Apr 7, 2026


fix: remove duplicate tokens in Bengali data (safe_text master-wayne7#24)

692f24c

master-wayne7 approved these changes Apr 8, 2026


master-wayne7 merged commit 8072187 into master-wayne7:main Apr 8, 2026
3 checks passed

master-wayne7 mentioned this pull request Apr 8, 2026

Revert "fix: add support for additional languages/dialects (closes #9)" #25

Merged

master-wayne7 pushed a commit that referenced this pull request Apr 8, 2026

fix: remove duplicate tokens in Bengali data (safe_text #24)

47e43ff

coderabbitai bot mentioned this pull request Apr 8, 2026

Release/2.1.2 #27

Merged

master-wayne7 merged 2 commits into master-wayne7:main from Deepak8858:fix/issue-9-new-languages
