Better cs translation #1325

Merged
AdrianAtZyte merged 6 commits into scrapinghub:master from honzaHlavnicka:better-cs-translation on May 7, 2026
Conversation

@honzaHlavnicka
Contributor

This pull request improves Czech language parsing in dateparser.
I updated cs.yaml by adding additional word forms, grammatical cases, and synonyms.
It also introduces support for Czech cardinal and ordinal numbers (including inflected forms), adds Czech variants for AM/PM, and enables parsing of expressions like "půl druhé" (1:30).

Notes:

  • Regenerated cs.py from the updated YAML
  • Added tests covering the new functionality
  • All tests are passing
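
As a rough illustration of the "půl druhé" rule described above (not the code from this PR): Czech names the half hour by the upcoming hour, so "half of the second" means 1:30. The sketch below assumes a hypothetical `HOUR_ORDINALS` table and `half_past` helper; how dateparser actually encodes this in cs.yaml may differ.

```python
# Hypothetical lookup of the hour forms used after "půl"; not taken from cs.yaml.
HOUR_ORDINALS = {
    "jedné": 1, "druhé": 2, "třetí": 3, "čtvrté": 4, "páté": 5, "šesté": 6,
    "sedmé": 7, "osmé": 8, "deváté": 9, "desáté": 10, "jedenácté": 11,
    "dvanácté": 12,
}


def half_past(phrase):
    """Turn 'půl <hour>' into 'H:30', where H is the hour before the named one."""
    words = phrase.lower().split()
    if len(words) != 2 or words[0] != "půl" or words[1] not in HOUR_ORDINALS:
        raise ValueError(f"not a 'půl <hour>' expression: {phrase!r}")
    named = HOUR_ORDINALS[words[1]]
    # "půl jedné" wraps around to 12:30.
    previous = 12 if named == 1 else named - 1
    return f"{previous}:30"


print(half_past("půl druhé"))  # 1:30
print(half_past("půl šesté"))  # 5:30
```

The sketch ignores AM/PM and only covers the bare two-word form.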

Comment thread dateparser_data/supplementary_language_data/date_translation_data/cs.yaml Outdated
Comment on lines -27 to -29
july:
- Črc
- črv
Contributor

Was this accidental? I wonder if we should keep črc.

Contributor Author

Yes, it was, although I have never seen anyone use this variant. Should I put it back?

Contributor

Yes, please.

Contributor

Now črv seems to be gone instead.

@AdrianAtZyte AdrianAtZyte requested a review from serhii73 April 29, 2026 11:27
@codecov
codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.10%. Comparing base (b72ed09) to head (d485102).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1325      +/-   ##
==========================================
+ Coverage   97.08%   97.10%   +0.02%     
==========================================
  Files         235      235              
  Lines        2877     2904      +27     
==========================================
+ Hits         2793     2820      +27     
  Misses         84       84              

@AdrianAtZyte AdrianAtZyte self-requested a review April 29, 2026 11:50
@AdrianAtZyte
Contributor

Could you check the 2 test issues?

…ury to relative-type-regex

- Fix broken simplification regex: '(?<=(za|před)\s)vteřin(u|ou)|sekund(u|ou)\b'
  had incorrect operator precedence — the second alternative 'sekund(u|ou)\b' matched
  unconditionally (no lookbehind), transforming e.g. 'před 45 sekundou' into
  '45 1 sekundu'. Wrapped both alternatives in a non-capturing group so the
  lookbehind applies to both.

- Move \1 decade ago / in \1 decade / \1 century ago / in \1 century entries
  from relative-type to relative-type-regex in both cs.py and cs.yaml. These
  entries contain regex patterns (\d+[.,]?\d*); placing them in relative-type
  caused their literal text (including '?', '[', ']') to be indexed as wordchars,
  making '?' unique to Czech and causing Spanish text detection to fail.
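
The precedence problem from the first fix in this commit message can be reproduced in isolation. This is a minimal sketch, not the project's code: the lookbehind is reduced to `před\s` only, because the standard-library `re` module rejects variable-width lookbehinds such as `(za|před)\s`, and the replacement string `1 sekundu` is an assumption inferred from the example output quoted above.

```python
import re

# Top-level "|" splits the whole pattern, so the lookbehind guards only "vteřin...".
BROKEN = r"(?<=před\s)vteřin(u|ou)|sekund(u|ou)\b"

# Grouping both alternatives puts them under the same lookbehind.
FIXED = r"(?<=před\s)(?:vteřin(u|ou)|sekund(u|ou))\b"


def simplify(pattern, text):
    """Apply the (assumed) simplification: replace the matched word with '1 sekundu'."""
    return re.sub(pattern, "1 sekundu", text)


print(simplify(BROKEN, "před 45 sekundou"))  # před 45 1 sekundu
print(simplify(FIXED, "před 45 sekundou"))   # před 45 sekundou
print(simplify(FIXED, "před sekundou"))      # před 1 sekundu
```

With the broken pattern, `sekundou` is rewritten even though a number already precedes it; with the grouped pattern, the rewrite happens only directly after `před`.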
@serhii73 serhii73 requested a review from wRAR May 7, 2026 09:30
The expected translation for 'prvního ledna 2021 v půl šesté' had a
double space before the time component. Since 'v' is in the Czech skip
list, it is removed without leaving an extra space, so the correct
expected value is '1. january 2021 5:30' (single space).
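
The single-space expectation follows if skip words are dropped as whole tokens before the remaining tokens are rejoined. The sketch below only illustrates that reasoning under an assumed token list; `SKIP`, `drop_skip_tokens`, and `blank_skip_tokens` are hypothetical names, not dateparser functions.

```python
# Hypothetical skip list containing only the word discussed above.
SKIP = {"v"}


def drop_skip_tokens(tokens):
    """Remove skip tokens entirely, then join what is left with single spaces."""
    return " ".join(t for t in tokens if t not in SKIP)


def blank_skip_tokens(tokens):
    """Replace skip tokens with empty strings, which leaves a doubled space."""
    return " ".join("" if t in SKIP else t for t in tokens)


# Assumed intermediate tokens for 'prvního ledna 2021 v půl šesté'.
tokens = ["1.", "january", "2021", "v", "5:30"]

print(repr(drop_skip_tokens(tokens)))   # '1. january 2021 5:30'
print(repr(blank_skip_tokens(tokens)))  # '1. january 2021  5:30'
```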
@AdrianAtZyte AdrianAtZyte merged commit 33ec7ef into scrapinghub:master May 7, 2026
15 checks passed
@AdrianAtZyte
Contributor

Thanks!

3 participants