Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Russian abbreviation weekdays #484

Conversation

christianfm10
Copy link
Contributor

@christianfm10 christianfm10 commented Jan 9, 2019

Added russian abbreviation weekdays:
Monday: Пнд
Tuesday: Втр
Wednesday: Срд
Thursday: Чтв
Friday: Птн
Saturday: Сбт
Sunday: Вск

Fixes #436

@codecov
Copy link

codecov bot commented Jan 9, 2019

Codecov Report

Merging #484 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
+ Coverage   94.99%   94.99%   +<.01%     
==========================================
  Files         302      302              
  Lines        2457     2459       +2     
==========================================
+ Hits         2334     2336       +2     
  Misses        123      123
Impacted Files Coverage Δ
dateparser/data/date_translation_data/ru.py 100% <ø> (ø) ⬆️
dateparser/timezone_parser.py 97.91% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 66d5470...ceaa0fc. Read the comment docs.

@eLRuLL
Copy link
Contributor

eLRuLL commented Jan 10, 2019

We'll need a maintener that knows russian that could approve this.

Also it becomes more evident that we should be splitting all this "dates" related information to something more structured. Maybe inside every language we should separate the files to:

  • days
  • months

Just to give more perspective, but maybe this approach could affect performance. Maybe just one file with a better structure defining all that information by sections.

@asadurski I would also like your input here. Thanks

@eLRuLL eLRuLL self-requested a review January 10, 2019 15:12
Copy link
Contributor

@eLRuLL eLRuLL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christianfm10 please include tests

Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume more common two-letter abbreviations are handled elsewhere, new three-letter abbreviations look good to me, except for one strange capitalization.

wednesday:
- СрД
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the last letter shouldn't be capitalized, so that would be Срд

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already fixed it. By the way, can you review the tests that i added? 🙏

@asadurski
Copy link
Member

Thanks for taking care of these!
However, for those updates to work, you need to run scripts/write_complete_data.py, which will merge them with already existing language tokens and will update ru.py language data file only with differences.
Then you need to commit that file too.
(please see https://github.com/scrapinghub/dateparser/blob/master/CONTRIBUTING.rst#guidelines-for-editing-translation-data)
Also, as @eLRuLL already mentioned, tests are essential here, so please include them too.

@christianfm10
Copy link
Contributor Author

@asadurski thanks for your reply. I added "ru.py" file and tests, what do you think?

Copy link
Contributor

@eLRuLL eLRuLL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reviewing this as accepted to not interfere with @lopuhin and @asadurski .

Thank you guys for jumping into this.

Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍 Left a suggestion for one more common abbreviation for Sunday, and a question regarding case.

- Субботу
sunday:
- Вск
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, another common one is Вскр

@@ -66,34 +66,41 @@
],
"monday": [
"понедельник",
"пн"
"пн",
"Пнд"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not related to this PR, but I'm curious if case matters here, and why is пн lower but Пнд capitalized?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalization doesn't matter when processing dates. However, If we add two different versions of the same word both are added to the py file. This is going to be fixed in this PR by @asadurski: #487

@asadurski asadurski added this to To review in Project board Jun 8, 2019
@Gallaecio
Copy link
Member

@christianfm10 Do you think you will have time to address @lopuhin’s feedback?

Otherwise, @eLRuLL, @lopuhin, @asadurski, shall we merge as is?

@eLRuLL
Copy link
Contributor

eLRuLL commented Apr 28, 2020

totally up to @lopuhin and @asadurski at this point I think

@lopuhin
Copy link
Member

lopuhin commented Apr 29, 2020

In terms of Russian language, changes are good. From the dataparser side, I have the following questions:

  • does capitalization matter? If yes then I can check in more detail which capitalization is better
  • yaml and py files are different, is this fine?

@noviluni noviluni added the language language related issue/PR label Apr 30, 2020
@Gallaecio
Copy link
Member

yaml and py files are different, is this fine?

It is not fine. We need to add a check for this to the CI.

@noviluni
Copy link
Collaborator

Hi @lopuhin
There is a PR adding tests to avoid merging different yaml - py files: #663.

On the other side, as answered in the inline comment, capitalization doesn't matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language language related issue/PR
Projects
Project board
To review
Development

Successfully merging this pull request may close these issues.

Missing abbreviation of Russian Thursday
6 participants