Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make use of most common content locales #805

Merged
merged 13 commits into from
Feb 9, 2021
Merged

Conversation

mirceachira
Copy link
Contributor

closes #714

@mirceachira mirceachira self-assigned this Oct 7, 2020
@Gallaecio
Copy link
Member

Hmm… I don’t see how these changes could cause those CI issues 😕

@noviluni
Copy link
Collaborator

noviluni commented Oct 8, 2020

Hi @mirceachira, @Gallaecio

When running python order_languages.py (As @mirceachira did), we get the order from https://github.com/unicode-cldr/cldr-dates-full.git.

Those errors in the pipelines are happening because CLDR included new languages, and when we try to import the languages from the list of languages ( dateparser.data.language_info.language_order) we can't find them.

I think the new locales are: ccp, ceb, ff-Latn, ia, jv, ku, mi, sd, tg, tt, wo, xh, yue-Hans and yue-Hant.

To create the files for the new languages we should run:

  1. python get_cldr_data.py It creates the json files
  2. python write_complete_data.py It creates the py files.

However, we should also:

  • Update the "supported locales" list in the docs
  • Update the avoid_languages list from dateparser.dateparser_scripts. We may need to add some of the new languages or remove some of the old languages from this list.
  • Run again the order_languages script as it should reflect the avoid_languages changes.

@mirceachira would you like to work on this? I think you could create a separate PR just running the three scripts (and updating the docs and the avoid_languages list). Once we merge that, we can come here and update this PR and we won't see any issue.

Of course, if you don't want/ you can't I can do it. :)

@mirceachira
Copy link
Contributor Author

Hey @noviluni for sure I would like to pick that up! Thanks for the detailed tips on it!

@noviluni noviluni added the hacktoberfest-accepted This pull request looks valid for Hackoctoberfest even if it has not been approved yet. label Oct 26, 2020
@noviluni
Copy link
Collaborator

Hi @mirceachira, you will be able run python dateparser_scripts/order_languages.py without errors after merging this: #825

However, I checked what you did and I think this is wrong: https://github.com/scrapinghub/dateparser/pull/805/files#diff-1c848ec911779bd8e06fea6e61745c57c43a463bab86d8ef955f2e29ec8484d0R60 because is generating multiple times the same language in the language_order list in languages_info.py.

On the other hand, I would really appreciate it if you add the date when that list was created in this comment: https://github.com/scrapinghub/dateparser/pull/805/files#diff-1c848ec911779bd8e06fea6e61745c57c43a463bab86d8ef955f2e29ec8484d0R15 to allow us to update in in the future.

@noviluni
Copy link
Collaborator

Hi @mirceachira, we merged the other PR (#825), could you rebase this PR against master? 🙂

@codecov
Copy link

codecov bot commented Oct 31, 2020

Codecov Report

Merging #805 (2278a7e) into master (9f81cf3) will decrease coverage by 0.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #805      +/-   ##
==========================================
- Coverage   98.37%   98.15%   -0.22%     
==========================================
  Files         231      229       -2     
  Lines        2518     2495      -23     
==========================================
- Hits         2477     2449      -28     
- Misses         41       46       +5     
Impacted Files Coverage Δ
dateparser/data/date_translation_data/af.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/agq.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/ak.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/am.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/ar.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/as.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/asa.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/ast.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/az-Cyrl.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/az-Latn.py 100.00% <ø> (ø)
... and 211 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f81cf3...f3be2a4. Read the comment docs.

@mirceachira
Copy link
Contributor Author

@noviluni simply running the script generates the same problem you mentioned here #805 (comment)

Though there are much fewer tests that required fixing now after #825 . I added the languages accordingly to 'avoid_languages' and fixed the remaining issues.

Please note that I removed this test: 11ba3fb
I'm not sure why the locale order resulted in its failure as well as the failure in the shortname test that I fixed here 876941e

dateparser_scripts/order_languages.py Outdated Show resolved Hide resolved
dateparser_scripts/write_complete_data.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@noviluni noviluni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mirceachira!

I think we are almost ready to merge this! I'm ok with the changes in the tests, but I would need you to revert the avoid_languages set as suggested and run this: python dateparser_scripts/order_languages.py again.

After that I think we will be able to merge it! 🚀

dateparser_scripts/order_languages.py Outdated Show resolved Hide resolved
Co-authored-by: Marc Hernández <noviluni@gmail.com>
dateparser_scripts/order_languages.py Outdated Show resolved Hide resolved
Co-authored-by: Marc Hernández <noviluni@gmail.com>
@noviluni noviluni closed this Nov 25, 2020
@noviluni noviluni reopened this Nov 25, 2020
@noviluni noviluni closed this Nov 25, 2020
@noviluni noviluni reopened this Nov 25, 2020
@noviluni
Copy link
Collaborator

I've been opening and closing this PR because we just switched from travis-ci.org to travis-ci.com but the checks show the old travis-ci.org...

Anyway, it seems that all tests passed in travis-ci.com: https://travis-ci.com/github/scrapinghub/dateparser/builds/204189662

@noviluni
Copy link
Collaborator

Ok, this is a hard PR to review 😅

You should change the "in" code by "id" in the most_common_locales list too, and after that you should run python dateparser_scripts/write_complete_data.py.

@noviluni
Copy link
Collaborator

Hi @mirceachira, thanks for fixing this. As travis-ci stopped working we can't run the tests, so I can't merge this. I started migrating to github actions (WIP PR: https://github.com/scrapinghub/dateparser/pull/859/files) after that we will be able to run the tests and merge this :)

@noviluni noviluni closed this Feb 9, 2021
@noviluni noviluni reopened this Feb 9, 2021
Copy link
Collaborator

@noviluni noviluni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@noviluni noviluni merged commit 615c729 into master Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hacktoberfest-accepted This pull request looks valid for Hackoctoberfest even if it has not been approved yet.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the default language used order
3 participants