Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use versioned CLDR data #825

Merged
merged 3 commits into from
Oct 30, 2020
Merged

Conversation

noviluni
Copy link
Collaborator

@noviluni noviluni commented Oct 29, 2020

We have been using an old version of CLDR data. It has been tried to update the data from CLDR in the past but it's something really hard to do.

I changed the dateparser_scripts.utils.get_raw_data function to point to a specific version and updated the files accordingly.

That version was released in April 2017 and it is the most similar I found to the current data (I would say "exactly the same version"). Most of the changes you will see are related to the order, as it's important to keep all ordered to update the data easily in the future.

Once I fixed that I downloaded that version data (JSON files, script: python dateparser_scripts/get_cldr_data.py), I updated the resources (py files, script: dateparser_scripts/write_complete_data.py) and I run the tests and all is working as expected, so I think this is a good starting point. Once we merge this we can gradually update the CLDR versions to get the most recent version.

(you can review this easier by going commit by commit)

closes: #824

@noviluni noviluni added this to the 1.1.0 milestone Oct 29, 2020
@noviluni noviluni added the hacktoberfest-accepted This pull request looks valid for Hackoctoberfest even if it has not been approved yet. label Oct 29, 2020
@codecov
Copy link

codecov bot commented Oct 29, 2020

Codecov Report

Merging #825 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #825   +/-   ##
=======================================
  Coverage   98.30%   98.30%           
=======================================
  Files         231      231           
  Lines        2590     2590           
=======================================
  Hits         2546     2546           
  Misses         44       44           
Impacted Files Coverage Δ
dateparser/data/date_translation_data/af.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/agq.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/ak.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/am.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/ar.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/as.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/asa.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/ast.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/az-Cyrl.py 100.00% <ø> (ø)
dateparser/data/date_translation_data/az-Latn.py 100.00% <ø> (ø)
... and 194 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 355052a...b176f48. Read the comment docs.

@noviluni
Copy link
Collaborator Author

BTW, I removed the orderedset dependency for the scripts.

@noviluni noviluni mentioned this pull request Oct 30, 2020
7 tasks
@noviluni
Copy link
Collaborator Author

noviluni commented Oct 30, 2020

related to this: #487

@noviluni
Copy link
Collaborator Author

I created this ticket with the step by step guide to update the CLDR in the future once we merge this: #826

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! 🚀

@noviluni noviluni merged commit c7fb631 into scrapinghub:master Oct 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hacktoberfest-accepted This pull request looks valid for Hackoctoberfest even if it has not been approved yet.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants