Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make order of previous locales deterministic #851

Merged
merged 1 commit into from
Dec 16, 2020

Conversation

surkova
Copy link
Contributor

@surkova surkova commented Dec 3, 2020

When using set for storing previously seen locales we eventually end up in the land of non-determinism when looping through them:

Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = set()
>>> a.add("ru")
>>> a.add("es")
>>> a
{'es', 'ru'}
>>> a.add("en")
>>> a
{'es', 'en', 'ru'}
...

Since python 3.7 insertion order is preserved in dictionaries, so we can take advantage of this.

I'm not sure what the policy is as for support of python < 3.7, on travis we seem to be running even 3.5, then my test will fail there. However, when looking at this discussion, I see there's an appetite for making this logic deterministic. Approach with dict should not be slowing parsing either:

>>> b = {}
>>> b.update({"es": "es"})  # first string parsed
>>> b
{'es': 'es'}
>>> b.update({"ru": "ru"})  # second string parsed
>>> b
{'es': 'es', 'ru': 'ru'}
>>> b.update({"es": "es"})  # third string parsed
>>> b
{'es': 'es', 'ru': 'ru'}
...

@codecov
Copy link

codecov bot commented Dec 3, 2020

Codecov Report

Merging #851 (dc0c1d4) into master (939e722) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #851      +/-   ##
==========================================
+ Coverage   98.33%   98.37%   +0.03%     
==========================================
  Files         231      231              
  Lines        2590     2591       +1     
==========================================
+ Hits         2547     2549       +2     
+ Misses         43       42       -1     
Impacted Files Coverage Δ
dateparser/date.py 100.00% <100.00%> (ø)
dateparser/languages/dictionary.py 100.00% <0.00%> (+0.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 939e722...dc0c1d4. Read the comment docs.

@surkova
Copy link
Contributor Author

surkova commented Dec 3, 2020

I can change to use collections.OrderedDict if we want to keep this change and need to support < 3.7.

Copy link
Collaborator

@noviluni noviluni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @surkova!

Could you kindly address what I added in the comments?

Thanks!

dateparser/date.py Outdated Show resolved Hide resolved
dateparser/date.py Outdated Show resolved Hide resolved
@surkova surkova force-pushed the previous-locales-deterministic branch from 3dfae75 to dc0c1d4 Compare December 3, 2020 12:37
@surkova surkova requested a review from noviluni December 3, 2020 12:41
@@ -457,7 +457,7 @@ def date_strings():
yield stripped_date_string

if self.try_previous_locales:
for locale in self.previous_locales:
for locale in self.previous_locales.keys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💄

Suggested change
for locale in self.previous_locales.keys():
for locale in self.previous_locales:

I’m just mentioning this in case .keys() was not added on purpose. But given .keys() is more explicit, maybe the current code is more readable. ✔️ either way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave it as-is. For me, it seems more accurate to explicitly indicate that we iterate through the keys. 👍

@noviluni noviluni merged commit a45b23e into scrapinghub:master Dec 16, 2020
@noviluni
Copy link
Collaborator

Thanks again @surkova! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants