support for dates with dots and spaces #1028
Codecov Report
@@ Coverage Diff @@
## master #1028 +/- ##
=======================================
Coverage 98.29% 98.29%
=======================================
Files 234 234
Lines 2694 2694
=======================================
Hits 2648 2648
Misses 46 46
@Gallaecio, sorry to bother you, but could you look into this PR? This is a regression for us (#1010) 🙏 Many thanks!
Looks good to me, great job!
@@ -223,7 +223,7 @@ class _parser:

     def __init__(self, tokens, settings):
         self.settings = settings
-        self.tokens = list(tokens)
+        self.tokens = [(t[0].strip(), t[1]) for t in list(tokens)]
I wonder if we can drop the `list` cast now:
-        self.tokens = [(t[0].strip(), t[1]) for t in list(tokens)]
+        self.tokens = [(t[0].strip(), t[1]) for t in tokens]
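A minimal sketch of why the cast looks redundant: a list comprehension consumes any iterable, so the result is the same whether the tokens arrive as a list or as a generator. The helper name `strip_tokens`, the sample tokens, and the numeric type codes are hypothetical and only illustrate the suggestion.

```python
def strip_tokens(tokens):
    """Strip surrounding whitespace from each token's text and keep its type.

    `tokens` may be any iterable of (text, type) pairs; the comprehension
    already builds a new list, so wrapping it in list() first is unnecessary.
    """
    return [(text.strip(), kind) for text, kind in tokens]


# Hypothetical (text, type) pairs; the type codes are made up for this sketch.
raw = ((" 26 ", 0), (" . ", 2), ("10", 0))

from_list = strip_tokens(list(raw))
from_generator = strip_tokens(pair for pair in raw)

print(from_list)
print(from_list == from_generator)
```

The one behavioural difference is that a one-shot iterator is consumed either way, so the cast does not protect the caller's iterator.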
@@ -35,7 +35,7 @@

 RE_SANITIZE_SKIP = re.compile(r'\t|\n|\r|\u00bb|,\s\u0432\b|\u200e|\xb7|\u200f|\u064e|\u064f', flags=re.M)
 RE_SANITIZE_RUSSIAN = re.compile(r'([\W\d])\u0433\.', flags=re.I | re.U)
-RE_SANITIZE_PERIOD = re.compile(r'(?<=\D+)\.', flags=re.U)
+RE_SANITIZE_PERIOD = re.compile(r'(?<=[^0-9\s])\.', flags=re.U)
💄
-RE_SANITIZE_PERIOD = re.compile(r'(?<=[^0-9\s])\.', flags=re.U)
+RE_SANITIZE_PERIOD = re.compile(r'(?<=[^\d\s])\.', flags=re.U)
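One detail worth checking before applying this suggestion: for Python `str` patterns, `\d` matches any Unicode decimal digit, while `[0-9]` matches ASCII digits only, so the two character classes can differ on non-ASCII numerals. The sketch below uses the standard-library `re` module and assumes the pattern is applied with an empty-string substitution; the variable names and sample strings are illustrative.

```python
import re

# The two variants from the review suggestion, compiled with the stdlib `re`.
ascii_digits = re.compile(r'(?<=[^0-9\s])\.')
unicode_digits = re.compile(r'(?<=[^\d\s])\.')

# ASCII numerals: neither variant matches a period that follows a digit.
ascii_date = "26.10.21"
ascii_results = (ascii_digits.sub('', ascii_date),
                 unicode_digits.sub('', ascii_date))

# Arabic-Indic numerals (U+0661, U+0662, ...): only `\d` treats them as digits.
arabic_date = "\u0661\u0662.\u0661\u0660"
arabic_results = (ascii_digits.sub('', arabic_date),
                  unicode_digits.sub('', arabic_date))

print(ascii_results)
print(arabic_results[0] == arabic_date, arabic_results[1] == arabic_date)
```

With ASCII input the change is purely cosmetic; with other digit scripts the `\d` variant keeps periods between numerals, which appears to match the intent of the fix.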
Can someone merge this to fix #1010? 🙏
Closes #1010
This adds support for dates such as `26 .10.21` or `26 . 10.21`. In the date `26 . 10.21`, the first period was being removed during sanitization even though it sat only between numerals (surrounded by spaces), which was not needed. The period sanitization regex was therefore changed so that a period preceded by a digit or by whitespace is left in place.
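The effect described above can be sketched with the standard-library `re` module. Two assumptions apply: the old pattern `(?<=\D+)\.` uses a variable-width lookbehind that stdlib `re` rejects, so `(?<=\D)\.` stands in for it here (it matches the same periods), and the sanitization step is assumed to replace each match with an empty string. The names `OLD_PERIOD`, `NEW_PERIOD`, and `sanitize_period` are hypothetical.

```python
import re

# Stand-in for the old pattern: a period preceded by any non-digit,
# which includes whitespace.
OLD_PERIOD = re.compile(r'(?<=\D)\.')
# Pattern from this PR: a period preceded by a character that is neither
# a digit nor whitespace.
NEW_PERIOD = re.compile(r'(?<=[^0-9\s])\.')


def sanitize_period(pattern, date_string):
    """Drop every period the pattern matches (assumed sanitization step)."""
    return pattern.sub('', date_string)


samples = ["26 . 10.21", "26 .10.21", "Dec. 5"]
old_out = [sanitize_period(OLD_PERIOD, s) for s in samples]
new_out = [sanitize_period(NEW_PERIOD, s) for s in samples]

for sample, old, new in zip(samples, old_out, new_out):
    print(repr(sample), repr(old), repr(new))
```

Under these assumptions the old pattern drops the first separator of `26 . 10.21`, while the new one leaves both numeric separators intact and still strips the period after an abbreviation such as `Dec.`.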