Word "do" is parsed as "tomorrow" #567

Open
ArthurSens opened this issue Sep 14, 2019 · 5 comments

@ArthurSens

ArthurSens commented Sep 14, 2019

Command

import dateparser
from dateparser import search as se
print(dateparser.parse("today"))
print(se.search_dates("Do I have anything on the next month?", settings={"PREFER_DATES_FROM" : "future"}))

Current result

2019-09-14 20:09:17.088302
[('Do', datetime.datetime(2019, 9, 15, 0, 0))]

Expected result

2019-09-14 20:09:17.088302
None

To be honest, it would be nice if "next month" could be parsed by taking the current month and adding 1, but the main reason for this issue is that the word "Do" is being interpreted as a date for some reason.
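
A minimal sketch of that "current month plus one" idea, using only the standard library. The helper name `first_day_of_next_month` is hypothetical and not part of dateparser; it only illustrates the arithmetic, including the December rollover:

```python
from datetime import date


def first_day_of_next_month(today: date) -> date:
    """Return the first day of the month following `today`'s month."""
    # December rolls over into January of the following year.
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


print(first_day_of_next_month(date(2019, 9, 14)))  # 2019-10-01
```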

@augustomen

"Do" is short for "Domingo", Spanish for Sunday. Pass the add_detected_language argument to see why it happens:

>>> search_dates('Do I have', settings={"PREFER_DATES_FROM" : "future"}, add_detected_language=True)
[('Do', datetime.datetime(2019, 10, 6, 0, 0), 'es')]

Restrict languages to remove this noise. Unfortunately, "next month" is still not detected:

>>> search_dates('Do I have anything on the next month?', languages=['en'], settings={"PREFER_DATES_FROM" : "future"}, add_detected_language=True)
None
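
If restricting languages up front is not an option, filtering on the detected language afterwards might also work. This is only a sketch: it assumes results have the (text, date, language) shape shown above for add_detected_language=True, and `keep_languages` is a hypothetical helper, not part of dateparser:

```python
from datetime import datetime


def keep_languages(results, allowed):
    """Keep only (text, date, language) matches whose language is allowed."""
    # A search with no matches returned None above, so treat that as empty.
    if results is None:
        return []
    return [match for match in results if match[2] in allowed]


# The match reported earlier in this thread, detected as Spanish.
results = [("Do", datetime(2019, 10, 6, 0, 0), "es")]
print(keep_languages(results, {"en"}))  # []
```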

@ArthurSens
Author

Oh, that's good to know! I didn't know about the add_detected_language argument.

But is that the intended behavior?

I mean... I get that it detected a word that might mean a date in Spanish, but the sentence is not written in Spanish at all, and clearly the word "Do" does not mean "Sunday" in this particular sentence.

@Gallaecio
Member

dateparser cannot detect the language of the whole text and act accordingly.

@ArthurSens Do you want this ticket to remain open as a request to add language detection to dateparser?

Otherwise, since you can use languages as @augustomen suggested, I think we should close this.

@Gallaecio
Member

Hmm… dateparser does have some kind of language detection functionality, so maybe this is not too far-fetched.

@ArthurSens
Author

I don't have a pressing need for it, since I only use this module for hobby projects, and I don't have the necessary knowledge to help you guys with a PR.

But if you think it's not troublesome, I guess it would be appreciated by the community (I'd be happy for sure 😄 )

(it's your call to close this or not)
