ignore parameter #5

Open
noviluni opened this issue Jun 17, 2020 · 5 comments · May be fixed by #74
Assignees
Labels
enhancement New feature or request

Comments

@noviluni
Contributor

noviluni commented Jun 17, 2020

I think it could be really cool to add an optional parameter to ignore some words.

Example:

>>> parse('twenty one')
'21'

>>> parse('twenty one', ignore=["one"])
'20 one'

or

>>> parse('I have three apples and one pear.')
'I have 3 apples and 1 pear.'

>>> parse('I have three apples and one pear.', ignore=["three"])
'I have three apples and 1 pear.'
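
To make the proposed behaviour concrete, here is a minimal sketch of how an `ignore` list could interact with number-word replacement. It is not the library's implementation: `toy_parse`, `UNITS` and `TENS` are hypothetical names made up for this illustration, and the vocabulary is deliberately tiny.

```python
import re

# Tiny illustrative vocabulary (assumption); a real parser covers far more.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}


def toy_parse(text, ignore=()):
    """Replace number words with digits, leaving words in `ignore` untouched."""
    ignored = {word.lower() for word in ignore}
    # Split into words and the separators between them, keeping both.
    tokens = re.findall(r"[A-Za-z]+|[^A-Za-z]+", text)
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in ignored or (word not in UNITS and word not in TENS):
            out.append(tokens[i])
            i += 1
            continue
        if word in TENS:
            value = TENS[word]
            # Merge "twenty one" into 21 unless the unit word is ignored.
            if (i + 2 < len(tokens) and tokens[i + 1] in (" ", "-")
                    and tokens[i + 2].lower() in UNITS
                    and tokens[i + 2].lower() not in ignored):
                value += UNITS[tokens[i + 2].lower()]
                i += 2
            out.append(str(value))
        else:
            out.append(str(UNITS[word]))
        i += 1
    return "".join(out)
```

With this sketch, an ignored word is both left as text and prevented from merging with a preceding tens word, which is what the "twenty one" example above asks for.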
@arnavkapoor arnavkapoor added the enhancement New feature or request label Jun 17, 2020
@arnavkapoor arnavkapoor self-assigned this Jun 17, 2020
@Gallaecio
Member

Unless we can think of a popular use case, or this is trivial to implement, it may be better to leave something like this for last.

@arnavkapoor
Collaborator

@Gallaecio one use case is probably phrases like 'two second' or 'a second', which @noviluni mentioned in #6, where we don't want to parse 'second' as '2nd'. Of course, we should handle most of these ambiguous cases within the main logic, but as we expand to more languages this option might be useful.

@noviluni
Contributor Author

noviluni commented Jun 18, 2020

Yeah, this idea came from that example: omitting "second". It could also be useful if we start accepting phrases like "one and a quarter" (1.25), since "quarter" can also be a verb or an ordinary noun, as in "the first quarter".

However, it's not necessary to implement this now. We should first support multiple languages and then see how we can add specific rules for different languages, so this can be postponed. 👍

@Manish-210
Contributor

Can we store common words like "quarter" and "half" separately and use them whenever required?

@Gallaecio
Member

I’m not even sure that implementing this is a good idea. My thinking is:

  • In cases where parse_number cannot determine that a word should not be translated into a number, but the user can, the most flexible way for the user to prevent the translation is to process the string before passing it to parse_number, e.g. using a regular expression to extract the part that can be a number.

  • In cases where parse_number should be able to determine that a word should not be translated into a number, we should ideally handle that transparently, without requiring the user to provide a specific list of words to ignore. The only parameters involved should be others that may play a role in deciding which words to ignore (e.g. languages, as dateparser supports).
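
The pre-processing approach from the first point could look roughly like the sketch below. It splits the string around the words the user wants to protect and hands only the remaining segments to a converter. `convert_outside` and `stand_in_convert` are hypothetical names for this illustration; the converter is passed in as a callable, so nothing here assumes a particular signature in the library.

```python
import re


def convert_outside(text, protected, convert):
    """Apply `convert` only to the parts of `text` outside `protected` words."""
    if not protected:
        return convert(text)
    # A capturing group makes re.split keep the protected words in the result.
    pattern = r"\b(" + "|".join(re.escape(w) for w in protected) + r")\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Odd indexes hold the protected words; even indexes hold everything else.
    return "".join(
        part if index % 2 or not part else convert(part)
        for index, part in enumerate(parts)
    )


def stand_in_convert(segment):
    # Hypothetical stand-in for a real number parser: it only knows
    # "two" (cardinal) and "second" (ordinal).
    segment = re.sub(r"\btwo\b", "2", segment)
    return re.sub(r"\bsecond\b", "2nd", segment)
```

One trade-off of this sketch is that each segment is converted in isolation, so a number phrase that spans a protected word is never merged across it.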

@dhananjaypai08 dhananjaypai08 linked a pull request Jan 6, 2022 that will close this issue