Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate number-parser into dateparser and price-parser #61

Open
Manish-210 opened this issue Mar 17, 2021 · 15 comments
Open

Integrate number-parser into dateparser and price-parser #61

Manish-210 opened this issue Mar 17, 2021 · 15 comments

Comments

@Manish-210
Copy link
Contributor

One of the major use after building the number-parser might be to integrate it with price-parser and dateparser as suggested by @noviluni .
There are many features that are similar in these libraries, consider:

#4 in number-parser and #1 in price-parser. In both of these, the basic idea is to return the string of numbers mixed with words to return it as a number.
Example:

parse("1 million")                                                              
>>> '1000000'

parse("2k in USD")                                                              
>>> '2000 in USD'

This feature seems to be more favourable for number-parser and then integrate it with price-parser.

There are many similar features that are related to each other in many ways, hence integrating might be a good option, but I have a few questions related to it.

  • Does this integrating of various features means (1) taking the idea used in one library and implementing it in another or (2) directly using both libraries parallelly?
    Example:
Instead of 
>>> parse_price('115 millions de dollars (estimation)') # parse_price of price parser
Price(amount=Decimal('115000000'), currency='dollars')

using both libraries in application or implementing this in backend
>>> price1 = parse('115 millions de dollars (estimation)') # parse of number parser 
>>> parse_price(price1) # parse_price of price parser
Price(amount=Decimal('115000000'), currency='dollars')
  • What are other features that can be added (especially integrating dateparser and number-parser)?

I was thinking to take this as a part of the GSoC 2021 also, would like to hear your views.

@Gallaecio
Copy link
Member

From the beginning, the goal has been for price-parser and Dateparser to make use of number-parser, in a way that is transparent to users. So, to your first question, I would go for the example 1, where users simply use price-parser, which uses number-parser underneath. When using price-parser, users do not even need to know that number-parser exists.

As for features that could be added to number-parser to benefit Dateparser:

Mind that the prior Google Summer of Code project that created the library already started with the integration, it simply did not finish: scrapinghub/dateparser#711

@arnavkapoor
Copy link
Collaborator

Hi, @Manish-210; glad for the interest in the project. As @Gallaecio has mentioned, one of the objectives was to use number-parser in both price and date parser. The primary bottleneck in merging is not the final integration of the number-parser with either price-parser / date-parser (This could be just a single call to the number-parser library like this scrapinghub/dateparser#711 ).
The more pertinent issue is that it becomes a dependency to two libraries with a large user base. Ideally, it should be a bit more mature (support more languages, more features/edge cases), better tests, and higher coverage.

Feel free to tag me anytime you need help with the codebase/ideas will try my best to respond. Keeping the spirit of open-source and open-discussion let's talk here instead of direct message.

@Manish-210
Copy link
Contributor Author

Thanks, @Gallaecio and @arnavkapoor
I have looked at scrapinghub/dateparser#711 . I think I need to familiarize myself more with the code of dateparser and number-parser before I ask more related queries.

Just a general question for integrating dateparser and number-parser should I have to commit in scrapinghub/dateparser#711 or I have to start with a new PR.

and there does not exist any pull request for the integration of number-parser and price-parser, right ?

@Gallaecio
Copy link
Member

should I have to commit in scrapinghub/dateparser#711 or I have to start with a new PR.

That pull request corresponds to the number_parser_incorporation branch of @arnavkapoor’s fork. To work on top of his changes, you could add his fork as a Git remote to your local clone of the Dateparser repository, create a new branch starting from it, push that new branch to your own fork, and create a new pull request from there.

there does not exist any pull request for the integration of number-parser and price-parser, right ?

None exists, no. The closest things is scrapinghub/price-parser#11, but we should probably close that in favor of an approach that uses number-parser.

@Manish-210
Copy link
Contributor Author

Hello, @Gallaecio
I have prepared a draft of my proposal on Google Docs:
https://docs.google.com/document/d/16TLsg3NUljkC-FWJcpO3ZJbP68WoxDjBvP9mQWQfsZg/edit?usp=sharing

I'd be very happy about feedback.

@Gallaecio
Copy link
Member

Gallaecio commented Apr 5, 2021

@Manish-210 It looks like a great start.

A few thoughts:

  • One thing I would like to see as part of the proposal, though, is a more detailed action plan that shows that you are already familiar enough with the Dateparser and price-parser code base.

    At the moment, the goal description seem very high-level, and if you don’t already have an idea of how you plan to implement the changes, the time estimates may be overly optimistic. Mind that it’s not just you writing code, documentation and tests, is us reviewing them and you handling that feedback. If you figure out where to make the code changes and what changes to make (on an API level, without actual code) as part of the proposal, we can discuss that now.

    For example, where in the Dateparser codebase do you plan to add code that calls number-parser? Do you expect it to be a straightforward change, like adding a line somewhere, or do you expect the need to refactor one or more parsers within Dateparser to accommodate this change? What will be the inputs and outputs between the libraries at those points?

  • In the 2nd half of your 2nd goal you have:

    merge all the new features of number parser into dateparser

    Which new features? Those you are going to implement in the 3rd goal? I think it would be best to be more explicit about which features you plan to implement at each stage of the timeline.

  • Week 9 is about a stretch goal. I believe stretch goals are meant to make sure you always have something you can do even if you finish early your planned work. It’s good to have stretch goals, but I don’t think they should be part of the timeline. If you expect to be done with everything else by week 9, this should be a regular goal, not a stretch goal. But maybe you could instead add an extra week to one of the prior goals.

    Similar for week 10. Adding an extra week for one of the goals instead may be safer.

    At the moment you are basically saying you will be done by week 8. That’s great if it happens. But imagine the first or second goals take longer than you expected. Then your progress would look bad by the time you reach the first evaluation, even if you were still on track to finish by week 10.

@AmPhIbIaN26
Copy link

AmPhIbIaN26 commented Apr 7, 2021

@Gallaecio @Manish-210 Is this issue solved or can i work on it??

@Gallaecio
Copy link
Member

@AmPhIbIaN26 If you mean to participate in Google Summer of Code 2021 as a student and work on this, feel free to work on a proposal. See https://gsoc2021.zyte.com/participate for details.

If you mean as a regular contributor, since this is a big endeavor and already a target for GSoC proposals, it would be best not to work on this now, and let GSoC candidate students work on their proposals.

@AmPhIbIaN26
Copy link

I am thinking of writing a proposal on it for Google Summer of Code 2021, thanks I'll look it up

@AmPhIbIaN26
Copy link

AmPhIbIaN26 commented Apr 8, 2021

@Gallaecio I wanted to ask, do I have to make number parser a part of both price and date parser?

@Gallaecio
Copy link
Member

@AmPhIbIaN26 No, you don’t. You could focus on one of them, or you could only work on number-parser itself if you came come up with a set of features to implement that fits a GSoC project timeline.

@AmPhIbIaN26
Copy link

AmPhIbIaN26 commented Apr 8, 2021

@Gallaecio I looked up this issue would you recommend this for my proposal for GSoC 2021?

@Gallaecio
Copy link
Member

@AmPhIbIaN26 If there are enough presently-used numerical system to fit the length of a GSoC project, yes, I think that would be a good choice.

@AmPhIbIaN26
Copy link

Thanks I'll look it up.

@Manish-210
Copy link
Contributor Author

Thanks for the great feedback @Gallaecio
I have come up with the following points according to your feedback,

for bullet three: week 9 stretch goal and week 10

I agree with your phrase, so I will eradicate these extra stretch goals and devote weeks 9 and 10 as extras weeks for integration.

for bullet two: merge all the new features of number parser into dateparser

Firstly, sorry for that absurd language. I was actually saying like there might be the requirement of supporting new languages for dateparser or any other feature that might be necessary while integrating. Also, if there are not any extra requirements then I was thinking to work on #848 or #239 (mentioned previously). I cannot think of implementing it in code because we have to first agree for the interface .

for bullet one: a more detailed action plan that shows that you are already familiar enough with the Dateparser and price-parser code base.

  • for the price parser, I was thinking to use number parser in fromstring to call the parse function of the number parser
    price = parse(price)
    Also adding the test data for this.

  • for the dateparser, here what mike suggested was great.

for the implementation of other features, I was thinking to rely on the issues generated while integrating. (point out if there is any misunderstanding ).

Sorry, if I missed any of your points. let me know any other points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants