Better bruteforce parsing for units #3066
Also checks the case where a food might have been split into a unit + ingredient.
This looks great, once I have time I will definitely be taking a look.
Code looks good, I would like to see some test cases that cover the work here.
Thanks! Will try to see if I can set up some tests.
I pushed some changes to parametrize the tests so they're a lot easier to debug (each parse is now its own test), and fixed the issue with missing data by adding the ingredient data parameter. That fixture doesn't persist between tests (that way ingredients can safely be modified without leaking changes to other tests). Some tests are still failing, but hopefully this helps make debugging easier.
Now they're passing.
Some more tests added and they pass now. Some notes:
I decided to keep all existing functionality for this PR (and only enhance at the end of the steps). It would definitely be an idea to move some of the "steps" around, though, which would increase matching even more.
Looks good, thanks for all your work on this!
What type of PR is this?
What this PR does / why we need it:
Consists of two changes:
- `piece` and `lemon` in database
- `snuif` being a unit and `zout` a food
- `slice of` and `lemon` in database => will not work at this point (unit made of multiple tokens)
- `red paprika` being a food => it will match, instead of saying `red` is the unit and `paprika` the food
Which issue(s) this PR fixes:
I did not check issues to see if these specific cases were mentioned anywhere.
Special notes for your reviewer:
The code is "pasted after" the existing functionality, which is probably not the nicest way. It does mean that everything works how it used to, with just an extra step at the end. So nothing should get broken, only get better.
It now uses the database to check if a unit exists in a string. This could be slow, but as this is limited to three checks per ingredient string and the number of units is normally small, this is probably fine.
The parser code isn't really built to use the database during parsing. I took the most straightforward approach to allow this, without needing to refactor things.
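The extra step described above can be sketched roughly as follows. This is not the PR's code: the in-memory sets `KNOWN_UNITS` and `KNOWN_FOODS` are hypothetical stand-ins for the database lookups, and `Match` and `match_unit_and_food` are names invented for illustration. It assumes the whole string is tried as a food first, and only then split into a single-token unit plus a food.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the unit and food tables (the PR queries the database).
KNOWN_UNITS = {"piece", "snuif", "slice of"}
KNOWN_FOODS = {"lemon", "zout", "red paprika", "paprika"}


@dataclass
class Match:
    unit: Optional[str]
    food: Optional[str]


def match_unit_and_food(text: str) -> Match:
    """Try the whole string as a food, then a single-token unit plus a food."""
    cleaned = " ".join(text.lower().split())

    # A whole-string food wins, so "red paprika" is not split into unit + food.
    if cleaned in KNOWN_FOODS:
        return Match(unit=None, food=cleaned)

    # Otherwise treat the first token as a candidate unit and the rest as the food.
    first, _, rest = cleaned.partition(" ")
    if rest and first in KNOWN_UNITS and rest in KNOWN_FOODS:
        return Match(unit=first, food=rest)

    # Multi-token units such as "slice of" are not handled by this sketch.
    return Match(unit=None, food=None)
```

Because only the first token is considered as a unit, `slice of lemon` stays unmatched here, which mirrors the limitation noted in the description.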
This is a small change but doubles the amount of automatic matching for some of my recipes. These issues might not happen as much with English recipes.
In a way, it might make more sense, if a unit and a food are matched but no amount, to default the amount to "1" instead of "0". If no amount is specified, at least one is most often meant (e.g. it might say "a couple of ... ", where having "1" as an amount still makes more sense than 0).
I did not include this here, because this might be language dependent and would change a feature instead of just enhance it.
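For discussion only, the idea could look something like the sketch below. It is not part of this PR, and `default_quantity` is a hypothetical helper name; it assumes a missing amount is represented as 0.

```python
from typing import Optional


def default_quantity(quantity: float, unit: Optional[str], food: Optional[str]) -> float:
    """Return 1 when both a unit and a food matched but no amount was parsed."""
    # Assumption: a quantity of 0 means "no amount found in the string".
    if quantity == 0 and unit and food:
        return 1.0
    return quantity
```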
Testing
Put in some units and foods and try combinations of them (e.g. unit string being `U1` and food strings being `F1`, `U1 F1`).