Better bruteforce parsing for units #3066

RealFoxie · 2024-01-28T12:57:32Z

What type of PR is this?

improvement

What this PR does / why we need it:

consists of two changes:

in the Bruteforce parser, add a final step (before defaulting to all tokens being a food) that checks if it finds any matching units. If it does, use that token as a unit and use the rest after it for food parsing.
- Fixes cases where an amount is not provided or the amount is not an integer
  - a piece lemon (with piece and lemon in database)
  - een snuif zout (snuif being a unit and zout a food)
  - a slice of lemon (with slice of and lemon in database) => Will not work at this point (unit made of multiple tokens)
- This is an extra step, so should not change any previous feature, just improve it (by doing something extra to try and parse it)
In both parsers (could be moved to only check for Bruteforce though), if both the unit and the food is not found in the database, also check if the combination string is found instead. This is useful if the string parser split a food in a unit and food incorrectly.
- Again, this only enhances what can happen. If the unit+food string is found as a food, using that as the food is always the best option.
  - 1 red paprika (red paprika being food) => it will match, instead of saying red is the unit and paprika the food

Which issue(s) this PR fixes:

I did not check issues to see if these specific cases were mentioned anywhere.

Special notes for your reviewer:

The code is "pasted after" on existing functionality, which is probably not the nicest way. It does make it so everything works how it used to, but there is just an extra step. So nothing should get broken, only get better.

It now uses the database to check if a unit exists in a string. This could be slow, but as this is limited to three checks per ingredient string and Units are normally limited in size, this is probably fine.

The parser code isn't really build to use the database during parsing. I took the most straightforward approach to allow this, without needing to refactor things.

This is a small change but doubles the amount of automatic matching for some of my recipes. These issues might not happen as much with English recipes.

In a way, it might make more sense that, if a unit and food is matched, but no amount, to default the amount to "1" instead of "0". As if no amount is specified most often at least one is meant (e.g. it might say "a couple of ... ", where having "1" as an amount still makes more sense than 0).
I did not include this here, because this might be language dependent and would change a feature instead of just enhance it.

Testing

put in some units and foods and try combination in the sense of (e.g. unit string being U1 and foods strings being F1 U1 F1)

a U1 F1
F1
U1 F1
a big U1 F1
a U1 U1 F1

Also checks the case when a food might have been split in a unit + ingredient

michael-genson · 2024-01-30T14:33:02Z

This looks great, once I have time I will definitely be taking a look

hay-kot · 2024-02-02T23:54:54Z

Code looks good, I would like to see some test cases that cover the work here.

RealFoxie · 2024-02-03T02:29:04Z

Thanks! Will try to see if I can set up some tests.

michael-genson · 2024-02-04T21:06:05Z

I pushed some changes to parametrize the tests so they're a lot easier to debug (each parse is now its own test), and fixed the issue with missing data by adding the ingredient data parameter. That fixture doesn't persist between tests (that way ingredients can safely be modified without leaking changes to other tests)

Some tests are still failing, but hopefully this helps make debugging easier

michael-genson · 2024-02-04T21:42:33Z

Now they're passing

RealFoxie · 2024-02-04T21:50:26Z

Some more tests added and they pass now. Some notes:

units with a space (so existing of multiple tokens) won't work
"1 big stalk bell pepper" will not work, as it will see "big" as the unit. This is because the original code does this.

I decided to keep all existing functionality for this PR (and only enhance at the end of the steps). But it is definitely an idea to move around some of the "steps", which would increase matching even more.

michael-genson · 2024-02-07T15:16:15Z

Looks good, thanks for all your work on this!

RealFoxie and others added 4 commits January 28, 2024 11:59

try to match units when brute parsing and no amount is matched

4c21f31

brute parser: better handle multiple word food items

fb37e9d

Also checks the case when a food might have been split in a unit + ingredient

fix formatting

052966c

Merge branch 'mealie-next' into unit-brute-parsing

41de202

hay-kot marked this pull request as draft February 2, 2024 23:54

RealFoxie and others added 2 commits February 4, 2024 18:15

add test cases for ingredient parsing that don't start with an amount

143da57

parametrized tests and added ingredient data fixture

dfbb8fd

michael-genson added 2 commits February 4, 2024 21:37

fixed group_id ref in tests

0db12bb

fixed test inputs

07ba74d

add extra tests for units as third token

f4b8826

RealFoxie marked this pull request as ready for review February 5, 2024 09:31

Merge branch 'mealie-next' into unit-brute-parsing

823ffbf

michael-genson approved these changes Feb 7, 2024

View reviewed changes

michael-genson merged commit e686fa6 into mealie-recipes:mealie-next Feb 7, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better bruteforce parsing for units #3066

Better bruteforce parsing for units #3066

RealFoxie commented Jan 28, 2024 •

edited

michael-genson commented Jan 30, 2024

hay-kot commented Feb 2, 2024

RealFoxie commented Feb 3, 2024

michael-genson commented Feb 4, 2024

michael-genson commented Feb 4, 2024

RealFoxie commented Feb 4, 2024

michael-genson commented Feb 7, 2024

Better bruteforce parsing for units #3066

Better bruteforce parsing for units #3066

Conversation

RealFoxie commented Jan 28, 2024 • edited

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Testing

michael-genson commented Jan 30, 2024

hay-kot commented Feb 2, 2024

RealFoxie commented Feb 3, 2024

michael-genson commented Feb 4, 2024

michael-genson commented Feb 4, 2024

RealFoxie commented Feb 4, 2024

michael-genson commented Feb 7, 2024

RealFoxie commented Jan 28, 2024 •

edited