Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better bruteforce parsing for units #3066

Merged

Conversation

RealFoxie
Copy link
Contributor

@RealFoxie RealFoxie commented Jan 28, 2024

What type of PR is this?

  • improvement

What this PR does / why we need it:

consists of two changes:

  • in the Bruteforce parser, add a final step (before defaulting to all tokens being a food) that checks if it finds any matching units. If it does, use that token as a unit and use the rest after it for food parsing.
    • Fixes cases where an amount is not provided or the amount is not an integer
      • a piece lemon (with piece and lemon in database)
      • een snuif zout (snuif being a unit and zout a food)
      • a slice of lemon (with slice of and lemon in database) => Will not work at this point (unit made of multiple tokens)
    • This is an extra step, so should not change any previous feature, just improve it (by doing something extra to try and parse it)
  • In both parsers (could be moved to only check for Bruteforce though), if both the unit and the food is not found in the database, also check if the combination string is found instead. This is useful if the string parser split a food in a unit and food incorrectly.
    • Again, this only enhances what can happen. If the unit+food string is found as a food, using that as the food is always the best option.
      • 1 red paprika (red paprika being food) => it will match, instead of saying red is the unit and paprika the food

Which issue(s) this PR fixes:

I did not check issues to see if these specific cases were mentioned anywhere.

Special notes for your reviewer:

The code is "pasted after" on existing functionality, which is probably not the nicest way. It does make it so everything works how it used to, but there is just an extra step. So nothing should get broken, only get better.

It now uses the database to check if a unit exists in a string. This could be slow, but as this is limited to three checks per ingredient string and Units are normally limited in size, this is probably fine.

The parser code isn't really build to use the database during parsing. I took the most straightforward approach to allow this, without needing to refactor things.

This is a small change but doubles the amount of automatic matching for some of my recipes. These issues might not happen as much with English recipes.

In a way, it might make more sense that, if a unit and food is matched, but no amount, to default the amount to "1" instead of "0". As if no amount is specified most often at least one is meant (e.g. it might say "a couple of ... ", where having "1" as an amount still makes more sense than 0).
I did not include this here, because this might be language dependent and would change a feature instead of just enhance it.

Testing

put in some units and foods and try combination in the sense of (e.g. unit string being U1 and foods strings being F1 U1 F1)

  • a U1 F1
  • F1
  • U1 F1
  • a big U1 F1
  • a U1 U1 F1

@michael-genson
Copy link
Collaborator

This looks great, once I have time I will definitely be taking a look

@hay-kot hay-kot marked this pull request as draft February 2, 2024 23:54
@hay-kot
Copy link
Collaborator

hay-kot commented Feb 2, 2024

Code looks good, I would like to see some test cases that cover the work here.

@RealFoxie
Copy link
Contributor Author

Thanks! Will try to see if I can set up some tests.

@michael-genson
Copy link
Collaborator

I pushed some changes to parametrize the tests so they're a lot easier to debug (each parse is now its own test), and fixed the issue with missing data by adding the ingredient data parameter. That fixture doesn't persist between tests (that way ingredients can safely be modified without leaking changes to other tests)

Some tests are still failing, but hopefully this helps make debugging easier

@michael-genson
Copy link
Collaborator

Now they're passing

@RealFoxie
Copy link
Contributor Author

Some more tests added and they pass now. Some notes:

  • units with a space (so existing of multiple tokens) won't work
  • "1 big stalk bell pepper" will not work, as it will see "big" as the unit. This is because the original code does this.

I decided to keep all existing functionality for this PR (and only enhance at the end of the steps). But it is definitely an idea to move around some of the "steps", which would increase matching even more.

@RealFoxie RealFoxie marked this pull request as ready for review February 5, 2024 09:31
@michael-genson
Copy link
Collaborator

Looks good, thanks for all your work on this!

@michael-genson michael-genson merged commit e686fa6 into mealie-recipes:mealie-next Feb 7, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants