Some improvements to the Cocktail Party scraper #273

Merged

Conversation

zdenek-biberle
Contributor

This is a bit of a successor to #270. This commit adds four improvements to the Cocktail Party scraper:

First, some Cocktail Party recipes use units that the recipe-utils parser doesn't understand. For example, the [Manhattan Bianco](https://cocktailpartyapp.com/drinks/manhattan-bianco/) uses "piece" as a unit. Previously, such ingredients were presented to the user without a unit, and the user had to fill it in themselves. Now the code falls back to whatever text the parser left unparsed, which is a fairly good default for Cocktail Party.
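To illustrate the fallback idea, here is a minimal Python sketch. It is not the scraper's actual code and does not use the recipe-utils API: `KNOWN_UNITS`, `ParsedAmount`, `parse_amount` and `units_with_fallback` are all hypothetical names, and the toy parser only stands in for a parser that recognises a fixed set of units.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a tiny stand-in for the units a real parser would recognise.
KNOWN_UNITS = {"oz", "ml", "cl", "dash", "tsp"}


@dataclass
class ParsedAmount:
    amount: Optional[float]
    units: str      # empty when the unit was not recognised
    leftover: str   # whatever text the parser could not interpret


def parse_amount(text: str) -> ParsedAmount:
    """Toy parser: a leading number followed by an optional unit."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*(.*)", text)
    if match is None:
        return ParsedAmount(None, "", text.strip())
    amount = float(match.group(1))
    rest = match.group(2).strip()
    if rest.lower() in KNOWN_UNITS:
        return ParsedAmount(amount, rest.lower(), "")
    return ParsedAmount(amount, "", rest)


def units_with_fallback(text: str) -> str:
    """Prefer the recognised unit, otherwise fall back to the unparsed text."""
    parsed = parse_amount(text)
    return parsed.units or parsed.leftover
```

With this sketch, an amount such as "1 piece" keeps "piece" as its unit instead of ending up with an empty unit.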

Next, the Cocktail Party website uses "parts" for lots of ingredients, but they actually mean fluid ounces (i.e. the same recipes in their mobile app show up with fluid ounces instead of parts). Thus the scraper now maps parts to fluid ounces.
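The mapping can be pictured as a small alias table applied after parsing. The Python sketch below is only an illustration under assumptions: the names `UNIT_ALIASES` and `normalize_unit` are hypothetical, and `"oz"` is assumed to be the string used for fluid ounces.

```python
# Assumption: "oz" is the target spelling for fluid ounces.
UNIT_ALIASES = {
    "part": "oz",
    "parts": "oz",
}


def normalize_unit(unit: str) -> str:
    """Map site-specific unit names to the unit they actually mean."""
    key = unit.strip().lower()
    # Units without an alias pass through unchanged (apart from lowercasing)
    return UNIT_ALIASES.get(key, key)
```

Only the unit name is rewritten here; the amount stays as scraped, since the site's "parts" are described above as corresponding directly to fluid ounces.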

Next, the scraper now reads the links in the "post info" part of the page as tags. The links usually provide categories or names of the cocktail's creator, so this works out nicely.
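As a rough sketch of this step, the Python snippet below collects the text of every link inside a container element using only the standard library. The class name `post-info` and the names `PostInfoTagParser` and `extract_tags` are assumptions made for illustration; the real page markup and the scraper's selector may differ.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PostInfoTagParser(HTMLParser):
    """Collect the text of every <a> inside an element with a given class."""

    def __init__(self, container_class="post-info"):
        super().__init__()
        self.container_class = container_class
        self.depth = 0        # > 0 while inside the container element
        self.in_link = False
        self.current = []     # text fragments of the link being read
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1
        elif self.container_class in classes:
            self.depth = 1
        if self.depth > 0 and tag == "a":
            self.in_link = True
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if tag == "a" and self.in_link:
            # Normalise inner whitespace and skip empty or duplicate tags
            text = " ".join("".join(self.current).split())
            if text and text not in self.tags:
                self.tags.append(text)
            self.in_link = False
        self.depth -= 1

    def handle_data(self, data):
        if self.in_link:
            self.current.append(data)


def extract_tags(html):
    parser = PostInfoTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Links outside the container are ignored, so navigation or footer links would not leak into the tag list.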

Finally, I've fixed an oversight introduced in #270. The code for parsing the cocktail's description goes through all the paragraphs and then joins them up to form proper Markdown paragraphs. However, those paragraphs were then squashed together by the cleanup step in the toArray() function. That's obviously undesirable. So now the paragraphs are cleaned up before they're joined together, which produces nice Markdown with multiple paragraphs.
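The ordering problem can be shown with a short Python sketch. It assumes that the cleanup collapses every run of whitespace into a single space; the PR does not show what the cleanup in toArray() actually does, so `clean_up`, `description_before` and `description_after` are hypothetical names for illustration only.

```python
import re


def clean_up(text):
    """Assumed cleanup: collapse runs of whitespace (newlines included)."""
    return re.sub(r"\s+", " ", text).strip()


def description_before(paragraphs):
    # Old order: join first, then clean, which collapses the blank lines too
    return clean_up("\n\n".join(paragraphs))


def description_after(paragraphs):
    # New order: clean each paragraph, then join with a blank line between them
    cleaned = [clean_up(p) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Under that assumption, cleaning after joining yields one long paragraph, while cleaning before joining keeps the blank lines that Markdown needs to separate paragraphs.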

@karlomikus
Owner

Nice work, thanks!

@karlomikus karlomikus merged commit ada1c28 into karlomikus:develop Apr 17, 2024