Some improvements to the Cocktail Party scraper #273
Merged
This is a bit of a successor to #270. This commit adds four improvements to the Cocktail Party scraper:
First, some Cocktail Party recipes use units that the recipe-utils parser doesn't understand. For example, the Manhattan Bianco uses a "piece." Previously, such ingredients were presented to the user without the unit, and the user had to fill it in themselves. Now the code falls back to whatever text the parser didn't parse, which is a fairly good default for Cocktail Party.
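To illustrate the fallback, here is a minimal Python sketch. It is not the scraper's actual code: `parse_amount` is a hypothetical stand-in for the recipe-utils parser, and `KNOWN_UNITS`, `Ingredient`, and the example ingredient name are assumptions made up for the example.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical set of units the stand-in parser understands.
KNOWN_UNITS = {"oz", "ml", "cl", "dash", "tsp", "tbsp"}


class Ingredient(NamedTuple):
    quantity: Optional[float]
    unit: str
    name: str


def parse_amount(amount: str) -> Tuple[Optional[float], Optional[str], str]:
    """Stand-in parser: returns (quantity, unit, unparsed_rest).

    The unit is None when the text after the number is not a known unit;
    in that case the text is handed back as the unparsed rest.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*(.*)", amount)
    if not match:
        return None, None, amount.strip()
    quantity = float(match.group(1))
    rest = match.group(2).strip()
    if rest.lower() in KNOWN_UNITS:
        return quantity, rest.lower(), ""
    return quantity, None, rest


def build_ingredient(amount: str, name: str) -> Ingredient:
    quantity, unit, rest = parse_amount(amount)
    if unit is None:
        # Fallback: use whatever the parser left unparsed as the unit.
        unit = rest
    return Ingredient(quantity, unit, name)
```

With this sketch, `build_ingredient("1 piece", "orange peel")` keeps "piece" as the unit instead of dropping it.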
Next, the Cocktail Party website uses "parts" for lots of ingredients, but they actually mean fluid ounces (the same recipes in their mobile app show up with fluid ounces instead of parts). Thus the scraper now maps parts to fluid ounces.
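The mapping itself can be as small as a lookup table. This is a rough Python sketch rather than the scraper's real code; `UNIT_ALIASES`, `normalise_unit`, and the "fl oz" spelling are assumptions for illustration.

```python
# Hypothetical alias table: Cocktail Party's "parts" are treated as fluid ounces.
UNIT_ALIASES = {
    "part": "fl oz",
    "parts": "fl oz",
}


def normalise_unit(unit: str) -> str:
    """Map site-specific unit names to standard ones; leave others untouched."""
    return UNIT_ALIASES.get(unit.strip().lower(), unit)
```

Units that are not in the table pass through unchanged, so this step is safe to run on every ingredient.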
Next, the scraper now reads the links in the "post info" part of the page as tags. The links usually provide categories or names of the cocktail's creator, so this works out nicely.
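As a sketch of the idea, the following Python collects the text of every link inside the post-info element, using only the standard library. The `post-info` class name and the sample markup are assumptions, not taken from the real page, and the scraper itself may well do this differently.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not affect the nesting depth.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class PostInfoTagParser(HTMLParser):
    """Collects link texts found inside an element with class "post-info"."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth inside post-info; 0 means outside
        self.in_link = False
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1
            if tag == "a":
                self.in_link = True
                self.tags.append("")
        elif "post-info" in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or self.depth == 0:
            return
        self.depth -= 1
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.tags[-1] += data


def extract_tags(html: str) -> list:
    parser = PostInfoTagParser()
    parser.feed(html)
    return [t.strip() for t in parser.tags if t.strip()]
```

Links outside the post-info element are ignored, so navigation and footer links don't end up as tags.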
Finally, I've fixed an oversight introduced in #270. The code for parsing the cocktail's description goes through all the paragraphs and then joins them up to form proper Markdown paragraphs. However, those paragraphs were then squashed together by the cleanup step within the toArray() function. That's obviously undesirable. So now the paragraphs are cleaned up before they're joined together, which produces nice Markdown with multiple paragraphs.
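The ordering issue can be shown with a small Python sketch. It assumes the cleanup step collapses all whitespace, including newlines, into single spaces; `clean` and `description_to_markdown` are hypothetical names, not the functions used in the scraper.

```python
import re


def clean(text: str) -> str:
    """Collapse every run of whitespace, newlines included, into one space."""
    return re.sub(r"\s+", " ", text).strip()


def description_to_markdown(paragraphs) -> str:
    # Clean each paragraph first, then join with blank lines, so the
    # paragraph breaks are added after cleanup and survive it.
    cleaned = [clean(p) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Cleaning after joining would turn the blank lines between paragraphs into single spaces, which is the squashing described above.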