add support for weightwatchers #646

Closed
2 of 3 tasks
bickerdyke opened this issue Oct 16, 2022 · 5 comments · Fixed by #657
Comments

@bickerdyke
Contributor

Planning meals is an important part of most nutrition programs. But the meal plan feature offered by the WW app is anything but easy to use. My idea was to use Tandoor Recipes' meal plan feature, but I would need a way to import recipes from weightwatchers into Tandoor.

As none of the standard scrapers works, I'd like to see a new weightwatchers scraper in recipe-scrapers.

I know that recipe-scrapers can't download recipes from the login-protected website, but since Tandoor Recipes accepts a website's source code and feeds it to recipe-scrapers, this should work with only a slight detour of saving the recipe page first.
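For illustration, a minimal sketch of that detour, assuming a recipe-scrapers release that exposes `scrape_html`; the file name and URL below are placeholders:

```python
# A minimal sketch of the "saved page" detour, assuming a recipe-scrapers
# release that exposes scrape_html. File name and URL are placeholders.
from recipe_scrapers import scrape_html

# Save the recipe page from a logged-in browser session first, then:
with open("saved_recipe.html", encoding="utf-8") as f:
    html = f.read()

scraper = scrape_html(html, org_url="https://www.weightwatchers.de/de/recipe/example")
print(scraper.title())
print(scraper.ingredients())
print(scraper.instructions())
```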

To help us out, please check that recipes published on the website you're requesting are public (we can't currently scrape recipes that require an account login) and add sample recipe URL(s) below:

Can you write Python and would you like to help add the scraper yourself? We'd be glad for your assistance! We can provide you with guidance and code review in return. If so, tick any of the relevant boxes below:

  • I'd like to try adding this scraper myself
  • I'd like guidance to help me develop a scraper
  • I'd prefer if the recipe-scrapers community try to add this

This is my first brush with Python and with contributing on GitHub, so help would be appreciated, but I managed to write something that works in local tests. I'll try to add it as a pull request.

@gloriousDan
Contributor

To import recipes into Tandoor you can also use the bookmarklet. Once you drag it from the import page into your bookmarks bar, it can be used on any login-protected site. It extracts the site's source code and sends it to recipe-scrapers.

@bickerdyke
Contributor Author

That's why I hope it will be useful as a parser for use cases like this, even if it can't (yet) be used to actually scrape a website by URL without a login.

Short progress report: the scraper for the saved page is working and passing the tests. The public pages have quite a different structure (and a different URL), so my idea was to add a separate scraper for them, derived from the first one. Would that match the overall project structure? And should I keep my code in my own local repo until I'm ready to put together a PR, or is it best practice to have a fork on my GitHub and pull from there?
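A sketch of that derived-scraper layout, following the library's `AbstractScraper` pattern; the class names, hosts, and selectors here are assumptions, not the final code:

```python
# Sketch of the derived-scraper idea: the public-page scraper subclasses the
# login-protected one and overrides only what differs. Class names, hosts and
# selectors are illustrative assumptions, not the merged implementation.
from recipe_scrapers._abstract import AbstractScraper

class WeightWatchers(AbstractScraper):
    """Parses recipe pages saved from a logged-in session."""

    @classmethod
    def host(cls):
        return "weightwatchers.de"

    def title(self):
        return self.soup.find("h1").get_text(strip=True)

class WeightWatchersPublic(WeightWatchers):
    """Parses the (differently structured) public recipe pages."""

    def title(self):
        # public pages use different markup, so override the selector
        return self.soup.find("h1", {"class": "recipe-title"}).get_text(strip=True)
```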

@jayaddison
Collaborator

@bickerdyke hey - thanks for looking into this. If I follow correctly, you're saying that you're planning to use two different classes to handle different formats for the same website - if so, yep, that sounds like a good approach.

The non-public scraping is a bit more of a challenge. I don't think we'd want to mislead users / computers into thinking that a domain (like cmx.*) can be scraped by this library if it's going to cause errors when they try.

However, I do understand that for Tandoor it may make sense to provide the ability to parse HTML that someone has privately collected using their own account for a website - I don't see a problem there.

I'm not sure that the library exposes an interface to distinguish these cases at the moment. Basically it's: 'scraper is supported, including content retrieval from URL' vs 'scraper is supported for parsing raw HTML only' -- and in some cases only the latter is possible (the former implies the latter).
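One hypothetical shape for such an interface - not an existing recipe-scrapers API, just a sketch of the distinction:

```python
# Hypothetical sketch only: a class-level flag that a network-enabled entry
# point could check before fetching, so raw-HTML-only scrapers stay listed
# without implying that URL retrieval will work. Not an existing API.
class HtmlOnlyScraperError(NotImplementedError):
    """Raised when a scraper supports parsing pre-fetched HTML only."""

class WeightWatchers:  # stands in for the real AbstractScraper subclass
    supports_url_retrieval = False  # parseable from saved HTML, not fetchable

def fetch_and_scrape(scraper_class, url):
    if not getattr(scraper_class, "supports_url_retrieval", True):
        raise HtmlOnlyScraperError(
            f"{scraper_class.__name__} only parses pre-fetched HTML"
        )
    # ... otherwise fetch the URL and instantiate the scraper as usual ...
```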

Other maintainers/community folks may have suggestions - I'll continue to think about it too.

Generally GitHub pull requests are the preferred way to offer contributions here, yep - when you're ready, that'll involve creating a fork, pushing your code to a branch, and then opening the PR.

@bickerdyke
Contributor Author

bickerdyke commented Oct 19, 2022

Pull request added.

I've now added a scraper for the (few) public recipes available on weightwatchers.de. That scraper should be able to work online like any other scraper, but it is based on a scraper for the recipes that require a login, which has been tested with saved HTML files.

With an online scraper available, it shouldn't be misleading to users, as this addition is indeed able to scrape the public-facing recipes from that provider. But I would need help excluding the private scraper from the automated online tests.
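One possible shape for that split (the `online` marker and file paths are assumptions about the test harness, not the project's actual setup):

```python
# Sketch of one way to keep the login-protected scraper out of network-enabled
# test runs while still exercising it against saved HTML. The "online" marker
# and file paths are assumptions, not the project's actual test harness.
import pytest
from recipe_scrapers import scrape_html

def test_weightwatchers_private_from_saved_html():
    # offline: parse a page saved from a logged-in browser session
    with open("tests/test_data/weightwatchers.testhtml", encoding="utf-8") as f:
        scraper = scrape_html(
            f.read(), org_url="https://www.weightwatchers.de/de/recipe/example"
        )
    assert scraper.title()

@pytest.mark.online  # hypothetical marker; deselect with: pytest -m "not online"
def test_weightwatchers_public_live():
    ...  # only the public-page scraper would fetch a live URL here
```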

@jayaddison
Collaborator

> With an online scraper available, it shouldn't be misleading to users, as this addition is indeed able to scrape the public-facing recipes from that provider. But I would need help excluding the private scraper from the automated online tests.

It looks like I caused some breakage with the online tests a while ago -- and haven't really supported them well -- so they're likely to be removed (hopefully temporarily) fairly soon.

When they're reintroduced, it should be with a more thorough design that includes some level of continuous integration testing.

A boundary I'm wary of: I don't think we should include or provide code that logs in on behalf of the user during network-enabled (as opposed to raw-HTML) scraping.

Even though the private scraper in #657 doesn't do that, it does have me a little concerned because there's a possibility for confusion in future (like: someone sees that we have some code to scrape a private site, and misinterprets that to mean that we allow it).

Would there be a way to ensure that the scrapers are all tested against recent, public HTTP interactions as a kind of integrity and safety check (i.e. the scrape does nothing that anyone's start-from-scratch, unconfigured web browser wouldn't and couldn't do)? It makes me think of prior discussion in #321.
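For illustration, one shape such a check could take - an assumption about approach, not existing project code - is to wrap the HTTP session and fail if any request carries credentials a fresh browser wouldn't send:

```python
# Illustrative sketch of the integrity-check idea, not existing project code:
# wrap the HTTP session used during a network-enabled scrape and fail if any
# request carries credentials that a fresh, unconfigured browser wouldn't send.
import requests

class NoCredentialSession(requests.Session):
    FORBIDDEN_HEADERS = {"authorization", "cookie"}

    def prepare_request(self, request):
        prepared = super().prepare_request(request)
        leaked = self.FORBIDDEN_HEADERS & {h.lower() for h in prepared.headers}
        if leaked:
            raise AssertionError(f"scrape sent credential headers: {leaked}")
        return prepared

# Usage: pass this session to the scraping code under test; any attempt to
# authenticate (explicit headers or stored cookies) fails the test run.
```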
