add support for weightwatchers #646

Closed
2 of 3 tasks
bickerdyke opened this issue Oct 16, 2022 · 5 comments · Fixed by #657
Comments

@bickerdyke
Contributor

Planning meals is an important part of most nutrition programs. But the meal plan feature offered by the WW app is anything but easy to use. My idea was to use Tandoor Recipes' meal plan feature, but I would need a way to import recipes from weightwatchers into Tandoor.

As none of the standard scrapers works, I'd like to see a new weightwatchers scraper in recipe-scrapers.

I know that recipe-scrapers can't download recipes from the login-protected website, but since Tandoor Recipes accepts a website's source code and feeds it to recipe-scrapers, this should work with only a slight detour of saving the recipe page first.
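For illustration, a minimal sketch of that detour, assuming a recipe-scrapers release that exposes `scrape_html`; the file name and URL below are placeholders:

```python
# A minimal sketch of the "saved page" detour, assuming a recipe-scrapers
# release that exposes scrape_html. File name and URL are placeholders.
from recipe_scrapers import scrape_html

# Save the recipe page from a logged-in browser session first, then:
with open("saved_recipe.html", encoding="utf-8") as f:
    html = f.read()

scraper = scrape_html(html, org_url="https://www.weightwatchers.de/de/recipe/example")
print(scraper.title())
print(scraper.ingredients())
print(scraper.instructions())
```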

To help us out, please check that recipes published on the website you're requesting are public (we can't currently scrape recipes that require an account login) and add sample recipe URL(s) below:

Can you write Python and would you like to help add the scraper yourself? We'd be glad for your assistance! We can provide you with guidance and code review in return. If so, tick any of the relevant boxes below:

  • I'd like to try adding this scraper myself
  • I'd like guidance to help me develop a scraper
  • I'd prefer if the recipe-scrapers community try to add this

This is my first brush with Python and with contributing on GitHub, so help would be appreciated, but I managed to write something that works in local tests. I'll try to add it as a pull request.

@gloriousDan
Contributor

To import recipes into Tandoor you can also use the bookmarklet. Once you drag it from the import page into your bookmarks bar, it can be used on any login-protected site. It extracts the site's source code and sends it to recipe-scrapers.

@bickerdyke
Contributor Author

That's why I hope it will be useful as a parser for use cases like this, even if it can't (yet) be used to actually scrape a website by URL without a login.

Short progress report: the scraper for the saved page is working and passing the tests. The public pages have quite a different structure (and a different URL), so my idea was to add a separate scraper for them, derived from the first one. Would that match the overall project structure? And should I keep my code in my own local repo until I'm ready to put together a PR, or is it best practice to have a fork on my GitHub and pull from there?
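A sketch of that derived-scraper layout, following the library's `AbstractScraper` pattern; the class names, hosts, and selectors here are assumptions, not the final code:

```python
# Sketch of the derived-scraper idea: the public-page scraper subclasses the
# login-protected one and overrides only what differs. Class names, hosts and
# selectors are illustrative assumptions, not the merged implementation.
from recipe_scrapers._abstract import AbstractScraper

class WeightWatchers(AbstractScraper):
    """Parses recipe pages saved from a logged-in session."""

    @classmethod
    def host(cls):
        return "weightwatchers.de"

    def title(self):
        return self.soup.find("h1").get_text(strip=True)

class WeightWatchersPublic(WeightWatchers):
    """Parses the (differently structured) public recipe pages."""

    def title(self):
        # public pages use different markup, so override the selector
        return self.soup.find("h1", {"class": "recipe-title"}).get_text(strip=True)
```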

@jayaddison
Collaborator

@bickerdyke hey - thanks for looking into this. If I follow correctly, you're saying that you're planning to use two different classes to handle different formats for the same website - if so, yep, that sounds like a good approach.

The non-public scraping is a bit more of a challenge. I don't think we'd want to mislead users / computers into thinking that a domain (like cmx.*) can be scraped by this library if it's going to cause errors when they try.

However, I do understand that for Tandoor it may make sense to provide the ability to parse HTML that someone has privately collected using their own account for a website - I don't see a problem there.

I'm not sure that the library exposes an interface to distinguish these cases at the moment. Basically it's: 'scraper is supported, including content retrieval from URL' vs 'scraper is supported for parsing raw HTML only' -- and in some cases only the latter is possible (the former implies the latter).
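One hypothetical shape for such an interface - not an existing recipe-scrapers API, just a sketch of the distinction:

```python
# Hypothetical sketch only: a class-level flag that a network-enabled entry
# point could check before fetching, so raw-HTML-only scrapers stay listed
# without implying that URL retrieval will work. Not an existing API.
class HtmlOnlyScraperError(NotImplementedError):
    """Raised when a scraper supports parsing pre-fetched HTML only."""

class WeightWatchers:  # stands in for the real AbstractScraper subclass
    supports_url_retrieval = False  # parseable from saved HTML, not fetchable

def fetch_and_scrape(scraper_class, url):
    if not getattr(scraper_class, "supports_url_retrieval", True):
        raise HtmlOnlyScraperError(
            f"{scraper_class.__name__} only parses pre-fetched HTML"
        )
    # ... otherwise fetch the URL and instantiate the scraper as usual ...
```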

Other maintainers/community folks may have suggestions - I'll continue to think about it too.

Generally GitHub pull requests are the preferred way to offer contributions here, yep - when you're ready, that'll involve creating a fork, pushing your code to a branch, and then opening the PR.

@bickerdyke
Contributor Author

bickerdyke commented Oct 19, 2022

Pull request added.

I've now added a scraper for the (few) public recipes available on weightwatchers.de. That scraper should be able to work online like any other scraper, but it is based on a scraper for the recipes that require a login, which has been tested with saved HTML files.

With an online scraper available, it shouldn't be misleading to users, as this addition is indeed able to scrape the public-facing recipes from that provider. But I would need help excluding the private scraper from the automated online tests.
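One possible shape for that split (the `online` marker and file paths are assumptions about the test harness, not the project's actual setup):

```python
# Sketch of one way to keep the login-protected scraper out of network-enabled
# test runs while still exercising it against saved HTML. The "online" marker
# and file paths are assumptions, not the project's actual test harness.
import pytest
from recipe_scrapers import scrape_html

def test_weightwatchers_private_from_saved_html():
    # offline: parse a page saved from a logged-in browser session
    with open("tests/test_data/weightwatchers.testhtml", encoding="utf-8") as f:
        scraper = scrape_html(
            f.read(), org_url="https://www.weightwatchers.de/de/recipe/example"
        )
    assert scraper.title()

@pytest.mark.online  # hypothetical marker; deselect with: pytest -m "not online"
def test_weightwatchers_public_live():
    ...  # only the public-page scraper would fetch a live URL here
```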

@jayaddison
Collaborator

> With an online scraper available, it shouldn't be misleading to users, as this addition is indeed able to scrape the public-facing recipes from that provider. But I would need help excluding the private scraper from the automated online tests.

It looks like I caused some breakage with the online tests a while ago -- and haven't really supported them well -- so they're likely to be removed (hopefully temporarily) fairly soon.

When they're reintroduced, it should be with a more thorough design that includes some level of continuous integration testing.

A boundary I'm wary of: I don't think we should include or provide code that logs in on behalf of the user during network-enabled (as opposed to raw-HTML) scraping.

Even though the private scraper in #657 doesn't do that, it does have me a little concerned because there's a possibility for confusion in future (like: someone sees that we have some code to scrape a private site, and misinterprets that to mean that we allow it).

Would there be a way to ensure that the scrapers are all tested against recent, public HTTP interactions as a kind of integrity and safety check (i.e. the scrape does nothing that anyone's start-from-scratch, unconfigured web browser wouldn't and couldn't do)? It makes me think of prior discussion in #321.
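For illustration, one shape such a check could take - an assumption about approach, not existing project code - is to wrap the HTTP session and fail if any request carries credentials a fresh browser wouldn't send:

```python
# Illustrative sketch of the integrity-check idea, not existing project code:
# wrap the HTTP session used during a network-enabled scrape and fail if any
# request carries credentials that a fresh, unconfigured browser wouldn't send.
import requests

class NoCredentialSession(requests.Session):
    FORBIDDEN_HEADERS = {"authorization", "cookie"}

    def prepare_request(self, request):
        prepared = super().prepare_request(request)
        leaked = self.FORBIDDEN_HEADERS & {h.lower() for h in prepared.headers}
        if leaked:
            raise AssertionError(f"scrape sent credential headers: {leaked}")
        return prepared

# Usage: pass this session to the scraping code under test; any attempt to
# authenticate (explicit headers or stored cookies) fails the test run.
```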
