Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reopening #1025: Remove schema calls with no overrides #1065

Merged
merged 12 commits into from
May 2, 2024

Conversation

jknndy
Copy link
Collaborator

@jknndy jknndy commented Apr 17, 2024

Removed most occurrences of direct schema calls, exceptions below, and made small improvements to two scrapers .
Improvements

Waitrose.py - added dynamic author retrieval and site_name coverage
vegolosi.py - pulled new source and removed fields now covered by schema

Issues

Some fields return an error when removed. Originally I thought this was related to the MANDATORY_TESTS vs OPTIONAL_TESTS but wasn't able to get to the root. Could use some input if anyone has any ideas but this PR should pass all the tests as is. EDIT: Seems directly related to the issue raised in #1020

With the current code setup the .py file must remain in place for the scraper to be recognized without wild_mode present. This may change in the v15 branch?

@jknndy jknndy marked this pull request as ready for review April 17, 2024 02:27
@jayaddison
Copy link
Collaborator

With the current code setup the .py file must remain in place for the scraper to be recognized without wild_mode present. This may change in the v15 branch?

Nope, this will not change with v15; each scraper will still require a .py file to exist as an indicator that the website is known/supported.

@jayaddison
Copy link
Collaborator

Some fields return an error when removed. Originally I thought this was related to the MANDATORY_TESTS vs OPTIONAL_TESTS but wasn't able to get to the root. Could use some input if anyone has any ideas but this PR should pass all the tests as is. EDIT: Seems directly related to the issue raised in #1020

Did you manage to figure out what was going with this?

@jknndy
Copy link
Collaborator Author

jknndy commented Apr 22, 2024

Did you manage to figure out what was going with this?

Not yet, hadn't come back to it yet but i'll look into it soon

@jknndy jknndy mentioned this pull request Apr 28, 2024
@jknndy
Copy link
Collaborator Author

jknndy commented Apr 30, 2024

Did you manage to figure out what was going with this?

Bringing the branch up to date resolved whatever the issue was. There are a few bugs to get here, i'll open an issue to track but no reason to hold this up while its still caught up.

Few other changes in the most recent commit:

  1. lekkerensimpel - updated author function to grab the info elsewhere so all tests would pass rather than 1 with schema & 2 with null
  2. owenhan - updated author to return the correct value.
  3. saltpepperskillet - updated author to use .title() instead of .capitalize() simplify / align with other code.
  4. *test cases
  5. lecker - pulled new test data for _2 which increased the scraper coverage

@jayaddison jayaddison merged commit 4a60899 into hhursev:main May 2, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants