Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Language: consistency: retrieve dynamically from HTML where possible. #1112

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jayaddison
Copy link
Collaborator

...and emit a Python warning to try to help us resolve cases where it isn't possible.

cc @rmdluo @mlduff because we've been discussing how to handle and emit warning messages.

@@ -48,4 +50,10 @@ def prep_time(self):
return self.schema.prep_time()

def language(self):
msg = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way you set the message here differs a little from the way we did it in my PR -- I think I put it at the top as a variable so that it could be reused in other functions within the module (even though it isn't). Since these messages seem to be used somewhat often between modules, I was thinking we could define another module with a few custom warning classes or messages. This would make the warning messages more standardized. We could also use them in testing https://docs.python.org/3/library/warnings.html#testing-warnings, which might make them a little less noisy during testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmdluo I took some inspiration from your branch and went a step further, in this PR I have put it as a function in _utils. I'm currently investigating creating a custom warning category so that we could turn off those warnings during testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use catch_warnings to test that keys in the test json with a null value should raise a warning. This silences the warnings from the user side, but also tests to make sure that the warning is there. I have something implemented for this, but I'm not sure if we should make a separate pull request for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basic idea is this:

if expect[key]:
    self.assertEqual(
        expect[key],
        scraper_func(),
        msg=f"The actual value for .{key}() did not match the expected value.",
    )
else:
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        scraper_func()
        assert len(w) >= 1
        assert issubclass(w[-1].category, UserWarning) # can be adjusted to accomodate other warning types

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very much agreed with the idea of detecting this automatically using catch_warnings in the unit tests 👍

...however, value-is-null isn't exactly the condition I had in mind to detect; it's value-is-always-the-same (it's kinda a warning that the scraper isn't doing anything smart for that particular field, or really even returning anything from the page itself - it's simply returning a fixed value that is hopefully correct -- and worth double-checking -- for all recipes read by a scraper).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for silencing the warnings, we could move catch_warnings to above the assert equal. For checking warning messages with non-null values, you might have to add more information to the test jsons to denote that the value stays the same.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for silencing the warnings, we could move catch_warnings to above the assert equal.

👍 yep, ok - that sounds like a good idea to me

For checking warning messages with non-null values, you might have to add more information to the test jsons to denote that the value stays the same.

I'd prefer to encourage developers to provide this information in the scraper implementations. We'll probably forget to annotate some scrapers, but maybe with some clever linting/tooling we could, in future, detect methods that always return a fixed/literal value.

@jayaddison jayaddison force-pushed the consistency/language-from-html branch from a9a7c47 to 69a85c1 Compare May 6, 2024 13:11
@jknndy
Copy link
Collaborator

jknndy commented May 6, 2024

This returns a lengthy output during a run of python -m unittest , is this the expected behavior?

(.venv) % python -m unittest
....................................../recipe_scrapers/bestrecipes.py:19: UserWarning: bestrecipes.com.au doesn't seem to provide language metadata in their HTML. Please let us know if it becomes available in a standard location, and then we can try to retrieve it dynamically.
  warnings.warn(msg)
................................................................................................................................................................................................................................./recipe_scrapers/potatorolls.py:52: UserWarning: potatorolls.com appears to write language metadata into HTML DOCTYPE declaration instead of the top-level 'html' element or elsewhere. Please let us know if it becomes available in a standard location, and then we can try to retrieve it dynamically.
  warnings.warn(msg)
........................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 559 tests in 12.732s

OK

@jayaddison
Copy link
Collaborator Author

It's kinda-expected that it causes noise in the unittest output, yep. I'd wondered about that in another pull request, #1067, when discussing options for logging warnings for statically-defined fields (fields that only ever return a single value).

I had an idea and have pushed a commit (ff411e1) to implement it: we could ignore warnings from the scraper fields by default, but allow unit tests to be run with warning filters enabled, and add failures in the latter case when warnings occur.

Two drawbacks to that, though: it only works for Python3.11+, and also it means that the warnings are entirely hidden (no hints/reminders about them at all) when no Python warning options are configured.

@jayaddison
Copy link
Collaborator Author

Defaults

$ python -m unittest -k potato 
..
----------------------------------------------------------------------
Ran 2 tests in 0.172s

OK

Warnings enabled

$ python -Walways -m unittest -k potato 
FF
======================================================================
FAIL: tests/test_data/potatorolls.com/potatorolls_1.json (tests.RecipeTestCase.tests/test_data/potatorolls.com/potatorolls_1.json) [language]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jka/Documents/reciperadar/recipe-scrapers/tests/__init__.py", line 110, in test_func
    self.fail(msg=msg)
AssertionError: .language() emitted warnings: UserWarning

======================================================================
FAIL: tests/test_data/potatorolls.com/potatorolls_2.json (tests.RecipeTestCase.tests/test_data/potatorolls.com/potatorolls_2.json) [language]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jka/Documents/reciperadar/recipe-scrapers/tests/__init__.py", line 110, in test_func
    self.fail(msg=msg)
AssertionError: .language() emitted warnings: UserWarning

----------------------------------------------------------------------
Ran 2 tests in 0.178s

FAILED (failures=2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants