
LogFormatter: Add the ability to skip log messages #3987

Conversation

@krisztian-toth (Contributor) commented Aug 29, 2019

Resolves #3984

codecov bot commented Aug 29, 2019

Codecov Report

Merging #3987 into master will increase coverage by 0.11%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3987      +/-   ##
==========================================
+ Coverage   85.39%   85.51%   +0.11%     
==========================================
  Files         167      167              
  Lines        9724     9735      +11     
  Branches     1456     1459       +3     
==========================================
+ Hits         8304     8325      +21     
+ Misses       1162     1153       -9     
+ Partials      258      257       -1
Impacted Files Coverage Δ
scrapy/logformatter.py 90.47% <ø> (ø) ⬆️
scrapy/core/scraper.py 89.33% <100%> (+2.84%) ⬆️
scrapy/core/engine.py 92.79% <100%> (+0.03%) ⬆️
scrapy/http/request/json_request.py 94.11% <0%> (+0.36%) ⬆️
scrapy/contracts/default.py 88% <0%> (+1.63%) ⬆️
scrapy/contracts/__init__.py 83.73% <0%> (+2.43%) ⬆️
scrapy/extensions/corestats.py 100% <0%> (+10%) ⬆️

@elacuesta (Member) commented Aug 29, 2019

Thanks for the PR, LGTM 👍

The change is easy to understand; however, could you add some tests? I think a few check/check_present calls should be enough, as in https://github.com/scrapy/scrapy/blob/1.7.3/tests/test_utils_log.py

@Gallaecio (Member) commented Aug 29, 2019

Although the source issue explicitly asks for a setting, I think this should be done by extending the LogFormatter class, which users can subclass to customize core messages, so that it also allows skipping messages.

@krisztian-toth (Contributor, Author) commented Aug 29, 2019

Hi @Gallaecio,
Thanks for the input. How would you skip a message by subclassing LogFormatter? As of now it is only responsible for the format of the message and for setting the level; I don't see how it could be used to skip a message entirely. Would you rather have it do the actual logging as well?

Edit: wording

@Gallaecio (Member) commented Aug 29, 2019

It’s currently not possible, and that’s what I mean.

I think this pull request should make it possible, instead of adding a setting.

@krisztian-toth (Contributor, Author) commented Aug 29, 2019

> It’s currently not possible, and that’s what I mean.

Thanks, I understood what you meant. My question was rather: how would you go about it?
The LogFormatter class could log the message instead of the calling scrapy.core.scraper.Scraper object, but then anyone who defined a LOG_FORMATTER entry in their settings file with a class other than scrapy.logformatter.LogFormatter would see unexpected behaviour. Also, the name LogFormatter wouldn't fit anymore.
We could wrap the LogFormatter in another class (e.g. LogProvider) which actually does the logging, define it the same way as the LogFormatter, and make it configurable via the settings file. I guess this is viable, though it might be a big change for such a small issue.

What do you think?

@elacuesta (Member) commented Aug 29, 2019

This is interesting. I must admit I wasn't aware of the LogFormatter class and LOG_FORMATTER setting, since they're not in the latest docs version yet.

It seems that while you cannot currently skip the log entry, you could return a shorter message. Something like:

import logging

from scrapy.logformatter import LogFormatter


class IgnoreScrapedFormatter(LogFormatter):
    def scraped(self, item, response, spider):
        # Return a shorter entry instead of the default "Scraped from ..." message
        return {
            'level': logging.INFO,
            'msg': 'Scraped',
            'args': {},
        }

And then set LOG_FORMATTER to your custom formatter.
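A usage sketch, assuming the class above lives in a hypothetical myproject/logformatters.py module:

# settings.py -- point the LOG_FORMATTER setting at the custom class
# (the module path below is illustrative)
LOG_FORMATTER = 'myproject.logformatters.IgnoreScrapedFormatter'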

Update: perhaps some modifications could be made to skip messages from the Scraper class. I see (at least) two alternatives (a sketch of the first one follows the list):

  • skip the entry if the LogFormatter method returns None instead of a dictionary
  • skip the entry if the LogFormatter method raises a custom exception (it could be called SkipLogEntry or something similar)
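A minimal sketch of the first alternative, assuming the calling code is adapted to check for None (QuietScrapedFormatter is an illustrative name, not part of Scrapy):

from scrapy.logformatter import LogFormatter


class QuietScrapedFormatter(LogFormatter):
    def scraped(self, item, response, spider):
        # Returning None (instead of a dict) would signal "skip this log entry"
        return None


# The calling side (e.g. scrapy.core.scraper.Scraper) would then guard the
# logging call roughly like this:
#
#     logkws = self.logformatter.scraped(item, response, spider)
#     if logkws is not None:
#         logger.log(*logformatter_adapter(logkws), extra={'spider': spider})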

@krisztian-toth (Contributor, Author) commented Aug 29, 2019

I second the proposed solutions; I could go with either, but I like the latter more.
Would it also make sense to implement the IgnoreScrapedFormatter in scrapy/logformatter.py?

@Gallaecio (Member) commented Aug 29, 2019

(nothing to see here, already addressed by @elacuesta)

@krisztian-toth (Contributor, Author) commented Aug 30, 2019

@elacuesta, can you please advise which test file I should use to cover this PR with tests? I couldn't find one that already invokes the methods affected by these changes.

@elacuesta changed the title from "Add LOG_SCRAPED_ENABLED setting to control scraped logging (#3984)" to "LogFormatter: Add the ability to skip log messages" on Aug 30, 2019
@elacuesta (Member) commented Aug 30, 2019

Updated the PR name, since it's no longer about adding a setting.

@krisztian-toth (Contributor, Author) commented Sep 2, 2019

I'm not sure; those tests only cover the formatting of the log message, not the actual logging. The changes I made have no effect on the default LogFormatter implementation.

@elacuesta (Member) commented Sep 2, 2019

That's right, the new tests might not look like the existing ones that check the output format; however, I think tests/test_logformatter.py is the right file to put them in. I would use testfixtures.LogCapture, as I mentioned in #3987 (comment).
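For reference, a minimal Scrapy-independent sketch of how testfixtures.LogCapture can assert which messages were (and were not) emitted; the logger name and messages below are illustrative:

import logging

from testfixtures import LogCapture

logger = logging.getLogger("example")

with LogCapture() as capture:
    logger.info("Crawled (200) <GET http://example.com>")
    # a log call whose formatter entry was skipped would simply never show up here

# check() asserts the exact sequence of (logger name, level, message) records captured
capture.check(("example", "INFO", "Crawled (200) <GET http://example.com>"))

# records can also be inspected directly to assert that a message is absent
assert not any("Scraped from" in record.getMessage() for record in capture.records)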

@krisztian-toth (Contributor, Author) commented Sep 3, 2019

Thanks @elacuesta for the advice. I looked into it, but unfortunately I am not familiar with testing methods that return deferred results, and I am also unsure at which level I should write the tests. The functions to test are called in the ExecutionEngine and Scraper classes, but setting those classes up for tests would be really tedious, especially since I am unfamiliar with the codebase. Any help would be appreciated; otherwise I don't think I'll be able to provide these tests.

@elacuesta (Member) commented Sep 3, 2019

@watsta That's fine, don't worry. I can write some tests; I'll see if I can do it later today.

@elacuesta (Member) left a comment

I approve, but I feel it's kind of cheating given c568859 😄
@Gallaecio Would you mind taking a look?

@Gallaecio merged commit 0b52fa6 into scrapy:master on Sep 16, 2019 (2 checks passed)
@elacuesta (Member) commented Sep 16, 2019

Many thanks @watsta for the contribution and @Gallaecio for the review and merge 🚀

@krisztian-toth deleted the feature/3984-add-log-scraped-enabled-setting branch on Sep 23, 2019