Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log with error level instead of debug when reaching max retry times #3171

Closed
ariasuni opened this issue Mar 15, 2018 · 9 comments · Fixed by #3566
Closed

Log with error level instead of debug when reaching max retry times #3171

ariasuni opened this issue Mar 15, 2018 · 9 comments · Fixed by #3566

Comments

@ariasuni
Copy link

@ariasuni ariasuni commented Mar 15, 2018

There’s no easy/not hackish way to log an error when reaching max retry times with standard scrapy RetryMiddleware. It’s useful for me to be able to see right away if a page I tried to crawl has not been downloaded.

I think it’s sensible to change this line to log to error level instead:

logger.debug("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",

@kmike
Copy link
Member

@kmike kmike commented Mar 15, 2018

You can change output settings for a particular logger, see https://doc.scrapy.org/en/latest/topics/logging.html#advanced-customization for more details.

But using error log level by default makes sense to me. If we make this change, it worths it to provid an example in release notes on how to bring old behavior.

@ariasuni
Copy link
Author

@ariasuni ariasuni commented Mar 16, 2018

If I understand correctly, I can change what appears or doesn’t, but not the level of a particular message, right?

@kmike
Copy link
Member

@kmike kmike commented Mar 16, 2018

Right, that's a good point. You can change it per-logger, and the same logger is used for both messages in retry middleware.

@VMRuiz
Copy link
Contributor

@VMRuiz VMRuiz commented Mar 23, 2018

I have also faced that problem and overriding the entire middleware just to configure that seems too overkill.

I have notice that all the middlewares uses the logger.debug by default. So, it could be changed to something like this to make it configurable:

logger = logging.getLogger(__name__)

class RetryMiddleware(object):
    def __init__(self, settings):
        log_level = (settings.get('RETRY_MIDDLEWARE_LOG_LEVEL') or settings.get('MIDDLEWARE_LOG_LEVEL', 'debug')).upper()
        assert log_level in ['error', 'warning', 'info', 'debug']
        self.logger_method = getattr(logger, log_level)

And then replace all calls to logger.debug with self.logger_method

log_level should be set to debug by default so it's keeps the consistency with the previous behaviour.

Also, it should have a general MIDDLEWARE_LOG_LEVEL setting and others like RETRY_MIDDLEWARE_LOG_LEVEL that would override the general configuration for each middleware independently.

@kmike
Copy link
Member

@kmike kmike commented Mar 23, 2018

Hey! I don't like adding new options to customize such behavior. Backwards compatible is important, but we can't fix a bad default in a 100% backwards compatible way. If we change log level to ERROR for a particular message, users can still bring back old verbosity by changing options for this logger using stdlib features. So +1 to change log level to ERROR, as soon as there is a copy-pastable example of how to make it work as before.

Also, it should have a general MIDDLEWARE_LOG_LEVEL setting and others like RETRY_MIDDLEWARE_LOG_LEVEL that would override the general configuration for each middleware independently.

Do you have a use case for MIDDLEWARE_LOG_LEVEL setting?

Also, RETRY_MIDDLEWARE_LOG_LEVEL won't work for this use case, because the middleware logs 2 messages, 1 should be ERROR, and another one should be DEBUG. If a middleware uses the same log level for all its messages, it is easy enough to change verbosity using stdlib features without overriding a middlware, Scrapy doesn't need to implement anything.

@BurnzZ
Copy link
Member

@BurnzZ BurnzZ commented Jan 4, 2019

Hi @kmike ,

I'm proposing the PR in #3566 for this at the moment since:

  • It doesn't break the default behavior of logging retry activities as DEBUG. This prevents users from being overwhelmed/surprised by the amount of logs, should the default levels change.
  • Users who need higher levels than the default DEBUG are able to customize it

We have a couple of projects that would really benefit from having at least a WARNING log level of retry activities due to the nature of the scraped website. We often inherit the RetryMiddleware so this change becomes useful when then _retry() method it called in it.

Thanks!

@Gallaecio
Copy link
Member

@Gallaecio Gallaecio commented Apr 17, 2019

@BurnzZ I think the desired approach is to switch the gave-up message (only that message) from debug to error.

@kmike I don’t see how people could restore the previous behavior easily without us introducing a setting to choose the log level of this specific message, which seems overkill to me.

Maybe we could extend scrapy.logformatter.LogFormatter to support this type of message. However, I am not sure if it is a good idea to add non-core messages to LogFormatter.

@BurnzZ
Copy link
Member

@BurnzZ BurnzZ commented Apr 27, 2019

Hi @Gallaecio, ooh yep my bad. I was trying to fit my project's approach when I was writing this PR. I've updated the PR for this.

Before updating it further, what might be the best and easiest way to bring the old behavior back? as per:

If we make this change, it worths it to provid an example in release notes on how to bring old behavior.

@Gallaecio
Copy link
Member

@Gallaecio Gallaecio commented Feb 14, 2020

That last point is covered by #4216

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants