
logging level not working #2149

Closed
rustjson opened this issue Jul 29, 2016 · 12 comments · Fixed by #3960


@rustjson rustjson commented Jul 29, 2016

Using the code in the doc:

 logging.basicConfig(
     filename='/home/jason/scrapyd/logs/scrapy.log',
     format='%(levelname)s: %(message)s',
     level=logging.ERROR
 )

But in the file scrapy.log, I can still see INFO, DEBUG, etc.:

ERROR: this is an error
INFO: please
ERROR: this is an error
DEBUG: Scraped from <200 http://www.xxxx.com>

I did not specify any log level in my settings.py. Which part could be wrong?
Thanks


@rustjson rustjson commented Jul 29, 2016

Version:
Scrapy 1.1.0


@redapple redapple commented Sep 15, 2016

@RustJason , how are you running your spider?


@rustjson rustjson commented Sep 18, 2016

scrapyd and scrapy crawl


@dan-blanchard dan-blanchard commented Jan 11, 2017

We're running into the same problem using scrapyd with scrapy crawl.

We see loads of junk like:

2017-01-11T18:41:41+0000 [Launcher,12547/stderr] DEBUG:urllib3.util.retry:Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)


@redapple redapple commented Jan 11, 2017

Hi @dan-blanchard ,

using scrapyd with scrapy crawl

I don't understand.

Also, Scrapy does not use urllib3 as far as I know, nor does scrapyd.
Are you using python-requests, or some other library that sends data over the network? Or some Scrapy plugin?


@dan-blanchard dan-blanchard commented Jan 11, 2017

Sorry, mistyped a couple things there. We're using scrapyd which launches a bunch of python -m scrapyd.runner crawl processes (not scrapy crawl).

I know the logging messages are not from parts of Scrapy, but the root handler that is being created must have its level set to DEBUG, because every library we rely on is writing its debug logging all over the place. I did not see a separate scrapyd-specific log-level setting when I looked, so something strange seems to be happening here.


@redapple redapple commented Jan 13, 2017

@dan-blanchard ,
to be sure I'm getting this: are you saying that

  • you have configured LOG_LEVEL to something higher than DEBUG in scrapy settings
  • a non-scrapyd scrapy crawl somespider does not print DEBUGs but respects the LOG_LEVEL in settings
  • when running that same spider on scrapyd, you get unexpected DEBUG messages?

(sorry if that's not it)

Can you detail how you're setting logging levels for your crawls?

I'm a total noob regarding Python logging so I don't think I can help fix this in reasonable time, but if you can build a reproducible example of this behavior, someone could jump in to help.


@dan-blanchard dan-blanchard commented Jan 13, 2017

@redapple I apologize for the confusion, this was entirely our fault. We're running scrapyd via supervisord, and supervisord had logging enabled that I was not aware of.


@redapple redapple commented Jan 13, 2017

Thanks for the heads up @dan-blanchard !


@Jack-Kingdom Jack-Kingdom commented May 19, 2017

The level setting in logging.basicConfig cannot work correctly. No matter what you set, the spider always writes DEBUG-level logs to the file.
Does anyone have a method to solve this?


@redapple redapple commented May 19, 2017

@Jack-Kingdom , how are you running your spider? a standalone script or scrapy crawl spidername?

With scrapy crawl, you shouldn't need to change the logging config if LOG_LEVEL, LOG_FORMAT, LOG_FILE... settings work for you.
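For the scrapy crawl case, these are real Scrapy settings; a minimal settings.py sketch (the values shown are illustrative, with the path and format borrowed from this thread):

```python
# settings.py -- illustrative values only
LOG_LEVEL = 'ERROR'        # drop DEBUG/INFO/WARNING records
LOG_FILE = '/home/jason/scrapyd/logs/scrapy.log'
LOG_FORMAT = '%(levelname)s: %(message)s'
```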

If you're using a standalone script, are you using CrawlerProcess or CrawlerRunner?

CrawlerProcess calls configure_logging at init time.

With CrawlerRunner (which is the only way -- I think -- to properly configure your own logging), I'm able to set the log level with this:

# -*- coding: utf-8 -*-
import logging

from twisted.internet import reactor

import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging


class ExampleSpider(scrapy.Spider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']

    def parse(self, response):
        self.logger.debug('... inside parse(%r) ...' % response)
        self.logger.error('some FAKE error!')
        yield {'url': response.url}


# commenting this next line from the docs example,
# otherwise scrapy.utils.log.DEFAULT_LOGGING is used, with scrapy level to DEBUG
#configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='log.txt',
    format='%(levelname)s: %(message)s',
    level=logging.ERROR
)

runner = CrawlerRunner()

d = runner.crawl(ExampleSpider)
d.addBoth(lambda _: reactor.stop())
reactor.run() # the script will block here until the crawling is finished

I believe the example in the docs is wrong about calling configure_logging followed by logging.basicConfig. I had to comment configure_logging() to make it use the log level I set in basic config.


@thernstig thernstig commented Mar 10, 2018

I am by no means a logging expert, but I recently read up on logging and believe @redapple is correct.

What configure_logging() does is the following:

dictConfig(DEFAULT_LOGGING)

Where DEFAULT_LOGGING is:

DEFAULT_LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'loggers': {
        'scrapy': {
            'level': 'DEBUG',
        },
        'twisted': {
            'level': 'ERROR',
        },
    }
}

basicConfig() automatically configures the root logger. So even if you have called configure_logging(install_root_handler=False), a root handler will be installed again when you call basicConfig().

basicConfig() adds a handler to the root logger, but that handler's default level is NOTSET, which means it processes all logs (more precisely, LogRecords).

All loggers (such as scrapy.core.scraper or even just scrapy) are child loggers of root. This means that when Scrapy logs something, the record is passed up to the root logger, because the 'scrapy' child logger's level is set to DEBUG. The root logger's handler has a level of NOTSET, so it processes and emits all of these records. You can read more about how this works in this flow: https://docs.python.org/3/howto/logging.html#logging-flow

Using both thus does not make sense; one should call either configure_logging() or basicConfig(), not both.

The answer is thus that the documentation for scrapy needs to be improved here:
https://doc.scrapy.org/en/latest/topics/logging.html#module-scrapy.utils.log

Note that fixing this would also close #2352 and #3146.
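One way to get deterministic filtering without mixing the two mechanisms is a single dictConfig that owns the root handler and sets its level explicitly. This is only a sketch of that approach (the handler and file names are illustrative), not what Scrapy configures by default:

```python
import logging
from logging.config import dictConfig

dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {'plain': {'format': '%(levelname)s: %(message)s'}},
    'handlers': {
        'file': {
            'class': 'logging.FileHandler',
            'filename': 'log.txt',       # illustrative file name
            'formatter': 'plain',
            'level': 'ERROR',            # the handler filters here
        },
    },
    'root': {'handlers': ['file'], 'level': 'DEBUG'},
    'loggers': {'scrapy': {'level': 'DEBUG'}},  # records are still created...
})

log = logging.getLogger('scrapy')
log.debug('filtered out')   # ...but dropped by the ERROR-level handler
log.error('written')        # passes the handler level and lands in log.txt
```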
