New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] [logging] HttpCacheMiddleware: log cache directory at instantiation #2611
Conversation
@@ -24,6 +29,8 @@ def __init__(self, settings, stats): | |||
self.ignore_missing = settings.getbool('HTTPCACHE_IGNORE_MISSING') | |||
self.stats = stats | |||
|
|||
logger.debug("Using cache directory %(cachedir)s" % {'cachedir': self.storage.cachedir}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All built-in cache storages have 'cachedir' attribute, but it is not guaranteed for third-party cache storages. I think it could be better to either check that this attribute is present, or (better) move logging to storages. The latter will allow to log not only directory, but also a file name - for some storages cache is stored in a single file.
Codecov Report
@@ Coverage Diff @@
## master #2611 +/- ##
==========================================
+ Coverage 83.93% 84.14% +0.21%
==========================================
Files 161 161
Lines 8920 9065 +145
Branches 1316 1344 +28
==========================================
+ Hits 7487 7628 +141
- Misses 1176 1177 +1
- Partials 257 260 +3
Continue to review full report at Codecov.
|
scrapy/extensions/httpcache.py
Outdated
@@ -216,6 +220,8 @@ def __init__(self, settings): | |||
self.dbmodule = import_module(settings['HTTPCACHE_DBM_MODULE']) | |||
self.db = None | |||
|
|||
logger.debug("Using DBM cache storage in %(cachedir)s" % {'cachedir': self.cachedir}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about logging exact filename of the database used for cache, not only a folder containing these databases? See open_spider method.
@kmike I applied your suggestions. But did not manage to also log the temp file FilesystemCacheStorage. This is because |
Unlike other storages FilesystemCacheStorage uses multiple files, so it makes sense to log a folder for it. But for cache storages which use a single file I think it is better to log a path to this file. |
@kmike So if I understand you correctly I should move the log in |
…that cache to single files.
@jorenham yes, I think it makes sense. For consistency logging can be moved to open_spider method for FilesystemCacheStorage as well, so that they all log a message at the same time point. |
@kmike I've moved the logging and it now logs the entire paths when available |
Looks good to me, thanks! |
#2604