Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] [logging] HttpCacheMiddleware: log cache directory at instantiation #2611

Merged
merged 4 commits into from Mar 9, 2017

Conversation

jorenham
Copy link
Contributor

@jorenham jorenham commented Mar 2, 2017

@@ -24,6 +29,8 @@ def __init__(self, settings, stats):
self.ignore_missing = settings.getbool('HTTPCACHE_IGNORE_MISSING')
self.stats = stats

logger.debug("Using cache directory %(cachedir)s" % {'cachedir': self.storage.cachedir})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All built-in cache storages have 'cachedir' attribute, but it is not guaranteed for third-party cache storages. I think it could be better to either check that this attribute is present, or (better) move logging to storages. The latter will allow to log not only directory, but also a file name - for some storages cache is stored in a single file.

@codecov-io
Copy link

codecov-io commented Mar 3, 2017

Codecov Report

Merging #2611 into master will increase coverage by 0.21%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2611      +/-   ##
==========================================
+ Coverage   83.93%   84.14%   +0.21%     
==========================================
  Files         161      161              
  Lines        8920     9065     +145     
  Branches     1316     1344      +28     
==========================================
+ Hits         7487     7628     +141     
- Misses       1176     1177       +1     
- Partials      257      260       +3
Impacted Files Coverage Δ
scrapy/extensions/httpcache.py 95.5% <100%> (+0.06%)
scrapy/core/engine.py 92.3% <0%> (-0.07%)
scrapy/utils/url.py 100% <0%> (ø)
scrapy/settings/default_settings.py 98.63% <0%> (ø)
scrapy/linkextractors/sgml.py 96.8% <0%> (+0.06%)
scrapy/linkextractors/lxmlhtml.py 92.4% <0%> (+0.4%)
scrapy/linkextractors/init.py 100% <0%> (+1.75%)
scrapy/utils/misc.py 95% <0%> (+3.33%)
scrapy/spidermiddlewares/referer.py 93.57% <0%> (+8.95%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b49b9c...5e89db5. Read the comment docs.

@@ -216,6 +220,8 @@ def __init__(self, settings):
self.dbmodule = import_module(settings['HTTPCACHE_DBM_MODULE'])
self.db = None

logger.debug("Using DBM cache storage in %(cachedir)s" % {'cachedir': self.cachedir})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about logging exact filename of the database used for cache, not only a folder containing these databases? See open_spider method.

@jorenham
Copy link
Contributor Author

jorenham commented Mar 3, 2017

@kmike I applied your suggestions. But did not manage to also log the temp file FilesystemCacheStorage. This is because _get_request_path takes spider and request as arguments, which aren't accessible in __init__.

@kmike
Copy link
Member

kmike commented Mar 3, 2017

Unlike other storages FilesystemCacheStorage uses multiple files, so it makes sense to log a folder for it. But for cache storages which use a single file I think it is better to log a path to this file.

@jorenham
Copy link
Contributor Author

jorenham commented Mar 3, 2017

@kmike So if I understand you correctly I should move the log in DbmCacheStorage and LeveldbCacheStorage from __init__ to open_spider and include the dbpath in the message. Is that correct?

@kmike
Copy link
Member

kmike commented Mar 3, 2017

@jorenham yes, I think it makes sense. For consistency logging can be moved to open_spider method for FilesystemCacheStorage as well, so that they all log a message at the same time point.

@jorenham
Copy link
Contributor Author

jorenham commented Mar 4, 2017

@kmike I've moved the logging and it now logs the entire paths when available

@kmike kmike changed the title [logging] HttpCacheMiddleware: log cache directory at instantiation [MRG+1] [logging] HttpCacheMiddleware: log cache directory at instantiation Mar 6, 2017
@kmike
Copy link
Member

kmike commented Mar 6, 2017

Looks good to me, thanks!

@kmike kmike added this to the v1.4 milestone Mar 7, 2017
@jorenham jorenham closed this Mar 8, 2017
@jorenham jorenham deleted the httpcachemiddleware_log branch March 8, 2017 19:21
@jorenham jorenham restored the httpcachemiddleware_log branch March 8, 2017 19:24
@jorenham jorenham reopened this Mar 8, 2017
@redapple redapple merged commit db13ab5 into scrapy:master Mar 9, 2017
@redapple redapple mentioned this pull request Mar 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants