
context.py: add disk caching of 'find all descendants'

context.cache_page_children() now stores its result (for real
directories) in the on-disk cache as a heuristically invalidated cache
entry. This is a bit sleazy, especially since invalidation is tricky
(the entry is validated only against directory modification times),
but it is very much worthwhile for WanderingThoughts.
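The lookup order that cache_page_children() now follows can be sketched as below: first the in-memory cache, then the disk cache (where a hit also loads the in-memory cache), and only then the full walk. This is a minimal illustration under stated assumptions, not DWiki's code. `TwoLevelCache` is a hypothetical name, a plain dict shared between instances stands in for the on-disk rendcache store, `compute` stands in for page.descendants(), sorting newest first is only for illustration, and validation is left out.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# (timestamp, path) pairs, as in the descendant lists in this commit.
Entry = List[Tuple[int, str]]


class TwoLevelCache:
    """Hypothetical sketch: per-request memory cache in front of a shared 'disk' cache."""

    def __init__(self, disk: Dict[Hashable, Entry],
                 compute: Callable[[str], Iterable[Tuple[int, str]]]) -> None:
        self.memory: Dict[Hashable, Entry] = {}
        self.disk = disk          # stand-in for the on-disk cache
        self.compute = compute    # stand-in for page.descendants()

    def children(self, path: str) -> Entry:
        key = ("pagekids", path)
        # 1. In-memory cache: cheapest, lives only as long as this object.
        res = self.memory.get(key)
        if res is not None:
            return res
        # 2. Disk cache: on a hit, we must also load the in-memory cache.
        res = self.disk.get(key)
        if res is not None:
            self.memory[key] = res
            return res
        # 3. Full miss: do the expensive walk, list-ize and sort it,
        #    then store the result at both levels.
        res = sorted(self.compute(path), reverse=True)
        self.memory[key] = res
        self.disk[key] = res
        return res
```

Because the dict outlives each TwoLevelCache instance, a second "request" gets the list without repeating the walk, which is the point of adding the disk level.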

Today's moral: filesystem walks do not scale well. What was fast
with a hundred files is not necessarily fast with a few thousand.

- atomgen.py changed to use context.cache_page_children() to take
  advantage of this cache
- documented in test/pages/dwiki/Caching

This is not the full change, just the easily separated part. Bad me.
@siebenmann committed Jan 31, 2013 (commit 1e2a0211231f009ed6a59dfedd8c8e9914cd6ac0, 1 parent: d7de4a4)
Showing 4 changed files with 53 additions and 5 deletions.
  1. +4 −1 atomgen.py
  2. +38 −0 context.py
  3. +2 −2 test.timestamps
  4. +9 −2 test/pages/dwiki/Caching
5 atomgen.py
@@ -176,7 +176,10 @@ def _fillpages(context):
     cutpoint = get_cutpoint(context)
     cuttime = get_cuttime(context)
-    dl = context.page.descendants(context)
+    #dl = context.page.descendants(context)
+    # We deliberately use this context routine because it uses the
+    # disk cache (if that exists).
+    dl = context.cache_page_children(context.page)
     # Force the generator to be expanded to a full list so we can use
     # .sort on it.
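Both atomgen.py and cache_page_children() turn the result of descendants() into a list before sorting or caching it. A short sketch of why, assuming (as the comments say) that descendants() may hand back a generator; the `descendants()` function here is a hypothetical stand-in, and sorting newest first is only for illustration:

```python
from typing import Iterator, Tuple


def descendants() -> Iterator[Tuple[int, str]]:
    """Hypothetical stand-in for page.descendants(): yields (timestamp, path)."""
    yield (3, "blog/c")
    yield (1, "blog/a")
    yield (2, "blog/b")


# Caching the generator itself is useless: it can only be consumed once.
gen = descendants()
first_use = list(gen)
second_use = list(gen)  # already exhausted, so this is an empty list

# Generators also have no .sort(), so expand to a list first,
# then sort that list and cache it.
cached = list(descendants())
cached.sort(reverse=True)
```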
38 context.py
@@ -11,6 +11,7 @@
 import copy
 import utils
+import rendcache
 class Context(object):
     def __init__(self, cfg, model):
@@ -297,11 +298,48 @@ def cache_page_children(self, page):
         res = self.getcache(("pagekids", rp))
         if res is not None:
             return res
+        res = self._get_disk_cpc(page)
+        if res is not None:
+            # If the general disk cache hit, we must load
+            # the in-memory cache.
+            self.setcache(("pagekids", rp), res)
+            return res
+        # Full miss. Go to all the work.
+        #
         # descendants() may return an iterator, which is
         # absolutely no good to cache. So we must list-ize
         # it, no matter how annoying that is.
         res = list(page.descendants(self))
         # To be sure we sort it before we store it.
         utils.sort_timelist(res)
         self.setcache(("pagekids", rp), res)
+        self._set_disk_cpc(page, res)
         return res
+
+    # Get and store page descendent lists in the generator disk cache,
+    # because they are time-consuming to compute. This is kind of a
+    # hack; see comments in pageranges.py for a similar case. Maybe
+    # I should merge them?
+    def _get_disk_cpc(self, page):
+        if not rendcache.cache_on(self.cfg) or page.virtual() or \
+           page.type != "dir":
+            return None
+        return rendcache.fetch_gen(self, page.path, "page-kids")
+    # TODO: is this validator good enough? Probably.
+    def _set_disk_cpc(self, page, plist):
+        if not rendcache.cache_on(self.cfg) or page.virtual() or \
+           page.type != "dir":
+            return
+        v = rendcache.Validator()
+        v.add_mtime(page)
+        ds = {page.path: True}
+        # note that Storage .children() (and thus .descendants()
+        # et al) never returns directories. This is a bit
+        # regrettable.
+        for ts, ppath in plist:
+            pdir = utils.parent_path(ppath)
+            if pdir in ds:
+                continue
+            ds[pdir] = True
+            v.add_mtime(self.model.get_page(pdir))
+        rendcache.store_gen(self, "page-kids", page.path, plist, v)
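The validator built in _set_disk_cpc() records the modification time of the directory itself plus the parent directory of every page in the list. A rough standalone sketch of that heuristic follows. It assumes real filesystem paths, uses `os.path.dirname()` in place of utils.parent_path(), and uses a plain dict of nanosecond mtimes in place of rendcache.Validator; both function names are hypothetical.

```python
import os
from typing import Dict, Iterable, Tuple


def build_validator(root: str, plist: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Hypothetical: record directory mtimes, roughly as _set_disk_cpc() does.

    `plist` holds (timestamp, path) pairs for descendant files only
    (directories never appear), so we record the root directory plus
    each distinct parent directory of a listed file.
    """
    dirs = {root}
    for _ts, path in plist:
        dirs.add(os.path.dirname(path))
    return {d: os.stat(d).st_mtime_ns for d in dirs}


def still_valid(validator: Dict[str, int]) -> bool:
    """Trust the cached list only while no recorded directory mtime has changed."""
    for d, mtime in validator.items():
        try:
            if os.stat(d).st_mtime_ns != mtime:
                return False
        except FileNotFoundError:
            # A recorded directory vanished; the list is certainly stale.
            return False
    return True
```

With this scheme, editing an existing page leaves directory mtimes alone, so the cached list is kept (fine, since the set of pages is unchanged), while adding or removing a page in a recorded directory, or touching that directory with os.utime(path), invalidates it. As far as the diff shows, a directory that holds only subdirectories is never recorded, so a new subdirectory created inside one would slip past this check.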
4 test.timestamps
@@ -86,7 +86,6 @@
@1117566817 test/pages/dwiki/RCS/Formatting,v
@1117570420 test/templates/blog/blogentry-std.tmpl
@1117571145 test/pages/dwiki/NewFeatures/DocStringDocs
-@1117572914 test/passwd
@1117589039 test/templates/wikiauth.tmpl
@1117591467 test/pages/dwiki/NewFeatures/ShortRecentChanges
@1117593027 test/pages/dwiki/NewFeatures/TemplateNewlines
@@ -257,8 +256,9 @@
@1306936630 test/pages/help/DWikiText
@1306937173 test/pages/dwiki/NewFeatures/ProcessingNotes
@1306937385 test/pages/dwiki/NewFeatures/VariousConfigBits
-@1317228424 test/pages/dwiki/Caching
@1323382253 test/pages/Tests/.flag.noview:blogdir
@1323456218 test/pages/dwiki/NewFeatures/DisallowDirViews
@1339683377 test/pages/Tests/PreWrapTest
@1359140622 test/pages/dwiki/ConfigurationFile
+@1359671470 test/passwd
+@1359671914 test/pages/dwiki/Caching
11 test/pages/dwiki/Caching
@@ -112,8 +112,9 @@ so that if the heuristic is fooled DWiki will pick up the new result
sooner or later.
Currently cached are various wikitext to HTML renderers (most of the
-time) and the expensive bit of _blog::prevnext_ (this must use a
-heuristic validator).
+time), the expensive bit of _blog::prevnext_ (this must use a heuristic
+validator), and the general 'find all descendants of this directory'
+operation that underlies a lot of the blog engine.
Unfortunately, a DWiki page that has comment or access restrictions
must be cached separately for each DWiki user that views it. Under some
@@ -132,6 +133,12 @@ but it can't check all of them and still be a useful cache.
So the easy way to invalidate this is to change the modification
time of a directory involved, for example with _touch_.
+The 'find all descendants' cache is similarly invalidated by changing
+a directory modification time. Unlike the _blog::prevnext_ case, the
+directory times are the only thing that this cache checks. This is a bit
+of a pity but the performance improvements from caching this information
+are very visible.
+
== Disk space usage
Much like comments, each page that has something cached for it
