cache page text until next edit operation #80
Agree that the section part is dangerous and should be removed, but it should be done in a separate commit. Could you make a separate pull request for it? Caching also seems like a valid feature to add. There should probably also be some way to set a cache size limit, so you can limit memory usage when gathering data from a large number of pages without editing. Or perhaps just a flag to disable caching? For bypassing the cache on single requests, perhaps we could just add an argument to text():

    def text(self, section=None, expandtemplates=False, nocache=False):
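A minimal sketch of how a size limit, a disable switch, and a per-call bypass could fit together. Everything here is an assumption made for illustration: `CachedTextPage`, `max_cache_entries`, and the injected `fetch` callable are hypothetical and are not part of mwclient; only the `text()` signature follows the suggestion above.

```python
from collections import OrderedDict


class CachedTextPage:
    """Hypothetical stand-in for a page object; not mwclient's Page class."""

    def __init__(self, fetch, max_cache_entries=2):
        self._fetch = fetch              # callable that performs the remote request
        self._max = max_cache_entries    # 0 disables caching for this instance
        self._cache = OrderedDict()      # ordered from least to most recently used

    def text(self, section=None, expandtemplates=False, nocache=False):
        key = (section, expandtemplates)
        if not nocache and key in self._cache:
            self._cache.move_to_end(key)          # mark as most recently used
            return self._cache[key]
        # Cache miss, or the caller asked to bypass the cache: fetch fresh text.
        value = self._fetch(section, expandtemplates)
        if self._max > 0:
            self._cache[key] = value
            self._cache.move_to_end(key)
            while len(self._cache) > self._max:
                self._cache.popitem(last=False)   # evict least recently used
        return value
```

With this shape, `nocache=True` skips the lookup but still refreshes the stored entry, and `max_cache_entries=0` acts as the "just disable it" flag. Whether a bounded cache is worth the extra code over a plain on/off switch is a design choice the thread leaves open.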
Ah, the travis error is genuine too (PEP-8 in my stuff): I'll fix that. Agree on the separate PR for the section stuff, will do. I'm guessing it goes back to the era where there was Honestly I'm looking for feedback from you on how you'd like the design refined/improved, I'm just a monkey with a bunch of Stack Overflow tabs :P I'm using pretty much this code in my project ATM (in my subclass of |
87c3ae1 to f148b59
There's a version which makes the caching optional (per instance) and also allows the setting to be overridden with each |
ping @danmichaelo |
I think I'll try rebasing this once Dan lets me know how he'd like #81 done and we get that merged - how this should behave will be clearer then. |
Just merged #81 now. Seems like I can merge this one too if you do a rebase. Two questions:
In answer to both questions I guess I was imagining some kind of use case where you kept the I was also just being a wuss and figuring 'well, if this breaks, at least if it's not the default it doesn't hurt too many people'. :P |
6aef4ee to f5566a4
OK, I re-diffed this with a very simple approach. Things to consider:
I'm not quite sure how best to add this to the tests, advice welcome. |
I think the most important thing is that the docstring is clear. It should say whether caching is the default or not, and explain that the cache only lives for the lifetime of the object. (Btw, if someone else makes an edit after you instantiated the page object, we still have the problem that Page.revision, Page.touched, etc. retain their old, outdated values.)
I get that :) |
Hm, not sure, but I think I would go for no. Perhaps there might be cases where you would like to save memory?
Doesn't really seem necessary to me.
Perhaps
? Ask if you need help getting the tests working |
a8512f3 to db886a0
Sorry to be dim, but it seems like calling
This does seem to be specific to the test environment, a very simple test script:
(or anything similar to that) works fine. |
I've struggled quite a bit with mocking… In this case, if you do

    print(self.site.api.return_value)
    text = self.page.text()
    print(self.site.api.return_value)

you'll notice that the object is altered; the timestamp is expanded

To fix that, we could deep-copy the original object before the call, and restore it after the call to self.page.text():

    response = deepcopy(self.site.api.return_value)
    text = self.page.text(…)
    self.site.api.return_value = response

This is getting quite messy though. Perhaps it can be done slightly more cleanly by stepping one step up, mocking

    def test_get_page_text_cached(self):
        self.page.revisions = mock.Mock(return_value=iter([]))
        self.page.text()
        self.page.text()
        assert self.page.revisions.call_count == 2
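The mutated-mock problem described above can be reproduced in isolation. The sketch below is an illustration only: it assumes Python 3's `unittest.mock` (the thread itself targets older Pythons with the separate `mock` package), and both the `CANNED` response and the `consume` function are hypothetical stand-ins rather than mwclient code. It contrasts `return_value`, which hands out the same object on every call, with a `side_effect` callable that builds a fresh deep copy per call.

```python
from copy import deepcopy
from unittest import mock

# Hypothetical canned response; the real API response shape is not shown in the thread.
CANNED = {"revisions": [{"timestamp": "2015-01-01T00:00:00Z", "*": "page text"}]}


def consume(response):
    """Stand-in for code under test that rewrites the response in place."""
    rev = response["revisions"][0]
    rev["timestamp"] = tuple(rev["timestamp"].split("T"))   # in-place mutation
    return rev["*"]


# return_value is one shared object, so the mutation leaks into the next call.
shared = mock.Mock(return_value=deepcopy(CANNED))
consume(shared())
leaked = shared()["revisions"][0]["timestamp"]

# side_effect is invoked on every call, so each call gets an untouched copy.
fresh = mock.Mock(side_effect=lambda *args, **kwargs: deepcopy(CANNED))
consume(fresh())
pristine = fresh()["revisions"][0]["timestamp"]
```

Here `leaked` ends up as the tuple written by `consume`, while `pristine` is still the original string. Using `side_effect` this way avoids the manual save-and-restore around each call.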
Store the results of page.text() operations in a simple cache dict. This avoids unnecessary remote roundtrips. Cache is cleared on each successful page.save() operation. cache argument can be set to 'False' to disable use of the cache.
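The behaviour this commit message describes could look roughly like the sketch below. It is not the merged code: `SketchPage`, the `_textcache` attribute, and the injected `fetch` and `push` callables are hypothetical names chosen for illustration, and only the `cache` argument and the clear-on-successful-save rule come from the message itself.

```python
class SketchPage:
    """Hypothetical stand-in, not mwclient's Page; fetch and push are injected."""

    def __init__(self, fetch, push):
        self._fetch = fetch        # callable(section, expandtemplates) -> str
        self._push = push          # callable(text, summary); raises on failure
        self._textcache = {}

    def text(self, section=None, expandtemplates=False, cache=True):
        key = (section, expandtemplates)
        if cache and key in self._textcache:
            return self._textcache[key]       # no remote round trip
        value = self._fetch(section, expandtemplates)
        if cache:
            self._textcache[key] = value
        return value

    def save(self, text, summary=""):
        self._push(text, summary)   # if this raises, the cache is left untouched
        self._textcache = {}        # cleared only after a successful save
```

Keying the dict on `(section, expandtemplates)` keeps differently parameterised requests from returning each other's text; that keying is an assumption of this sketch.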
db886a0 to e8fc54d
So your idea works, but only if we change
I generally try to keep my stuff working with 2.6, but 2.5 seems like just too much work... edit: I see |
Yeah, keeping 2.5 compatibility was too hard, so I dropped that.
Looks good now 😎 Sorry for the delay! |
Cache page text until next edit operation
Store the results of page.text() operations in a simple cache dict. This avoids unnecessary remote roundtrips. Cache is cleared on each successful page save operation; we might want to also have a method to clear the cache, TTL, whatever, if folks are likely to keep Page instances around a long time and need to refresh them for possible third party edits.
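One possible shape for the "method to clear the cache, TTL, whatever" idea, sketched as a standalone helper. Nothing here exists in mwclient: `ExpiringTextCache`, its `ttl` and `clock` parameters, and the `get`/`clear` methods are all hypothetical, and the injectable clock is only there so the expiry logic can be exercised without sleeping.

```python
import time


class ExpiringTextCache:
    """Hypothetical helper: entries expire after ttl seconds or on clear()."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}          # key -> (stored_at, value)

    def get(self, key, fetch):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]           # still fresh: no remote round trip
        value = fetch()             # missing or expired: fetch and re-stamp
        self._entries[key] = (now, value)
        return value

    def clear(self):
        """Drop everything, e.g. when third-party edits are suspected."""
        self._entries.clear()
```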
This also gets rid of the section attribute, which looks bogus to me. All it appears to achieve is that if you retrieve the text of a particular section, then run a 'save' operation without explicitly specifying a section, the save operation is applied to the same section. This seems a completely wrong and potentially dangerous assumption. Maybe I'm missing something, though.
This is a pretty simple-minded implementation that I wouldn't necessarily expect to get merged as-is; view it as a proof of concept or prototype rather than something really mergeable. I just wanted to provide a bit more flesh than a bare feature request. My use case is one where I want to assemble a set of distinct edits to a page before firing them all together; to figure out each edit, I have to poke through the existing page text and find the appropriate bit to change.
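To make that use case concrete, here is a rough sketch of the plan-then-apply pattern. It is illustrative only: `StubPage`, `plan_edits`, and `apply_edits` are hypothetical names, the stub keeps its "remote" text in memory, and the regex rules are made-up sample data. The point is that every rule inspects `page.text()`, yet only the first inspection costs a fetch.

```python
import re


class StubPage:
    """Hypothetical page with a cached text() and counters for round trips."""

    def __init__(self, text):
        self._remote = text
        self._cached = None
        self.fetches = 0
        self.saves = 0

    def text(self):
        if self._cached is None:
            self.fetches += 1             # simulated remote round trip
            self._cached = self._remote
        return self._cached

    def save(self, text):
        self.saves += 1
        self._remote = text
        self._cached = None               # cache is dropped after the save


def plan_edits(page, rules):
    """Inspect the current text once per rule; collect (old, new) replacements."""
    planned = []
    for pattern, replacement in rules:
        match = re.search(pattern, page.text())   # cache hit after the first rule
        if match:
            planned.append((match.group(0), replacement))
    return planned


def apply_edits(page, planned):
    """Apply all planned replacements to one copy of the text, then save once."""
    text = page.text()
    for old, new in planned:
        text = text.replace(old, new, 1)
    page.save(text)
```

For example, planning two rules against a two-line page and then calling `apply_edits` results in a single fetch and a single save in this stub, however many rules read the text.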