fix (performance): #3606 Add cache for pocket stories and topics #3654
Conversation
Compare: ff93b2f to 764c47a
Some initial testing, here's number of frames (30fps) from first paint to 1) search box 2) strings 3) topics/stories (+relative times for 2&3 from 1):
If we just take the median times, it looks like:
So at least on my machine, with this fix, topics and stories show up at the same time as strings, with caching being ~100ms/167ms faster than network. Unclear if the slightly slower strings with this caching is just noise or related.
Overall things look good from a timing/performance perspective. We should fix up the `Prefs` usage and wait on @csadilek for final review of the behavior changes.
system-addon/lib/TopStoriesFeed.jsm
Outdated
```js
getCachedStories() {
  return this.getCached(STORIES_CACHE_KEY);
```
These `getCached*` wrappers don't seem really necessary. Did you have plans for this additional indirection?
eh. not really. just how i organized it at first. but looks silly now.
system-addon/lib/TopStoriesFeed.jsm
Outdated
```js
getCached(key) {
  let results = [];
  try {
    results = JSON.parse(new Prefs().get(key));
```
We really should move away from this anti-pattern of `new Prefs`; see #3431. You should be able to do `this.store.getState().Prefs.values[key]` from here.
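A hypothetical sketch of that read, assuming the Redux store already holds the pref values (the store shape and the pref key name here are illustrative, not from the patch):

```javascript
// Illustrative stand-in for the Activity Stream Redux store; the real
// store is created elsewhere and the pref key name here is hypothetical.
const store = {
  getState: () => ({Prefs: {values: {"topstories.cache": "[]"}}})
};

function getCached(key) {
  let results = [];
  try {
    // Read straight from store state instead of constructing new Prefs().
    results = JSON.parse(store.getState().Prefs.values[key]);
  } catch (e) {
    // Missing or corrupt pref: fall back to an empty list.
  }
  return results;
}

console.log(getCached("topstories.cache")); // []
```

The point is that the feed never touches the prefs service directly; a missing key simply falls through the catch to the empty default.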
yeah, did this because I was afraid of a race condition with the PrefsFeed. Or is PrefsFeed guaranteed to be first?
Nod. I noted in #3431 that `PrefsFeed` currently `INIT`s after some feeds, but luckily for us here, TopStories comes after PrefsFeed. ;)
system-addon/lib/TopStoriesFeed.jsm
Outdated
```js
setCache(key, value) {
  new Prefs().set(key, JSON.stringify(value));
```
Similarly, probably dispatch `SET_PREF` I believe…
system-addon/lib/TopStoriesFeed.jsm
Outdated
```js
loadCachedStories() {
  this.stories = this.getCachedStories();
  if (this.stories && this.stories.length > 0) {
    this.dispatchUpdateEvent(this.storiesLastUpdated, {rows: this.stories});
```
This would cause 2 dispatches early on: one from cache and one from network. I think that should be okay… @csadilek?
Yes, that would be OK as the second one isn't broadcasting, but we shouldn't assign to `this.stories` as this will affect the spoc experiment and we will get rid of `this.stories` soon. Let's just assign `const stories = this.getCachedStories()`.
system-addon/lib/TopStoriesFeed.jsm
Outdated
```js
this.stories = this.getCachedStories();
if (this.stories && this.stories.length > 0) {
  this.dispatchUpdateEvent(this.storiesLastUpdated, {rows: this.stories});
  this.storiesLastUpdated = Date.now();
```
Similarly, this date isn't entirely accurate, and we could store the date in the cache as well, but unclear if we need to be accurate here as we'll load from network unconditionally on `INIT`. Although, arguably we could be smarter if we had an actual cached time, especially for Topics as those refresh once every 3 hours, and it doesn't seem unlikely a user would restart Firefox within 3 hours. @csadilek?
Yeah, I was thinking I could just set it to 1. I just want to avoid having the first fetch broadcast if we already loaded from cache.
True, we don't want a second broadcast as that could potentially swap out the stories while a user is looking at them. This raises a question though. We could potentially show really old stories, from the last time firefox fetched successfully, but I think that's probably OK?
Oh. One thing about using prefs is that there is a max size. I just checked the cache and it's ~13KB. @sarracini do you remember where things go bad? @csadilek any idea how big we should expect the response to get?
Actually, on second thought… a 13KB string pref might be bad for other Firefox performance that uses prefs. On a new profile with this caching, the pref file is 22KB total, so we're more than half here… @k88hudson any suggestions on some lightweight caching? I suppose the usual thing is to write JSON to a file in the profile / cache directory…?
Tiles used this to write to the Local/Cache profile directory. Could even read out the file timestamp for lastUpdated.
@Mardak response size will be 15KB and more (without images). We're currently fetching 20 stories, but that could likely increase over time, e.g., if personalization is successful and we want a bigger client-side selection.
Thanks, looks good! Had two comments inline about not using this.stories, and cache expiry (we should verify with design if showing old stories is OK).
Your options are probably either write to indexedDB or write to a JSON file in the profile directory. I think the JSON file approach will probably be easier and it's not like a lot of frequent reads/writes will be happening anyway 👍
Also, you probably want to store a fixed version number (just like …). We should also have some kind of max age or expiration timing on start-up, to prevent very old stories from being shown.
Alternatively, punt on versioning and other metadata, and when we have a new version, if it lacks a version, it's version 1! ;)
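That "no version means version 1" fallback could look something like this sketch (the version constant and `migrate` helper are hypothetical, not from the patch):

```javascript
// Hypothetical sketch: a cache payload without a version field is treated
// as version 1; CURRENT_VERSION is illustrative only.
const CURRENT_VERSION = 2;

function migrate(cached) {
  // Punt on versioning for now: a payload lacking a version is version 1.
  const version = cached.version || 1;
  if (version < CURRENT_VERSION) {
    // Discard incompatible payloads instead of trying to upgrade them.
    return null;
  }
  return cached;
}

console.log(migrate({stories: []})); // null (implicit version 1)
console.log(migrate({version: 2, stories: []})); // current payload is kept
```

Dropping stale payloads rather than upgrading keeps the cache code trivial, at the cost of one extra network fetch after a format change.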
@rlr @Mardak @k88hudson another edge case here is that when a story is dismissed the cache needs to be invalidated, otherwise a dismissed story could show up again after a restart. My biggest concern is about stale content though. Maybe we should talk about this before landing?
A dismissed story is blocked, so wouldn't it just need to filter/transform the items before dispatching?
Yes, we could filter after reading from cache. Move the filter logic out of transform to make it reusable.
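That shared filter might be sketched like this (a minimal sketch, assuming a `blockedUrls` set stands in for the real block list; the names are hypothetical):

```javascript
// Hypothetical sketch: filtering blocked (dismissed) stories after a cache
// read. A Set stands in for the real Places/NewTabUtils block list.
const blockedUrls = new Set(["https://example.com/dismissed"]);

function filterBlocked(stories) {
  // Pulled out of transform so both the network path and the cache path
  // can reuse the same dismissal filtering.
  return stories.filter(story => !blockedUrls.has(story.url));
}

const cached = [
  {url: "https://example.com/dismissed", title: "Old story"},
  {url: "https://example.com/fresh", title: "Fresh story"}
];

console.log(filterBlocked(cached).length); // 1
```

Filtering at read time means the cache file itself never needs to be invalidated on dismissal; stale entries are simply dropped before dispatch.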
Compare: fde92c4 to 77f3264
@Mardak That last commit ^ changes the "cache" to be a file. I still need to work on tests, but maybe you can run the little perf test on it to see how it compares? Or I'm happy to do that if you show me how 😄
I use ScreenFlow to record my screen and just note the frame count (30fps by default) from various things appearing. Here's the number of frames as in the earlier comment:
I'm on a different network from earlier, but the initial few runs seem to show reading separate files as slower than prefs… I'll try measuring again later tonight.
Well now hrmm.. median frames to stories/topics:
Testing with actual pref caching (see previous comment edit) and slow network:
So yes, for those who have slow network, caching definitely helps whether as a file or as a pref. I suppose one optimization is to store stories and topics together in a single file.
system-addon/lib/TopStoriesFeed.jsm
Outdated
```js
async loadFromFile(filename) {
```
Now that this is getting more involved, I think it makes sense to move to a separate Cache module/component? This would make `loadFromFile/saveToFile/loadFromPref/saveToPref` reusable and separate the story feed from the caching concerns. Wdyt? We can also file a follow-up and refactor later...
I think refactoring later might make more sense when we have multiple uses of the cache (although I suppose technically stories + topics = 2 uses of cache…) as that would help provide details of caching requirements, e.g., each caller wants individual `lastUpdated` times, similar `save` frequencies if saving "all cache data" to a single file, etc. Unless we know now what other things might desire caching. I suppose `LinksCache` could want a local backing of data across restarts vs relying on places. But, in terms of timing of refactoring, I think we would want to think a bit more about `LinksCache` and other possible consumers without blocking this fix.
I wonder if dynamic prerendering in #3367 would want to use this disk cache. I think its load behavior is different enough to do something special there, e.g., dynamic prerendering would want the data before messages are sent to content. @k88hudson ?
^ That last commit makes it one file. That makes it more complicated because you have to read before you write, unless we keep around copies of the latest stories and topics? As it is now, there is also a race condition. It ends up calling …
The concurrent saves should be relatively simple to fix either:
Ha ha ha.. here's the "simple" version:

```js
slow = () => new Promise(resolve => setTimeout(resolve, 1000));
mutex = null;
file = "file: ";

save = async v => {
  console.log("saving", v, file);
  while (mutex) {
    await mutex;
  }
  console.log("grabbing mutex", v);
  mutex = new Promise(async resolve => {
    console.log("grabbed mutex", v);
    let data = file;
    await slow();
    data += v;
    file = data;
    console.log("saved", v, file);
    mutex = null;
    console.log("released mutex", v);
    resolve();
  });
};

save(1); save(2); save(3);
```

Should print:
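For what it's worth, a mutex loop like the one above can still interleave: two waiters can both observe the cleared mutex and proceed together. A simpler sketch (not from this patch; `slow` and the variable names are illustrative) serializes saves by chaining each one onto a single promise queue:

```javascript
// Sketch: serializing saves by chaining onto one promise queue, so
// ordering falls out of the chain instead of a mutex-and-poll loop.
const slow = () => new Promise(resolve => setTimeout(resolve, 10));

let file = "file: ";
let queue = Promise.resolve();

const save = v => {
  // Each save runs strictly after the previous one finishes, even if
  // callers fire them synchronously back to back.
  queue = queue.then(async () => {
    let data = file;
    await slow();
    data += v;
    file = data;
  });
  return queue;
};

save(1); save(2);
save(3).then(() => console.log(file)); // "file: 123"
```

Because every `save` extends the same chain, the read-modify-write of `file` can never interleave, and callers can still `await` the returned promise to know their write landed.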
Compare: 1471492 to 9fb20d6
on my machine, at home, most of the time, the network results are loaded before the disk cache 😬 I handle that properly though (I think). The kind of cool thing though is turning off wifi, starting the browser and having top stories instead of empty boxes.
Compare: 9fb20d6 to 028ee96
@Mardak I removed the mutex/locking because I don't think it's necessary anymore now that we keep an in memory copy and aren't reading before writing. Coverage check isn't happy because the functions below aren't getting executed. Any ideas how to fix or skip that check?
Any other thoughts? r?
You can probably get line coverage by updating …
Not needing the mutex sounds right as the save operation doesn't really allow for concurrent mixing. Probably rename the jsm to be the same thing exported. So I guess …
Compare: c8f99b8 to ec746dc
ok I think this is good (for now) for r? I'm going to see if I can get something similar working with IndexedDB
Sorry, I guess this is basically rewriting `PersistentCache`, but the two `gInMemoryCache` and `gFilesLoaded` don't seem quite right with the discrepancy between what's in memory vs disk. And there doesn't seem to be a need to actually have a shared global state? I'm currently thinking of something like…
```js
class {
  constructor(name, {preload}) {
    …;
    if (preload) {
      this._load();
    }
  }

  _load() {
    return this._cache || (this._cache = new Promise(async resolve => {
      let data = {};
      …; // the load from file stuff
      resolve(data);
    }));
  }

  async get(key) {
    const data = await this._load();
    return … ? … : data[key];
  }

  async set(key, value) {
    const data = await this._load();
    data[key] = value;
    this._persist(data);
  }
}
```
Where `_load` returns the same `Promise` that resolves to whatever object it initialized or read from disk on its first invocation. I think it'll be quite a bit cleaner this way, but feel free to push back ;)
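A runnable, plain-JS version of that memoized-promise pattern (the file read is stubbed out here; in the add-on it would be the async disk load):

```javascript
// Minimal sketch of the memoized-promise cache pattern sketched above.
// The disk read is stubbed with Promise.resolve({}) for illustration.
class PersistentCache {
  constructor(name, {preload = false} = {}) {
    this.name = name;
    this._cache = null;
    if (preload) {
      this._load();
    }
  }

  _load() {
    // First call creates the promise; later calls reuse the same one,
    // so every caller awaits (and mutates) the same underlying object.
    return this._cache || (this._cache = Promise.resolve({}));
  }

  async get(key) {
    const data = await this._load();
    return key ? data[key] : data;
  }

  async set(key, value) {
    const data = await this._load();
    data[key] = value;
    // A real implementation would persist `data` to disk here.
  }
}

(async () => {
  const cache = new PersistentCache("topStories");
  await cache.set("stories", [1, 2]);
  console.log(await cache.get("stories")); // [ 1, 2 ]
})();
```

Because `set` mutates the object the single promise resolved to, any `get`, before or after a pending load, sees a consistent view without extra bookkeeping.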
system-addon/lib/PersistentCache.jsm
Outdated
```js
 * @param {string} filename Name of the file to use to persist the cache to disk.
 */
constructor(filename) {
  this.filename = filename;
```
Probably better to just expose the API as some name/string identifier that we happen to convert to `${name}.json` while we're writing to disk. Hopefully the name will be reusable across other persistent storage mechanisms.
system-addon/lib/PersistentCache.jsm
Outdated
```js
XPCOMUtils.defineLazyGetter(this, "gFilesLoaded", () => []);

/**
 * A disk based persistent cache.
```
nit: probably note that it's a cache of a javascript object (JSON-able).
system-addon/lib/PersistentCache.jsm
Outdated
```js
 * @param {string} key The cache key.
 * @param {object} value The data to be cached.
 */
set(key, value) {
```
Mmmm... I think it would be good API cleanliness to expose this as `async` even though it's not actually async right now.
system-addon/lib/PersistentCache.jsm
Outdated
```js
/**
 * Get a value from the cache.
 *
 * @param {string} key The cache key.
```
nit: Note that it's optional and the expected behavior
system-addon/lib/PersistentCache.jsm
Outdated
```js
if (key) {
  return gInMemoryCache.get(this.filename)[key];
}
return gInMemoryCache.get(this.filename);
```
nit:

```js
const data = gInMemoryCache.get(this.filename);
return key ? data[key] : data;
```
system-addon/lib/PersistentCache.jsm
Outdated
```js
 */
async loadFromFile() {
  let data = {};
  // let timestamp = 0;
```
probably just remove the line
system-addon/lib/PersistentCache.jsm
Outdated
```js
// Map of filenames to the latest data in them.
XPCOMUtils.defineLazyGetter(this, "gInMemoryCache", () => new Map());
// A list of cache files that has already been loaded.
XPCOMUtils.defineLazyGetter(this, "gFilesLoaded", () => []);
```
Hmmmm......... why do we want to allow for caches that aren't loaded? It seems that these two could be combined into one to avoid some odd condition, e.g., setting then getting resulting in a mismatch between what's on disk and in memory. Maybe just have a `const gCache = new Map()` that maps from the name to a promise of an object.
I was trying to handle `set` being called before `load` was done, but I see how you handled that nicely in your proposal. I like it. You're right, there currently isn't a need for a shared global. I was thinking about loading the cache from `ActivityStream.jsm`, which would possibly require it or having a shared cache instance, but I'm not sure that would be needed anyway. I'll work through these changes, I think the end result will be nicer 👍
system-addon/lib/PersistentCache.jsm
Outdated
```js
 */
constructor(filename) {
  this.filename = filename;
  gInMemoryCache.set(this.filename, {});
```
This seems to duplicate an initial state that could possibly differ from that from `loadFromFile`.
system-addon/lib/TopStoriesFeed.jsm
Outdated
```js
this.TopStoriesFeed = class TopStoriesFeed {
  constructor() {
    this.spocsPerNewTabs = 0;
    this.newTabsSinceSpoc = 0;
    this.contentUpdateQueue = [];
    this.pocketCache = new PersistentCache(POCKET_DATA_FILE);
```
Looks like nothing else in the file explicitly referred to "pocket" before… I suppose it's supposed to be generic? @csadilek Maybe just `this.cache = new PersistentCache("topStories");`?
Compare: ec746dc to 6873190
@Mardak alrighty. ^ that came out nice I think.
A few questions. In particular removing the `Promise` from `_persist`. Otherwise should be good!
```js
 * @param {boolean} preload (optional). Whether the cache should be preloaded from file. Defaults to false.
 */
constructor(name, preload = false) {
  this.name = name;
```
Let's just compute the file name once:

```js
this._filename = `${name}.json`;
```
system-addon/lib/PersistentCache.jsm
Outdated
```js
 */
_persist(data) {
  const filepath = OS.Path.join(OS.Constants.Path.localProfileDir, `${this.name}.json`);
  this._cache = new Promise(resolve => resolve(data));
```
Pretty sure we don't need to re-resolve a new promise. Any reason why this was added? (The shorter way to write this is `Promise.resolve(data)`.)
system-addon/lib/TopStoriesFeed.jsm
Outdated
```
@@ -28,6 +29,7 @@ this.TopStoriesFeed = class TopStoriesFeed {
  this.spocsPerNewTabs = 0;
  this.newTabsSinceSpoc = 0;
  this.contentUpdateQueue = [];
  this.cache = new PersistentCache(SECTION_ID);
```
Do we want to preload here? I'll try running some quick tests with / without.
Oh actually, I wonder if we should be putting our cache files into an activity stream directory… or at least prefix them with something, e.g., …
Frames (60fps) from first paint to placeholders, strings, stories. preload = false: preload = true: So median time to stories with …
Namespacing is a fantastic idea! So, the reason I created a new Promise was because I thought `_cache` had to be a Promise for `await _load()` to work properly. But maybe there is some magic there I don't know about.
You can indeed …
Ooooooh duh. Sorry, I see it now. `_load()` returns a reference to `_cache`, so it is already up to date. My head hurts :)
```js
async set(key, value) {
  const data = await this._load();
  data[key] = value;
  this._persist(data);
```
I guess I don't really need to pass `data` here.
Passing `data` is to avoid awaiting `_cache` in `_persist`, as we already got it a few lines back.
```js
 * Load the cache into memory if it isn't already.
 */
_load() {
  return this._cache || (this._cache = new Promise(async resolve => {
```
hmm. I think I am still confused as to how `this._cache` goes from being a `Promise` to being an object.
I guess it is always a `Promise`, but we are updating the underlying object it resolved?
`this._cache` is always a `Promise`, but when we `await` it, we get the same original resolved object each time.
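That identity property can be demonstrated in a few lines of plain JS (a standalone demo, not code from the patch):

```javascript
// Awaiting the same promise repeatedly yields the same resolved object,
// so a mutation made through one await is visible to every other awaiter.
const cache = Promise.resolve({});

(async () => {
  const a = await cache;
  a.key = "value";
  const b = await cache;
  console.log(a === b, b.key); // true value
})();
```

This is exactly why `set` can mutate the awaited object and later `get` calls see the update without re-resolving anything.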
This uses a file on disk as a persistent cache.
Fixes #3606