This repository has been archived by the owner on Feb 29, 2020. It is now read-only.

fix (performance): #3606 Add cache for pocket stories and topics #3654

Merged
merged 3 commits into mozilla:master from rlr:gh3606/pocket-cache
Oct 13, 2017

Conversation

rlr
Contributor

@rlr rlr commented Oct 5, 2017

This uses a file on disk as a persistent cache.

Fixes #3606

@Mardak
Member

Mardak commented Oct 5, 2017

Some initial testing; here's the number of frames (30fps) from first paint to 1) search box, 2) strings, 3) topics/stories (plus relative times for 2 and 3 from 1):

before: 10, 14, 18/18 ( +4,  +8/ +8)
before: 14, 21, 25/25 ( +7, +11/+11)
before: 13, 17, 20/23 ( +4,  +7/+10)
 after: 12, 17, 17/17 ( +5,  +5/ +5)
 after: 14, 19, 19/19 ( +5,  +5/ +5)
 after: 13, 23, 23/23 (+10, +10/+10)

If we just take the median times, it looks like:

before: 13, 17, 21/23 ( +4,  +8/+10)
 after: 13, 18, 18/18 ( +5,  +5/ +5)

So at least on my machine, with this fix, topics and stories show up at the same time as strings, with caching being ~100ms/167ms faster than network. Unclear whether the slightly slower strings with this caching are just noise or related.

Member

@Mardak Mardak left a comment


Overall things look good from a timing/performance perspective. We should fix up the Prefs usage and wait on @csadilek for final review of the behavior changes.

}

getCachedStories() {
return this.getCached(STORIES_CACHE_KEY);
Member


These getCached* wrappers don't seem really necessary. Did you have plans for this additional indirection?

Contributor Author


eh. not really. just how i organized it at first. but looks silly now.

getCached(key) {
let results = [];
try {
results = JSON.parse(new Prefs().get(key));
Member


We really should move away from this anti-pattern of new Prefs(); see #3431. You should be able to do this.store.getState().Prefs.values[key] from here.
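A minimal sketch of the suggested pattern (assuming the feed has a this.store reference wired up like other feeds, and that PrefsFeed keeps Prefs.values in sync — the helper name is illustrative):

```javascript
// Read a cached pref value from the Redux-style state instead of
// instantiating Prefs directly. Falls back to an empty list when the
// pref is missing or malformed.
function getCachedFromState(store, key) {
  let results = [];
  try {
    // Prefs.values mirrors the pref branch once PrefsFeed has INITed.
    results = JSON.parse(store.getState().Prefs.values[key]);
  } catch (e) {
    // Missing or unparsable pref: keep the empty fallback.
  }
  return results;
}
```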

Contributor Author


yeah, did this because I was afraid of a race condition with the PrefsFeed. Or is PrefsFeed guaranteed to be first?

Member


Nod. I noted in #3431 that PrefsFeed currently INITs after some feeds, but luckily for us here, TopStories comes after PrefsFeed. ;)

}

setCache(key, value) {
new Prefs().set(key, JSON.stringify(value));
Member


Similarly, probably dispatch SET_PREF I believe…
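The SET_PREF suggestion might look something like this sketch (the action shape here is illustrative, assuming PrefsFeed handles such an action; note the cached object has to be stringified before storing):

```javascript
// Hypothetical: persist the cache via a dispatched pref action instead
// of new Prefs(). PrefsFeed is assumed to handle SET_PREF and write the
// actual pref; the value must be a string, hence JSON.stringify.
function setCache(store, key, value) {
  store.dispatch({
    type: "SET_PREF",
    data: {name: key, value: JSON.stringify(value)}
  });
}
```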

loadCachedStories() {
this.stories = this.getCachedStories();
if (this.stories && this.stories.length > 0) {
this.dispatchUpdateEvent(this.storiesLastUpdated, {rows: this.stories});
Member


This would cause 2 dispatches early on: one from cache and one from network. I think that should be okay… @csadilek?

Collaborator


Yes, that would be OK as the second one isn't broadcasting, but we shouldn't assign to this.stories as this will affect the spoc experiment and we will get rid of this.stories soon. Let's just assign const stories = this.getCachedStories().

this.stories = this.getCachedStories();
if (this.stories && this.stories.length > 0) {
this.dispatchUpdateEvent(this.storiesLastUpdated, {rows: this.stories});
this.storiesLastUpdated = Date.now();
Member


Similarly, this date isn't entirely accurate, and we could store the date in the cache as well, but it's unclear if we need to be accurate here as we'll load from network unconditionally on INIT. Although, arguably, we could be smarter if we had an actual cached time, especially for Topics, as those refresh once every 3 hours, and it doesn't seem unlikely that a user would restart Firefox within 3 hours. @csadilek?

Contributor Author


Yeah, I was thinking I could just set it to 1. I just want to avoid the first fetch to broadcast if we already loaded from cache.

Collaborator


True, we don't want a second broadcast as that could potentially swap out the stories while a user is looking at them. This raises a question though: we could potentially show really old stories, from the last time Firefox fetched successfully, but I think that's probably OK?

@Mardak Mardak requested a review from csadilek October 5, 2017 13:51
@Mardak Mardak assigned rlr and csadilek and unassigned Mardak Oct 5, 2017
@Mardak
Member

Mardak commented Oct 5, 2017

Oh. One thing about using prefs is that there is a max size. I just checked the cache and it's ~13KB. @sarracini do you remember where things go bad? @csadilek any idea how big we should expect the response to get?

@Mardak
Member

Mardak commented Oct 5, 2017

Actually, on second thought.. 13KB string pref might be bad for other Firefox performance that uses prefs. On a new profile with this caching, the pref file is 22KB total, so we're more than half here…

@k88hudson any suggestions on some lightweight caching? I suppose the usual thing is write JSON to a file in the profile / cache directory…?

@Mardak
Member

Mardak commented Oct 5, 2017

Tiles used this to write to the Local/Cache profile directory:
https://searchfox.org/mozilla-central/source/browser/modules/DirectoryLinksProvider.jsm#330-338

Could even read out the file timestamp for lastUpdated.

@csadilek
Collaborator

csadilek commented Oct 5, 2017

@Mardak response size will be 15KB and more (without images). We're currently fetching 20 stories, but that could likely increase over time, e.g. if personalization is successful and we want a bigger client-side selection.

Collaborator

@csadilek csadilek left a comment


Thanks, looks good! Had two comments inline about not using this.stories, and cache expiry (we should verify with design if showing old stories is OK).

@k88hudson
Contributor

Your options are probably either write to IndexedDB or write to a JSON file in the profile directory. I think the JSON file approach will probably be easier, and it's not like a lot of frequent reads/writes will be happening anyway 👍

@k88hudson
Contributor

k88hudson commented Oct 5, 2017

Also, you probably want to store a fixed version number (just like 1 is fine) or something that would tell you to not use the cached data if you land breaking changes to the data format.

We should also have some kind of max age or expiration timing on start-up, to prevent very old stories from being shown.
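A sketch of what such a validity check might look like (CACHE_VERSION and MAX_AGE_MS are hypothetical names, not from the patch; the 3-hour figure borrows the topics refresh interval mentioned earlier in this thread):

```javascript
// Hypothetical check before using cached data: reject entries written
// by an older data-format version or older than some maximum age.
const CACHE_VERSION = 1;
const MAX_AGE_MS = 3 * 60 * 60 * 1000; // e.g. the topics refresh interval

function isCacheUsable(cached, now = Date.now()) {
  return Boolean(cached) &&
    cached.version === CACHE_VERSION &&
    now - cached.lastUpdated < MAX_AGE_MS;
}
```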

@Mardak
Member

Mardak commented Oct 5, 2017

Alternatively, punt on versioning and other metadata and when we have a new version, if it lacks a version, it's version 1! ;)

@csadilek
Collaborator

csadilek commented Oct 6, 2017

@rlr @Mardak @k88hudson another edge case here is that when a story is dismissed the cache needs to be invalidated, otherwise a dismissed story could show up again after a restart. My biggest concern is about stale content though. Maybe we should talk about this before landing?

@Mardak
Member

Mardak commented Oct 6, 2017

A dismissed story is blocked, so wouldn't it just need to filter/transform the items before dispatching?

@csadilek
Collaborator

csadilek commented Oct 6, 2017

Yes, we could filter after reading from cache. Move the filter logic out of transform to make it reusable.
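Pulled out of transform(), the reusable filter might be as simple as this sketch (isBlocked is a stand-in for however dismissals are actually tracked, not an API from the patch):

```javascript
// Filter dismissed/blocked stories out of a list, so the same logic can
// run both on network responses and on reads from the persistent cache.
function filterBlocked(stories, isBlocked) {
  return stories.filter(story => !isBlocked(story.url));
}
```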

@rlr
Contributor Author

rlr commented Oct 6, 2017

@Mardak That last commit ^ changes the "cache" to be a file. I still need to work on tests but maybe you can run the little perf test on it to see how it compares? Or I'm happy to do that if you show me how 😄

@Mardak
Member

Mardak commented Oct 7, 2017

I use ScreenFlow to record my screen and just note the frame count (30fps by default) from various things appearing. Here's the number of frames as from earlier comment:

before: 11, 16, 23/23 (+5, +12/+12)
before: 10, 16, 23/23 (+6, +13/+13)
before: 11, 18, 23/23 (+7, +12/+12)
after:  13, 19, 24/24 (+6, +11/+11)
after:  14, 22, 23/23 (+8,  +9/ +9)
after:  14, 19, 25/25 (+5, +11/+11)

I'm on a different network from earlier, but the initial few runs seem to show reading separate files is slower than reading from prefs… I'll try measuring again later tonight.

@Mardak
Member

Mardak commented Oct 7, 2017

orig: 11, 16, 19/19
orig: 10, 16, 20/20
orig: 12, 19, 22/22
orig: 10, 15, 19/19
orig: 11, 17, 20/20
file: 12, 17, 21/21
file: 11, 16, 20/21
file: 10, 16, 19/19
file: 11, 17, 21/21
file: 12, 19, 21/21
pref: 11, 18, 22/22
pref: 14, 21, 24/21
pref: 13, 18, 22/22
pref: 12, 17, 21/20
pref: 12, 17, 22/22

Well now hrmm.. median frames to stories/topics:
orig: 20/20
file: 21/21
pref: 22/22
… ?
Edit: Nevermind the pref ones. I checked out the getState/SetPref commit but it wasn't actually setting the pref because the value needs to be stringified before writing.

@Mardak
Member

Mardak commented Oct 7, 2017

Testing with actual pref caching (see previous comment edit) and slow network:

file: 12, 19, 21/22
file: 13, 17, 21/22
file: 12, 20, 21/21
pref: 13, 22, 22/22
pref: 10, 19, 19/19
pref: 11, 19, 19/19
orig: 12, 16, 57/23 (topics showed up over a second earlier than stories)
orig: 14, 20, 50/51
orig: 14, 22, 59/59

So yes, for those who have slow network, caching definitely helps whether as a file or as pref. I suppose one optimization is to store stories and topics together in a single file.

}
}

async loadFromFile(filename) {
Collaborator


Now that this is getting more involved, I think it makes sense to move to a separate Cache module/component? This would make loadFromFile/saveToFile/loadFromPref/saveToPref reusable and separate the story feed from the caching concerns.

Wdyt? We can also file a follow-up and refactor later...

Member


I think refactoring later might make more sense when we have multiple uses of the cache (although I suppose technically stories + topics = 2 uses of cache…) as that would help provide details of caching requirements, e.g., each caller wants individual lastUpdated times, similar save frequencies if saving "all cache data" to a single file, etc.

Unless we know now what other things might desire caching. I suppose LinksCache could want a local backing of data across restarts vs relying on places. But, in terms of timing of refactoring, I think we would want to think a bit more about LinksCache and other possible consumers without blocking this fix.

Member


I wonder if dynamic prerendering in #3367 would want to use this disk cache. I think its load behavior is different enough to do something special there, e.g., dynamic prerendering would want the data before messages are sent to content. @k88hudson ?

@rlr
Contributor Author

rlr commented Oct 9, 2017

^ That last commit makes it one file. That makes it more complicated because you have to read before you write, unless we keep around copies of the latest stories and topics. As it is now, there is also a race condition: it ends up calling saveToFile({topics}) in parallel with saveToFile({stories}), and one of the saves can basically get lost. So... if it isn't much faster then I don't think it's worth it. But if it is, we can fix the bugs.

@Mardak
Member

Mardak commented Oct 9, 2017

The concurrent saves should be relatively simple to fix, either:

  • keep an in-memory cache (seems a little bit dangerous if someone attempts to directly touch those values)
  • await on any pending saves to serialize saving (although we need to be careful of multiple pending saves thinking they're safe to go when the 1st resolves, e.g., with 3 concurrent saves, the 1st finishes and then the 2nd and 3rd end up concurrent instead of serial)

@Mardak
Member

Mardak commented Oct 9, 2017

Ha ha ha.. here's the "simple" await mutex:

slow = () => new Promise(resolve => setTimeout(resolve, 1000));
mutex = null;
file = "file: ";
save = async v => {
  console.log("saving", v, file);
  while (mutex) {
    await mutex;
  }
  console.log("grabbing mutex", v);
  mutex = new Promise(async resolve => {
    console.log("grabbed mutex", v);
    let data = file;
    await slow();
    data += v;
    file = data;
    console.log("saved", v, file);
    mutex = null;
    console.log("released mutex", v);
    resolve();
  });
};
save(1); save(2); save(3);

Should print:

saving 1 file:
grabbing mutex 1
grabbed mutex 1
saving 2 file:
saving 3 file:
saved 1 file: 1
released mutex 1
grabbing mutex 2
grabbed mutex 2
saved 2 file: 12
released mutex 2
grabbing mutex 3
grabbed mutex 3
saved 3 file: 123
released mutex 3

@rlr
Contributor Author

rlr commented Oct 11, 2017

on my machine, at home, most of the time, the network results are loaded before the disk cache 😬 I handle that properly though (I think).

The kind of cool thing though is turning off wifi, starting the browser and having top stories instead of empty boxes.

@rlr
Contributor Author

rlr commented Oct 12, 2017

@Mardak I removed the mutex/locking because I don't think it's necessary anymore now that we keep an in memory copy and aren't reading before writing.

Coverage check isn't happy because the functions below aren't getting executed. Any ideas how to fix or skip that check?

XPCOMUtils.defineLazyGetter(this, "gTextDecoder", () => new TextDecoder());
XPCOMUtils.defineLazyGetter(this, "gInMemoryCache", () => new Map());
XPCOMUtils.defineLazyGetter(this, "gFilesLoaded", () => []);

Any other thoughts? r?

@Mardak
Member

Mardak commented Oct 12, 2017

You can probably get line coverage by updating unit-entry.js to just call the lazy part of the defineLazyGetter. Although I avoided this for FilterAdult.jsm in #3422 by making them not lazy. The thinking there is if the module itself is lazily loaded until we actually need to start using it, additionally making items within the module lazy is overhead.

@Mardak
Member

Mardak commented Oct 12, 2017

Not needing the mutex sounds right as the save operation doesn't really allow for concurrent mixing.

Probably rename the jsm to be the same thing exported. So I guess PersistentCache.jsm

@rlr
Contributor Author

rlr commented Oct 12, 2017

ok I think this is good (for now) for r?

I'm going to see if I can get something similar working with IndexedDB

Member

@Mardak Mardak left a comment


Sorry, I guess this is basically rewriting PersistentCache, but the gInMemoryCache and gFilesLoaded pair doesn't seem quite right given the discrepancy between what's in memory vs on disk. And there doesn't seem to be a need to actually have shared global state?

I'm currently thinking of something like…

class {
  constructor(name, {preload}) {
    ;
    if (preload) {
      this._load();
    }
  }
  _load() {
    return this._cache || (this._cache = new Promise(async resolve => {
      let data = {};
      ; // the load from file stuff
      resolve(data);
    }));
  }
  async get(key) {
    const data = await this._load();
    return key ? data[key] : data;
  }
  async set(key, value) {
    const data = await this._load();
    data[key] = value;
    this._persist(data);
  }
}

Where _load returns the same Promise that resolves to whatever object it initialized or read from disk from its first invocation.

I think it'll be quite a bit cleaner this way, but feel free to push back ;)
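For reference, a runnable approximation of the sketch above, with the OS.File read/write stubbed out by an in-memory Map so the promise pattern itself can be exercised (fakeDisk and the method bodies are assumptions for illustration, not the module as landed):

```javascript
// Approximation of the proposed PersistentCache: _load creates its
// promise once, and every get/set awaits the same underlying object.
const fakeDisk = new Map(); // stands in for the on-disk ${name}.json files

class PersistentCache {
  constructor(name, preload = false) {
    this._filename = `${name}.json`;
    if (preload) {
      this._load();
    }
  }
  _load() {
    // Created on the first call and reused after that.
    return this._cache || (this._cache = Promise.resolve(
      fakeDisk.has(this._filename) ?
        JSON.parse(fakeDisk.get(this._filename)) : {}));
  }
  async get(key) {
    const data = await this._load();
    return key ? data[key] : data;
  }
  async set(key, value) {
    const data = await this._load();
    data[key] = value;
    this._persist(data);
  }
  _persist(data) {
    // The real module would write ${name}.json to the profile directory.
    fakeDisk.set(this._filename, JSON.stringify(data));
  }
}
```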

* @param {string} filename Name of the file to use to persist the cache to disk.
*/
constructor(filename) {
this.filename = filename;
Member


Probably better to just expose the API as some name/string identifier that we happen to convert to ${name}.json while we're writing to disk. Hopefully the name will be reusable across other persistent storage mechanisms.

XPCOMUtils.defineLazyGetter(this, "gFilesLoaded", () => []);

/**
* A disk based persistent cache.
Member


nit: probably note that it's a cache of a javascript object (JSON-able).

* @param {string} key The cache key.
* @param {object} value The data to be cached.
*/
set(key, value) {
Member


Mmmm... I think it would be good API cleanliness to expose this as async even though it's not actually async right now.

/**
* Get a value from the cache.
*
* @param {string} key The cache key.
Member


nit: Note that it's optional and what the expected behavior is

if (key) {
return gInMemoryCache.get(this.filename)[key];
}
return gInMemoryCache.get(this.filename);
Member


nit:

const data = gInMemoryCache.get(this.filename);
return key ? data[key] : data;

*/
async loadFromFile() {
let data = {};
// let timestamp = 0;
Member


probably just remove the line

// Map of filenames to the latest data in them.
XPCOMUtils.defineLazyGetter(this, "gInMemoryCache", () => new Map());
// A list of cache files that have already been loaded.
XPCOMUtils.defineLazyGetter(this, "gFilesLoaded", () => []);
Member


Hmmmm......... why do we want to allow for caches that aren't loaded? It seems that these two could be combined into one to avoid some odd conditions, e.g., setting then getting resulting in a mismatch between what's on disk and in memory.

Maybe just have a const gCache = new Map() that maps from the name to a promise of an object.

Contributor Author

@rlr rlr Oct 13, 2017


I was trying to handle set being called before load was done, but I see how you handled that nicely in your proposal. I like it. You're right, there currently isn't a need for a shared global. I was thinking about loading the cache from ActivityStream.jsm which would possibly require it or having a shared cache instance, but I'm not sure that would be needed anyway.

I'll work through these changes, I think the end result will be nicer 👍

*/
constructor(filename) {
this.filename = filename;
gInMemoryCache.set(this.filename, {});
Member


This seems to duplicate an initial state that could possibly differ from that from loadFromFile.

this.TopStoriesFeed = class TopStoriesFeed {
constructor() {
this.spocsPerNewTabs = 0;
this.newTabsSinceSpoc = 0;
this.contentUpdateQueue = [];
this.pocketCache = new PersistentCache(POCKET_DATA_FILE);
Member


Looks like nothing else in the file explicitly referred to "pocket" before… I suppose it's supposed to be generic? @csadilek Maybe just this.cache = new PersistentCache("topStories"); ?

@rlr
Contributor Author

rlr commented Oct 13, 2017

@Mardak alrighty. ^ that came out nice I think.

Member

@Mardak Mardak left a comment


A few questions. In particular removing the Promise from _persist. Otherwise should be good!

* @param {boolean} preload (optional). Whether the cache should be preloaded from file. Defaults to false.
*/
constructor(name, preload = false) {
this.name = name;
Member


Let's just compute the file name once:
this._filename = `${name}.json`;

*/
_persist(data) {
const filepath = OS.Path.join(OS.Constants.Path.localProfileDir, `${this.name}.json`);
this._cache = new Promise(resolve => resolve(data));
Member


Pretty sure we don't need to re-resolve a new promise. Any reason why this was added? (The shorter way to write this is Promise.resolve(data))

@@ -28,6 +29,7 @@ this.TopStoriesFeed = class TopStoriesFeed {
this.spocsPerNewTabs = 0;
this.newTabsSinceSpoc = 0;
this.contentUpdateQueue = [];
this.cache = new PersistentCache(SECTION_ID);
Member


Do we want to preload here? I'll try running some quick tests with / without.

@Mardak
Member

Mardak commented Oct 13, 2017

Oh actually, I wonder if we should be putting our cache files into an activity stream directory.. or at least prefix them with something, e.g., activity-stream.${name}.json ?

@Mardak
Member

Mardak commented Oct 13, 2017

Frames (60fps) from first paint to placeholders, strings, stories

preload = false:
26, 33, 46
21, 28, 36
21, 29, 40
21, 29, 42
25, 35, 41

preload = true:
23, 29, 39
31, 41, 41
23, 35, 40
20, 27, 37
17, 23, 29

So median time to stories with false is 41 frames vs true is 39 frames. So yes preload?

@rlr
Contributor Author

rlr commented Oct 13, 2017 via email

@Mardak
Member

Mardak commented Oct 13, 2017

You can indeed await a non-Promise value. But I actually meant you don't need to re-assign to this._cache. That value only needs to be assigned once on _load.

@rlr
Contributor Author

rlr commented Oct 13, 2017 via email

async set(key, value) {
const data = await this._load();
data[key] = value;
this._persist(data);
Contributor Author


I guess I don't really need to pass data here.

Member


The passing data is to avoid awaiting for _cache in _persist as we already got it a few lines back.

* Load the cache into memory if it isn't already.
*/
_load() {
return this._cache || (this._cache = new Promise(async resolve => {
Contributor Author


hmm. I think I am still confused as to how this._cache goes from being a Promise to being an object.

Contributor Author


I guess it is always a Promise, but we are updating the underlying object it resolved with?

Member


this._cache is always a Promise but when we await it, we get the same original resolved object each time.
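A tiny standalone illustration of that point (names here are illustrative): the promise is created once, and every await hands back the same mutable object, so a write through one await is visible through the next.

```javascript
// The cache promise resolves once to a single object; awaiting it
// repeatedly always yields that same object, which is what makes the
// mutate-then-persist pattern in set() safe.
const cachePromise = Promise.resolve({});

async function setValue(key, value) {
  const data = await cachePromise;
  data[key] = value;
}

async function getValue(key) {
  const data = await cachePromise;
  return data[key];
}
```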

@Mardak Mardak dismissed csadilek’s stale review October 13, 2017 21:13

review comments have been addressed

@Mardak Mardak merged commit 1527d05 into mozilla:master Oct 13, 2017
@rlr rlr deleted the gh3606/pocket-cache branch October 13, 2017 21:19