Redesign cache storage
In uBO, the "cache storage" is used to save resources which can
be safely discarded, though at the cost of having to fetch or
recompute them again.

Extension storage (browser.storage.local) is now always used as
cache storage backend. This has always been the default for
Chromium-based browsers.

For Firefox-based browsers, IndexedDB was used as backend for
cache storage, with fallback to extension storage when using
Firefox in private mode by default.

Extension storage is reliable since it works in all contexts,
though it may not be the most performant one.

To speed up loading of resources from extension storage, uBO will
now make use of Cache API storage, which will mirror content of
key assets saved to extension storage. Typically loading resources
from Cache API is faster than loading the same resources from
the extension storage.

Only resources which must be loaded in memory as fast as possible
will make use of the Cache API storage layered on top of the
extension storage.
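
A minimal sketch of the layered lookup described above. It assumes two
in-memory stand-ins (`fastLayer` for the Cache API mirror, `baseLayer`
for extension storage) and a hypothetical key pattern `reMirrored`;
none of these names come from uBO's code.

```javascript
// Hypothetical stand-ins: in uBO the fast layer would be Cache API storage
// and the base layer browser.storage.local. Plain Maps keep this runnable.
const fastLayer = new Map();
const baseLayer = new Map();

// Assumption: only these key families are mirrored to the fast layer.
const reMirrored = /^(?:compiled|selfie)\//;

async function layeredSet(key, value) {
    baseLayer.set(key, value);          // authoritative copy
    if ( reMirrored.test(key) ) {
        fastLayer.set(key, value);      // mirror for faster reads
    }
}

async function layeredGet(key) {
    if ( fastLayer.has(key) ) { return fastLayer.get(key); }
    const value = baseLayer.get(key);
    // Repopulate the mirror if it was discarded.
    if ( value !== undefined && reMirrored.test(key) ) {
        fastLayer.set(key, value);
    }
    return value;
}
```

Since the base layer always holds every entry, the fast layer can be
dropped at any time without losing data; it only costs a slower read.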

Compiled filter lists and memory snapshots of filtering engines
(aka "selfies") will be mirrored to the Cache API storage, since
these must be loaded into memory as fast as possible, and reloading
filter lists from their compiled counterparts is a common
operation.

This new design makes working in permanent private mode seamless
for Firefox-based browsers, since extension storage now always
contains cache-related assets.

Support for IndexedDB is removed for the time being, except to
support migration of cached assets the first time uBO runs with
the new cache storage design.

In order to easily support all choices of storage, a new serializer
has been introduced, which is capable of serializing/deserializing
structure-cloneable data to/from a JS string.

Because of this new serializer, JS data structures can be stored
directly from their native representation, and deserialized
directly to their native representation from uBO's point of view,
since the serialization occurs (if needed) only at the storage
interface level.

This new serializer simplifies many code paths where data
structures such as Set, Map, TypedArray, RegExp, etc. had to be
converted in a disparate manner to be able to persist them to
extension storage.
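
To illustrate the idea of round-tripping structure-cloneable data
through a JS string, here is a minimal sketch built on JSON with
tagged wrappers. This is not uBO's serializer or its output format;
`serialize`, `deserialize` and the `__type` tag are illustrative names,
and only a few types are covered.

```javascript
// Tagged-JSON sketch; NOT uBO's actual serializer format.
const TAG = '__type';

function serialize(data) {
    return JSON.stringify(data, (key, value) => {
        if ( value instanceof Map ) {
            return { [TAG]: 'Map', v: Array.from(value) };
        }
        if ( value instanceof Set ) {
            return { [TAG]: 'Set', v: Array.from(value) };
        }
        if ( value instanceof RegExp ) {
            return { [TAG]: 'RegExp', v: [ value.source, value.flags ] };
        }
        if ( value instanceof Uint8Array ) {
            return { [TAG]: 'Uint8Array', v: Array.from(value) };
        }
        return value;
    });
}

function deserialize(text) {
    // The reviver runs bottom-up, so nested values are already restored.
    return JSON.parse(text, (key, value) => {
        if ( value instanceof Object === false ) { return value; }
        switch ( value[TAG] ) {
        case 'Map': return new Map(value.v);
        case 'Set': return new Set(value.v);
        case 'RegExp': return new RegExp(value.v[0], value.v[1]);
        case 'Uint8Array': return Uint8Array.from(value.v);
        }
        return value;
    });
}
```

With something along these lines at the storage boundary, callers can
hand over a `Map` or a typed array as-is and get the same type back.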

The new serializer supports workers and LZ4 compression. These
can be configured through advanced settings.

With this new layered design, it's possible to introduce more
storage layers if they are measured to be beneficial (e.g.
browser.storage.session).

References:
- https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/local
- https://developer.mozilla.org/en-US/docs/Web/API/Cache
- https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm
gorhill committed Feb 26, 2024
1 parent 2262a12 commit 086766a
Showing 19 changed files with 1,916 additions and 924 deletions.
23 changes: 11 additions & 12 deletions platform/common/vapi-background.js
@@ -1671,21 +1671,17 @@ vAPI.cloud = (( ) => {

     const push = async function(details) {
         const { datakey, data, encode } = details;
-        if (
-            data === undefined ||
-            typeof data === 'string' && data === ''
-        ) {
+        if ( data === undefined || typeof data === 'string' && data === '' ) {
             return deleteChunks(datakey, 0);
         }
         const item = {
             source: options.deviceName || options.defaultDeviceName,
             tstamp: Date.now(),
             data,
         };
-        const json = JSON.stringify(item);
         const encoded = encode instanceof Function
-            ? await encode(json)
-            : json;
+            ? await encode(item)
+            : JSON.stringify(item);

         // Chunkify taking into account QUOTA_BYTES_PER_ITEM:
         // https://developer.chrome.com/extensions/storage#property-sync
@@ -1750,13 +1746,16 @@ vAPI.cloud = (( ) => {
             i += 1;
         }
         encoded = encoded.join('');
-        const json = decode instanceof Function
-            ? await decode(encoded)
-            : encoded;
+
         let entry = null;
         try {
-            entry = JSON.parse(json);
-        } catch(ex) {
+            if ( decode instanceof Function ) {
+                entry = await decode(encoded) || null;
+            }
+            if ( typeof entry === 'string' ) {
+                entry = JSON.parse(entry);
+            }
+        } catch(_) {
         }
         return entry;
     };
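
The new decode path above accepts either an object or a JSON string
from the optional `decode` hook. The sketch below isolates that logic
so it can be run on its own; `decodeEntry` and the two hooks
(`decodeToObject`, `decodeToString`) are hypothetical names, not part
of the commit.

```javascript
// Same control flow as the new hunk, lifted into a standalone function.
async function decodeEntry(encoded, decode) {
    let entry = null;
    try {
        if ( decode instanceof Function ) {
            entry = await decode(encoded) || null;
        }
        if ( typeof entry === 'string' ) {
            entry = JSON.parse(entry);
        }
    } catch(_) {
        // A throwing hook or malformed JSON is swallowed;
        // `entry` keeps whatever value it had at that point.
    }
    return entry;
}

// Hypothetical hooks: one returns a parsed object, one returns a string.
const decodeToObject = async s => JSON.parse(s);
const decodeToString = async s => s;
```

Note that, as written in the hunk, a missing `decode` hook leaves the
entry as `null`, since the raw `encoded` string is never parsed.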
157 changes: 74 additions & 83 deletions src/js/assets.js
@@ -528,12 +528,12 @@ function getAssetSourceRegistry() {
         assetSourceRegistryPromise = cacheStorage.get(
             'assetSourceRegistry'
         ).then(bin => {
-            if (
-                bin instanceof Object &&
-                bin.assetSourceRegistry instanceof Object
-            ) {
-                assetSourceRegistry = bin.assetSourceRegistry;
-                return assetSourceRegistry;
+            if ( bin instanceof Object ) {
+                if ( bin.assetSourceRegistry instanceof Object ) {
+                    assetSourceRegistry = bin.assetSourceRegistry;
+                    ubolog('Loaded assetSourceRegistry');
+                    return assetSourceRegistry;
+                }
             }
             return assets.fetchText(
                 µb.assetsBootstrapLocation || µb.assetsJsonPath
@@ -543,6 +543,7 @@ function getAssetSourceRegistry() {
                     : assets.fetchText(µb.assetsJsonPath);
             }).then(details => {
                 updateAssetSourceRegistry(details.content, true);
+                ubolog('Loaded assetSourceRegistry');
                 return assetSourceRegistry;
             });
         });
@@ -673,39 +674,27 @@ let assetCacheRegistryPromise;
 let assetCacheRegistry = {};

 function getAssetCacheRegistry() {
-    if ( assetCacheRegistryPromise === undefined ) {
-        assetCacheRegistryPromise = cacheStorage.get(
-            'assetCacheRegistry'
-        ).then(bin => {
-            if (
-                bin instanceof Object &&
-                bin.assetCacheRegistry instanceof Object
-            ) {
-                if ( Object.keys(assetCacheRegistry).length === 0 ) {
-                    assetCacheRegistry = bin.assetCacheRegistry;
-                } else {
-                    console.error(
-                        'getAssetCacheRegistry(): assetCacheRegistry reassigned!'
-                    );
-                    if (
-                        Object.keys(bin.assetCacheRegistry).sort().join() !==
-                        Object.keys(assetCacheRegistry).sort().join()
-                    ) {
-                        console.error(
-                            'getAssetCacheRegistry(): assetCacheRegistry changes overwritten!'
-                        );
-                    }
-                }
-            }
-            return assetCacheRegistry;
-        });
+    if ( assetCacheRegistryPromise !== undefined ) {
+        return assetCacheRegistryPromise;
     }
-
+    assetCacheRegistryPromise = cacheStorage.get(
+        'assetCacheRegistry'
+    ).then(bin => {
+        if ( bin instanceof Object === false ) { return; }
+        if ( bin.assetCacheRegistry instanceof Object === false ) { return; }
+        if ( Object.keys(assetCacheRegistry).length !== 0 ) {
+            return console.error('getAssetCacheRegistry(): assetCacheRegistry reassigned!');
+        }
+        ubolog('Loaded assetCacheRegistry');
+        assetCacheRegistry = bin.assetCacheRegistry;
+    }).then(( ) =>
+        assetCacheRegistry
+    );
     return assetCacheRegistryPromise;
 }

 const saveAssetCacheRegistry = (( ) => {
-    const save = function() {
+    const save = ( ) => {
         timer.off();
         cacheStorage.set({ assetCacheRegistry });
     };
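
The rewritten `getAssetCacheRegistry()` relies on caching the promise
itself, so concurrent callers share a single storage read. A
self-contained sketch of that pattern follows; `fakeLoad`, `registry`
and `getRegistry` are stand-in names, with `fakeLoad` playing the role
of `cacheStorage.get()`.

```javascript
// Hypothetical loader standing in for cacheStorage.get(); counts calls.
let loadCount = 0;
const fakeLoad = async ( ) => {
    loadCount += 1;
    return { registry: { a: 1 } };
};

let registry = {};
let registryPromise;

function getRegistry() {
    // Early return: every caller after the first reuses the same promise.
    if ( registryPromise !== undefined ) {
        return registryPromise;
    }
    registryPromise = fakeLoad().then(bin => {
        if ( bin instanceof Object === false ) { return; }
        if ( bin.registry instanceof Object === false ) { return; }
        registry = bin.registry;
    }).then(( ) => registry);   // always resolve to the current registry
    return registryPromise;
}
```

The trailing `.then()` lets the validation step bail out with a bare
`return` while callers still receive a usable object.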
@@ -726,7 +715,9 @@ async function assetCacheRead(assetKey, updateReadTime = false) {
     const reportBack = function(content) {
         if ( content instanceof Blob ) { content = ''; }
         const details = { assetKey, content };
-        if ( content === '' ) { details.error = 'ENOTFOUND'; }
+        if ( content === '' || content === undefined ) {
+            details.error = 'ENOTFOUND';
+        }
         return details;
     };

@@ -742,17 +733,11 @@ async function assetCacheRead(assetKey, updateReadTime = false) {
         ) + ' ms';
     }

-    if (
-        bin instanceof Object === false ||
-        bin.hasOwnProperty(internalKey) === false
-    ) {
-        return reportBack('');
-    }
+    if ( bin instanceof Object === false ) { return reportBack(''); }
+    if ( bin.hasOwnProperty(internalKey) === false ) { return reportBack(''); }

     const entry = assetCacheRegistry[assetKey];
-    if ( entry === undefined ) {
-        return reportBack('');
-    }
+    if ( entry === undefined ) { return reportBack(''); }

     entry.readTime = Date.now();
     if ( updateReadTime ) {
@@ -762,34 +747,22 @@ async function assetCacheRead(assetKey, updateReadTime = false) {
     return reportBack(bin[internalKey]);
 }

-async function assetCacheWrite(assetKey, details) {
-    let content = '';
-    let options = {};
-    if ( typeof details === 'string' ) {
-        content = details;
-    } else if ( details instanceof Object ) {
-        content = details.content || '';
-        options = details;
-    }
-
-    if ( content === '' ) {
+async function assetCacheWrite(assetKey, content, options = {}) {
+    if ( content === '' || content === undefined ) {
         return assetCacheRemove(assetKey);
     }

-    const cacheDict = await getAssetCacheRegistry();
+    const { resourceTime, url } = options;

-    let entry = cacheDict[assetKey];
-    if ( entry === undefined ) {
-        entry = cacheDict[assetKey] = {};
-    }
-    entry.writeTime = entry.readTime = Date.now();
-    entry.resourceTime = options.resourceTime || 0;
-    if ( typeof options.url === 'string' ) {
-        entry.remoteURL = options.url;
-    }
-    cacheStorage.set({
-        assetCacheRegistry,
-        [`cache/${assetKey}`]: content
+    getAssetCacheRegistry().then(cacheDict => {
+        const entry = cacheDict[assetKey] || {};
+        cacheDict[assetKey] = entry;
+        entry.writeTime = entry.readTime = Date.now();
+        entry.resourceTime = resourceTime || 0;
+        if ( typeof url === 'string' ) {
+            entry.remoteURL = url;
+        }
+        cacheStorage.set({ assetCacheRegistry, [`cache/${assetKey}`]: content });
     });

     const result = { assetKey, content };
@@ -800,21 +773,31 @@ async function assetCacheWrite(assetKey, details) {
     return result;
 }

-async function assetCacheRemove(pattern) {
+async function assetCacheRemove(pattern, options = {}) {
     const cacheDict = await getAssetCacheRegistry();
     const removedEntries = [];
     const removedContent = [];
     for ( const assetKey in cacheDict ) {
-        if ( pattern instanceof RegExp && !pattern.test(assetKey) ) {
-            continue;
-        }
-        if ( typeof pattern === 'string' && assetKey !== pattern ) {
-            continue;
+        if ( pattern instanceof RegExp ) {
+            if ( pattern.test(assetKey) === false ) { continue; }
+        } else if ( typeof pattern === 'string' ) {
+            if ( assetKey !== pattern ) { continue; }
         }
         removedEntries.push(assetKey);
-        removedContent.push('cache/' + assetKey);
+        removedContent.push(`cache/${assetKey}`);
         delete cacheDict[assetKey];
     }
+    if ( options.janitor && pattern instanceof RegExp ) {
+        const re = new RegExp(
+            pattern.source.replace(/^\^/, 'cache\/'),
+            pattern.flags
+        );
+        const keys = await cacheStorage.keys(re);
+        for ( const key of keys ) {
+            removedContent.push(key);
+            ubolog(`Removing stray ${key}`);
+        }
+    }
     if ( removedContent.length !== 0 ) {
         await Promise.all([
             cacheStorage.remove(removedContent),
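
The `janitor` branch above rewrites an asset-key pattern so that it
matches storage keys, which carry a `cache/` prefix. A small runnable
sketch of that rewrite; `toStorageKeyPattern` and the sample pattern
are illustrative names, not part of the commit.

```javascript
// Replace a leading `^` anchor in the pattern source with the `cache/`
// prefix used for storage keys, keeping the original flags.
function toStorageKeyPattern(pattern) {
    return new RegExp(
        pattern.source.replace(/^\^/, 'cache/'),
        pattern.flags
    );
}

// Example asset-key pattern (an assumption, chosen for illustration).
const reAssets = /^compiled\//;
const reStorage = toStorageKeyPattern(reAssets);

console.log(reStorage.test('cache/compiled/easylist'));
console.log(reStorage.test('compiled/easylist'));
```

The rewritten pattern is no longer anchored at the start of the string,
and a pattern without a leading `^` is left unchanged, so the result is
only as strict as the input pattern and the key naming scheme allow.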
@@ -980,8 +963,7 @@ assets.get = async function(assetKey, options = {}) {
         }
         if ( details.content === '' ) { continue; }
         if ( reIsExternalPath.test(contentURL) && options.dontCache !== true ) {
-            assetCacheWrite(assetKey, {
-                content: details.content,
+            assetCacheWrite(assetKey, details.content, {
                 url: contentURL,
                 silent: options.silent === true,
             });
@@ -1057,8 +1039,7 @@ async function getRemote(assetKey, options = {}) {
         }

         // Success
-        assetCacheWrite(assetKey, {
-            content: result.content,
+        assetCacheWrite(assetKey, result.content, {
             url: contentURL,
             resourceTime: result.resourceTime || 0,
         });
@@ -1101,6 +1082,17 @@ assets.put = async function(assetKey, content) {

 /******************************************************************************/

+assets.toCache = async function(assetKey, content) {
+    return assetCacheWrite(assetKey, content);
+};
+
+assets.fromCache = async function(assetKey) {
+    const details = await assetCacheRead(assetKey);
+    return details && details.content;
+};
+
+/******************************************************************************/
+
 assets.metadata = async function() {
     await Promise.all([
         getAssetSourceRegistry(),
@@ -1147,8 +1139,8 @@ assets.metadata = async function() {

 assets.purge = assetCacheMarkAsDirty;

-assets.remove = function(pattern) {
-    return assetCacheRemove(pattern);
+assets.remove = function(...args) {
+    return assetCacheRemove(...args);
 };

 assets.rmrf = function() {
@@ -1300,8 +1292,7 @@ async function diffUpdater() {
                 'Diff-Path',
                 'Diff-Expires',
             ]);
-            assetCacheWrite(data.assetKey, {
-                content: data.text,
+            assetCacheWrite(data.assetKey, data.text, {
                 resourceTime: metadata.lastModified || 0,
             });
             metadata.diffUpdated = true;
3 changes: 2 additions & 1 deletion src/js/background.js
@@ -56,6 +56,7 @@ const hiddenSettingsDefault = {
     blockingProfiles: '11111/#F00 11010/#C0F 11001/#00F 00001',
     cacheStorageAPI: 'unset',
     cacheStorageCompression: true,
+    cacheStorageMultithread: 2,
     cacheControlForFirefox1376932: 'no-cache, no-store, must-revalidate',
     cloudStorageCompression: true,
     cnameIgnoreList: 'unset',
@@ -181,7 +182,7 @@ const µBlock = { // jshint ignore:line
     // Read-only
     systemSettings: {
         compiledMagic: 57, // Increase when compiled format changes
-        selfieMagic: 57, // Increase when selfie format changes
+        selfieMagic: 58, // Increase when selfie format changes
     },

     // https://github.com/uBlockOrigin/uBlock-issues/issues/759#issuecomment-546654501

8 comments on commit 086766a

@gwarser
Contributor

Setting cacheStorageMultithread to anything other than 0 doubles allReadyAfter. On my system, it goes from ~300 ms to ~600 ms.

@gorhill
Owner Author

What are your system specs? Is this with a selfie?

@gwarser
Contributor

@gwarser gwarser commented on 086766a Feb 27, 2024

Ryzen 5 3600, Manjaro KDE, with selfie, Nightly.

uBlock Origin: 1.56.1b1
Firefox: 125
filterset (summary):
 network: 139706
 cosmetic: 171843
 scriptlet: 28070
 html: 2320
listset (total-discarded, last-updated):
 added:
  https://raw.githubusercontent.com/DandelionSprout/adfilt/master/LegitimateURLShortener.txt: 2589-167, 2h.53m
  https://raw.githubusercontent.com/FiltersHeroes/PolishAnnoyanceFilters/master/PPB.txt: 23370-65, 2h.53m
  https://raw.githubusercontent.com/gwarser/filter-lists/master/lan-block-strict.txt: 54-0, 2h.53m
  https://raw.githubusercontent.com/quenhus/uBlock-Origin-dev-filter/main/dist/google_duckduckgo/global.txt: 4907-0, 2h.53m
  adguard-spyware-url: 1467-106, 2h.53m
  adguard-social: 22171-64, 2h.53m
  fanboy-thirdparty_social: 68-0, 2h.53m
  easylist-annoyances: 4402-138, 2h.53m
  easylist-chat: 183-0, 2h.53m
  fanboy-cookiemonster: 66061-16299, 2h.53m
  easylist-newsletters: 7028-23, 2h.53m
  easylist-notifications: 2950-4, 2h.53m
  [4 lists not shown]: [too many]
 default:
  user-filters: 108-0, never
  ublock-filters: 37639-153, 26m Δ
  ublock-badware: 8009-14, 26m Δ
  ublock-privacy: 742-0, 26m Δ
  ublock-unbreak: 2280-1, 26m Δ
  ublock-quick-fixes: 106-1, 26m Δ
  easylist: 81757-633, 26m Δ
  easyprivacy: 50285-938, 26m Δ
  urlhaus-1: 8770-0, 2h.53m
  plowe-0: 3783-1, 2h.53m
filterset (user): [array of 114 redacted]
switchRuleset:
 added: [array of 35 redacted]
hostRuleset:
 added: [array of 717 redacted]
urlRuleset:
 added: [array of 21 redacted]
userSettings:
 advancedUserEnabled: true
 cloudStorageEnabled: true
hiddenSettings:
 autoCommentFilterTemplate: {{date}} {{url}}
 filterAuthorMode: true
 popupPanelDisabledSections: 2
 popupPanelLockedSections: 32
 userResourcesLocation: [redacted]
supportStats:
 allReadyAfter: 599 ms (selfie)
 maxAssetCacheWait: 429 ms
 cacheBackend: browser.storage.local

The main drive is a Gen4 NVMe on a Gen3 interface.

@gorhill
Owner Author

How about reloading filter lists when there is no selfie? Is 0 also better than 2?

@gwarser
Contributor

Restarting uBO without selfie:
cacheStorageMultithread = 2

13:06:22.497 [uBO] All ready 763 ms after launch console.js:38:13
13:06:31.288 [uBO] All ready 786 ms after launch console.js:38:13
13:06:38.505 [uBO] All ready 765 ms after launch console.js:38:13

cacheStorageMultithread = 0

13:07:00.674 [uBO] All ready 730 ms after launch console.js:38:13
13:07:04.799 [uBO] All ready 717 ms after launch console.js:38:13
13:07:10.075 [uBO] All ready 737 ms after launch console.js:38:13

@gwarser
Contributor

@gwarser gwarser commented on 086766a Feb 27, 2024

Restarting uBO with selfie (numbers from first message (300/600) were from restarting the whole browser):

cacheStorageMultithread = 0

13:07:09.409 [uBO] Selfie was removed console.js:38:13
13:09:10.445 [uBO] Selfie was created console.js:38:13
13:09:23.564 [uBO] Selfie ready 216 ms after launch console.js:38:13
13:09:23.573 [uBO] All ready 225 ms (selfie) after launch console.js:38:13
13:09:28.090 [uBO] Selfie ready 183 ms after launch console.js:38:13
13:09:28.096 [uBO] All ready 190 ms (selfie) after launch console.js:38:13
13:09:33.200 [uBO] Selfie ready 207 ms after launch console.js:38:13
13:09:33.207 [uBO] All ready 215 ms (selfie) after launch console.js:38:13

cacheStorageMultithread = 2

13:09:51.780 [uBO] Selfie ready 421 ms after launch console.js:38:13
13:09:51.788 [uBO] All ready 429 ms (selfie) after launch console.js:38:13
13:09:55.493 [uBO] Selfie ready 465 ms after launch console.js:38:13
13:09:55.500 [uBO] All ready 473 ms (selfie) after launch console.js:38:13
13:10:00.382 [uBO] Selfie ready 456 ms after launch console.js:38:13
13:10:00.390 [uBO] All ready 464 ms (selfie) after launch console.js:38:13

@gorhill
Owner Author

I am wondering if the two bigger chunks related to the selfie end up in the same thread. Currently, deserialization jobs are assigned based on the number of jobs per thread, with no regard to the size of what needs to be deserialized. I will fine-tune this to take deserialization size into account when assigning a thread to a job. Profiling would confirm this, as it would show the two largest deserializations in the same worker.
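
One possible reading of the size-aware assignment proposed above is a greedy "least-loaded worker" rule, where load is measured in pending size rather than job count. The sketch below is hypothetical and not taken from uBO; `pickWorker`, `pending` and the sample sizes are illustrative.

```javascript
// pending[i] = total size (e.g. string length) of jobs queued on worker i.
function pickWorker(pending, jobSize) {
    let best = 0;
    for ( let i = 1; i < pending.length; i++ ) {
        if ( pending[i] < pending[best] ) { best = i; }
    }
    pending[best] += jobSize;
    return best;
}

// Two large chunks and several small ones, spread over two workers.
const pending = [ 0, 0 ];
const sizes = [ 5000000, 4000000, 1000, 2000, 1500 ];
const assignment = sizes.map(size => pickWorker(pending, size));
console.log(assignment, pending);
```

With this rule the two largest jobs land on different workers. A real scheduler would also subtract a job's size from `pending` once the worker reports completion.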

@gwarser
Contributor

@gwarser gwarser commented on 086766a Feb 27, 2024
