Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix config caching #1129

Merged
merged 3 commits into from Mar 23, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
19 changes: 0 additions & 19 deletions docs/display-race.md
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed these two test pages were inadvertently committed.

This file was deleted.

34 changes: 0 additions & 34 deletions docs/number.md

This file was deleted.

27 changes: 24 additions & 3 deletions src/config.ts
@@ -1,3 +1,4 @@
import {createHash} from "node:crypto";
import {existsSync, readFileSync} from "node:fs";
import {stat} from "node:fs/promises";
import op from "node:path";
Expand All @@ -8,7 +9,7 @@ import type MarkdownIt from "markdown-it";
import {LoaderResolver} from "./dataloader.js";
import {visitMarkdownFiles} from "./files.js";
import {formatIsoDate, formatLocaleDate} from "./format.js";
import {createMarkdownIt, parseMarkdown} from "./markdown.js";
import {createMarkdownIt, parseMarkdownMetadata} from "./markdown.js";
import {isAssetPath, parseRelativeUrl, resolvePath} from "./path.js";
import {resolveTheme} from "./theme.js";

Expand Down Expand Up @@ -88,18 +89,29 @@ export async function readDefaultConfig(root?: string): Promise<Config> {
return normalizeConfig(await importConfig(tsPath), root);
}

let cachedPages: {key: string; pages: Page[]} | null = null;
Copy link
Member Author

@mbostock mbostock Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m only storing a single value (the most recent value) in the cache for readPages, just to keep things simple. I don’t expect that there would be more than one set of pages needed simultaneously.


function readPages(root: string, md: MarkdownIt): Page[] {
const pages: Page[] = [];
const files: {file: string; source: string}[] = [];
const hash = createHash("sha256");
for (const file of visitMarkdownFiles(root)) {
if (file === "index.md" || file === "404.md") continue;
const source = readFileSync(join(root, file), "utf8");
const parsed = parseMarkdown(source, {path: file, md});
files.push({file, source});
hash.update(file).update(source);
Copy link
Member Author

@mbostock mbostock Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tried hashing the mtime of the file instead of the source, which is slightly faster in the cached case. But I don’t think it’s fast enough that it matters, and this felt simpler.

}
const key = hash.digest("hex");
if (cachedPages?.key === key) return cachedPages.pages;
const pages: Page[] = [];
for (const {file, source} of files) {
const parsed = parseMarkdownMetadata(source, {path: file, md});
if (parsed?.data?.draft) continue;
const name = basename(file, ".md");
const page = {path: join("/", dirname(file), name), name: parsed.title ?? "Untitled"};
if (name === "index") pages.unshift(page);
else pages.push(page);
}
cachedPages = {key, pages};
return pages;
}

Expand All @@ -109,7 +121,15 @@ export function setCurrentDate(date = new Date()): void {
currentDate = date;
}

// The config is used as a cache key for other operations; for example the pages
// are used as a cache key for search indexing and the previous & next links in
// the footer. When given the same spec (because import returned the same
// module), we want to return the same Config instance.
const configCache = new WeakMap<any, Config>();

export function normalizeConfig(spec: any = {}, defaultRoot = "docs"): Config {
const cachedConfig = configCache.get(spec);
if (cachedConfig) return cachedConfig;
Comment on lines +131 to +132
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer this pattern over:

  if (configCache.has(spec)) return configCache.get(spec);

For performance maybe?

Copy link
Member Author

@mbostock mbostock Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For TypeScript. (Though, it could be faster, too.)

Previously you had declared the WeakMap without types, so keys and values were type any by default. In this case TypeScript considers the return value of map.get as any, too, and hence doesn’t require that it returns a defined value and allows map.get(key).property even though it is unsafe.

When we declare the cache as WeakMap<Config["pages"], {json: string; freshUntil: number}>, then TypeScript will enforce type checking because map.get(key) now returns {json: string; freshUntil: number} | undefined. So if we don’t use the pattern above, we get an Object is possibly 'undefined' type error:

Screenshot 2024-03-23 at 9 07 57 AM

TypeScript isn’t smart enough to know that if map.has(key) returns true, then a subsequent map.get(key) with the same key returns a defined value. (There’s no way to declare such a relationship in an interface, at least not yet.) So as a type-safe alternative, we call map.get(key) first, and then test the returned value which can’t change, and hence TypeScript knows that the value is defined.

let {
root = defaultRoot,
output = "dist",
Expand Down Expand Up @@ -168,6 +188,7 @@ export function normalizeConfig(spec: any = {}, defaultRoot = "docs"): Config {
};
if (pages === undefined) Object.defineProperty(config, "pages", {get: () => readPages(root, md)});
if (sidebar === undefined) Object.defineProperty(config, "sidebar", {get: () => config.pages.length > 0});
configCache.set(spec, config);
return config;
}

Expand Down
5 changes: 5 additions & 0 deletions src/files.ts
Expand Up @@ -52,6 +52,11 @@ export function* visitMarkdownFiles(root: string): Generator<string> {
export function* visitFiles(root: string): Generator<string> {
const visited = new Set<number>();
const queue: string[] = [(root = normalize(root))];
try {
visited.add(statSync(join(root, ".observablehq")).ino);
} catch {
// ignore the .observablehq directory, if it exists
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that if you have a .md file in your docs/.observablehq/cache, it was being picked up by the default pages. We don’t want that, so let’s just ignore everything in .observablehq under the root.

}
for (const path of queue) {
const status = statSync(path);
if (status.isDirectory()) {
Expand Down
10 changes: 10 additions & 0 deletions src/markdown.ts
Expand Up @@ -343,6 +343,16 @@ export function parseMarkdown(input: string, options: ParseOptions): MarkdownPag
};
}

/** Like parseMarkdown, but optimized to return only metadata. */
export function parseMarkdownMetadata(input: string, options: ParseOptions): Pick<MarkdownPage, "data" | "title"> {
const {md, path} = options;
const {content, data} = matter(input, {});
return {
data: isEmpty(data) ? null : data,
title: data.title ?? findTitle(md.parse(content, {code: [], startLine: 0, currentLine: 0, path})) ?? null
};
}

function getHtml(
key: "head" | "header" | "footer",
data: Record<string, any>,
Expand Down
13 changes: 7 additions & 6 deletions src/search.ts
Expand Up @@ -9,8 +9,8 @@ import {parseMarkdown} from "./markdown.js";
import {faint, strikethrough} from "./tty.js";

// Avoid reindexing too often in preview.
const indexCache = new WeakMap();
const reindexingDelay = 10 * 60 * 1000; // 10 minutes
const indexCache = new WeakMap<Config["pages"], {json: string; freshUntil: number}>();
const reindexDelay = 10 * 60 * 1000; // 10 minutes

export interface SearchIndexEffects {
logger: Logger;
Expand All @@ -21,15 +21,16 @@ const defaultEffects: SearchIndexEffects = {logger: console};
const indexOptions = {
fields: ["title", "text", "keywords"],
storeFields: ["title"],
processTerm(term) {
return term.match(/\p{N}/gu)?.length > 6 ? null : term.slice(0, 15).toLowerCase(); // fields to return with search results
processTerm(term: string) {
return (term.match(/\p{N}/gu)?.length ?? 0) > 6 ? null : term.slice(0, 15).toLowerCase(); // fields to return with search results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm moving this comment to line 22 (the issue predates this PR, I just happened to see it here).

}
};

export async function searchIndex(config: Config, effects = defaultEffects): Promise<string> {
const {root, pages, search, md} = config;
if (!search) return "{}";
if (indexCache.has(config) && indexCache.get(config).freshUntil > +new Date()) return indexCache.get(config).json;
const cached = indexCache.get(pages);
if (cached && cached.freshUntil > Date.now()) return cached.json;

// Get all the listed pages (which are indexed by default)
const pagePaths = new Set(["/index"]);
Expand Down Expand Up @@ -84,7 +85,7 @@ export async function searchIndex(config: Config, effects = defaultEffects): Pro
)
);

indexCache.set(config, {json, freshUntil: +new Date() + reindexingDelay});
indexCache.set(pages, {json, freshUntil: Date.now() + reindexDelay});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is keyed by the pages instead of the config so that if the set of pages change, the search index is recomputed.

return json;
}

Expand Down
2 changes: 1 addition & 1 deletion test/build-test.ts
Expand Up @@ -42,7 +42,7 @@ describe("build", () => {

await rm(actualDir, {recursive: true, force: true});
if (generate) console.warn(`! generating ${expectedDir}`);
const config = Object.assign(await readConfig(undefined, path), {output: outputDir});
const config = {...(await readConfig(undefined, path)), output: outputDir};
await build({config, addPublic}, new TestEffects(outputDir));

// In the addPublic case, we don’t want to test the contents of the public
Expand Down
6 changes: 3 additions & 3 deletions test/config-test.ts
Expand Up @@ -4,7 +4,7 @@ import {normalizeConfig as config, mergeToc, readConfig, setCurrentDate} from ".
import {LoaderResolver} from "../src/dataloader.js";

describe("readConfig(undefined, root)", () => {
before(() => setCurrentDate(new Date("2024-01-11T01:02:03")));
before(() => setCurrentDate(new Date("2024-01-10T16:00:00")));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is to match the build tests — since the config is now cached, we want the footer to have the same data, so the behavior of the tests is the same independent of order.

it("imports the config file at the specified root", async () => {
const {md, loaders, ...config} = await readConfig(undefined, "test/input/build/config");
assert(md instanceof MarkdownIt);
Expand All @@ -28,7 +28,7 @@ describe("readConfig(undefined, root)", () => {
head: "",
header: "",
footer:
'Built with <a href="https://observablehq.com/" target="_blank">Observable</a> on <a title="2024-01-11T01:02:03">Jan 11, 2024</a>.',
'Built with <a href="https://observablehq.com/" target="_blank">Observable</a> on <a title="2024-01-10T16:00:00">Jan 10, 2024</a>.',
deploy: {
workspace: "acme",
project: "bi"
Expand All @@ -54,7 +54,7 @@ describe("readConfig(undefined, root)", () => {
head: "",
header: "",
footer:
'Built with <a href="https://observablehq.com/" target="_blank">Observable</a> on <a title="2024-01-11T01:02:03">Jan 11, 2024</a>.',
'Built with <a href="https://observablehq.com/" target="_blank">Observable</a> on <a title="2024-01-10T16:00:00">Jan 10, 2024</a>.',
deploy: null,
search: false
});
Expand Down