Fix jobs dashboard + Pages deep links #11
Conversation
- Add a local (browser) library + Jobs dashboard for newly-requested YouTube URLs.
- Remove demo-only onboarding behavior; unknown URLs get queued instead of erroring.
- Support partial content entries (video/comments without analytics) for the playbook pipeline.
- Fix MDX widget resolution via MDXProvider and add a GitHub Pages 404 redirect shim.
- Namespace GH Pages redirect param to avoid collisions.
- Guard local library storage access for non-browser contexts.
- Document partial-ingestion semantics and parallelize generator file checks.
The GitHub Pages deep-link handling is close, but decodeURIComponent can throw on malformed input (hard-breaking app startup) and the 404 base-path logic may fail on user/org Pages sites. The local-library metadata hydration does not propagate abort/cancellation and may trigger overlapping oEmbed requests, causing unnecessary network usage and writes. The content index generator’s fileExists should verify isFile() to avoid generating invalid imports in edge cases. fetchYouTubeOEmbed uses an unnecessary double-cast that weakens maintainability.
Additional notes (3)
- Performance | `src/pages/JobsPage.tsx:63-78`
The `useEffect` that hydrates metadata depends on `videos` and triggers `Promise.all(...)` for up to 5 missing entries. Because the effect re-runs whenever `videos` changes (including after `setRefreshTick`/`setVideos`), you can end up with overlapping hydration batches for the same items if state changes quickly.
This likely works, but it’s wasteful and can cause repeated oEmbed calls. Consider debouncing or tracking “in-flight” video IDs to avoid duplicate fetches.
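The in-flight tracking idea can be a small guard around the hydration call. This is a sketch only; the `inFlight` set and `hydrateOnce` are illustrative names, not the page's actual code:

```typescript
// Illustrative sketch: dedupe concurrent hydrations per video id.
// `inFlight` and `hydrateOnce` are hypothetical names, not JobsPage's API.
const inFlight = new Set<string>();

async function hydrateOnce(
  id: string,
  hydrate: (id: string) => Promise<void>,
): Promise<boolean> {
  if (inFlight.has(id)) return false; // a batch is already hydrating this id
  inFlight.add(id);
  try {
    await hydrate(id);
    return true;
  } finally {
    inFlight.delete(id);
  }
}
```

Inside a component, a `useRef<Set<string>>` behaves the same way: it survives re-renders without triggering them.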
- Maintainability | `src/lib/youtube.ts:36-54`
`fetchYouTubeOEmbed` double-casts the JSON response: `(await res.json()) as unknown as YouTubeOEmbedResponse`. This is type-valid but it defeats type safety and makes it easy to accidentally rely on fields that aren’t present.
Since you already treat it as unknown logically, there’s no need for the extra cast; keep it as `unknown` and extract fields with runtime checks (which you already do).
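Keeping the payload as `unknown` and narrowing per field might look like this sketch (`pickString` is a hypothetical helper, not the library's API):

```typescript
// Hypothetical helper: extract a string field from an unknown JSON value,
// returning null when the shape doesn't match.
function pickString(value: unknown, key: string): string | null {
  if (typeof value !== 'object' || value === null) return null;
  const field = (value as Record<string, unknown>)[key];
  return typeof field === 'string' ? field : null;
}

// The parsed body stays `unknown`; each field is validated at runtime.
const body: unknown = JSON.parse('{"title":"Demo","author_name":"Ada"}');
const title = pickString(body, 'title'); // 'Demo'
const author = pickString(body, 'author_name'); // 'Ada'
```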
- Performance | `scripts/generate-content-index.ts:122-122`
`fileExists()` is implemented as `stat()` inside a `try/catch` and it’s called 3x per video entry in a `for...of` loop. That makes `generate-content-index` potentially very slow (and IO-heavy) as the number of videos grows.
Since this script is deterministic and only needs to know which of a few known filenames exist, you can reduce filesystem round-trips by either:
- using `access()` (cheaper intent) instead of `stat()`, and/or
- doing the checks in parallel per entry (`Promise.all`), and/or
- reading the directory once (`readdir`) and checking membership.
Given this runs in CI/build, it’s worth keeping it efficient.
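For instance, the `access()` + `Promise.all` combination could look like this sketch (`existingFiles` is an illustrative helper name, not the script's actual code):

```typescript
import { access } from 'node:fs/promises';
import path from 'node:path';

// Check a fixed set of candidate filenames concurrently; access() only
// tests reachability, which is cheaper in intent than a full stat().
async function existingFiles(dir: string, names: string[]): Promise<Set<string>> {
  const checks = names.map(async (name) => {
    try {
      await access(path.join(dir, name));
      return name;
    } catch {
      return null;
    }
  });
  const found = await Promise.all(checks);
  return new Set(found.filter((n): n is string => n !== null));
}
```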
Summary of changes
- Added GitHub Pages deep-link support
  - Introduced `public/404.html` redirect shim that rewrites unknown paths into `/?p=...` so the SPA router can recover.
  - Updated `src/main.tsx` to decode `p` and `history.replaceState` to the intended route under `import.meta.env.BASE_URL`.
- Introduced a local “Jobs” workflow for newly-requested videos
  - Added a new route and nav entry for `/jobs` (`src/App.tsx`, `src/components/NavBar.tsx`).
  - Implemented `src/pages/JobsPage.tsx` which stores requested YouTube URLs in `localStorage`, hydrates metadata via oEmbed, and shows capture/analysis status.
  - Added `src/lib/localLibrary.ts` for local persistence and metadata hydration.
- Enabled partial content entries (capture-first, analyze-later)
  - Made `VideoContent.comments` and `VideoContent.analytics` optional (`src/content/types.ts`).
  - Updated the content index generator to conditionally import `comments.json`, `analytics.json`, and `report.mdx` if present (`scripts/generate-content-index.ts`).
  - Adjusted `LibraryPage` and `VideoAnalyticsPage` to handle missing analytics/comments gracefully.
- Fixed MDX component resolution
  - Configured MDX compilation with `providerImportSource: '@mdx-js/react'` (`vite.config.ts`) to support runtime widgets via `MDXProvider`.
- Updated onboarding flow
  - Onboarding now queues unknown YouTube IDs into the local library and routes users to Jobs unless analytics already exist (`src/pages/OnboardingPage.tsx`).
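The "guard local library storage access for non-browser contexts" change can be as small as a probe that tolerates missing or misbehaving storage. The following is a hedged sketch, not the real `src/lib/localLibrary.ts` code; `safeStorage` and `StorageLike` are illustrative names, and the injectable `candidate` parameter exists only to make the sketch testable:

```typescript
// Minimal structural type so the sketch doesn't depend on DOM lib types.
type StorageLike = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
};

// Best-effort storage access: returns null in non-browser contexts (SSR,
// tests) and when storage throws (private/locked-down browser modes).
function safeStorage(candidate?: StorageLike): StorageLike | null {
  try {
    const store =
      candidate ?? (globalThis as { localStorage?: StorageLike }).localStorage;
    if (!store) return null;
    const probe = '__storage_probe__';
    store.setItem(probe, probe); // throws in some restricted contexts
    store.removeItem(probe);
    return store;
  } catch {
    return null;
  }
}
```

Callers then treat a `null` return as "persistence unavailable" and keep the queue in memory only.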
@CharlieHelps resolve all the suggestions
Investigating this request now and starting a review of all open suggestions on PR #11 so I can likely apply them and update the branch accordingly.
Implemented the requested review suggestions and marked every review thread as resolved. Latest push: Changes

Verification

```sh
# TypeScript
bun run typecheck
# ESLint
bun run lint
# Vite build (non-Pages)
bun run build
# Vite build (GitHub Pages)
GITHUB_PAGES=true GITHUB_REPOSITORY=hapticPaper/constructive bun run build
```

Note: I didn’t run
The build-time index generator can silently degrade output when directory reads fail; --strict should likely treat this as a hard error to avoid masking broken content. The _redirect sanitization in src/main.tsx is brittle and may reject legitimate deep links due to an overly restrictive isSafePath regex. JobsPage still kicks off a non-abortable oEmbed hydration in addByInput(), which can waste network and write after unmount despite the abortable batch effect.
Additional notes (1)
- Maintainability | `src/lib/localLibrary.ts:145-178`
`hydrateLocalLibraryVideoMetadata()` correctly reloads from storage after the network call to reduce clobbering, but it doesn’t special-case aborts. If `fetchYouTubeOEmbed` returns `null` on abort (likely), this function will return `false` indistinguishably from “no oEmbed exists / non-OK response”.
That makes it hard for callers (like JobsPage) to decide whether to retry soon vs. treat as a permanent miss.
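One way to surface the distinction is a three-way result instead of a boolean. This is a sketch under assumed shapes; `HydrateResult`, `hydrateWithOutcome`, and the `fetchMeta` callback are illustrative, not the module's real signatures:

```typescript
// Illustrative: let callers tell an abort apart from a permanent miss.
type HydrateResult = 'updated' | 'miss' | 'aborted';

async function hydrateWithOutcome(
  fetchMeta: (signal?: AbortSignal) => Promise<{ title: string } | null>,
  signal?: AbortSignal,
): Promise<HydrateResult> {
  try {
    const meta = await fetchMeta(signal);
    if (signal?.aborted) return 'aborted'; // null caused by abort, not a miss
    return meta ? 'updated' : 'miss';
  } catch (err) {
    // fetch() rejects with an "AbortError" when the signal fires mid-request
    if (err instanceof Error && err.name === 'AbortError') return 'aborted';
    throw err;
  }
}
```

A caller like JobsPage can then retry on `'aborted'` but cache `'miss'` as a permanent result.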
Summary of changes
What changed
- Content indexing + partial content support
  - `scripts/generate-content-index.ts` now reads each video directory once via `readdir()` (`listFileNames`) and conditionally generates imports/entries for `comments.json`, `analytics.json`, and `report.mdx`.
  - `src/content/types.ts` updates `VideoContent` so `comments` and `analytics` are optional, enabling “capture first, analyze later” states.
- Jobs dashboard + local (browser) library
  - Added a new `/jobs` route (`src/App.tsx`) and nav link (`src/components/NavBar.tsx`).
  - Added `src/pages/JobsPage.tsx` to manage a browser-local queue of requested YouTube videos, hydrate metadata via oEmbed, and show job stage.
  - Added `src/lib/localLibrary.ts` to persist queued videos in `localStorage`, with best-effort storage access and oEmbed metadata hydration.
- Onboarding now queues unknown videos instead of erroring
  - `src/pages/OnboardingPage.tsx` now stores unknown YouTube IDs into the local library, triggers background metadata hydration, and routes to Jobs unless analytics are already present.
- Analytics page handles missing artifacts
  - `src/pages/VideoAnalyticsPage.tsx` adds explicit “not ready yet” / “analysis pending” states and links back to Jobs.
- GitHub Pages deep-link recovery
  - `vite.config.ts` adds an SPA `404.html` generator plugin (`spa404Redirect`) that is aware of `config.base` and redirects to `/?_redirect=...`.
  - `src/main.tsx` decodes and sanitizes `_redirect` and uses `history.replaceState` to restore the intended SPA route.
- YouTube oEmbed support
  - `src/lib/youtube.ts` adds `fetchYouTubeOEmbed(videoId, signal?)` with runtime shape checks for returned JSON.
```ts
async function listFileNames(absoluteDir: string): Promise<Set<string>> {
  // 1 readdir per video directory (faster than multiple per-file stats).
  try {
    const entries = await readdir(absoluteDir, { withFileTypes: true });
    const out = new Set<string>();
    for (const entry of entries) {
      if (entry.isFile()) out.add(entry.name);
    }
    return out;
  } catch (error) {
    process.stderr.write(
      `Warning: Failed to read content directory ${absoluteDir}: ${
        error instanceof Error ? error.message : String(error)
      }\n`,
    );
    return new Set();
  }
}
```
listFileNames() returns an empty set on any readdir failure and the generator then silently treats comments.json / analytics.json / report.mdx as missing. In --strict mode you already fail on invalid video.json, but this path can still quietly produce a degraded content index if the directory is unreadable (permissions, transient FS errors, etc.). That’s a correctness issue because it can hide real problems behind “pending analysis” UX.
Given this script drives build-time imports, a directory read error is usually something you want to fail hard on in strict mode (or at least for video.json, which should always exist for validEntries).
Suggestion
Make directory read failures respect --strict, and/or verify video.json is present even when the directory listing fails.
Example:
```ts
async function listFileNames(absoluteDir: string, strict: boolean): Promise<Set<string>> {
  try {
    const entries = await readdir(absoluteDir, { withFileTypes: true });
    const out = new Set<string>();
    for (const entry of entries) if (entry.isFile()) out.add(entry.name);
    return out;
  } catch (error) {
    const msg = `Failed to read content directory ${absoluteDir}: ${
      error instanceof Error ? error.message : String(error)
    }`;
    if (strict) throw new Error(msg);
    process.stderr.write(`Warning: ${msg}\n`);
    return new Set();
  }
}
```

Then call `await listFileNames(base, strict)`.
Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this change.
```ts
const redirectPath = new URLSearchParams(window.location.search).get('_redirect');
if (redirectPath) {
  const base = import.meta.env.BASE_URL.replace(/\/$/, '');
  let decoded = redirectPath;
  try {
    decoded = decodeURIComponent(redirectPath);
  } catch {
    // ignore malformed encoding
  }

  const target = decoded.startsWith('/') ? decoded : `/${decoded}`;

  const match = target.match(/^([^?#]*)(.*)$/);
  const pathPart = match?.[1] ?? target;
  const segments = pathPart.split('/').filter(Boolean);
  const hasTraversal = segments.some((segment) => segment === '.' || segment === '..');

  const hasScheme = pathPart.includes('://');
  const isProtocolRelative = pathPart.startsWith('//');
  const isSafePath = /^\/[0-9A-Za-z_\-./]*$/.test(pathPart);

  if (!isProtocolRelative && !hasScheme && !hasTraversal && isSafePath) {
    window.history.replaceState(null, '', `${base}${target}`);
  }
}
```
There was a problem hiding this comment.
The redirect sanitization is too strict for common SPA routes that include percent-encoded characters or non-ASCII paths (e.g. encoded spaces, %2F-like segments, or unicode). Because you run isSafePath against pathPart (the decoded path), it will reject legitimate routes containing characters outside [0-9A-Za-z_\-./].
That can produce a confusing failure mode: deep links work for some routes but not others, with no fallback routing.
Also, `decodeURIComponent` is applied to the entire `_redirect` value; if the original target contains `%2F` etc., decoding changes semantics before validation. Usually you want to validate the raw string first (against `://`, `//`, traversal) and only then decode or normalize in a limited way.
Suggestion
Relax the validation to allow any URL-safe path characters, or validate using URL parsing constrained to same-origin + base instead of a tight regex.
A practical approach:
- Reject protocol-relative (`//`) and any scheme (`^[a-zA-Z][a-zA-Z0-9+.-]*:`)
- Reject traversal segments on the decoded path
- Allow broader characters in the path by validating via `new URL(target, window.location.origin)` and ensuring `url.origin === window.location.origin`
Sketch:
```ts
const raw = new URLSearchParams(window.location.search).get('_redirect');
if (raw) {
  let decoded = raw;
  try { decoded = decodeURIComponent(raw); } catch {}
  const target = decoded.startsWith('/') ? decoded : `/${decoded}`;
  // reject obvious external redirects
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(target) || target.startsWith('//')) return;
  const url = new URL(target, window.location.origin);
  if (url.origin !== window.location.origin) return;
  const segments = url.pathname.split('/').filter(Boolean);
  if (segments.some((s) => s === '.' || s === '..')) return;
  window.history.replaceState(null, '', `${base}${url.pathname}${url.search}${url.hash}`);
}
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with an updated sanitization strategy.
```ts
const redirectPath = new URLSearchParams(window.location.search).get('_redirect');
if (redirectPath) {
  const base = import.meta.env.BASE_URL.replace(/\/$/, '');
  let decoded = redirectPath;
  try {
    decoded = decodeURIComponent(redirectPath);
  } catch {
    // ignore malformed encoding
  }

  const target = decoded.startsWith('/') ? decoded : `/${decoded}`;

  const match = target.match(/^([^?#]*)(.*)$/);
  const pathPart = match?.[1] ?? target;
  const segments = pathPart.split('/').filter(Boolean);
  const hasTraversal = segments.some((segment) => segment === '.' || segment === '..');

  const hasScheme = pathPart.includes('://');
  const isProtocolRelative = pathPart.startsWith('//');
  const isSafePath = /^\/[0-9A-Za-z_\-./]*$/.test(pathPart);

  if (!isProtocolRelative && !hasScheme && !hasTraversal && isSafePath) {
    window.history.replaceState(null, '', `${base}${target}`);
  }
}
```
The redirect safety checks are headed in the right direction, but window.history.replaceState(null, '', ${base}${target}) uses target (which includes query/hash) even though validation is performed only on pathPart.
That means a value like /_redirect=%2Fjobs%3Ffoo=%0a... could still inject unexpected characters into the URL via the query/hash even if the path is “safe”. It won’t execute script, but it can create confusing/invalid locations and can interfere with router matching.
Also, the regex allows . in path segments, and you block . and .. segments, which is good, but you don’t prevent consecutive slashes or empty segments (not necessarily bad, but can create odd paths).
Suggestion
Sanitize/validate the full target (path + optional query/hash) before replaceState, or reconstruct the URL from validated components:
```ts
const match = target.match(/^([^?#]*)(\?[^#]*)?(#.*)?$/);
const pathPart = match?.[1] ?? '/';
const queryPart = match?.[2] ?? '';
const hashPart = match?.[3] ?? '';
// validate pathPart strictly
const isSafeQuery = /^[\w\-._~%!$&'()*+,;=:@/?]*$/.test(queryPart);
const isSafeHash = /^[\w\-._~%!$&'()*+,;=:@/?]*$/.test(hashPart);
if (isSafePath && isSafeQuery && isSafeHash && !hasScheme && !isProtocolRelative && !hasTraversal) {
  window.history.replaceState(null, '', `${base}${pathPart}${queryPart}${hashPart}`);
}
```

If you’d rather keep it simpler: drop query/hash from target entirely and only redirect to the path.
Reply with "@CharlieHelps yes please" if you'd like me to add a commit applying one of these approaches.
```ts
function addByInput(): void {
  setError(null);
  const videoId = extractYouTubeVideoId(input);
  if (!videoId) {
    setError('Paste a YouTube link or an 11-character video id.');
    return;
  }

  const platform: Platform = 'youtube';
  const videoUrl = `https://www.youtube.com/watch?v=${videoId}`;
  upsertLocalLibraryVideo({ platform, videoId, videoUrl });
  setVideos(listLocalLibraryVideos());

  const key = `${platform}:${videoId}`;
  if (!oembedInFlight.current.has(key)) {
    oembedInFlight.current.add(key);
    void hydrateLocalLibraryVideoMetadata(platform, videoId)
      .then((updated) => {
        if (updated) setVideos(listLocalLibraryVideos());
      })
      .finally(() => {
        oembedInFlight.current.delete(key);
      });
  }

  setInput('');
  setSearchParams({ video: `${platform}:${videoId}` });
}
```
JobsPage calls hydrateLocalLibraryVideoMetadata(platform, videoId) in addByInput() without passing an AbortSignal. The effect-based batch hydration uses an AbortController, but this immediate hydration path is not cancellable and can continue after navigation/unmount.
This reintroduces the exact class of wasted in-flight requests you just addressed for the batch effect (and can still write to localStorage after the user leaves the page).
Suggestion
Use the same abort mechanism for the immediate hydration started by addByInput().
Options:
- Remove the immediate hydration entirely and rely on the existing batched effect (simplest).
- Keep it, but create a page-level `AbortController` stored in a ref and abort on unmount:
```ts
const pageAbort = useRef<AbortController | null>(null);

useEffect(() => {
  pageAbort.current = new AbortController();
  return () => pageAbort.current?.abort();
}, []);

// in addByInput
void hydrateLocalLibraryVideoMetadata(platform, videoId, pageAbort.current?.signal);
```
```ts
function spa404Redirect(): Plugin {
  let config: ResolvedConfig | null = null;

  return {
    name: 'constructive-github-pages-404',
    apply: 'build',
    configResolved(resolvedConfig) {
      config = resolvedConfig;
    },
    async closeBundle() {
      if (!config) return;

      const outDir = config.build.outDir;
      const file = path.join(outDir, '404.html');

      const base = config.base;
      const normalizedBase = base.endsWith('/') ? base : `${base}/`;

      const html = `<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Constructive</title>
  </head>
  <body>
    <script>
      (function () {
        var l = window.location;
        var base = ${JSON.stringify(normalizedBase)};

        var rel = l.pathname.startsWith(base) ? '/' + l.pathname.slice(base.length) : l.pathname;
        var target = rel + l.search + l.hash;

        l.replace(base + '?_redirect=' + encodeURIComponent(target));
      })();
    </script>
  </body>
</html>
`;

      await writeFile(file, html, 'utf8');
    },
  };
}
```
spa404Redirect() always writes 404.html to config.build.outDir, but it doesn’t ensure the directory exists and it doesn’t guard against write failures. In some build environments (custom outDir, permissions, parallel builds), this can fail the entire build late in closeBundle().
Given this is a convenience artifact, you probably want a best-effort behavior with explicit diagnostics rather than a hard failure (or at least a clearer error).
Suggestion
Wrap the writeFile call with a try/catch and emit a clear warning/error message (and optionally mkdir(outDir, { recursive: true }) before writing). For example:
```ts
import { mkdir, writeFile } from 'node:fs/promises';

async closeBundle() {
  if (!config) return;
  const outDir = config.build.outDir;
  const file = path.join(outDir, '404.html');
  try {
    await mkdir(outDir, { recursive: true });
    await writeFile(file, html, 'utf8');
  } catch (err) {
    // decide whether this should fail build or warn
    console.warn('[spa404Redirect] Failed to write 404.html:', err);
  }
}
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this change.
Fixes the demo-only onboarding behavior by queuing new YouTube URLs into a browser-local library + Jobs dashboard, and closes the remaining GitHub Pages/MDX runtime gaps so analytics pages render cleanly.
Changes
- `/jobs` as the primary “work in progress” view: new URLs are stored locally (this browser) and show capture/analysis status.
- Partial content entries (`video.json` + `comments.json` without `analytics.json`/`report.mdx`) so the nightly playbook can fill in analysis later.
- Runtime MDX widgets (`<Callout />`) by compiling MDX with `providerImportSource`.
- Generates `dist/404.html` at build time (based on Vite `base`) + hardens `src/main.tsx` redirect decoding so GitHub Pages deep links land in the SPA router.
Verification
Closes #10