
Fix jobs dashboard + Pages deep links #11

Merged
hapticPaper merged 6 commits into main from charlie/issue-10-local-library-jobs
Dec 29, 2025
Conversation

@charliecreates

@charliecreates charliecreates bot commented Dec 29, 2025

Fixes the demo-only onboarding behavior by queuing new YouTube URLs into a browser-local library + Jobs dashboard, and closes the remaining GitHub Pages/MDX runtime gaps so analytics pages render cleanly.

Changes

  • Add /jobs as the primary “work in progress” view: new URLs are stored locally (this browser) and show capture/analysis status.
  • Update onboarding so unknown YouTube URLs get added + routed to Jobs (no more “demo library” dead-end).
  • Allow partial content directories (e.g. video.json + comments.json without analytics.json/report.mdx) so the nightly playbook can fill in analysis later.
  • Fix MDX widget resolution (e.g. <Callout />) by compiling MDX with providerImportSource.
  • Generate dist/404.html at build time (based on Vite base) + harden src/main.tsx redirect decoding so GitHub Pages deep links land in the SPA router.
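The browser-local library described above can be sketched roughly as follows. This is an illustrative sketch only: the storage key, `LocalVideo` shape, and `loadLibrary` name are assumptions, not the actual `src/lib/localLibrary.ts` API.

```typescript
// Hypothetical sketch of a best-effort browser-local video library.
// STORAGE_KEY and LocalVideo are illustrative, not the PR's real names.
type LocalVideo = { platform: string; videoId: string; videoUrl: string };

const STORAGE_KEY = 'constructive:local-library';

function loadLibrary(storage: Pick<Storage, 'getItem'>): LocalVideo[] {
  try {
    const raw = storage.getItem(STORAGE_KEY);
    const parsed: unknown = raw ? JSON.parse(raw) : [];
    // Treat anything that is not an array as a corrupted store.
    return Array.isArray(parsed) ? (parsed as LocalVideo[]) : [];
  } catch {
    return []; // corrupted JSON or unavailable storage → empty library
  }
}
```

Accepting any `Pick<Storage, 'getItem'>` keeps the helper testable outside a browser, which matches the PR's "guard local library storage access for non-browser contexts" change.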

Verification

bun run typecheck
bun run lint
bun run build
GITHUB_PAGES=true GITHUB_REPOSITORY=hapticPaper/constructive bun run build

Closes #10

- Add a local (browser) library + Jobs dashboard for newly-requested YouTube URLs.
- Remove demo-only onboarding behavior; unknown URLs get queued instead of erroring.
- Support partial content entries (video/comments without analytics) for the playbook pipeline.
- Fix MDX widget resolution via MDXProvider and add a GitHub Pages 404 redirect shim.
- Namespace GH Pages redirect param to avoid collisions.
- Guard local library storage access for non-browser contexts.
- Document partial-ingestion semantics and parallelize generator file checks.
Author

@charliecreates charliecreates bot left a comment


The GitHub Pages deep-link handling is close, but decodeURIComponent can throw on malformed input (breaking app startup), and the 404 base-path logic may fail on user/org Pages sites. The local-library metadata hydration does not propagate abort/cancellation and may trigger overlapping oEmbed requests, causing unnecessary network usage and writes. The content index generator's fileExists should verify isFile() to avoid generating invalid imports in edge cases. fetchYouTubeOEmbed uses an unnecessary double-cast that weakens maintainability.

Additional notes (3)
  • Performance | src/pages/JobsPage.tsx:63-78
    The useEffect that hydrates metadata depends on videos and triggers Promise.all(...) for up to 5 missing entries. Because the effect re-runs whenever videos changes (including after setRefreshTick/setVideos), you can end up with overlapping hydration batches for the same items if state changes quickly.

This likely works, but it’s wasteful and can cause repeated oEmbed calls. Consider debouncing or tracking “in-flight” video IDs to avoid duplicate fetches.
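One way to track "in-flight" requests, sketched here with an illustrative hydrateOnce helper (not the PR's actual code), is to cache the pending promise per video key so concurrent callers share a single fetch:

```typescript
// Sketch: reuse a pending promise per key so concurrent callers
// share one fetch instead of issuing duplicate oEmbed requests.
const inFlight = new Map<string, Promise<string>>();

function hydrateOnce(
  key: string,
  fetchMeta: (key: string) => Promise<string>,
): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) return existing; // hydration for this key already running
  const pending = fetchMeta(key).finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}
```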

  • Maintainability | src/lib/youtube.ts:36-54
    fetchYouTubeOEmbed double-casts the JSON response: (await res.json()) as unknown as YouTubeOEmbedResponse. This is type-valid but it defeats type safety and makes it easy to accidentally rely on fields that aren’t present.

Since you already treat it as unknown logically, there’s no need for the extra cast—keep it as unknown and extract fields with runtime checks (which you already do).
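A guarded extraction that keeps the body as unknown might look like this; the parseOEmbed helper name is illustrative, though title and author_name are real oEmbed response fields:

```typescript
// Sketch: narrow an unknown JSON body with runtime checks instead of casting.
function parseOEmbed(data: unknown): { title: string; authorName: string } | null {
  if (typeof data !== 'object' || data === null) return null;
  const record = data as Record<string, unknown>;
  const title = record['title'];
  const authorName = record['author_name'];
  if (typeof title !== 'string' || typeof authorName !== 'string') return null;
  return { title, authorName };
}
```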

  • Performance | scripts/generate-content-index.ts:122-122
    fileExists() is implemented as stat() inside a try/catch and it’s called 3x per video entry in a for...of loop. That makes generate-content-index potentially very slow (and IO-heavy) as the number of videos grows.

Since this script is deterministic and only needs to know which of a few known filenames exist, you can reduce filesystem round-trips by either:

  • using access() (cheaper intent) instead of stat(), and/or
  • doing the checks in parallel per entry (Promise.all), and/or
  • reading the directory once (readdir) and checking membership.

Given this runs in CI/build, it’s worth keeping it efficient.
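A minimal sketch of the parallel approach, assuming an illustrative presentFiles helper rather than the script's actual code:

```typescript
import { access } from 'node:fs/promises';
import { join } from 'node:path';

// Sketch: check a fixed set of filenames in parallel with access()
// instead of several sequential stat() calls per entry.
async function presentFiles(dir: string, names: string[]): Promise<Set<string>> {
  const checks = names.map(async (name) => {
    try {
      await access(join(dir, name));
      return name;
    } catch {
      return null; // missing or unreadable → treated as absent
    }
  });
  const results = await Promise.all(checks);
  return new Set(results.filter((n): n is string => n !== null));
}
```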

Summary of changes

  • Added GitHub Pages deep-link support

    • Introduced public/404.html redirect shim that rewrites unknown paths into /?p=... so the SPA router can recover.
    • Updated src/main.tsx to decode p and history.replaceState to the intended route under import.meta.env.BASE_URL.
  • Introduced a local “Jobs” workflow for newly-requested videos

    • Added a new route and nav entry for /jobs (src/App.tsx, src/components/NavBar.tsx).
    • Implemented src/pages/JobsPage.tsx which stores requested YouTube URLs in localStorage, hydrates metadata via oEmbed, and shows capture/analysis status.
    • Added src/lib/localLibrary.ts for local persistence and metadata hydration.
  • Enabled partial content entries (capture-first, analyze-later)

    • Made VideoContent.comments and VideoContent.analytics optional (src/content/types.ts).
    • Updated the content index generator to conditionally import comments.json, analytics.json, and report.mdx if present (scripts/generate-content-index.ts).
    • Adjusted LibraryPage and VideoAnalyticsPage to handle missing analytics/comments gracefully.
  • Fixed MDX component resolution

    • Configured MDX compilation with providerImportSource: '@mdx-js/react' (vite.config.ts) to support runtime widgets via MDXProvider.
  • Updated onboarding flow

    • Onboarding now queues unknown YouTube IDs into the local library and routes users to Jobs unless analytics already exist (src/pages/OnboardingPage.tsx).

@charliecreates charliecreates bot removed the request for review from CharlieHelps December 29, 2025 15:09
@hapticPaper
Owner

@CharlieHelps resolve all the suggestions

@charliecreates
Author

Investigating this request now and starting a review of all open suggestions on PR #11 so I can apply them and update the branch accordingly.


@charliecreates
Author

charliecreates bot commented Dec 29, 2025

  • Previously completed a full pass over review feedback on PR #11, updating scripts/generate-content-index.ts, src/components/NavBar.tsx, src/content/types.ts, src/lib/localLibrary.ts, src/lib/youtube.ts, src/main.tsx, multiple src/pages/*.tsx files, vite.config.ts, and removing public/404.html to align with earlier suggestions.
  • Previously re-ran bun run typecheck && bun run lint && bun run build && GITHUB_PAGES=true GITHUB_REPOSITORY=hapticPaper/constructive bun run build plus bunx prettier -c to confirm the branch is type-safe, lint-clean, buildable (including content:generate / scripts/generate-content-index.ts), and formatted correctly, with only the known large-chunk warnings from Vite.
  • Previously revisited the maintainer’s remaining comments to refine vite.config.ts, src/lib/localLibrary.ts, and src/main.tsx, focusing on clearer SPA 404 handling, safer redirect normalization, and explicit local library/oEmbed behavior.
  • Previously implemented a spa404Redirect plugin in vite.config.ts that always runs in builds, derives normalizedBase from config.base, and writes a base-aware 404.html that redirects via window.location without depending on GITHUB_PAGES-specific flags.
  • Previously augmented src/lib/localLibrary.ts so loadLibrary and persistLibrary emit guarded console.warn diagnostics under import.meta.env.DEV when storage access fails, while still treating local state as best-effort and falling back to an empty library when corrupted.
  • Previously hardened _redirect support in src/main.tsx by splitting the decoded redirect into path/query, validating against protocol-relative URLs and ://, and rejecting targets with . or .. path segments before calling window.history.replaceState, then tightening this further with an isSafePath whitelist and documenting the *.github.io assumption in follow-up commits.
  • Previously refactored src/pages/JobsPage.tsx to remove refreshTick entirely so the oEmbed hydration effect, add/remove flows, and metadata hydration call setVideos(listLocalLibraryVideos()) directly with an active guard in the effect cleanup.
  • Committed and pushed the JobsPage refresh refactor and stricter redirect handling in 34907e4 on branch charlie/issue-10-local-library-jobs, adjusting src/pages/JobsPage.tsx and src/main.tsx.
  • Requested a fresh review on PR #11 from CharlieHelps via gh pr edit 11 --add-reviewer so the new changes get an explicit pass.
  • Queried unresolved review threads via gh api graphql to list open comments for public/404.html, src/lib/localLibrary.ts, src/pages/JobsPage.tsx, src/main.tsx, and scripts/generate-content-index.ts so I can resolve them once the maintainer confirms the suggestions are fully addressed.

@charliecreates
Author

#11 (comment)

Implemented the requested review suggestions and marked every review thread as resolved. Latest push: 34907e4.

Changes
  • GitHub Pages deep links:
    • Generate dist/404.html at build time from Vite base (handles repo Pages + *.github.io user/org Pages).
    • Harden _redirect handling in src/main.tsx (safe decode + path sanitization).
  • Jobs/local library hydration:
    • Thread AbortSignal through hydrateLocalLibraryVideoMetadata and cancel in-flight oEmbed fetches.
    • De-dupe in-flight oEmbed requests; avoid overlapping batches; update Jobs state directly (no extra refresh ticks).
    • Gracefully handle localStorage read/write failures (dev-only warnings).
  • Content index generator:
    • Avoid repeated stat() calls by listing each video directory once; only treat files as present; warn on readdir failures.
  • fetchYouTubeOEmbed: remove the double-cast and keep parsing guarded via runtime checks.
Verification
# TypeScript
bun run typecheck

# ESLint
bun run lint

# Vite build (non-Pages)
bun run build

# Vite build (GitHub Pages)
GITHUB_PAGES=true GITHUB_REPOSITORY=hapticPaper/constructive bun run build

Note: I didn’t run bun run format because prettier -c . is currently failing on main due to preexisting formatting issues outside this PR; I did run Prettier on the PR-changed files while iterating.

Requested review from @CharlieHelps and resolved all review threads.

Author

@charliecreates charliecreates bot left a comment


The build-time index generator can silently degrade output when directory reads fail; --strict should likely treat this as a hard error to avoid masking broken content. The _redirect sanitization in src/main.tsx is brittle and may reject legitimate deep links due to an overly restrictive isSafePath regex. JobsPage still kicks off a non-abortable oEmbed hydration in addByInput(), which can waste network and write after unmount despite the abortable batch effect.

Additional notes (1)
  • Maintainability | src/lib/localLibrary.ts:145-178
    hydrateLocalLibraryVideoMetadata() correctly reloads from storage after the network call to reduce clobbering, but it doesn’t special-case aborts. If fetchYouTubeOEmbed returns null on abort (likely), this function will return false indistinguishably from “no oEmbed exists / non-OK response”.

That makes it hard for callers (like JobsPage) to decide whether to retry soon vs. treat as a permanent miss.
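One way to make the outcomes distinguishable, assuming the fetch layer rethrows (or the wrapper detects) an AbortError rather than folding it into null, is a three-state result. The hydrate and fetchMeta names below are illustrative stand-ins, not the PR's actual signatures:

```typescript
// Sketch: surface aborts as a distinct outcome so callers can decide
// whether to retry soon ('aborted') vs. treat as a permanent miss ('miss').
async function hydrate(
  fetchMeta: (signal?: AbortSignal) => Promise<string | null>,
  signal?: AbortSignal,
): Promise<'updated' | 'miss' | 'aborted'> {
  try {
    const meta = await fetchMeta(signal);
    return meta === null ? 'miss' : 'updated';
  } catch (err) {
    if (err instanceof Error && err.name === 'AbortError') return 'aborted';
    throw err; // real network/parse failures still propagate
  }
}
```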

Summary of changes

  • Content indexing + partial content support

    • scripts/generate-content-index.ts now reads each video directory once via readdir() (listFileNames) and conditionally generates imports/entries for comments.json, analytics.json, and report.mdx.
    • src/content/types.ts updates VideoContent so comments and analytics are optional, enabling “capture first, analyze later” states.
  • Jobs dashboard + local (browser) library

    • Added a new /jobs route (src/App.tsx) and nav link (src/components/NavBar.tsx).
    • Added src/pages/JobsPage.tsx to manage a browser-local queue of requested YouTube videos, hydrate metadata via oEmbed, and show job stage.
    • Added src/lib/localLibrary.ts to persist queued videos in localStorage, with best-effort storage access and oEmbed metadata hydration.
  • Onboarding now queues unknown videos instead of erroring

    • src/pages/OnboardingPage.tsx now stores unknown YouTube IDs into the local library, triggers background metadata hydration, and routes to Jobs unless analytics are already present.
  • Analytics page handles missing artifacts

    • src/pages/VideoAnalyticsPage.tsx adds explicit “not ready yet” / “analysis pending” states and links back to Jobs.
  • GitHub Pages deep-link recovery

    • vite.config.ts adds an SPA 404.html generator plugin (spa404Redirect) that is aware of config.base and redirects to /?_redirect=....
    • src/main.tsx decodes and sanitizes _redirect and uses history.replaceState to restore the intended SPA route.
  • YouTube oEmbed support

    • src/lib/youtube.ts adds fetchYouTubeOEmbed(videoId, signal?) with runtime shape checks for returned JSON.

Comment on lines +130 to +147
async function listFileNames(absoluteDir: string): Promise<Set<string>> {
  // 1 readdir per video directory (faster than multiple per-file stats).
  try {
    const entries = await readdir(absoluteDir, { withFileTypes: true });
    const out = new Set<string>();
    for (const entry of entries) {
      if (entry.isFile()) out.add(entry.name);
    }
    return out;
  } catch (error) {
    process.stderr.write(
      `Warning: Failed to read content directory ${absoluteDir}: ${
        error instanceof Error ? error.message : String(error)
      }\n`,
    );
    return new Set();
  }
}
Author


listFileNames() returns an empty set on any readdir failure and the generator then silently treats comments.json / analytics.json / report.mdx as missing. In --strict mode you already fail on invalid video.json, but this path can still quietly produce a degraded content index if the directory is unreadable (permissions, transient FS errors, etc.). That’s a correctness issue because it can hide real problems behind “pending analysis” UX.

Given this script drives build-time imports, a directory read error is usually something you want to fail hard on in strict mode (or at least for video.json, which should always exist for validEntries).

Suggestion

Make directory read failures respect --strict, and/or verify video.json is present even when the directory listing fails.

Example:

async function listFileNames(absoluteDir: string, strict: boolean): Promise<Set<string>> {
  try {
    const entries = await readdir(absoluteDir, { withFileTypes: true });
    const out = new Set<string>();
    for (const entry of entries) if (entry.isFile()) out.add(entry.name);
    return out;
  } catch (error) {
    const msg = `Failed to read content directory ${absoluteDir}: ${
      error instanceof Error ? error.message : String(error)
    }`;
    if (strict) throw new Error(msg);
    process.stderr.write(`Warning: ${msg}\n`);
    return new Set();
  }
}

Then call await listFileNames(base, strict).

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this change.

Comment on lines +8 to +31
const redirectPath = new URLSearchParams(window.location.search).get('_redirect');
if (redirectPath) {
  const base = import.meta.env.BASE_URL.replace(/\/$/, '');
  let decoded = redirectPath;
  try {
    decoded = decodeURIComponent(redirectPath);
  } catch {
    // ignore malformed encoding
  }

  const target = decoded.startsWith('/') ? decoded : `/${decoded}`;

  const match = target.match(/^([^?#]*)(.*)$/);
  const pathPart = match?.[1] ?? target;
  const segments = pathPart.split('/').filter(Boolean);
  const hasTraversal = segments.some((segment) => segment === '.' || segment === '..');

  const hasScheme = pathPart.includes('://');
  const isProtocolRelative = pathPart.startsWith('//');
  const isSafePath = /^\/[0-9A-Za-z_\-./]*$/.test(pathPart);

  if (!isProtocolRelative && !hasScheme && !hasTraversal && isSafePath) {
    window.history.replaceState(null, '', `${base}${target}`);
  }
Author


The redirect sanitization is too strict for common SPA routes that include percent-encoded characters or non-ASCII paths (e.g. encoded spaces, %2F-like segments, or unicode). Because you run isSafePath against pathPart (the decoded path), it will reject legitimate routes containing characters outside [0-9A-Za-z_\-./].

That can produce a confusing failure mode: deep links work for some routes but not others, with no fallback routing.

Also, decodeURIComponent is applied to the entire _redirect value; if the original target contains %2F etc., decoding changes semantics before validation. Usually you want to validate the raw string first (against ://, //, and traversal segments) and only then decode or normalize in a limited way.

Suggestion

Relax the validation to allow any URL-safe path characters, or validate using URL parsing constrained to same-origin + base instead of a tight regex.

A practical approach:

  • Reject protocol-relative (//) and any scheme (^[a-zA-Z][a-zA-Z0-9+.-]*:)
  • Reject traversal segments on the decoded path
  • Allow broader characters in the path by validating via new URL(target, window.location.origin) and ensuring url.origin === window.location.origin

Sketch:

const raw = new URLSearchParams(window.location.search).get('_redirect');
if (raw) {
  let decoded = raw;
  try { decoded = decodeURIComponent(raw); } catch {}
  const target = decoded.startsWith('/') ? decoded : `/${decoded}`;

  // reject obvious external redirects
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(target) || target.startsWith('//')) return;

  const url = new URL(target, window.location.origin);
  if (url.origin !== window.location.origin) return;

  const segments = url.pathname.split('/').filter(Boolean);
  if (segments.some((s) => s === '.' || s === '..')) return;

  window.history.replaceState(null, '', `${base}${url.pathname}${url.search}${url.hash}`);
}

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with an updated sanitization strategy.

Comment on lines +8 to +32
const redirectPath = new URLSearchParams(window.location.search).get('_redirect');
if (redirectPath) {
  const base = import.meta.env.BASE_URL.replace(/\/$/, '');
  let decoded = redirectPath;
  try {
    decoded = decodeURIComponent(redirectPath);
  } catch {
    // ignore malformed encoding
  }

  const target = decoded.startsWith('/') ? decoded : `/${decoded}`;

  const match = target.match(/^([^?#]*)(.*)$/);
  const pathPart = match?.[1] ?? target;
  const segments = pathPart.split('/').filter(Boolean);
  const hasTraversal = segments.some((segment) => segment === '.' || segment === '..');

  const hasScheme = pathPart.includes('://');
  const isProtocolRelative = pathPart.startsWith('//');
  const isSafePath = /^\/[0-9A-Za-z_\-./]*$/.test(pathPart);

  if (!isProtocolRelative && !hasScheme && !hasTraversal && isSafePath) {
    window.history.replaceState(null, '', `${base}${target}`);
  }
}
Author


The redirect safety checks are headed in the right direction, but window.history.replaceState(null, '', ${base}${target}) uses target (which includes query/hash) even though validation is performed only on pathPart.

That means a value like ?_redirect=%2Fjobs%3Ffoo=%0a... could still inject unexpected characters into the URL via the query/hash even if the path is “safe”. It won’t execute script, but it can create confusing/invalid locations and can interfere with router matching.

Also, the regex allows . in path segments, and you block . and .. segments, which is good, but you don’t prevent consecutive slashes or empty segments (not necessarily bad, but can create odd paths).

Suggestion

Sanitize/validate the full target (path + optional query/hash) before replaceState, or reconstruct the URL from validated components:

const match = target.match(/^([^?#]*)(\?[^#]*)?(#.*)?$/);
const pathPart = match?.[1] ?? '/';
const queryPart = match?.[2] ?? '';
const hashPart = match?.[3] ?? '';

// validate pathPart strictly

const isSafeQuery = /^[\w\-._~%!$&'()*+,;=:@/?]*$/.test(queryPart);
const isSafeHash = /^[\w\-._~%!$&'()*+,;=:@/?]*$/.test(hashPart);

if (isSafePath && isSafeQuery && isSafeHash && !hasScheme && !isProtocolRelative && !hasTraversal) {
  window.history.replaceState(null, '', `${base}${pathPart}${queryPart}${hashPart}`);
}

If you’d rather keep it simpler: drop query/hash from target entirely and only redirect to the path.

Reply with "@CharlieHelps yes please" if you'd like me to add a commit applying one of these approaches.

Comment on lines +108 to +135
function addByInput(): void {
  setError(null);
  const videoId = extractYouTubeVideoId(input);
  if (!videoId) {
    setError('Paste a YouTube link or an 11-character video id.');
    return;
  }

  const platform: Platform = 'youtube';
  const videoUrl = `https://www.youtube.com/watch?v=${videoId}`;
  upsertLocalLibraryVideo({ platform, videoId, videoUrl });
  setVideos(listLocalLibraryVideos());

  const key = `${platform}:${videoId}`;
  if (!oembedInFlight.current.has(key)) {
    oembedInFlight.current.add(key);
    void hydrateLocalLibraryVideoMetadata(platform, videoId)
      .then((updated) => {
        if (updated) setVideos(listLocalLibraryVideos());
      })
      .finally(() => {
        oembedInFlight.current.delete(key);
      });
  }

  setInput('');
  setSearchParams({ video: `${platform}:${videoId}` });
}
Author


JobsPage calls hydrateLocalLibraryVideoMetadata(platform, videoId) in addByInput() without passing an AbortSignal. The effect-based batch hydration uses an AbortController, but this immediate hydration path is not cancellable and can continue after navigation/unmount.

This reintroduces the exact class of wasted in-flight requests you just addressed for the batch effect (and can still write to localStorage after the user leaves the page).

Suggestion

Use the same abort mechanism for the immediate hydration started by addByInput().

Options:

  1. Remove the immediate hydration entirely and rely on the existing batched effect (simplest).

  2. Keep it, but create a page-level AbortController stored in a ref and abort on unmount:

const pageAbort = useRef<AbortController | null>(null);
useEffect(() => {
  pageAbort.current = new AbortController();
  return () => pageAbort.current?.abort();
}, []);

// in addByInput
void hydrateLocalLibraryVideoMetadata(platform, videoId, pageAbort.current?.signal)

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with the simplest approach (remove immediate hydration and rely on the batched effect).

Comment on lines +7 to +50
function spa404Redirect(): Plugin {
  let config: ResolvedConfig | null = null;

  return {
    name: 'constructive-github-pages-404',
    apply: 'build',
    configResolved(resolvedConfig) {
      config = resolvedConfig;
    },
    async closeBundle() {
      if (!config) return;

      const outDir = config.build.outDir;
      const file = path.join(outDir, '404.html');

      const base = config.base;
      const normalizedBase = base.endsWith('/') ? base : `${base}/`;

      const html = `<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Constructive</title>
</head>
<body>
<script>
(function () {
var l = window.location;
var base = ${JSON.stringify(normalizedBase)};

var rel = l.pathname.startsWith(base) ? '/' + l.pathname.slice(base.length) : l.pathname;
var target = rel + l.search + l.hash;

l.replace(base + '?_redirect=' + encodeURIComponent(target));
})();
</script>
</body>
</html>
`;

      await writeFile(file, html, 'utf8');
    },
  };
Author


spa404Redirect() always writes 404.html to config.build.outDir, but it doesn’t ensure the directory exists and it doesn’t guard against write failures. In some build environments (custom outDir, permissions, parallel builds), this can fail the entire build late in closeBundle().

Given this is a convenience artifact, you probably want a best-effort behavior with explicit diagnostics rather than a hard failure (or at least a clearer error).

Suggestion

Wrap the writeFile call with a try/catch and emit a clear warning/error message (and optionally mkdir(outDir, { recursive: true }) before writing). For example:

import { mkdir, writeFile } from 'node:fs/promises';

async closeBundle() {
  if (!config) return;
  const outDir = config.build.outDir;
  const file = path.join(outDir, '404.html');
  try {
    await mkdir(outDir, { recursive: true });
    await writeFile(file, html, 'utf8');
  } catch (err) {
    // decide whether this should fail build or warn
    console.warn('[spa404Redirect] Failed to write 404.html:', err);
  }
}

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this change.

@charliecreates charliecreates bot removed the request for review from CharlieHelps December 29, 2025 15:44
@hapticPaper hapticPaper merged commit d1de696 into main Dec 29, 2025


Development

Successfully merging this pull request may close these issues.

Site is live, resolve issues
