Conversation
📝 WalkthroughWalkthroughThis PR introduces a comprehensive embed data extraction system for URLs and migrates the TypeBox dependency from Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API
participant Worker
participant GitHub
participant DB
participant S3
Client->>API: POST /tasks/url-metadata (gist URL)
API->>Worker: Queue gist embed task
Worker->>GitHub: GET /gists/{gistId}
GitHub-->>Worker: Gist metadata + files
Worker->>S3: Upload file content (MD5-keyed)
S3-->>Worker: Content key stored
Worker->>DB: Upsert gist metadata
Worker->>DB: Upsert gist files
Worker->>DB: Delete stale S3 files
Worker->>S3: Remove deleted files
S3-->>Worker: Deletion complete
Worker->>DB: Store url_metadata with gistId
Worker-->>API: Embed data complete
API->>DB: Query url_metadata with mappings
API-->>Client: Enriched response (gist + files)
sequenceDiagram
participant Client
participant API
participant Worker
participant FXTwitter
participant Images
participant DB
participant S3
Client->>API: POST /tasks/url-metadata (Twitter URL)
API->>Worker: Queue post embed task
Worker->>FXTwitter: GET /userId/status/postId
FXTwitter-->>Worker: Tweet data with media URLs
Worker->>Images: Process avatar image
Images->>S3: Upload avatar
S3-->>Images: Avatar key
Worker->>Images: Process tweet image
Images->>S3: Upload tweet image
S3-->>Images: Image key
Worker->>DB: Upsert url_metadata_post
Worker->>DB: Store url_metadata with postId
Worker-->>API: Post embed complete
API->>DB: Query url_metadata with post mapping
API-->>Client: Enriched response (author + content)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (4)
pnpm-workspace.yaml (1)
6-6: TypeBox migration is complete; drift check would be a good preventive measure.All source files are using the new
typeboxpackage correctly, and no legacy@sinclair/typeboxreferences remain. A CI check to prevent reintroduction of the legacy package name is recommended as a future safeguard, but not required now.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pnpm-workspace.yaml` at line 6, Add a preventive CI drift check to ensure the legacy package name '@sinclair/typebox' is never reintroduced: create a CI job (or an npm script referenced by CI) named something like "check-typebox" that searches the repository for the string '@sinclair/typebox' (excluding node_modules/build artifacts) and fails the job if any matches are found; this should be triggered on PRs and main branch updates and references the package names "typebox" and "@sinclair/typebox" so reviewers can quickly locate the check and its intent.apps/worker/src/tasks/url-metadata/processor.ts (1)
10-10: Consider adding an "html" embed type for completeness.The
UrlMetadataEmbedTypedefines "gist" | "video" | "post", but the HTML fallback path leavesembedTypeasundefined. For clarity and future filtering/querying, you might want to include "html" in the union:♻️ Suggested addition
-type UrlMetadataEmbedType = "gist" | "video" | "post"; +type UrlMetadataEmbedType = "gist" | "video" | "post" | "html";And then set it explicitly in the else branch (line 28-29):
} else { + embedType = "html"; embedPromise = getEmbedDataFromHtml(jobId, inputUrl, signal); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/processor.ts` at line 10, The UrlMetadataEmbedType union should include an "html" variant and the fallback assignment should set embedType = "html" instead of leaving it undefined; update the type alias UrlMetadataEmbedType to include "html" and modify the fallback branch in the URL metadata parsing logic (the code that currently leaves embedType undefined) to explicitly assign "html" so downstream filtering/queries can rely on a concrete embedType.apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts (1)
3-25: Consider relaxing literal types for external API fields.The
provider_url: "https://vimeo.com/"is typed as a literal string. If Vimeo's API ever returns a slightly different value (e.g., without trailing slash), it could cause type mismatches at runtime with stricter type checking. Consider usingstringfor fields that aren't critical to the application logic.♻️ Suggested type relaxation
export interface VimeoOEmbedResponse { type: "video"; version: "1.0"; provider_name: "Vimeo"; - provider_url: "https://vimeo.com/"; + provider_url: string;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts` around lines 3 - 25, The VimeoOEmbedResponse interface uses literal string types for external API fields (e.g., provider_url: "https://vimeo.com/") which is brittle; update the interface by changing those literal types to more permissive types (for example provider_url: string and similarly relax provider_name, version, and type if they are not critical) so the shape accepts minor API variations while keeping strongly typed fields like title, width, height, duration, thumbnail_url, video_id, and uri unchanged.apps/worker/src/tasks/url-metadata/common.ts (1)
17-17: Consider lazy initialization for the S3 bucket.The top-level
awaitruns at module import time, which means:
- Any import failure from
s3.ensureBucketwill prevent the module from loading.- The bucket is created even if the importing code never uses it.
- Test mocking becomes trickier since this runs before test setup.
For a worker context this may be acceptable, but lazy initialization would be more resilient:
♻️ Suggested lazy initialization pattern
-export const BUCKET = await s3.ensureBucket(env.S3_BUCKET); +let _bucket: Awaited<ReturnType<typeof s3.ensureBucket>> | undefined; + +export async function getBucket() { + if (!_bucket) { + _bucket = await s3.ensureBucket(env.S3_BUCKET); + } + return _bucket; +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/common.ts` at line 17, Replace the top-level awaited export with a lazy initializer: remove the top-level "export const BUCKET = await s3.ensureBucket(env.S3_BUCKET)" and add an async getter (e.g., getBucket or ensureBucketInitialized) that caches the bucket after calling s3.ensureBucket(env.S3_BUCKET) on first use; update callers to call this getter instead of referencing BUCKET directly so the bucket creation runs lazily and is easier to mock/test and to avoid module-import side effects.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/src/routes/tasks/url-metadata.ts`:
- Around line 107-109: Replace the use of logical OR with nullish coalescing for
optional numeric fields so valid 0 values are preserved: change occurrences of
"width: width || undefined" and "height: height || undefined" (and the analogous
occurrences at the later lines) to use "width: width ?? undefined" and "height:
height ?? undefined"; keep altText as "altText || undefined" only if empty
string should be treated as missing, otherwise change altText to "altText ??
undefined". Update the assignments where these symbols (width, height, altText)
are used in the URL metadata construction and at the later lines mentioned to
use ?? accordingly.
- Around line 113-115: The mapGist function currently throws when the side-table
row is missing; instead, change the behavior in mapGist (and the analogous
functions that throw for missing side rows—e.g., the mapPost path and the other
URL metadata mappers around the later throw sites) to return undefined when
db.query.urlMetadataGist.findFirst(...) (or the equivalent db query) returns no
row, and call the existing requeue/refresh helper (use the module's
enqueue/refresh function used elsewhere for metadata refresh) to schedule an
async refresh rather than throwing; ensure callers of mapGist handle undefined
as a cache-hit fallback so the request path does not 500.
- Line 237: The code currently force-casts result.embedType using "embedType:
result.embedType as 'post' | 'gist' | 'video' | undefined" which is unsafe;
replace this with a guarded mapper: add a small type‑guard (e.g.,
isValidEmbedType(value): value is 'post'|'gist'|'video') and set embedType to
isValidEmbedType(result.embedType) ? result.embedType : undefined (or handle
invalid values explicitly), updating the assignment site where embedType is set
from result.embedType so the response contract is enforced without unsafe
casting.
In `@apps/worker/src/tasks/url-metadata/data-providers/twitch.ts`:
- Line 42: twitchHosts currently lists ["clips.twitch.tv", "twitch.tv"] but
misses "www.twitch.tv", causing valid Twitch URLs to be ignored; update the
twitchHosts export (the twitchHosts constant in
apps/worker/src/tasks/url-metadata/data-providers/twitch.ts) to include
"www.twitch.tv" in the array alongside the existing entries so URL
matching/embedding recognizes both forms.
- Around line 29-38: The current fallback uses the page URL (embedUrl = url)
which produces invalid Twitch iframe src; instead detect non-clip Twitch URLs
and build a proper player endpoint (use "https://player.twitch.tv/" with either
?channel=<channelName> for channel pages or ?video=v<videoId> for video pages /
collections), preserve/add the parent query param on that player URL (use
embedUrl.searchParams.set("parent", parent)), and update the returned html to
use width="400" (minimum) and height="300"; change references to embedUrl and
url where the iframe src is constructed and handle the TODO for video/collection
paths by mapping path segments to the correct channel or video query parameter
before returning html.
In `@apps/worker/src/tasks/url-metadata/data-providers/video.ts`:
- Line 37: Exported videoHosts currently combines youtubeHosts and vimeoHosts
but omits twitchHosts, preventing Twitch URLs from being routed into the video
embed path despite getVideoDataFromUrl supporting Twitch; update the exported
videoHosts (symbol: videoHosts) to include twitchHosts alongside youtubeHosts
and vimeoHosts so Twitch URLs are recognized, and verify getVideoDataFromUrl
still handles Twitch-specific logic (symbols: getVideoDataFromUrl, twitchHosts).
In `@apps/worker/src/tasks/url-metadata/data-providers/youtube.ts`:
- Around line 27-47: The code builds youtubeUrl using the variable videoId
without ensuring it was set; update the parsing logic around videoId in
apps/worker/src/tasks/url-metadata/data-providers/youtube.ts (the videoId
variable and the youtubeUrl construction) to guard against missing/empty IDs:
after the if/else blocks that set videoId, check that videoId is a non-empty
string and if not, return null (or otherwise handle the invalid case) instead of
proceeding to construct `https://www.youtube.com/watch?v=${videoId}`.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.test.ts`:
- Around line 33-35: The test uses a simplified mock chain
((db.delete(urlMetadataGistFile).where(undefined).returning as Mock)) that hides
the real where clause used by the implementation (and(eq(...),
not(inArray(...))); update the mock in getEmbedDataFromGist.test.ts to reflect
or assert the actual where predicate: either mock
db.delete(urlMetadataGistFile).where with a matcher that checks the passed
predicate (e.g., capture the where arg and verify it composes and(eq(...),
not(inArray(...)))) or replace the .where(undefined) with a more explicit
mockImplementation that validates the incoming where function/criteria before
returning Promise.resolve([]); alternatively, if simplifying is intentional add
a brief comment explaining why the simplified mock covers the case and cannot
hide regressions, referencing db.delete and urlMetadataGistFile so reviewers can
locate the change.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.ts`:
- Around line 25-33: The function currently logs an invalid gist URL but
continues to call github.getGistById; change the invalid-path branch (when
!githubUsername || !gistId) to return immediately (e.g., return undefined) after
logging to avoid the network call and further processing instead of only setting
error = true; ensure this prevents execution of github.getGistById and
downstream code that expects valid gistId/inputUrl.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.test.ts`:
- Around line 97-101: The test incorrectly passes a URL string as the jobId to
getEmbedDataFromPost; update the test to pass a distinct mock job ID (e.g.,
"test-job-id" or similar) as the first argument while keeping the existing URL
(new URL(fakeFxTwitterUrl)) and AbortSignal, so the call mirrors production
usage and avoids conflating jobId with the input URL when generating S3 keys.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts`:
- Around line 33-37: The code in getEmbedDataFromPost uses new URL(avatarUrl)
and new URL(image.url) directly (variables avatarUrl and image.url) which can
throw synchronously; wrap the URL construction in a try/catch or pre-validate
the string before calling processImages so malformed URLs are skipped or
converted to undefined instead of throwing, and ensure any skipped images are
filtered out before calling processImages([ ... ]) and before mapping into new
URL for images so the downstream .catch handlers are not bypassed.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts`:
- Around line 37-45: In getEmbedDataFromVideo, protect parsing of
videoData.thumbnail_url before calling processImages: the expression new
URL(videoData.thumbnail_url) can throw synchronously and bypass the existing
.catch on the processImages promise; wrap the URL construction/validation in a
try/catch (or guard with a safe URL-parse helper) and only call processImages
when the URL was successfully created, and ensure any thrown error is caught and
logged/handled using the existing jobId/signal flow so banner assignment doesn't
crash the task.
In `@apps/worker/test-utils/setup.ts`:
- Around line 38-41: The mock deleteMockResponse currently sets returning =
vi.fn() which yields undefined when awaited; change returning to be a function
that returns an empty array by default (e.g., make returning an async/mock
function that returns []), so callers that await the .returning(...) can safely
iterate; update the returned object in deleteMockResponse (the where mock that
returns { returning }) accordingly so .where(...).returning(...) resolves to an
empty array by default.
In `@packages/db/src/schema/url-metadata.ts`:
- Line 24: The url_metadata_gist_file schema currently forces language as
notNull (varchar("language", { length: 16 }).notNull()), which fails when GitHub
returns null/omits files[...].language; change the column definition in
packages/db/src/schema/url-metadata.ts for the url_metadata_gist_file entry to
allow nulls (remove .notNull() or use .nullable() depending on the ORM) so
inserts succeed when language is absent, and run migrations/regen types if
applicable.
In `@packages/github-api/src/getGistById.ts`:
- Around line 26-29: The error message in getGistById incorrectly references
getCommits; update the throw in the block that checks if data is undefined or
response.error (in function getGistById) to mention getGistById instead of
getCommits and include the same response.response.status and response.error
details so the runtime message correctly identifies the failing operation.
---
Nitpick comments:
In `@apps/worker/src/tasks/url-metadata/common.ts`:
- Line 17: Replace the top-level awaited export with a lazy initializer: remove
the top-level "export const BUCKET = await s3.ensureBucket(env.S3_BUCKET)" and
add an async getter (e.g., getBucket or ensureBucketInitialized) that caches the
bucket after calling s3.ensureBucket(env.S3_BUCKET) on first use; update callers
to call this getter instead of referencing BUCKET directly so the bucket
creation runs lazily and is easier to mock/test and to avoid module-import side
effects.
In `@apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts`:
- Around line 3-25: The VimeoOEmbedResponse interface uses literal string types
for external API fields (e.g., provider_url: "https://vimeo.com/") which is
brittle; update the interface by changing those literal types to more permissive
types (for example provider_url: string and similarly relax provider_name,
version, and type if they are not critical) so the shape accepts minor API
variations while keeping strongly typed fields like title, width, height,
duration, thumbnail_url, video_id, and uri unchanged.
In `@apps/worker/src/tasks/url-metadata/processor.ts`:
- Line 10: The UrlMetadataEmbedType union should include an "html" variant and
the fallback assignment should set embedType = "html" instead of leaving it
undefined; update the type alias UrlMetadataEmbedType to include "html" and
modify the fallback branch in the URL metadata parsing logic (the code that
currently leaves embedType undefined) to explicitly assign "html" so downstream
filtering/queries can rely on a concrete embedType.
In `@pnpm-workspace.yaml`:
- Line 6: Add a preventive CI drift check to ensure the legacy package name
'@sinclair/typebox' is never reintroduced: create a CI job (or an npm script
referenced by CI) named something like "check-typebox" that searches the
repository for the string '@sinclair/typebox' (excluding node_modules/build
artifacts) and fails the job if any matches are found; this should be triggered
on PRs and main branch updates and references the package names "typebox" and
"@sinclair/typebox" so reviewers can quickly locate the check and its intent.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (39)
apps/api/package.jsonapps/api/src/routes/tasks/post-images.tsapps/api/src/routes/tasks/url-metadata.tsapps/worker/package.jsonapps/worker/src/tasks/sync-author/processor.tsapps/worker/src/tasks/sync-author/types.tsapps/worker/src/tasks/sync-collection/processor.tsapps/worker/src/tasks/sync-collection/types.tsapps/worker/src/tasks/sync-post/processor.tsapps/worker/src/tasks/sync-post/types.tsapps/worker/src/tasks/url-metadata/common.tsapps/worker/src/tasks/url-metadata/data-providers/fx-embed/twitter.tsapps/worker/src/tasks/url-metadata/data-providers/fx-embed/types.tsapps/worker/src/tasks/url-metadata/data-providers/twitch.tsapps/worker/src/tasks/url-metadata/data-providers/video.tsapps/worker/src/tasks/url-metadata/data-providers/vimeo.tsapps/worker/src/tasks/url-metadata/data-providers/x.tsapps/worker/src/tasks/url-metadata/data-providers/youtube.tsapps/worker/src/tasks/url-metadata/getEmbedDataFromGist.test.tsapps/worker/src/tasks/url-metadata/getEmbedDataFromGist.tsapps/worker/src/tasks/url-metadata/getEmbedDataFromHtml.tsapps/worker/src/tasks/url-metadata/getEmbedDataFromPost.test.tsapps/worker/src/tasks/url-metadata/getEmbedDataFromPost.tsapps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.tsapps/worker/src/tasks/url-metadata/processor.tsapps/worker/src/tasks/url-metadata/utils/processImage.tsapps/worker/test-utils/setup.tspackages/bullmq/package.jsonpackages/bullmq/src/tasks/post-image.tspackages/bullmq/src/tasks/url-metadata.tspackages/common/package.jsonpackages/common/src/env.tspackages/db/src/schema/url-metadata.tspackages/github-api/src/getGistById.tspackages/github-api/src/index.tspackages/post-images/package.jsonpackages/post-images/src/fetch-post-data.tspackages/s3/src/utils.tspnpm-workspace.yaml
| width: width || undefined, | ||
| height: height || undefined, | ||
| altText: altText || undefined, |
There was a problem hiding this comment.
Use nullish coalescing for optional numeric fields.
Line [107], Line [108], Line [163], and Line [164] use ||, which drops valid 0 values. Use ?? so only null/undefined are omitted.
Suggested fix
- width: width || undefined,
- height: height || undefined,
+ width: width ?? undefined,
+ height: height ?? undefined,
@@
- width: result.embedWidth || undefined,
- height: result.embedHeight || undefined,
+ width: result.embedWidth ?? undefined,
+ height: result.embedHeight ?? undefined,Also applies to: 163-164
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/routes/tasks/url-metadata.ts` around lines 107 - 109, Replace
the use of logical OR with nullish coalescing for optional numeric fields so
valid 0 values are preserved: change occurrences of "width: width || undefined"
and "height: height || undefined" (and the analogous occurrences at the later
lines) to use "width: width ?? undefined" and "height: height ?? undefined";
keep altText as "altText || undefined" only if empty string should be treated as
missing, otherwise change altText to "altText ?? undefined". Update the
assignments where these symbols (width, height, altText) are used in the URL
metadata construction and at the later lines mentioned to use ?? accordingly.
| async function mapGist(gistId: string): Promise<Static<typeof GistSchema>> { | ||
| const gist = await db.query.urlMetadataGist.findFirst({ where: { gistId } }); | ||
| if (!gist) throw new Error(`Gist information for ${gistId} not found!`); |
There was a problem hiding this comment.
Don’t hard-fail cached responses when gist/post side rows are missing.
On Line [115] and Line [134], throwing makes a cache hit fail with 500 when side-table data is absent/inconsistent. Return undefined and requeue refresh instead of failing the request path.
Suggested resilience fix
-async function mapGist(gistId: string): Promise<Static<typeof GistSchema>> {
+async function mapGist(
+ gistId: string,
+): Promise<Static<typeof GistSchema> | undefined> {
const gist = await db.query.urlMetadataGist.findFirst({ where: { gistId } });
- if (!gist) throw new Error(`Gist information for ${gistId} not found!`);
+ if (!gist) return undefined;
@@
-async function mapPost(postId: string): Promise<Static<typeof PostSchema>> {
+async function mapPost(
+ postId: string,
+): Promise<Static<typeof PostSchema> | undefined> {
const post = await db.query.urlMetadataPost.findFirst({ where: { postId } });
- if (!post) throw new Error(`Post information for ${postId} not found!`);
+ if (!post) return undefined;
@@
- const response = {
+ const gist = result.gistId ? await mapGist(result.gistId) : undefined;
+ const post = result.postId ? await mapPost(result.postId) : undefined;
+ if ((result.gistId && !gist) || (result.postId && !post)) {
+ shouldSubmitJob = true;
+ }
+
+ const response = {
...mapUrlMetadata(result),
- gist: result.gistId ? await mapGist(result.gistId) : undefined,
- post: result.postId ? await mapPost(result.postId) : undefined,
+ gist,
+ post,Also applies to: 132-134, 233-236
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/routes/tasks/url-metadata.ts` around lines 113 - 115, The
mapGist function currently throws when the side-table row is missing; instead,
change the behavior in mapGist (and the analogous functions that throw for
missing side rows—e.g., the mapPost path and the other URL metadata mappers
around the later throw sites) to return undefined when
db.query.urlMetadataGist.findFirst(...) (or the equivalent db query) returns no
row, and call the existing requeue/refresh helper (use the module's
enqueue/refresh function used elsewhere for metadata refresh) to schedule an
async refresh rather than throwing; ensure callers of mapGist handle undefined
as a cache-hit fallback so the request path does not 500.
| ...mapUrlMetadata(result), | ||
| gist: result.gistId ? await mapGist(result.gistId) : undefined, | ||
| post: result.postId ? await mapPost(result.postId) : undefined, | ||
| embedType: result.embedType as "post" | "gist" | "video" | undefined, |
There was a problem hiding this comment.
Avoid unsafe embedType assertion from DB data.
Line [237] force-casts DB data into the API enum; invalid persisted values can silently violate the response contract. Prefer a guarded mapper.
Suggested contract-safe mapping
+function mapEmbedType(
+ value: string | null,
+): "post" | "gist" | "video" | undefined {
+ return value === "post" || value === "gist" || value === "video"
+ ? value
+ : undefined;
+}
@@
- embedType: result.embedType as "post" | "gist" | "video" | undefined,
+ embedType: mapEmbedType(result.embedType),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| embedType: result.embedType as "post" | "gist" | "video" | undefined, | |
| function mapEmbedType( | |
| value: string | null, | |
| ): "post" | "gist" | "video" | undefined { | |
| return value === "post" || value === "gist" || value === "video" | |
| ? value | |
| : undefined; | |
| } | |
| // ... other code ... | |
| embedType: mapEmbedType(result.embedType), |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/routes/tasks/url-metadata.ts` at line 237, The code currently
force-casts result.embedType using "embedType: result.embedType as 'post' |
'gist' | 'video' | undefined" which is unsafe; replace this with a guarded
mapper: add a small type‑guard (e.g., isValidEmbedType(value): value is
'post'|'gist'|'video') and set embedType to isValidEmbedType(result.embedType) ?
result.embedType : undefined (or handle invalid values explicitly), updating the
assignment site where embedType is set from result.embedType so the response
contract is enforced without unsafe casting.
| if (!embedUrl) { | ||
| embedUrl = url; | ||
| } | ||
|
|
||
| // Set the "parent" property for embedding - https://dev.twitch.tv/docs/embed/everything/#usage | ||
| embedUrl.searchParams.set("parent", parent); | ||
|
|
||
| return { | ||
| html: `<iframe src="${embedUrl.toString()}" height="300"></iframe>`, | ||
| title: "Twitch Embed", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n apps/worker/src/tasks/url-metadata/data-providers/twitch.tsRepository: playfulprogramming/hoof
Length of output: 1678
🌐 Web query:
Twitch embed iframe src requirements player.twitch.tv channel video
💡 Result:
For a non-interactive Twitch player iframe, the src must point to https://player.twitch.tv/ and include one content identifier plus at least one parent domain:
src formats
- Live channel
<iframe src="https://player.twitch.tv/?channel=<channel>&parent=<yourdomain.com>"></iframe>
- VOD / past broadcast
<iframe src="https://player.twitch.tv/?video=<videoID>&parent=<yourdomain.com>"></iframe>
Key requirements
- HTTPS/SSL required for domains using Twitch embeds. [1]
parentis required: add the full domain (include subdomain likewwwif used). If the embed can appear on multiple domains, include multipleparent=entries. Missing/mismatchedparenttriggers a playback/config error. [1][2]src="https://player.twitch.tv/?channel=twitchpresents&parent=example.com&parent=www.example.com"
- Minimum size (per Twitch embed requirements): width ≥ 400, height ≥ 300. [3]
Sources: Twitch embed requirements and examples from Twitch Developers docs and Twitch Developer Forums. [1][2][3]
Non-clip Twitch URLs produce invalid embed endpoints.
The code falls back to embedding the page URL directly (line 30: embedUrl = url). This results in iframe sources like https://twitch.tv/somechannel?parent=..., which Twitch does not recognize as valid embed endpoints. Twitch requires embeds to use https://player.twitch.tv/ with channel or video query parameters.
Additionally, the iframe has height="300" but no width attribute. Twitch embeds require a minimum width of 400 pixels.
The TODO comment on line 25 explicitly flags this limitation for video and collection URLs, which also need proper handling.
💡 Suggested fix
if (!embedUrl) {
const [first, second] = url.pathname.split("/").filter(Boolean);
embedUrl = new URL("https://player.twitch.tv/");
if (first === "videos" && second) {
embedUrl.searchParams.set("video", `v${second}`);
} else if (first) {
embedUrl.searchParams.set("channel", first);
} else {
return { html: "", title: "Twitch Embed" };
}
}Also add width to the iframe: width="400" minimum.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/tasks/url-metadata/data-providers/twitch.ts` around lines 29
- 38, The current fallback uses the page URL (embedUrl = url) which produces
invalid Twitch iframe src; instead detect non-clip Twitch URLs and build a
proper player endpoint (use "https://player.twitch.tv/" with either
?channel=<channelName> for channel pages or ?video=v<videoId> for video pages /
collections), preserve/add the parent query param on that player URL (use
embedUrl.searchParams.set("parent", parent)), and update the returned html to
use width="400" (minimum) and height="300"; change references to embedUrl and
url where the iframe src is constructed and handle the TODO for video/collection
paths by mapping path segments to the correct channel or video query parameter
before returning html.
| }; | ||
| } | ||
|
|
||
| export const twitchHosts = ["clips.twitch.tv", "twitch.tv"]; |
There was a problem hiding this comment.
Add www.twitch.tv to supported hosts.
Common Twitch URLs use www.twitch.tv; excluding it can skip valid embeds.
🛠️ Proposed fix
-export const twitchHosts = ["clips.twitch.tv", "twitch.tv"];
+export const twitchHosts = ["clips.twitch.tv", "twitch.tv", "www.twitch.tv"];📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export const twitchHosts = ["clips.twitch.tv", "twitch.tv"]; | |
| export const twitchHosts = ["clips.twitch.tv", "twitch.tv", "www.twitch.tv"]; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/tasks/url-metadata/data-providers/twitch.ts` at line 42,
twitchHosts currently lists ["clips.twitch.tv", "twitch.tv"] but misses
"www.twitch.tv", causing valid Twitch URLs to be ignored; update the twitchHosts
export (the twitchHosts constant in
apps/worker/src/tasks/url-metadata/data-providers/twitch.ts) to include
"www.twitch.tv" in the array alongside the existing entries so URL
matching/embedding recognizes both forms.
| const avatarUrl = post.author.avatar_url; | ||
| let avatarResult: ProcessImageResult | undefined = undefined; | ||
| if (avatarUrl) { | ||
| avatarResult = await processImages( | ||
| [new URL(avatarUrl)], |
There was a problem hiding this comment.
Guard provider URL parsing before image processing.
new URL(avatarUrl) (Line 37) and new URL(image.url) (Line 54) can throw before the downstream .catch(...) blocks run. A malformed upstream value can abort the whole task.
💡 Proposed fix
+ const safeParseUrl = (value: string, label: string): URL | undefined => {
+ try {
+ return new URL(value);
+ } catch (e) {
+ console.error(`Invalid ${label} URL for '${inputUrl}'`, e);
+ error = true;
+ return undefined;
+ }
+ };
+
const avatarUrl = post.author.avatar_url;
let avatarResult: ProcessImageResult | undefined = undefined;
if (avatarUrl) {
+ const parsedAvatarUrl = safeParseUrl(avatarUrl, "avatar");
+ if (!parsedAvatarUrl) {
+ avatarResult = undefined;
+ } else {
avatarResult = await processImages(
- [new URL(avatarUrl)],
+ [parsedAvatarUrl],
72,
BUCKET,
"remote-posts",
jobId,
signal,
).catch((e) => {
console.error(`Error processing post image for '${inputUrl}'`, e);
error = true;
return undefined;
});
+ }
}
...
if (image) {
+ const parsedImageUrl = safeParseUrl(image.url, "post image");
+ if (!parsedImageUrl) {
+ imageResult = undefined;
+ } else {
imageResult = await processImages(
- [new URL(image.url)],
+ [parsedImageUrl],
image.width,
BUCKET,
"remote-posts",
jobId,
signal,
).catch((e) => {
console.error(`Error processing post image for '${inputUrl}'`, e);
error = true;
return undefined;
});
+ }
}Also applies to: 53-55
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts` around lines 33 -
37, The code in getEmbedDataFromPost uses new URL(avatarUrl) and new
URL(image.url) directly (variables avatarUrl and image.url) which can throw
synchronously; wrap the URL construction in a try/catch or pre-validate the
string before calling processImages so malformed URLs are skipped or converted
to undefined instead of throwing, and ensure any skipped images are filtered out
before calling processImages([ ... ]) and before mapping into new URL for images
so the downstream .catch handlers are not bypassed.
| const banner = videoData?.thumbnail_url | ||
| ? await processImages( | ||
| [new URL(videoData.thumbnail_url)], | ||
| expectedThumbnailWidth, | ||
| BUCKET, | ||
| "remote-video-thumbnail", | ||
| jobId, | ||
| signal, | ||
| ).catch((e) => { |
There was a problem hiding this comment.
Protect thumbnail URL parsing before calling processImages.
new URL(videoData.thumbnail_url) (Line 39) can throw before the current .catch(...) handler executes. This can fail the task unexpectedly on malformed provider data.
💡 Proposed fix
- const banner = videoData?.thumbnail_url
- ? await processImages(
- [new URL(videoData.thumbnail_url)],
- expectedThumbnailWidth,
- BUCKET,
- "remote-video-thumbnail",
- jobId,
- signal,
- ).catch((e) => {
- console.error(`Unable upload thumbnail image for ${inputUrl}`, e);
- error = true;
- return undefined;
- })
- : undefined;
+ let banner = undefined;
+ if (videoData?.thumbnail_url) {
+ try {
+ const thumbnailUrl = new URL(videoData.thumbnail_url);
+ banner = await processImages(
+ [thumbnailUrl],
+ expectedThumbnailWidth,
+ BUCKET,
+ "remote-video-thumbnail",
+ jobId,
+ signal,
+ ).catch((e) => {
+ console.error(`Unable upload thumbnail image for ${inputUrl}`, e);
+ error = true;
+ return undefined;
+ });
+ } catch (e) {
+ console.error(`Invalid thumbnail URL for ${inputUrl}`, e);
+ error = true;
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts` around lines 37
- 45, In getEmbedDataFromVideo, protect parsing of videoData.thumbnail_url
before calling processImages: the expression new URL(videoData.thumbnail_url)
can throw synchronously and bypass the existing .catch on the processImages
promise; wrap the URL construction/validation in a try/catch (or guard with a
safe URL-parse helper) and only call processImages when the URL was successfully
created, and ensure any thrown error is caught and logged/handled using the
existing jobId/signal flow so banner assignment doesn't crash the task.
| const deleteMockResponse = () => { | ||
| const returning = vi.fn(); | ||
| return { where: vi.fn(() => ({ returning })) }; | ||
| }; |
There was a problem hiding this comment.
Make mocked .returning(...) default to an array.
Current default vi.fn() returns undefined. Callers that iterate over await ...returning(...) can crash in tests.
💡 Proposed fix
const deleteMockResponse = () => {
- const returning = vi.fn();
+ const returning = vi.fn(async () => []);
return { where: vi.fn(() => ({ returning })) };
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const deleteMockResponse = () => { | |
| const returning = vi.fn(); | |
| return { where: vi.fn(() => ({ returning })) }; | |
| }; | |
| const deleteMockResponse = () => { | |
| const returning = vi.fn(async () => []); | |
| return { where: vi.fn(() => ({ returning })) }; | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/test-utils/setup.ts` around lines 38 - 41, The mock
deleteMockResponse currently sets returning = vi.fn() which yields undefined
when awaited; change returning to be a function that returns an empty array by
default (e.g., make returning an async/mock function that returns []), so
callers that await the .returning(...) can safely iterate; update the returned
object in deleteMockResponse (the where mock that returns { returning })
accordingly so .where(...).returning(...) resolves to an empty array by default.
| .references(() => urlMetadataGist.gistId, { onDelete: "cascade" }), | ||
| filename: varchar("filename", { length: 256 }).notNull(), | ||
| contentKey: varchar("content_key", { length: 256 }).notNull(), | ||
| language: varchar("language", { length: 16 }).notNull(), |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
GitHub REST API GET gists gist_id files language field null omitted
💡 Result:
In the REST API “Get a gist” response, each entry under files may include a language property (the docs show it in examples), but it’s not something you can rely on always being a non-empty string. [1]
In practice, files[...].language is best treated as optional/nullable:
- Many client SDKs model it as an optional field (e.g.,
*stringwithomitempty), meaning it can be missing or null in JSON. [2] - GitHub determines language via Linguist (used to detect languages and ignore/handle binary content), so if Linguist can’t confidently classify a file (unknown/unsupported extension, binary-ish content, etc.), the API may return
nullor omit it. [3]
If you need a stable value, fall back to your own detection (e.g., by filename/extension or running Linguist yourself) when language is null/absent. [3]
Sources: GitHub REST Gists docs [1], go-github GistFile model [2], GitHub Linguist README [3].
🏁 Script executed:
# Find and inspect the url-metadata.ts schema file
find . -name "url-metadata.ts" -type fRepository: playfulprogramming/hoof
Length of output: 195
🏁 Script executed:
# Inspect the url-metadata.ts schema file to see the actual implementation
cat -n ./packages/db/src/schema/url-metadata.ts | head -50Repository: playfulprogramming/hoof
Length of output: 2244
Make language optional in url_metadata_gist_file schema.
The language field is marked as notNull() but GitHub's REST API can omit or return null for files[...].language when Linguist cannot confidently classify the file. This causes insert failures when ingesting gist data.
Schema adjustment
- language: varchar("language", { length: 16 }).notNull(),
+ language: varchar("language", { length: 16 }),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| language: varchar("language", { length: 16 }).notNull(), | |
| language: varchar("language", { length: 16 }), |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/db/src/schema/url-metadata.ts` at line 24, The
url_metadata_gist_file schema currently forces language as notNull
(varchar("language", { length: 16 }).notNull()), which fails when GitHub returns
null/omits files[...].language; change the column definition in
packages/db/src/schema/url-metadata.ts for the url_metadata_gist_file entry to
allow nulls (remove .notNull() or use .nullable() depending on the ORM) so
inserts succeed when language is absent, and run migrations/regen types if
applicable.
| if (typeof data === "undefined" || response.error) { | ||
| throw new Error( | ||
| `GitHub API (getCommits) returned ${response.response.status} ${response.error}`, | ||
| ); |
There was a problem hiding this comment.
Fix incorrect operation name in error message.
Line 28 references getCommits, but this function is getGistById. This makes runtime errors misleading.
🛠️ Proposed fix
- `GitHub API (getCommits) returned ${response.response.status} ${response.error}`,
+ `GitHub API (getGistById) returned ${response.response.status} ${response.error}`,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (typeof data === "undefined" || response.error) { | |
| throw new Error( | |
| `GitHub API (getCommits) returned ${response.response.status} ${response.error}`, | |
| ); | |
| if (typeof data === "undefined" || response.error) { | |
| throw new Error( | |
| `GitHub API (getGistById) returned ${response.response.status} ${response.error}`, | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/github-api/src/getGistById.ts` around lines 26 - 29, The error
message in getGistById incorrectly references getCommits; update the throw in
the block that checks if data is undefined or response.error (in function
getGistById) to mention getGistById instead of getCommits and include the same
response.response.status and response.error details so the runtime message
correctly identifies the failing operation.
Fixes #82 - copies the platform-detectors implementation from the frontend to run as part of the url-metadata task to store/upload github gists, posts, and video embed info.
Summary by CodeRabbit