feat(core): Add Amazon URL parsing and metadata extraction by shaurya-cd · Pull Request #27455 · google-gemini/gemini-cli

shaurya-cd · 2026-05-26T13:36:06Z

Summary

Adds Amazon URL parsing and product metadata extraction support to web-fetch.

This enables the CLI to automatically resolve Amazon short URLs (amzn.in, amzn.to) and extract structured product information for comparison and analysis workflows.
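The description does not say what a "canonical product URL" looks like. One common normalization keys the URL on the product's ASIN, the 10-character identifier after /dp/ or /gp/product/, and drops slugs, ref segments, and query strings. The sketch below shows that approach; it is an assumption, not what amazon-url-parser.ts actually does, and canonicalizeAmazonProductUrl is a hypothetical name.

```typescript
// Hypothetical helper: normalize an already-expanded Amazon product URL to
// https://<host>/dp/<ASIN>. Assumes the ASIN is 10 alphanumerics that follow
// /dp/ or /gp/product/ in the path.
const ASIN_PATH = /\/(?:dp|gp\/product)\/([A-Z0-9]{10})(?:\/|$)/i;

function canonicalizeAmazonProductUrl(url: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return null; // not a parseable URL
  }
  // pathname excludes the query string, so tracking params are dropped here.
  const match = ASIN_PATH.exec(parsed.pathname);
  if (!match) {
    return null; // no recognizable product identifier (e.g. a search page)
  }
  // Keep the regional storefront host; URL already lowercases it.
  return `https://${parsed.hostname}/dp/${match[1].toUpperCase()}`;
}
```

Returning null rather than the input lets the caller decide whether a non-product Amazon page should go through the standard fetch path.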

Details

Features Added

Detects Amazon and Amazon short URLs
Expands shortened Amazon URLs to canonical product URLs
Extracts structured metadata from Amazon product pages:
- Product title
- Price
- Brand
- Model
- Key feature bullets/specifications
Injects extracted metadata into LLM context through web-fetch
Gracefully falls back to standard fetch behavior if extraction fails
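Expanding a short link means following its HTTP redirects. The minimal sketch below assumes the Node 18+ global fetch. It requests each hop with redirect: 'manual' and validates every Location target before the next request, so a shortener cannot bounce the fetch onto an internal host. expandShortUrl, FetchFn, and the isAllowed callback are hypothetical. Whether amzn.to and amzn.in answer HEAD requests has not been checked here, so a GET may be needed in practice.

```typescript
// Hypothetical sketch: follow redirects one hop at a time so every target can
// be vetted before it is requested. fetchFn defaults to the global fetch.
type FetchFn = (url: string, init: RequestInit) => Promise<Response>;

async function expandShortUrl(
  url: string,
  isAllowed: (u: URL) => boolean,
  fetchFn: FetchFn = fetch,
  maxHops = 5,
): Promise<string> {
  let current = new URL(url);
  // Follows at most maxHops redirects.
  for (let hop = 0; hop <= maxHops; hop++) {
    if (!isAllowed(current)) {
      throw new Error(`Blocked redirect target: ${current.href}`);
    }
    // redirect: 'manual' stops fetch from following Location by itself.
    const res = await fetchFn(current.href, { method: 'HEAD', redirect: 'manual' });
    const location = res.headers.get('location');
    if (res.status < 300 || res.status >= 400 || !location) {
      return current.href; // not a redirect: this is the expanded URL
    }
    // Location may be relative; resolve it against the current URL.
    current = new URL(location, current);
  }
  throw new Error(`Too many redirects expanding ${url}`);
}
```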

Implementation Notes

Added utility parser:
- packages/core/src/utils/amazon-url-parser.ts
Added unit tests:
- packages/core/src/utils/amazon-url-parser.test.ts
Integrated metadata extraction into:
- packages/core/src/tools/web-fetch.ts
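The description does not show how the metadata reaches the model. The hypothetical sketch below illustrates the "inject or fall back" flow from the feature list. It renders only the extracted fields as plain text and hands off to the ordinary fetch path on any extraction error or empty result. The AmazonProductMetadata shape mirrors the listed fields and may not match the PR's type. The two callbacks stand in for the real extractor and the web-fetch code.

```typescript
// Hypothetical shape mirroring the fields listed in the PR description.
interface AmazonProductMetadata {
  canonicalUrl: string;
  title?: string;
  price?: string;
  brand?: string;
  model?: string;
  bullets: string[];
}

// Render only the fields that were actually extracted, one per line.
function formatAmazonContext(meta: AmazonProductMetadata): string {
  const lines = [`Amazon product: ${meta.canonicalUrl}`];
  if (meta.title) lines.push(`Title: ${meta.title}`);
  if (meta.price) lines.push(`Price: ${meta.price}`);
  if (meta.brand) lines.push(`Brand: ${meta.brand}`);
  if (meta.model) lines.push(`Model: ${meta.model}`);
  if (meta.bullets.length > 0) {
    lines.push('Key features:', ...meta.bullets.map((b) => `- ${b}`));
  }
  return lines.join('\n');
}

// Try structured extraction first; on an error or an empty result, fall
// back to the ordinary fetch path.
async function fetchWithAmazonFallback(
  url: string,
  extract: (u: string) => Promise<AmazonProductMetadata>,
  standardFetch: (u: string) => Promise<string>,
): Promise<string> {
  try {
    const meta = await extract(url);
    if (meta.title || meta.price) {
      return formatAmazonContext(meta);
    }
  } catch {
    // Extraction failed; the standard fetch below still runs.
  }
  return standardFetch(url);
}
```

One design question worth settling in review is whether security failures, such as a blocked private address, should be swallowed like parse failures (as this sketch does) or rethrown so the fallback does not retry the same URL.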

This implementation intentionally keeps scope lightweight and avoids browser automation or anti-bot bypassing systems to remain maintainable and focused.

Related Issues

Fixes #27448

How to Validate

Build

npm run build --workspace=@google/gemini-cli-core

Typecheck

npm run typecheck --workspace=@google/gemini-cli-core

Run Tests

npx vitest src/utils/amazon-url-parser.test.ts

Manual Validation

Use an Amazon product URL such as:

https://amzn.in/d/00geRr5g

Expected behavior:

URL resolves successfully
Product metadata is extracted
Structured product details are returned instead of raw HTML

Pre-Merge Checklist

Added/updated tests
Validated on Windows
- npm run
- npx

google-cla · 2026-05-26T13:36:24Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

github-actions · 2026-05-26T13:36:49Z

🛑 Action Required: Evaluation Approval

Steering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged.

Maintainers:

Go to the Workflow Run Summary.
Click the yellow 'Review deployments' button.
Select the 'eval-gate' environment and click 'Approve'.

Once approved, the evaluation results will be posted here automatically.

gemini-code-assist · 2026-05-26T13:38:24Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces native support for parsing and extracting structured metadata from Amazon product pages within the web-fetch tool. By enabling the automatic resolution of shortened Amazon links and the extraction of key product details, this change improves the quality of information available to the LLM for comparison and analysis tasks without requiring heavy browser automation.

Highlights

Amazon URL Support: Added logic to detect Amazon and Amazon short URLs (amzn.in, amzn.to) and expand them to canonical product URLs.
Metadata Extraction: Implemented scraping utilities to extract product titles, prices, brands, models, and feature bullets from Amazon product pages.
LLM Integration: Integrated the new parsing logic into the web-fetch tool to inject structured product metadata directly into the LLM context.
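The sketch below shows regex-based extractors of the kind the highlights describe. The functions reuse the PR's names for readability but are not its implementations. The markers they look for are assumptions about Amazon's undocumented markup, which varies by region and changes without notice: id="productTitle", a span.a-offscreen for the price, and the #feature-bullets list. For that reason each function returns undefined or an empty array instead of throwing. The lazy match on the bullets container stops at the first </div>, so nested markup would need a real HTML parser.

```typescript
// Decode the handful of entities common in product text; &amp; goes last so
// "&amp;lt;" becomes "&lt;" rather than "<".
function decodeEntities(s: string): string {
  return s
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Strip tags, decode entities, and collapse whitespace.
function textOf(fragment: string): string {
  return decodeEntities(fragment.replace(/<[^>]*>/g, ' ')).replace(/\s+/g, ' ').trim();
}

// Assumed marker: <span id="productTitle">...</span>
function extractTitle(html: string): string | undefined {
  const m = /<span[^>]*id="productTitle"[^>]*>([\s\S]*?)<\/span>/i.exec(html);
  return m ? textOf(m[1]) || undefined : undefined;
}

// Assumed marker: the first <span class="a-offscreen">$19.99</span>
function extractPrice(html: string): string | undefined {
  const m = /<span[^>]*class="a-offscreen"[^>]*>([^<]+)<\/span>/i.exec(html);
  return m ? textOf(m[1]) || undefined : undefined;
}

// Assumed marker: <li> items inside <div id="feature-bullets">.
function extractBullets(html: string): string[] {
  const section = /<div[^>]*id="feature-bullets"[^>]*>([\s\S]*?)<\/div>/i.exec(html);
  if (!section) return [];
  const bullets: string[] = [];
  const li = /<li\b[^>]*>([\s\S]*?)<\/li>/gi;
  let m: RegExpExecArray | null;
  while ((m = li.exec(section[1])) !== null) {
    const text = textOf(m[1]);
    if (text) bullets.push(text);
  }
  return bullets;
}
```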

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds Amazon product metadata extraction to the WebFetchTool, introducing a new utility to parse Amazon URLs, expand shortened links, extract product details (such as title, price, brand, model, and bullets), and format them into LLM-friendly context. The reviewer raised critical security concerns regarding Server-Side Request Forgery (SSRF). Specifically, they recommended validating that the expanded canonical URL does not point to a private IP before fetching, and tightening the isAmazonUrl hostname validation to prevent domain spoofing and DNS rebinding bypasses.

gemini-code-assist · 2026-05-26T13:41:22Z

+export async function extractAmazonMetadata(
+  url: string,
+): Promise<AmazonProductMetadata> {
+  const canonicalUrl = await expandAmazonUrl(url);
+
+  const html = await fetchAmazonHtml(canonicalUrl);
+
+  return {
+    canonicalUrl,
+    title: extractTitle(html),
+    price: extractPrice(html),
+    bullets: extractBullets(html),
+    brand: extractBrand(html),
+    model: extractModel(html),
+  };
+}


Security Vulnerability: Redirect to Private IP (SSRF)

When expanding shortened Amazon URLs, the redirected canonicalUrl could point to a private IP or localhost. We must validate that the expanded URL is not a private IP before fetching its HTML content.

Suggested change

 export async function extractAmazonMetadata(
   url: string,
 ): Promise<AmazonProductMetadata> {
   const canonicalUrl = await expandAmazonUrl(url);
 
+  if (isPrivateIp(canonicalUrl)) {
+    throw new PrivateIpError(`Access to private network is blocked: ${canonicalUrl}`);
+  }
+
   const html = await fetchAmazonHtml(canonicalUrl);
 
   return {
     canonicalUrl,
     title: extractTitle(html),
     price: extractPrice(html),
     bullets: extractBullets(html),
     brand: extractBrand(html),
     model: extractModel(html),
   };
 }
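The suggestion relies on isPrivateIp and PrivateIpError from './fetch.js', whose behavior is not shown here. The hypothetical isPrivateHostLiteral below illustrates what such a check needs to cover by classifying a URL's literal hostname. It has two caveats. First, it does not resolve DNS, so a public-looking name that points at 10.0.0.1 passes; this is the DNS rebinding case the review mentions. Second, if expandAmazonUrl follows redirects automatically, intermediate hops are requested before a check on the final canonicalUrl ever runs.

```typescript
// Hypothetical helper (not the repo's isPrivateIp): true if the URL's literal
// hostname is localhost, loopback, private, link-local, or unspecified.
// It does NOT resolve DNS names.
function isPrivateHostLiteral(url: string): boolean {
  let host: string;
  try {
    // The WHATWG URL parser normalizes IPv4 shorthand (e.g. 2130706433)
    // to dotted decimal before we inspect it.
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return true; // treat unparseable input as unsafe
  }
  if (host === 'localhost' || host.endsWith('.localhost')) return true;

  // IPv6 literals keep their brackets in URL.hostname.
  if (host.startsWith('[')) {
    const v6 = host.slice(1, -1);
    return (
      v6 === '::1' || v6 === '::' ||
      v6.startsWith('::ffff:') || // IPv4-mapped: conservatively blocked
      /^f[cd]/.test(v6) || // fc00::/7 unique local
      /^fe[89ab]/.test(v6) // fe80::/10 link-local
    );
  }

  const parts = host.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    return false; // a DNS name, not an IPv4 literal
  }
  const [a, b] = parts.map(Number);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```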

gemini-code-assist · 2026-05-26T13:41:22Z

+export function isAmazonUrl(url: string): boolean {
+  try {
+    const parsed = new URL(url);
+
+    const host = parsed.hostname.toLowerCase();
+
+    return AMAZON_HOST_PATTERNS.some((pattern) => host.includes(pattern));
+  } catch {
+    return false;
+  }
+}


The isAmazonUrl function uses host.includes(pattern) for validation, which is insecure. This allows attacker-controlled domains like amazon.attacker.com or amazon.127.0.0.1.nip.io to be incorrectly identified as Amazon URLs, leading to Server-Side Request Forgery (SSRF) and DNS rebinding bypasses. A stricter validation is required, ensuring the hostname accurately matches Amazon domains (e.g., ending with .amazon.<tld> or amzn.to/amzn.in).

Suggested change

 export function isAmazonUrl(url: string): boolean {
   try {
     const parsed = new URL(url);
 
     const host = parsed.hostname.toLowerCase();
 
-    return AMAZON_HOST_PATTERNS.some((pattern) => host.includes(pattern));
+    return /^(.*\.)?(amazon\.[a-z]{2,3}(\.[a-z]{2})?|amzn\.(in|to))$/i.test(host);
   } catch {
     return false;
   }
 }
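The suggested regex fixes the substring bug. However, it still accepts any amazon.<2-3 letters> host, whether or not that domain belongs to Amazon. The stricter alternative sketched here uses a hypothetical AMAZON_DOMAINS allowlist, which you would trim or extend to the storefronts the feature must support. It matches exact domains on label boundaries and rejects non-HTTP schemes.

```typescript
// Hypothetical allowlist of trusted registrable domains.
const AMAZON_DOMAINS = ['amazon.com', 'amazon.in', 'amazon.co.uk', 'amazon.de', 'amzn.to', 'amzn.in'];

function isAllowlistedAmazonUrl(url: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false;
  }
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') return false;
  // Drop a trailing root dot ("amazon.com.") before comparing.
  const host = parsed.hostname.toLowerCase().replace(/\.$/, '');
  // Exact match or a subdomain on a label boundary; "evilamazon.com" and
  // "amazon.attacker.com" both fail.
  return AMAZON_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
}
```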

gemini-code-assist · 2026-05-26T13:41:22Z

+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { fetchWithTimeout } from './fetch.js';


Import Required Utilities for SSRF Prevention

Import isPrivateIp and PrivateIpError to validate the expanded canonical URL before fetching its HTML content.

Suggested change

-import { fetchWithTimeout } from './fetch.js';
+import { fetchWithTimeout, isPrivateIp, PrivateIpError } from './fetch.js';

shaurya-cd requested review from a team as code owners May 26, 2026 13:36

shaurya-cd had a problem deploying to eval-gate May 26, 2026 13:36 — with GitHub Actions Error

gemini-code-assist Bot reviewed May 26, 2026


gemini-cli Bot added priority/p3 Backlog - a good idea but not currently a priority. area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality labels May 26, 2026

shaurya-cd had a problem deploying to eval-gate May 27, 2026 03:54 — with GitHub Actions Error

shaurya-cd had a problem deploying to eval-gate May 27, 2026 03:56 — with GitHub Actions Error

shaurya-cd had a problem deploying to eval-gate May 27, 2026 10:12 — with GitHub Actions Error

shaurya-cd had a problem deploying to eval-gate May 27, 2026 10:18 — with GitHub Actions Error

shaurya-cd force-pushed the feature/amazon-url-unfurling branch from a62d8e0 to da92498 May 27, 2026 10:32

shaurya-cd requested a review from a team as a code owner May 27, 2026 10:32

shaurya-cd had a problem deploying to eval-gate May 27, 2026 10:33 — with GitHub Actions Error

shaurya-cd had a problem deploying to eval-gate May 27, 2026 10:34 — with GitHub Actions Error

shaurya-cd added 3 commits May 27, 2026 16:06

feat(core): Add Amazon URL parsing and metadata extraction

0c36cb8

fix(core): harden amazon URL validation against SSRF

0e29d1b

feat: add utility to fetch and parse Amazon product metadata

9a83b59

shaurya-cd force-pushed the feature/amazon-url-unfurling branch from d34ad0d to 9a83b59 May 27, 2026 10:37

shaurya-cd had a problem deploying to eval-gate May 27, 2026 10:38 — with GitHub Actions Error

Merge branch 'main' into feature/amazon-url-unfurling

cc39124

shaurya-cd requested a deployment to eval-gate May 28, 2026 12:54 — with GitHub Actions Waiting

github-actions Bot mentioned this pull request May 29, 2026

📊 AI CLI Tools Community Daily Report 2026-05-29 zx0828/big_model_radar#75

Open

shaurya-cd wants to merge 4 commits into google-gemini:main from shaurya-cd:feature/amazon-url-unfurling
