
feat(brightdata): add Bright Data integration with 8 tools #4183

Merged
waleedlatif1 merged 6 commits into staging from waleedlatif1/add-brightdata
Apr 15, 2026

Collaborator

waleedlatif1 commented Apr 15, 2026

Summary

  • Adds complete Bright Data integration with 8 tools: Web Unlocker (scrape URL), SERP search, Discover, sync scrape, async scrape dataset, snapshot status, download snapshot, and cancel snapshot
  • Block with operation dropdown, conditional fields per operation, and proper param mapping
  • All tools validated against official Bright Data API docs with correct endpoints, params, and response handling
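
To make the dropdown and conditional-field idea concrete, here is a minimal sketch, not the block's actual definition. The operation values come from the review flowchart and the tool IDs follow the `brightdata_<operation>` pattern from the tools table. The `FieldRule` shape and the field-to-operation assignments are assumptions made for illustration; only `country` and `numResults` are field names mentioned in this PR.

```typescript
// Operation values as they appear in the review flowchart.
const OPERATIONS = [
  'scrape_url', 'serp_search', 'discover', 'sync_scrape',
  'scrape_dataset', 'snapshot_status', 'download_snapshot', 'cancel_snapshot',
] as const
type Operation = (typeof OPERATIONS)[number]

// Tool IDs follow the brightdata_<operation> pattern from the tools table.
const toolIdFor = (operation: Operation): string => `brightdata_${operation}`

// Hypothetical field descriptor: which operations show the field, and
// whether it only appears in advanced mode.
interface FieldRule {
  id: string
  operations: Operation[]
  advanced?: boolean
}

// These assignments are guesses for illustration, not the block's real config.
const FIELD_RULES: FieldRule[] = [
  { id: 'url', operations: ['scrape_url'] },
  { id: 'query', operations: ['serp_search', 'discover'] },
  { id: 'snapshotId', operations: ['snapshot_status', 'download_snapshot', 'cancel_snapshot'] },
  { id: 'country', operations: ['scrape_url', 'serp_search', 'discover'], advanced: true },
  { id: 'numResults', operations: ['serp_search', 'discover'], advanced: true },
]

// Return the IDs of the fields to render for the selected operation.
function visibleFields(operation: Operation, advancedMode: boolean): string[] {
  return FIELD_RULES
    .filter((rule) => rule.operations.includes(operation))
    .filter((rule) => advancedMode || !rule.advanced)
    .map((rule) => rule.id)
}
```

Keeping visibility rules as data means adding a ninth operation only touches the tables, not the rendering logic.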

Tools

| Tool | API Endpoint | Description |
|---|---|---|
| brightdata_scrape_url | POST /request | Fetch content via Web Unlocker |
| brightdata_serp_search | POST /request | Search Google/Bing/DuckDuckGo/Yandex |
| brightdata_discover | POST /discover | AI-powered web discovery with intent ranking |
| brightdata_sync_scrape | POST /datasets/v3/scrape | Synchronous dataset scrape (up to 20 URLs) |
| brightdata_scrape_dataset | POST /datasets/v3/trigger | Async dataset scrape trigger |
| brightdata_snapshot_status | GET /datasets/v3/progress/{id} | Check async job status |
| brightdata_download_snapshot | GET /datasets/v3/snapshot/{id} | Download completed results |
| brightdata_cancel_snapshot | POST /datasets/v3/snapshot/{id}/cancel | Cancel active job |
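
For reference, the endpoint list above can be written as a lookup table. This is a sketch only: the methods and paths are transcribed from the table, while `BRIGHTDATA_ENDPOINTS`, `Endpoint`, and `resolvePath` are illustrative names that do not appear in the PR.

```typescript
type HttpMethod = 'GET' | 'POST'

interface Endpoint {
  method: HttpMethod
  path: string // '{id}' marks the snapshot ID placeholder
}

// Methods and paths transcribed from the tools table.
const BRIGHTDATA_ENDPOINTS: Record<string, Endpoint> = {
  brightdata_scrape_url: { method: 'POST', path: '/request' },
  brightdata_serp_search: { method: 'POST', path: '/request' },
  brightdata_discover: { method: 'POST', path: '/discover' },
  brightdata_sync_scrape: { method: 'POST', path: '/datasets/v3/scrape' },
  brightdata_scrape_dataset: { method: 'POST', path: '/datasets/v3/trigger' },
  brightdata_snapshot_status: { method: 'GET', path: '/datasets/v3/progress/{id}' },
  brightdata_download_snapshot: { method: 'GET', path: '/datasets/v3/snapshot/{id}' },
  brightdata_cancel_snapshot: { method: 'POST', path: '/datasets/v3/snapshot/{id}/cancel' },
}

// Substitute the snapshot ID into a path template, URL-encoding it.
function resolvePath(toolId: string, snapshotId?: string): string {
  const endpoint = BRIGHTDATA_ENDPOINTS[toolId]
  if (!endpoint) throw new Error(`Unknown tool: ${toolId}`)
  if (!endpoint.path.includes('{id}')) return endpoint.path
  if (!snapshotId) throw new Error(`${toolId} requires a snapshot ID`)
  return endpoint.path.replace('{id}', encodeURIComponent(snapshotId))
}
```

Note that `brightdata_scrape_url` and `brightdata_serp_search` share `POST /request`; they differ in the target URL sent in the request body.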

Files Changed

  • apps/sim/tools/brightdata/ — 8 tool files, types, and barrel export
  • apps/sim/blocks/blocks/brightdata.ts — Block definition
  • apps/sim/components/icons.tsx — BrightDataIcon
  • apps/sim/tools/registry.ts — Tool registrations
  • apps/sim/blocks/registry.ts — Block registration
  • apps/docs/ — Auto-generated docs and icon mapping

Test plan

  • Verify block appears in toolbar under Tools category
  • Test scrape URL operation with a Web Unlocker zone
  • Test SERP search returns structured results
  • Test Discover API returns results with relevance scores
  • Test async dataset trigger returns snapshot ID
  • Test sync scrape returns data or falls back to async (202)
  • Test snapshot status/download/cancel lifecycle
  • Verify advanced mode fields (country, language, numResults, includeContent) toggle correctly
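
The sync-scrape fallback in the test plan can be sketched as a pure function over the HTTP status and parsed body. The `isAsync` and `snapshotId` output names come from the review flowchart in this PR. The `snapshot_id` body key, the `data` field name, and the wrapping of a single record into an array are assumptions for illustration.

```typescript
// Hypothetical output shape built around the isAsync/snapshotId fields
// named in the review.
interface SyncScrapeOutput {
  isAsync: boolean
  data: unknown[] | null
  snapshotId: string | null
}

// Interpret a sync-scrape response from its HTTP status and parsed JSON body.
// Assumption: a 202 body carries the job ID under `snapshot_id`.
function interpretSyncScrape(status: number, body: unknown): SyncScrapeOutput {
  if (status === 202) {
    const id = (body as { snapshot_id?: unknown } | null)?.snapshot_id
    return { isAsync: true, data: null, snapshotId: typeof id === 'string' ? id : null }
  }
  if (status >= 200 && status < 300) {
    // A single record is wrapped so callers always see an array.
    return { isAsync: false, data: Array.isArray(body) ? body : [body], snapshotId: null }
  }
  throw new Error(`Sync scrape failed with HTTP ${status}`)
}
```

A caller would branch on `isAsync`: read `data` directly when it is false, otherwise hand `snapshotId` to the snapshot status and download operations.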

Add complete Bright Data integration supporting Web Unlocker, SERP API,
Discover API, and Web Scraper dataset operations. Includes scrape URL,
SERP search, discover, sync scrape, scrape dataset, snapshot status,
download snapshot, and cancel snapshot tools.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel bot commented Apr 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| docs | Skipped | Skipped | Apr 15, 2026 7:36pm |



cursor bot commented Apr 15, 2026

PR Summary

Medium Risk
Adds a new external-API integration that handles user-provided API keys and multiple request/response transforms; misconfiguration or API contract mismatches could break scraping/search workflows.

Overview
Adds a new Bright Data integration end-to-end: a brightdata block with an operation selector and conditional inputs that maps to eight new tools (brightdata_scrape_url, brightdata_serp_search, brightdata_discover, brightdata_sync_scrape, brightdata_scrape_dataset, brightdata_snapshot_status, brightdata_download_snapshot, brightdata_cancel_snapshot).

Registers the block and tools in the SIM registries, adds BrightDataIcon and icon mappings for docs and the landing integrations page, and publishes a new docs page plus tools meta.json entry. Also adjusts Agiloft’s bgColor to #FFFFFF in both the integrations data JSON and block config.

Reviewed by Cursor Bugbot for commit eb50282.

Contributor

greptile-apps bot commented Apr 15, 2026

Greptile Summary

Adds a complete Bright Data integration with 8 tools covering Web Unlocker scraping, SERP search, Discover, synchronous/asynchronous dataset scraping, and snapshot lifecycle management — each mapped to a conditional sub-block in a single brightdata block.

  • The discover tool's transformResponse hardcodes query: null in its output, so any workflow step reading {{brightdata.query}} after a Discover operation will always get null. The value should be extracted from data.query in the parsed response (see inline comment).
  • apps/sim/blocks/blocks/agiloft.ts has an unrelated bgColor change (#263A5C → #FFFFFF) that appears to be a stray edit.

Confidence Score: 4/5

Safe to merge after fixing the hardcoded null query output in the Discover tool.

One P1 defect: the Discover tool's query output is always null, silently dropping a documented output field that downstream workflow steps would read. All other tools are well-structured and follow established patterns. The unrelated Agiloft bgColor change is a P2 worth confirming but not blocking.

apps/sim/tools/brightdata/discover.ts — hardcoded null query output

Important Files Changed

| Filename | Overview |
|---|---|
| apps/sim/tools/brightdata/discover.ts | Discover tool implementation; query output is hardcoded to null instead of being extracted from the API response |
| apps/sim/tools/brightdata/sync_scrape.ts | Sync scrape tool with correct 202/async fallback handling and proper JSON body wrapping in {input: [...]} |
| apps/sim/tools/brightdata/serp_search.ts | SERP search tool; correctly builds engine-specific URLs with brd_json=1 and handles JSON/raw content types |
| apps/sim/blocks/blocks/brightdata.ts | Block definition with 8 operations, conditional fields, and params switch; well-structured with good use of mode:advanced for optional fields |
| apps/sim/tools/brightdata/types.ts | Type definitions for all 8 tool param/response pairs; comprehensive and correctly typed |
| apps/sim/blocks/blocks/agiloft.ts | Unrelated change: bgColor switched from #263A5C to #FFFFFF, likely accidental or undiscussed cleanup |
| apps/sim/blocks/registry.ts | BrightDataBlock registered after brandfetch but before box; box was already out of order (pre-existing), minor style issue |
| apps/sim/tools/registry.ts | All 8 Bright Data tools registered correctly; minor alphabetical ordering mismatch with box_ entries (pre-existing) |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    BD[BrightData Block] --> OP{Operation}
    OP -->|scrape_url| WU[Web Unlocker\nPOST /request]
    OP -->|serp_search| SERP[SERP Search\nPOST /request\nvia search engine URL]
    OP -->|discover| DISC[Discover\nPOST /discover]
    OP -->|sync_scrape| SYNC[Sync Scrape\nPOST /datasets/v3/scrape]
    OP -->|scrape_dataset| ASYNC[Scrape Dataset\nPOST /datasets/v3/trigger]
    OP -->|snapshot_status| STATUS[Snapshot Status\nGET /datasets/v3/progress/id]
    OP -->|download_snapshot| DL[Download Snapshot\nGET /datasets/v3/snapshot/id]
    OP -->|cancel_snapshot| CANCEL[Cancel Snapshot\nPOST /datasets/v3/snapshot/id/cancel]
    SYNC -->|202 timeout| ASYNC_FB[Falls back to async\nisAsync=true, returns snapshotId]
    SYNC -->|200 ok| SYNC_DATA[Returns data array\nisAsync=false]
    ASYNC --> SNAP_ID[Returns snapshotId]
    SNAP_ID --> STATUS
    STATUS -->|ready| DL
    STATUS -->|running/starting| STATUS
    SNAP_ID --> CANCEL
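
The snapshot lifecycle in the flowchart reduces to a small decision function that a polling loop can call after each status check. This is a sketch under assumptions: `ready`, `running`, and `starting` come from the flowchart, `failed` is an assumed terminal status, and `nextStep`, the delay, and the attempt cap are illustrative choices rather than anything in the PR.

```typescript
type NextStep =
  | { action: 'download' }
  | { action: 'poll'; delayMs: number }
  | { action: 'fail'; reason: string }

// Decide what to do after one snapshot status check.
// `attempt` is zero-based; polling stops once maxAttempts checks have been made.
function nextStep(status: string, attempt: number, maxAttempts = 60, delayMs = 5000): NextStep {
  if (status === 'ready') return { action: 'download' }
  if (status === 'running' || status === 'starting') {
    if (attempt + 1 >= maxAttempts) {
      return { action: 'fail', reason: `not ready after ${maxAttempts} checks` }
    }
    return { action: 'poll', delayMs }
  }
  // 'failed' (assumed) and any unrecognized status end the loop.
  return { action: 'fail', reason: `unexpected status: ${status}` }
}
```

A workflow would call the status operation, pass the result to `nextStep`, then either wait `delayMs` and repeat, run the download operation, or stop and optionally cancel the snapshot.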

Comments Outside Diff (1)

  1. apps/sim/tools/brightdata/discover.ts, lines 568-576

    P1 query output is always null

    The query field in the returned output is hardcoded to null, making the advertised output effectively useless. Any downstream workflow step that reads {{brightdata.query}} from a Discover operation will always receive null. The API response likely contains the echoed query at data.query — extract it the same way serp_search.ts reads data?.general?.query.
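
    A minimal sketch of the suggested fix, assuming the parsed response echoes the query at `data.query` as the comment supposes. `DiscoverOutput`, `toDiscoverOutput`, and the `results` key are hypothetical names; the actual shape in discover.ts is not shown in this thread.

```typescript
// Hypothetical output shape; only `query` is discussed in the review.
interface DiscoverOutput {
  results: unknown[]
  query: string | null
}

// Read the echoed query from the parsed response instead of hardcoding null.
// Falls back to null only when the API did not return a string.
function toDiscoverOutput(data: unknown): DiscoverOutput {
  const record = (data ?? {}) as { query?: unknown; results?: unknown }
  return {
    results: Array.isArray(record.results) ? record.results : [],
    query: typeof record.query === 'string' ? record.query : null,
  }
}
```

    If the API turns out not to echo the query, passing the request parameter through to the output would serve the same purpose.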

Reviews (3): Last reviewed commit: "fix(brightdata): disable incompatible Du..."

Comment thread apps/sim/tools/brightdata/serp_search.ts Outdated
- Fix truncated "Download Snapshot" description in integrations.json and docs
- Map engine-specific query params (num/count/numdoc, hl/setLang/lang/kl,
  gl/cc/lr) per search engine instead of using Google-specific params for all
- Attempt to parse snapshot_id from cancel/download response bodies instead
  of hardcoding null

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread apps/sim/tools/brightdata/serp_search.ts Outdated
…tion

The docs generator regex truncates at inner quotes. Reword the
download_snapshot description to avoid embedded double quotes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
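
To illustrate the truncation described in this commit message: the pattern below is a hypothetical stand-in for the docs generator's regex, which is not shown in this PR, and both description strings are invented for the example. A character class that stops at any quote character cuts the capture short at the first inner quote.

```typescript
// Hypothetical pattern of the kind described: the capture ends at the first
// quote character of either kind.
const naiveDescription = /description:\s*['"]([^'"]*)['"]/

// Invented sample lines, not the tool's actual description text.
const withInnerQuotes = `description: 'Download results when status is "ready"'`
const reworded = `description: 'Download results when the snapshot is ready'`

function extractDescription(source: string): string | null {
  const match = naiveDescription.exec(source)
  return match?.[1] ?? null
}

const truncated = extractDescription(withInnerQuotes) // stops before "ready"
const complete = extractDescription(reworded) // full sentence survives
```

Rewording the description sidesteps the problem without touching the generator, which matches the approach taken in this commit.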
DuckDuckGo kl expects region-language format (us-en) and Yandex lr
expects numeric region IDs (213), not plain two-letter codes. Disable
these URL-level params since Bright Data normalizes localization through
the body-level country param.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
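
The engine-specific parameter mapping from the two commits above can be sketched as a lookup table. The parameter names (num/count/numdoc, hl/setLang/lang, gl/cc) and `brd_json=1` come from this PR. The pairing of each name with an engine, the base URLs, and the `buildSerpUrl` helper are assumptions for illustration. DuckDuckGo's `kl` and Yandex's `lr` are left out, as in the final commit.

```typescript
type Engine = 'google' | 'bing' | 'duckduckgo' | 'yandex'

interface EngineSpec {
  base: string
  queryParam: string
  count?: string // results-per-page parameter
  language?: string
  country?: string
}

// Assumed pairing of parameter names to engines.
const ENGINES: Record<Engine, EngineSpec> = {
  google: { base: 'https://www.google.com/search', queryParam: 'q', count: 'num', language: 'hl', country: 'gl' },
  bing: { base: 'https://www.bing.com/search', queryParam: 'q', count: 'count', language: 'setLang', country: 'cc' },
  // kl (DuckDuckGo) and lr (Yandex) are omitted: they expect region-language
  // codes and numeric region IDs, not plain two-letter codes.
  duckduckgo: { base: 'https://duckduckgo.com/', queryParam: 'q' },
  yandex: { base: 'https://yandex.com/search/', queryParam: 'text', count: 'numdoc', language: 'lang' },
}

interface SerpOptions {
  numResults?: number
  language?: string
  country?: string
}

// Build the search-engine URL that is sent as the target of POST /request.
function buildSerpUrl(engine: Engine, query: string, options: SerpOptions = {}): string {
  const spec = ENGINES[engine]
  const url = new URL(spec.base)
  url.searchParams.set(spec.queryParam, query)
  if (spec.count && options.numResults !== undefined) {
    url.searchParams.set(spec.count, String(options.numResults))
  }
  if (spec.language && options.language) url.searchParams.set(spec.language, options.language)
  if (spec.country && options.country) url.searchParams.set(spec.country, options.country)
  url.searchParams.set('brd_json', '1') // request parsed JSON results
  return url.toString()
}
```

Options an engine has no parameter for are dropped silently here; localization for those engines would then rely on the body-level country param mentioned in the commit.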
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit eb50282.

Comment thread apps/sim/blocks/blocks/agiloft.ts
waleedlatif1 merged commit a39dc15 into staging on Apr 15, 2026
14 checks passed
waleedlatif1 deleted the waleedlatif1/add-brightdata branch on April 15, 2026 19:47
Sg312 added a commit that referenced this pull request Apr 15, 2026
…mat, logs performance improvements

fix(csp): add missing analytics domains, remove unsafe-eval, fix workspace CSP gap (#4179)
fix(landing): return 404 for invalid dynamic route slugs (#4182)
improvement(seo): optimize sitemaps, robots.txt, and core web vitals across sim and docs (#4170)
fix(gemini): support structured output with tools on Gemini 3 models (#4184)
feat(brightdata): add Bright Data integration with 8 tools (#4183)
fix(mothership): fix superagent credentials (#4185)
fix(logs): close sidebar when selected log disappears from filtered list; cleanup (#4186)