Collaborator

@grantdfoster grantdfoster commented Sep 4, 2025

What

Adds types and arguments for web and LLM actor requests. Supports unmarshalling and plugs into the existing patterns used by other job types. Removes web as a basic capability: it now requires an Apify API key alongside an LLM provider key.
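
As a rough illustration of the capability change described above, the sketch below gates the web capability on both keys being present. It is not the code from this PR: the `Capability` constants, the `Keys` struct, and `AvailableCapabilities` are hypothetical names chosen for the example, and the real definitions in types/jobs.go may look quite different.

```go
package main

import "fmt"

// Capability is a hypothetical stand-in for the job capability type.
type Capability string

const (
	// CapTelemetry is an assumed example of a capability that needs no keys.
	CapTelemetry Capability = "telemetry"
	// CapWeb is an assumed name for the web scraping capability.
	CapWeb Capability = "web"
)

// Keys is a hypothetical container for the credentials a worker was started with.
type Keys struct {
	ApifyAPIKey string
	LLMAPIKey   string
}

// AvailableCapabilities returns the always-available capabilities and adds
// web only when both the Apify key and the LLM provider key are set.
func AvailableCapabilities(k Keys) []Capability {
	caps := []Capability{CapTelemetry}
	if k.ApifyAPIKey != "" && k.LLMAPIKey != "" {
		caps = append(caps, CapWeb)
	}
	return caps
}

func main() {
	// Without keys, web is no longer advertised.
	fmt.Println(AvailableCapabilities(Keys{}))
	// With both keys, web becomes available.
	fmt.Println(AvailableCapabilities(Keys{ApifyAPIKey: "apify-key", LLMAPIKey: "llm-key"}))
}
```

Under this assumption, a worker holding only one of the two keys would not advertise web at all, which is the behavior the description implies.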

Why

We want to support web scraping with an LLM summary of keywords and topics for indexing. This PR sets up the types and arguments to support both of those actors.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive support for web scraping with LLM processing capabilities. It introduces new type definitions, arguments handling, and validation for both web scraping operations and LLM processing of the scraped content.

  • Adds WebArguments and LLMProcessorArguments with validation and marshalling support
  • Updates web job capabilities to require API keys instead of being always available
  • Includes comprehensive test coverage for the new argument types

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| types/web.go | Defines web scraping request/result types and query enums |
| types/llm.go | Defines LLM processor request/result types for content processing |
| types/jobs.go | Updates web job capabilities and removes always-available web capabilities |
| args/web.go | Implements WebArguments with validation, defaults, and conversion methods |
| args/llm.go | Implements LLMProcessorArguments with validation, defaults, and conversion methods |
| args/web_test.go | Comprehensive test suite for WebArguments functionality |
| args/llm_test.go | Comprehensive test suite for LLMProcessorArguments functionality |
| args/unmarshaller.go | Updates interface definitions and web argument unmarshalling |
| args/unmarshaller_test.go | Updates tests to reflect new WebArguments type |

Collaborator

@mcamou mcamou left a comment

Just want to restate the same comment I added to https://github.com/masa-finance/tee-indexer/pull/399: we're currently using web search in the tee-indexer E2E tests, since it doesn't require any tokens or API keys. Should we keep it around for internal use, or do you have any ideas how to get around that?

@grantdfoster grantdfoster merged commit 3114b62 into main Sep 12, 2025
6 checks passed

3 participants