diff --git a/docs/case_studies/deep-research-bench-pareto-analysis/notebook.ipynb b/docs/case_studies/deep-research-bench-pareto-analysis/notebook.ipynb
index 9b740147..b50c692e 100644
--- a/docs/case_studies/deep-research-bench-pareto-analysis/notebook.ipynb
+++ b/docs/case_studies/deep-research-bench-pareto-analysis/notebook.ipynb
@@ -667,7 +667,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## 6. Choosing the right effort level\n\n**`LOW`** is the default, and it's the right choice for most tasks that don't require web research — classifying rows, extracting fields, reformatting data. It runs a single LLM call with no tool use, so it's fast and cheap. Because DRB measures agentic information retrieval, the DRB score for the `LOW` model isn't very meaningful here: in practice `LOW` doesn't do research at all.\n\n**`MEDIUM`** turns on the research agent. Gemini 3 Flash (low) sits on the cost Pareto frontier — it's the cheapest model that delivers strong research accuracy. Use this when you need agents to look things up on the web but want to keep costs down.\n\n**`HIGH`** uses Claude 4.6 Opus (low), which sits on both the cost and speed Pareto frontiers. It's the fastest high-accuracy model on DRB and delivers the best score-per-dollar among top-tier models. Use this when accuracy matters and you're willing to pay more per row.\n\n**Want the absolute best accuracy?** You can override the model directly by setting `effort_level=None` and specifying all parameters explicitly:\n\n```python\nfrom everyrow.ops import agent_map\nfrom everyrow.task import LLM\n\nresult = await agent_map(\n    task=\"Find each company's latest funding round\",\n    input=companies_df,\n    effort_level=None,\n    llm=LLM.CLAUDE_4_6_OPUS_HIGH,\n    iteration_budget=10,\n    include_research=True,\n)\n```\n\nClaude 4.6 Opus (high) is the top-scoring model on DRB, but it costs roughly twice as much and takes about three times as long as the `HIGH` default. For most workloads the `HIGH` preset already captures the bulk of that accuracy at a fraction of the price — but the option is there when you need it.\n\nWe re-run these benchmarks as new models launch, so the model behind each effort level may change over time. You always get the current best trade-off without changing your code."
+   "source": "## 6. Choosing the right effort level\n\n**`LOW`** is the default, and it's the right choice for most tasks that don't require web research — classifying rows, extracting fields, reformatting data. It runs a single LLM call with no tool use, so it's fast and cheap. Because DRB measures agentic information retrieval, the DRB score for the `LOW` model isn't very meaningful here: in practice `LOW` doesn't do research at all.\n\n**`MEDIUM`** turns on the research agent. Gemini 3 Flash (low) sits on the cost Pareto frontier — it's the cheapest model that delivers strong research accuracy. Use this when you need agents to look things up on the web but want to keep costs down.\n\n**`HIGH`** uses Claude 4.6 Opus (low), which sits on both the cost and speed Pareto frontiers. It's the fastest high-accuracy model on DRB and delivers the best score-per-dollar among top-tier models. Use this when accuracy matters and you're willing to pay more per row.\n\n**Want the absolute best accuracy?** You can override the model directly by setting `effort_level=None` and specifying all parameters explicitly:\n\n```python\nfrom everyrow.ops import agent_map\nfrom everyrow.task import LLM\n\nresult = await agent_map(\n    task=\"Find each company's latest funding round\",\n    input=companies_df,\n    effort_level=None,\n    llm=LLM.CLAUDE_4_6_OPUS_HIGH,\n    iteration_budget=10,\n    include_reasoning=True,\n)\n```\n\nClaude 4.6 Opus (high) is the top-scoring model on DRB, but it costs roughly twice as much and takes about three times as long as the `HIGH` default. For most workloads the `HIGH` preset already captures the bulk of that accuracy at a fraction of the price — but the option is there when you need it.\n\nWe re-run these benchmarks as new models launch, so the model behind each effort level may change over time. You always get the current best trade-off without changing your code."
   }
  ],
  "metadata": {
diff --git a/src/everyrow/generated/models/agent_map_operation.py b/src/everyrow/generated/models/agent_map_operation.py
index acde7ee4..cdc47a5b 100644
--- a/src/everyrow/generated/models/agent_map_operation.py
+++ b/src/everyrow/generated/models/agent_map_operation.py
@@ -33,15 +33,17 @@ class AgentMapOperation:
             not provided, use default answer schema.
         llm (LLMEnumPublic | None | Unset): LLM to use for each agent. Required when effort_level is not set.
         effort_level (None | PublicEffortLevel | Unset): Effort level preset: low (quick), medium (balanced), high
-            (thorough). Mutually exclusive with llm/iteration_budget/include_research - use either a preset or custom
+            (thorough). Mutually exclusive with llm/iteration_budget/include_reasoning - use either a preset or custom
             params, not both. If not specified, you must provide all individual parameters (llm, iteration_budget,
-            include_research).
+            include_reasoning).
         join_with_input (bool | Unset): If True, merge agent output with input row. If False, output only agent
             results. Default: True.
         iteration_budget (int | None | Unset): Number of agent iterations per row (0-20). Required when effort_level
             is not set.
-        include_research (bool | None | Unset): Include research notes in the response. Required when effort_level is
+        include_reasoning (bool | None | Unset): Include reasoning notes in the response. Required when effort_level is
             not set.
+        include_research (bool | None | Unset): Deprecated: use include_reasoning instead. Include research notes in the
+            response. Required when effort_level is not set.
         enforce_row_independence (bool | Unset): If True, each agent runs completely independently without being
             affected by other agents. Disables adaptive budget adjustment and straggler management, ensuring agents are
             not hurried or given iteration limits based on other agents' progress. Use this when consistent per-row behavior is
@@ -56,6 +58,7 @@ class AgentMapOperation:
     effort_level: None | PublicEffortLevel | Unset = UNSET
     join_with_input: bool | Unset = True
     iteration_budget: int | None | Unset = UNSET
+    include_reasoning: bool | None | Unset = UNSET
     include_research: bool | None | Unset = UNSET
     enforce_row_independence: bool | Unset = False
     additional_properties: dict[str, Any] = _attrs_field(init=False, factory=dict)
@@ -117,6 +120,12 @@ def to_dict(self) -> dict[str, Any]:
         else:
             iteration_budget = self.iteration_budget
 
+        include_reasoning: bool | None | Unset
+        if isinstance(self.include_reasoning, Unset):
+            include_reasoning = UNSET
+        else:
+            include_reasoning = self.include_reasoning
+
         include_research: bool | None | Unset
         if isinstance(self.include_research, Unset):
             include_research = UNSET
@@ -145,6 +154,8 @@ def to_dict(self) -> dict[str, Any]:
             field_dict["join_with_input"] = join_with_input
         if iteration_budget is not UNSET:
             field_dict["iteration_budget"] = iteration_budget
+        if include_reasoning is not UNSET:
+            field_dict["include_reasoning"] = include_reasoning
         if include_research is not UNSET:
             field_dict["include_research"] = include_research
         if enforce_row_independence is not UNSET:
@@ -271,6 +282,15 @@ def _parse_iteration_budget(data: object) -> int | None | Unset:
 
         iteration_budget = _parse_iteration_budget(d.pop("iteration_budget", UNSET))
 
+        def _parse_include_reasoning(data: object) -> bool | None | Unset:
+            if data is None:
+                return data
+            if isinstance(data, Unset):
+                return data
+            return cast(bool | None | Unset, data)
+
+        include_reasoning = _parse_include_reasoning(d.pop("include_reasoning", UNSET))
+
         def _parse_include_research(data: object) -> bool | None | Unset:
             if data is None:
                 return data
@@ -291,6 +311,7 @@ def _parse_include_research(data: object) -> bool | None | Unset:
             effort_level=effort_level,
             join_with_input=join_with_input,
             iteration_budget=iteration_budget,
+            include_reasoning=include_reasoning,
             include_research=include_research,
             enforce_row_independence=enforce_row_independence,
         )
diff --git a/src/everyrow/generated/models/llm_enum_public.py b/src/everyrow/generated/models/llm_enum_public.py
index 2b35008e..dc735e91 100644
--- a/src/everyrow/generated/models/llm_enum_public.py
+++ b/src/everyrow/generated/models/llm_enum_public.py
@@ -18,6 +18,11 @@ class LLMEnumPublic(str, Enum):
     CLAUDE_4_6_OPUS_MAX = "CLAUDE_4_6_OPUS_MAX"
     CLAUDE_4_6_OPUS_MEDIUM = "CLAUDE_4_6_OPUS_MEDIUM"
     CLAUDE_4_6_OPUS_NT = "CLAUDE_4_6_OPUS_NT"
+    CLAUDE_4_6_SONNET_HIGH = "CLAUDE_4_6_SONNET_HIGH"
+    CLAUDE_4_6_SONNET_LOW = "CLAUDE_4_6_SONNET_LOW"
+    CLAUDE_4_6_SONNET_MAX = "CLAUDE_4_6_SONNET_MAX"
+    CLAUDE_4_6_SONNET_MEDIUM = "CLAUDE_4_6_SONNET_MEDIUM"
+    CLAUDE_4_6_SONNET_NT = "CLAUDE_4_6_SONNET_NT"
     GEMINI_3_FLASH_HIGH = "GEMINI_3_FLASH_HIGH"
     GEMINI_3_FLASH_LOW = "GEMINI_3_FLASH_LOW"
     GEMINI_3_FLASH_MEDIUM = "GEMINI_3_FLASH_MEDIUM"
diff --git a/src/everyrow/generated/models/single_agent_operation.py b/src/everyrow/generated/models/single_agent_operation.py
index 7c26b627..9a720a09 100644
--- a/src/everyrow/generated/models/single_agent_operation.py
+++ b/src/everyrow/generated/models/single_agent_operation.py
@@ -33,14 +33,16 @@ class SingleAgentOperation:
             If not provided, use default answer schema.
         llm (LLMEnumPublic | None | Unset): LLM to use for the agent. Required when effort_level is not set.
         effort_level (None | PublicEffortLevel | Unset): Effort level preset: low (quick), medium (balanced), high
-            (thorough). Mutually exclusive with llm/iteration_budget/include_research - use either a preset or custom
+            (thorough). Mutually exclusive with llm/iteration_budget/include_reasoning - use either a preset or custom
             params, not both. If not specified, you must provide all individual parameters (llm, iteration_budget,
-            include_research).
+            include_reasoning).
         return_list (bool | Unset): If True, treat the output as a list of responses instead of a single response.
             Default: True.
         iteration_budget (int | None | Unset): Number of agent iterations (0-20). Required when effort_level is not set.
-        include_research (bool | None | Unset): Include research notes in the response. Required when effort_level is
+        include_reasoning (bool | None | Unset): Include reasoning notes in the response. Required when effort_level is
             not set.
+        include_research (bool | None | Unset): Deprecated: use include_reasoning instead. Include research notes in the
+            response. Required when effort_level is not set.
     """
 
     input_: list[SingleAgentOperationInputType1Item] | SingleAgentOperationInputType2 | UUID
@@ -51,6 +53,7 @@
     effort_level: None | PublicEffortLevel | Unset = UNSET
     return_list: bool | Unset = True
     iteration_budget: int | None | Unset = UNSET
+    include_reasoning: bool | None | Unset = UNSET
     include_research: bool | None | Unset = UNSET
     additional_properties: dict[str, Any] = _attrs_field(init=False, factory=dict)
 
@@ -111,6 +114,12 @@ def to_dict(self) -> dict[str, Any]:
         else:
             iteration_budget = self.iteration_budget
 
+        include_reasoning: bool | None | Unset
+        if isinstance(self.include_reasoning, Unset):
+            include_reasoning = UNSET
+        else:
+            include_reasoning = self.include_reasoning
+
         include_research: bool | None | Unset
         if isinstance(self.include_research, Unset):
             include_research = UNSET
@@ -137,6 +146,8 @@ def to_dict(self) -> dict[str, Any]:
             field_dict["return_list"] = return_list
         if iteration_budget is not UNSET:
             field_dict["iteration_budget"] = iteration_budget
+        if include_reasoning is not UNSET:
+            field_dict["include_reasoning"] = include_reasoning
         if include_research is not UNSET:
             field_dict["include_research"] = include_research
 
@@ -263,6 +274,15 @@ def _parse_iteration_budget(data: object) -> int | None | Unset:
 
         iteration_budget = _parse_iteration_budget(d.pop("iteration_budget", UNSET))
 
+        def _parse_include_reasoning(data: object) -> bool | None | Unset:
+            if data is None:
+                return data
+            if isinstance(data, Unset):
+                return data
+            return cast(bool | None | Unset, data)
+
+        include_reasoning = _parse_include_reasoning(d.pop("include_reasoning", UNSET))
+
         def _parse_include_research(data: object) -> bool | None | Unset:
             if data is None:
                 return data
@@ -281,6 +301,7 @@ def _parse_include_research(data: object) -> bool | None | Unset:
             effort_level=effort_level,
             return_list=return_list,
             iteration_budget=iteration_budget,
+            include_reasoning=include_reasoning,
             include_research=include_research,
         )
 
diff --git a/src/everyrow/ops.py b/src/everyrow/ops.py
index 097f4325..61e9d44a 100644
--- a/src/everyrow/ops.py
+++ b/src/everyrow/ops.py
@@ -147,7 +147,7 @@ async def single_agent[T: BaseModel](
     effort_level: EffortLevel | None = DEFAULT_EFFORT_LEVEL,
     llm: LLM | None = None,
     iteration_budget: int | None = None,
-    include_research: bool | None = None,
+    include_reasoning: bool | None = None,
     response_model: type[T] = DefaultAgentResponse,
     return_table: Literal[False] = False,
 ) -> ScalarResult[T]: ...
@@ -161,7 +161,7 @@ async def single_agent(
     effort_level: EffortLevel | None = DEFAULT_EFFORT_LEVEL,
     llm: LLM | None = None,
     iteration_budget: int | None = None,
-    include_research: bool | None = None,
+    include_reasoning: bool | None = None,
     response_model: type[BaseModel] = DefaultAgentResponse,
     return_table: Literal[True] = True,
 ) -> TableResult: ...
@@ -174,7 +174,7 @@ async def single_agent[T: BaseModel](
     effort_level: EffortLevel | None = DEFAULT_EFFORT_LEVEL,
     llm: LLM | None = None,
     iteration_budget: int | None = None,
-    include_research: bool | None = None,
+    include_reasoning: bool | None = None,
     response_model: type[T] = DefaultAgentResponse,
     return_table: bool = False,
 ) -> ScalarResult[T] | TableResult:
@@ -185,10 +185,10 @@ async def single_agent[T: BaseModel](
         session: Optional session. If not provided, one will be created automatically.
         input: Input data (BaseModel, DataFrame, UUID, or Result).
         effort_level: Effort level preset (low/medium/high). Mutually exclusive with
-            custom params (llm, iteration_budget, include_research). Default: medium.
+            custom params (llm, iteration_budget, include_reasoning). Default: medium.
         llm: LLM to use. Required when effort_level is None.
         iteration_budget: Number of agent iterations (0-20). Required when effort_level is None.
-        include_research: Include research notes. Required when effort_level is None.
+        include_reasoning: Include reasoning notes. Required when effort_level is None.
         response_model: Pydantic model for the response schema.
         return_table: If True, return a TableResult instead of ScalarResult.
 
@@ -204,7 +204,7 @@ async def single_agent[T: BaseModel](
         effort_level=effort_level,
         llm=llm,
         iteration_budget=iteration_budget,
-        include_research=include_research,
+        include_reasoning=include_reasoning,
         response_model=response_model,
         return_table=return_table,
     )
@@ -216,7 +216,7 @@ async def single_agent[T: BaseModel](
         effort_level=effort_level,
         llm=llm,
         iteration_budget=iteration_budget,
-        include_research=include_research,
+        include_reasoning=include_reasoning,
         response_model=response_model,
         return_table=return_table,
     )
@@ -230,7 +230,7 @@ async def single_agent_async[T: BaseModel](
     effort_level: EffortLevel | None = DEFAULT_EFFORT_LEVEL,
     llm: LLM | None = None,
     iteration_budget: int | None = None,
-    include_research: bool | None = None,
+    include_reasoning: bool | None = None,
     response_model: type[T] = DefaultAgentResponse,
     return_table: bool = False,
 ) -> EveryrowTask[T]:
@@ -252,7 +252,7 @@
         else UNSET,
         llm=LLMEnumPublic(llm.value) if llm is not None else UNSET,
         iteration_budget=iteration_budget if iteration_budget is not None else UNSET,
-        include_research=include_research if include_research is not None else UNSET,
+        include_reasoning=include_reasoning if include_reasoning is not None else UNSET,
         return_list=return_table,
     )
 
@@ -278,7 +278,7 @@ async def agent_map(
     effort_level: EffortLevel | None = DEFAULT_EFFORT_LEVEL,
     llm: LLM | None = None,
     iteration_budget: int | None = None,
-    include_research: bool | None = None,
+    include_reasoning: bool | None = None,
     enforce_row_independence: bool = False,
     response_model: type[BaseModel] = DefaultAgentResponse,
 ) -> TableResult:
@@ -289,10 +289,10 @@ async def agent_map(
         session: Optional session. If not provided, one will be created automatically.
         input: The input table (DataFrame, UUID, or TableResult).
         effort_level: Effort level preset (low/medium/high). Mutually exclusive with
-            custom params (llm, iteration_budget, include_research). Default: low.
+            custom params (llm, iteration_budget, include_reasoning). Default: low.
         llm: LLM to use for each agent. Required when effort_level is None.
         iteration_budget: Number of agent iterations per row (0-20). Required when effort_level is None.
-        include_research: Include research notes. Required when effort_level is None.
+        include_reasoning: Include reasoning notes. Required when effort_level is None.
         response_model: Pydantic model for the response schema.
 
     Returns:
@@ -309,7 +309,7 @@
         effort_level=effort_level,
         llm=llm,
         iteration_budget=iteration_budget,
-        include_research=include_research,
+        include_reasoning=include_reasoning,
         enforce_row_independence=enforce_row_independence,
         response_model=response_model,
     )
@@ -324,7 +324,7 @@
         effort_level=effort_level,
         llm=llm,
         iteration_budget=iteration_budget,
-        include_research=include_research,
+        include_reasoning=include_reasoning,
         enforce_row_independence=enforce_row_independence,
         response_model=response_model,
     )
@@ -341,7 +341,7 @@ async def agent_map_async(
     effort_level: EffortLevel | None = DEFAULT_EFFORT_LEVEL,
     llm: LLM | None = None,
     iteration_budget: int | None = None,
-    include_research: bool | None = None,
+    include_reasoning: bool | None = None,
     enforce_row_independence: bool = False,
     response_model: type[BaseModel] = DefaultAgentResponse,
 ) -> EveryrowTask[BaseModel]:
@@ -361,7 +361,7 @@
         else UNSET,
         llm=LLMEnumPublic(llm.value) if llm is not None else UNSET,
         iteration_budget=iteration_budget if iteration_budget is not None else UNSET,
-        include_research=include_research if include_research is not None else UNSET,
+        include_reasoning=include_reasoning if include_reasoning is not None else UNSET,
         join_with_input=True,
         enforce_row_independence=enforce_row_independence,
     )
diff --git a/tests/test_ops.py b/tests/test_ops.py
index 2f3c6ac2..5a81fa86 100644
--- a/tests/test_ops.py
+++ b/tests/test_ops.py
@@ -431,12 +431,12 @@ async def test_single_agent_with_effort_level_preset(mocker, mock_session):
     # Custom params should be UNSET when using preset
     assert body.llm is UNSET
     assert body.iteration_budget is UNSET
-    assert body.include_research is UNSET
+    assert body.include_reasoning is UNSET
 
 
 @pytest.mark.asyncio
 async def test_single_agent_with_custom_params(mocker, mock_session):
-    """Test that custom params (llm, iteration_budget, include_research) are sent correctly."""
+    """Test that custom params (llm, iteration_budget, include_reasoning) are sent correctly."""
 
     task_id = uuid.uuid4()
     artifact_id = uuid.uuid4()
@@ -472,7 +472,7 @@ async def test_single_agent_with_custom_params(mocker, mock_session):
         effort_level=None,
         llm=LLM.CLAUDE_4_5_HAIKU,
         iteration_budget=5,
-        include_research=True,
+        include_reasoning=True,
     )
 
     # Verify the body sent to the API
@@ -484,7 +484,7 @@
     # Custom params should have the specified values
     assert body.llm == LLMEnumPublic.CLAUDE_4_5_HAIKU
     assert body.iteration_budget == 5
-    assert body.include_research is True
+    assert body.include_reasoning is True
 
 
 @pytest.mark.asyncio
@@ -536,7 +536,7 @@ async def test_agent_map_with_effort_level_preset(mocker, mock_session):
     assert body.effort_level == PublicEffortLevel.HIGH
     assert body.llm is UNSET
     assert body.iteration_budget is UNSET
-    assert body.include_research is UNSET
+    assert body.include_reasoning is UNSET
 
 
 @pytest.mark.asyncio
@@ -581,7 +581,7 @@ async def test_agent_map_with_custom_params(mocker, mock_session):
         effort_level=None,
         llm=LLM.GPT_5_MINI,
         iteration_budget=10,
-        include_research=False,
+        include_reasoning=False,
     )
 
     # Verify the body sent to the API
@@ -591,4 +591,4 @@
     assert body.effort_level is UNSET
     assert body.llm == LLMEnumPublic.GPT_5_MINI
     assert body.iteration_budget == 10
-    assert body.include_research is False
+    assert body.include_reasoning is False