Add strategy and strategy_prompt params to dedupe #110
- Add `strategy` parameter to `dedupe()` / `dedupe_async()`: `"identify"`, `"select"` (default), or `"combine"`
- Add `strategy_prompt` parameter for guiding LLM selection/combining
- Update generated `DedupeOperation` model with new fields
- Convert strategy string to `DedupeOperationStrategy` enum before passing to generated model (prevents `AttributeError` on serialization)
- Update docs with strategy examples
- Add integration tests for each strategy mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
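The string-to-enum conversion mentioned in the description can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the enum below is a stand-in built from the three strategy names listed above, and `serialize_strategy` and `to_strategy_enum` are hypothetical helpers. It assumes the generated model serializes the field via `.value`, which a plain `str` does not have.

```python
from enum import Enum


class DedupeOperationStrategy(str, Enum):
    # Stand-in for the generated enum; values taken from the PR description.
    IDENTIFY = "identify"
    SELECT = "select"
    COMBINE = "combine"


def serialize_strategy(strategy):
    # Hypothetical: mimics a generated to_dict() that reads `.value` on the field.
    return strategy.value


def to_strategy_enum(strategy: str) -> DedupeOperationStrategy:
    # Lookup by value; raises ValueError for an unknown strategy name.
    return DedupeOperationStrategy(strategy)


if __name__ == "__main__":
    try:
        serialize_strategy("combine")  # plain str has no `.value`
    except AttributeError as exc:
        print("without conversion:", exc)
    print("with conversion:", serialize_strategy(to_strategy_enum("combine")))
```

Converting at the boundary also means a misspelled strategy fails early with a `ValueError` instead of reaching the server.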
a45efab to f08b083
input_: DedupeOperationInputType2 | list[DedupeOperationInputType1Item] | UUID
equivalence_relation: str
session_id: None | Unset | UUID = UNSET
strategy: DedupeOperationStrategy | Unset = DedupeOperationStrategy.SELECT
How does `Unset` work? Would it still work if we already have a default?
Good question. `Unset` is how the generated client distinguishes "not provided" from an explicit value. When `UNSET` is passed, the field is omitted from the serialized JSON request, letting the server apply its own default. The default of `DedupeOperationStrategy.SELECT` here mirrors the server's documented default and applies when someone instantiates the model directly without specifying `strategy`. From `ops.py`, however, when the user passes `strategy=None`, we pass `UNSET` so that the server decides. So `| Unset` is needed in the type because `UNSET` is a valid runtime value (meaning "omit from request"), even though the attribute default is `SELECT`.
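To make the behavior described above concrete, here is a small self-contained sketch of the sentinel pattern. It does not reproduce the generated client: `Unset`, `UNSET`, `Strategy`, `Operation`, and `build_operation` are simplified, hypothetical stand-ins, and the real `to_dict()` may differ in detail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional, Union


class Unset:
    # Sentinel type meaning "field not provided".
    def __bool__(self) -> bool:
        return False


UNSET = Unset()


class Strategy(str, Enum):
    # Stand-in for DedupeOperationStrategy (subset of members).
    SELECT = "select"
    COMBINE = "combine"


@dataclass
class Operation:
    # Stand-in for the generated model; the attribute default is SELECT.
    equivalence_relation: str
    strategy: Union[Strategy, Unset] = Strategy.SELECT

    def to_dict(self) -> Dict[str, Any]:
        body: Dict[str, Any] = {"equivalence_relation": self.equivalence_relation}
        # An UNSET field is left out of the request body entirely.
        if not isinstance(self.strategy, Unset):
            body["strategy"] = self.strategy.value
        return body


def build_operation(equivalence_relation: str, strategy: Optional[str] = None) -> Operation:
    # Hypothetical ops-layer helper: None becomes UNSET, a string becomes the enum.
    resolved = UNSET if strategy is None else Strategy(strategy)
    return Operation(equivalence_relation, strategy=resolved)


if __name__ == "__main__":
    print(build_operation("same person").to_dict())
    print(build_operation("same person", "combine").to_dict())
    print(Operation("same person").to_dict())
```

In this sketch the three calls cover the three cases discussed: `strategy=None` from the ops layer omits the field, an explicit string is sent as given, and direct instantiation of the model falls back to the attribute default.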
The above was Claude :)
src/everyrow/ops.py
Outdated
Args:
    equivalence_relation: Description of what makes items equivalent
    session: Optional session. If not provided, one will be created automatically.
    input: The input table (DataFrame, UUID, or TableResult)
    strategy: Strategy for handling duplicates: 'identify' (cluster only),
        'select' (pick best, default), 'combine' (synthesize combined row)
    strategy_prompt: Optional instructions guiding how selection or combining is performed
Sorry, this goes beyond your PR, but I think these descriptions might not be informative enough for Claude Code (CC) or similar tools.
Agreed. I expanded the docstrings for `strategy` and `strategy_prompt` on both `dedupe()` and the generated `DedupeOperation` model to be much more descriptive. Each strategy mode now explains what columns are added, when to use it, and how `strategy_prompt` interacts with it. This should be much more useful for CC and similar tools reading the docstrings.
Address PR review feedback: make parameter descriptions more verbose so that Claude Code and similar tools can understand the full behavior of each strategy mode and how strategy_prompt interacts with them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Add `strategy` parameter to `dedupe()` / `dedupe_async()`: `"identify"`, `"select"` (default), or `"combine"`
- Add `strategy_prompt` parameter for guiding LLM selection/combining behavior
- Update generated `DedupeOperation` model with new fields

Test plan
- `to_dict`/`from_dict`) handles new fields

🤖 Generated with Claude Code