diff --git a/docs/RANK.md b/docs/RANK.md index 8bc2d30a..ee754727 100644 --- a/docs/RANK.md +++ b/docs/RANK.md @@ -1,16 +1,24 @@ # Rank -Score rows based on criteria you can't put in a database field. +AI-powered ranking using natural language criteria ## The problem -Traditional lead scoring uses firmographic data—employee count, industry code, funding stage. But "likelihood to need data integration tools" isn't in any database. +If your dataset already contains the data you want to rank by, then sorting is easy. -Ultramain Systems and Ukraine International Airlines both show up as "Aviation" in your CRM. One sells software to airlines (simple ops, unified systems). The other *is* an airline (complex ops, dozens of data sources, legacy integrations everywhere). Their actual needs are opposite, but they look identical on paper. +But what do you do if your data is in an unstructured format? Or what if it requires researching every row to find what you need? + +For instance, let's say you're trying to prioritize sales leads. You may have employee count, industry code, and funding stage. But "likelihood to need data integration tools" isn't in any database. ## How it works -You describe what you want to score in plain English. For each row, agents research the company and assign a score with reasoning. +`rank` uses AI research agents to find the metric you specify for each row of your dataset. Then it sorts the rows by that metric. + +Our research agents can search the internet, read webpages and documents, extract relevant information, and reason with nuance about what they find. + +## Examples + +You describe the metric you want to rank by in plain English. ```python from everyrow.ops import rank @@ -19,10 +27,11 @@ result = await rank( task="Score by likelihood to need data integration solutions", input=leads_dataframe, field_name="integration_need_score", + ascending_order=False, # highest first ) ``` -The task can be as specific as you want: +The task can be as specific as you want. You can describe the metric in detail, list which sources to use, and explain how to resolve ambiguities. ```python result = await rank( @@ -32,6 +41,8 @@ result = await rank( High scores: teams actively publishing, hiring researchers, or with recent funding for R&D. Low scores: pure trading shops, firms with no public research output. + + Consult the company's website, job postings, and LinkedIn profile for information. """, input=investment_firms, field_name="research_adoption_score", @@ -39,48 +50,46 @@ result = await rank( ) ``` -## Structured output +### Structured output + +If you want more than just a number, pass a Pydantic model. -If you want more than just a number, pass a Pydantic model: +Note that you don't need specify fields for reasoning, explanation or sources. That information is included automatically. ```python from pydantic import BaseModel, Field -class LeadScore(BaseModel): - score: float = Field(description="0-100, higher = more likely to need integration") - reasoning: str = Field(description="Why this score") - key_signal: str = Field(description="The single most important factor") +class AcquisitionScore(BaseModel): + fit_score: float = Field(description="0-100, strategic alignment with our business") + annual_revenue_usd: int = Field(description="Their estimated annual revenue in USD") result = await rank( - task="Score by data integration needs", - input=leads, - field_name="score", - response_model=LeadScore, + task="Score acquisition targets by product-market fit and revenue quality", + input=potential_acquisitions, + field_name="fit_score", + response_model=AcquisitionScore, + ascending_order=False, # highest first ) ``` -Now each row has `score`, `reasoning`, and `key_signal` columns. +Now every row has both `fit_score` and `annual_revenue_usd` fields, each of which includes its own explanation. + +When specifying a response model, make sure that it contains `field_name`. Otherwise, you'll get an error. Also, the `field_type` parameter is ignored when you pass a response model. ## Parameters | Name | Type | Description | -|------|------|-------------| -| `task` | str | What to score and how | +| ---- | ---- | ----------- | +| `task` | str | The task for the agent describing how to find your metric | +| `session` | Session | Optional, auto-created if omitted | | `input` | DataFrame | Your data | -| `field_name` | str | Column name for the score | -| `response_model` | BaseModel | Optional structured output | +| `field_name` | str | Column name for the metric | +| `field_type` | str | The type of the field (default: "float") | +| `response_model` | BaseModel | Optional response model for multiple output fields | | `ascending_order` | bool | True = lowest first (default) | -| `session` | Session | Optional, auto-created if omitted | - -## Performance - -| Rows | Time | Cost | -|------|------|------| -| 100 | ~2 min | ~$1.50 | -| 1,000 | ~7 min | ~$13 | -| 5,000 | ~25 min | ~$60 | +| `preview` | bool | True = process only a few rows | ## Case studies -- [Lead Scoring with Data Fragmentation](https://futuresearch.ai/lead-scoring-data-fragmentation/) — 1,000 B2B leads ranked by data fragmentation risk -- [Lead Scoring Without CRM](https://futuresearch.ai/lead-scoring-without-crm/) — 85 investment firms scored for $28 (Clay wanted $145) +- [Ranking 1000 Businesses by Data Fragmentation Risk](https://futuresearch.ai/lead-scoring-data-fragmentation/): Ranking 1,000 B2B leads by data fragmentation risk +- [Rank Leads Like an Analyst, Not a Marketer](https://futuresearch.ai/lead-scoring-without-crm/): Using `rank` to score leads instead of a CRM