<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/retrievers/you_retriever.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# You.com Retriever

This notebook demonstrates how to use You.com's Search API as a retriever in LlamaIndex. The API automatically returns relevant web and/or news results based on your query. Visit our docs to learn more about our Search and other APIs: https://docs.you.com/

The retriever converts You.com's search results into LlamaIndex's standard format (`NodeWithScore`), allowing you to:
- Use search results as context for LLM queries
- Combine with other retrievers (vector stores, databases)
- Integrate seamlessly with query engines and agents



To get started, install the `llama-index-retrievers-you` package.

In [None]:
%pip install llama-index-retrievers-you

## Setup

Get your API key from the [You.com platform](https://you.com/platform).

In [None]:
import os
from getpass import getpass

# Set your API key
you_api_key = os.environ.get("YDC_API_KEY") or getpass(
    "Enter your You.com API key: "
)

## Basic usage

First, let's set up the retriever and see what data it returns:

In [None]:
from llama_index.retrievers.you import YouRetriever

retriever = YouRetriever(api_key=you_api_key)
retrieved_results = retriever.retrieve("national parks in the US")

print(f"Retrieved {len(retrieved_results)} results")

for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    print(f"  Text: {result.node.text}...")
    print("Metadata:")
    for key, value in result.node.metadata.items():
        print(f"  {key}: {value}")

Retrieved 10 results

Result 1:
  Text: National monuments, on the other hand, are also frequently protected for their historical or archaeological significance. Eight national parks (including six in Alaska) are paired with a national preserve, areas with different levels of protection that are administered together but considered separate units and whose areas are not included in the figures below.
A bill creating the first national park, Yellowstone, was signed into law by President Ulysses S. Grant in 1872, followed by Mackinac National Park in 1875 (decommissioned in 1895), and then Rock Creek Park (later merged into National Capital Parks), Sequoia and Yosemite in 1890.
Fourteen national parks are designated UNESCO World Heritage Sites (WHS), and 21 national parks are named UNESCO Biosphere Reserves (BR), with eight national parks in both programs. Thirty states have national parks, as do the territories of American Samoa and the U.S.
The state with the most national parks is Cal

## Async usage

The retriever also supports async operations.

In [None]:
from llama_index.retrievers.you import YouRetriever

retriever = YouRetriever(api_key=you_api_key)

# Use aretrieve for async operations
retrieved_results = await retriever.aretrieve("national parks in the US")

print(f"Retrieved {len(retrieved_results)} results asynchronously")

for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    print(f"  Text: {result.node.text}...")
    print("Metadata:")
    for key, value in result.node.metadata.items():
        print(f"  {key}: {value}")

Retrieved 10 results asynchronously

Result 1:
  Text: National monuments, on the other hand, are also frequently protected for their historical or archaeological significance. Eight national parks (including six in Alaska) are paired with a national preserve, areas with different levels of protection that are administered together but considered separate units and whose areas are not included in the figures below.
A bill creating the first national park, Yellowstone, was signed into law by President Ulysses S. Grant in 1872, followed by Mackinac National Park in 1875 (decommissioned in 1895), and then Rock Creek Park (later merged into National Capital Parks), Sequoia and Yosemite in 1890.
Fourteen national parks are designated UNESCO World Heritage Sites (WHS), and 21 national parks are named UNESCO Biosphere Reserves (BR), with eight national parks in both programs. Thirty states have national parks, as do the territories of American Samoa and the U.S.
The state with the most nation
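
Async retrieval is most useful when several queries can be in flight at once. The sketch below shows one way to fan out queries with `asyncio.gather`. `StubRetriever` and `retrieve_many` are hypothetical names introduced here so the example runs without an API key; the only thing assumed about the real retriever is the `aretrieve` coroutine used above.

```python
import asyncio


class StubRetriever:
    """Hypothetical stand-in that exposes an async `aretrieve`, like the retriever above."""

    async def aretrieve(self, query: str) -> list[str]:
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"result for {query!r}"]


async def retrieve_many(retriever, queries: list[str]) -> dict[str, list]:
    # Start all requests together; gather returns results in input order.
    results = await asyncio.gather(*(retriever.aretrieve(q) for q in queries))
    return dict(zip(queries, results))


queries = ["national parks in the US", "national parks in Canada"]
batched = asyncio.run(retrieve_many(StubRetriever(), queries))
print({query: len(results) for query, results in batched.items()})
```

Inside a notebook an event loop is already running, so you would typically write `await retrieve_many(retriever, queries)` instead of calling `asyncio.run`.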

## Getting the latest news

The You.com API can also return news results automatically, based on your query.

In [None]:
# News-related queries will include news results in the response.
# You should see at most 5 results per type (news and web).
# Notice the source_type: "news" or "web"
retriever = YouRetriever(api_key=you_api_key, count=5, country="IN")

retrieved_results = retriever.retrieve(
    "What are the latest geopolitical updates in India"
)

print(f"Retrieved {len(retrieved_results)} results")
for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    print(f"  Text: {result.node.text}...")
    print("Metadata:")
    for key, value in result.node.metadata.items():
        print(f"  {key}: {value}")

Retrieved 10 results

Result 1:
  Text: This surge, driven by safe-haven demand amidst global geopolitical tensions, marks significant gains for both precious metals. Investors are now keenly awaiting US inflation data for future market direction. Silver rises to Rs 2.55L/kg; gold at Rs 1.445L per 10g ... Stock market outlook: Q3 earnings, inflation data in focus this week; global cues to steer sentiment ... Indian equity markets brace for an event-packed week, with the December quarter earnings season kicking off and key inflation data releases from India and the US on the horizon.
Stock market today: Which are the top 10 losers and gainers on NSE, BSE on January 12? Check list ... WEF Davos 2026: Donald Trump to attend summit with largest-ever US delegation; strong Indian side also expected ... US President Donald Trump will lead a large delegation to the World Economic Forum in Davos, Switzerland, starting January 18. Global leaders will convene amidst rising geopolitical tensions a
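
Because web and news results come back in a single list, it can help to separate them before further processing. The following is a minimal sketch that assumes each result exposes `source_type` in `node.metadata`, as the printed metadata above suggests. `group_by_source_type` and `fake_result` are hypothetical helpers, and the sample objects are stand-ins rather than real retriever output.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_source_type(results):
    """Bucket results by the `source_type` value in each node's metadata."""
    groups = defaultdict(list)
    for result in results:
        # Fall back to "unknown" when a result carries no source_type.
        key = result.node.metadata.get("source_type", "unknown")
        groups[key].append(result)
    return dict(groups)


def fake_result(text, source_type):
    """Hypothetical stand-in shaped like a retriever result (node.text, node.metadata)."""
    node = SimpleNamespace(text=text, metadata={"source_type": source_type})
    return SimpleNamespace(node=node, score=1.0)


sample = [fake_result("a", "web"), fake_result("b", "news"), fake_result("c", "web")]
grouped = group_by_source_type(sample)
print({source_type: len(items) for source_type, items in grouped.items()})
```

With real data you would pass `retrieved_results` in place of `sample`.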

## Customizing Search Parameters

You can customize the search with optional parameters:

In [None]:
retriever = YouRetriever(
    api_key=you_api_key,
    count=20,  # Return up to 20 results per section (web/news)
    country="US",  # Focus on US results
    language="en",  # English results
    freshness="week",  # Results from the past week
    safesearch="moderate",  # Moderate safe search filtering
)

retrieved_results = retriever.retrieve("renewable energy breakthroughs")

print(f"Retrieved {len(retrieved_results)} recent results from the US")
for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    print(f"  Text: {result.node.text}...")
    print("Metadata:")
    for key, value in result.node.metadata.items():
        print(f"  {key}: {value}")

Retrieved 20 recent results from the US

Result 1:
  Text: Efficiency Breakthrough: Perovskite-silicon tandem solar cells achieving 34.6% efficiency represent a 57% improvement over traditional silicon panels, marking the most significant solar technology advancement in decades and positioning solar as the dominant renewable energy source.
The renewable energy sector is experiencing an unprecedented wave of innovation in 2025, with renewable energy innovations driving the global transition toward a carbon-free future. Currently generating 33% of global electricity, renewable sources are projected to capture a $3.6 trillion market by 2030. To achieve the critical 95% emissions reduction needed for climate goals, breakthrough technologies across solar, wind, storage, and grid integration are reshaping how we generate, store, and distribute clean energy.
Solar technology continues to lead renewable energy innovations with revolutionary advances that dramatically improve efficiency and exp

## Using with Query Engine

Now that we've seen how to customize the web data we want to retrieve, let's use an LLM to synthesize natural language answers from the search results. In this example, we'll use a model from Anthropic.

In [None]:
%pip install llama-index-llms-anthropic

In [None]:
import os
from getpass import getpass

# Set your Anthropic API key
anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY") or getpass(
    "Enter your Anthropic API key: "
)

In [None]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.anthropic import Anthropic
from llama_index.retrievers.you import YouRetriever

# Configure Anthropic as your LLM
llm = Anthropic(model="claude-haiku-4-5-20251001", api_key=anthropic_api_key)

# Create a query engine that uses You.com search results as context
retriever = YouRetriever(api_key=you_api_key)
query_engine = RetrieverQueryEngine.from_args(retriever, llm)

In [None]:
# The query engine:
# 1. Uses the retriever to fetch relevant search results from You.com
# 2. Passes those results as context to the LLM
# 3. Returns a synthesized answer

response = query_engine.query(
    "What are the most visited national parks in the US and why? keep it brief."
)

# Try a different query
# response = query_engine.query("What are the latest geopolitical updates from India")

print(str(response))

# Most Visited National Parks in the US

**Top 3 Most Visited:**

1. **Great Smoky Mountains National Park** (North Carolina/Tennessee) - 13.3 million visitors
   - Why: Gorgeous ancient mountains, diverse plant and animal life, historical Southern Appalachian culture, and home to American black bears and other wildlife

2. **Zion National Park** (Utah) - 4.9 million visitors
   - Why: Striking vertical topography with red rock formations, sandstone canyons, and sharp cliffs

3. **Grand Canyon National Park** (Arizona) - 4.7 million visitors
   - Why: Iconic mile-deep canyon with spectacular erosion examples and incomparable vistas from both rims

**Other Popular Parks:**
- **Yosemite National Park** (California) - Famous for towering granite cliffs, waterfalls, and giant sequoias
- **Glacier National Park** (Montana) - Mountain landscapes shaped by glacial forces

These parks attract millions of visitors annually due to their dramatic natural landscapes, diverse ecosystems, abundant w
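
The three steps listed in the cell above (retrieve, pass results as context, synthesize) can be illustrated conceptually without any API keys. This is a simplified sketch of the general pattern, not how `RetrieverQueryEngine` is implemented. `build_prompt`, `answer`, and both stub functions are hypothetical, and the prompt layout is an assumption.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Place numbered passages in a context block ahead of the question."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question, retrieve, complete):
    passages = retrieve(question)              # 1. fetch search results
    prompt = build_prompt(question, passages)  # 2. pass them as context
    return complete(prompt)                    # 3. let the LLM synthesize an answer


# Hypothetical stubs standing in for the retriever and the LLM.
def stub_retrieve(question):
    return ["Great Smoky Mountains is the most visited park.", "Zion is in Utah."]


def stub_complete(prompt):
    return f"(answer based on {prompt.count('[')} passages)"


print(answer("Which park is most visited?", stub_retrieve, stub_complete))
```

Keeping retrieval and completion as separate callables mirrors the reason the query engine accepts any retriever: either side can be swapped without touching the other.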

## Why this format?

The retriever converts You.com's JSON response into LlamaIndex's standard `NodeWithScore` format. This provides:

**Benefits:**
- **Source-agnostic**: Same interface whether retrieving from You.com, vector DBs, or other sources
- **Composability**: Easily combine multiple retrievers or swap them out
- **Integration**: Works seamlessly with LlamaIndex query engines, agents, and other components

**What's preserved:**
- **Text content**: Snippets from web results or descriptions from news articles
- **Metadata**: URL, title, page_age stored in the `metadata` dict
- **Score**: Relevance score (1.0 by default since You.com doesn't provide scores)
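
To make the mapping concrete, here is a rough sketch of the kind of conversion described above, written with plain dataclasses so it runs without LlamaIndex installed. `SketchNode`, `SketchNodeWithScore`, and `to_nodes` are hypothetical stand-ins, not the library's classes, and the input field names `snippets` and `description` are assumptions about the response shape rather than a documented schema.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Stand-in for a text node (not the real LlamaIndex class)."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class SketchNodeWithScore:
    """Stand-in for NodeWithScore; score stays constant since no score is provided."""
    node: SketchNode
    score: float = 1.0


def to_nodes(hit: dict) -> list[SketchNodeWithScore]:
    """Turn one search hit into one scored node per text fragment."""
    metadata = {key: hit.get(key) for key in ("url", "title", "page_age")}
    # Assumed shape: web hits carry `snippets`, news hits carry a `description`.
    texts = hit.get("snippets") or [hit.get("description", "")]
    return [
        SketchNodeWithScore(SketchNode(text=text, metadata=dict(metadata)))
        for text in texts
    ]


web_hit = {
    "url": "https://example.com/parks",
    "title": "Example page",
    "page_age": "2025-01-01",
    "snippets": ["first snippet", "second snippet"],
}
for item in to_nodes(web_hit):
    print(item.score, item.node.text, item.node.metadata["url"])
```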

This abstraction lets you focus on building applications rather than handling API-specific response formats.