<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/retrievers/you_retriever.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# You.com Retriever

This notebook demonstrates how to use You.com's Search API as a retriever in LlamaIndex. The API automatically returns relevant web and/or news results based on your query. Visit our docs to learn more about our Search and other APIs: https://docs.you.com/

The retriever converts You.com's search results into LlamaIndex's standard format (`NodeWithScore`), allowing you to:
- Use search results as context for LLM queries
- Combine with other retrievers (vector stores, databases)
- Integrate seamlessly with query engines and agents

In [None]:
%pip install llama-index-retrievers-you

## Setup

Get your API key from the [You.com platform](https://you.com/platform).

In [None]:
import os
from getpass import getpass

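# Set your API key: read it from the environment, or prompt if it is not set
if not os.environ.get("YDC_API_KEY"):
    you_api_key = getpass("Enter your You.com API key: ")
else:
    you_api_key = os.environ.get("YDC_API_KEY")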

## Basic usage

First, let's set up the retriever and see what data it returns:

In [3]:
from llama_index.retrievers.you import YouRetriever

retriever = YouRetriever(api_key=you_api_key)
retrieved_results = retriever.retrieve("national parks in the US")

# The retriever returns a list of NodeWithScore objects.
# By default, up to 10 results are returned per result type (web or news).
print(f"Retrieved {len(retrieved_results)} results\n")

# Each result contains:
# - node.text: The actual content (snippets or description)
# - node.metadata: Metadata (URL, title, page_age and more)

for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    print(f"Content: {result.node.text[:200]}...")
    print("Metadata:")
    for key, value in result.node.metadata.items():
        print(f"  {key}: {value}")

Retrieved 10 results


Result 1:
Content: National monuments, on the other hand, are also frequently protected for their historical or archaeological significance. Eight national parks (including six in Alaska) are paired with a national pres...
Metadata:
  url: https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States
  title: List of national parks of the United States - Wikipedia
  description: National monuments, on the other hand, are also frequently protected for their historical or archaeological significance. Eight national parks (including six in Alaska) are paired with a national preserve, areas with different levels of protection that are administered together but considered ...
  page_age: 2025-12-10T05:10:46
  thumbnail_url: https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/RNS_Yellowstone_13399u.jpg/960px-RNS_Yellowstone_13399u.jpg
  favicon_url: https://you.com/favicon?domain=en.wikipedia.org&size=128
  source_type: web

Result 2:
Content: Secure 

## Understanding the results

The You.com API returns both web and news results (when relevant). The retriever processes both types:

In [4]:
# News-related queries will include news results in the response
retriever = YouRetriever(
    api_key=you_api_key,
    count=5
)

retrieved_results = retriever.retrieve("What are the latest geopolitical updates from India")

print(f"Retrieved {len(retrieved_results)} results")
for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    print(f"  Text: {result.node.text[:200]}...")
    print("Metadata:")
    for key, value in result.node.metadata.items():
        print(f"  {key}: {value}")

Retrieved 5 results

Result 1:
  Text: Indian equity markets brace for an event-packed week, with the December quarter earnings season kicking off and key inflation data releases from India and the US on the horizon. Investors will closely...
Metadata:
  url: https://timesofindia.indiatimes.com/topic/geopolitics/news
  title: Geopolitics News | Latest News on Geopolitics - Times of India
  description: Indian equity markets brace for ... from India and the US on the horizon. Investors will closely monitor corporate results from IT, banking, and energy giants, alongside global macro and geopolitical developments, for near-term market direction. ... Gold price prediction today: Why are gold prices ...
  thumbnail_url: https://static.toiimg.com/photo/47529300.cms
  favicon_url: https://you.com/favicon?domain=timesofindia.indiatimes.com&size=128
  source_type: web

Result 2:
  Text: Precious and industrial metals are surging to record highs as the year ends, driven by economic and geopolit
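
Because web and news results arrive in one flat list, it can help to split them by the `source_type` metadata key shown in the output above. The following is a minimal sketch, not part of the retriever's API: `group_by_source_type` and `_fake_result` are hypothetical helpers, and the `"news"` value is an assumption, since only `web` appears in the sample output. The stand-in objects mimic the `.node.metadata` shape so the snippet runs without an API key.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_source_type(results):
    """Group results by the `source_type` value found in their metadata."""
    groups = defaultdict(list)
    for result in results:
        # Fall back to "unknown" if a result carries no source_type key.
        source_type = result.node.metadata.get("source_type", "unknown")
        groups[source_type].append(result)
    return dict(groups)


def _fake_result(title, source_type):
    """Hypothetical stand-in that mimics the shape of a NodeWithScore."""
    node = SimpleNamespace(
        text=title,
        metadata={"title": title, "source_type": source_type},
    )
    return SimpleNamespace(node=node, score=1.0)


# Replace `sample_results` with `retrieved_results` from the cell above.
sample_results = [
    _fake_result("Web page A", "web"),
    _fake_result("News story B", "news"),  # "news" is an assumed value
    _fake_result("Web page C", "web"),
]

grouped = group_by_source_type(sample_results)
for source_type, items in grouped.items():
    print(f"{source_type}: {len(items)} result(s)")
```

Grouping on the client keeps the retriever call unchanged, so the same helper should work regardless of which result types a given query returns.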

## Customizing Search Parameters

You can customize the search with optional parameters:

In [5]:
retriever = YouRetriever(
    api_key=you_api_key,
    count=20,  # Return up to 20 results per section (web/news)
    country="US",  # Focus on US results
    language="en",  # English results
    freshness="week",  # Results from the past week
    safesearch="moderate"  # Moderate safe search filtering
)

retrieved_results = retriever.retrieve("renewable energy breakthroughs")

print(f"Retrieved {len(retrieved_results)} recent results from the US")
for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    print(f"  Text: {result.node.text[:200]}...")
    print("Metadata:")
    for key, value in result.node.metadata.items():
        print(f"  {key}: {value}")

Retrieved 20 recent results from the US

Result 1:
  Text: Engineers have unlocked a new class of supercapacitor material that could rival traditional batteries in energy while charging dramatically faster. By redesigning carbon structures into highly curved,...
Metadata:
  url: https://www.sciencedaily.com/releases/2025/11/251130205509.htm
  title: New graphene breakthrough supercharges energy storage | ScienceDaily
  description: Engineers have unlocked a new class of supercapacitor material that could rival traditional batteries in energy while charging dramatically faster. By redesigning carbon structures into highly curved, accessible graphene networks, the team achieved record energy and power densities—enough ...
  page_age: 2026-01-14T14:56:35
  thumbnail_url: https://www.sciencedaily.com/images/1920/graphene-energy-storage.webp
  favicon_url: https://you.com/favicon?domain=www.sciencedaily.com&size=128
  source_type: web

Result 2:
  Text: If you thought ground-based wind turb

## Using with Query Engine

Now that we've seen how to customize the web data we want to retrieve, let's use an LLM to synthesize natural language answers from the search results. In this example, we'll use a model from Anthropic.

In [None]:
%pip install llama-index-llms-anthropic

In [7]:
import os
from getpass import getpass

# Set your Anthropic API key
if not os.environ.get("ANTHROPIC_API_KEY"):
    anthropic_api_key = getpass("Enter your Anthropic API key: ")
else:
    anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY")

In [8]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.anthropic import Anthropic
from llama_index.retrievers.you import YouRetriever

# Configure Anthropic as your LLM
llm = Anthropic(
    model="claude-haiku-4-5-20251001",
    api_key=anthropic_api_key
)

# Create a query engine that uses You.com search results as context
retriever = YouRetriever(api_key=you_api_key)
query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)

In [9]:
# The query engine:
# 1. Uses the retriever to fetch relevant search results from You.com
# 2. Passes those results as context to the LLM
# 3. Returns a synthesized answer

response = query_engine.query("What are the most visited national parks in the US and why? Keep it brief.")

# Try a different query
# response = query_engine.query("What are the latest geopolitical updates from India")

print(str(response))

# Most Visited National Parks in the US

**Top 3 Most Visited:**

1. **Great Smoky Mountains National Park** (Tennessee/North Carolina) - 12-13 million visitors annually. It's the clear leader, drawing nearly three times more visitors than the second-place park. The park protects a beautiful section of the Appalachian Mountains and features diverse wildlife, ancient mountains, and historical Southern Appalachian culture.

2. **Grand Canyon National Park** (Arizona) - 4.7 million visitors. This iconic park encompasses 278 miles of the Colorado River and is renowned as one of the world's most spectacular examples of erosion, offering incomparable vistas from both rims.

3. **Zion National Park** (Utah) - 4.6-4.9 million visitors. Known for its striking vertical topography, including rock towers, sandstone canyons, and sharp cliffs carved by the Virgin River.

**Why They're Popular:**

These parks attract massive crowds due to their iconic natural landscapes, accessibility to major popula
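
Since the answer is synthesized from web results, you may want to show which pages it drew on. The sketch below builds a numbered source list from result metadata. It assumes the response object exposes the retrieved nodes as `response.source_nodes` and that each node carries the `url` and `title` keys shown earlier. `format_sources` and `_fake_source` are hypothetical helpers, and the stand-in objects let the snippet run without any API keys.

```python
from types import SimpleNamespace


def format_sources(source_nodes):
    """Return a numbered list of unique sources as 'title <url>' lines."""
    lines = []
    seen_urls = set()
    for item in source_nodes:
        metadata = item.node.metadata
        url = metadata.get("url")
        # Skip results without a URL and any URL already listed.
        if not url or url in seen_urls:
            continue
        seen_urls.add(url)
        title = metadata.get("title") or url
        lines.append(f"[{len(lines) + 1}] {title} <{url}>")
    return "\n".join(lines)


def _fake_source(title, url):
    """Hypothetical stand-in that mimics the shape of a NodeWithScore."""
    node = SimpleNamespace(text="", metadata={"title": title, "url": url})
    return SimpleNamespace(node=node, score=1.0)


wiki_url = "https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States"
sample_sources = [
    _fake_source("List of national parks of the United States - Wikipedia", wiki_url),
    _fake_source("List of national parks of the United States - Wikipedia", wiki_url),
    _fake_source(None, "https://example.com/parks"),  # placeholder URL
]

formatted = format_sources(sample_sources)
print(formatted)

# With a real response (assumed attribute):
# print(format_sources(response.source_nodes))
```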

## Why this format?

The retriever converts You.com's JSON response into LlamaIndex's standard `NodeWithScore` format.

**Benefits:**
- **Source-agnostic**: Same interface whether retrieving from You.com, vector DBs, or other sources
- **Composability**: Easily combine multiple retrievers or swap them out
- **Integration**: Works seamlessly with LlamaIndex query engines, agents, and other components

**What's preserved:**
- **Text content**: Snippets from web results or descriptions from news articles
- **Metadata**: URL, title, `page_age`, and other fields (such as `source_type`) are stored in the `metadata` dict
- **Score**: Relevance score (1.0 by default since You.com doesn't provide scores)

This abstraction lets you focus on building applications rather than handling API-specific response formats.
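
As one illustration of composability, results from several retrievers can be merged into a single list. The sketch below interleaves the lists round-robin rather than sorting by score, on the assumption that scores from different sources are not comparable (You.com results all carry 1.0). Duplicates are dropped by URL, falling back to the text when no URL is present. `merge_results` and `_fake_result` are hypothetical helpers and the URLs are placeholders.

```python
from itertools import zip_longest
from types import SimpleNamespace

_MISSING = object()  # sentinel used to pad the shorter lists


def merge_results(*result_lists):
    """Interleave result lists round-robin, dropping repeated URLs."""
    merged = []
    seen = set()
    for tier in zip_longest(*result_lists, fillvalue=_MISSING):
        for result in tier:
            if result is _MISSING:
                continue
            # Use the URL as the identity key; fall back to the text.
            key = result.node.metadata.get("url") or result.node.text
            if key in seen:
                continue
            seen.add(key)
            merged.append(result)
    return merged


def _fake_result(text, url=None):
    """Hypothetical stand-in that mimics the shape of a NodeWithScore."""
    metadata = {"url": url} if url else {}
    return SimpleNamespace(node=SimpleNamespace(text=text, metadata=metadata), score=1.0)


# Stand-ins for, e.g., `you_retriever.retrieve(q)` and `vector_retriever.retrieve(q)`.
web_results = [
    _fake_result("A", "https://example.com/1"),
    _fake_result("B", "https://example.com/2"),
]
other_results = [
    _fake_result("C", "https://example.com/2"),  # same URL as "B"
    _fake_result("D", "https://example.com/3"),
    _fake_result("local note"),  # no URL, keyed by its text
]

merged = merge_results(web_results, other_results)
print([r.node.text for r in merged])
```

Round-robin keeps the top result of each source near the front of the merged list; a score-based merge would only make sense if every retriever produced scores on a shared scale.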