<a href="https://colab.research.google.com/github/liyanonline/Artificial-Intelligence-with-Python/blob/master/Copy_of_Get_Started_with_Tavily_in_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to Tavily!

<img src="https://drive.google.com/uc?id=1TSA6HaZ9nRVKo64oOg3afPU7_P33ctYF" alt="Tavily logo" width="200">


Tavily is a specialized search API **designed specifically for LLMs**, enabling developers to build AI applications that can access real-time, accurate web data.

In this notebook, you will learn step by step how to get started with Tavily and build a **RAG-powered daily news digest app** using our two main services: [Tavily Search](https://docs.tavily.com/docs/python-sdk/tavily-search/getting-started) and [Tavily Extract](https://docs.tavily.com/docs/python-sdk/tavily-extract/getting-started).

You will first build a simple version of the app using only Tavily Search, and then combine Tavily Search and Tavily Extract to build a more advanced version.

Feel free to add `print` statements along the way to understand what's going on under the hood.

Let's get started!

> **NOTE:** Run all the cells in order. Later cells rely on variables defined in earlier ones, so skipping a cell will cause errors.

# Pre-requisites

Before writing any code, you will need a **Tavily API key** and an **OpenAI API key**. You will also need to install the **required dependencies** for this notebook.

## Tavily API key

To generate a Tavily API key, head to https://app.tavily.com. Once you sign up, you will be directed to your Tavily Dashboard. There, you can find your default API key, or generate a new one using the "+" button.

<img src="https://drive.google.com/uc?id=1lRgvkKh5hzwC5qGJ1HUfRdbYB8SnJ3h-" alt="Tavily Dashboard" width="700">


Once you get your API key, paste it in the cell below. Then, run the cell.

In [None]:
TAVILY_API_KEY = "" # Paste your API key here

## OpenAI API key

To generate an OpenAI API key, head to https://platform.openai.com. Once you sign up, you will be directed to your OpenAI Dashboard. There, you can generate an API key by clicking on "Start building".

<img src="https://drive.google.com/uc?id=1Ousf91dc7QjzTjcycHmpiRreNOLgXc4u" alt="OpenAI Platform" width="700">

Once you get your API key, paste it in the cell below. Then, run the cell.

In [None]:
OPENAI_API_KEY = "" # Paste your API key here

## Dependencies

Before you can write code, you will need to install some Python packages using `pip`:
1. `tavily-python` will allow you to natively access the Tavily API using Python.
2. `langchain_openai` will allow you to natively access the OpenAI GPT-4o API using Python, and use structured outputs.


Run the next cell to install the required dependencies.

In [None]:
!pip install tavily-python
!pip install langchain_openai

# How to use Tavily Search

Before we get to the actual apps, we need to know how to use Tavily Search.

Tavily Search is our search API that allows you to search the internet for relevant and accurate information, in only a few lines of code.

In [None]:
from tavily import TavilyClient

tavily_client = TavilyClient(TAVILY_API_KEY)

search_response = tavily_client.search("Who is Leo Messi?")

print(search_response)

That's it! As simple as that. Now, you know how to run a very basic query using Tavily Search.

You can customize the parameters for the search. Head to the [docs](https://docs.tavily.com/docs/python-sdk/tavily-search/api-reference#keyword-arguments-optional) to learn more about each parameter and how it affects the results.
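
When comparing parameter settings, it can help to print a compact summary instead of the whole response. The sketch below assumes the response is a dictionary with a `results` list whose items carry `url`, `title`, and `content`, which is the shape used later in this notebook. The `score` field and the `max_results` and `search_depth` keyword arguments in the final comment are assumptions to check against the docs, and `summarize_results` is a hypothetical helper, not part of the SDK.

```python
def summarize_results(search_response, max_chars=80):
    """Return one short, printable entry per search result (hypothetical helper)."""
    lines = []
    for rank, result in enumerate(search_response.get("results", []), start=1):
        # Collapse runs of whitespace so the snippet fits on one line.
        snippet = " ".join(result.get("content", "").split())
        if len(snippet) > max_chars:
            snippet = snippet[:max_chars].rstrip() + "..."
        score = result.get("score")  # assumed field; skipped when absent
        score_text = f" (score {score:.2f})" if isinstance(score, (int, float)) else ""
        lines.append(f"{rank}. {result['title']}{score_text}\n   {result['url']}\n   {snippet}")
    return lines


# Stand-in for a real response, so the sketch runs without an API key.
sample_response = {
    "results": [
        {
            "title": "Lionel Messi",
            "url": "https://en.wikipedia.org/wiki/Lionel_Messi",
            "content": "Lionel Messi is an Argentine professional footballer.",
            "score": 0.98,  # placeholder value
        },
    ]
}

for line in summarize_results(sample_response):
    print(line)

# With a live client, the same helper could wrap a customized search, for example:
# summarize_results(tavily_client.search("Who is Leo Messi?", max_results=3, search_depth="advanced"))
```

Running the helper on two searches that differ in a single parameter makes the effect of that parameter easy to see side by side.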

# How to use Tavily Extract

For our apps, we'll also need to know how to extract content from websites using Tavily Extract.

Tavily Extract is our powerful content extraction API, allowing you to extract full content from URLs. It is designed to work seamlessly with Tavily Search.

In [None]:
extract_response = tavily_client.extract([
    "https://en.wikipedia.org/wiki/Lionel_Messi",
    "https://www.fcbarcelona.com/en/",
    "https://www.intermiamicf.com/news/"
])

print(extract_response)

And that's all! Now you have all the tools you need to build your first RAG app with Tavily!
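
The extract response is easier to work with once successes and failures are separated. The sketch below assumes the response is a dictionary with a `results` list (items carrying `url` and `raw_content`) and a `failed_results` list (items carrying `url`), which are the keys used later in this notebook. `split_extract_response` is a hypothetical helper, and the sample data is a stand-in for a real call.

```python
def split_extract_response(extract_response):
    """Map each extracted URL to its content and list the URLs that failed."""
    content_by_url = {
        item["url"]: item["raw_content"]
        for item in extract_response.get("results", [])
    }
    failed_urls = [item["url"] for item in extract_response.get("failed_results", [])]
    return content_by_url, failed_urls


# Stand-in response, so the sketch runs without an API key.
sample_extract_response = {
    "results": [
        {"url": "https://en.wikipedia.org/wiki/Lionel_Messi", "raw_content": "placeholder page text"},
    ],
    "failed_results": [
        {"url": "https://www.fcbarcelona.com/en/", "error": "placeholder error"},
    ],
}

content_by_url, failed_urls = split_extract_response(sample_extract_response)
print(f"Extracted {len(content_by_url)} page(s); {len(failed_urls)} failed: {failed_urls}")
```

Checking `failed_urls` before moving on shows which pages could not be fetched, instead of leaving them to surface later as missing content.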

# Simple RAG Example

Now, let's move on to the first real example. We'll be building a daily news digest app.

In this example, we'll be fetching some information using **Tavily Search**, and then using this information to generate the daily news digest.

<img src="https://drive.google.com/uc?id=1M4DISVdn6AD0_0vYE6zYGYlQZn7cQ7Le" alt="Simple RAG flowchart">

First, you'll define a few topics that you are interested in getting news about.

In [None]:
topics = [
    "", # Fill in topic 1
    "", # Fill in topic 2
    ""  # Fill in topic 3
]

## Tavily News Search
Next, we run a Tavily news search for each topic and keep the URL, title, and content snippet of every result.

In [None]:
context = []

for topic in topics:

  search_response = tavily_client.search(topic, topic="news", time_range="day")

  context.append({
      "topic": topic,
      "results": [
          { "url": result["url"], "title": result["title"], "content": result["content"] } for result in search_response["results"]
      ]
  })

print(context)

Notice that we used the keyword argument `topic="news"` when conducting the search. This allows us to use the Tavily News Search Agent, which is optimized for searching through news. We are also limiting the results to articles from the last day with `time_range="day"`.

Head to our [docs](https://docs.tavily.com/docs/python-sdk/tavily-search/api-reference#keyword-arguments-optional) for more information on the different keyword arguments and their effects.
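
Because the `topics` cell starts out with empty strings, a forgotten entry would be sent to the API as an empty query. One way to guard against that is to clean the list before searching. `clean_topics` below is a hypothetical helper, not part of the Tavily SDK, and the example topics are placeholders.

```python
def clean_topics(raw_topics):
    """Strip whitespace, drop blanks, and remove duplicates while keeping order."""
    seen = set()
    cleaned = []
    for topic in raw_topics:
        name = topic.strip()
        key = name.lower()  # compare case-insensitively
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    if not cleaned:
        raise ValueError("Fill in at least one topic before running the search.")
    return cleaned


print(clean_topics(["  AI regulation ", "", "ai regulation", "Space exploration"]))
# prints ['AI regulation', 'Space exploration']
```

Calling `topics = clean_topics(topics)` before the search loop would make the loop skip blank entries and fail early, with a clear message, when no topic was filled in.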

## Generating the daily news digest using GPT-4o through LangChain
Now, we'll use our search results as context for GPT-4o and generate your daily digest.

In [None]:
from langchain_openai import ChatOpenAI
from datetime import datetime

gpt_4o = ChatOpenAI(api_key=OPENAI_API_KEY, model="gpt-4o", temperature=0.0)

prompt = """
    You are a Journalist agent.

    - Generate a daily news digest. Today's date is {date}.
    - Use only the following sources to get accurate information for each topic and write a short article about it:
      {context}.
    """

formatted_prompt = prompt.format(context=context, date=datetime.now().strftime("%Y-%m-%d"))
gpt_4o_response = gpt_4o.invoke(formatted_prompt)

print(gpt_4o_response.content)

You've now got your own custom daily news digest, powered by Tavily Search!
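
Note that `prompt.format(context=context)` inserts the Python representation of the list into the prompt. That works, but a plain-text layout can be easier to inspect and to trim. The sketch below assumes the `context` structure built in the search cell; `render_context` is a hypothetical helper, and the 500-character limit is an arbitrary choice.

```python
def render_context(context, max_chars_per_source=500):
    """Turn the list of topic dictionaries into a plain-text block for the prompt."""
    blocks = []
    for entry in context:
        lines = [f"Topic: {entry['topic']}"]
        # The simple example stores its sources under "results";
        # fall back to "sources" in case a different key is used.
        for source in entry.get("results", entry.get("sources", [])):
            content = source.get("content", "")[:max_chars_per_source]
            lines.append(f"- {source['title']} ({source['url']}): {content}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Placeholder data in the same shape as `context`.
sample_context = [
    {
        "topic": "Example topic",
        "results": [
            {"url": "https://example.com/a", "title": "Example headline", "content": "Example snippet text."},
        ],
    }
]

print(render_context(sample_context))
```

The rendered string could then be passed as `prompt.format(context=render_context(context), date=...)` in place of the raw list.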

# Intermediate RAG Example

This daily news digest is pretty good, but you might have noticed a few potential issues:
1. The results aren't very detailed for some of the chosen topics. This is because we're not extracting the full content from each source.
2. The output doesn't have an explicitly defined structure. Across different runs of the program, you might get an output with a different structure.

We are going to solve these challenges here by improving on our Simple RAG application using **Tavily Extract**, and **LangChain's Structured Output**.

We will be sending our Tavily Search results to Tavily Extract to get the full context, and then we'll feed that information into GPT-4o to get a more accurate daily news digest.

<img alt="Intermediate RAG flowchart" src="https://drive.google.com/uc?id=1279tExnlk1xZbX0vP7IU2vJIOhZ5lKOi">

Let's get started!

## Tavily News Search
As before, we'll start by performing a Tavily News Search to get some relevant sources. However, Tavily Search only provides short snippets of content. We'll use Tavily Extract to get the full content from the sources.

In [None]:
context = []

for topic in topics:

  search_query = f"Today's latest news about {topic}"
  search_response = tavily_client.search(search_query, topic="news", time_range="day")

  context.append({
      "topic": topic,
      "sources": [
          { "url": result["url"], "title": result["title"] } for result in search_response["results"]
      ]
  })

## Full content extraction using Tavily Extract
Now that we have the sources, let's use Tavily Extract to get their contents.

In [None]:
extracted_results = []

for topic in context:
  extract_response = tavily_client.extract([source["url"] for source in topic["sources"]])

  # Map each successfully extracted URL to its full page content.
  content_by_url = {
      result["url"]: result["raw_content"] for result in extract_response["results"]
  }

  # Keep only the sources that were extracted, attaching the full content to each.
  # Building a new list avoids removing items from a list while iterating over it.
  topic["sources"] = [
      { **source, "content": content_by_url[source["url"]] }
      for source in topic["sources"]
      if source["url"] in content_by_url
  ]

  extracted_results.append(topic)

## Defining the output structure with Pydantic

Now, we have filtered the sources to only include sources for which we were able to extract the full content.

Next, let's define the structure of the output, so that every run produces a digest with the same fields.

In [None]:
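from pydantic import BaseModel

# One section of the digest, covering a single topic.
class TopicSection(BaseModel):
  title: str
  article: str
  sources: list[str]  # URLs cited in the article

# The full digest: one section per topic.
class DailyNewsDigest(BaseModel):
  sections: list[TopicSection]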


## Generating the structured daily news digest using GPT-4o through LangChain
Time to generate our structured output using our LLM!

In [None]:
prompt = """
    You are a Journalist agent.

    - Generate a daily news digest. Today's date is {date}.
    - Use only the following sources to get accurate information for each topic and write a short article about it:
      {context}.
    - Also provide a list of citations.
    """

formatted_prompt = prompt.format(context=extracted_results, date=datetime.now().strftime("%Y-%m-%d"))

gpt_4o_response = gpt_4o.with_structured_output(DailyNewsDigest).invoke(formatted_prompt)

for section in gpt_4o_response.sections:
  print(f"Topic: {section.title}")
  print(f"Content: {section.article}")
  print(f"Sources: {section.sources}")
  print()

And that's it! You now have a structured daily news digest, powered by Tavily.
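
Because the digest is now an object with named fields, it can be rendered in whatever format the app needs. The sketch below turns it into Markdown. It relies only on the `sections`, `title`, `article`, and `sources` attributes defined in the models above. `digest_to_markdown` is a hypothetical helper, and `SimpleNamespace` stands in for a real `DailyNewsDigest` so the example runs on its own.

```python
from types import SimpleNamespace


def digest_to_markdown(digest, date_text):
    """Render a digest-like object as a Markdown document."""
    lines = [f"# Daily News Digest ({date_text})", ""]
    for section in digest.sections:
        lines.append(f"## {section.title}")
        lines.append("")
        lines.append(section.article)
        lines.append("")
        for url in section.sources:
            lines.append(f"- <{url}>")
        lines.append("")
    # End the document with exactly one newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder object with the same attributes as DailyNewsDigest.
sample_digest = SimpleNamespace(
    sections=[
        SimpleNamespace(
            title="Example topic",
            article="Example article text.",
            sources=["https://example.com/a"],
        )
    ]
)

print(digest_to_markdown(sample_digest, "2025-01-01"))
```

In the notebook, `digest_to_markdown(gpt_4o_response, datetime.now().strftime("%Y-%m-%d"))` would produce the same layout from the real model output.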

# Conclusion

We hope you enjoyed this tutorial and got to familiarize yourself with the Tavily Search and Tavily Extract APIs. Here are some more resources to learn about Tavily:

- If you want to further experiment with Tavily Search and understand the effects of each parameter, head to our [API Playground](https://app.tavily.com/playground).

- For more examples and use cases, as well as more complex agentic (and multi-agent) workflows using Tavily, head to our [blog](https://blog.tavily.com).

- If you have any questions about this notebook, or anything else related to Tavily, please get in touch with our team on our [Developer Community](https://community.tavily.com) or at support@tavily.com.