# How to make an AI that uses functions to summarize long documents

This tutorial takes you through an example of using OpenAI's new function calling feature to download academic papers and summarize them based on user queries. Upon completion you will be prepared to take on more production-ready scenarios of deploying functions with your own knowledge bases and other internal services.

### Getting started

Some basic knowledge of Python and GitHub is helpful for this tutorial. Before diving in, make sure to set up an OpenAI API key and walk through the quickstart tutorial. This will give you a good intuition for how to use the API to its full potential.

Python is used as the main programming language along with the OpenAI, Pandas, transformers, NumPy, and other popular packages. If you run into any issues working through this tutorial, please ask a question on the OpenAI Community Forum.

To start with the code, clone the full code for this tutorial on GitHub. Alternatively, follow along and copy each section into a Jupyter notebook and run the code step by step, or just read along. A good way to avoid any issues is to set up a new virtual environment and install the packages imported in the code cells below.

## Wiring up the search service

The main focus of this tutorial is building a function to work with a search service, so if the search service itself isn't of interest then please skip down to the next section on **Defining a function**. This section takes you through the process of setting up a search service to download papers and summarize them to answer user questions.

For this example we'll use the capable search service from [arXiv](https://arxiv.org/search/), which stores a huge quantity of academic papers across a range of scientific disciplines. This can be replaced by any knowledge base, document repository or other repository of text that is relevant for your use case.

### Search and download articles

The first step is to create a directory to hold the papers that we download based on user searches, and to create an empty ```csv``` file that will hold references to our downloaded files so we can summarize the most relevant one.

In [90]:
import os
import pandas as pd

# Set a directory to store downloaded papers and create it if it doesn't exist
data_dir = os.path.join(os.curdir, "papers")
os.makedirs(data_dir, exist_ok=True)
paper_dir_filepath = "arxiv_library.csv"

# Generate a blank dataframe where we can store downloaded files
df = pd.DataFrame(list())
df.to_csv(paper_dir_filepath)

We'll also define what models to use for this task. Our ```GPT_MODEL``` will be ```gpt-3.5-turbo-0613``` as this has been enabled for the function calling feature - more details on compatible models can be found in our function calling [documentation](https://platform.openai.com/docs/guides/gpt/function-calling). We've also gone with ```text-embedding-ada-002``` because this is currently our only widely used embeddings model, though more may come in future and will be found in our [models](https://platform.openai.com/docs/models/embeddings) page.

In [None]:
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

GPT_MODEL = "gpt-3.5-turbo-0613"
EMBEDDING_MODEL = "text-embedding-ada-002"

The ```embedding_request``` function below takes in text and returns an embedding that can be used for a similarity search to find the most relevant search result to the user's query. To learn more about embeddings please refer to our [documentation](https://platform.openai.com/docs/guides/embeddings).

In [None]:
import openai
from tenacity import retry, wait_random_exponential, stop_after_attempt

@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(3))
def embedding_request(text):
    response = openai.Embedding.create(input=text, model=EMBEDDING_MODEL)
    return response
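
To make the similarity search concrete, here is a minimal offline sketch of ranking documents by cosine distance. The three-dimensional vectors and file names are made-up stand-ins for real embeddings, and ```cosine_distance``` is a hypothetical helper written for this illustration rather than part of the OpenAI library.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only)
query_vec = [1.0, 0.0, 0.0]
library = {
    "paper_a.pdf": [0.9, 0.1, 0.0],
    "paper_b.pdf": [0.0, 1.0, 0.0],
    "paper_c.pdf": [0.5, 0.5, 0.0],
}

# Rank library entries from closest to furthest from the query
ranked = sorted(library, key=lambda name: cosine_distance(query_vec, library[name]))
print(ranked)
```

The entry whose vector points in nearly the same direction as the query ranks first, which is the same idea the tutorial relies on when it picks the most relevant paper.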

The ```get_articles``` function uses the ```arxiv``` Python library to take in a user query and return the most relevant search results. For each retrieved paper, we:
- Create an empty dictionary
- Fill it with the title, summary, article and download URLs
- Download a PDF and save it to our ```papers``` directory
- Create an embedding of the title and summary together for later retrieval
- Store the title, summary, filepath and embedding in our library file ```arxiv_library.csv``` to be retrieved against

In [178]:
import arxiv
from csv import writer

def get_articles(query, library=paper_dir_filepath, top_k=3):
    """This function gets the top_k articles based on a user's query, sorted by relevance.
    It also downloads the files and stores them in arxiv_library.csv to be retrieved by the read_article_and_summarize.
    """
    search = arxiv.Search(
        query=query, max_results=top_k, sort_by=arxiv.SortCriterion.Relevance
    )
    result_list = []
    for result in search.results():
        # Take the first two links provided as the article and PDF URLs
        result_dict = {
            "title": result.title,
            "summary": result.summary,
            "article_url": result.links[0].href,
            "pdf_url": result.links[1].href,
        }
        result_list.append(result_dict)

        # Store references for library file
        text_for_embedding = f"Title: {result.title}\nSummary: {result.summary}"
        response = embedding_request(text=text_for_embedding)
        file_reference = [
            result.title,
            result.summary,
            result.download_pdf(data_dir),
            response["data"][0]["embedding"],
        ]

        # Write file_reference to library file
        print(f'Downloading "{result.title}" to knowledge base')
        # The with block closes the file; newline="" is what the csv module expects
        with open(library, "a", newline="") as f_object:
            writer_object = writer(f_object)
            writer_object.writerow(file_reference)
    return result_list
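
One detail worth seeing in isolation is what happens to the embedding when it is written to the library file: the list is stored as a string and has to be parsed back later. The sketch below uses an in-memory buffer and a made-up row (the title, path and vector are placeholders) to show the round trip.

```python
import ast
import csv
import io

# Hypothetical row mirroring the library layout: title, summary, filepath, embedding
row = ["A title", "A summary", "./papers/example.pdf", [0.1, -0.2, 0.3]]

# Write the row with csv.writer; the embedding list is stored as its string form
buffer = io.StringIO()
csv.writer(buffer).writerow(row)

# Read it back: every field is now a string, including the embedding
buffer.seek(0)
title, summary, filepath, embedding_str = next(csv.reader(buffer))
print(type(embedding_str).__name__)

# ast.literal_eval safely parses the string back into a list of floats
embedding = ast.literal_eval(embedding_str)
print(embedding)
```

This is why the embedding column needs an ```ast.literal_eval``` step when the library file is loaded again for retrieval.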


In [92]:
result_output = get_articles('Tree of thought reasoning')
result_output[0]


Downloading "Large Language Model Guided Tree-of-Thought" to knowledge base
Downloading "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" to knowledge base
Downloading "Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models" to knowledge base
Downloading "Reasoning with Language Model is Planning with World Model" to knowledge base
Downloading "Categorical Buechi and Parity Conditions via Alternating Fixed Points of Functors" to knowledge base


{'title': 'Large Language Model Guided Tree-of-Thought',
 'summary': "In this paper, we introduce the Tree-of-Thought (ToT) framework, a novel\napproach aimed at improving the problem-solving capabilities of auto-regressive\nlarge language models (LLMs). The ToT technique is inspired by the human mind's\napproach for solving complex reasoning tasks through trial and error. In this\nprocess, the human mind explores the solution space through a tree-like thought\nprocess, allowing for backtracking when necessary. To implement ToT as a\nsoftware system, we augment an LLM with additional modules including a prompter\nagent, a checker module, a memory module, and a ToT controller. In order to\nsolve a given problem, these modules engage in a multi-round conversation with\nthe LLM. The memory module records the conversation and state history of the\nproblem solving process, which allows the system to backtrack to the previous\nsteps of the thought-process and explore other directions from th

### Summarize the most relevant article

This next block of functions takes in a filepath to the most relevant downloaded PDF, and does the following to break long documents into manageable pieces:
- ```read_pdf```: Reads in the PDF text and page numbers, and returns it as one long string.
- ```create_chunks```: Intelligently breaks the PDF into n-length token chunks (in our demonstration we use 1500), avoiding stopping in the middle of sentences.
- ```extract_chunk```: Applies a prompt to a chunk of text using OpenAI's ```ChatCompletion``` endpoint.

In [93]:
from PyPDF2 import PdfReader

def read_pdf(filepath):
    """Takes a filepath to a PDF and returns a string of the PDF's contents"""
    # creating a pdf reader object
    reader = PdfReader(filepath)
    pdf_text = ""
    page_number = 0
    for page in reader.pages:
        page_number += 1
        pdf_text += page.extract_text() + f"\nPage Number: {page_number}"
    return pdf_text


# Split a text into smaller chunks of size n, preferably ending at the end of a sentence
def create_chunks(text, n, tokenizer):
    """Returns successive n-sized chunks from provided text."""
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Find the nearest end of sentence within a range of 0.5 * n and 1.5 * n tokens
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            # Decode the tokens and check for full stop or newline
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # If no end of sentence found, use n tokens as the chunk size
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j


def extract_chunk(content, template_prompt):
    """This function applies a prompt to some input content. In this case it returns a summarized chunk of text"""
    prompt = template_prompt + content
    response = openai.ChatCompletion.create(
        model=GPT_MODEL, messages=[{"role": "user", "content": prompt}], temperature=0
    )
    return response["choices"][0]["message"]["content"]
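
To see how the chunking behaves without installing a tokenizer, the sketch below repeats the ```create_chunks``` logic and runs it with a hypothetical character-level tokenizer (one token per character). ```CharTokenizer``` is an assumption made for illustration; the tutorial itself uses ```tiktoken```.

```python
class CharTokenizer:
    """Hypothetical stand-in for tiktoken: one token per character."""

    def encode(self, text):
        return [ord(ch) for ch in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)


def create_chunks(text, n, tokenizer):
    """Same logic as the tutorial's create_chunks, repeated so this block runs alone."""
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Start from 1.5 * n tokens and walk back looking for a sentence end
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # No sentence end found in range: fall back to a chunk of n tokens
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j


tokenizer = CharTokenizer()
text = "First sentence here. Second one is a little longer. Third. Fourth sentence ends now."
chunks = [tokenizer.decode(c) for c in create_chunks(text, 20, tokenizer)]
for chunk in chunks:
    print(repr(chunk))
```

Two properties are worth checking on your own documents: joining the chunks reproduces the original text, and no chunk exceeds 1.5 times the target size.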

This next block contains the ```summarize_text``` function, which will be the key one driving our user results. In it we:
- Read in our library file and get the filepath for the most relevant paper to the user's query using the ```search_embeddings``` function.
- Read in the PDF using ```read_pdf``` and split it into chunks of roughly 1500 tokens using ```create_chunks```.
- Summarize every chunk using the ```summary_prompt``` and the ```extract_chunk``` function. Because papers can have many chunks, we do this in parallel using a thread pool from ```concurrent.futures```.
- Perform a final summary of the summaries into a standard format which should answer the user's question.

In [105]:
import ast
import concurrent.futures
from openai.embeddings_utils import distances_from_embeddings
import tiktoken
from tqdm import tqdm


def search_embeddings(query, df, top_n=1):
    """Returns the filepaths of the top_n papers closest to the query by cosine distance."""
    query_embedding = embedding_request(query)

    df["distances"] = distances_from_embeddings(
        query_embedding["data"][0]["embedding"],
        df["embedding"].values,
        distance_metric="cosine",
    )

    return list(df.sort_values("distances", ascending=True)["filepath"])[:top_n]

def summarize_text(query):
    """This function does the following:
    - Reads in the arxiv_library.csv file, including the embeddings
    - Finds the closest file to the user's query
    - Scrapes the text out of the file and chunks it
    - Summarizes each chunk in parallel
    - Does one final summary and returns this to the user"""

    # A prompt to dictate how the recursive summarizations should approach the input paper
    summary_prompt = """Summarize this text from an academic paper. Extract any key points with reasoning.\n\nContent:"""

    # If the library is empty (no searches have been performed yet), we perform one and download the results
    library_df = pd.read_csv(paper_dir_filepath).reset_index()
    if len(library_df) == 0:
        print("No papers searched yet, downloading first.")
        get_articles(query)
        print("Papers downloaded, continuing")
        library_df = pd.read_csv(paper_dir_filepath).reset_index()
    library_df.columns = ["title", "summary", "filepath", "embedding"]
    library_df["embedding"] = library_df["embedding"].apply(ast.literal_eval)
    strings = search_embeddings(query, library_df,top_n=1)
    print("Chunking text from paper")
    pdf_text = read_pdf(strings[0])

    # Initialise tokenizer
    tokenizer = tiktoken.get_encoding("cl100k_base")
    results = ""

    # Chunk up the document into 1500 token chunks
    chunks = create_chunks(pdf_text, 1500, tokenizer)
    text_chunks = [tokenizer.decode(chunk) for chunk in chunks]
    print("Summarizing each chunk of text")

    # Parallel process the summaries
    with concurrent.futures.ThreadPoolExecutor(
        max_workers=len(text_chunks)
    ) as executor:
        futures = [
            executor.submit(extract_chunk, chunk, summary_prompt)
            for chunk in text_chunks
        ]
        with tqdm(total=len(text_chunks)) as pbar:
            for _ in concurrent.futures.as_completed(futures):
                pbar.update(1)
        for future in futures:
            data = future.result()
            results += data

    # Final summary
    print("Summarizing into overall summary")
    response = openai.ChatCompletion.create(
        model=GPT_MODEL,
        messages=[
            {
                "role": "user",
                "content": f"""Write a summary collated from this collection of key points extracted from an academic paper.
                        The summary should highlight the core argument, conclusions and evidence, and answer the user's query.
                        User query: {query}
                        The summary should be structured in bulleted lists following the headings Core Argument, Evidence, and Conclusions.
                        Key points:\n{results}\nSummary:\n""",
            }
        ],
        temperature=0,
    )
    return response
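
The summarize-then-combine pattern used here can be tried offline with a stub in place of the API call. In this sketch ```fake_summarize``` is a hypothetical stand-in for ```extract_chunk``` that simply keeps the first sentence of each chunk, and the chunks themselves are made up.

```python
import concurrent.futures


def fake_summarize(chunk):
    """Hypothetical stand-in for extract_chunk: keeps only the first sentence."""
    return chunk.split(".")[0].strip() + "."


text_chunks = [
    "Alpha is first. More detail follows.",
    "Beta is second. Extra words here.",
    "Gamma is third. And so on.",
]

# Map step: summarize every chunk in parallel; executor.map preserves input order
with concurrent.futures.ThreadPoolExecutor(max_workers=len(text_chunks)) as executor:
    partial_summaries = list(executor.map(fake_summarize, text_chunks))

# Reduce step: combine the partial summaries into one input for a final pass
combined = " ".join(partial_summaries)
print(combined)
```

Threads suit this step because each real call spends most of its time waiting on the network. For very long papers you may want to cap ```max_workers``` rather than start one thread per chunk.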

In [106]:
# Test the summarize_text function works
chat_test_response = summarize_text("PPO reinforcement learning sequence generation")
print(chat_test_response["choices"][0]["message"]["content"])

Chunking text from paper
Summarizing each chunk of text


100%|████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00,  1.48it/s]


Summarizing into overall summary
Core Argument:
The Reasoning via Planning (RAP) framework combines world models, rewards, and Monte Carlo Tree Search (MCTS) planning to enable large language models (LLMs) to perform complex reasoning tasks. RAP addresses the limitations of LLMs by incorporating a principled planning algorithm and utilizing the LLM as both a world model and a reasoning agent.

Evidence:
- RAP repurposes LLMs as world models and reasoning agents, allowing for grounded and coherent inference.
- Rewards are designed to assess the feasibility and desirability of reasoning steps.
- MCTS planning algorithm is used to explore the reasoning space and find optimal reasoning traces.
- RAP outperforms strong baselines in tasks such as plan generation, math reasoning, and logical inference.

Conclusions:
- RAP demonstrates superiority in plan generation, achieving a 33% relative improvement compared to the Chain-of-Thought baseline.
- RAP achieves high accuracy in math reasoning a

## Defining a function

At this stage we have a working search service that reads articles from arXiv and summarizes them to answer a user question. We now need to wrap this in a ```function``` definition so that ```ChatCompletion``` can take it in and make it available to our LLM.

An OpenAI ```function``` is defined with a ```name``` that the model will identify it with, a ```description``` that describes when to use it, and ```parameters``` that define what information the LLM needs to gather to use the function. The ```parameters``` can be required or optional, and you can also force the LLM to call a function (or not to call one) should your application logic demand it - for more details on these please refer to [this cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_with_chat_models.ipynb) and our function calling [docs](https://platform.openai.com/docs/guides/gpt/function-calling).

The LLM should then check the user's message each time to see whether a ```function``` is necessary, and if one is, it will return a response with a ```finish_reason``` of ```function_call```, triggering your application to call the named function.
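
As a rough illustration of what your application receives, the sketch below uses a hand-written mock of a single response choice and pulls out the function name and arguments. The values are illustrative, not real model output; the function name matches the one defined in this tutorial.

```python
import json

# Hand-written mock of a single choice in the response (values are illustrative)
mock_choice = {
    "finish_reason": "function_call",
    "message": {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": "read_article_and_summarize",
            "arguments": '{"query": "tree of thought reasoning", "previous_search": "no"}',
        },
    },
}

if mock_choice["finish_reason"] == "function_call":
    call = mock_choice["message"]["function_call"]
    # The arguments arrive as a JSON string, so they must be parsed before use
    arguments = json.loads(call["arguments"])
    print(call["name"], arguments["query"])
```

Note that the arguments are a string of JSON rather than a dictionary, so a ```json.loads``` step is needed before they can be passed to your own code.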

For our example, we'll define a function called ```read_article_and_summarize```. This will extract a ```query```, which is simply what the user searched for, and a ```previous_search``` enum to identify whether the user has searched this topic before. If they haven't, our application will trigger ```get_articles``` first so we have some relevant articles to answer their question with.

In [194]:
arxiv_functions = [
    {
        "name": "read_article_and_summarize",
        "description": """Use this function to answer the user's question using arXiv papers.
        You should use this tool to answer all questions that you haven't already answered.""",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": f"""
                            User's query in plain text.
                            """,
                },
                "previous_search": {
                    "type": "string",
                    "enum": ["yes","no"],
                    "description": """If the customer has asked a previous question on this topic say yes, otherwise say no.
                                    Infer this from the conversation history if the customer has asked a similar question"""
                }
            },
            "required": ["query","previous_search"],
        },
    }
]
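
Because the model writes the arguments itself, it can be worth checking them against the schema before acting on them. The sketch below is one minimal way to do that with a hypothetical ```validate_arguments``` helper; it checks only required keys and enum values and is not a full JSON Schema validator.

```python
import json

# Trimmed copy of the parameter schema above, repeated so the block runs alone
parameters = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "previous_search": {"type": "string", "enum": ["yes", "no"]},
    },
    "required": ["query", "previous_search"],
}


def validate_arguments(raw_arguments, schema):
    """Hypothetical helper: parse the JSON string and check required keys and enums."""
    arguments = json.loads(raw_arguments)
    missing = [key for key in schema["required"] if key not in arguments]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    for key, spec in schema["properties"].items():
        if key in arguments and "enum" in spec and arguments[key] not in spec["enum"]:
            raise ValueError(f"Invalid value for {key}: {arguments[key]!r}")
    return arguments


print(validate_arguments('{"query": "PPO", "previous_search": "yes"}', parameters))
```

Raising a clear error at this point makes it easier to decide whether to retry the request or ask the user for clarification.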

## Building a question & answer agent to use the function

This wrapper function makes a ```ChatCompletion``` request that optionally provides ```functions``` as well. If the ```finish_reason``` is ```function_call```, then we execute the right functions to get the user the answer they need.

**This is an important concept** when working with ```functions```. OpenAI provides you the name of the function to call and the arguments that you should supply, but we do not call the function for you - you must write the logic to do this yourself.

In [187]:
def chat_completion_with_function_execution(messages, functions=None):
    """This function makes a ChatCompletion API call with the option of adding functions"""

    # Only pass functions to the API when some were provided
    kwargs = {"functions": functions} if functions else {}
    response = openai.ChatCompletion.create(
        model=GPT_MODEL, messages=messages, **kwargs
    )
    full_message = response["choices"][0]

    if full_message["finish_reason"] == "function_call":
        print("Function generation requested, calling function")
        return call_arxiv_function(messages, full_message)
    else:
        print("Function not required, responding to user")
        return response


The ```call_arxiv_function``` goes through the logic of figuring out what to do when the ```function_call``` is received. Here we have built in some logic so that if there hasn't been a ```previous_search``` on a similar topic for this user, we will execute a search first so that we have a relevant paper to summarize.

Once we have a relevant paper, we summarize it using the ```summarize_text``` function and provide the response back to the user.

In [184]:
import json

def call_arxiv_function(messages, full_message):
    """Function calling function which executes function calls when the model believes it is necessary.
    Currently extended by adding clauses to this if statement."""

    function_call = full_message["message"]["function_call"]

    if function_call["name"] == "read_article_and_summarize":
        # Parse the arguments once so they are available on every path below
        parsed_output = json.loads(function_call["arguments"])
        # Default to "no" so that a search is run if the model omits the argument
        previous_search = parsed_output.get("previous_search", "no")

        if previous_search == "no":
            try:
                print("Getting search results")
                results = get_articles(parsed_output["query"])

                messages.append(
                    {
                        "role": "function",
                        "name": function_call["name"],
                        "content": str(results),
                    }
                )

            except Exception as e:
                print(parsed_output)
                print("Function execution failed")
                print(f"Error message: {e}")

        print("Finding and reading paper")
        summary = summarize_text(parsed_output["query"])
        return summary

    else:
        raise Exception("Function does not exist and cannot be called")
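
If you add more functions, the if statement above can grow unwieldy. One common alternative, sketched here under the assumption that each function accepts its arguments as keyword parameters, is a dispatch table. The ```AVAILABLE_FUNCTIONS``` mapping, the ```dispatch``` helper and the stub function are all hypothetical names for illustration.

```python
import json


def read_article_and_summarize(query, previous_search="no"):
    """Hypothetical stub standing in for the real search-and-summarize logic."""
    return f"summary for: {query} (previous_search={previous_search})"


# Map the names the model may return to the Python callables that implement them
AVAILABLE_FUNCTIONS = {"read_article_and_summarize": read_article_and_summarize}


def dispatch(function_call):
    """Look up the requested function and call it with the parsed arguments."""
    name = function_call["name"]
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Function {name!r} does not exist and cannot be called")
    arguments = json.loads(function_call["arguments"])
    return AVAILABLE_FUNCTIONS[name](**arguments)


result = dispatch(
    {
        "name": "read_article_and_summarize",
        "arguments": '{"query": "PPO", "previous_search": "no"}',
    }
)
print(result)
```

Adding a new function then means adding one entry to the mapping rather than another branch.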

## Testing our agent

We now have all the pieces needed to test our agent. We'll initiate a ```messages``` list to hold our interactions between the different roles in our conversation. The key ```roles``` used in the ```ChatCompletion``` API are:
- ```system```: This is a guiding message which dictates the behaviour you want the LLM to display and any instructions you'd like to provide. In most user interfaces this is hidden from the user.
- ```user```: This is what the user interacting with the LLM has input - in this example you can imagine a user has a search bar available that they are typing their queries into.
- ```assistant```: Outputs from a call to ```ChatCompletion``` are always received with ```role``` = ```assistant```. This is the response from the LLM, which can be directed at either the ```user```, or at a ```function``` if the ```finish_reason``` is ```function_call```. 
- ```function```: The output from a function call being provided to the LLM to make its final response.
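
To show how these roles fit together, here is an illustrative message history for a single function-call round trip. The contents are made up for this sketch; only the role names and the overall shape reflect the roles listed above.

```python
# Illustrative message history for one function-call round trip (contents are made up)
messages = [
    {"role": "system", "content": "You answer questions using arXiv papers."},
    {"role": "user", "content": "What is tree of thought reasoning?"},
    {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": "read_article_and_summarize",
            "arguments": '{"query": "tree of thought reasoning", "previous_search": "no"}',
        },
    },
    {
        "role": "function",
        "name": "read_article_and_summarize",
        "content": "Core Argument: ...",
    },
]

print([message["role"] for message in messages])
```

The ```assistant``` message carrying a ```function_call``` has no text content of its own; the ```function``` message that follows carries the result your application produced.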

For this example we'll build up three turns of conversation with the LLM, with each one using either a function or directly responding to the user.

In [195]:
# Initialize an empty list of messages
messages = []

# Start with a system message
paper_system_message = """You are arXivGPT, a helpful assistant that pulls academic papers to answer user questions.
You summarize the papers clearly so the customer can get the answer to the question.
Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.
Do not answer questions using your own knowledge, only use your functions.
Begin!"""
messages.append({"role": "system"
                 ,"content": paper_system_message})

# Add a user message
messages.append({"role": "user"
                 ,"content": "Hi, can you examine tree of thought reasoning for LLMs?"})
chat_response = chat_completion_with_function_execution(
    messages, functions=arxiv_functions
)
assistant_message = chat_response["choices"][0]["message"]
messages.append(assistant_message)
print(assistant_message['content'])

Function generation requested, calling function
Getting search results
Downloading "Large Language Model Guided Tree-of-Thought" to knowledge base
Downloading "Reasoning with Language Model is Planning with World Model" to knowledge base
Downloading "Temporal Data Meets LLM -- Explainable Financial Time Series Forecasting" to knowledge base
Finding and reading paper
Chunking text from paper
Summarizing each chunk of text


100%|████████████████████████████████████████████████████████████████████| 16/16 [00:06<00:00,  2.49it/s]


Summarizing into overall summary
Core Argument:
- The academic paper introduces the Tree of Thoughts (ToT) framework for language model inference, which allows for deliberate decision-making and exploration over coherent units of text.

Evidence:
- ToT significantly enhances language models' problem-solving abilities on tasks that require planning or search, such as the Game of 24, Creative Writing, and Mini Crosswords.
- ToT addresses the limitations of existing language models by enabling exploration of different continuations within a thought process and heuristic-guided search.
- The paper discusses the components of ToT, including thought decomposition, thought generation, state evaluation, and search algorithms.
- Experiments show that ToT outperforms other prompting methods, such as Input-Output (IO) and Chain-of-Thought (CoT), in terms of success rates and exploration efficiency.

Conclusions:
- ToT is a general method for problem-solving with language models and offers general

In [196]:
# Add another user message
messages.append({"role": "user"
                 ,"content": "How about PPO using sequence generation, can you explain that to me"})
updated_response = chat_completion_with_function_execution(
    messages, functions=arxiv_functions
)
assistant_message = updated_response["choices"][0]["message"]
messages.append(assistant_message)
print(assistant_message['content'])

Function generation requested, calling function
Getting search results
Downloading "Proximal Policy Optimization and its Dynamic Version for Sequence Generation" to knowledge base
Downloading "Lifetime policy reuse and the importance of task capacity" to knowledge base
Downloading "Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective" to knowledge base
Finding and reading paper
Chunking text from paper
Summarizing each chunk of text


100%|██████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00,  1.04it/s]


Summarizing into overall summary
Core Argument:
- The paper discusses the use of Proximal Policy Optimization (PPO) in sequence generation tasks, specifically in the context of chit-chat chatbots.
- The authors argue that PPO is a more efficient reinforcement learning algorithm compared to policy gradient, which is commonly used in these tasks.
- They propose a dynamic approach for PPO (PPO-dynamic) and demonstrate its efficacy in synthetic experiments and chit-chat chatbot tasks.

Evidence:
- PPO-dynamic achieves a high precision score in a synthetic counting task, comparable to other algorithms such as REINFORCE and MIXER.
- In the chit-chat chatbot task, PPO-dynamic achieves a slightly higher BLEU-2 score than REINFORCE and PPO.
- The learning curves of PPO and PPO-dynamic are more stable than policy gradient, and PPO-dynamic converges faster.

Conclusions:
- PPO is a better optimization method for sequence learning compared to policy gradient.
- PPO-dynamic further improves the opt

In [197]:
# Add another user message
messages.append({"role": "user"
                 ,"content": "What are the top three reasons PPO is better than policy gradient? Provide a reference to the paper you get your answer from so I can verify it."})
updated_response = chat_completion_with_function_execution(
    messages, functions=arxiv_functions
)
assistant_message = updated_response["choices"][0]["message"]
messages.append(assistant_message)
print(assistant_message['content'])

Function not required, responding to user
I apologize for the confusion. I am unable to provide direct links. However, I can provide the top three reasons based on the paper "Proximal Policy Optimization and its Dynamic Version for Sequence Generation" by Wu et al.

1. Efficiency: PPO has been proven to be a more efficient reinforcement learning algorithm compared to policy gradient. It achieves stable and faster convergence in sequence generation tasks, such as chit-chat chatbots, as demonstrated by experiments conducted in the paper.

2. Stability: PPO exhibits more stable learning curves compared to policy gradient. It avoids the issues of high variance and sensitivity to hyperparameters that can be encountered with policy gradient methods, leading to more reliable and consistent training.

3. Performance: PPO outperforms policy gradient in terms of both stability and overall performance. It achieves higher precision scores in tasks like synthetic counting and higher BLEU-2 scores i

If the LLM is inconsistent in calling the functions or supplying the arguments you want, you should dedicate time to engineering both the function descriptions and the system prompt as these have a huge bearing on the output. If you are using multiple functions, you should also take care to ensure the case for using each is clear and that there is not significant overlap between them - if this happens, you will have an uneven user experience where the LLM sometimes calls one function and sometimes calls another.

For production solutions you should also consider using a more permanent storage and retrieval solution such as a vector database to minimize cost and latency as you scale. Details on the sorts of solutions available can be found in our [documentation](https://platform.openai.com/docs/guides/embeddings/how-can-i-retrieve-k-nearest-embedding-vectors-quickly) and in the [cookbook](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases).

We hope you've enjoyed this introduction to function calling and long document summarization, and we look forward to seeing what you build!