# Metaphor Researcher
---
In this example, we will build Metaphor Researcher, a Python app that, given a research topic, automatically searches for sources on the topic with Metaphor and synthesizes their contents into a research report. [Check it out on Colab!](https://colab.research.google.com/drive/1BaGhYb394cQSavFo7wiu95Rt_G1mtfY9)

To play with this code, first we need a [Metaphor API key](https://dashboard.metaphor.systems/overview) and an [OpenAI API key](https://platform.openai.com/api-keys). Get 1000 Metaphor searches per month free just for [signing up](https://dashboard.metaphor.systems/overview)!

Let's import the Metaphor and OpenAI SDKs and put in our API keys to create a client object for each.

In [None]:
# install Metaphor and OpenAI SDKs
!pip install metaphor_python
!pip install openai

from google.colab import userdata # comment this out if you're not using Colab

METAPHOR_API_KEY = userdata.get('METAPHOR_API_KEY') # replace with your api key, or add to Colab Secrets
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY') # replace with your api key, or add to Colab Secrets

from metaphor_python import Metaphor
import openai

openai.api_key = OPENAI_API_KEY
metaphor = Metaphor(METAPHOR_API_KEY);

Since we'll be making several calls to the OpenAI API to get completions from `gpt-3.5-turbo`, let's make a simple utility wrapper function so we can pass in the system and user messages directly and get the LLM's response back as a string.

In [None]:
def get_llm_response(system='You are a helpful assistant.', user = '', temperature = 1, model = 'gpt-3.5-turbo'):
    completion = openai.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {'role': 'system', 'content': system},
            {'role': 'user', 'content': user},
        ]
    );
    return completion.choices[0].message.content;

Okay, great! Now let's start building Metaphor Researcher. The app should be able to automatically generate research reports for all kinds of different topics. Here are two to start:

In [None]:
XYZZY_TOPIC = 'xyzzy';
ART_TOPIC = 'renaissance art';

The first thing our app has to do is decide what kind of search to do for the given topic.

Metaphor offers two kinds of search: **neural** and **keyword** search. Here's how we decide:

- Neural search is preferred because it lets us retrieve high quality, semantically relevant data. It is especially suitable when a topic is well-known and popularly discussed on the Internet, allowing the machine learning model to retrieve contents which are more likely recommended by real humans.  
- Keyword search is only necessary when the topic is extremely specific, local or obscure. If the machine learning model might not know about the topic, but relevant documents can be found by directly matching the search query, keyword search is suitable.

So, Metaphor Researcher is going to get a query, and it needs to automatically decide whether to use `keyword` or `neural` search to research the query based on these criteria. Sounds like a job for the LLM! But we need to write a prompt that tells it about the difference between keyword and neural search. Oh wait, we have a perfectly good explanation right there.

In [None]:
# Let's generalize the prompt and call the search types (1) and (2) in case the LLM is sensitive to the names. We can replace them with different names programmatically to see what works best.
SEARCH_TYPE_EXPLANATION = """- (1) search is preferred because it lets us retrieve high quality, up-to-date, and semantically relevant data. It is especially suitable when a topic is well-known and popularly discussed on the Internet, allowing the machine learning model to retrieve contents which are more likely recommended by real humans.
- (2) search is only necessary when the topic is extremely specific, local or obscure. If the machine learning model might not know about the topic, but relevant documents can be found by directly matching the search query, (2) search is suitable.
"""

Here's a function that instructs the LLM to choose between the search types and give its answer in a single word. Based on its choice, we return `keyword` or `neural`.

In [None]:
def decide_search_type(topic, choice_names = ['neural', 'keyword']):
    user_message = 'Decide whether to use (1) or (2) search for the provided research topic. Output your choice in a single word: either "(1)" or "(2)". Here is a guide that will help you choose:\n';
    user_message += SEARCH_TYPE_EXPLANATION;
    user_message += f'Topic: {topic}\n';
    user_message += 'Search type: ';
    user_message = user_message.replace('(1)', choice_names[0]).replace('(2)', choice_names[1]);

    response = get_llm_response(
        system='You will be asked to make a choice between two options. Answer with your choice in a single word.',
        user=user_message,
        temperature=0
    )
    use_keyword = response.strip().lower().startswith(choice_names[1].lower())
    return 'keyword' if use_keyword else 'neural';

Let's test it out:

In [None]:
print(XYZZY_TOPIC, 'expected: keyword, got:', decide_search_type(XYZZY_TOPIC));
print(ART_TOPIC, 'expected: neural, got:', decide_search_type(ART_TOPIC));

xyzzy expected: keyword, got: keyword
renaissance art expected: neural, got: neural
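
The check inside `decide_search_type` assumes the reply starts with the choice name. If the model wraps its answer in quotes or parentheses, a slightly more forgiving parser may help. Here's a minimal sketch of that idea. Note that `parse_search_choice` is a hypothetical helper, not part of the original notebook, and like the original it falls back to `neural` when the reply is unrecognized.

```python
def parse_search_choice(response, choice_names=('neural', 'keyword')):
    """Map a raw LLM reply to 'neural' or 'keyword'.

    Hypothetical helper (an assumption, not from the original notebook).
    choice_names[0] stands for neural search, choice_names[1] for keyword search.
    """
    # Normalize: drop surrounding whitespace, quotes, brackets, and periods; ignore case.
    cleaned = response.strip().strip('"\'()[]. ').lower()
    if cleaned.startswith(choice_names[1].lower()):
        return 'keyword'
    if cleaned.startswith(choice_names[0].lower()):
        return 'neural'
    # Unrecognized reply: fall back to neural, the preferred search type in the guide.
    return 'neural'
```

To try it, the last two lines of `decide_search_type` could be swapped for `return parse_search_choice(response, choice_names)`.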


Great! Now we have to craft some search queries for the topic and the search type. There are two cases here: keyword search and neural search. Let's do the easy one first. LLMs already know what Google-like keyword searches look like. So let's just ask the LLM for what we want:

In [None]:
def create_keyword_query_generation_prompt(topic, n):
    return f"""I'm writing a research report on {topic} and need help coming up with Google keyword search queries.
Google keyword searches should just be a few words long. It should not be a complete sentence.
Please generate a diverse list of {n} Google keyword search queries that would be useful for writing a research report on {topic}. Do not add any formatting or numbering to the queries."""

print(get_llm_response(
    system='The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
    user=create_keyword_query_generation_prompt(XYZZY_TOPIC, 3),
));


xyzzy history
xyzzy significance
xyzzy applications


Those are some good ideas!

Now we have to handle the neural Metaphor search. This is tougher: you can read all about crafting good Metaphor searches [here](https://docs.metaphor.systems/reference/prompting-guide). But this is actually a really good thing: making the perfect Metaphor search is hard because Metaphor is so powerful! Metaphor allows us to express so much more nuance in our searches and gives us unparalleled ability to steer our search queries towards our real objective.

We need our app to understand our goal, what Metaphor is, and how to use it to achieve the goal. So let's just tell the LLM everything it needs to know.

In [None]:
def create_neural_query_generation_prompt(topic, n):
    return f"""I'm writing a research report on {topic} and need help coming up with Metaphor neural search queries.
Metaphor is a fully neural search engine that uses an embeddings based approach to search. Metaphor was trained on how people refer to content on the internet. The model is trained given the description to predict the link. For example, if someone tweets "This is an amazing, scientific article about Roman architecture: <link>", then our model is trained given the description to predict the link, and it is able to beautifully and super strongly learn associations between descriptions and the nature of the content (style, tone, entity type, etc) after being trained on many many examples. Because Metaphor was trained on examples of how people talk about links on the Internet, the actual Metaphor queries must actually be formed as if they are content recommendations that someone would make on the Internet where a highly relevant link would naturally follow the recommendation, such as the example shown above.
Metaphor neural search queries should be phrased like a person on the Internet indicating a webpage to a friend by describing its contents. It should end in a colon :.
Please generate a diverse list of {n} Metaphor neural search queries for informative and trustworthy sources useful for writing a research report on {topic}. Do not add any quotations or numbering to the queries."""

print(get_llm_response(
    system='The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
    user=create_neural_query_generation_prompt(ART_TOPIC, 3),
    #model='gpt-4'
))

Check out this comprehensive website on Renaissance art: 
Discover the fascinating world of Renaissance art on this reliable webpage: 
I stumbled upon a reliable source with in-depth information about Renaissance art:


Now let's put them together into a function that generates queries for the right search mode.

In [None]:
def generate_search_queries(topic, n, searchType):
    # Python 3 cannot raise a bare string, so raise a proper exception instead.
    if searchType not in ('keyword', 'neural'):
        raise ValueError(f'invalid searchType: {searchType!r}')
    user_prompt = create_neural_query_generation_prompt(topic, n) if searchType == 'neural' else create_keyword_query_generation_prompt(topic, n);
    completion = get_llm_response(
        system='The user will ask you to help generate some search queries. Respond with only the suggested queries in plain text with no extra formatting, each on its own line.',
        user=user_prompt,
        temperature=1
    )
    queries = [s for s in completion.split('\n') if s.strip()][:n]
    return queries;

Let's make sure it works, and check out some more queries:

In [None]:
XYZZY_queries = generate_search_queries(XYZZY_TOPIC, 3, 'keyword');
art_queries = generate_search_queries(ART_TOPIC, 3, 'neural');

In [None]:
print(XYZZY_queries);
print(art_queries);

[ "xyzzy definition", "xyzzy history", "xyzzy research studies" ]
[
  "- You have to check out this comprehensive guide to Renaissance art: ",
  "- I stumbled upon an amazing website that delves into the fascinating world of Renaissance art: ",
  "- Hey, I found a hidden gem of information about Renaissance art that you need to read:"
]
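
Notice that the neural queries printed above picked up a leading "- " even though the prompt asked for no formatting. If that becomes a problem, one option is to strip list markers before searching. The following is a minimal sketch under that assumption. `clean_queries` is a hypothetical helper, not part of the original notebook, and it only handles simple bullets, numbering, and wrapping double quotes.

```python
import re

def clean_queries(completion, n):
    """Split an LLM completion into at most n queries, stripping list markers.

    Hypothetical helper (an assumption, not from the original notebook).
    """
    queries = []
    for line in completion.split('\n'):
        # Remove a leading bullet ("- ", "* ") or numbering ("1. ", "2) ").
        line = re.sub(r'^\s*(?:[-*]|\d+[.)])\s+', '', line).strip()
        # Remove double quotes wrapped around the whole query.
        line = line.strip('"')
        if line:
            queries.append(line)
    return queries[:n]
```

It could stand in for the list comprehension in `generate_search_queries`, for example `queries = clean_queries(completion, n)`.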


Now it's time to use Metaphor to do the search, either neural or keyword:

In [None]:
def get_search_results(queries, type, linksPerQuery=2):
    results = [];
    for query in queries:
        search_response = metaphor.search(query, type=type, num_results=linksPerQuery, use_autoprompt=False);
        results.extend(search_response.results)
    return results;
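
Because each query is searched separately, the combined list may contain the same page more than once. Here's a small sketch of one way to drop repeats while keeping the original order. `dedupe_results` is a hypothetical helper, not part of the original notebook or the Metaphor SDK. It assumes each result exposes a `.url` attribute, and the demo uses stand-in objects rather than real search results.

```python
from types import SimpleNamespace

def dedupe_results(results):
    """Keep the first result for each URL, preserving order.

    Hypothetical helper (an assumption); expects objects with a .url attribute.
    """
    seen = set()
    unique = []
    for result in results:
        if result.url not in seen:
            seen.add(result.url)
            unique.append(result)
    return unique

# Stand-in results (made-up data) that mimic the .url and .title attributes.
fake_results = [
    SimpleNamespace(url='https://example.com/a', title='A (first)'),
    SimpleNamespace(url='https://example.com/b', title='B'),
    SimpleNamespace(url='https://example.com/a', title='A (repeat)'),
]
unique = dedupe_results(fake_results)
print([r.url for r in unique])
```

Calling it on the list returned by `get_search_results` before fetching contents would avoid requesting the same page twice.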

In [None]:
art_links = get_search_results(art_queries, 'neural');
print(art_links[0]) # first result of six

{
  title: "Italian Renaissance Art",
  url: "https://www.italian-renaissance-art.com/",
  publishedDate: "2023-01-01",
  author: null,
  id: "FP6SGj5eJJohGakUpexj4g",
  score: 0.17535683512687683
}


And to get the webpage contents:

In [None]:
def get_page_contents(search_results):
    contents_response = metaphor.get_contents(search_results);
    return contents_response.contents;

In [None]:
art_content = get_page_contents([link.id for link in art_links]);
print(art_content[0].extract) # first result of six

<div><div>
<h2>Italian Renaissance Art.<br /> A personal voyage into art history.</h2>
<p>This site explores all the major masterpieces of Italian Renaissance Art. From the fourteenth-century period known as the
Proto-Renaissance, championed by <a href="https://www.italian-renaissance-art.com/Giotto.html">Giotto de Bondone</a> and his contemporaries, to
the Renaissance of the fifteenth and sixteenth centuries.</p><p> Artists such as
Masaccio, Fra Angelico, Donatello and Botticelli in addition to the High
Renaissance masters Michelangelo, Leonardo da Vinci, Raphael, and Titian are
key to the development of the artistic innovations of the era.</p>
<h3>A Rebirth of Classical Antiquity.</h3>
<p>The Renaissance, the rebirth of Art and Science, represents the pinnacle of artistic achievement, revived and confidently executed after a thousand years in the wilderness.</p><p> The need to recapture the glories of antiquity was initially fuelled by scholars from various social backgrounds. In Ita

In just a couple lines of code, we've used Metaphor to go from some search queries to useful Internet content.

The final step is to instruct the LLM to synthesize the content into a research report, including citations of the original links. We can do that by pairing the content and the urls and writing them into the prompt.

In [None]:
def synthesize_report(topic, search_contents, content_slice = 750):
    inputData = ''.join([
        f'--START ITEM--\nURL: {item.url}\nCONTENT: {item.extract[:content_slice]}\n--END ITEM--\n'
        for item in search_contents
    ])
    return get_llm_response(
        system='You are a helpful research assistant. Write a report according to the user\'s instructions.',
        user='Input Data:\n' + inputData + f'Write a two paragraph research report about {topic} based on the provided information. Include as many sources as possible. Provide citations in the text using footnote notation ([#]). First provide the report, followed by a single "References" section that lists all the URLs used, in the format [#] <url>.',
        # model='gpt-4'  # want a better report? uncomment to use gpt-4
    )
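
To see exactly what the LLM receives, it can help to build the input data on its own and print it. The sketch below pulls the same formatting into a separate function and runs it on stand-in pages. `format_input_data` and the example pages are hypothetical, introduced only for illustration. They assume objects with `.url` and `.extract` attributes, like the contents returned earlier.

```python
from types import SimpleNamespace

def format_input_data(search_contents, content_slice=750):
    """Build the 'Input Data' part of the report prompt, one delimited item per page."""
    return ''.join(
        f'--START ITEM--\nURL: {item.url}\nCONTENT: {item.extract[:content_slice]}\n--END ITEM--\n'
        for item in search_contents
    )

# Stand-in pages (made-up data) with the same attributes the prompt builder reads.
pages = [
    SimpleNamespace(url='https://example.com/a', extract='<p>First page text</p>'),
    SimpleNamespace(url='https://example.com/b', extract='<p>Second page text</p>'),
]
# A tiny content_slice makes the truncation easy to see.
preview = format_input_data(pages, content_slice=10)
print(preview)
```

The `content_slice` limit truncates each page so that several sources fit in one prompt. Raising it gives the model more text per source but uses more of the context window.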

In [None]:
art_report = synthesize_report(ART_TOPIC, art_content);

In [None]:
print(art_report)

Research Report: Renaissance Art

The Renaissance period, spanning from the 14th to the 17th century, marked a significant cultural and artistic bridge between the Middle Ages and modern history[^3]. This era, initially sparked as a cultural movement in Italy during the Late Medieval period, later spread throughout Europe, ushering in the Early Modern Age[^4]. Renaissance art emerged as a pivotal aspect of this period, embodying a rebirth and awakening in Europe[^4]. It represented a time when artists pushed the boundaries of their creativity and produced works of extraordinary beauty and intellectual prowess[^4]. The Italian Renaissance, in particular, witnessed the rise of renowned masters such as Giotto de Bondone, Masaccio, Botticelli, and Leonardo da Vinci, among others, who played vital roles in developing artistic innovations during the era[^1][^3].

The artistic achievements of the Renaissance largely revolved around a rediscovery and reapplication of classical antiquity[^1]. A

Let's wrap up by putting it all together into one `researcher()` function that starts from a topic and returns the finished report. We can also have Metaphor Researcher generate a report on our keyword search topic.

In [None]:
def researcher(topic):
    search_type = decide_search_type(topic)
    search_queries = generate_search_queries(topic, 3, search_type)
    print(search_queries)
    search_results = get_search_results(search_queries, search_type)
    print(search_results[0])
    search_contents = get_page_contents([link.id for link in search_results])
    print(search_contents[0])
    report = synthesize_report(topic, search_contents)
    return report

In [None]:
print(researcher(XYZZY_TOPIC));

[ "xyzzy history", "xyzzy uses", "xyzzy benefits" ]
{
  title: "Xyzzy (computing) - Wikipedia",
  url: "https://en.wikipedia.org/wiki/Xyzzy_(computing)",
  author: null,
  id: "ac05e07a-722a-4de5-afc8-856c8192c5d2"
}
{
  id: "ac05e07a-722a-4de5-afc8-856c8192c5d2",
  url: "https://en.wikipedia.org/wiki/Xyzzy_(computing)",
  title: "Xyzzy (computing)",
  author: null,
  extract: "<div><div>\n" +
    "<p>From Wikipedia, the free encyclopedia</p>\n" +
    "</div><div>\n" +
    '<p>In <a href="https://en.wikipe'... 10626 more characters
}
Report:

Xyzzy is a term that is commonly used in computing. It can act as a metasyntactic variable or a video game cheat code[^1^]. The term originated from the Colossal Cave Adventure computer game, where it served as the first "magic string" that players usually encounter[^1^]. Additionally, Xyzzy is also re