<a href="https://colab.research.google.com/github/mdkzimmm/agentic/blob/main/Exa_Company_Analyst.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this example, we will build a company analyst tool for startups that discovers and researches companies building a similar product. If you just want to see the code, check out the [Colab notebook](https://colab.research.google.com/drive/1VROD6zsaDh_rSmogSpSn9FJCwmJO8TSi?here).

This project requires an [Exa API key](https://dashboard.exa.ai/overview) and an [OpenAI API key](https://platform.openai.com/api-keys). Get 1000 Exa searches per month free just for [signing up](https://dashboard.exa.ai/overview)!

In [None]:
# install Exa and OpenAI SDKs
!pip install exa_py
!pip install openai

Collecting exa_py
  Downloading exa_py-1.0.8-py3-none-any.whl (6.3 kB)
Installing collected packages: exa_py
Successfully installed exa_py-1.0.8
Collecting openai
  Downloading openai-1.11.1-py3-none-any.whl (226 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.2-py3-none-any.whl (76 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)

In [None]:
EXA_API_KEY = "API_KEY_HERE"
OPENAI_API_KEY = "API_KEY_HERE"


### Context

For this tutorial, let’s use [Thrifthouse](https://thrift.house) as an example company. Let's imagine I'm building Thrifthouse, a platform for selling secondhand goods on college campuses, and I want to learn about other companies doing something similar.

Unfortunately, googling “[companies similar to Thrifthouse](https://www.google.com/search?q=companies+similar+to+Thrifthouse)” doesn't do a very good job. Traditional search engines rely heavily on keyword search. In this case we get results about physical thrift stores. Hm, that's not really what I want.

Let’s try again, this time searching with a description of the company: “[community based resale apps](https://www.google.com/search?q=community+based+resale+apps)”. This isn’t very helpful either; it just returns premade listicles...

1. CNBC: Best Selling Apps and Websites for 2024
2. CNET: Best Thrifting and Secondhand Shopping Apps of 2024
3. Mirror Review: 20 Resale Apps That will Make Your Life Better in 2024
4. US News: 15 Best Apps for Buying and Selling Used Stuff

What we really need is neural search.

### Neural vs. Keyword Search
Traditional search engines like Google are primarily keyword-based: the core algorithm matches words in a query to text in links. One example of this limitation is a search for “companies working on AI for finance”, which returns mostly low-quality listicles like “[Top 10 companies changing the future of finance with AI](https://aimagazine.com/top10/top-10-companies-changing-the-future-of-finance-with-ai)” or “[31 Examples of AI in finance 2024](https://builtin.com/artificial-intelligence/ai-finance-banking-applications-companies)”.

With the emergence of LLMs, it’s now possible to build much more intelligent search that is neural, which today means embeddings-based. That is precisely what Exa is: a fully embeddings-based search engine built using a foundational embeddings model trained for webpage retrieval. It’s capable of understanding entity types (company, blog post, GitHub repo), descriptors (funny, scholastic, authoritative), and any other semantic qualities inside of a query.
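
To make "embeddings-based" a bit more concrete, here is a toy sketch of the general idea: pages and queries are represented as vectors, and results are ranked by cosine similarity rather than by shared keywords. The three-dimensional vectors below are made up for illustration; they are not produced by Exa's model, and this is not how Exa is implemented internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
pages = {
    "campus resale marketplace": [0.9, 0.1, 0.2],
    "physical thrift store": [0.3, 0.9, 0.1],
    "listicle of selling apps": [0.4, 0.3, 0.8],
}
query = [0.8, 0.2, 0.3]  # hypothetical embedding of the input page

# Rank pages from most to least similar to the query vector.
ranked = sorted(
    pages,
    key=lambda name: cosine_similarity(query, pages[name]),
    reverse=True,
)
print(ranked)
```

With these made-up numbers, the campus marketplace ranks first even though none of the labels share a keyword with a query like "Thrifthouse".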


### Finding companies with Exa

So, let's now try neural search with the Exa Python SDK! We can use the `find_similar_and_contents` Python function, which first finds similar links, then returns the contents of each link. Our `input_url` is our starting company, <https://thrift.house>, and we set `num_results=10`. This will find 10 webpages semantically similar to Thrifthouse's homepage, which are likely companies similar to Thrifthouse.

By specifying `highlights={"num_sentences":2}`, we ask Exa to also identify and return a two-sentence highlight from each result's content that's relevant to our query. This will allow us to quickly understand each website that we find.


In [None]:
from exa_py import Exa
exa = Exa(api_key=EXA_API_KEY)

In [None]:
# let's get 10 similar companies
input_url = 'https://thrift.house'
search_response = exa.find_similar_and_contents(
        input_url,
        highlights={"num_sentences":2},
        num_results=10)

companies = search_response.results

print(companies[0])

Title: rumie - College Marketplace
URL: https://www.rumieapp.com/
ID: nyGyU_YmvSIIUDRyorci1A
Score: 0.7602835893630981
Published Date: 2012-01-01
Author: None
Text: None
Highlights: ['It makes buying and selling things so safe and easy! Much more efficient than other buy/sell platforms!Amazing!5 stars for being simple, organized, safe, and a great way to buy and sell in your college community.. much more effective than posting on Facebook or Instagram!The BEST marketplace for college students!!!Once rumie got to my campus, I was excited to see what is has to offer!']
Highlight Scores: [0.2570305373890231]



In [None]:
# to just see the 10 titles and urls
for c in companies:
  # an f-string also works when a result has no title (None)
  print(f"{c.title}:{c.url}")


rumie - College Marketplace:https://www.rumieapp.com/
The Airbnb of Storage:https://www.mystorestash.com/
Bunction.net:https://bunction.net/
Home - Community Gearbox:https://communitygearbox.com/
NOVA SHOPPING:https://www.novashoppingapp.com/
Re-Fridge: Buy, sell, or store your college fridge - Re-Fridge:https://www.refridge.com/
Jamble: Social Fashion Resale:https://www.jambleapp.com/
Branded Resale | Treet:https://www.treet.co/
Swapskis:https://www.swapskis.co/
Earn Money for Used Clothing:https://www.thredup.com/cleanout?redirectPath=%2Fcleanout%2Fsell



Looks pretty darn good! Now that we have 10 companies to dig into, let’s do some research on each of them.
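
Before researching each result, it can help to drop duplicates and any page that belongs to the starting company itself. The sketch below is one possible way to do that with the standard library; `domain_of` and `unique_companies` are hypothetical helper names, and the `/about` and `/faq` URLs are invented for the example.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def unique_companies(urls, input_url):
    """Keep the first URL per domain, skipping the input company's own domain."""
    seen = {domain_of(input_url)}
    kept = []
    for url in urls:
        domain = domain_of(url)
        if domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept

example = unique_companies(
    [
        "https://www.rumieapp.com/",
        "https://rumieapp.com/about",  # invented URL, same domain as above
        "https://thrift.house/faq",    # invented URL on the input company's domain
    ],
    "https://thrift.house",
)
print(example)
```

In the notebook, this could be applied to `[c.url for c in companies]` with `input_url` as the second argument.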

### Finding additional info for each company

Now let's get more information by finding additional webpages about each company. To do this, we're going to do a keyword search for each company's URL. We're using keyword search because we want to find webpages that exactly match the company we're inputting. We can do this with the `search_and_contents` function, specifying `type="keyword"` and `num_results=5`. This will give us 5 webpages about each company.

In [None]:
# doing an example with the first company
c = companies[0]
all_contents = ""
search_response = exa.search_and_contents(
  c.url, # input the company's URL
  type="keyword",
  text=True, # explicitly request each page's full text
  num_results=5
)
research_response = search_response.results
for r in research_response:
  # guard against results with no text; separate pages with a newline
  all_contents += (r.text or "") + "\n"

### Creating a report

Finally, let's create a report that gives us an easily digestible summary of each company. The cell below summarizes the first company; repeating the same steps for each of our 10 companies produces the full report.

In [None]:
import textwrap
import openai

SYSTEM_MESSAGE = "You are a helpful assistant writing a research report about a company. Summarize the user's input into multiple paragraphs. Be extremely concise, professional, and as factual as possible. The first paragraph should be an introduction and summary of the company. The second paragraph should include pros and cons of the company: what they are doing well, and what they are doing poorly or struggling with. Ideally, include suggestions to make the company better."
openai.api_key = OPENAI_API_KEY

completion = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": all_contents},
    ],
)

summary = completion.choices[0].message.content

print(f"Summary for {c.url}:")
print(textwrap.fill(summary, 80))

Summary for https://www.rumieapp.com/:
Rumie is a college-exclusive marketplace app that allows students to buy, sell,
and rent items with other students. It has over 320,000 users in its network and
offers features such as quick setup, .edu verification, local and campus-wide
selling options, and exclusive discounts from local businesses. Students can
also rent dresses from other students, buy or sell student tickets at student
prices, and enjoy secure and intuitive transactions. The app has received
positive feedback from users for its convenience, safety, and effectiveness in
buying and selling within the college community.  Pros of Rumie include its
focus on college students' needs, such as providing a safe platform and
exclusive deals for students. The app offers an intuitive and fast setup
process, making it easy for students to start buying and selling. The option to
trade with other students is also appreciated. Users find it convenient that
they can sell locally or ship items 
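
The combined text of five webpages can get long, and a chat model only accepts a limited amount of input. One simple precaution is to trim `all_contents` before sending it. The helper below is a minimal sketch; `truncate_for_prompt` is a hypothetical name, and the default budget of 12,000 characters is an arbitrary assumption, not a limit taken from OpenAI's documentation.

```python
def truncate_for_prompt(text, max_chars=12000):
    """Trim text to at most max_chars, cutting at the last space before the limit."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars)
    # Fall back to a hard cut when there is no space to break on.
    return text[: cut if cut > 0 else max_chars].rstrip()

print(truncate_for_prompt("alpha beta gamma", max_chars=12))
```

In the notebook, this would mean passing `truncate_for_prompt(all_contents)` as the user message instead of `all_contents`.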

And we’re done! We’ve built an app that takes in a company webpage and uses Exa and OpenAI to

1. Discover similar startups
2. Find information about each of those startups
3. Gather useful content and summarize it with OpenAI
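
The three steps above can be sketched as a single function that covers every company rather than only the first. This is a rough outline under some assumptions: `build_report` is a hypothetical name, `exa_client` is expected to behave like the `exa` object used earlier, and `summarize` is any callable that turns text into a summary (for example, a small wrapper around the OpenAI call shown above). `StubExa` is a stand-in for a dry run without API keys and is not part of `exa_py`.

```python
from types import SimpleNamespace

def build_report(exa_client, summarize, input_url, num_companies=10, pages_per_company=5):
    """Return {company_url: summary} for companies similar to input_url."""
    # Step 1: discover similar companies.
    similar = exa_client.find_similar_and_contents(
        input_url, highlights={"num_sentences": 2}, num_results=num_companies
    )
    report = {}
    for company in similar.results:
        # Step 2: find pages about this company with a keyword search.
        research = exa_client.search_and_contents(
            company.url, type="keyword", num_results=pages_per_company
        )
        # Step 3: join the page text (skipping empty results) and summarize it.
        contents = "\n\n".join(r.text for r in research.results if r.text)
        report[company.url] = summarize(contents)
    return report

class StubExa:
    """Stand-in client returning canned results (illustration only)."""
    def find_similar_and_contents(self, url, highlights, num_results):
        companies = [SimpleNamespace(url="https://example.com/a")]
        return SimpleNamespace(results=companies[:num_results])

    def search_and_contents(self, query, type, num_results):
        pages = [SimpleNamespace(text="About A."), SimpleNamespace(text=None)]
        return SimpleNamespace(results=pages[:num_results])

# Dry run: upper-casing stands in for the model-generated summary.
demo = build_report(StubExa(), summarize=lambda text: text.upper(), input_url="https://thrift.house")
print(demo)
```

Passing the client and the summarizer in as arguments keeps the function easy to try out before spending any API credits.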

Hopefully you found this tutorial helpful and are ready to start building your very own company analyst! Whether you want to generate sales leads or research competitors to your own company, Exa's got you covered :).