Let's pretend we're about to interview [Elad Gil](https://eladgil.com/), since he has a lot of content online.


In [None]:
# LLMs
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

# Twitter
import tweepy

# Scraping
import requests
from bs4 import BeautifulSoup
from markdownify import markdownify as md

# YouTube
from langchain.document_loaders import YoutubeLoader
# !pip install youtube-transcript-api

# Environment Variables
import os
from dotenv import load_dotenv

load_dotenv()

True

You'll need a few API keys to complete the script below. If you don't want to pull from Twitter, feel free to leave those keys blank.

In [None]:
TWITTER_API_KEY = os.getenv('TWITTER_API_KEY', 'YourAPIKeyIfNotSet')
TWITTER_API_SECRET = os.getenv('TWITTER_API_SECRET', 'YourAPIKeyIfNotSet')
TWITTER_ACCESS_TOKEN = os.getenv('TWITTER_ACCESS_TOKEN', 'YourAPIKeyIfNotSet')
TWITTER_ACCESS_TOKEN_SECRET = os.getenv('TWITTER_ACCESS_TOKEN_SECRET', 'YourAPIKeyIfNotSet')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', 'YourAPIKeyIfNotSet')
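
Before making any API calls, it can help to see which keys are still unset. The snippet below is a minimal sketch: `missing_keys` and `PLACEHOLDER` are hypothetical names that are not part of the notebook, and the example uses a made-up dictionary instead of your real environment.

```python
import os

# Hypothetical helper (not part of the original notebook): report which
# credentials are unset, empty, or still equal to the placeholder default.
PLACEHOLDER = "YourAPIKeyIfNotSet"

def missing_keys(names, env=None, placeholder=PLACEHOLDER):
    """Return the names whose value is unset, empty, or the placeholder."""
    env = os.environ if env is None else env
    return [name for name in names if env.get(name, placeholder) in ("", placeholder)]

# Example with an explicit dict so the snippet runs without real credentials.
example_env = {"OPENAI_API_KEY": "sk-example", "TWITTER_API_KEY": ""}
wanted = ["OPENAI_API_KEY", "TWITTER_API_KEY", "TWITTER_API_SECRET"]
print(missing_keys(wanted, env=example_env))
```

If only the Twitter keys show up as missing, you can skip the tweet-pulling cells and continue with the website and YouTube sources.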

In [None]:
def get_original_tweets(screen_name, tweets_to_pull=80, tweets_to_return=80):

    # Tweepy set up
    auth = tweepy.OAuthHandler(TWITTER_API_KEY, TWITTER_API_SECRET)
    auth.set_access_token(TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)

    tweets = []

    tweepy_results = tweepy.Cursor(api.user_timeline,
                                   screen_name=screen_name,
                                   tweet_mode='extended',
                                   exclude_replies=True).items(tweets_to_pull)

    for status in tweepy_results:
        if hasattr(status, 'retweeted_status') or hasattr(status, 'quoted_status'):
            # Skip if it's a retweet or quote tweet
            continue
        else:
            tweets.append({'full_text': status.full_text, 'likes': status.favorite_count})


    sorted_tweets = sorted(tweets, key=lambda x: x['likes'], reverse=True)

    full_text = [x['full_text'] for x in sorted_tweets][:tweets_to_return]

    users_tweets = "\n\n".join(full_text)

    return users_tweets
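
The filter-and-rank logic in the function above can be tried without Twitter credentials. This is a sketch under assumptions: `rank_original_tweets` is a hypothetical helper, and the mock statuses are invented objects that only mimic the attributes the function reads (`full_text`, `favorite_count`, `retweeted_status`, `quoted_status`).

```python
from types import SimpleNamespace

def rank_original_tweets(statuses, tweets_to_return=80):
    """Drop retweets and quote tweets, then return the most-liked texts."""
    originals = [
        {"full_text": s.full_text, "likes": s.favorite_count}
        for s in statuses
        # Same check as the notebook: these attributes only exist on
        # retweets and quote tweets
        if not (hasattr(s, "retweeted_status") or hasattr(s, "quoted_status"))
    ]
    ranked = sorted(originals, key=lambda t: t["likes"], reverse=True)
    return "\n\n".join(t["full_text"] for t in ranked[:tweets_to_return])

# Invented stand-ins for Tweepy status objects
mock_statuses = [
    SimpleNamespace(full_text="original, few likes", favorite_count=3),
    SimpleNamespace(full_text="a retweet", favorite_count=500, retweeted_status=object()),
    SimpleNamespace(full_text="original, many likes", favorite_count=40),
]
print(rank_original_tweets(mock_statuses, tweets_to_return=2))
```

Note that the retweet is dropped even though it has the most likes, so the number of tweets returned can be smaller than the number pulled.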

In [None]:
user_tweets = get_original_tweets("eladgil")
print(user_tweets[:300])

More AI companies with sudden virality + paying customers should just bootstrap

0. Running co for cash may be best success

1. If it does scale, being profitable or near to it creates lot of options

2. it may not scale, or only work for a few months

3. Why get on the… https://t.co/Q9TRQo4yau

Som


### Pulling Data From Websites

Let's pull two pages:

1. His personal website which has his background - https://eladgil.com/
2. One of his blogposts - https://blog.eladgil.com/p/defensibility-and-competition


In [None]:
def pull_from_website(url):

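    # Network calls can fail, so catch request errors and return an empty
    # string so callers can still concatenate the result
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
    except requests.RequestException as error:
        print(f"Whoops, error fetching {url}: {error}")
        return ""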

    soup = BeautifulSoup(response.text, 'html.parser')
    text = soup.get_text()

    # Run the extracted text through markdownify to reduce tokens and noise.
    # Note: get_text() has already removed the HTML tags at this point
    text = md(text)

    return text
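
To see what "pulling the text out of a page" means without installing anything, here is a minimal sketch that uses only the standard library's `html.parser` as a stand-in. It is not how BeautifulSoup or markdownify work internally; `TextExtractor`, `html_to_text`, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Elad Gil</h1><p>Welcome!</p><script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```

Dropping tags, scripts, and styles is what keeps the token count down when the text is later sent to the LLM.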

In [None]:
website_data = ""
urls = ["https://eladgil.com/", "https://blog.eladgil.com/p/defensibility-and-competition"]

for url in urls:
    text = pull_from_website(url)

    website_data += text

In [None]:
print(website_data[:400])




Elad Gil




Welcome to Elad Gil's retro homepage!

 Who? I am a technology entrepreneur. LinkedIn profile is here.
What?
I am an investor or advisor to companies including Airbnb, Airtable, Anduril, Brex, Checkr, Coinbase, dbt Labs, Deel, Figma, Flexport, Gitlab, Gusto, Instacart, Navan, Notion, Opendoor, PagerDuty, Pinterest, Retool, Rippling, Samsara, Square, Stripe
I am involved with AI com


### Pulling Data From YouTube

In [None]:
# Pulling data from YouTube in text form
def get_video_transcripts(url):
    loader = YoutubeLoader.from_youtube_url(url, add_video_info=True)
    documents = loader.load()
    transcript = ' '.join([doc.page_content for doc in documents])
    return transcript

In [None]:
# Using a regular string to store the YouTube transcript data
# Video selection will be important.
# Parsing interviews is tough, so opting for one where Elad is mostly talking about himself
video_urls = ['https://www.youtube.com/watch?v=nglHX4B33_o']
videos_text = ""

for video_url in video_urls:
    video_text = get_video_transcripts(video_url)

    videos_text += video_text

In [None]:
print(videos_text[:300])

I like to say that startups are an act of desperation and the desperation went out of the ecosystem over the last two or three years and we just had people showing up for the status and the money and now I think it's getting back to people who are doing it for a variety of reasons including the impa


Now let's combine everything into a single information block.

In [None]:
user_information = user_tweets + website_data + videos_text

Right now, the `user_information` variable is one big, messy wall of text, so we'll chunk it into pieces and run a map_reduce process over them.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=20000, chunk_overlap=2000)

In [None]:
docs = text_splitter.create_documents([user_information])

In [None]:
# Number of documents we created
len(docs)

3
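
To build some intuition for how `chunk_size` and `chunk_overlap` interact, here is a simplified sliding-window sketch. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter also tries to break on separators such as paragraphs and lines, so its chunks will not line up this neatly); `chunk_with_overlap` is a hypothetical helper, and the sizes are tiny on purpose.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

sample = "abcdefghij" * 5  # 50 characters
chunks = chunk_with_overlap(sample, chunk_size=20, chunk_overlap=5)
print(len(chunks))
print([len(c) for c in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, which helps the LLM keep context across chunk boundaries at the cost of a few extra tokens.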

Because we have a special request for the LLM on our data, we need custom prompts. These let us tinker with what the LLM pulls out of the data.

First, let's make our custom map prompt. This is where we'll instruct the LLM to pull out interview questions and tell it what makes a good question.

In [None]:
map_prompt = """You are a helpful AI bot that aids a user in research.
Below is information about a person named {persons_name}.
Information will include tweets, interview transcripts, and blog posts about {persons_name}
Your goal is to generate interview questions that we can ask {persons_name}
Use specifics from the research when possible

% START OF INFORMATION ABOUT {persons_name}:
{text}
% END OF INFORMATION ABOUT {persons_name}:

Please respond with list of a few interview questions based on the topics above

YOUR RESPONSE:"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text", "persons_name"])
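
If you want to see what the LLM receives for each chunk, you can fill the placeholders by hand. The sketch below uses plain `str.format` as a stand-in for the template, with a shortened copy of the prompt and an invented chunk of text; `demo_template` and `filled` are illustrative names only.

```python
# Shortened copy of the map prompt, for illustration only
demo_template = """Below is information about a person named {persons_name}.

% START OF INFORMATION ABOUT {persons_name}:
{text}
% END OF INFORMATION ABOUT {persons_name}:"""

# Each placeholder is replaced everywhere it appears in the template
filled = demo_template.format(
    persons_name="Elad Gil",
    text="(one chunk of research text goes here)",
)
print(filled)
```

During the map step, `{text}` is filled with one chunk at a time, while `{persons_name}` stays the same for every chunk.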

Then we'll make our custom combine prompt. This is the set of instructions that tells the LLM how to handle the lists of questions returned by the map step above.

In [None]:
combine_prompt = """
You are a helpful AI bot that aids a user in research.
You will be given a list of potential interview questions that we can ask {persons_name}.

Please consolidate the questions and return a list

% INTERVIEW QUESTIONS
{text}
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text", "persons_name"])

In [None]:
llm = ChatOpenAI(temperature=.25, model_name='gpt-4')

chain = load_summarize_chain(llm,
                             chain_type="map_reduce",
                             map_prompt=map_prompt_template,
                             combine_prompt=combine_prompt_template,
#                              verbose=True
                            )
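
Conceptually, the map_reduce chain runs the map prompt once per chunk and then passes the collected answers to the combine prompt. Here is a toy sketch of that flow in plain Python. The two `fake_*` functions are invented stand-ins for LLM calls, and `map_reduce` is a hypothetical helper, not LangChain's implementation.

```python
def map_reduce(chunks, map_fn, combine_fn):
    """Apply map_fn to every chunk, then combine the joined results once."""
    partials = [map_fn(chunk) for chunk in chunks]
    return combine_fn("\n".join(partials))

def fake_map_llm(chunk):
    # Stand-in for the map prompt: one "question" per chunk
    return f"- Question about: {chunk}"

def fake_combine_llm(joined):
    # Stand-in for the combine prompt: consolidate by dropping duplicates
    unique = list(dict.fromkeys(joined.splitlines()))
    return "\n".join(unique)

result = map_reduce(
    ["bootstrapping", "defensibility", "bootstrapping"],
    fake_map_llm,
    fake_combine_llm,
)
print(result)
```

The real chain makes one LLM call per document in the map step plus at least one more to combine, so cost grows with the number of chunks.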

In [None]:
output = chain({"input_documents": docs, # The three docs that were created before
                "persons_name": "Elad Gil"
               })



In [None]:
print(output['output_text'])

1. As an investor and advisor to various AI companies, what are some common challenges you've observed in the industry, and how do you recommend overcoming them?

2. Can you elaborate on the advantages of bootstrapping for AI startups and share any success stories you've come across?

3. What are some key lessons you've learned from your experiences in high-profile companies like Twitter, Google, and Color Health that have shaped your approach to investing and advising startups?

4. How do you think AI will continue to shape the job market in the coming years?

5. What motivated you to enter the healthcare space as a co-founder of Color Health, and how do you envision the role of AI in improving healthcare outcomes?

6. Can you share some insights on what sets high growth companies apart from others and the key factors that contribute to their rapid growth?

7. How do you evaluate the defensibility of AI startups when considering investment or advisory opportunities?

8. What excites y