# Welcome to the Lenny Chatbot Colab!

This Colab notebook contains all of the code you need to make a basic chatbot that will answer questions about a corpus of text. Colab is a cloud-based programming environment that lets you run all of this code from your browser.

At each step, follow the written instructions and press the "play" button next to the code sample in order to run it.

**Important Note:** This is a basic chatbot running on a limited selection of articles. It's only a starting point to show you what's possible!

If you have questions, feel free to reach out to me on Twitter at [@danshipper](https://www.twitter.com/danshipper).

## 1. Download our text corpus

The first thing we need to do is download the text our chatbot is going to use as reference material for answering questions.

In the Lenny Chatbot, I used every article he's written as the text corpus. But for this public codebase, I've collected two articles from his archive that we can use as a starting point.

These are the articles I'm using:

- [What is good retention?](https://www.lennysnewsletter.com/p/what-is-good-retention-issue-29)
- [How the biggest consumer apps got their first 1,000 users](https://www.lennysnewsletter.com/p/how-the-biggest-consumer-apps-got)

You can replace these articles with any text corpus you want, however.


In [None]:
! git clone https://github.com/sgauchet/GrowthGems-Newsletter-Corpus-All

Cloning into 'GrowthGems-Newsletter-Corpus-All'...
remote: Enumerating objects: 101, done.
remote: Counting objects: 100% (17/17), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 101 (delta 13), reused 0 (delta 0), pack-reused 84
Receiving objects: 100% (101/101

## 2. Install our dependencies and define our functions

In this section we'll install GPT Index and LangChain. We'll also define the functions that we'll use later to construct our index and query it.

First, let's install our dependencies.

In [None]:
!pip install gpt-index
!pip install langchain

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Now, we'll define the functions we're going to use later in order to construct our index and query it.

In [None]:
from gpt_index import SimpleDirectoryReader, GPTListIndex, readers, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import sys
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 256
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_outputs))
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
 
    documents = SimpleDirectoryReader(directory_path).load_data()
    
    index = GPTSimpleVectorIndex(
        documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )

    index.save_to_disk('index.json')

    return index

def ask_growthgems():
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True: 
        query = input("What do you want to ask Growth Gems? ")
        response = index.query(query, response_mode="compact")
        # The f prefix is required so {response.response} is interpolated
        display(Markdown(f"Growth Gems Bot says: <b>{response.response}</b>"))
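
To make the `chunk_size_limit` and `max_chunk_overlap` settings in `construct_index` more concrete, here is a minimal sketch of splitting text into overlapping chunks. It is an illustration only: it counts whitespace-separated words rather than model tokens, and `split_into_chunks` is a hypothetical helper, not part of GPT Index.

```python
def split_into_chunks(text, chunk_size=600, overlap=20):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Smaller chunks make each retrieved passage more focused, while the overlap reduces the chance that an answer is cut in half at a chunk boundary.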
  

## 3. Set OpenAI API Key
In order to run this notebook you'll need an API key from OpenAI. 

If you don't have one already, you can grab one by [signing up](https://platform.openai.com/overview). Then click your account icon on the top right of the screen and select "View API Keys". Create an API key.

Then run the code below and paste your key into the text input.



In [None]:
os.environ["OPENAI_API_KEY"] = input("Secret Key")
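
Because `input` displays whatever you paste, the key can end up saved in the notebook's output. A variant that avoids this is sketched below, assuming that `getpass` prompts work in your notebook environment. `store_api_key` is a hypothetical helper name, not part of the original notebook.

```python
import os
from getpass import getpass

def store_api_key(read_secret=getpass):
    """Read the key without echoing it and store it in the environment.

    `read_secret` is injectable so the function can be exercised
    without an interactive prompt.
    """
    key = read_secret("OpenAI API key: ").strip()
    if not key:
        raise ValueError("No API key entered")
    os.environ["OPENAI_API_KEY"] = key
```

Calling `store_api_key()` in a cell prompts for the key and sets `OPENAI_API_KEY` without printing it.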



## 4. Construct Index

Now we're going to construct our index. This will take every file in the folder 'GrowthGems-Newsletter-Corpus-All', split it into chunks, and embed it with OpenAI's embeddings API.

**Important Note:** This step costs money. Running it on the text corpus we've given you by default should only cost $0.03 in total. But if you use other text, be careful: the cost grows with the length of the corpus.
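
If you swap in your own corpus, a rough estimate before embedding can help. The sketch below is an approximation under stated assumptions: it uses the common rule of thumb of about 4 characters per token for English text, and it takes the price as a parameter that you must look up yourself on OpenAI's pricing page. `estimate_embedding_cost` is a hypothetical helper, not part of GPT Index.

```python
import os

def estimate_embedding_cost(directory_path, price_per_1k_tokens, chars_per_token=4):
    """Return (estimated_tokens, estimated_cost) for the files in a folder.

    Only top-level, non-hidden files are counted; subfolders are skipped.
    """
    total_chars = 0
    for name in sorted(os.listdir(directory_path)):
        path = os.path.join(directory_path, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue  # skip hidden entries (such as .git) and directories
        with open(path, encoding="utf-8", errors="ignore") as f:
            total_chars += len(f.read())
    tokens = total_chars / chars_per_token
    return tokens, tokens / 1000 * price_per_1k_tokens
```

The result is only a ballpark figure, since the real token count depends on the tokenizer and on how the text is chunked.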


In [None]:
construct_index('/content/GrowthGems-Newsletter-Corpus-All')

<gpt_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7faa2fea4820>

## 5. Ask Questions!

Now we'll run the `ask_growthgems` function we defined above.

This will prompt you to input a question. It then finds chunks of text that might answer the question and uses GPT-3 to summarize an answer from those chunks.
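
The notebook does not show how the relevant chunks are found. One common approach, assumed here for illustration, is to compare the question's embedding with each chunk's embedding and keep the closest matches. The sketch below uses cosine similarity on toy 3-dimensional vectors that stand in for real embeddings; `top_k_chunks` and `toy_chunks` are hypothetical names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vector, chunk_vectors, k=1):
    """Return the ids of the k chunks most similar to the query."""
    scored = sorted(
        chunk_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]

# Toy vectors standing in for real embeddings (assumption for illustration).
toy_chunks = {
    "retention": [0.9, 0.1, 0.0],
    "onboarding": [0.1, 0.9, 0.1],
    "creatives": [0.0, 0.2, 0.9],
}
```

In a real pipeline the selected chunks would then be placed in the prompt as context, which is why the bot sometimes replies that it cannot answer "with the given context information".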

Remember, in this public Colab file we're only using two of Lenny's articles for our corpus. So it will only answer questions from:

- [What is good retention?](https://www.lennysnewsletter.com/p/what-is-good-retention-issue-29)
- [How the biggest consumer apps got their first 1,000 users](https://www.lennysnewsletter.com/p/how-the-biggest-consumer-apps-got)


A few sample questions you can ask:

- What is good retention for a consumer social product?

- How did DoorDash get its first users?

- How did LinkedIn get started?

Again, this step costs money. So be aware!

In [None]:
ask_growthgems()

Growth Gems Bot says: <b>
1. New feature launch
2. Improvement/Optimization of an existing feature</b>

Growth Gems Bot says: <b>
It is unclear who said the statement.</b>

Growth Gems Bot says: <b>
No, I do not know the source of the quote.</b>

Growth Gems Bot says: <b>
The main insights on monetization by Thomas Petit are that growth is bigger than user acquisition (UA) and that CRM and lifecycle tactics are more important than ever. He suggests compensating for higher acquisition prices with better onboarding and monetization, and focusing beyond the very early part of the journey to mitigate the loss in remarketing.</b>

Growth Gems Bot says: <b>
One way to improve onboarding is to make it interactive and valuable. This could include providing helpful tips and tutorials, or offering rewards for completing certain tasks. Additionally, it is important to think of onboarding as a separate product and to optimize it accordingly. Finally, it may be beneficial to look at what other companies are doing in terms of onboarding, such as Fastic and Noom, or to look at Darius Contractor's Psych'd Framework.</b>

Growth Gems Bot says: <b>
The experts sharing insights on onboarding and monetization are Thomas Petit (Growth Consultant).</b>

Growth Gems Bot says: <b>
It is not possible to answer this question without more information.</b>

Growth Gems Bot says: <b>
No, the context information does not provide any other experts than Thomas Petit that have shared insights on onboarding and monetization.</b>

Growth Gems Bot says: <b>
It is not possible to answer this question without prior knowledge.</b>

Growth Gems Bot says: <b>
Yes, you can produce creatives for paid acquisition in a smart way by understanding your audience and being best in class in creative production cost. Additionally, you can test in cheaper geos to validate the viability of your creatives. It is also important to remember that revenue is not a function of creative but a function of product.</b>

Growth Gems Bot says: <b>
It is not possible to answer this question with the given context information.</b>

Growth Gems Bot says: <b>
The advice comes from Natalie Drozd, the UA Lead at Fabulous.</b>

Growth Gems Bot says: <b>
Natalie shares advice about leveraging data analytics to make informed decisions and to understand user behavior. She also encourages UA/Growth people to stay up to date on the latest trends in the industry, such as the ATT and Google AdROAS campaigns, and to take advantage of Admon Exchanges.</b>

Growth Gems Bot says: <b>
Matej Lancaric shared that if an app is monetized by ads, the app developer should reach out to their Google representative to get guidelines about AdROAS campaigns. These campaigns are optimized for ad revenue and use shorter conversion windows (3 to 7 days). It is important to wait for 2 or 3 cycles (6-9 days or 3 weeks) to evaluate the campaign.</b>

Growth Gems Bot says: <b>
No, it was Natalie Rozenblat.</b>

Growth Gems Bot says: <b>
It is not possible to answer this question with the given context information.</b>

Growth Gems Bot says: <b>
Natalie Ronzenblat shared the insight that data analytics is a powerful tool for user acquisition and growth.</b>

Growth Gems Bot says: <b>
Natalie Ronzenblat shared insights on creatives that involve setting guidelines to justify the production or not of a creative, with a low threshold for “wildcard” creatives (totally original creatives). She also shared that there should be a strong synergy between testing, ideation and production, and that there should be a roadmap/calendar each month for when things need to be delivered, tested, analyzed and produced.</b>