In [10]:
import json
from transcribe.config import OPENAI_API_KEY, PINECONE_API_KEY
from transcribe.db import init_db
import transcribe.db.embedding as db_embedding
import transcribe.db.transcription as db_transcription
from typing import Optional
from gpt_index import GPTSimpleVectorIndex, GPTListIndex, Document, GPTPineconeIndex
import openai
import os
from yaspin import yaspin

In [25]:
import pinecone
api_key = PINECONE_API_KEY
pinecone.init(api_key=api_key, environment="us-east1-gcp")
pindex = pinecone.Index("quickstart")

In [12]:
def get_ycombinator_videos():
    with open('ycombinator_videos.json') as f:
        data = json.load(f)
        return data

In [13]:
def get_link_from_id(id: str) -> str:
    return f'https://www.youtube.com/watch?v={id}'

In [14]:
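def get_index_for_video(id: str, db) -> Optional[GPTSimpleVectorIndex]:
    """Load the stored vector index for one video, or None if data is missing."""
    link = get_link_from_id(id)
    embedding = db_embedding.get_embeddings_for_link(db, link)
    if not embedding:
        print("no embedding for link:", link)
        return None
    # Check for the summary before deserializing the index, so a missing
    # summary does not cost an unnecessary load.
    summary = db_transcription.get_summary_for_link(db, link)
    if not summary:
        print("no summary for link:", link)
        return None

    index = GPTSimpleVectorIndex.load_from_string(embedding['embedding_json'])
    # Identify the index by its video link and attach the summary as its text.
    index.set_doc_id(link)
    index.set_text(summary)
    return index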

In [15]:
videos = get_ycombinator_videos()
print(videos[0])

{'id': 'ycKU-ebeE24', 'title': 'The best way to have startup ideas is to just notice them organically.'}


In [16]:
openai.api_key = OPENAI_API_KEY
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
db = init_db()

In [26]:
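def get_pinecone_indexes(vids):
    """Build one Pinecone-backed index from every video that has a transcription."""
    docs = []
    skipped = 0
    done = 0
    for video in vids:
        link = get_link_from_id(video['id'])
        transcription = db_transcription.get_transcription_by_link(db, link)
        # Skip videos that were never transcribed or whose result is empty.
        if not transcription or not transcription['result']:
            skipped += 1
            print(f"no result for link: {link}, skipped {skipped}")
            continue
        # The stored result is a JSON string holding the transcript text.
        tdata = json.loads(transcription['result'])
        text = tdata['transcription']
        # Use the video link as the document id so sources can be traced back.
        docs.append(Document(text, doc_id=link))
        done += 1
    print(f"done {done} videos!")
    return GPTPineconeIndex(docs, pinecone_index=pindex)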
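
The skip-and-collect loop above can be exercised without a database or a Pinecone index. A minimal sketch, assuming a hypothetical `fetch_transcription` callable that stands in for `db_transcription.get_transcription_by_link` and returns a plain dict (or `None`); `collect_transcripts` is an illustrative name, not part of the original code:

```python
import json
from typing import Callable, Optional


def collect_transcripts(
    vids: list,
    fetch_transcription: Callable[[str], Optional[dict]],
) -> tuple:
    """Return ({link: text}, [skipped links]) for the given videos.

    `fetch_transcription` is a hypothetical stand-in for the database lookup.
    """
    texts = {}
    skipped = []
    for video in vids:
        link = f"https://www.youtube.com/watch?v={video['id']}"
        row = fetch_transcription(link)
        # Same rule as the notebook: no row or an empty result means skip.
        if not row or not row.get("result"):
            skipped.append(link)
            continue
        texts[link] = json.loads(row["result"])["transcription"]
    return texts, skipped


# Usage with an in-memory stand-in for the database.
fake_rows = {
    "https://www.youtube.com/watch?v=abc": {
        "result": json.dumps({"transcription": "hello"})
    },
    "https://www.youtube.com/watch?v=def": {"result": None},
}
texts, skipped = collect_transcripts(
    [{"id": "abc"}, {"id": "def"}, {"id": "ghi"}], fake_rows.get
)
print(len(texts), "collected;", len(skipped), "skipped")
```

Returning the skipped links, rather than only printing them, makes it easier to retry the missing transcriptions later.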

In [23]:
ndx = get_pinecone_indexes(videos)
ndx.save_to_disk("ycombinator_pinecone_index.json")

no result for link: https://www.youtube.com/watch?v=qh8sHetf-Nk, skipped 1
no result for link: https://www.youtube.com/watch?v=vqgnifnlLMI, skipped 2
no result for link: https://www.youtube.com/watch?v=K8tcouVhtI8, skipped 3
no result for link: https://www.youtube.com/watch?v=Octm_7llbGA, skipped 4
no result for link: https://www.youtube.com/watch?v=euZH0tVotPQ, skipped 5
no result for link: https://www.youtube.com/watch?v=5fmDKGV0TnQ, skipped 6
no result for link: https://www.youtube.com/watch?v=3xU050kMbHM, skipped 7
no result for link: https://www.youtube.com/watch?v=IYLVhk7yaaw, skipped 8
no result for link: https://www.youtube.com/watch?v=KWNNmPCF-Xs, skipped 9
no result for link: https://www.youtube.com/watch?v=tzsmJtKZ2No, skipped 10
no result for link: https://www.youtube.com/watch?v=sM2reZib2RY, skipped 11
no result for link: https://www.youtube.com/watch?v=jwXlo9gy_k4, skipped 12
no result for link: https://www.youtube.com/watch?v=VIWiEzO9KMM, skipped 13
no result for link: h

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 2615815 tokens
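
The embedding token count in the log above can be turned into a rough cost estimate. A back-of-the-envelope sketch, assuming a price of $0.0004 per 1,000 tokens; that rate is an assumption and not taken from the notebook, so substitute the current price for the embedding model actually used:

```python
EMBEDDING_TOKENS = 2_615_815  # from the build_index_from_documents log line
ASSUMED_USD_PER_1K_TOKENS = 0.0004  # assumption; substitute the real rate


def embedding_cost_usd(tokens: int, usd_per_1k: float) -> float:
    """Estimate embedding cost in USD for a given token count."""
    return tokens / 1000 * usd_per_1k


cost = embedding_cost_usd(EMBEDDING_TOKENS, ASSUMED_USD_PER_1K_TOKENS)
print(f"estimated embedding cost: ${cost:.2f}")
```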


In [27]:
from_disk_pinecone_ndx = GPTPineconeIndex.load_from_disk(
    "ycombinator_pinecone_index.json",
    pinecone_index=pindex,
)

In [28]:
def ask(question):
    """Query the Pinecone index and print the answer followed by its sources."""
    with yaspin(text="thinking..."):
        response = from_disk_pinecone_ndx.query(question)
    print(response)
    print(response.get_formatted_sources())
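
For reuse outside the notebook, it can help to return the answer instead of printing it. A minimal sketch, assuming only that the index object exposes `query(question)` and that the response exposes `get_formatted_sources()`, as used in `ask` above; `answer` and the `_Fake*` classes are hypothetical names for illustration:

```python
def answer(index, question: str) -> dict:
    """Query the index and return the stripped answer text with its sources."""
    response = index.query(question)
    return {
        # Responses tend to start with blank lines, so strip them.
        "answer": str(response).strip(),
        "sources": response.get_formatted_sources(),
    }


# Stand-ins so the sketch runs without an API key or a Pinecone index.
class _FakeResponse:
    def __str__(self):
        return "\n\nIt depends.\n"

    def get_formatted_sources(self):
        return "> Source (Doc id: None): ..."


class _FakeIndex:
    def query(self, question):
        return _FakeResponse()


result = answer(_FakeIndex(), "what is a startup?")
print(result["answer"])
```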

In [29]:
ask("what programming language is the best to use for a startup?")

INFO:root:> [query] Total LLM token usage: 3077 tokens
INFO:root:> [query] Total embedding token usage: 12 tokens


              

The best programming language to use for a startup will depend on the specific needs of the startup. However, some popular programming languages for startups that have been successful include Python, JavaScript, Java, and C#. It is important to remember that startups are a roller coaster and it pays to be a cockroach - never give up and be passionate about what you are working on. With the right tools and dedication, anyone can learn to program and build a successful startup.
> Source (Doc id: None): doc_id: https://www.youtube.com/watch?v=ypLoGFaKdbU
text: We got to the bagel store and my phone ...


In [30]:
ask("how do you get a startup idea?")

INFO:root:> [query] Total LLM token usage: 672 tokens
INFO:root:> [query] Total embedding token usage: 8 tokens


              
The best way to get a startup idea is to notice them organically. Look at the YC-TOP 100 companies and observe that at least 70% of them had their startup ideas organically, rather than by sitting down and explicitly trying to think of a startup idea. To put yourself in a position to have organic startup ideas in the future, become an expert on something valuable, go work at a startup, and if you're a programmer, build things that you find interesting.
> Source (Doc id: None): doc_id: https://www.youtube.com/watch?v=ycKU-ebeE24
text:  Let's talk about how to come up with s...


In [37]:
ask("what is a startup?")

INFO:root:> [query] Total LLM token usage: 4791 tokens
INFO:root:> [query] Total embedding token usage: 5 tokens


              

A startup is a business venture that is typically in the early stages of development and growth. It is usually founded by entrepreneurs who are looking to develop a product or service that can be sold to customers. Startups often involve a high degree of risk and uncertainty, as they are typically funded by venture capital and require a great deal of effort to succeed. Starting a successful startup is a life-changing endeavor that requires dedication and hard work. It is not a game of tricks or shortcuts, but rather a process of creating something that users love and then telling them about it. It is an all-consuming endeavor that can take up years of your life, and even if you are successful, the problems you face will never get any easier. It is similar to having kids in that it is a button you press that changes your life irrevocably, and while it is honestly the best thing in the world, it is important to remember that there are a lot of things that are easier to do

In [38]:
ask("how do i talk to users?")

INFO:root:> [query] Total LLM token usage: 5047 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens


              

When talking to users, it is important to focus on learning about their life and the specific problem they are trying to solve. Ask open-ended questions about the hardest part of the problem they are trying to solve, the last time they encountered the problem, and why it was hard. Additionally, ask questions about the path that led them to encounter the problem, their motivations, and the context in which they began solving this problem. Try to restrain your own talking and take notes to extract as much information as possible. Additionally, avoid talking about your idea or hypotheticals, and instead focus on specifics that have already occurred in the user's life. Ask questions such as, what was the hardest part of the problem they were trying to solve? What was the last time they encountered the problem? Why was it hard? What were the specific things that they encountered that were difficult? What were the circumstances in which they began solving this problem? What w

In [31]:
ask("how did stripe get its first customers?")

INFO:root:> [query] Total LLM token usage: 4921 tokens
INFO:root:> [query] Total embedding token usage: 8 tokens


              

Stripe got its first customers by creating a buzz around their product through their website and documentation, which was designed to be a programmer's dream. They also created a beta invite program, which made it a hot commodity to be able to use Stripe. This created a lot of buzz on Hacker News and among nerds, and people were seen as well connected if they were able to get access to Stripe. Stripe's founders also made sure to emphasize the product's value by pricing it higher than the competition, and they made sure to emphasize the fact that it was a much better option than the existing solutions, which were often expensive, time-consuming, and unreliable. They also created a revolutionary brand promise by designing a website and documentation that was every programmer's dream, and they made sure to emphasize the product's value by pricing it higher than the competition. This combination of creating a buzz, emphasizing the product's value, and creating a revolutiona

In [32]:
ask("how did airbnb get started?")

INFO:root:> [query] Total LLM token usage: 268 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens


              
Airbnb got started by renting out air beds in people's homes while they were there for conferences.
> Source (Doc id: None): doc_id: https://www.youtube.com/watch?v=wKpwyLSu_7k
text:  Airbnb is one of the most famous examp...
