# YouTube Transcript Analyzer

In [None]:
!pip install langchain
!pip install openai
!pip install python-dotenv
!pip install youtube-transcript-api
!pip install tiktoken
!pip install faiss-cpu

In [None]:
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from dotenv import find_dotenv, load_dotenv
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)
import textwrap

In [None]:
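import os
# Paste your key between the quotes, or delete this line and keep the key in a .env file:
# load_dotenv() does not override a variable that is already set, even an empty one.
os.environ['OPENAI_API_KEY'] = ""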

In [None]:
load_dotenv(find_dotenv())
embeddings = OpenAIEmbeddings()

In [None]:
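def create_db_from_yt_url(url):
    """Load a YouTube video's transcript, split it into chunks, and index them in FAISS."""
    loader = YoutubeLoader.from_youtube_url(url)
    transcript = loader.load()

    # chunk_size and chunk_overlap are measured in characters, not tokens
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = text_splitter.split_documents(transcript)

    # Embed every chunk and store the vectors in an in-memory FAISS index
    db = FAISS.from_documents(docs, embeddings)
    return db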
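
The overlap between neighbouring chunks can be illustrated with a much simpler splitter. The sketch below is a plain fixed-window splitter, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to break on separators such as paragraphs and sentences); the helper name `split_with_overlap` and the sample text are hypothetical and only serve to show how `chunk_size` and `chunk_overlap` interact.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Fixed-window splitter (hypothetical helper, NOT LangChain's algorithm)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Synthetic 2500-character "transcript" (made-up data, for illustration only)
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)

print([len(c) for c in chunks])             # [1000, 1000, 700]
print(chunks[0][-100:] == chunks[1][:100])  # True: the last 100 chars repeat in the next chunk
```

The repeated 100 characters mean a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.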

In [None]:
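def get_response_from_query(db, query, k=4):
  """
  Retrieve the k transcript chunks most similar to the query and ask
  gpt-3.5-turbo to answer the question from them.

  gpt-3.5-turbo can handle up to about 4K tokens of context. chunk_size is
  measured in characters, so k = 4 chunks of at most 1000 characters
  stays well within that limit and leaves room for the answer.
  """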

  docs = db.similarity_search(query, k=k)
  docs_page_content = " ".join([d.page_content for d in docs])

  chat = ChatOpenAI(model_name = "gpt-3.5-turbo", temperature = 0.2)

  # Template to use for the system message prompt
  template = """
  You are a helpful assistant that can answer questions about YouTube videos
  based on the video's transcript: {docs}

  Only use the factual information from the transcript to answer the question.

  If you feel there isn't enough information to answer the question, say "I do not have an answer to this question".

  Your answer should be verbose and detailed.

  """

  sys_msg_prompt = SystemMessagePromptTemplate.from_template(template)

  # human question prompt
  human_template = "Answer the following question: {question}"
  human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

  chat_prompt = ChatPromptTemplate.from_messages(
      [sys_msg_prompt, human_message_prompt]
  )

  chain = LLMChain(llm=chat, prompt = chat_prompt)

  response = chain.run(question = query, docs = docs_page_content)
  # Collapse newlines to spaces so textwrap.fill can re-wrap without fusing words
  response = response.replace("\n", " ")
  return response, docs
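
As a rough check of the context budget that the docstring of `get_response_from_query` reasons about, the sketch below estimates how many tokens the retrieved chunks consume. It assumes about 4 characters per token for English text (a common rule of thumb, not an exact count; `tiktoken` would give real counts), a context window of 4,096 tokens, and 1,000 tokens reserved for the answer. The helper names and the reserved figure are hypothetical and not part of the notebook.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_context(chunks, context_limit=4096, reserved_for_answer=1000, chars_per_token=4):
    """Return (estimated prompt tokens, whether prompt plus answer fit the limit)."""
    prompt_tokens = sum(estimate_tokens(c, chars_per_token) for c in chunks)
    return prompt_tokens, prompt_tokens + reserved_for_answer <= context_limit


# Four chunks of 1000 characters each, mirroring chunk_size=1000 and k=4
toy_chunks = ["x" * 1000] * 4
used, ok = fits_context(toy_chunks)
print(used, ok)  # 1000 True
```

Under these assumptions the four chunks use roughly a quarter of the window, so there is headroom to raise `k` if answers seem to miss context.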

In [None]:
video_url = "https://www.youtube.com/watch?v=L_Guz73e6fw"
db = create_db_from_yt_url(video_url)
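
Once the index exists, `similarity_search` returns the chunks whose embedding vectors lie closest to the query's embedding. The toy sketch below shows that idea with two-dimensional vectors and Euclidean distance; it assumes L2 distance (understood to be the default for LangChain's FAISS wrapper, but treat that as an assumption), and the function name `top_k` and the vectors are made up for illustration. Real embeddings have many more dimensions.

```python
import math


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    distances = [(math.dist(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return [i for _, i in sorted(distances)[:k]]


# Made-up 2-D "embeddings" standing in for transcript chunks
toy_vectors = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (0.5, 0.5)]
toy_query = (1.0, 0.0)

print(top_k(toy_query, toy_vectors, k=2))  # [1, 2]
```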

In [None]:
query = "What are they saying about AGI?"
response, docs = get_response_from_query(db, query)
print(textwrap.fill(response, width = 85))

In the given transcript, there are several discussions about AGI (Artificial General
Intelligence). Sam Altman, the CEO of OpenAI, talks about the challenges they face
while working on AGI and how they believe in the approach of iterative deployment and
iterative discovery. He also mentions that the pace of progress is fast, and they are
making good progress. When asked about what they would do if an AGI told them that
aliens are already here, Sam Altman says that he would just go about his life. He
further adds that the source of joy and happiness in life is from other humans, so
mostly nothing would change unless it causes some kind of threat. Sam Altman also
talks about how they were misunderstood and mocked when they announced that they were
going to work on AGI in 2015. However, they don't get mocked as much now. He also
mentions that he thought building AI would be the coolest thing ever, but he never
thought he would get the chance to work on AGI. In summary, the transcript disc

In [None]:
query = "Who is the host of this program?"
response, docs = get_response_from_query(db, query)
print(textwrap.fill(response, width = 85))

The host of this program is not explicitly mentioned in the given transcript.
However, based on the information provided, we can assume that the program is a
podcast and the host interviews Sam Altman. The name of the host is not mentioned in
this particular section of the transcript.


In [None]:
query = "Who are the hosts of this podcast?"
response, docs = get_response_from_query(db, query)
print(textwrap.fill(response, width = 85))

The transcript does not provide clear information about who the hosts of this podcast
are. However, it does mention a conversation with Sam Altman and an interview with
Lex Fridman. It is possible that one or both of them are the hosts, but this cannot
be confirmed with certainty.


In [None]:
query = "Is this a podcast or an informative video with visuals?"
response, docs = get_response_from_query(db, query)
print(textwrap.fill(response, width = 85))

Based on the transcript, it is not explicitly stated whether this is a podcast or an
informative video with visuals. However, it is mentioned that this is the "Lex
Fridman podcast" and there is a mention of "listening to this conversation with Sam
Altman." Therefore, it is likely that this is a podcast.


In [None]:
query = "Can you summarize the entire podcast?"
response, docs = get_response_from_query(db, query)
print(textwrap.fill(response, width = 85))

As an AI language model, I cannot listen to the entire podcast, but based on the
transcript provided, the podcast features a conversation between Lex Fridman and Sam
Altman. They discuss various topics related to artificial intelligence, including the
development of GPT4, the potential risks and benefits of AI, and the role of AI in
society. They also touch on personal topics such as success, leadership, and the
recent controversy surrounding Silicon Valley Bank. Overall, the podcast provides
insights into the current state and future of AI, as well as the perspectives of a
prominent figure in the tech industry.
