## **Use LangChain and ChatGPT to Summarize YouTube Videos of Any Length**



This notebook walks through every step of using LangChain and OpenAI's GPT-3.5 to summarize YouTube videos.


### **Steps Covered in this Tutorial**

This tutorial covers the following steps:

1. Install dependencies
2. Define helper functions to extract transcripts from YouTube videos
3. Convert the text into a document using LangChain
4. Split the document into chunks using LangChain
5. Create a summary using ChatGPT + LangChain


## **1. Install Dependencies**

In [1]:
!pip install openai

Collecting openai
  Downloading openai-0.27.7-py3-none-any.whl (71 kB)
Collecting requests>=2.20
  Downloading requests-2.31.0-py3-none-any.whl (62 kB)
Collecting tqdm
  Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (197 kB)
Collecting idna<4,>=2.5
  Using

In [2]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.0.179-py3-none-any.whl (907 kB)
Collecting PyYAML>=5.4.1
  Using cached PyYAML-6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
Collecting SQLAlchemy<3,>=1.4
  Downloading SQLAlchemy-2.0.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7
  Using cached dataclasses_json-0.5.7-py3-none-any.whl (25 kB)
Collecting numexpr<3.0.0,>=2.8.4
  Downloading numexpr-2.8.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (383 kB)

In [3]:
!pip install youtube-transcript-api  # for Linux and macOS

Collecting youtube-transcript-api
  Downloading youtube_transcript_api-0.6.0-py3-none-any.whl (23 kB)
Installing collected packages: youtube-transcript-api
Successfully installed youtube-transcript-api-0.6.0


In [4]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Collecting regex>=2022.1.18
  Downloading regex-2023.5.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (780 kB)
Installing collected packages: regex, tiktoken
Successfully installed regex-2023.5.5 tiktoken-0.4.0


## **2. Add Video URL**
Insert the URL of the video you want to summarize.

In [5]:
url = 'https://www.youtube.com/watch?v=LWiM-LuRe6w&t=1204s' ## Replace this with the URL of the video you want to summarize

## **3. Import Libraries**

In [7]:
import os

from youtube_transcript_api import YouTubeTranscriptApi
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

from langchain import OpenAI, PromptTemplate
from langchain.chains.summarize import load_summarize_chain

## Read the API key from the OPENAI_API_KEY environment variable; never hard-code a real key in a notebook
OPENAI_KEY = os.environ["OPENAI_API_KEY"]


## **4. Helper Functions**

In [8]:
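import re

def extract_youtube_id(url):
    """Return the video ID from a youtube.com/watch?v=... or youtu.be/... URL, or None."""
    # Standard watch URLs carry the ID in the `v` query parameter
    youtube_id_match = re.search(r'(?<=v=)[^&#]+', url)
    # Fall back to short youtu.be/<id> links; stop at `?` so a trailing ?t=... is not captured
    youtube_id_match = youtube_id_match or re.search(r'(?<=be/)[^&#?]+', url)
    return youtube_id_match.group(0) if youtube_id_match else None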
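
As a cross-check on the regex approach, the same ID can be pulled out with the standard library's URL parser. The sketch below is an illustrative alternative, not part of the original notebook: the function name `video_id_from_url` is hypothetical, and it assumes only the two URL shapes handled above (`youtube.com/watch?v=...` and `youtu.be/...`).

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url):
    """Return the YouTube video ID from a watch or youtu.be URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    # Watch links carry the ID in the `v` query parameter
    ids = parse_qs(parsed.query).get("v")
    return ids[0] if ids else None


print(video_id_from_url("https://www.youtube.com/watch?v=LWiM-LuRe6w&t=1204s"))
print(video_id_from_url("https://youtu.be/LWiM-LuRe6w?t=1204"))
```

Parsing the URL instead of pattern-matching it means query parameters such as `t=1204s` are separated from the ID automatically.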

In [9]:
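video_id = extract_youtube_id(url)
# Each transcript entry is a dict; its 'text' field holds the caption text
srt = YouTubeTranscriptApi.get_transcript(video_id)

# Join all caption fragments into one string
text_arr = ' '.join(ele['text'] for ele in srt)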
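
For reference, `get_transcript` returns a list of dictionaries, one per caption fragment, each with `text`, `start`, and `duration` keys. The sketch below uses made-up sample entries (not real transcript data) and a hypothetical helper, `join_transcript`, to show the joining step without calling the API.

```python
# Hypothetical sample in the shape returned by youtube-transcript-api
sample_transcript = [
    {"text": "hello and welcome", "start": 0.0, "duration": 2.5},
    {"text": "to the show", "start": 2.5, "duration": 1.8},
    {"text": "today we talk about AI", "start": 4.3, "duration": 3.1},
]


def join_transcript(entries):
    """Concatenate the caption fragments into a single space-separated string."""
    return " ".join(entry["text"] for entry in entries)


transcript_text = join_transcript(sample_transcript)
print(transcript_text)
```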

In [11]:
def text_to_doc(text_arr):
    """Wrap the transcript in a Document and split it into chunks of up to 800 characters."""
    text = [text_arr]
    page_docs = [Document(page_content=page) for page in text]

    # Add page numbers as metadata
    for i, doc in enumerate(page_docs):
        doc.metadata["page"] = i + 1

    # Create the splitter once; it tries each separator in order until the chunks fit
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
        chunk_overlap=0,
    )

    # Split pages into chunks
    doc_chunks = []
    for doc in page_docs:
        chunks = text_splitter.split_text(doc.page_content)
        for i, chunk in enumerate(chunks):
            # Use a separate name so the page-level `doc` is not overwritten
            chunk_doc = Document(
                page_content=chunk, metadata={"page": doc.metadata["page"], "chunk": i}
            )
            # Add the source as metadata
            chunk_doc.metadata["source"] = f"{chunk_doc.metadata['page']}-{chunk_doc.metadata['chunk']}"
            doc_chunks.append(chunk_doc)
    return doc_chunks
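
To make the effect of `chunk_size` concrete, here is a deliberately simplified splitter in plain Python. It is a sketch, not LangChain's algorithm: it uses a single separator rather than trying a list of separators recursively, and the name `split_into_chunks` and the sample text are hypothetical.

```python
def split_into_chunks(text, chunk_size=800, separator=". "):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks = []
    current = None  # text accumulated for the chunk being built
    for piece in text.split(separator):
        if current is None:
            current = piece
        elif len(current) + len(separator) + len(piece) <= chunk_size:
            current = current + separator + piece
        else:
            # Adding this piece would exceed the limit, so start a new chunk
            chunks.append(current)
            current = piece
    if current is not None:
        chunks.append(current)
    return chunks


# Made-up text: 100 short sentences
sample_text = ". ".join(f"Sentence number {i}" for i in range(100))
sample_chunks = split_into_chunks(sample_text, chunk_size=100)
print(len(sample_chunks), "chunks; longest is", max(len(c) for c in sample_chunks), "characters")
```

Because chunks only break at the separator, joining them with the same separator reproduces the original text, which is a handy sanity check when tuning `chunk_size`.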

## **5. Code to generate summary**

In [12]:
prompt_template = """The following is a portion of a transcript from a 
youtube video. Your job is to write a concise summary.

{text}

"""


PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
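
To preview what the model receives for each chunk, the template string can be filled in with plain `str.format`, which behaves like LangChain's default template format for a simple placeholder such as `{text}`. The chunk text below is a made-up placeholder, and the variable names `sample_chunk` and `filled_prompt` are used only for illustration.

```python
prompt_template = """The following is a portion of a transcript from a 
youtube video. Your job is to write a concise summary.

{text}

"""

# Hypothetical chunk text used only for illustration
sample_chunk = "today we talk about the risks of AI and why regulation matters"

# Substitute the chunk into the {text} placeholder
filled_prompt = prompt_template.format(text=sample_chunk)
print(filled_prompt)
```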

In [18]:
from langchain.chat_models import ChatOpenAI

# Use 'gpt-4' here instead if your account has access to it
model_name = 'gpt-3.5-turbo'

llm = ChatOpenAI(model_name=model_name, temperature=0.3, openai_api_key=OPENAI_KEY)

In [19]:
doc_chunks=text_to_doc(text_arr)

In [20]:
# map_reduce summarizes each chunk separately, then summarizes the combined chunk summaries
chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=PROMPT, combine_prompt=PROMPT)
summary = chain.run(doc_chunks)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 3d11ed938d155603c8dd7ef65ecb0f57 in your message.).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 4f4ca047c029269195bb3ece3af9b2dd in your message.).
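
The `map_reduce` chain type follows a simple pattern: summarize every chunk on its own (map), then summarize the concatenated chunk summaries (reduce). The sketch below shows that control flow in plain Python. `summarize_stub` is a hypothetical stand-in that just keeps the first few words; a real pipeline would call the LLM there, and LangChain's actual implementation includes details that are omitted here. The demo chunks are made up.

```python
def summarize_stub(text, max_words=5):
    """Stand-in for an LLM call: keep only the first `max_words` words."""
    return " ".join(text.split()[:max_words])


def map_reduce_summary(chunks, summarize=summarize_stub):
    # Map step: summarize each chunk independently
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: summarize the concatenation of the partial summaries
    return summarize("\n".join(partial_summaries))


demo_chunks = [
    "AI can manipulate human language at scale and that worries the speaker",
    "Regulation is difficult because training needs huge computing power",
]
print(map_reduce_summary(demo_chunks))
```

Because each chunk is summarized independently, no single request has to fit the whole transcript into the model's context window, which is what lets this approach handle videos of any length.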


## **6. Summary Output**

In [25]:
summary

'The video discusses the potential dangers of artificial intelligence (AI) and the need for regulation to prevent it from causing harm to society. The speaker notes that AI has the potential to manipulate human language and create a matrix-like world of illusions, leading to societal polarization, undermined mental health, and destabilized democratic societies. The video also touches on the controversy surrounding online censorship and the difficulty of regulating AI due to the amount of computing power and money required. The speaker emphasizes the need for time to understand what kind of regulations are necessary and suggests that the first regulation should be to make it mandatory for AI to disclose that it is an AI.'

In [26]:
llm = OpenAI(temperature=0, openai_api_key=OPENAI_KEY)

prompt_template = """Use the following pieces of context to write a detailed linkedin article using the given summary of a selected topic

{context}


"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context"]
)


In [29]:
from langchain import LLMChain

chatgpt_chain = LLMChain(
    llm=llm,  # the OpenAI completion model created in the previous cell
    prompt=PROMPT,
    verbose=True,
)





In [31]:
output = chatgpt_chain.predict(context=summary)

> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to write a detailed linkedin article using the given summary of a selected topic

The video discusses the potential dangers of artificial intelligence (AI) and the need for regulation to prevent it from causing harm to society. The speaker notes that AI has the potential to manipulate human language and create a matrix-like world of illusions, leading to societal polarization, undermined mental health, and destabilized democratic societies. The video also touches on the controversy surrounding online censorship and the difficulty of regulating AI due to the amount of computing power and money required. The speaker emphasizes the need for time to understand what kind of regulations are necessary and suggests that the first regulation should be to make it mandatory for AI to disclose that it is an AI.


> Finished chain.


In [33]:
print(output)


Summary: This video discusses the potential dangers of artificial intelligence (AI) and the need for regulation to prevent it from causing harm to society.

In a recent video, the speaker discussed the potential dangers of artificial intelligence (AI) and the need for regulation to prevent it from causing harm to society. AI has the potential to manipulate human language and create a matrix-like world of illusions, leading to societal polarization, undermined mental health, and destabilized democratic societies. The speaker also touched on the controversy surrounding online censorship and the difficulty of regulating AI due to the amount of computing power and money required.

The speaker emphasized the need for time to understand what kind of regulations are necessary and suggested that the first regulation should be to make it mandatory for AI to disclose that it is an AI. This would help to ensure that people are aware of the potential dangers of AI and can make informed decisions 