## This demo app shows:
* How to use LangChain's YoutubeLoader to retrieve the captions of a YouTube video
* How to ask Llama to summarize the video's content in a naive way with LangChain's `stuff` method, which only works while the transcript fits within Llama's input size limit
* How to work around Llama's maximum input token size with more sophisticated approaches, LangChain's `map_reduce` and `refine` methods; see [here](https://python.langchain.com/docs/use_cases/summarization) for more info

We start by installing the necessary packages:
- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get transcript/subtitles of a YouTube video
- [langchain](https://python.langchain.com/docs/get_started/introduction) provides the document loading, text splitting, and summarization tools used in this demo
- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer
- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos

**Note** This example uses Replicate to host the Llama model. If you have not set up or used Replicate before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up Replicate before continuing with this example.
If you do not want to use Replicate, you will need to make some changes to this notebook as you go along.

In [2]:
!pip install langchain youtube-transcript-api tiktoken pytube replicate

Let's load the YouTube video transcript using the YoutubeLoader.

In [3]:
from langchain.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=PqHfuaF2EiY", add_video_info=True
)
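
`YoutubeLoader.from_youtube_url` takes the full watch URL and works out the video ID for you. As a rough illustration only, the hypothetical helper below shows one way to pull the ID out of the two most common YouTube URL shapes with Python's standard library. It is not LangChain's implementation, and it ignores less common formats such as `/shorts/` or `/embed/` links.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str | None:
    """Hypothetical helper: return the video ID from a YouTube URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Short links carry the ID in the path, e.g. https://youtu.be/<id>
    if host in ("youtu.be", "www.youtu.be"):
        return parsed.path.lstrip("/") or None
    # Watch links carry the ID in the "v" query parameter
    if host == "youtube.com" or host.endswith(".youtube.com"):
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=PqHfuaF2EiY"))  # PqHfuaF2EiY
```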

In [4]:
# load the youtube video caption into Documents
docs = loader.load()

In [5]:
# check the docs length and content
len(docs[0].page_content), docs[0].page_content[:300]

(12911,
 'In a previous video, I discussed the\xa0\ndifference between the Command Prompt\xa0\xa0 and PowerShell in Windows. And if\xa0\nyou\'ve ever gone to use PowerShell,\xa0\xa0 you may have noticed that there\'s a little\xa0\nmessage that it shows every time at the top\xa0\xa0 when you run it that says, "Install the latest\xa0\nPowerShell ')
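
The transcript is full of non-breaking spaces (`\xa0`) and line breaks left over from the caption layout. An optional clean-up step, which the original notebook does not perform, is to collapse every run of whitespace into a single space before summarizing; this also trims a little length off the prompt. A minimal sketch, assuming you do not need the original line breaks:

```python
import re


def normalize_whitespace(text: str) -> str:
    # For str patterns, \s matches Unicode whitespace, including "\xa0"
    return re.sub(r"\s+", " ", text).strip()


sample = "In a previous video, I discussed the\xa0\ndifference between the Command Prompt\xa0\xa0 and PowerShell"
print(normalize_whitespace(sample))
# In a previous video, I discussed the difference between the Command Prompt and PowerShell
```

To apply it, you could overwrite the loaded text in place, e.g. `docs[0].page_content = normalize_whitespace(docs[0].page_content)`.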

We are using Replicate in this example to host our Llama 2 model, so you will need to get a Replicate token.

To get the Replicate token:

- First, sign in to Replicate with your GitHub account.
- Then create a free API token [here](https://replicate.com/account/api-tokens), which you can use during the free trial.

**Note** After the free trial ends, you will need to enter billing info to continue using Llama 2 hosted on Replicate.

Alternatively, you can run Llama locally. For more information, see:
- [HelloLlamaCloud](HelloLlamaCloud.ipynb) for further information on how to run Llama using Replicate.
- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally.

In [6]:
# enter your Replicate API token, or you can use local Llama. See README for more info
from getpass import getpass
import os

REPLICATE_API_TOKEN = getpass("Enter your Replicate API token: ")  # never hard-code the token in the notebook
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
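
If `REPLICATE_API_TOKEN` is already set in your environment (for example, exported in the shell that launched Jupyter), you can skip the prompt entirely. The sketch below uses a hypothetical `get_replicate_token` helper that prefers the environment variable and only falls back to `getpass` when it is missing, so the token never has to be written into the notebook.

```python
import os
from getpass import getpass


def get_replicate_token(env_var: str = "REPLICATE_API_TOKEN") -> str:
    """Hypothetical helper: reuse an existing token, otherwise prompt for one."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Prompt without echoing the token to the screen or the notebook output
        token = getpass(f"Enter your {env_var}: ").strip()
    # Export it so libraries that read this environment variable can find it
    os.environ[env_var] = token
    return token
```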


Next we call the Llama 2 model from Replicate. In this example we will use the Llama 2 13B chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).

You can add them here in the format: `model_owner/model_name:version`

If you are using a local Llama, just set `llm` accordingly; see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb).

In [7]:
from langchain.llms import Replicate

llama2_13b = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"
llm = Replicate(
    model=llama2_13b,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens":500}
)
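
The `model` string passed to `Replicate` above combines three parts: the owner (`meta`), the model name (`llama-2-13b-chat`), and a specific version hash after the colon. If you swap in another model from the explore page, a quick sanity check like the hypothetical `parse_model_ref` below (not part of LangChain or the Replicate client) can catch a malformed identifier before any API call is made.

```python
from typing import NamedTuple


class ReplicateModelRef(NamedTuple):
    owner: str
    name: str
    version: str


def parse_model_ref(ref: str) -> ReplicateModelRef:
    """Hypothetical helper: split 'owner/name:version' into its parts."""
    path, colon, version = ref.partition(":")
    owner, slash, name = path.partition("/")
    if not (colon and slash and owner and name and version):
        raise ValueError(f"expected 'owner/name:version', got {ref!r}")
    return ReplicateModelRef(owner, name, version)


ref = parse_model_ref(
    "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"
)
print(ref.owner, ref.name)  # meta llama-2-13b-chat
```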


Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us.
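
The 4000-character cut-off is a rough proxy for Llama 2's 4096-token input limit, since the model counts tokens, not characters. One conservative, back-of-the-envelope way to pick such a cut-off is to subtract the tokens you want to keep free for the answer (the `max_new_tokens` of 500 set above) from the context size and multiply by a characters-per-token ratio. The ratio of 2.3 used below is an assumption, loosely derived from the error message quoted later in this notebook (roughly 12911 characters of transcript plus the prompt came to 5597 tokens); the real ratio depends on the text and the tokenizer.

```python
def max_input_chars(context_tokens: int, reserved_tokens: int, chars_per_token: float) -> int:
    """Estimate how many characters of input fit in the remaining token budget."""
    available_tokens = context_tokens - reserved_tokens
    if available_tokens <= 0:
        raise ValueError("reserved_tokens leaves no room for the input")
    return int(available_tokens * chars_per_token)


# 4096-token context, 500 tokens kept for the answer, ~2.3 characters per token (assumed)
print(max_input_chars(4096, 500, 2.3))  # 8270
```

By this estimate, the first 4000 characters (about 1,700 tokens at the assumed ratio) fit comfortably with room for the prompt template, while the full 12911-character transcript does not.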

In [8]:
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
prompt = ChatPromptTemplate.from_template(
    "Give me a summary of the text below: {text}?"
)
chain = LLMChain(llm=llm, prompt=prompt)
# be careful of the input text length sent to LLM
text = docs[0].page_content[:4000]
summary = chain.run(text)
# this is the summary of the first 4000 characters of the video content
print(summary)

 Sure! Here's a summary of the text:

The video discusses the differences between Windows PowerShell and PowerShell Core (also known as PowerShell). All versions of PowerShell are the same software, but different stages of development. The latest version is PowerShell 7. Windows PowerShell is a specific version that comes pre-installed with Windows, while PowerShell can be downloaded and installed separately. Windows PowerShell is an older version that is more stable and included with Windows 10 and 11, while PowerShell is the latest version with more features. Microsoft keeps both versions separate because some companies may use older versions of Windows and need the stability of the built-in PowerShell version.


Next we try to summarize the entire transcript. Because the full transcript is longer than Llama 2's 4096-token input limit, this call may fail with `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`

In [9]:
# try to get a summary of the whole content
text = docs[0].page_content
summary = chain.run(text)
print(summary)


 Sure! Here is a summary of the text:

The video discusses the difference between Windows PowerShell and PowerShell Core, which is a newer version of PowerShell that is cross-platform and open-source. It explains that PowerShell Core is written on the .NET Core framework, whereas Windows PowerShell is on the older .NET framework. The video also touches on PowerShell ISE, an integrated scripting environment only available with Windows PowerShell. It concludes by recommending that viewers install the latest version of PowerShell, as it has many additional features and improvements.



Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.

We will use LangChain's `load_summarize_chain` and experiment with its `chain_type` parameter.


In [10]:
from langchain.chains.summarize import load_summarize_chain
# see https://python.langchain.com/docs/use_cases/summarization for more info
chain = load_summarize_chain(llm, chain_type="stuff") # other supported methods are map_reduce and refine
chain.run(docs)
# may raise the same RuntimeError (input too long); stuff only works for text within the 4096-token input limit

' Sure! Here is a concise summary of the video:\n\nMicrosoft has been pushing an update to the latest version of PowerShell, which is called PowerShell Core. This new version includes several improvements, such as cross-platform compatibility and open-source development. However, there are some differences between PowerShell and PowerShell Core, including changes to the .NET framework and compatibility issues with older versions. Microsoft has also introduced a new integrated scripting environment (ISE) for PowerShell, which is only available with the Windows version. Whether or not to install the latest version of PowerShell depends on individual needs and preferences, but it is recommended to stay up-to-date with the latest features and improvements.'
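
Conceptually, `stuff` is the simplest strategy: it concatenates every document into one prompt and makes a single model call, so it only works when everything fits at once. The pure-Python sketch below illustrates that behavior with a hypothetical `FakeLLM` that records calls and rejects prompts over a character budget (a stand-in for the real 4096-token limit). The prompt wording is made up and is not LangChain's actual template.

```python
from __future__ import annotations


class FakeLLM:
    """Hypothetical stand-in for Llama: records prompts, rejects oversized ones."""

    def __init__(self, max_prompt_chars: int) -> None:
        self.max_prompt_chars = max_prompt_chars
        self.prompts: list[str] = []

    def __call__(self, prompt: str) -> str:
        if len(prompt) > self.max_prompt_chars:
            raise RuntimeError(
                f"Your input is too long: {len(prompt)} chars > {self.max_prompt_chars}"
            )
        self.prompts.append(prompt)
        return f"[summary of a {len(prompt)}-char prompt]"


def stuff_summarize(llm: FakeLLM, docs: list[str]) -> str:
    # Every document goes into ONE prompt, so its size grows with the input
    prompt = "Write a concise summary of the following:\n\n" + "\n\n".join(docs)
    return llm(prompt)


fake_llm = FakeLLM(max_prompt_chars=200)
print(stuff_summarize(fake_llm, ["a" * 50, "b" * 50]))  # fits: one call
try:
    stuff_summarize(fake_llm, ["a" * 60] * 3)  # 227 chars once stuffed: too long
except RuntimeError as err:
    print(err)
```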

In [11]:
chain = load_summarize_chain(llm, chain_type="refine")
# docs is still a single unsplit document, so refine may hit the same "RuntimeError: Your input is too long. Max input length is 4096 tokens"
chain.run(docs)

' Sure! Here is a concise summary of the video:\n\nMicrosoft has been pushing an update to the latest version of PowerShell, which is called PowerShell Core. This new version includes several improvements, such as cross-platform compatibility and open-source development. However, there are some differences between PowerShell and PowerShell Core, including changes to the .NET framework and compatibility issues with older versions. Microsoft has also introduced a new integrated scripting environment (ISE) for PowerShell, which is only available with the Windows version. Whether or not to install the latest version of PowerShell depends on individual needs and preferences, but it is recommended to stay up-to-date with the latest features and improvements.'


Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` to iteratively create an answer.
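
The idea behind `refine` can be sketched in a few lines: summarize the first chunk, then make one more call per remaining chunk that passes in the running summary plus the next chunk. Each prompt only ever holds one chunk and one summary, which is why it can stay under the input limit when a single stuffed prompt cannot. As before, `FakeLLM` and the prompt wording are hypothetical stand-ins, not LangChain's implementation.

```python
from __future__ import annotations


class FakeLLM:
    """Hypothetical stand-in for Llama: records prompts, rejects oversized ones."""

    def __init__(self, max_prompt_chars: int) -> None:
        self.max_prompt_chars = max_prompt_chars
        self.prompts: list[str] = []

    def __call__(self, prompt: str) -> str:
        if len(prompt) > self.max_prompt_chars:
            raise RuntimeError(f"Your input is too long: {len(prompt)} chars")
        self.prompts.append(prompt)
        return f"[summary of a {len(prompt)}-char prompt]"


def refine_summarize(llm: FakeLLM, chunks: list[str]) -> str:
    if not chunks:
        raise ValueError("need at least one chunk")
    # First call: summarize the first chunk on its own
    summary = llm("Write a concise summary of the following:\n\n" + chunks[0])
    # One more call per remaining chunk, refining the running summary
    for chunk in chunks[1:]:
        summary = llm(
            f"Existing summary:\n{summary}\n\n"
            f"Refine it using this additional context:\n{chunk}"
        )
    return summary


fake_llm = FakeLLM(max_prompt_chars=200)
final = refine_summarize(fake_llm, ["a" * 60] * 3)  # too long for one stuffed prompt
print(len(fake_llm.prompts), final)  # 3 calls, each prompt under the limit
```

The trade-off is that the calls are sequential (each depends on the previous summary), so `refine` makes one model call per chunk and cannot parallelize them.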

In [12]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# we need to split the long input text
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=3000, chunk_overlap=0
)
split_docs = text_splitter.split_documents(docs)
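
`RecursiveCharacterTextSplitter.from_tiktoken_encoder` measures `chunk_size` and `chunk_overlap` in tiktoken tokens and prefers to break on separators such as paragraph and line breaks. The much simpler sketch below only shows the basic windowing idea, using whitespace-separated words as a stand-in for tokens; `split_into_chunks` is a hypothetical helper, not LangChain's algorithm.

```python
from __future__ import annotations


def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Hypothetical helper: fixed-size word windows with optional overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample_text = " ".join(f"w{i}" for i in range(10))
print(split_into_chunks(sample_text, chunk_size=4))
# ['w0 w1 w2 w3', 'w4 w5 w6 w7', 'w8 w9']
print(split_into_chunks(sample_text, chunk_size=4, chunk_overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

With `chunk_overlap=0`, as in the cell above, consecutive chunks share nothing; a small overlap can help keep a sentence that straddles a boundary intact in at least one chunk.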

In [13]:
# check the split docs' lengths
len(split_docs), len(docs), len(split_docs[0].page_content), len(docs[0].page_content)

(2, 1, 12859, 12911)

In [14]:
# now get the summary of the whole docs - the whole youtube content
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(split_docs)

" Sure, I'd be happy to help! Based on the additional context provided, here's a revised summary of the video:\n\nMicrosoft has been pushing an update to the latest version of PowerShell, called PowerShell Core, due to compatibility issues and changes in the .NET framework. PowerShell Core is built on .NET Core, making it cross-platform and able to run on Windows, Linux, and macOS. While the updated version offers improved performance and new features, some scripts may not work with older versions of PowerShell, so testing is essential before upgrading. Microsoft recommends installing the latest version for optimal results.\n\nI hope this revised summary provides more context and helps to clarify the key points of the video. If you have any further requests or need any additional assistance, please let me know!"

You can also use the [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type` to implement a map-reduce-like architecture while summarizing the documents.

In [15]:
# another method is map_reduce
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(split_docs)


" Sure! Here's a concise summary of the video:\n\nMicrosoft is updating PowerShell to PowerShell Core, which is built on .NET Core instead of .NET Framework. This change allows for cross-platform compatibility (Windows, Linux, macOS) and new features. However, some scripts may not work with older versions of PowerShell, so testing is recommended before upgrading."

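`map_reduce` takes a different route: it summarizes each chunk independently (the map step, whose calls do not depend on each other) and then asks the model to combine those partial summaries (the reduce step). The simplified sketch below uses the same kind of hypothetical `FakeLLM`. It makes one call per chunk plus one final call; LangChain's implementation can additionally collapse the partial summaries in stages when they are themselves too long for one prompt, which this sketch does not attempt.

```python
from __future__ import annotations


class FakeLLM:
    """Hypothetical stand-in for Llama: records prompts, rejects oversized ones."""

    def __init__(self, max_prompt_chars: int) -> None:
        self.max_prompt_chars = max_prompt_chars
        self.prompts: list[str] = []

    def __call__(self, prompt: str) -> str:
        if len(prompt) > self.max_prompt_chars:
            raise RuntimeError(f"Your input is too long: {len(prompt)} chars")
        self.prompts.append(prompt)
        return f"[summary of a {len(prompt)}-char prompt]"


def map_reduce_summarize(llm: FakeLLM, chunks: list[str]) -> str:
    # Map: summarize every chunk on its own (these calls are independent)
    partials = [llm("Write a concise summary of the following:\n\n" + c) for c in chunks]
    # Reduce: merge the partial summaries with one final call
    combined = "\n\n".join(partials)
    return llm("Combine these summaries into one concise summary:\n\n" + combined)


fake_llm = FakeLLM(max_prompt_chars=200)
final = map_reduce_summarize(fake_llm, ["a" * 60] * 3)
print(len(fake_llm.prompts))  # 4: three map calls plus one reduce call
```

Because the map calls are independent, they could in principle run in parallel, unlike the sequential calls of `refine`.
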
To investigate further, let's turn on LangChain's debug mode to get an idea of how many calls are made to the model and what the inputs and outputs of each call are.
We will then run our summary using the `stuff` and `refine` `chain_types` and take a look at our output.
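
`langchain.debug = True` prints every chain and LLM call as it happens. If you only want a quick count of model calls and prompt sizes, one lightweight alternative is to wrap whatever function sends a prompt to the model. The `logged` wrapper below is a hypothetical, framework-agnostic sketch (it does not hook into LangChain's callback system), and the lambda stands in for a real model call.

```python
from __future__ import annotations

from typing import Callable


def logged(llm: Callable[[str], str], log: list[dict]) -> Callable[[str], str]:
    """Hypothetical wrapper: record prompt and response sizes for every call."""

    def wrapper(prompt: str) -> str:
        response = llm(prompt)
        log.append({"call": len(log) + 1, "prompt_chars": len(prompt), "response_chars": len(response)})
        print(f"[llm call {len(log)}] {len(prompt)} chars in, {len(response)} chars out")
        return response

    return wrapper


calls: list[dict] = []
logged_llm = logged(lambda prompt: prompt[:20], calls)  # stand-in model: echoes up to 20 chars
logged_llm("first prompt " * 5)
logged_llm("second prompt")
print(len(calls))  # 2
```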

In [16]:
# to find how many calls to Llama have been made and the details of inputs and outputs of each call, set langchain to debug
import langchain
langchain.debug = True

# stuff puts all the split docs back into one prompt, which can exceed the input limit
chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(split_docs)

[32;1m[1;3m[chain/start][0m [1m[1:chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:StuffDocumentsChain > 2:chain:LLMChain] Entering Chain run with input:
[0m{
  "text": "In a previous video, I discussed the \ndifference between the Command Prompt   and PowerShell in Windows. And if \nyou've ever gone to use PowerShell,   you may have noticed that there's a little \nmessage that it shows every time at the top   when you run it that says, \"Install the latest \nPowerShell for new features and improvements.\" Now that new version they're talking about \nis PowerShell Core, it's sometimes called,   also PowerShell 7 is the latest diversion number \nfor it. And the names are kind of confusing,   so that's one thing I'm gonna explain. \nNow, one thing you might be wondering   is if they're pushing so hard for you to \ninstall this latest version, why don't   they just update your version of PowerShell \nautomatically and 

' Sure! Here is a concise summary of the video:\n\nMicrosoft has been pushing an update to the latest version of PowerShell, which is called PowerShell Core. This new version includes several improvements, such as cross-platform compatibility and open-source development. However, there are some differences between PowerShell and PowerShell Core, including changes in naming conventions and compatibility issues. The main difference is that PowerShell Core is written on the .NET Core framework, while traditional PowerShell is based on the .NET framework. Additionally, PowerShell Core includes a new integrated scripting environment (ISE) for writing and debugging scripts. Overall, installing the latest version of PowerShell is recommended, but it may not be compatible with all scripts or systems.'

In [17]:
# but refine works
chain = load_summarize_chain(llm, chain_type="refine")
chain.run(split_docs)

[32;1m[1;3m[chain/start][0m [1m[1:chain:RefineDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:RefineDocumentsChain > 2:chain:LLMChain] Entering Chain run with input:
[0m{
  "text": "In a previous video, I discussed the \ndifference between the Command Prompt   and PowerShell in Windows. And if \nyou've ever gone to use PowerShell,   you may have noticed that there's a little \nmessage that it shows every time at the top   when you run it that says, \"Install the latest \nPowerShell for new features and improvements.\" Now that new version they're talking about \nis PowerShell Core, it's sometimes called,   also PowerShell 7 is the latest diversion number \nfor it. And the names are kind of confusing,   so that's one thing I'm gonna explain. \nNow, one thing you might be wondering   is if they're pushing so hard for you to \ninstall this latest version, why don't   they just update your version of PowerShell \nautomatically an

" Sure, I'd be happy to help! Based on the additional context provided, here's a revised summary of the video:\n\nMicrosoft has released an updated version of PowerShell, now called PowerShell Core, which offers significant improvements over previous versions. With its new .NET Core framework, PowerShell Core is cross-platform and open-source, providing better performance, new features, and language enhancements. While some scripts may not be compatible with older versions of PowerShell or other operating systems, upgrading to the latest version is highly recommended for companies looking to invest in PowerShell infrastructure.\n\nIn comparison to the original summary, this revised version provides more context about the changes in PowerShell and why upgrading is important for companies. It also highlights the benefits of the new .NET Core framework and the fact that PowerShell Core is now cross-platform and open-source."


`stuff` is prone to failing here because it treats all the split documents as one and "stuffs" them into a single prompt, which can be much larger than Llama 2 can handle. `refine`, by contrast, iterates over the documents, updating its answer as it goes, so each prompt only contains one chunk plus the running summary.