In [1]:
!pip install tiktoken -q

In [2]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY")

In [2]:
from langchain.llms import OpenAI

model = OpenAI(temperature=0.6)

In [3]:
from langchain.document_loaders import TextLoader
loader = TextLoader(r"D:\DataSquad_Tasks\summarizer\artifacts\10_29_2023_01_57_36\transcription\transcript.txt")
documents = loader.load()


In [4]:
from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(model, chain_type="map_reduce", verbose=True)
caption = chain.run(documents)



[1m> Entering new MapReduceDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


" It is my pleasure to welcome Dr. Andrew Wu tonight. Andrew is the Managing General Partner of AI Fund, founder of Deep Learning AI and lending AI, chairman and co-founder of Coursera, and an unjunct professor of computer science here at Stanford. Previously he had started and led the Google Brain team, which had helped Google adopt modern AI. And he was also director of the Stanford AI lab. About 8 million people, one in 1000 persons on the planet have taken an AI class from him and through both his education and his AI work. He has changed humor's lives. Please welcome Dr. An. Thank you Lisa. It's good to see everyone. So what I want to do today is chat new electricity. One of the difficult things to understand about AI is that it is a general purpose technology. Meaning that it's not useful only for on

InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 7425 tokens (7169 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

In [19]:
caption

' This video explains how to do file handling and organization in Python, including how to navigate directories, rename, move, copy, and remove files and directories. It also covers the OS and pathlib modules, the shutil module, and how to run a Python script as a cron job. A real-life example of an automation script is provided.'

In [9]:
prefix = """You are an AI assistant which generates a summary from a transcript text of a video. \
The following is the transcript text generated from a video. Creating meaningful summary of the \
the transcript is your job. Based on the summary, anyone can learn the topics covered in the video. 
The summary should not be just a caption which only describes about the content of the video. You can create \
seperate points to demonstrate topics coverd in the video and which all things are discussed in that topic; write that inside \
that particular points. In short, the summary should be precised and descriptive unlike a shorter caption.
You are not supposed to answer any other questions.
"""

suffix = """
Transcript: {text}
Summary: """



In [10]:
from src.prompts.video_summarizer_prompt import VideoSummarizerPromptTemplate

prompt = VideoSummarizerPromptTemplate(
    prefix=prefix,
    suffix = suffix,
    input_variables = ["text"]
)

In [11]:
prompt_template = prompt.format(
    text = documents[0].page_content
)

In [12]:
len(documents)

1

In [13]:
print(prompt_template)

You are an AI assistant which generates a summary from a transcript text of a video. The following is the transcript text generated from a video. Creating meaningful summary of the the transcript is your job. Based on the summary, anyone can learn the topics covered in the video. 
The summary should not be just a caption which only describes about the content of the video. You can create seperate points to demonstrate topics coverd in the video and which all things are discussed in that topic; write that inside that particular points. In short, the summary should be precised and descriptive unlike a shorter caption.
You are not supposed to answer any other questions.


Transcript:  What's up everyone? In this video I show you how to do file handling and file organization in Python. So I show you how to navigate to different directories how to rename files, move them, copy them or remove files and directories. So these are all the functions we need in order to automate our file organiza

In [16]:
from langchain.chains import LLMChain

chain = LLMChain(
    llm = model,
    prompt = prompt
)


In [17]:
print(chain.run(documents[0].page_content))


This video demonstrates how to do file handling and file organization in Python using the OS and pathlib modules. It covers topics such as navigating through different directories, renaming files, moving files, copying files, and removing files and directories. It also provides a real-life example of how to use an automation script to keep a desktop organized. Finally, it shows how to use a cron job to automate the script.


In [10]:
chain.run("what is machine learning")

'\n\nMachine Learning is a subset of Artificial Intelligence which enables machines to learn from data, identify patterns, and make decisions without explicit programming. It is used to develop applications that can analyze large datasets to make predictions, automate tasks, and provide recommendations. This technology is used in many industries such as healthcare, finance, and retail. In this video, the concept of Machine Learning is discussed in detail, including the types of machine learning algorithms, the different types of data used for machine learning, and the potential applications of machine learning.'

In [14]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain
from langchain.text_splitter import RecursiveCharacterTextSplitter

map_template = """The following is a set of documents containing a transcript of an youtube video.
{docs}
Based on this list of docs, please generate the summary of the transcripts.
Helpful Answer:"""

map_prompt = PromptTemplate.from_template(map_template)

map_chain = LLMChain(llm=model, prompt=map_prompt)

reduce_template = """The following is set of summaries of a transcripted video:
{doc_summaries}
Take these and distill it into a final, consolidated summary of that particular video transcript. 
Helpful Answer:"""

reduce_prompt = PromptTemplate.from_template(reduce_template)

reduce_chain = LLMChain(llm=model, prompt=reduce_prompt)

combine_documents_chain = StuffDocumentsChain(
                    llm_chain=reduce_chain, document_variable_name="doc_summaries"
                    )

reduce_documents_chain = ReduceDocumentsChain(
                combine_documents_chain=combine_documents_chain,
                collapse_documents_chain=combine_documents_chain,
                token_max=4000,
                )

map_reduce_chain = MapReduceDocumentsChain(
                llm_chain=map_chain,
                reduce_documents_chain=reduce_documents_chain,
                document_variable_name="docs",
                return_intermediate_steps=False,
                )

In [15]:
documents

[Document(page_content=" It is my pleasure to welcome Dr. Andrew Wu tonight. Andrew is the Managing General Partner of AI Fund, founder of Deep Learning AI and lending AI, chairman and co-founder of Coursera, and an unjunct professor of computer science here at Stanford. Previously he had started and led the Google Brain team, which had helped Google adopt modern AI. And he was also director of the Stanford AI lab. About 8 million people, one in 1000 persons on the planet have taken an AI class from him and through both his education and his AI work. He has changed humor's lives. Please welcome Dr. An. Thank you Lisa. It's good to see everyone. So what I want to do today is chat new electricity. One of the difficult things to understand about AI is that it is a general purpose technology. Meaning that it's not useful only for one thing, but it's useful for lots of different applications. Kind of like electricity. If I were to ask you what is electricity good for? You know, it's not any

In [16]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)

docs = text_splitter.split_documents(documents)
output = map_reduce_chain(docs)

In [17]:
print(output["output_text"])



In this video, Dr. Andrew Wu discussed the similarities between AI and electricity, and how AI has changed millions of lives through both education and work. He explained that supervised learning and generative AI are the two most important tools for AI development and discussed how they can be used for a variety of tasks. He also discussed the progress of AI over the last decade, the tremendous momentum behind Genose Bay and other tools for supervised learning, and the opportunities for startups and large companies to create value through the use of general purpose technologies. Additionally, he discussed the difficulty of using AI outside of the consumer software internet, the inefficiencies of hiring many engineers to work on expensive projects, and the two trends for enabling custom AI systems to be built and deployed by pieces of factories. Finally, he discussed the process for building startups and the potential ethical issues of AI.


In [18]:
documents

[Document(page_content=" It is my pleasure to welcome Dr. Andrew Wu tonight. Andrew is the Managing General Partner of AI Fund, founder of Deep Learning AI and lending AI, chairman and co-founder of Coursera, and an unjunct professor of computer science here at Stanford. Previously he had started and led the Google Brain team, which had helped Google adopt modern AI. And he was also director of the Stanford AI lab. About 8 million people, one in 1000 persons on the planet have taken an AI class from him and through both his education and his AI work. He has changed humor's lives. Please welcome Dr. An. Thank you Lisa. It's good to see everyone. So what I want to do today is chat new electricity. One of the difficult things to understand about AI is that it is a general purpose technology. Meaning that it's not useful only for one thing, but it's useful for lots of different applications. Kind of like electricity. If I were to ask you what is electricity good for? You know, it's not any

In [1]:
from src.pipeline.summarizer_pipeline import SummarizerPipeline

txt = """Now here are 5 major reasons to wait for Galaxy S24 Ultra. The S24 Ultra is expected to be an AI powerhouse offering features from chatGPD and Google Bud 
such as the ability to create contents based on keywords you input. The text to image generative AI is also expected to be on S24 Ultra. Though the phone is expected to look simila
r to the S23 Ultra, this time the phone is going to feature a titanium frame which is going to be more durable. The pre-release benchmark scores of Snapdragon 8 Gen 3 or the Xenos 
2400 appears to be great and we can expect a custom the The RAMs and is expected to drop the 10x optical zoom which may seem like a downgrade but that is expected to be replaced by
 5x optical zoom which is going to be more practical and useful for sure. The main camera may come with the same tone megapixels but the 5x telephoto lens is expected to be 50 mega
pixels which is going to be great and the night photography and video photography is expected to be much better compared to the predecessors. Now are you planning to upgrade to S24 Ultra? Drop a comment, share your thoughts and subscribe to this channel for more tips.
"""
SummarizerPipeline().summarize_transcript(txt)

transcript docs [Document(page_content='Now here are 5 major reasons to wait for Galaxy S24 Ultra. The S24 Ultra is expected to be an AI powerhouse offering features from chatGPD and Google Bud \nsuch as the ability to create contents based on keywords you input. The text to image generative AI is also expected to be on S24 Ultra. Though the phone is expected to look simila\nr to the S23 Ultra, this time the phone is going to feature a titanium frame which is going to be more durable. The pre-release benchmark scores of Snapdragon 8 Gen 3 or the Xenos \n2400 appears to be great and we can expect a custom the The RAMs and is expected to drop the 10x optical zoom which may seem like a downgrade but that is expected to be replaced by\n 5x optical zoom which is going to be more practical and useful for sure. The main camera may come with the same tone megapixels but the 5x telephoto lens is expected to be 50 mega', metadata={}), Document(page_content='pixels which is going to be great and 

{'input_documents': [Document(page_content='Now here are 5 major reasons to wait for Galaxy S24 Ultra. The S24 Ultra is expected to be an AI powerhouse offering features from chatGPD and Google Bud \nsuch as the ability to create contents based on keywords you input. The text to image generative AI is also expected to be on S24 Ultra. Though the phone is expected to look simila\nr to the S23 Ultra, this time the phone is going to feature a titanium frame which is going to be more durable. The pre-release benchmark scores of Snapdragon 8 Gen 3 or the Xenos \n2400 appears to be great and we can expect a custom the The RAMs and is expected to drop the 10x optical zoom which may seem like a downgrade but that is expected to be replaced by\n 5x optical zoom which is going to be more practical and useful for sure. The main camera may come with the same tone megapixels but the 5x telephoto lens is expected to be 50 mega', metadata={}),
  Document(page_content='pixels which is going to be grea