# Summarise podcast transcripts

Prepare raw caption data (manual copy-paste + data cleaning)

In [15]:
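import pandas as pd

# Read the caption file line by line. pd.read_csv would split any caption
# containing a comma into extra fields, so plain file reading is safer here.
with open('caption.txt', encoding='utf-8') as f:
    lines = [line.strip() for line in f if line.strip()]  # drop blank lines
df = pd.DataFrame({'Lines': lines})

# The file alternates timestamp / caption text: even rows (0, 2, ...) are
# timestamps and odd rows (1, 3, ...) hold the matching caption text
df_new = pd.DataFrame({
    'text': df[df.index % 2 != 0]['Lines'].reset_index(drop=True),
    'timestamp': df[df.index % 2 == 0]['Lines'].reset_index(drop=True)
})

print(df_new)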

                                                   text timestamp
0     two one boom all right we're live thank you ve...      0:00
1     information and listening to you talk for uh q...      0:06
2     having me my pleasure my pleasure you are one ...      0:12
3     you are um you're deep in the tech world but y...      0:20
4     perspective in terms of how to live life as op...      0:26
...                                                 ...       ...
1310  actually okay just at naval then i have a webs...   2:11:25
1311  channel neval and i have a podcast in the worl...   2:11:30
1312                    thank you bye everybody [Music]   2:11:35
1313                                 [Applause] [Music]   2:11:41
1314                                          [Music] i   2:11:51

[1315 rows x 2 columns]
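The even/odd pairing can also be written with positional slicing (`iloc[0::2]` / `iloc[1::2]`), which does not depend on the index being the default `RangeIndex`. A minimal sketch on a small in-memory sample; `pair_captions` is a hypothetical helper and the sample lines are illustrative, and it assumes the input strictly alternates timestamp, text, timestamp, text:

```python
import pandas as pd


def pair_captions(lines):
    """Pair alternating timestamp/text lines into a two-column DataFrame.

    Assumes the non-blank lines strictly alternate: timestamp, text, ...
    """
    s = pd.Series([line.strip() for line in lines if line.strip()])
    timestamps = s.iloc[0::2].reset_index(drop=True)  # positions 0, 2, 4, ...
    texts = s.iloc[1::2].reset_index(drop=True)       # positions 1, 3, 5, ...
    return pd.DataFrame({'text': texts, 'timestamp': timestamps})


sample = ["0:00", "two one boom all right", "", "0:06", "information and listening"]
print(pair_captions(sample))
```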


# OpenAI setup

In [8]:
import os
import openai
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.getenv('OPENAI_API_KEY')

from langchain.llms import OpenAI
llm = OpenAI(temperature=0, openai_api_key=openai.api_key)

In [45]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [52]:
captions_text = df_new['text'].str.cat(sep='\n\n')
llm.get_num_tokens(captions_text)

32397
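For scale: 32,397 tokens is several times larger than a single completion-model context window. We assume the LangChain `OpenAI` wrapper defaults to `text-davinci-003` here, whose 4,097-token window is shared between prompt and completion, which is why the transcript has to be split. A back-of-the-envelope sketch; `chunks_needed` is a hypothetical helper and the 1,000 tokens reserved for instructions and output is an arbitrary assumption:

```python
import math


def chunks_needed(total_tokens: int, context_window: int, reserved_tokens: int) -> int:
    """Minimum number of chunks so that each chunk, plus the tokens reserved
    for prompt instructions and the generated summary, fits in the window."""
    per_chunk = context_window - reserved_tokens
    if per_chunk <= 0:
        raise ValueError("reserved_tokens must be smaller than context_window")
    return math.ceil(total_tokens / per_chunk)


# 32397 is the token count measured above; 4097 and 1000 are assumptions
print(chunks_needed(32397, 4097, 1000))
```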

In [53]:
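# text_splitter = RecursiveCharacterTextSplitter()
# CharacterTextSplitter splits on blank lines ("\n\n") by default, which matches
# how captions_text was joined above
text_splitter = CharacterTextSplitter()
texts = text_splitter.split_text(captions_text)
# Note: only the first 3 chunks are used, so the summaries below cover just the
# opening part of the podcast; use `texts` instead of `texts[:3]` for all of it
docs = [Document(page_content=t) for t in texts[:3]]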
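To make the chunking concrete, here is a simplified greedy splitter in the spirit of `CharacterTextSplitter`: split on `"\n\n"`, then merge consecutive pieces until a chunk would exceed `chunk_size` characters. This is a sketch, not LangChain's implementation; `greedy_split` is a hypothetical name, and it omits the overlap between chunks that LangChain adds via `chunk_overlap`:

```python
def greedy_split(text, separator="\n\n", chunk_size=4000):
    """Split on `separator`, then greedily merge consecutive pieces while the
    merged chunk stays within chunk_size characters.

    No overlap is added between chunks, and a single piece longer than
    chunk_size is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # still fits (or nothing to flush yet)
        else:
            chunks.append(current)  # flush the full chunk, start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(greedy_split("aaa\n\nbbb\n\nccc", chunk_size=8))
```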

## LangChain methods

https://docs.langchain.com/docs/components/chains/index_related_chains
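- stuffing: put all the documents into a single prompt and make only one call to the LLM (simple, but the input must fit in the context window)
- map reduce: run an initial prompt on each chunk, then a different prompt to combine all the initial outputs
- refine: summarise the first chunk, then pass the running summary together with each remaining chunk to the LLM so it can refine the summary step by step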
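The three strategies differ mainly in how many LLM calls they make and what each call sees. A plain-Python sketch with a fake LLM makes that visible; the function names and prompt wording are hypothetical and are not LangChain's internal prompts:

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns generated text
LLM = Callable[[str], str]


def stuff(chunks: List[str], llm: LLM) -> str:
    # One call: every chunk is concatenated into a single prompt
    return llm("Summarise:\n\n" + "\n\n".join(chunks))


def map_reduce(chunks: List[str], llm: LLM) -> str:
    # Map: summarise each chunk independently (these calls could run in parallel)
    partial = [llm("Summarise:\n\n" + c) for c in chunks]
    # Reduce: a different prompt combines the partial summaries in one more call
    return llm("Combine these summaries:\n\n" + "\n\n".join(partial))


def refine(chunks: List[str], llm: LLM) -> str:
    # Summarise the first chunk, then update the running summary chunk by chunk
    summary = llm("Summarise:\n\n" + chunks[0])
    for c in chunks[1:]:
        summary = llm(f"Existing summary:\n{summary}\n\nRefine it using:\n{c}")
    return summary


def count_calls(strategy, chunks: List[str]) -> int:
    """Run a strategy against a fake LLM and report how many calls it made."""
    calls = []

    def fake_llm(prompt: str) -> str:
        calls.append(prompt)
        return f"summary {len(calls)}"

    strategy(chunks, fake_llm)
    return len(calls)


for name, fn in [("stuff", stuff), ("map_reduce", map_reduce), ("refine", refine)]:
    print(name, count_calls(fn, ["chunk 1", "chunk 2", "chunk 3"]))
```

For three chunks this reports 1, 4 and 3 calls: stuffing is cheapest but needs everything in one window, map reduce parallelises, and refine is sequential.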

In [59]:
prompt_template = """divide the following conversation into sections based on the topic, summarize the conversation
and generate a subtitle for each:

{text}

Podcast Summary:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

### Stuffing

In [54]:
chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(docs)

' In this conversation, the speaker discusses the idea of balance in life and how to live a happy life. He talks about the idea of specialization being for insects and how people should try their hand at everything. He also talks about the idea of reading for understanding rather than reading for completion, and how social media can be a double-edged sword in terms of cultivating a self-image. He also talks about the idea of being rich and anonymous rather than poor and famous.'

With a customized prompt:

In [62]:
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
results = chain.run(docs)
print(results)



Balanced Perspective on Life:

In this conversation, the two discuss the importance of having a balanced perspective on life and how to live it in a happy way. They talk about the ancient model of life, where one would try their hand at everything, and how specialization is for insects. They also discuss the importance of being willing to start over and have a beginner's mind, and how curiosity is key to learning and understanding.

Reading Habits:

The conversation then shifts to the topic of reading habits, and how one should read for understanding rather than to complete books. They discuss the importance of multitasking and how social media has changed the way we consume information. They also talk about the dangers of being a celebrity and how it can be a problem, and how people should strive to be rich and anonymous rather than poor and famous.


### Map Reduce

In [55]:
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)

' The speaker discusses the importance of having a balanced perspective in life and not just focusing on success and financial success. He encourages people to have a beginner\'s mind and to be willing to learn new things and make incremental progress. He also talks about the effects of modern society on attention spans and how social media can create an unrealistic version of one\'s life. He encourages people to be "rich and anonymous" rather than "poor and famous" and to understand the difficulties of being a celebrity.'

With a customized prompt:

In [61]:
# Reuse the `llm` configured above (temperature=0, explicit API key)
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    return_intermediate_steps=True,
    map_prompt=PROMPT,
    combine_prompt=PROMPT,
)
chain({"input_documents": docs}, return_only_outputs=True)

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).


{'intermediate_steps': ["\n\nTopic 1: Investing and Tech:\nTwo people discuss the perspective of a big investor in the tech world who has a balanced approach to life.\n\nTopic 2: Combining Unusual Things:\nThe conversation shifts to the idea of combining unusual things to create something interesting, such as Bruce Lee's combination of martial arts and philosophy.\n\nTopic 3: Multivariate Humans:\nThe conversation moves to the idea that humans are multivariate and can experience and think about many different things.\n\nTopic 4: Ancient Model of Life:\nThe two discuss the ancient model of life, where one starts out young and then goes to war, runs a business, serves in government, and becomes a philosopher.\n\nTopic 5: Specialization is for Insects:\nThe conversation shifts to the idea that everyone should be able to do everything and that specialization is for insects.\n\nTopic 6: Fear of Starting Over:\nThe two discuss the fear of starting over and how it can be difficult to go back 
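With `return_intermediate_steps=True` the chain returns a dict rather than a string, with the per-chunk map outputs under `intermediate_steps`. A small formatting helper makes that easier to read. It assumes the combined summary sits under `output_text` (the output above is truncated before that key, so treat the name as an assumption) and uses `.get` so a missing key is simply skipped; `format_map_reduce_result` is a hypothetical name:

```python
def format_map_reduce_result(result: dict) -> str:
    """Lay out per-chunk (map) summaries followed by the combined summary."""
    parts = []
    for i, step in enumerate(result.get("intermediate_steps", []), start=1):
        parts.append(f"--- Chunk {i} ---\n{step.strip()}")
    final = result.get("output_text")  # assumed key for the combined summary
    if final is not None:
        parts.append(f"=== Combined summary ===\n{final.strip()}")
    return "\n\n".join(parts)


example = {
    "intermediate_steps": ["\n\nTopic 1: Investing and Tech: ...", "\n\nTopic 1: ..."],
    "output_text": "\n\nOverall summary ...",
}
print(format_map_reduce_result(example))
```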

# Claude (100k-token context window)
https://saikatkumardey.com/post/claude+langchain+youtube/

In [1]:
import argparse
import os
import anthropic
from langchain.document_loaders import YoutubeLoader

In [2]:
def load_transcript(url):
    loader = YoutubeLoader.from_youtube_url(
        url,
        add_video_info=True,
    )
    docs = loader.load()
    if len(docs) == 0:
        raise ValueError("Sorry, no transcript found. 😢")
    return docs[0]
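`YoutubeLoader.from_youtube_url` extracts the video id from the URL itself, which is why a URL carrying extra query parameters such as `&t=161s` still works. The helper below is a hypothetical illustration of that kind of parsing using only the standard library; it is not LangChain's code and only covers a few common URL shapes:

```python
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url: str) -> str:
    """Extract the video id from common YouTube URL shapes (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0]
    if "youtube.com" in host:
        if parsed.path == "/watch":
            # Standard links: the id is the `v` query parameter; others (e.g. `t`) are ignored
            ids = parse_qs(parsed.query).get("v")
            if ids:
                return ids[0]
        if parsed.path.startswith(("/embed/", "/shorts/")):
            return parsed.path.split("/")[2]
    raise ValueError(f"Unrecognised YouTube URL: {url}")


print(youtube_video_id("https://www.youtube.com/watch?v=cFSrxSBrgSc&t=161s"))
```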


In [12]:
def write_summary(doc):

    anthropic_key = os.getenv("ANTHROPIC_KEY")

    client = anthropic.Client(anthropic_key)

    prompt = f"""
Please write a summary based on the video transcript.

Instructions

- Do not include any details that are not mentioned in the video.
- First, divide the video into sections and give each section a short heading.
- Under each section, use bullet points to summarise it, and include a few direct quotes from the conversation in the bullet points.
- Do not include any personal opinions or thoughts.
- Do not miss or ignore any content.


Video transcript:

{doc.dict()}

"""
    prompt_formatted = (
        f"{anthropic.HUMAN_PROMPT}{prompt}\n{anthropic.AI_PROMPT}"
    )
    response = client.completion(
        prompt=prompt_formatted,
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-instant-v1-100k",
        max_tokens_to_sample=100000,  # cap on generated tokens; the 100k window must also fit the prompt
        temperature=0,
    )
    return response["completion"]
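`write_summary` uses the legacy text-completion format, where the prompt alternates `Human:`/`Assistant:` turns and ends with the assistant turn so the model writes the reply. The sketch below reproduces that wrapping without the SDK (the two constants are our understanding of the values `anthropic.HUMAN_PROMPT` and `anthropic.AI_PROMPT` hold in that SDK version) plus a crude check of how much of the 100k-token window is left for the reply. `build_prompt` and `remaining_output_budget` are hypothetical helpers, and the 4-characters-per-token ratio is a rough heuristic, not Claude's tokenizer:

```python
# Assumed values of anthropic.HUMAN_PROMPT / anthropic.AI_PROMPT in the legacy
# completion-style SDK, copied here so the sketch runs without the package
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"


def build_prompt(instructions: str, transcript: str) -> str:
    """Wrap instructions + transcript in the Human/Assistant turn format."""
    body = f"{instructions}\n\nVideo transcript:\n\n{transcript}\n"
    return f"{HUMAN_PROMPT} {body}{AI_PROMPT}"


def remaining_output_budget(prompt: str, context_window: int = 100_000,
                            chars_per_token: int = 4) -> int:
    """Rough number of tokens left for the completion.

    Assumes prompt and completion share the context window and uses a
    ~4 characters-per-token heuristic instead of a real tokenizer.
    """
    return max(0, context_window - len(prompt) // chars_per_token)


prompt = build_prompt("Please write a summary based on the video transcript.",
                      "two one boom all right we're live ...")
print(repr(prompt[:30]))
print(remaining_output_budget(prompt))
```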

In [10]:
def summarize_youtube(url):

    # parser = argparse.ArgumentParser(
    #     description="Write a professional summary based on a YouTube video transcript."
    # )
    # parser.add_argument("url", type=str, help="The YouTube URL of the video.")
    # args = parser.parse_args()
    try:
        doc = load_transcript(url)
        summary = write_summary(doc)
        print(summary)
    except Exception as e:
        print(e)
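The commented-out `argparse` lines are meant for running this as a script. Inside Jupyter, calling `parser.parse_args()` with no arguments reads the kernel's own `sys.argv` (which includes flags such as `-f <connection file>`) and exits with an error, which is presumably why it is disabled. A minimal sketch that passes `argv` explicitly so the same parser works in both places; `parse_args` is a hypothetical wrapper:

```python
import argparse


def parse_args(argv=None):
    """Parse CLI arguments; pass an explicit list when running in a notebook."""
    parser = argparse.ArgumentParser(
        description="Write a professional summary based on a YouTube video transcript."
    )
    parser.add_argument("url", type=str, help="The YouTube URL of the video.")
    return parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]


# In a script: summarize_youtube(parse_args().url)
args = parse_args(["https://www.youtube.com/watch?v=3qHkcs3kG44"])
print(args.url)
```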


In [9]:
summarize_youtube('https://www.youtube.com/watch?v=3qHkcs3kG44')

 Here is a summary of the video:

Section 1: The Meaning of Life

- There is no definitive answer to the meaning of life. Any answer leads to infinite regress, circular reasoning, or an axiom. 
- The act of pursuing the meaning of life is useful as it gives you intrinsic understanding, but there is no single answer. 
- Happiness is a choice. By believing it is a choice, you can work on finding it within yourself.

Section 2: Happiness and Money

- Material possessions do not bring lasting happiness. Being rich will not make you happy, but being poor can make you unhappy.  
- True happiness comes from within. By observing your thoughts and letting go of negative judgments, you can develop inner peace.
- Anyone can become rich through education, hard work and creating value for society. Specific knowledge, leverage and ownership can help you make money.

Section 3: Meetings and Time Management

- Meetings are a waste of time. Phone calls and emails are more efficient.  
- Be jealous of y

In [11]:
summarize_youtube('https://www.youtube.com/watch?v=cFSrxSBrgSc&t=161s')

 Here is a summary of the key points from the conversation:

• Ayla grew up in a conservative and controlling household. She was homeschooled and her father had narcissistic personality disorder. 

• After leaving home, Ayla struggled with the trauma of her upbringing. She used LSD which helped her gain a new perspective and let go of some of the pain. 

• Ayla started working as a cam girl which gave her freedom and control. She enjoyed being creative and experimenting with different ideas and routines. 

• Ayla then started escorting which she found less competitive and stressful than camming. She was able to charge high rates due to her marketing skills and experience.

• Ayla has done surveys on a range of topics related to human sexuality including fetishes, rape spectrum, body count, and relationships. The large sample sizes and transparent methodology give valuable insights.

• Ayla believes that polyamory comes more naturally to her and that monogamy goes against human nature. 

In [13]:
summarize_youtube('https://www.youtube.com/watch?v=PdE-waSx-d8')

 Here is a summary of the key points from the video:

- Stephen Wolfram discusses ChatGPT and large language models. He explains that they work by predicting the next word based on probabilities learned from a large training dataset. 

- Wolfram Language aims to represent knowledge in a precise, computable form that can be used to derive new facts and insights. ChatGPT acts as a linguistic interface to generate Wolfram Language code from natural language prompts.

- Wolfram discusses the concept of computational thinking - how to formalize and represent the world in a computational way. He argues this should be part of standard education, though the curriculum is still being developed.

- Wolfram talks about the second law of thermodynamics and how it relates to computational irreducibility and bounded observers. He explains how the laws of physics emerge from the interplay between computational irreducibility in the universe and the limitations of observers.

- Wolfram discusses the n

### With timestamps

In [28]:
captions_time_text = df['Lines'].str.cat(sep='\n\n')
captions_time_text

"0:00\n\ntwo one boom all right we're live thank you very much for doing this man i really appreciate it i've been absorbing your\n\n0:06\n\ninformation and listening to you talk for uh quite a while now so it's uh it's great to actually meet you thanks for\n\n0:12\n\nhaving me my pleasure my pleasure you are one of the rare guys that is uh you're a big investor\n\n0:20\n\nyou are um you're deep in the tech world but yet you seem to have a very balanced\n\n0:26\n\nperspective in terms of how to live life as opposed to not just be entirely focused on\n\n0:33\n\nsuccess and financial success and tech investing but rather\n\n0:40\n\nhow to live your life in a happy way that's a it's a it's not balance\n\n0:45\n\nyeah you know i think the reason why people like uh hearing me is because like if it's like if you go to a circus\n\n0:52\n\nand you see a bear right that's kind of interesting but not that much if you see a unicycle that's interesting but you\n\n0:57\n\nsee a bear on a unicycle t
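The timestamps mix `m:ss` and `h:mm:ss` (the recording runs past two hours, e.g. `2:11:25`), so comparing or sorting them as strings gives wrong answers: `"9:00" > "12:00"` lexicographically. Converting to seconds avoids that. A minimal sketch with hypothetical helper names:

```python
def to_seconds(ts: str) -> int:
    """Convert a caption timestamp ('m:ss' or 'h:mm:ss') to seconds."""
    seconds = 0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + int(part)  # shift the previous unit up by 60
    return seconds


def to_hms(seconds: int) -> str:
    """Format seconds back to 'h:mm:ss', or 'm:ss' when under an hour."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


for ts in ["0:06", "12:00", "2:11:25"]:
    print(ts, to_seconds(ts), to_hms(to_seconds(ts)))
```

Applied to the DataFrame, `df_new['timestamp'].map(to_seconds)` would give a numeric, sortable column.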

In [31]:
def write_summary_with_time(doc):

    anthropic_key = os.getenv("ANTHROPIC_KEY")

    client = anthropic.Client(anthropic_key)

    prompt = f"""
Please write a summary based on the video transcript. The transcript consists of timestamps in the format "m:ss" or "h:mm:ss", each followed by the caption text.

Instructions:

- First, divide the video into sections. Give the actual timestamp at which each section begins.
- Second, give each section a short descriptive heading.
- Third, under each heading, use bullet points to summarise the section.

Follow these three steps exactly in your summary. Do not include any personal opinions or thoughts. Do not miss or ignore any content. Do not include any details that are not mentioned in the video.

Video transcript:

{doc}

"""
    prompt_formatted = (
        f"{anthropic.HUMAN_PROMPT}{prompt}\n{anthropic.AI_PROMPT}"
    )
    response = client.completion(
        prompt=prompt_formatted,
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-instant-v1-100k",
        max_tokens_to_sample=100000,  # cap on generated tokens; the 100k window must also fit the prompt
        temperature=0,
    )
    return response["completion"]
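The prompt asks for the timestamp at which each section begins, but nothing forces the model to copy real timestamps from the transcript; returned ranges can come out suspiciously round. A minimal sketch for pulling `Section N (start - end)` ranges out of a reply and flagging timestamps that never occur in the caption data. The heading pattern is an assumption based on one sample reply, so the regex will need adjusting if the model formats sections differently; the helper names are hypothetical:

```python
import re

# Matches headings like "Section 1 (0:00 - 12:00)"; times may be m:ss or h:mm:ss
SECTION_RE = re.compile(
    r"Section\s+(\d+)\s*\((\d+(?::\d{2}){1,2})\s*-\s*(\d+(?::\d{2}){1,2})\)"
)


def section_ranges(summary: str):
    """Return (section number, start, end) for every section heading found."""
    return [(int(n), start, end) for n, start, end in SECTION_RE.findall(summary)]


def unknown_timestamps(ranges, transcript_timestamps):
    """Timestamps cited by the model that never occur in the transcript."""
    known = set(transcript_timestamps)
    cited = [t for _, start, end in ranges for t in (start, end)]
    return sorted(set(cited) - known)


reply = "1. Section 1 (0:00 - 12:00): ... 2. Section 2 (12:00 - 24:00): ..."
ranges = section_ranges(reply)
print(ranges)
print(unknown_timestamps(ranges, ["0:00", "0:06", "12:01"]))
```

In the notebook, `df_new['timestamp']` would be the natural list of known timestamps to check against.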

In [32]:
summary = write_summary_with_time(captions_time_text)
print(summary)

 Here is a summary of the key points from the transcript:

1. Section 1 (0:00 - 12:00): Naval discusses the importance of understanding fundamentals rather than memorizing advanced concepts. He talks about how meditation and witnessing your thoughts can lead to happiness. Money alone cannot make you happy, but financial freedom can give you time for self-improvement. 

2. Section 2 (12:00 - 24:00): Naval talks about how technology is atomizing companies and enabling more people to work for themselves. He believes that in the future, most high-quality work will be done on a gig basis. He also discusses nuclear fusion and the need for innovation to solve energy problems.

3. Section 3 (24:00 - 36:00): They discuss universal basic income and whether everyone can truly be wealthy. Naval argues that with proper education, everyone could be trained for creative jobs while automation handles manual work. However, he believes UBI is a slippery slope towards socialism. 

4. Section 4 (36:00 - 4