# Summarise podcast transcripts

Prepare raw caption data (manual copy-paste + data cleaning)

In [4]:
import pandas as pd

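# Read the caption file line by line (blank lines skipped); plain file reading
# avoids pandas' CSV parser misreading commas or quotes inside a caption
with open('caption.txt', encoding='utf-8') as f:
    df = pd.DataFrame({'Lines': [line.strip() for line in f if line.strip()]})

# Even-indexed lines are timestamps, odd-indexed lines are caption text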
df_new = pd.DataFrame({
    'text': df[df.index % 2 != 0]['Lines'].reset_index(drop=True),
    'timestamp': df[df.index % 2 == 0]['Lines'].reset_index(drop=True)
})

print(df_new)

                                                   text timestamp
0     two one boom all right we're live thank you ve...      0:00
1     information and listening to you talk for uh q...      0:06
2     having me my pleasure my pleasure you are one ...      0:12
3     you are um you're deep in the tech world but y...      0:20
4     perspective in terms of how to live life as op...      0:26
...                                                 ...       ...
1310  actually okay just at naval then i have a webs...   2:11:25
1311  channel neval and i have a podcast in the worl...   2:11:30
1312                    thank you bye everybody [Music]   2:11:35
1313                                 [Applause] [Music]   2:11:41
1314                                          [Music] i   2:11:51

[1315 rows x 2 columns]
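The odd/even index filter depends on `caption.txt` strictly alternating a timestamp line and a caption line. The same split can be written with Python slice steps. Below is a minimal sketch on a hypothetical inline sample, not the real file:

```python
import pandas as pd

# Hypothetical sample mimicking the caption layout: timestamp, text, timestamp, text, ...
lines = [
    "0:00", "two one boom all right we're live",
    "0:06", "information and listening to you talk",
    "0:12", "having me my pleasure",
]

# lines[0::2] takes every even-indexed line (timestamps),
# lines[1::2] takes every odd-indexed line (caption text)
captions = pd.DataFrame({
    "text": lines[1::2],
    "timestamp": lines[0::2],
})

print(captions)
```

Suppose the file has an odd number of lines, for example a trailing timestamp with no caption. The two slices then have different lengths and the `DataFrame` constructor raises a `ValueError`. The index-based version above would instead silently fill the gap with `NaN`.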


# OpenAI setup

In [8]:
import os
import openai
from dotenv import load_dotenv, find_dotenv
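_ = load_dotenv(find_dotenv())  # read the local .env file, which must define OPENAI_API_KEY

openai.api_key = os.getenv('OPENAI_API_KEY')

from langchain.llms import OpenAI
# temperature=0 makes the completions as deterministic as possible
llm = OpenAI(temperature=0, openai_api_key=openai.api_key)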

In [45]:
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [52]:
# Join the caption lines with blank lines so the text splitter can break between them
captions_text = df_new['text'].str.cat(sep='\n\n')
llm.get_num_tokens(captions_text)

32397
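The transcript comes to 32,397 tokens. That is far more than one completion request can hold, which is why it is split before summarizing.

The sketch below gives a rough lower bound on the number of chunks. It rests on three assumptions, all of which should be checked against your model and LangChain version:

* a 4,097-token context window, the limit of `text-davinci-003`, which was likely the default model of LangChain's `OpenAI` wrapper at the time;
* LangChain's default completion budget of 256 tokens (`max_tokens`);
* a hypothetical 100-token allowance for the instruction text.

```python
import math

TRANSCRIPT_TOKENS = 32397   # value returned by llm.get_num_tokens(captions_text)
CONTEXT_WINDOW = 4097       # assumption: text-davinci-003 context limit
COMPLETION_BUDGET = 256     # assumption: LangChain's default max_tokens
PROMPT_OVERHEAD = 100       # hypothetical allowance for the instruction text


def min_chunks(total_tokens: int) -> int:
    """Lower bound on the number of chunks needed so each fits in one request."""
    per_chunk = CONTEXT_WINDOW - COMPLETION_BUDGET - PROMPT_OVERHEAD
    return math.ceil(total_tokens / per_chunk)


print(min_chunks(TRANSCRIPT_TOKENS))
```

This is only a lower bound. `CharacterTextSplitter` measures `chunk_size` in characters rather than tokens, so the actual number of chunks depends on the splitter settings.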

In [53]:
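# text_splitter = RecursiveCharacterTextSplitter()
# CharacterTextSplitter splits on "\n\n" by default, matching the separator used above
text_splitter = CharacterTextSplitter()
texts = text_splitter.split_text(captions_text)
# Only the first three chunks are summarized below, so the summaries cover just the start of the episode
docs = [Document(page_content=t) for t in texts[:3]]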
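`CharacterTextSplitter` splits the text on a separator (by default `"\n\n"`) and then packs the pieces into chunks of at most `chunk_size` characters.

The sketch below is a simplified, hypothetical re-implementation of that packing step. It is meant to show the idea, not to reproduce the library. In particular, it adds no chunk overlap, whereas LangChain's splitter does by default.

```python
def greedy_split(text: str, separator: str = "\n\n", chunk_size: int = 4000) -> list[str]:
    """Hypothetical simplified splitter: split on `separator`, then greedily pack
    pieces into chunks of at most `chunk_size` characters. A single piece longer
    than `chunk_size` becomes its own oversized chunk."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in text.split(separator):
        if not piece:
            continue
        # Extra length if this piece (plus a separator) were appended to the current chunk
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            # Current chunk is full: emit it and start a new one with this piece
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "\n\n".join(["aaaa", "bbbb", "cccc"])
print(greedy_split(sample, chunk_size=10))  # ['aaaa\n\nbbbb', 'cccc']
```

Packing whole pieces keeps each caption line intact inside a chunk, at the cost of chunks that are somewhat shorter than `chunk_size`.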

## LangChain methods

https://docs.langchain.com/docs/components/chains/index_related_chains
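- stuffing: insert all chunks into a single prompt, so it makes only one call to the LLM (the combined text must fit in the model's context window)
- map reduce: run an initial prompt on each chunk, then a combine prompt over all the initial outputs
- refine: summarize the first chunk, then pass that summary together with each remaining chunk to the LLM so it can update the summary step by step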
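The three strategies differ in how many LLM calls they make and in what each call sees. The sketch below illustrates that control flow only; it is not LangChain's actual implementation and does not use its prompts. It assumes a hypothetical `fake_llm` stand-in that reports the prompt length, so the code runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: reports the prompt length."""
    return f"<summary of {len(prompt)} chars>"


def stuff(chunks: list[str], llm: LLM) -> str:
    # One call: every chunk is concatenated into a single prompt
    return llm("Summarize:\n" + "\n\n".join(chunks))


def map_reduce(chunks: list[str], llm: LLM) -> str:
    # Map: summarize each chunk independently (these calls could run in parallel)
    partials = [llm("Summarize:\n" + c) for c in chunks]
    # Reduce: one more call combines the partial summaries
    return llm("Combine these summaries:\n" + "\n\n".join(partials))


def refine(chunks: list[str], llm: LLM) -> str:
    if not chunks:
        return ""
    # Start from a summary of the first chunk, then fold in each remaining chunk in order
    summary = llm("Summarize:\n" + chunks[0])
    for c in chunks[1:]:
        summary = llm(f"Existing summary:\n{summary}\n\nRefine it with:\n{c}")
    return summary


print(map_reduce(["chunk one", "chunk two"], fake_llm))
```

With three chunks, as in this notebook, the sketch makes one call for stuffing. Map reduce makes four: three map calls plus one combine. Refine makes three calls, which must run sequentially.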

In [59]:
prompt_template = """divide the following conversation into sections based on the topic, summarize the conversation
and generate a subtitle for each:

{text}

Podcast Summary:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
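`PromptTemplate` fills the `{text}` placeholder with each chunk before the request is sent. This template uses the default f-string style, so plain `str.format` gives a close approximation of what the model receives. The sketch below uses a copy of the template and a hypothetical chunk:

```python
# Copy of the notebook's prompt_template (same {text} placeholder)
template = """divide the following conversation into sections based on the topic, summarize the conversation
and generate a subtitle for each:

{text}

Podcast Summary:"""

# Hypothetical chunk standing in for one caption document
chunk = "two one boom all right we're live thank you"
rendered = template.format(text=chunk)
print(rendered)
```

Literal braces in the template itself would need to be doubled (`{{` and `}}`). Braces inside the inserted chunk are not affected.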

### Stuffing

In [54]:
chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(docs)

' In this conversation, the speaker discusses the idea of balance in life and how to live a happy life. He talks about the idea of specialization being for insects and how people should try their hand at everything. He also talks about the idea of reading for understanding rather than reading for completion, and how social media can be a double-edged sword in terms of cultivating a self-image. He also talks about the idea of being rich and anonymous rather than poor and famous.'

With the customized prompt

In [62]:
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
results = chain.run(docs)
print(results)



Balanced Perspective on Life:

In this conversation, the two discuss the importance of having a balanced perspective on life and how to live it in a happy way. They talk about the ancient model of life, where one would try their hand at everything, and how specialization is for insects. They also discuss the importance of being willing to start over and have a beginner's mind, and how curiosity is key to learning and understanding.

Reading Habits:

The conversation then shifts to the topic of reading habits, and how one should read for understanding rather than to complete books. They discuss the importance of multitasking and how social media has changed the way we consume information. They also talk about the dangers of being a celebrity and how it can be a problem, and how people should strive to be rich and anonymous rather than poor and famous.


### Map Reduce

In [55]:
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)

' The speaker discusses the importance of having a balanced perspective in life and not just focusing on success and financial success. He encourages people to have a beginner\'s mind and to be willing to learn new things and make incremental progress. He also talks about the effects of modern society on attention spans and how social media can create an unrealistic version of one\'s life. He encourages people to be "rich and anonymous" rather than "poor and famous" and to understand the difficulties of being a celebrity.'

With the customized prompt

In [61]:
chain = load_summarize_chain(
    llm,  # same temperature=0 model as above
    chain_type="map_reduce",
    return_intermediate_steps=True,  # also return the per-chunk (map) summaries
    map_prompt=PROMPT,
    combine_prompt=PROMPT,
)
chain({"input_documents": docs}, return_only_outputs=True)

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).


{'intermediate_steps': ["\n\nTopic 1: Investing and Tech:\nTwo people discuss the perspective of a big investor in the tech world who has a balanced approach to life.\n\nTopic 2: Combining Unusual Things:\nThe conversation shifts to the idea of combining unusual things to create something interesting, such as Bruce Lee's combination of martial arts and philosophy.\n\nTopic 3: Multivariate Humans:\nThe conversation moves to the idea that humans are multivariate and can experience and think about many different things.\n\nTopic 4: Ancient Model of Life:\nThe two discuss the ancient model of life, where one starts out young and then goes to war, runs a business, serves in government, and becomes a philosopher.\n\nTopic 5: Specialization is for Insects:\nThe conversation shifts to the idea that everyone should be able to do everything and that specialization is for insects.\n\nTopic 6: Fear of Starting Over:\nThe two discuss the fear of starting over and how it can be difficult to go back 
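With `return_intermediate_steps=True`, the returned dictionary holds two things:

* the per-chunk map outputs, under `intermediate_steps`;
* the combined summary, under `output_text` (LangChain's output key for summarize chains).

The sketch below prints them readably instead of as one long dict. Here `result` is a hypothetical stand-in with placeholder strings, not the real output above.

```python
# Hypothetical stand-in for the dict returned by chain(...); placeholder text only
result = {
    "intermediate_steps": [
        "\n\n<summary of chunk 1>",
        "\n\n<summary of chunk 2>",
    ],
    "output_text": "\n\n<combined summary>",
}


def format_steps(result: dict) -> str:
    """Join the per-chunk summaries and the final summary into one readable report."""
    parts = []
    for i, step in enumerate(result["intermediate_steps"], start=1):
        # strip() removes the leading blank lines the completions start with
        parts.append(f"--- Chunk {i} ---\n{step.strip()}")
    parts.append(f"--- Combined summary ---\n{result['output_text'].strip()}")
    return "\n\n".join(parts)


print(format_steps(result))
```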

# Claude 100k tokens