### Setup paths

In [1]:
import sys
import os
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Get the absolute path of the project root
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)

### Step 0: (Optional) Download audio files from YouTube

In this project, I use podcasts from YouTube as examples. If you already have an mp3 file, skip this step and go to Step 1.

Downloading a 3-hour video takes ~10 seconds.

In [3]:
from data_pull_and_prep.audio_from_yt import download_audio

video_url = "https://www.youtube.com/watch?v=klRb0_BAX9g"  # Replace with your video URL
video_name = "peter_thiel"  # Replace with your video name
output_dir = project_root+"/data/audio_2/"  # Replace with your output directory

download_audio(video_url, video_name, output_dir)

Downloading audio...
Audio downloaded: /Users/rishikeshdhayarkar/rag-audio-indexing/data/audio_2/peter_thiel.mp3
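The internals of `download_audio` are not shown in this notebook. As a rough sketch of what such a helper could look like, the code below assumes the third-party `yt-dlp` package and a local `ffmpeg` install; the function names `build_audio_options` and `download_audio_sketch` are hypothetical and are not part of this project.

```python
import os


def build_audio_options(video_name: str, output_dir: str) -> dict:
    """Build yt-dlp options that save the best audio stream as <video_name>.mp3."""
    return {
        "format": "bestaudio/best",
        # yt-dlp fills in %(ext)s; the postprocessor below then converts to mp3.
        "outtmpl": os.path.join(output_dir, f"{video_name}.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
        ],
    }


def download_audio_sketch(video_url: str, video_name: str, output_dir: str) -> str:
    """Download the audio track of video_url and return the expected mp3 path."""
    import yt_dlp  # assumption: yt-dlp is installed; imported lazily on purpose

    os.makedirs(output_dir, exist_ok=True)
    with yt_dlp.YoutubeDL(build_audio_options(video_name, output_dir)) as ydl:
        ydl.download([video_url])
    return os.path.join(output_dir, f"{video_name}.mp3")
```

Keeping the options in a separate function makes it possible to inspect or adjust them without touching the network.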


### Step 1: Convert mp3 file to text and generate time stamps for each character

In [3]:
import data_pull_and_prep.utils as utils
import data_pull_and_prep.data_preparation as data_prep
import textwrap

Convert audio to text using OpenAI Whisper.

Transcribing a ~3-hour video with OpenAI Whisper takes ~6 minutes and costs ~$0.05.

In [5]:
audio_file_path = project_root+"/data/audio_2/peter_thiel.mp3"
transcription = data_prep.transcribe(audio_file_path)

Each element of the transcribed output contains an id, a piece of transcribed text, and the start and end times of that text in the audio clip.

In [7]:
print(len(transcription))
print(f"id: {transcription[5][0]}")
print(f"text: {transcription[5][1]}")
print(f"start time: {transcription[5][2]}")
print(f"end time: {transcription[5][3]}")

2635
id: 5
text:  My pleasure. Thanks for having me.
start time: 14.0
end time: 16.0
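The indexing above suggests that each segment is a tuple of `(id, text, start, end)`. As a small illustration, and assuming the raw segments arrive as dictionaries with `id`, `text`, `start`, and `end` keys (the shape of Whisper's verbose segment output), a conversion to that tuple layout could look like this. The helper name `segments_to_tuples` is hypothetical.

```python
def segments_to_tuples(segments):
    """Convert segment dicts into (id, text, start, end) tuples."""
    return [(seg["id"], seg["text"], seg["start"], seg["end"]) for seg in segments]


# One segment, using the values printed in the cell above.
example_segments = [
    {"id": 5, "text": " My pleasure. Thanks for having me.", "start": 14.0, "end": 16.0},
]
rows = segments_to_tuples(example_segments)
print(rows[0][1])  # the text field, accessed the same way as transcription[5][1]
```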


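For each segment like the one shown in the cell above, calculate a timestamp for every character in its text by interpolating between the segment's start and end times.

But why do we need character-level timestamps?
Character-level timestamps provide the flexibility to create text chunks of any size while still knowing when each chunk starts and ends in the audio.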
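The interpolation can be sketched as follows. This is a minimal version that assumes characters are spread evenly across the segment's duration; the actual `data_prep.map_characters_to_timestamps` is not shown here and may differ, so the function below carries a `_sketch` suffix.

```python
def map_characters_to_timestamps_sketch(transcription):
    """Assign each character a timestamp by linear interpolation within its segment.

    transcription: iterable of (id, text, start, end) tuples.
    Returns a flat list of (character, timestamp) pairs.
    """
    char_timestamps = []
    for _seg_id, text, start, end in transcription:
        if not text:
            continue  # an empty segment has no characters to place in time
        step = (end - start) / len(text)  # seconds per character (assumed uniform)
        for i, ch in enumerate(text):
            char_timestamps.append((ch, start + i * step))
    return char_timestamps


chars = map_characters_to_timestamps_sketch([(0, " The", 0.0, 4.0)])
print(chars)  # [(' ', 0.0), ('T', 1.0), ('h', 2.0), ('e', 3.0)]
```

With this scheme, the first character of a segment gets the segment's start time and every later character is offset by a constant step.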

In [4]:
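# Load previously saved character-level timestamps (Ivanka Trump episode).
# The commented-out cell below recomputes them from `transcription` instead.
transcription_with_char_timestamps = utils.import_pkl_file(project_root+"/data/audio_1/ivanka_trump_transcription_char_timestamps.pkl")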

In [5]:
# transcription_with_char_timestamps = data_prep.map_characters_to_timestamps(transcription)

In [5]:
print(f"Total number of characters: {len(transcription_with_char_timestamps)}")
transcription_with_char_timestamps[:5]

Total number of characters: 157283


[(' ', 0.0),
 ('T', 0.06449438202247192),
 ('h', 0.12898876404494383),
 ('e', 0.19348314606741573),
 (' ', 0.25797752808988766)]

Save character-level timestamps

In [8]:
# utils.save_as_pickle_file(directory=project_root+"/data/audio_2/",
#                     filename="transcription_with_char_timestamps_peter_thiel.pkl",
#                     data=transcription_with_char_timestamps)

Create custom chunks using SentenceSplitter from LlamaIndex.

In [7]:
custom_chunking_obj = data_prep.CreateCustomTextChunks(transcription_with_char_timestamps)
text_chunks_with_timestamps = custom_chunking_obj.create_custom_text_chunks()

In [7]:
print(f"Number of text chunks: {len(text_chunks_with_timestamps)}")

Number of text chunks: 42
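To illustrate how character-level timestamps carry over to chunks, here is a simplified sketch. It splits on a fixed character count rather than using SentenceSplitter, and it assumes each chunk is stored as `(text, start_time, end_time)`; both the chunking rule and the tuple layout are assumptions, since `CreateCustomTextChunks` is not shown.

```python
def chunk_with_timestamps(char_timestamps, chunk_size):
    """Split (char, time) pairs into chunks of (text, start_time, end_time)."""
    chunks = []
    for i in range(0, len(char_timestamps), chunk_size):
        window = char_timestamps[i:i + chunk_size]
        text = "".join(ch for ch, _ in window)
        # The chunk spans from its first character's time to its last character's time.
        chunks.append((text, window[0][1], window[-1][1]))
    return chunks


pairs = [("a", 0.0), ("b", 0.5), ("c", 1.0), ("d", 1.5), ("e", 2.0)]
chunks = chunk_with_timestamps(pairs, chunk_size=2)
print(chunks)  # [('ab', 0.0, 0.5), ('cd', 1.0, 1.5), ('e', 2.0, 2.0)]
```

Because every character has its own timestamp, the chunk boundaries can fall anywhere without losing the mapping back to the audio.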


In [8]:
print(textwrap.fill(str(text_chunks_with_timestamps[0]), width=160))

("The following is a conversation with Ivanka Trump, businesswoman, real estate developer, and former senior advisor to the President of the United States. I've
gotten to know Ivanka well over the past two years. We've become good friends, hitting it off right away over our mutual love of reading, especially
philosophical writings from Marcus Aurelius, Joseph Campbell, Alan Watts, Victor Franklin, and so on. She is a truly kind, compassionate, and thoughtful human
being. In the past, people have attacked her. In my view, to get indirectly at her dad, Donald Trump, as part of a dirty game of politics and clickbait
journalism. These attacks obscured many projects and efforts, often bipartisan, that she helped get done, and they obscured the truth of who she is as a human
being. Through all that, she never returned the attacks with anything but kindness, and always walked through the fire of it all with grace. For this, and much
more, she is an inspiration, and I'm honored to be able to c

### Step 2: Create text nodes and add them to a vector store

In [5]:
import basic_rag.rag as rag

In [8]:
custom_ingestion_obj = rag.CustomRAG(
              index_name="ivanka-08-28-via-class",
              text_chunks_with_timestamps=text_chunks_with_timestamps
              )

Uploading all text nodes to the Pinecone vector store takes ~10 seconds.

In [9]:
await custom_ingestion_obj.create_text_nodes_and_add_to_vector_store()

100%|██████████| 42/42 [00:19<00:00,  2.17it/s]
100%|██████████| 42/42 [00:17<00:00,  2.44it/s]
100%|██████████| 42/42 [00:10<00:00,  4.06it/s]
Upserted vectors: 100%|██████████| 42/42 [00:02<00:00, 18.53it/s]


### Step 3: Embedding retrieval from vector store

In [10]:
query_str = "What are some famous quotes mentioned in this podcast and who said them?"

In [11]:
custom_retriever_obj = rag.CustomRetriever(embed_model=custom_ingestion_obj.embed_model,
                                           vector_store=custom_ingestion_obj.vector_store)
query_result = custom_retriever_obj.retrieve(query=query_str)
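Conceptually, embedding retrieval ranks stored vectors by their similarity to the query embedding and returns the closest ones. The toy sketch below uses hand-made 2-D vectors and cosine similarity in plain Python; it is not the Pinecone-backed `CustomRetriever`, and the similarity metric used by the real index is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_embedding, stored, k=2):
    """stored: list of (node_id, embedding). Returns the k most similar node ids."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [node_id for node_id, _ in ranked[:k]]


# Hypothetical node ids and embeddings, for illustration only.
stored = [("music", [1.0, 0.0]), ("family", [0.0, 1.0]), ("concerts", [0.9, 0.1])]
top = retrieve_top_k([1.0, 0.0], stored, k=2)
print(top)  # ['music', 'concerts']
```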

### Step 4: Response Synthesis

In [12]:
import basic_rag.response_synthesizer as response_synthesizer

In [13]:
async def run_aggregate_answers(summarizer, retrieved_nodes, query_str, num_children=10):
    return await summarizer.generate_response_hs(
        retrieved_nodes=retrieved_nodes,
        query_str=query_str,
        num_children=num_children
    )

In [14]:
response_synthesizer_obj = response_synthesizer.aHierarchicalSummarizer(llm=custom_ingestion_obj.llm)
response = await run_aggregate_answers(response_synthesizer_obj, query_result.nodes, query_str, 5)      

print(textwrap.fill(response, 80))

1. "Watch the stars and see yourself running with them" - Marcus Aurelius 2.
"You're under no obligation to be the same person you were five minutes ago" -
Alan Watts 3. "Radio Ga Ga" - Queen 4. "I torture myself" - Jango Ryan Hart
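The `num_children` argument suggests a tree-style merge: partial answers are combined in groups, and the combined results are merged again until a single response remains. The sketch below shows that idea with a stand-in `combine` coroutine in place of an LLM call; it is an assumption about how `aHierarchicalSummarizer` works, not its actual implementation.

```python
import asyncio


async def combine(texts):
    """Stand-in for an LLM call that merges several partial answers into one."""
    return " | ".join(texts)


async def hierarchical_summarize(texts, num_children):
    """Merge texts in groups of num_children, level by level, until one remains."""
    if num_children < 2:
        raise ValueError("num_children must be at least 2")
    if not texts:
        return ""
    while len(texts) > 1:
        groups = [texts[i:i + num_children] for i in range(0, len(texts), num_children)]
        # Groups within a level are independent, so they can be merged concurrently.
        texts = list(await asyncio.gather(*(combine(g) for g in groups)))
    return texts[0]


# In a script; inside a notebook, use `await hierarchical_summarize(...)` instead.
result = asyncio.run(hierarchical_summarize(["a", "b", "c", "d", "e"], num_children=2))
print(result)  # a | b | c | d | e
```

A smaller `num_children` gives a deeper tree with more merge calls; a larger value puts more text into each call.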


### A wrapper that takes a query and returns an answer string (Step 3 + Step 4)

In [15]:
from basic_rag.utils import aRetrieveAndAnswer

In [16]:
raa = aRetrieveAndAnswer(ingestion_obj=custom_ingestion_obj)
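The general shape of such a wrapper can be sketched as a class that chains a retriever and an async synthesizer behind one `answer()` call. Everything below (`RetrieveAndAnswerSketch`, `fake_retrieve`, `fake_synthesize`) is hypothetical and only illustrates the composition; it is not the code of `aRetrieveAndAnswer`.

```python
import asyncio


class RetrieveAndAnswerSketch:
    """Chains a retriever and an async synthesizer behind a single answer() call."""

    def __init__(self, retrieve, synthesize):
        self._retrieve = retrieve      # query -> list of retrieved texts
        self._synthesize = synthesize  # (texts, query) -> awaitable answer string

    async def answer(self, query):
        nodes = self._retrieve(query)                 # Step 3: retrieval
        return await self._synthesize(nodes, query)   # Step 4: response synthesis


def fake_retrieve(query):
    """Stand-in retriever that always returns two chunks."""
    return ["chunk one", "chunk two"]


async def fake_synthesize(nodes, query):
    """Stand-in synthesizer that reports how many chunks it received."""
    return f"{query} -> {len(nodes)} chunks"


raa_sketch = RetrieveAndAnswerSketch(fake_retrieve, fake_synthesize)
# In a script; inside a notebook, use `await raa_sketch.answer("demo")` instead.
answer = asyncio.run(raa_sketch.answer("demo"))
print(answer)  # demo -> 2 chunks
```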

In [17]:
query_str = "What are some famous quotes mentioned in this podcast and who said them?"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

1. "Watch the stars and see yourself running with them" - Marcus Aurelius 2.
"She doesn't condemn or criticize, she loves and accepts" - Dolly Parton 3.
"You're under no obligation to be the same person you were five minutes ago" -
Alan Watts 4. "Radio Ga Ga" - Freddie Mercury 5. "Amazing Grace" - performed by
Aretha Franklin


In [18]:
query_str = "What are Ivanka Trump's thoughts on music?"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Ivanka Trump enjoys live music and is a fan of Dolly Parton. She appreciates
Dolly Parton's authenticity, talent, and positivity. Her daughter also attended
Dolly Parton's concert at Madison Square Garden.


In [28]:
query_str = "There must be a music related stuff in the context. Give me more details on that."
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

The context provides information about live music performances, favorite live
performances, watching old videos of musicians like Jango Ryan Hart, Adele's
influence from Aretha Franklin, and Queen's live performances. It also discusses
the experience of attending live music shows, the anticipation of choosing a
show, and the communal aspect of going to concerts with friends. Additionally,
there is mention of Michael Jackson attending a performance of "One Moment in
Time" and being brought by someone's father, highlighting the impact of music
and performance on individuals. Dolly Parton, a renowned country music singer,
songwriter, and performer, is also discussed, with admiration for her
authenticity, talent, and positive energy. The interviewer expresses interest in
interviewing Dolly Parton due to her iconic status in the music industry.


In [29]:
query_str = "describe the incident with kim kardashian. Something about prisons"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Based on the context information provided, the incident involving Kim Kardashian
and prisons was related to her advocacy work in criminal justice reform efforts.
Specifically, she was involved in advocating for pardons and commutations for
individuals who were deserving and overdue. This included working closely with
Jared Kushner on the First Step Act, a piece of legislation that provided many
people with another opportunity. Additionally, Kim Kardashian was mentioned as
being on the phone with Corvon's mother late at night to inform her that her son
would be getting out of prison the next day. This incident highlights Kim
Kardashian's efforts in advocating for individuals impacted by the criminal
justice system.


In [32]:
query_str = "What are some architectural projects that Ivanka Trump has worked on?"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Ivanka Trump has not worked on any architectural projects based on the provided
context information.


In [31]:
query_str = "Describe the impact of NYC on Ivanka Trump's life"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

New York City had a significant impact on Ivanka Trump's life in various ways.
It was where she lived and worked during a crucial time in her career, juggling
multiple successful businesses while raising her children. The city exposed her
to the struggles of everyday Americans, influencing her decision to join her
father's campaign and later his administration. The fast-paced and diverse
environment likely shaped her perspective on tackling challenges in Washington.
Additionally, growing up in NYC fueled her love for building and architecture,
inspiring her career path. The city also provided her with a strong support
system, which she missed when she moved to Washington, D.C. Overall, NYC played
a pivotal role in Ivanka Trump's personal and professional growth, shaping her
ambitions and connections.


In [33]:
query_str = "What buildings has Ivanka Trump worked on?"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Ivanka Trump has worked on various building projects, particularly in real
estate development, with a focus on designing beautiful city skylines,
especially in New York City. She has been involved in multidisciplinary aspects
of real estate, including design, engineering, construction, time management,
and project planning. While specific buildings are not mentioned in the context,
it is clear that Ivanka Trump has been involved in various iconic structures and
projects throughout her career in real estate development.


In [34]:
query_str = "What does Ivanka Trump say about her children and husband?"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Ivanka Trump talks about the support she received from her husband and children
during her time in Washington, mentioning how important they were to her. She
describes how her husband, Jared, would make her coffee every morning as an act
of love, and how her children brought levity and joy to her life, especially her
youngest son, Theo, who learned how to make cappuccinos for her. She emphasizes
the grounding and importance of having her family by her side during challenging
times.


In [35]:
query_str = "Describe the type of work Ivanka Trump did as a senior advisor to the president"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

As a senior advisor to the president, Ivanka Trump was involved in policy-
making, decision-making, and advocacy on issues related to women, families,
workforce development, economic initiatives, and tax cuts. She also focused on
addressing social challenges such as homelessness and childcare issues, working
towards finding solutions to these problems. Ivanka Trump worked on bipartisan
projects and initiatives to make a positive impact and contribute to the
betterment of society, despite facing criticism and attacks. Overall, her work
involved a combination of policy development, advocacy, and coalition-building
to achieve positive outcomes for the American people.


In [36]:
query_str = "Describe the type of work Ivanka Trump did on taxes"
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Based on the context information provided, Ivanka Trump was involved in policy-
making, negotiation, and building coalitions of support for tax cuts during her
time in Washington.


In [44]:
query_str = """What unique considerations and complexities were involved in the renovation of the old post 
office building, particularly in terms of layout, room configurations, and preserving the building's historic exterior?"""
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

The renovation of the old post office building involved unique considerations
and complexities in terms of layout and room configurations due to the fact that
each of the almost 300 rooms had a different layout, making it impossible to
repeat any design. The setbacks in the building required moving plumbing, adding
to the complexity of the project. Additionally, preserving the historic exterior
of the building was a priority, so any additions had to be done gently and
signage additions had to be carefully considered. The meticulous restoration of
the exterior was a key aspect of the renovation project, requiring careful
restoration work to ensure that the original architectural features and design
elements of the building were maintained.


In [48]:
query_str = """How did Ivanka Trump's children, particularly her son Theo, contribute to her sense of grounding and joy during her time in Washington, D.C.?"""
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Ivanka Trump's son Theo contributed to her sense of grounding and joy during her
time in Washington, D.C. by learning how to make coffee for her every morning,
bringing her a cappuccino as a loving gesture. This act of care and
thoughtfulness from Theo helped Ivanka feel grounded and happy amidst the
challenges of her career. Additionally, her children, including Theo, reminded
her of the importance of family, love, and simple joys, providing her with a
sense of normalcy and perspective in the midst of the pressures of her role in
the political sphere. Their presence helped her stay connected to her values and
priorities, bringing balance, strength, and inspiration to her life.


In [120]:
query_str = """What does Ivanka Trump like to do in her free time? What are her hobbies and interests?"""
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

In her free time, Ivanka Trump enjoys designing city skylines, reading
newspapers in bed while her husband brings her coffee, spending time with her
children, working in real estate and fashion, and running the Trump Hotel
collection. She also has a passion for architecture and the multidisciplinary
aspects of real estate, as well as a strong interest in understanding and
helping people in need.


In [124]:
query_str = """What are some things that Ivanka trump has said about her father and his interests or hobbies other than politics?"""
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Ivanka Trump mentioned that her father, Jared, used to make her coffee every
morning as an act of love. She also shared a story about how their son, Theo,
learned how to make a cappuccino and would bring it to her every morning with
joy. These anecdotes show a more personal and family-oriented side of her
father, focusing on his interests or hobbies outside of politics.


In [125]:
query_str = """What type of music is the tump family interested in? Specifically what type of music does Donald trump like?"""
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Based on the context information provided, it is not possible to determine the
specific type of music the Trump family is interested in or what type of music
Donald Trump likes.


In [127]:
query_str = """Give me context around Mar-a-lago and the Trump family"""
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Mar-a-Lago is a resort and private club owned by the Trump family, located in
Palm Beach, Florida. It has been a significant destination for the Trump family
for vacations, events, and official functions during Donald Trump's presidency.
The Trump family has been known to host events, entertain guests, and conduct
meetings at Mar-a-Lago, reflecting their lifestyle and business interests in
luxury hospitality and real estate. However, the high membership fees and
potential conflicts of interest with President Trump's presidency have also made
Mar-a-Lago a controversial location.


In [128]:
query_str = """Does Ivanka trumps father like music?"""
response_coroutine = raa.answer(query_str)
print(textwrap.fill(await response_coroutine, 80))

Based on the context information provided, it is not possible to determine
whether or not Ivanka Trump's father likes music.


#### Don't know what questions to ask? Not a problem!

As part of text node creation, each text node is mapped to a set of LLM-generated relevant questions and topics/titles.

Any number of random questions and topics/titles can be generated for a given audio clip. Each call to the 'get_random_questions_and_titles' function generates a new set of questions and topics.
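One way to picture this is random sampling over the per-node metadata. The sketch below assumes each node's metadata is a dictionary with the `questions_this_excerpt_can_answer` and `document_title` keys used later in this notebook; the function name and the sampling strategy are assumptions, not the implementation of `RandomQuestionAndTopics`.

```python
import random


def sample_questions_and_titles(nodes_metadata, n, seed=None):
    """Pick n random nodes and return their questions plus the distinct titles."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    picked = rng.sample(nodes_metadata, k=min(n, len(nodes_metadata)))
    questions = [m["questions_this_excerpt_can_answer"] for m in picked]
    titles = sorted({m["document_title"] for m in picked})
    return questions, titles


# Hypothetical metadata, for illustration only.
demo_metadata = [
    {"questions_this_excerpt_can_answer": f"Question {i}?", "document_title": "Demo title"}
    for i in range(3)
]
questions, titles = sample_questions_and_titles(demo_metadata, n=2)
print(questions, titles)
```

Leaving `seed` unset gives a different sample on each call, which matches the behavior described above.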

In [39]:
from basic_rag.utils import RandomQuestionAndTopics

In [41]:
random_questions_and_topics_obj = RandomQuestionAndTopics(ingestion_obj=custom_ingestion_obj)

In [119]:
random_questions_and_topics_obj.print_questions_and_topics(
    random_questions_and_topics_obj.get_random_questions_and_titles())

╔═════════════════════════════════════════════════════════════════════════════╗
║ Questions                                                                   ║
╠═════════════════════════════════════════════════════════════════════════════╣
║ 1. What similarities did the interviewee draw between ancient buildings in    ║
║    Europe and giant trees in terms of history, energy, and stories emanating  ║
║    from them?                                                                 ║
║                                                                             ║
║ 2. How did Ivanka Trump's personal experiences and interactions during the    ║
║    campaign influence her decision to join her father in Washington?          ║
║                                                                             ║
║ 3. How did the individual in the excerpt prioritize their family over         ║
║    engaging in politics, and what alternative ways did they find to serve     ║
║    their community?     

In [110]:
print(len(custom_ingestion_obj.all_nodes))

42


In [112]:
# Collect the document title stored in each node's metadata
all_document_titles = [node.metadata['document_title']
                       for node in custom_ingestion_obj.all_nodes]

print(set(all_document_titles))

{'"Architectural Inspirations: Exploring Beauty, Innovation, and Collaboration in New York City"'}


In [113]:
# Note: the variable name is reused from the previous cell, but here it holds
# each node's LLM-generated questions rather than document titles.
all_document_titles = [node.metadata['questions_this_excerpt_can_answer']
                       for node in custom_ingestion_obj.all_nodes]

for d in set(all_document_titles):
    print(textwrap.fill(d, 180))
    print("-"*100)

1. How did Ivanka Trump's fashion brand cater to the needs of the modern, multi-dimensional woman who works in various aspects of her life? 2. What challenges did Ivanka Trump face
in creating a successful fashion brand, and how did she overcome them with the help of her partners? 3. How did Ivanka Trump's approach to designing and creating fashion products
lead to a strong emotional connection with her customers, as evidenced by their continued love and appreciation for her shoes and other products years later?
----------------------------------------------------------------------------------------------------
1. How did the process of doubling the child tax credit involve meeting with lawmakers, convincing them of the policy's benefits, and campaigning in their districts to gain
support? 2. What impact did the doubling of the child tax credit have on American families, with 40 million families receiving an average of $2,200 each year as a result? 3. Can
you provide an example of a sp

In [111]:
print(len(set(all_document_titles)))

42
