# Video to Podcast

In [1]:
pip install -qU pytubefix moviepy openai-whisper langchain langchain-text-splitters langchain-core langchain-community lancedb langchain-openai tiktoken openai python-dotenv azure-cognitiveservices-speech

Note: you may need to restart the kernel to use updated packages.


In [2]:
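from pytubefix import YouTube
# Note: moviepy 2.x removed the `moviepy.editor` module; there, use `from moviepy import VideoFileClip`
from moviepy.editor import VideoFileClip
import os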



# link of the video to be downloaded
# yt = YouTube('https://www.youtube.com/watch?v=8N9L-XK1eEU')
# yt = YouTube('https://www.youtube.com/watch?v=hJllkhC5GZU')
# yt = YouTube('https://www.youtube.com/watch?v=KKWPSkYN3vw')
# yt = YouTube('https://www.youtube.com/watch?v=8MMoBiIj9hI')
yt = YouTube('https://www.youtube.com/watch?v=jLVl5V8roMU')

yt.title

# Get all streams and filter for mp4 files
# yt.streams
# ... .filter(progressive=True, file_extension='mp4')
# ... .order_by('resolution')
# ... .asc()
# ... .first()
# ... .download()


'Have You Picked the Wrong AI Agent Framework?'

# ffmpeg required for Whisper

Whisper calls the ffmpeg command-line tool to decode audio, so install it first (Debian/Ubuntu shown):

```
sudo apt update && sudo apt install ffmpeg
```

In [3]:
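# Download the highest-resolution progressive mp4 stream (progressive streams carry both audio and video)
fileName = yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first().download()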


In [4]:
title = yt.title
url = yt.watch_url
description = yt.description
thumbnail_url = yt.thumbnail_url

# Write the video metadata to a JSON file; json.dump escapes quotes, backslashes and newlines correctly
import json

info_file_name = os.path.splitext(fileName)[0] + '_info.json'
with open(info_file_name, 'w', encoding='utf-8') as f:
    json.dump(
        {"title": title, "url": url, "description": description, "thumbnail_url": thumbnail_url},
        f,
        ensure_ascii=False,
    )


In [5]:
print(yt.description)

Are you diving into the world of agent-based AI workflows and finding it more complicated than it needs to be? You’ve probably chosen the wrong approach, burdened with verbose, time-consuming frameworks. In this video, I’ll show you a far easier method to build your AI workflow, cutting through the unnecessary boilerplate and complexity.

Using CrewAI as an example, inspired by Code with Brandon’s excellent tutorials, we’ll explore how to streamline your processes. Watch as I automate a YouTube strategy in just seconds, a task that would typically take hours. Whether you’re a YouTuber or have daily repetitive tasks, this video will show you how to simplify and enhance your workflow with ease.

Join me as we debunk the myth that AI workflows are inherently complex. We’ll contrast CrewAI’s traditional method with a more efficient solution using Typescript and BunJS, reducing hundreds of lines of code to just a few dozen. Discover how you can achieve the same results with less effort and 

In [6]:
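# Create the file name for the audio file (splitext only strips the final extension,
# so dots in the title or directory are preserved)
audioFileName = os.path.splitext(fileName)[0] + '.wav'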

In [7]:
video = VideoFileClip(fileName)
audio = video.audio
audio.write_audiofile(audioFileName)

MoviePy - Writing audio in /home/pmalarme/workspace/drop-all/Have You Picked the Wrong AI Agent Framework.wav
MoviePy - Done.

In [8]:
# Cleanup: delete the video file
video.close()
os.remove(fileName)

In [9]:
import whisper

model = whisper.load_model('base')
result = model.transcribe(audioFileName)

print(result['text'])

 The agent-based workflow is one of the hottest trends in the AI world right now. And if you are building one, you've probably chosen the wrong way to build it. And by wrong, I mean more verbose, more complicated, and more time-consuming. The framework's out there. Want you to think that this stuff is hard, but it's not. In this video, I'm going to show you how to build one far easier than any other tutorial out on YouTube right now. Most of the tools out there require reams of repetitive boilerplate that has little need to actually be there. In most cases, you can do it a lot more simply, and again, in this video, I'm going to remind you of an easier way to get the same thing done. You're probably going to smack yourself on the forehead when you see how easy this can be in most cases. Now, let's get right to it and take a look at an example using Crew AI. I saw this tutorial by Code with Brandon, and it does a great job. In fact, all the videos on his channel are really well done, and

In [10]:
# Cleanup: delete the audio file
os.remove(audioFileName)

In [11]:
# Save the transcript
transcriptFileName = os.path.splitext(fileName)[0] + '.txt'
with open(transcriptFileName, "w", encoding="utf-8") as f:
    f.write(result["text"])

In [12]:
prompt_template = """Given the title and description of a video, check its transcript and correct it. Return only the corrected transcript.

Title: {title}
Description:
{description}

Transcript:
{transcript}
"""

In [13]:
# Read .env file
from dotenv import load_dotenv

load_dotenv()

True

In [14]:
from openai import AzureOpenAI


azure_openai_client = AzureOpenAI(
  api_key=os.environ['OPENAI_API_KEY'],
  azure_endpoint=os.environ['OPENAI_AZURE_ENDPOINT'],
  # azure_deployment=os.environ['OPENAI_AZURE_DEPLOYMENT'],
  api_version=os.environ['OPENAI_API_VERSION']
)

In [15]:
import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    """Return the number of tokens in `string` using the tokenizer of `model_name`."""
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))


def correct_chunk(chunk: str) -> str:
    """Ask the model to correct one transcript chunk, with the video title and description as context."""
    prompt = prompt_template.format(title=title, description=description, transcript=chunk)
    response = azure_openai_client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        top_p=1,
        messages=[
            {"role": "user", "content": prompt},
        ],
    )
    # To debug, write `chunk` and the corrected text to files here
    return response.choices[0].message.content


model_name = 'gpt-4o'
max_tokens_per_chunk = 500

# Split the text into chunks of at most 500 tokens with '.' as separator, without using langchain.
# Empty fragments are dropped so the final chunk does not end with a stray '. '
sentences = [s.strip() for s in result['text'].split('.') if s.strip()]
chunks = []
chunk = ''
for sentence in sentences:
    # Only flush a non-empty chunk, so an oversized first sentence never sends an empty transcript
    if chunk and num_tokens_from_string(chunk + sentence, model_name) > max_tokens_per_chunk:
        chunks.append(correct_chunk(chunk))
        chunk = ''
    chunk += sentence + '. '

# Correct the last chunk
if chunk:
    chunks.append(correct_chunk(chunk))
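
The sentence-packing logic of the cell above can be looked at separately from the API calls. The following is a minimal sketch, not the notebook's own code: `pack_sentences` and `word_count` are hypothetical names, and a whitespace word count stands in for tiktoken so the example runs offline.

```python
from typing import Callable, List

def pack_sentences(text: str, max_tokens: int,
                   count_tokens: Callable[[str], int]) -> List[str]:
    """Greedily pack '.'-delimited sentences into chunks of at most max_tokens.

    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    chunks: List[str] = []
    current = ''
    for sentence in sentences:
        candidate = (current + ' ' + sentence + '.').strip()
        if current and count_tokens(candidate) > max_tokens:
            # The candidate would overflow: close the current chunk and start a new one
            chunks.append(current)
            current = sentence + '.'
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def word_count(s: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(s.split())

demo = "One two three. Four five. Six seven eight nine. Ten."
for piece in pack_sentences(demo, max_tokens=5, count_tokens=word_count):
    print(repr(piece))
```

To count real tokens, a callable such as `lambda s: len(encoding.encode(s))` with a tiktoken encoding could be passed as `count_tokens`.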

In [16]:
# Create the full corrected transcript, with a blank line between chunks (none after the last chunk)
full_corrected_transcript = '\n\n'.join(chunks)

In [17]:
# Write the full corrected transcript
full_corrected_transcript_file_name = os.path.splitext(fileName)[0] + '_corrected.txt'
with open(full_corrected_transcript_file_name, "w", encoding="utf-8") as f:
    f.write(full_corrected_transcript)

In [18]:
# Create langchain document
from langchain_core.documents.base import Document

document = Document(
  page_content=full_corrected_transcript,
  metadata={
    "title": title,
    "source": url,
    "description": description,
    "thumbnail_url": thumbnail_url
  }
)

documents = [document]  # List of documents to be processed

In [19]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
  chunk_size=1000,
  chunk_overlap=200
)
splits = text_splitter.split_documents(documents)
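
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of a fixed-size sliding window. It is an illustration only: `sliding_window_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks, so its actual chunks will differ.

```python
from typing import List

def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

alphabet = "abcdefghijklmnopqrstuvwxyz"
print(sliding_window_chunks(alphabet, chunk_size=10, chunk_overlap=4))
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one, which is the role the 200-character overlap plays for the 1000-character chunks above: text near a boundary stays retrievable from either side.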



In [20]:
from langchain_openai import AzureOpenAIEmbeddings

azure_openai_embeddings = AzureOpenAIEmbeddings(
  api_key=os.environ['OPENAI_API_KEY'],
  azure_endpoint=os.environ['OPENAI_AZURE_ENDPOINT'],
  api_version=os.environ['OPENAI_API_VERSION'],
  azure_deployment=os.environ['OPENAI_AZURE_DEPLOYMENT_EMBEDDINGS']
)

In [21]:
from langchain_community.vectorstores import LanceDB

vectorstore = LanceDB.from_documents(
  documents=splits,
  embedding=azure_openai_embeddings
)

retriever = vectorstore.as_retriever()

In [22]:
import lancedb

db = lancedb.connect("/tmp/lancedb")
# table = db.create_table(
#     "my_table",
#     data=[
#         {
#             "vector": embeddings.embed_query("Hello World"),
#             "text": "Hello World",
#             "id": "1",
#         }
#     ],
#     mode="overwrite",
# )


In [23]:
query = "What is LoRA?"
vectorstore.similarity_search(query)

[Document(metadata={'description': 'Are you diving into the world of agent-based AI workflows and finding it more complicated than it needs to be? You’ve probably chosen the wrong approach, burdened with verbose, time-consuming frameworks. In this video, I’ll show you a far easier method to build your AI workflow, cutting through the unnecessary boilerplate and complexity.\n\nUsing CrewAI as an example, inspired by Code with Brandon’s excellent tutorials, we’ll explore how to streamline your processes. Watch as I automate a YouTube strategy in just seconds, a task that would typically take hours. Whether you’re a YouTuber or have daily repetitive tasks, this video will show you how to simplify and enhance your workflow with ease.\n\nJoin me as we debunk the myth that AI workflows are inherently complex. We’ll contrast CrewAI’s traditional method with a more efficient solution using Typescript and BunJS, reducing hundreds of lines of code to just a few dozen. Discover how you can achiev

In [24]:
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [25]:
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
  api_key=os.environ['OPENAI_API_KEY'],
  azure_endpoint=os.environ['OPENAI_AZURE_ENDPOINT'],
  api_version=os.environ['OPENAI_API_VERSION'],
  azure_deployment=os.environ['OPENAI_AZURE_DEPLOYMENT']
)

In [26]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [27]:
# response = rag_chain.invoke({"input": "What is the purpose of the video 'How to fine-tune a model using LoRA (step by step)'?"})
# response['answer']

In [28]:
# response = rag_chain.invoke({"input": "Create an abstract of the video 'How to fine-tune a model using LoRA (step by step)'."})
# response['answer']

In [29]:
response = rag_chain.invoke({"input": "Create an outline for a podcast based on the video " + title + "."})
print(response['answer'])

### Podcast Outline: Have You Picked the Wrong AI Agent Framework?

#### 1. Introduction
- Brief introduction of the podcast host(s)
- Overview of the topic: AI agent frameworks and the common pitfalls
- Mention the inspiration from the video "Have You Picked the Wrong AI Agent Framework?"

#### 2. The Complexity of Current AI Agent Frameworks
- Discussion on why current frameworks are considered verbose, complicated, and time-consuming
- Examples of repetitive boilerplate code
- Comparison to early Windows programming using the Charles Petzold book

#### 3. Simplifying the Process: An Easier Way
- Introduction to the simplified method showcased in the video
- Benefits of using simpler methods: less code, fewer bugs, and quicker results

#### 4. Case Study: Using CrewAI
- Detailed walkthrough of the CrewAI example from the video
  - Automating components of a YouTube strategy
  - Tasks like researching topics, generating titles, descriptions, and emails
- Discussion on the time savings

In [30]:
# Create a prompt with the outline to get the full podcast script
podcast_outline = response['answer']
# The f-string already interpolates the outline, so no further .format() call is needed
# (a second .format() would fail on any literal braces in the outline)
podcast_prompt = f"""Create the complete text of a podcast based on the following outline:

{podcast_outline}

This text will be used to generate the audio of the podcast. There are 2 participants in the podcast: the host and the guest. The host will introduce the podcast and the guest. The guest will explain the outline of the podcast. The host will ask questions to the guest and the guest will answer them. The host will thank the guest and close the podcast.
The name of the host is Pierre and his role is to be the listener's podcast assistant. The name of the guest is Marie and her role is to be the expert in the podcast topic. The name of the podcast is "Advanced AI Podcast".

When you thank someone, write "Thank you" and the name of the person without a comma. For example, "Thank you Pierre".

Output as JSON with the following fields:
- title: title of the podcast
- text: an array of objects, each with the fields speaker, intonation and text (the text to be spoken)
Return only the JSON as plain text.
"""

podcast_script_response = rag_chain.invoke({"input": podcast_prompt})
podcast_script_text = podcast_script_response['answer']
print(podcast_script_text)

{
  "title": "Have You Picked the Wrong AI Agent Framework?",
  "text": [
    {
      "speaker": "Pierre",
      "intonation": "enthusiastic",
      "text": "Welcome to the Advanced AI Podcast! I'm your host, Pierre, your go-to podcast assistant. Today, we have an exciting topic lined up: AI agent frameworks and the common pitfalls you might be facing. Inspired by the video 'Have You Picked the Wrong AI Agent Framework?' we're here to dive deep into this issue. Joining us is Marie, our expert on today's topic. Welcome, Marie!"
    },
    {
      "speaker": "Marie",
      "intonation": "warm",
      "text": "Thank you Pierre. It's great to be here. Today, we'll be discussing the complexity of current AI agent frameworks, how we can simplify the process, and we'll even look at a case study using CrewAI. We'll also compare code lengths and discuss practical applications and potential challenges in the future."
    },
    {
      "speaker": "Pierre",
      "intonation": "curious",
      "t

In [31]:
import azure.cognitiveservices.speech as speechsdk
import json
from xml.sax.saxutils import escape

# Create a speech config from the subscription key and service region.
speech_key = os.environ['AZURE_SPEECH_KEY']
service_region = os.environ['AZURE_SPEECH_REGION']

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# This is an example of SSML (Speech Synthesis Markup Language) format.
# Note that the SSML namespace URI uses http, not https.
# <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
#   <voice name="en-US-AvaMultilingualNeural">
#     When you're on the freeway, it's a good idea to use a GPS.
#   </voice>
# </speak>
# Parse the JSON response and build an SSML document that uses en-US-GuyNeural for Pierre
# and en-US-JennyNeural for Marie
voices = {'Pierre': 'en-US-GuyNeural', 'Marie': 'en-US-JennyNeural'}
podcast_script_json = json.loads(str(podcast_script_text))
ssml_text = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>"
for line in podcast_script_json['text']:
    voice = voices.get(line['speaker'])
    if voice is not None:
        # Escape &, < and > so the spoken text cannot break the XML
        ssml_text += f"<voice name='{voice}'>{escape(line['text'])}</voice>"
ssml_text += "</speak>"

# Use the default speaker as audio output. On a machine without a sound card this prints ALSA warnings.
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Named speech_result so that it does not overwrite the Whisper `result` from the transcription cell
speech_result = speech_synthesizer.speak_ssml_async(ssml_text).get()
stream = speechsdk.AudioDataStream(speech_result)
podcast_filename = os.path.splitext(fileName)[0] + '_podcast.wav'
stream.save_to_wav_file(podcast_filename)



ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1334:(snd_func_refer) error evaluating name
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM default
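
The speech cell assumes that the model returns bare JSON and that the spoken text is safe to embed in XML. As a minimal sketch under those stated concerns, the two steps can be hardened with the standard library only. `parse_script` and `build_ssml` are hypothetical helper names, and the `reply` string is made-up sample data, not output from the notebook.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes wrap JSON in
SSML_NS = "http://www.w3.org/2001/10/synthesis"

def parse_script(raw: str) -> dict:
    """Parse the model reply into a dict, tolerating an optional Markdown fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with its optional language tag)
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rstrip()
        if text.endswith(FENCE):
            text = text[:-len(FENCE)]
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(data.get("text"), list):
        raise ValueError("expected a JSON object with a 'text' array")
    return data

def build_ssml(lines, voices, lang="en-US") -> str:
    """Build one SSML document with a <voice> element per script line."""
    parts = [f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">']
    for line in lines:
        voice = voices.get(line["speaker"])
        if voice is None:
            raise KeyError(f"no voice configured for speaker {line['speaker']!r}")
        # escape() protects element text, quoteattr() quotes and escapes the attribute value
        parts.append(f"<voice name={quoteattr(voice)}>{escape(line['text'])}</voice>")
    parts.append("</speak>")
    return "".join(parts)

# Hypothetical model reply, wrapped in a fence and containing XML-special characters
reply = (
    FENCE + "json\n"
    '{"title": "Demo", "text": ['
    '{"speaker": "Pierre", "intonation": "curious", "text": "Q&A time?"}, '
    '{"speaker": "Marie", "intonation": "warm", "text": "Yes: 2 < 3."}]}'
    "\n" + FENCE
)

script = parse_script(reply)
ssml = build_ssml(script["text"], {"Pierre": "en-US-GuyNeural", "Marie": "en-US-JennyNeural"})
root = ET.fromstring(ssml)  # raises ParseError if the SSML is not well-formed XML
print([voice.get("name") for voice in root])
```

Unlike the loop in the speech cell, this sketch raises on an unknown speaker instead of silently skipping the line; which behavior is preferable depends on how much the model output is trusted.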
