# Video to Podcast

## Prerequisites

1. Create and activate a virtual environment:

    ```bash
    python -m venv venv
    source venv/bin/activate
    ```

2. Install ffmpeg, which Whisper uses to decode audio:

    ```bash
    sudo apt-get install ffmpeg
    ```

3. Install the required packages:

In [58]:
%pip install -qU pytubefix "moviepy<2" openai-whisper langchain langchain-text-splitters langchain-core langchain-community lancedb langchain-openai tiktoken openai python-dotenv azure-cognitiveservices-speech

Note: you may need to restart the kernel to use updated packages.


In [59]:
# Read .env file

from dotenv import load_dotenv

load_dotenv()

True
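The `.env` file is expected to define the variables that later cells read through `os.environ`. A minimal sanity check, assuming the list below (collected from those cells) is complete; `missing_env_vars` is a hypothetical helper, not part of python-dotenv:

```python
import os

# Environment variables read by the cells of this notebook
REQUIRED_ENV_VARS = [
    'OPENAI_API_KEY',
    'OPENAI_AZURE_ENDPOINT',
    'OPENAI_API_VERSION',
    'OPENAI_AZURE_DEPLOYMENT',
    'OPENAI_AZURE_DEPLOYMENT_EMBEDDINGS',
    'AZURE_SPEECH_KEY',
    'AZURE_SPEECH_REGION',
]


def missing_env_vars(names, env=os.environ):
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print('Missing environment variables: ' + ', '.join(missing))
```

Running this right after `load_dotenv()` surfaces a missing key before any paid API call is made.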

## Download the video

In [60]:
from pytubefix import YouTube
from moviepy.editor import VideoFileClip
import os

yt = YouTube('https://www.youtube.com/watch?v=8MMoBiIj9hI')

title = yt.title
url = yt.watch_url
description = yt.description
thumbnail_url = yt.thumbnail_url

print(f'Title: {title}')


Title: Azure AI Studio vs Copilot Studio


In [61]:
# Download the video with the lowest resolution, as only the audio is needed

fileName = yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').asc().first().download()


In [62]:
# Create the information file: write the video metadata as JSON
# (json.dump escapes quotes, backslashes and newlines in the description)

import json

info_file_name = os.path.splitext(fileName)[0] + '_info.json'
with open(info_file_name, 'w') as f:
    json.dump({"title": title, "url": url, "description": description, "thumbnail_url": thumbnail_url}, f)


## Extract the audio

In [63]:
# Create the file name for the audio file

audioFileName = os.path.splitext(fileName)[0] + '.wav'
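Derived file names should strip only the final extension. Splitting on the first `.` truncates any path whose directory or video title contains a dot, while `os.path.splitext` removes just the last extension. A small comparison on a hypothetical path:

```python
import os


def strip_extension_naive(path: str) -> str:
    # Keeps everything before the FIRST dot, wherever it appears in the path
    return path.split('.')[0]


def strip_extension(path: str) -> str:
    # Removes only the final extension of the last path component
    return os.path.splitext(path)[0]


path = '/home/user/videos/Release v1.2 walkthrough.mp4'  # hypothetical file name
print(strip_extension_naive(path))  # /home/user/videos/Release v1
print(strip_extension(path))        # /home/user/videos/Release v1.2 walkthrough
```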

In [64]:
# Create the audio

video = VideoFileClip(fileName)
audio = video.audio
audio.write_audiofile(audioFileName)

MoviePy - Writing audio in /home/pmalarme/workspace/drop-all/Azure AI Studio vs Copilot Studio.wav

MoviePy - Done.
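A quick check of the extracted audio before transcribing it can catch an empty or truncated file. A minimal sketch using the standard `wave` module, assuming MoviePy wrote an uncompressed PCM WAV (the only kind `wave` can read); `wav_info` is a hypothetical helper:

```python
import wave


def wav_info(path: str) -> dict:
    """Read channel count, sample rate and duration of a PCM WAV file."""
    with wave.open(path, 'rb') as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            'channels': wav.getnchannels(),
            'sample_rate': rate,
            'duration_seconds': frames / rate,
        }
```

For example, `wav_info(audioFileName)` should report a duration close to the length of the video.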




In [65]:
# Cleanup: delete the video file

video.close()
os.remove(fileName)

## Create the transcript

In [66]:
# Create the transcript using Whisper

import whisper

model = whisper.load_model('base')
result = model.transcribe(audioFileName)

print(result['text'])

 If you're wondering about the difference between Azure AI Studio and Copilot Studio, or in fact, Azure Open AI Studio and the bot framework and all of these pieces that allow us to create Copilot, I'm going to take you through an explanation in a demo here to help you get this sorted out. So Azure AI Studio and Copilot Studio are actually two quite different things, and it's not necessarily one or the other they do work together. So to start with, what I'm going to do is give you a bit of a demo walkthrough of Azure AI Studio, and then we're going to spend most of the time having a look at the Copilot Studio experience so that you can get that real sense and understanding of what these tools do and how you can choose which one to use and when. So help you understand this, I'm going to frame this up using three C words, cost complexity, and another phrase called conversational orchestration. Now I reckon as we go through this video, I'm going to come up with more C words, don't worry, 
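Besides `text`, the dictionary returned by `model.transcribe` contains a `segments` list whose entries carry `start` and `end` times (in seconds) and the segment `text`. A sketch that turns them into a timestamped transcript, which can help when comparing the corrected transcript with the audio; both helper names are hypothetical:

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f'{hours:02d}:{minutes:02d}:{secs:02d}'


def timestamped_lines(segments) -> list[str]:
    """Build '[start - end] text' lines from Whisper-style segment dicts."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]
```

In the notebook, `print('\n'.join(timestamped_lines(result['segments'][:5])))` would show the first five segments.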

In [67]:
# Cleanup: delete the audio file

os.remove(audioFileName)

In [68]:
# Save the transcript

transcriptFileName = os.path.splitext(fileName)[0] + '.txt'
with open(transcriptFileName, "w") as f:
    f.write(result["text"])

In [69]:
# Create the prompt used to correct each transcript chunk with the help of the video title and description (technologies, people's names, etc.)

prompt_template = """Given the title and description of a video, check its transcript and correct any transcription errors (for example, misspelled technology or people's names). Return only the corrected transcript.

Title: {title}
Description:
{description}

Transcript:
{transcript}
"""

In [70]:
# Create the gpt-4o model client

from openai import AzureOpenAI


azure_openai_client = AzureOpenAI(
  api_key=os.environ['OPENAI_API_KEY'],
  azure_endpoint=os.environ['OPENAI_AZURE_ENDPOINT'],
  api_version=os.environ['OPENAI_API_VERSION']
)

In [71]:
# Split the transcript into chunks of at most 500 tokens made of whole sentences
# and ask the model to correct each chunk

import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))


def correct_chunk(chunk: str) -> str:
    prompt = prompt_template.format(title=title, description=description, transcript=chunk)
    response = azure_openai_client.chat.completions.create(
        model="gpt-4o",  # on Azure OpenAI, this is the deployment name
        temperature=0,
        top_p=1,
        messages=[
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content


model_name = 'gpt-4o'
max_tokens_per_chunk = 500

# Split on '.' without using langchain (abbreviations and decimals are split too)
sentences = [s.strip() for s in result['text'].split('.') if s.strip()]
chunks = []
chunk = ''
for sentence in sentences:
    candidate = chunk + sentence + '. '
    if chunk and num_tokens_from_string(candidate, model_name) > max_tokens_per_chunk:
        # The current chunk is full: correct it and start a new one with this sentence
        chunks.append(correct_chunk(chunk))
        chunk = sentence + '. '
    else:
        chunk = candidate

# Correct the last chunk
if chunk:
    chunks.append(correct_chunk(chunk))
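The loop above packs whole sentences greedily until the next one would exceed the token budget. The same idea as a standalone function, with the token counter passed in so it can be tested without tiktoken. The regex splitter is an assumption: unlike `split('.')`, it keeps `.`, `!` and `?` attached to their sentence. `pack_sentences` and `word_count` are hypothetical names:

```python
import re
from typing import Callable


def pack_sentences(text: str, max_tokens: int, count_tokens: Callable[[str], int]) -> list[str]:
    """Greedily group whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens becomes its own (oversized) chunk.
    """
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    chunks: list[str] = []
    current = ''
    for sentence in sentences:
        candidate = f'{current} {sentence}' if current else sentence
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # the current chunk is full
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def word_count(s: str) -> int:
    # Crude stand-in for a tokenizer; in the notebook, count with tiktoken instead
    return len(s.split())
```

With real token counts, `pack_sentences(result['text'], 500, lambda s: num_tokens_from_string(s, 'gpt-4o'))` would produce the uncorrected chunks.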

In [72]:
# Join the corrected chunks into the full transcript, separated by blank lines

full_corrected_transcript = '\n\n'.join(chunks)

In [73]:
# Write the full corrected transcript

full_corrected_transcript_file_name = os.path.splitext(fileName)[0] + '_corrected.txt'
with open(full_corrected_transcript_file_name, "w") as f:
    f.write(full_corrected_transcript)

## Create the embeddings

In [74]:
# Create langchain document

from langchain_core.documents.base import Document

document = Document(
  page_content=full_corrected_transcript,
  metadata={
    "title": title,
    "source": url,
    "description": description,
    "thumbnail_url": thumbnail_url
  }
)

documents = [document]  # List of documents to be processed

In [75]:
# Split the document into chunks of at most 1000 characters with a 200-character overlap using langchain

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
  chunk_size=1000,
  chunk_overlap=200
)
splits = text_splitter.split_documents(documents)
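Here `chunk_size` and `chunk_overlap` are measured in characters. `RecursiveCharacterTextSplitter` first tries to break on paragraph, line and word boundaries, so its chunks are usually somewhat shorter than 1000 characters. The simplified sketch below ignores separators entirely (it is not LangChain's algorithm) and only shows how an overlap makes consecutive windows share text; `sliding_windows` is a hypothetical helper:

```python
def sliding_windows(text: str, size: int, overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= size:
        raise ValueError('overlap must be smaller than size')
    if not text:
        return []
    step = size - overlap  # each window starts `step` characters after the previous one
    windows = []
    start = 0
    while True:
        windows.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += step
    return windows


print(sliding_windows('abcdefghij', 4, 2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.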



In [76]:
# Define the embeddings model

from langchain_openai import AzureOpenAIEmbeddings

azure_openai_embeddings = AzureOpenAIEmbeddings(
  api_key=os.environ['OPENAI_API_KEY'],
  azure_endpoint=os.environ['OPENAI_AZURE_ENDPOINT'],
  api_version=os.environ['OPENAI_API_VERSION'],
  azure_deployment=os.environ['OPENAI_AZURE_DEPLOYMENT_EMBEDDINGS']
)

In [77]:
# Create the vector store

import lancedb
from langchain_community.vectorstores import LanceDB

db = lancedb.connect("/tmp/lancedb")

vectorstore = LanceDB.from_documents(
  documents=splits,
  embedding=azure_openai_embeddings,
  connection=db
)

retriever = vectorstore.as_retriever()
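The retriever embeds the query with the same embeddings model and returns the stored chunks whose vectors are closest to it. A minimal sketch of that nearest-neighbour step over toy vectors, using cosine similarity for illustration (the metric the LanceDB store actually uses may differ); `cosine_similarity` and `top_k` are hypothetical helpers:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(vectors, key=lambda doc_id: cosine_similarity(query, vectors[doc_id]), reverse=True)
    return ranked[:k]
```

Only these top chunks reach the LLM, so answers are based on a small part of the transcript rather than all of it.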

## Create the LangChain RAG chain

In [78]:
# Create the prompt for the chain with embeddings and LLM

from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [79]:
# Define the LLM model

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
  api_key=os.environ['OPENAI_API_KEY'],
  azure_endpoint=os.environ['OPENAI_AZURE_ENDPOINT'],
  api_version=os.environ['OPENAI_API_VERSION'],
  azure_deployment=os.environ['OPENAI_AZURE_DEPLOYMENT']
)

In [80]:
# Define the RAG chain

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
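`create_retrieval_chain` sends the `input` to the retriever, and `create_stuff_documents_chain` "stuffs" every retrieved chunk into the `{context}` slot of the system prompt; the answer is returned under the `answer` key, which the next cells read. A plain-Python sketch of the stuffing step, assuming chunks are joined with a blank line (`stuff_context` is hypothetical, not LangChain's implementation):

```python
def stuff_context(system_prompt: str, page_contents: list[str], separator: str = '\n\n') -> str:
    """Insert all retrieved chunks into the {context} slot of a system prompt."""
    return system_prompt.replace('{context}', separator.join(page_contents))


# Every retrieved chunk ends up in a single system message
example = stuff_context('Use the following context.\n\n{context}', ['First chunk.', 'Second chunk.'])
```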

## Create the outline of the podcast

In [81]:
podcast_outline_response = rag_chain.invoke({"input": "Create an outline for a podcast based on the video " + title + "."})
podcast_outline = podcast_outline_response['answer']
print(podcast_outline)

### Podcast Outline: Azure AI Studio vs Copilot Studio

#### 1. **Introduction**
   - Welcome and brief introduction of hosts
   - Overview of the episode's topic: Azure AI Studio vs Copilot Studio
   - Purpose and what listeners can expect to learn

#### 2. **Understanding Azure AI Studio**
   - Definition and primary purpose
   - Key features and functionalities
   - Use cases and scenarios where Azure AI Studio excels
   - Importance of customization and control
   - Brief demo walkthrough highlights

#### 3. **Exploring Copilot Studio**
   - Definition and primary purpose
   - Key features and functionalities
   - Use cases and scenarios where Copilot Studio excels
   - Difference from Azure AI Studio: SaaS product, ease of use, and out-of-the-box solutions
   - Brief demo walkthrough highlights

#### 4. **Comparative Analysis**
   - Direct comparison of Azure AI Studio and Copilot Studio
   - Discussing the three C's: Cost, Complexity, and Conversational Orchestration
   - Additio

## Create the podcast script

In [82]:
# Create a prompt with the outline to get a full podcast text

podcast_prompt = f"""Create the complete text of a podcast based on the following outline:

{podcast_outline}

This text will be used to generate the audio of the podcast. There are 2 participants in the podcast: the host and the guest. The host will introduce the podcast and the guest. The guest will explain the outline of the podcast. The host will ask questions to the guest and the guest will answer them. The host will thank the guest and close the podcast.
The name of the host is Pierre and his role is to be the listener's podcast assistant. The name of the guest is Marie and her role is to be the expert in the podcast topic. The name of the podcast is "Advanced AI Podcast".

When you thank someone, write "Thank you" followed by the person's name, without a comma. For example, "Thank you Pierre".

Output as a JSON with the following fields:
- title: Title of the podcast
- text: an array of objects, each with the fields "speaker" (Pierre or Marie), "intonation" and "text" (the text to be spoken)
Return only the JSON as plain text, without Markdown code fences.
"""

# The f-string above already inserts the outline; calling .format() again would fail if the outline contained braces
formatted_podcast_prompt = podcast_prompt

In [83]:
# Generate the podcast script

podcast_script_response = rag_chain.invoke({"input": formatted_podcast_prompt})
podcast_script_text = podcast_script_response['answer']
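The audio cell calls `json.loads` on this raw answer, which fails if the model wraps the JSON in a Markdown code fence or leaves out the requested fields. A defensive parser, sketched under the assumption that the script follows the `title`/`text`/`speaker` structure requested in the prompt; `parse_podcast_script` is a hypothetical helper:

```python
import json

FENCE = '`' * 3  # the Markdown code fence marker


def parse_podcast_script(raw: str) -> dict:
    """Parse the model's JSON answer, tolerating a surrounding Markdown code fence."""
    cleaned = raw.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag such as json)
        cleaned = cleaned.split('\n', 1)[1] if '\n' in cleaned else ''
        # Drop the closing fence, if present
        cleaned = cleaned.rstrip()
        if cleaned.endswith(FENCE):
            cleaned = cleaned[:-len(FENCE)]
    script = json.loads(cleaned)
    lines = script.get('text') if isinstance(script, dict) else None
    if not isinstance(lines, list):
        raise ValueError("expected a JSON object with a 'text' array")
    for i, line in enumerate(lines):
        if not isinstance(line, dict) or 'speaker' not in line or 'text' not in line:
            raise ValueError(f"script line {i} needs 'speaker' and 'text' fields")
    return script
```

Used as `podcast_script_json = parse_podcast_script(podcast_script_text)`, a malformed answer fails here with a clear message instead of partway through speech synthesis.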

In [84]:
# Save the podcast script

podcast_script_file_name = os.path.splitext(fileName)[0] + '_podcast_script.json'

with open(podcast_script_file_name, "w") as f:
    f.write(podcast_script_text)

## Generate the podcast audio

In [85]:
import azure.cognitiveservices.speech as speechsdk
import json
from xml.sax.saxutils import escape

# Create a speech config with the subscription key and service region
speech_key = os.environ['AZURE_SPEECH_KEY']
service_region = os.environ['AZURE_SPEECH_REGION']

speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Example of SSML (Speech Synthesis Markup Language):
# <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
#   <voice name="en-US-AvaMultilingualNeural">
#     When you're on the freeway, it's a good idea to use a GPS.
#   </voice>
# </speak>
# Parse the JSON script and build an SSML document that uses en-US-GuyNeural for Pierre
# and en-US-JennyNeural for Marie
voices = {'Pierre': 'en-US-GuyNeural', 'Marie': 'en-US-JennyNeural'}

podcast_script_json = json.loads(podcast_script_text)
ssml_text = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>"
for line in podcast_script_json['text']:
    voice = voices.get(line['speaker'])
    if voice is None:
        print(f"Skipping line from unknown speaker: {line['speaker']}")
        continue
    # Escape &, < and > so that the spoken text cannot break the SSML markup
    ssml_text += f"<voice name='{voice}'>{escape(line['text'])}</voice>"
ssml_text += "</speak>"

# audio_config=None keeps the synthesized audio in memory instead of playing it on the
# default speaker (which fails on machines without a sound card)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

speech_result = speech_synthesizer.speak_ssml_async(ssml_text).get()
if speech_result.reason == speechsdk.ResultReason.Canceled:
    details = speech_result.cancellation_details
    raise RuntimeError(f"Speech synthesis canceled: {details.reason} {details.error_details}")

stream = speechsdk.AudioDataStream(speech_result)
podcast_filename = os.path.splitext(fileName)[0] + '_podcast.wav'
stream.save_to_wav_file(podcast_filename)
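The speech service rejects SSML that is not well-formed XML, for example when the script contains a raw `&`. Parsing the document locally with the standard library before calling `speak_ssml_async` surfaces such problems earlier and shows which voices will be used. A sketch that matches `voice` elements regardless of namespace; `ssml_voices` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def ssml_voices(ssml: str) -> list[str]:
    """Parse SSML locally and return the name of each <voice> element, in order.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring(ssml)
    # Tags look like '{namespace}voice', so compare only the local part
    return [el.get('name') for el in root.iter() if el.tag.rsplit('}', 1)[-1] == 'voice']
```

For instance, `print(ssml_voices(ssml_text))` lists the voice used for every line of the script.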



