## Using Generative AI to automatically create a video lecture from an article

This notebook uses Gemini to extract the key points from an article and write a lecture script.

It then uses Google Cloud Text-to-Speech to create the audio track for each slide,

and the Python library moviepy to combine the slides and audio into a movie.

For details, see:
https://medium.com/@lakshmanok/using-generative-ai-to-automatically-create-a-video-lecture-from-an-article-6381c44c5fe0

In [1]:
#%pip install --quiet --upgrade -r requirements.txt

In [1]:
import os
import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv("../genai_agents/keys.env")
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

In [2]:
## params
params1 = {
    "article_url": "https://lakshmanok.medium.com/what-goes-into-bronze-silver-and-gold-layers-of-a-medallion-data-architecture-4b6fdfb405fc",
    "num_slides": 10,
}
params2 = {
    "article_url": "https://lakshmanok.medium.com/6381c44c5fe0",
    "num_slides": 15,
}

## Get text of article

In [3]:
params = params2
print(params)

{'article_url': 'https://lakshmanok.medium.com/6381c44c5fe0', 'num_slides': 15}


In [7]:
import pdfkit
pdfkit.from_url(params['article_url'], "article.pdf")

True
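
`pdfkit` is a thin wrapper around the `wkhtmltopdf` command-line tool, so the call above fails if that binary is not installed. A minimal pre-flight check, assuming the binary is expected on `PATH` (the helper name `find_wkhtmltopdf` is made up for this sketch):

```python
import shutil

def find_wkhtmltopdf(binary="wkhtmltopdf"):
    """Return the full path of the wkhtmltopdf binary, or None if it is not on PATH."""
    return shutil.which(binary)

path = find_wkhtmltopdf()
if path is None:
    print("wkhtmltopdf not found; install it before calling pdfkit.from_url")
else:
    print(f"wkhtmltopdf found at {path}")
```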

In [4]:
pdf_file = genai.upload_file("article.pdf")

## Convert to lecture notes (JSON)

In [9]:
lecture_prompt = f"""
You are a university professor who needs to create a lecture for
a class of undergraduate students.

* Create a {params['num_slides']}-slide lecture based on the following article.
* Each slide should contain the following information:
  - title: a single sentence that summarizes the main point
  - key_points: a list of between 2 and 5 bullet points. Use phrases or code snippets, not full sentences.
  - lecture_notes: 3-10 sentences explaining the key points in easy-to-understand language. Expand on the points using other information from the article. If the bullet point is code, explain what the code does.
* Also, create a title for the lecture and attribute the original article's author.
"""

In [10]:
from pydantic import BaseModel
from typing import List
class Slide(BaseModel):
    title: str
    key_points: List[str]
    lecture_notes: str

class Lecture(BaseModel):
    slides: List[Slide]
    lecture_title: str
    based_on_article_by: str

In [21]:
model = genai.GenerativeModel(
    "gemini-1.5-flash-001",
    system_instruction=[lecture_prompt]
)
generation_config={
    "temperature": 0.7,
    "max_output_tokens": params['num_slides']*10000,
    "response_mime_type": "application/json",
    "response_schema": Lecture
}
responses = model.generate_content(
    [pdf_file],
    generation_config=generation_config,
    stream=False
)

In [23]:
print(responses.text)

{"based_on_article_by": "Lak Lakshmanan", "lecture_title": "Automating Video Lectures with Generative AI: A Guide for Professors", "slides": [{"key_points": ["Use Google Gemini Flash LLM", "Supports multimodal input (text and images)", "Offers controlled generation for desired output structure"], "lecture_notes": "We'll use Google Gemini Flash for this project. It's a powerful language model with the advantage of being relatively inexpensive. Additionally, Gemini Flash can process both text and images, making it ideal for our needs. It also supports controlled generation, allowing us to ensure the output matches a specific format.", "title": "Choosing the Right Language Model"}, {"key_points": ["Download article as PDF", "Upload to temporary storage", "Gemini reads PDF for analysis"], "lecture_notes": "First, we need to get the article in a format that Gemini can understand. We'll use Python to download the article as a PDF and upload it to a temporary storage location. This will make 

In [24]:
print(len(responses.text))

7181


In [25]:
print(responses.text[-100:]) # is it complete?

ies for creating personalized and interactive experiences.", "title": "End-User Customizability"}]} 


In [26]:
import json
lecture = json.loads(responses.text)

In [27]:
len(lecture['slides'])

16
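
Note that the model returned 16 slides although the prompt asked for 15: the response schema constrains the shape of the output, not the slide count. A minimal sketch of a sanity check on the raw response, using only the standard library; the function name `check_lecture` and the `tolerance` parameter are assumptions of this example, not part of the Gemini API:

```python
import json

REQUIRED_SLIDE_KEYS = {"title", "key_points", "lecture_notes"}

def check_lecture(response_text, expected_slides, tolerance=2):
    """Parse the JSON response and return a list of problems (empty if it looks fine)."""
    try:
        lecture = json.loads(response_text)
    except json.JSONDecodeError as err:
        # A response that was cut off mid-way fails here.
        return [f"response is not valid JSON: {err}"]
    if not isinstance(lecture, dict):
        return ["top-level JSON value is not an object"]
    problems = []
    for key in ("lecture_title", "based_on_article_by", "slides"):
        if key not in lecture:
            problems.append(f"missing top-level field: {key}")
    slides = lecture.get("slides", [])
    for i, slide in enumerate(slides):
        missing = REQUIRED_SLIDE_KEYS - set(slide)
        if missing:
            problems.append(f"slide {i} is missing {sorted(missing)}")
    if abs(len(slides) - expected_slides) > tolerance:
        problems.append(f"expected about {expected_slides} slides, got {len(slides)}")
    return problems
```

In this notebook it could be called as `check_lecture(responses.text, params['num_slides'])`; an empty list means the response parsed and has roughly the requested number of slides.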

In [28]:
lecture['lecture_title']

'Automating Video Lectures with Generative AI: A Guide for Professors'

In [29]:
lecture['slides'][2]

{'key_points': ['Define data structure for lecture notes',
  'Use Python classes for organization',
  'Structure includes title, key points, and lecture notes'],
 'lecture_notes': "Let's define the structure for our lecture notes. We'll use Python classes to organize the information. Each slide will have a title, a list of key points, and detailed lecture notes.",
 'title': 'Defining the Data Structure'}

In [30]:
## Write this out
with open("lecture.json", "w") as ofp:
    json.dump(lecture, ofp)

## Convert lecture.json to PowerPoint

In [31]:
import json
with open("lecture.json", "r") as ifp:
    lecture = json.load(ifp)

In [32]:
from pptx import Presentation
presentation = Presentation()

In [33]:
# Title slide for presentation
# see https://python-pptx.readthedocs.io/en/latest/user/quickstart.html
slide = presentation.slides.add_slide(presentation.slide_layouts[0])
title = slide.shapes.title
title.text = lecture['lecture_title']
subtitle = slide.placeholders[1] # subtitle
subtitle.text = f"Based on article by {lecture['based_on_article_by']}"

In [34]:
# each slide
for slidejson in lecture['slides']:
    slide = presentation.slides.add_slide(presentation.slide_layouts[1])
    title = slide.shapes.title
    title.text = slidejson['title']
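    # bullets: the placeholder's text frame starts with one empty paragraph,
    # so reuse it for the first key point to avoid a blank leading bullet
    textframe = slide.placeholders[1].text_frame
    for ptno, key_point in enumerate(slidejson['key_points']):
        p = textframe.paragraphs[0] if ptno == 0 else textframe.add_paragraph()
        p.text = key_point
        p.level = 1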
    # notes
    notes_frame = slide.notes_slide.notes_text_frame
    notes_frame.text = slidejson['lecture_notes']

In [35]:
presentation.save('lecture.pptx')

<img src="powerpoint_screenshot.jpg"/>

## Have AI read the notes aloud, and save the audio

See: https://cloud.google.com/text-to-speech/docs/samples/tts-synthesize-text?hl=en

In [36]:
from google.cloud import texttospeech

def convert_text_audio(text, audio_mp3file):
    """Synthesizes speech from the input string of text."""
    tts_client = texttospeech.TextToSpeechClient()    
    input_text = texttospeech.SynthesisInput(text=text)
    
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Standard-C",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = tts_client.synthesize_speech(
        request={"input": input_text, "voice": voice, "audio_config": audio_config}
    )

    # The response's audio_content is binary.
    with open(audio_mp3file, "wb") as out:
        out.write(response.audio_content)
        print(f"{audio_mp3file} written.")
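
The Text-to-Speech API limits how much text a single `synthesize_speech` request may contain (5,000 bytes according to the quota documentation at the time of writing; check the current limits), so unusually long lecture notes may need to be split. A minimal sketch of a sentence-based splitter; `split_for_tts` and the 4,500-byte default are assumptions of this example, not part of the client library:

```python
import re

def split_for_tts(text, max_bytes=4500):
    """Split text into chunks of whole sentences, each at most max_bytes of UTF-8.

    A single sentence longer than max_bytes is kept as its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            # Adding this sentence would exceed the limit: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `convert_text_audio` separately and the resulting MP3 files joined in order.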

In [None]:
%%bash
rm -rf article_audio
mkdir article_audio

In [37]:
import json
import os

with open("lecture.json", "r") as ifp:
    lecture = json.load(ifp)

def create_audio_files(lecture, outdir):
    # create output directory
    os.makedirs(outdir, exist_ok=True)
    filenames = []
    
    # title slide
    filename = os.path.join(outdir, "audio_00.mp3")
    text = f"Today, we are going to talk about {lecture['lecture_title']}.\n"
    text += f"This lecture is based on an article by {lecture['based_on_article_by']}. I'm going to assign that article as supplemental reading.\n"
    convert_text_audio(text, filename)
    filenames.append(filename)
    
    for slideno, slide in enumerate(lecture['slides']):
        text = f"On to {slide['title']} \n"
        text += slide['lecture_notes'] + "\n\n"
        filename = os.path.join(outdir, f"audio_{slideno+1:02}.mp3")
        convert_text_audio(text, filename)
        filenames.append(filename)
        
    return filenames

audio_files = create_audio_files(lecture, "article_audio")

article_audio/audio_00.mp3 written.
article_audio/audio_01.mp3 written.
article_audio/audio_02.mp3 written.
article_audio/audio_03.mp3 written.
article_audio/audio_04.mp3 written.
article_audio/audio_05.mp3 written.
article_audio/audio_06.mp3 written.
article_audio/audio_07.mp3 written.
article_audio/audio_08.mp3 written.
article_audio/audio_09.mp3 written.
article_audio/audio_10.mp3 written.
article_audio/audio_11.mp3 written.
article_audio/audio_12.mp3 written.
article_audio/audio_13.mp3 written.
article_audio/audio_14.mp3 written.
article_audio/audio_15.mp3 written.
article_audio/audio_16.mp3 written.


In [40]:
import pydub

In [41]:
combined = pydub.AudioSegment.empty()
# 4-second pause appended after each clip
silence = pydub.AudioSegment.silent(duration=4000)
for audio_file in audio_files:
    combined += pydub.AudioSegment.from_file(audio_file)
    combined += silence
combined.export("lecture.wav", format="wav")

<_io.BufferedRandom name='lecture.wav'>
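
Because every clip is followed by a four-second pause, the position at which each slide's narration starts in `lecture.wav` follows directly from the clip lengths. A small sketch of that arithmetic, assuming the durations are given in milliseconds (as returned by `len()` on a pydub `AudioSegment`); the helper `clip_start_times` is hypothetical:

```python
def clip_start_times(durations_ms, pause_ms=4000):
    """Return the start offset (in ms) of each clip when a pause follows every clip."""
    starts, position = [], 0
    for duration in durations_ms:
        starts.append(position)
        position += duration + pause_ms
    return starts

# Example with three made-up clip lengths of 10 s, 20 s and 5 s.
print(clip_start_times([10_000, 20_000, 5_000]))  # [0, 14000, 38000]
```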

## Create a movie of the slides + audio clips

Unfortunately, python-pptx has no way to export slides as JPG images, so let's create the images from scratch.
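
One alternative, not used in this notebook, is to let LibreOffice render the deck: its headless mode can convert `lecture.pptx` to PDF, which can then be rasterized into one image per page. A rough sketch, assuming LibreOffice is installed and its `soffice` executable is on `PATH`; both helper names are illustrative:

```python
import subprocess

def pptx_to_pdf_command(pptx_path, outdir="."):
    """Build the LibreOffice command line that converts a .pptx file to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", outdir, pptx_path]

def convert_pptx_to_pdf(pptx_path, outdir="."):
    """Run the conversion; raises CalledProcessError if LibreOffice reports failure."""
    subprocess.run(pptx_to_pdf_command(pptx_path, outdir), check=True)

# Show the command without running it.
print(" ".join(pptx_to_pdf_command("lecture.pptx", "article_slides")))
```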

In [43]:
%%bash
rm -rf article_slides
mkdir article_slides

In [6]:
import json
with open("lecture.json", "r") as ifp:
    lecture = json.load(ifp)

Can we use Imagen (image generation on Vertex AI) to generate images based on the content?

In [19]:
from vertexai.vision_models import ImageGenerationModel
image_creation_prompt = f"""
You are an illustrator who needs to create illustrations for a technical article.
Generate a visually captivating image that represents the following idea. 

Idea:
{lecture['slides'][2]['lecture_notes']}
"""

model = ImageGenerationModel.from_pretrained("imagegeneration@005")
images = model.generate_images(image_creation_prompt)
images[0].save(location="img0.jpg")

In [20]:
from IPython.display import Image
Image("./img0.jpg")

<IPython.core.display.Image object>

Guess not. Let's fall back to just displaying the text.

In [48]:
import textwrap

from PIL import Image, ImageDraw, ImageFont

def wrap(text, width):
    """Wrap text to at most `width` characters per line."""
    return '\n'.join(textwrap.wrap(text, width=width))

def text_to_image(output_path, title, keypoints):
    image = Image.new("RGB", (1000, 750), "black")
    draw = ImageDraw.Draw(image)
    title_font = ImageFont.truetype("Coval-Black.ttf", size=42)
    draw.multiline_text((10, 25), wrap(title, 50), font=title_font)
    text_font = ImageFont.truetype("Coval-Light.ttf", size=36)
    for ptno, keypoint in enumerate(keypoints):
        draw.multiline_text((10, (ptno+2)*100), wrap(keypoint, 60), font=text_font) 
    image.save(output_path)

text_to_image("article_slides/slide_00.jpg", 
              lecture['lecture_title'], 
              [f"Based on article by {lecture['based_on_article_by']}"]
             )
# each slide
for slideno, slidejson in enumerate(lecture['slides']):
    text_to_image(f"article_slides/slide_{slideno+1:02}.jpg",
                  slidejson['title'],
                  slidejson['key_points']
                 )
    print(f"article_slides/slide_{slideno+1:02}.jpg")

article_slides/slide_01.jpg
article_slides/slide_02.jpg
article_slides/slide_03.jpg
article_slides/slide_04.jpg
article_slides/slide_05.jpg
article_slides/slide_06.jpg
article_slides/slide_07.jpg
article_slides/slide_08.jpg
article_slides/slide_09.jpg
article_slides/slide_10.jpg
article_slides/slide_11.jpg
article_slides/slide_12.jpg
article_slides/slide_13.jpg
article_slides/slide_14.jpg
article_slides/slide_15.jpg
article_slides/slide_16.jpg
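
`text_to_image` places key points at fixed 100-pixel intervals, so a key point that wraps onto three or more lines at font size 36 may run into the next one. A sketch of one way to derive the vertical positions from the wrapped line counts instead; the function name, the 44-pixel line height, and the 20-pixel gap are illustrative assumptions, not values measured from the Coval font:

```python
import textwrap

def layout_key_points(key_points, width=60, top=200, line_height=44, gap=20):
    """Return (y, wrapped_text) pairs, moving each key point below the previous one."""
    positions, y = [], top
    for key_point in key_points:
        lines = textwrap.wrap(key_point, width=width) or [""]
        positions.append((y, "\n".join(lines)))
        # Advance by the height of this key point plus a fixed gap.
        y += len(lines) * line_height + gap
    return positions
```

Inside `text_to_image`, the loop over key points could then draw each entry with `draw.multiline_text((10, y), text, font=text_font)` for every `(y, text)` pair returned.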


In [49]:
# alias the import so it does not shadow PIL's Image, which text_to_image uses
from IPython.display import Image as IPImage
IPImage(filename='article_slides/slide_03.jpg')

<IPython.core.display.Image object>

In [50]:
from moviepy.editor import ImageClip, AudioFileClip, concatenate_videoclips

In [51]:
import os
audio_files = sorted(os.listdir("article_audio"))
audio_files = [file for file in audio_files if file.endswith(".mp3")]
audio_files

['audio_00.mp3',
 'audio_01.mp3',
 'audio_02.mp3',
 'audio_03.mp3',
 'audio_04.mp3',
 'audio_05.mp3',
 'audio_06.mp3',
 'audio_07.mp3',
 'audio_08.mp3',
 'audio_09.mp3',
 'audio_10.mp3',
 'audio_11.mp3',
 'audio_12.mp3',
 'audio_13.mp3',
 'audio_14.mp3',
 'audio_15.mp3',
 'audio_16.mp3']

In [56]:
slide_files = sorted(os.listdir("article_slides"))
slide_files = [file for file in slide_files if file.endswith(".jpg")]
slide_files

['slide_00.jpg',
 'slide_01.jpg',
 'slide_02.jpg',
 'slide_03.jpg',
 'slide_04.jpg',
 'slide_05.jpg',
 'slide_06.jpg',
 'slide_07.jpg',
 'slide_08.jpg',
 'slide_09.jpg',
 'slide_10.jpg',
 'slide_11.jpg',
 'slide_12.jpg',
 'slide_13.jpg',
 'slide_14.jpg',
 'slide_15.jpg',
 'slide_16.jpg']

In [57]:
clips = []
for slide, audio in zip(slide_files, audio_files):
    audio_clip = AudioFileClip(f"article_audio/{audio}")
    slide_clip = ImageClip(f"article_slides/{slide}").set_duration(audio_clip.duration)
    slide_clip = slide_clip.set_audio(audio_clip)
    clips.append(slide_clip)
full_video = concatenate_videoclips(clips)
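
`zip` stops at the shorter of its two inputs, so a missing slide or audio file would silently shorten the video rather than raise an error. A minimal sketch of a stricter pairing step, assuming the `slide_NN.jpg` / `audio_NN.mp3` naming used above; `pair_slides_with_audio` is a hypothetical helper:

```python
import re

def pair_slides_with_audio(slide_files, audio_files):
    """Pair files by the number in their names and fail loudly on any mismatch."""
    def index_of(name):
        match = re.search(r"_(\d+)\.", name)
        if match is None:
            raise ValueError(f"no slide number in file name: {name}")
        return int(match.group(1))

    slides = {index_of(f): f for f in slide_files}
    audio = {index_of(f): f for f in audio_files}
    if slides.keys() != audio.keys():
        unmatched = sorted(slides.keys() ^ audio.keys())
        raise ValueError(f"slides and audio do not line up; unmatched numbers: {unmatched}")
    return [(slides[i], audio[i]) for i in sorted(slides)]
```

The loop could then iterate over `pair_slides_with_audio(slide_files, audio_files)` instead of `zip(slide_files, audio_files)`.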

In [58]:
full_video.duration

299.4700000000001

In [59]:
full_video.write_videofile("lecture.mp4", fps=24, codec="mpeg4", 
                           temp_audiofile='temp-audio.mp4', remove_temp=True)

Moviepy - Building video lecture.mp4.
MoviePy - Writing audio in temp-audio.mp4
MoviePy - Done.
Moviepy - Writing video lecture.mp4

Moviepy - Done !
Moviepy - video ready lecture.mp4


In [61]:
!ls -lh lecture.*

-rw-r--r-- 1 jupyter jupyter 7.1K Sep 22 19:15 lecture.json
-rw-r--r-- 1 jupyter jupyter  18M Sep 22 19:22 lecture.mp4
-rw-r--r-- 1 jupyter jupyter  64K Sep 22 19:15 lecture.pptx
-rw-r--r-- 1 jupyter jupyter  17M Sep 22 19:16 lecture.wav
