# Presentation Generator

Using LLMs to produce presentation decks is a great time-saver. I have been playing with tools like [tome](https://tome.app/) and [gamma](https://gamma.app/) and was impressed by them. So I thought, why not build my own?

Instead of asking an LLM to do everything from ideation to content generation, I extract content-rich text from a YouTube video and feed that text to the LLM as context for generating the presentation slides.

![presentation_gen_flow](presentation_gen_flow.png)

The example chosen is a Google Cloud training video that summarises key modules covered throughout the [Google Cloud Core Infrastructure](https://www.youtube.com/watch?v=dAsKylaxuSo) course.

## Download Audio File

First, we need to download the audio track of the video.

In [2]:
import pytube

video = "https://www.youtube.com/watch?v=dAsKylaxuSo"
data = pytube.YouTube(video)
audio = data.streams.get_audio_only()
audio.download(filename = "326697.mp4")

'/Users/meng.lin/workspace/GenAI/Playground/326697.mp4'

> NOTE: You may come across a known `pytube` issue, `pytube.exceptions.RegexMatchError`. You can refer to this [solution](https://github.com/pytube/pytube/issues/1678#issuecomment-1603948730) to patch `pytube` until a proper fix is merged.

## Transcribe

Then we use an OpenAI model called [Whisper](https://openai.com/research/whisper) to convert the speech to text. For a more detailed walkthrough of Whisper, refer to this [tutorial](../OpenAI/102_audio.ipynb).

In [3]:
import whisper
model = whisper.load_model("tiny")
transcribe = model.transcribe("326697.mp4", fp16 = False)
with open('326697.txt', 'w') as f:
    f.write(transcribe['text'])
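The cell above keeps only the `text` field of the transcription result. Whisper's result dictionary also carries a `segments` list with `start`, `end` and `text` entries, which can be handy for checking where each module begins in the video. A minimal sketch, using a hand-written stand-in for the result (the `sample` data and the `format_segments` helper are hypothetical, not part of Whisper):

```python
def format_segments(result):
    """Render a Whisper-style result dict as one timestamped line per segment."""
    lines = []
    for seg in result.get("segments", []):
        # Segment start times are in seconds; show them as MM:SS
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines)


# Hypothetical stand-in for the dict returned by model.transcribe(...)
sample = {
    "text": " Welcome to the course. Let's begin.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Welcome to the course."},
        {"start": 65.2, "end": 67.0, "text": " Let's begin."},
    ],
}

print(format_segments(sample))
```

With the real result, the same helper could be called as `format_segments(transcribe)`.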

## Generate Presentation

Now we load up the OpenAI API key.

In [4]:
from dotenv import load_dotenv
import os

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

The content is likely to be larger than the current 4K context window of `gpt-3.5-turbo`, so I list all available models to see whether the newly announced [16K context window model](https://openai.com/blog/function-calling-and-other-api-updates) is available.

In [7]:
import openai

openai.api_key = openai_api_key
models = openai.Model.list()

for model in models["data"]:
    print(model["id"])

whisper-1
babbage
davinci
text-davinci-edit-001
babbage-code-search-code
text-similarity-babbage-001
code-davinci-edit-001
text-davinci-001
ada
babbage-code-search-text
babbage-similarity
code-search-babbage-text-001
text-curie-001
code-search-babbage-code-001
gpt-3.5-turbo-0613
text-ada-001
text-similarity-ada-001
curie-instruct-beta
gpt-3.5-turbo-0301
gpt-3.5-turbo
ada-code-search-code
ada-similarity
code-search-ada-text-001
text-search-ada-query-001
davinci-search-document
ada-code-search-text
text-search-ada-doc-001
davinci-instruct-beta
text-similarity-curie-001
code-search-ada-code-001
ada-search-query
text-search-davinci-query-001
curie-search-query
davinci-search-query
babbage-search-document
ada-search-document
text-search-curie-query-001
text-search-babbage-doc-001
curie-search-document
text-search-curie-doc-001
babbage-search-query
text-babbage-001
text-search-davinci-doc-001
text-search-babbage-query-001
curie-similarity
gpt-3.5-turbo-16k-0613
curie
text-embedding-ada-002
g

The answer is yes: `gpt-3.5-turbo-16k-0613` appears in the list, and the model can be referenced by the name `gpt-3.5-turbo-16k`.

In [54]:
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k", 
    messages=[{"role": "user", "content": "Hello world"}]
)

completion.choices[0].message.content

'Hello! How can I assist you today?'
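Before sending the full transcript, it can help to check roughly whether it fits in the context window. The sketch below is only an estimate: it assumes about four characters per token for English text, uses round figures for the 4K and 16K windows, and reserves an arbitrary 1,000 tokens for the reply. The names `estimate_tokens` and `fits_context` are hypothetical helpers; an exact count would need the model's tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(text, context_window, reserved_for_output=1000):
    """Check whether the prompt plus room for the reply fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window


# Stand-in for the transcript: 20,000 characters of filler text
sample = "word " * 4000

print(estimate_tokens(sample))       # 5000
print(fits_context(sample, 4_000))   # False
print(fits_context(sample, 16_000))  # True
```

In the notebook, `sample` would be replaced by the contents of `326697.txt`.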

Here is the prompt used to carry out the task. It has three parts:
- The task
- The format requirements
- The context

In [93]:
filename = "326697.txt"
with open(filename, "r") as file:
    text = file.read()

prompt = f"""
Break down the context module by module. List the keynotes of each module using bullet points. 
Output the result in MARKDOWN format.

% FORMAT %

# Title
- bullet 1
    - sub-bullet 1
    - sub-bullet 2
    - sub-bullet 3
- bullet 2
- bullet 3

# Title
- bullet 1
    - sub-bullet 1
    - sub-bullet 2
    - ...
- bullet 2
- ...

% CONTEXT %
{text}
"""

Invoke the OpenAI model to generate the presentation in `Markdown` format.

In [94]:
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k", 
    messages=[{"role": "user", "content": prompt}]
)

completion.choices[0].message.content

"# Module 1\n- Introduction to Google Cloud and Cloud Computing\n- Managed infrastructure and managed services \n    - Infrastructure as a Service (IaaS)\n    - Platform as a Service (PaaS)\n- Google Cloud Network\n- Google Cloud's focus on security \n- Google publishes key elements of technology using open source licenses\n- Google Cloud's pricing structure and billing tools\n\n# Module 2\n- Google Cloud Resource Hierarchy \n    - Resources\n    - Projects\n    - Folders\n    - Organization Load\n- Defining policies and downward inheritance\n- Cloud Identity and Access Management (Cloud IAM)\n- Ways to access and interact with Google Cloud\n    - Cloud Console\n    - Cloud SDK and Cloud Shell\n    - APIs\n    - Cloud Console mobile app\n\n# Module 3\n- How Compute Engine works\n- Virtual machines and virtual networking\n- Virtual Private Cloud (VPC)\n- Compute Engine's auto-scaling feature\n- Google Virtual Private Cloud compatibility features\n    - Routing tables\n    - Firewalls\n 

The result looks pretty decent, but the best way to test it is to put it to work. To do that, we convert the Markdown into a presentation using the `python-pptx` library. Run `pip install python-pptx` if you don't have it installed.

In [107]:
from pptx import Presentation

presentation = Presentation()

# Title slide
title_slide_layout = presentation.slide_layouts[0]
slide = presentation.slides.add_slide(title_slide_layout)
slide.shapes.title.text = "Course Summary"
slide.placeholders[1].text = "Generated from YouTube video"

# Each blank-line-separated block of the Markdown becomes one slide
pages = completion.choices[0].message.content.split("\n\n")

for page in pages:
    lines = page.split("\n")
    bullet_slide_layout = presentation.slide_layouts[1]
    slide = presentation.slides.add_slide(bullet_slide_layout)
    shapes = slide.shapes

    title_shape = shapes.title
    body_shape = shapes.placeholders[1]
    tf = body_shape.text_frame

    for line in lines:
        # Strip indentation first, otherwise indented sub-bullets are skipped
        stripped = line.strip()
        if stripped.startswith("#"):
            title_shape.text = stripped.lstrip("#").strip()
        elif stripped.startswith("-"):
            p = tf.add_paragraph()
            p.text = stripped[1:].strip()
            # Top-level bullets stay at level 1; each four-space indent adds a level
            indent = len(line) - len(line.lstrip(" "))
            p.level = 1 + indent // 4

output_filename = "326697.pptx"
presentation.save(output_filename)
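The Markdown handling can also be separated from the slide creation, which makes it easy to check without `python-pptx` or an API call. The sketch below is one possible way to do it, assuming the four-space sub-bullet indentation shown in the prompt's format section; `parse_markdown_slides` is a hypothetical helper. It starts a new slide at every heading rather than at every blank line, so it does not depend on the model separating slides with exactly one empty line.

```python
def parse_markdown_slides(markdown, indent_width=4):
    """Turn '# Title' / '- bullet' Markdown into a list of slide dicts.

    Each slide is {"title": str, "bullets": [(level, text), ...]},
    where level 0 is a top-level bullet and level 1 a sub-bullet.
    """
    slides = []
    current = None
    for line in markdown.split("\n"):
        stripped = line.strip()
        if stripped.startswith("#"):
            # A heading starts a new slide
            current = {"title": stripped.lstrip("#").strip(), "bullets": []}
            slides.append(current)
        elif stripped.startswith("-") and current is not None:
            # Leading spaces determine the nesting level of the bullet
            indent = len(line) - len(line.lstrip(" "))
            current["bullets"].append((indent // indent_width, stripped[1:].strip()))
    return slides


# Small sample in the same shape as the model output above
sample = "# Module 1\n- Intro\n    - IaaS\n    - PaaS\n\n# Module 2\n- Resource Hierarchy"

for slide in parse_markdown_slides(sample):
    print(slide["title"], slide["bullets"])
```

The returned levels can then be mapped onto `p.level` when the paragraphs are added to the text frame.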

Now the result is in: the deck is saved as `326697.pptx`, and it is actually good.

It's worth pointing out:
- The presentation has a minimalist design because the default template is very plain.
- The content provided to the LLM still dictates the quality of the presentation, not the other way round!

## Final Thoughts

- The output could also be requested in `JSON`. That would reduce the manual string manipulation, which rather defeats the point of using Markdown, and make the whole post-processing step easier and less fiddly.
- I used `python-pptx` to create the presentation. Is this the best option out there? What would be the best template to use? Can the library handle more sophisticated design layouts?
- The biggest question remains: what does a good, engaging presentation look like? I think good templates and design are a good start, but the content and a personal style are still the key.
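To illustrate the first point, here is a rough sketch of what the post-processing could look like if the model were asked for JSON instead of Markdown. The schema (a top-level `slides` list with `title` and `bullets` keys), the `raw_reply` string and the `slides_from_json` helper are all assumptions made up for this example; the prompt would have to ask for that structure explicitly, and the reply may still need validation.

```python
import json


def slides_from_json(raw):
    """Parse a JSON reply into (title, bullets) pairs.

    Assumes the hypothetical schema:
    {"slides": [{"title": "...", "bullets": ["...", ...]}, ...]}
    """
    data = json.loads(raw)
    return [(item["title"], list(item.get("bullets", []))) for item in data["slides"]]


# Hypothetical model reply following the assumed schema
raw_reply = """
{"slides": [
    {"title": "Module 1", "bullets": ["Introduction to Google Cloud", "IaaS and PaaS"]},
    {"title": "Module 2", "bullets": []}
]}
"""

for title, bullets in slides_from_json(raw_reply):
    print(title, bullets)
```

Compared with splitting Markdown by hand, the structure comes straight out of `json.loads`, at the cost of having to handle replies that are not valid JSON.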