# Generate Podcast Synopsis for Various Genres

In [1]:
%load_ext autoreload
%autoreload 2

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Set Up Azure OpenAI

In [3]:
import os
import openai
from dotenv import load_dotenv

# Set up Azure OpenAI
load_dotenv()
openai.api_type = "azure"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = "2022-12-01"
openai.api_key = os.getenv("OPENAI_API_KEY")

True

## Deploy a Model

In [4]:
# ID of the desired model
desired_model = 'text-davinci-003' # suitable for text generation
desired_capability = 'completion'

# List existing deployments and look for a succeeded one that serves the desired model
deployment_id = None
result = openai.Deployment.list()

for deployment in result.data:
    if deployment["status"] != "succeeded":
        continue
    
    model = openai.Model.retrieve(deployment["model"])

    # check if desired_model is deployed, and if it has 'completion' capability
    if model["id"] == desired_model and model['capabilities'][desired_capability]:
        deployment_id = deployment["id"]
        
# if no model deployed, deploy one
if not deployment_id:
    print(f'No succeeded deployment of "{desired_model}" found.')

    # Deploy the model
    print(f'Creating a new deployment with model: {desired_model}')
    result = openai.Deployment.create(model=desired_model, scale_settings={"scale_type":"standard"})
    deployment_id = result["id"]
    print(f'Successfully created {desired_model} that supports text {desired_capability} with id: {deployment_id}.')
else:
    print(f'Found a succeeded deployment of "{desired_model}" that supports text {desired_capability} with id: {deployment_id}.')

Found a succeeded deployment of "text-davinci-003" that supports text completion with id: text-davinci-003.


## Text Chunk Generator

In [5]:
# A generator that splits a text into smaller chunks of about n tokens, preferably ending at the end of a sentence
def chunk_generator(text, n, tokenizer):
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Find the nearest end of sentence within a range of 0.5 * n and 1.5 * n tokens
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            # Decode the tokens and check for full stop or newline
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # If no end of sentence found, use n tokens as the chunk size
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j
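
To see how the generator behaves without downloading a real tokenizer, the sketch below runs it on a toy character-level tokenizer. `CharTokenizer` is a hypothetical stand-in (one token per character) and is not part of `tiktoken`; `chunk_generator` is repeated only so that the block runs on its own.

```python
class CharTokenizer:
    """Hypothetical stand-in for a real tokenizer: one token per character."""

    def encode(self, text):
        return [ord(ch) for ch in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)


def chunk_generator(text, n, tokenizer):
    # Same logic as the cell above, repeated so this block is self-contained
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j


text = "The cat sat on the mat. The dog barked at the moon. Nobody slept that night."
tokenizer = CharTokenizer()

# Decode each chunk back to text so the cut points are visible
pieces = [tokenizer.decode(chunk) for chunk in chunk_generator(text, 20, tokenizer)]

for piece in pieces:
    print(repr(piece))
```

With `n = 20`, the generator looks for a full stop or newline between 10 and 30 tokens from the start of each chunk, so this sample is cut at its sentence ends. Joining the decoded pieces gives back the original text.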


## Request API

In [22]:
def request_api(document, prompt_postfix, max_tokens):
    # Insert the document text into the prompt template
    prompt = prompt_postfix.replace('<document>', document)
    #print(f'>>> prompt : {prompt}')

    response = openai.Completion.create(
        deployment_id=deployment_id,
        prompt=prompt,
        temperature=0.5,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=1,
        presence_penalty=1,
        stop='###',
    )

    # Return only the generated text of the first choice
    return response['choices'][0]['text']

## Generate Synopsis

In [16]:
def get_synopsis(content, prompt_postfix):
    import tiktoken

    synopsis_chunks = []
    n = 2000 # target chunk size in tokens (a chunk may span 0.5 * n to 1.5 * n)
    max_tokens = 1000 # max tokens for response

    tokenizer = tiktoken.get_encoding('p50k_base')

    # Generate chunks
    chunks = chunk_generator(content, n, tokenizer)

    # Decode each chunk of tokens back into text
    text_chunks = [tokenizer.decode(chunk) for chunk in chunks]

    # Request a synopsis for each chunk
    for chunk in text_chunks:
        synopsis_chunks.append(request_api(chunk, prompt_postfix, max_tokens))
        #print(chunk)
        #print('>>> synopsis: \n' + synopsis_chunks[-1])

    # Join the per-chunk synopses into a single synopsis
    synopsis = ' '.join(synopsis_chunks)

    return synopsis
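
The settings `n = 2000` and `max_tokens = 1000` share one context window with the prompt instructions. Because `chunk_generator` can emit chunks of up to `1.5 * n` tokens, a rough budget check can be sketched as below. The 4,097-token limit for `text-davinci-003` and the 50-token allowance for the instructions are assumptions to verify against the deployed model, and `fits_context` is a hypothetical helper, not part of the notebook.

```python
def fits_context(n, max_tokens, instruction_tokens=50, context_limit=4097):
    """Return True if a worst-case chunk, the instructions and the response fit.

    context_limit and instruction_tokens are assumed values, not read from the API.
    """
    worst_case_chunk = int(1.5 * n)  # chunk_generator searches up to 1.5 * n tokens
    return worst_case_chunk + instruction_tokens + max_tokens <= context_limit


# Settings used in get_synopsis: 3000 + 50 + 1000 = 4050 tokens
print(fits_context(n=2000, max_tokens=1000))

# A larger chunk size: 3750 + 50 + 1000 = 4800 tokens
print(fits_context(n=2500, max_tokens=1000))
```

Under these assumptions the notebook's settings leave a small margin, while raising `n` to 2500 would risk an over-long request for a chunk with a late sentence boundary.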

### Genre : Comedy

With no further information provided in the prompt, the response is rather formal and dry for the genre of comedy.

In [None]:
fname = "../data/comedy-booking-online-transcript.txt"

with open(fname, 'r') as f:
    content = f.readlines()

# convert list to str
content = ' '.join(content) 
#print(content)

In [23]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nSummarise the transcript of a podcast above into a synopsis. 
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"Michael McKintyre talks about his experience trying to buy tickets online for a show. He finds the process difficult and frustrating, as he is required to input many details such as his email address and country of residence. He also has to go through various challenges before booking the tickets, including confirming his own email address and choosing from an overwhelming list of countries when entering his address. Michael complains that companies are desperate for people's emails so they can send them updates, sales promotions and other advertisements throughout their lives. He also mentions how it used to be much easier with one password for everything but now passwords have become more complicated with numbers added at the end. Finally, he expresses annoyance over having to pay booking fees after spending hours on purchasing a ticket or product from Argos or iTunes."

Adding more information to the prompt may help achieve a more desirable result, for example:
- genre
- desired style

Be creative and explicit about the desired outcome when designing the prompt.
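
One way to keep the genre and the desired style explicit is to assemble the prompt from named parts. The sketch below is only an illustration: `build_prompt_postfix` and its parameters are hypothetical names, and the layout simply follows the `<document>` / `###` / `Synopsis :` pattern used in this notebook.

```python
def build_prompt_postfix(instruction, genre=None, style=None):
    """Hypothetical helper: build a prompt template with optional genre and style hints."""
    parts = [instruction.strip()]
    if genre:
        parts.append(f"Genre: {genre}.")
    if style:
        parts.append(f"Desired style: {style}.")
    task = " ".join(parts)
    # '<document>' is the placeholder that request_api replaces with the transcript text
    return f" <document>\n###\n{task}\nSynopsis : \n"


prompt_postfix = build_prompt_postfix(
    "Create a synopsis to capture audience curiosity and heighten anticipation.",
    genre="stand-up comedy",
    style="playful",
)
print(prompt_postfix)
```

The result can be passed to `get_synopsis(content, prompt_postfix)` in the same way as the hand-written templates.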

In [25]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nCreate a synopsis to capture audience curiosity and heighten anticipation. This is a stand-up comedy. 
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"Michael McKintyre takes you on a hilarious journey as he recounts his struggles with booking tickets online, from the literacy test to trying to remember all his passwords. Join him as he shares stories of his travels and experiences in today's world of technology, where even buying a simple toaster can be an ordeal! Get ready for some side-splitting laughter that will have you rolling in your seat!"

### Genre : Informational

In [27]:
fname = "../data/ft-interview-transcription.txt"

with open(fname, 'r') as f:
    content = f.readlines()

# convert list to str
content = ' '.join(content) 
#print(content)

In [28]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nGenerate a synopsis from the transcription of an interview.  
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"In this Behind the Money podcast, Robert Armstrong, FT's US financial commentator, explains why the collapse of Silicon Valley Bank is not a repeat of 2008. He discusses how SVB was affected by rising interest rates and its balance sheet issues. He also explains that due to the two-tier system in American banking, SVB had less stringent regulations than larger banks. Rob advises against panicking and reassures listeners that those with deposits below $250k are covered by the government. Finally, he concludes that although there is cause for concern over bank regulation in light of SVB's failure, it does not appear as though any other major banks will be facing similar trouble. In this interview, Robert Armstrong discusses the two-tier system of banking regulation which has been in effect since 2008 and how it affects banks. He explains that while no regulatory system can prevent all bank failures, it should strive to minimise them by making sure banks are sensibly capitalised and use 

In [31]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nGenerate a short synopsis from the transcription of an interview that triggers curiosity and includes a thought-provoking question. Add "Let's find out!" at the end.  
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"The collapse of Silicon Valley Bank has been making headlines and raising concerns about the security of the banking system. FT's US financial commentator Robert Armstrong explains why this isn't a 2008 repeat, what went wrong with SVB, and why there is no need to panic. What does this moment tell us about US banking regulation? Let's find out! In this interview with Robert Armstrong, we explore the implications of a two-tier regulatory system for banks. We discuss how banks can better protect themselves from risk and what entrepreneurs should consider when choosing a bank to invest in. What are the potential consequences of this recent financial crisis? Let's find out!"