# Generate Podcast Synopsis for Various Genres

In [11]:
%load_ext autoreload
%autoreload 2

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Set Up Azure OpenAI

In [12]:
import os
import openai
from dotenv import load_dotenv

# Set up Azure OpenAI
load_dotenv()
openai.api_type = "azure"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = "2022-12-01"
openai.api_key = os.getenv("OPENAI_API_KEY")

False

## Deploy a Model

In [13]:
# id of the model to deploy and the capability it must support
desired_model = 'text-davinci-003' # suitable for text generation
desired_capability = 'completion'

# list the existing deployments on this Azure OpenAI resource
deployment_id = None
result = openai.Deployment.list()

for deployment in result.data:
    if deployment["status"] != "succeeded":
        continue
    
    model = openai.Model.retrieve(deployment["model"])

    # check if desired_model is deployed, and if it has 'completion' capability
    if model["id"] == desired_model and model['capabilities'][desired_capability]:
        deployment_id = deployment["id"]
        
# if no matching deployment was found, create one
if not deployment_id:
    print(f'No succeeded deployment of "{desired_model}" found.')

    # Deploy the model
    print(f'Creating a new deployment with model: {desired_model}')
    result = openai.Deployment.create(model=desired_model, scale_settings={"scale_type":"standard"})
    deployment_id = result["id"]
    print(f'Successfully created {desired_model} that supports text {desired_capability} with id: {deployment_id}.')
else:
    print(f'Found a succeeded deployment of "{desired_model}" that supports text {desired_capability} with id: {deployment_id}.')

Found a succeeded deployment of "text-davinci-003" that supports text completion with id: text-davinci-003.


## Text chunks generator

In [14]:
# A generator that splits a text into smaller chunks of roughly n tokens, preferably ending at the end of a sentence
def chunk_generator(text, n, tokenizer):
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Find the nearest end of sentence within a range of 0.5 * n and 1.5 * n tokens
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            # Decode the tokens and check for full stop or newline
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # If no end of sentence found, use n tokens as the chunk size
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j
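
To see how the sentence-boundary search and the fallback behave, here is a minimal sketch that runs the generator on a tiny input. `CharTokenizer` is a hypothetical stand-in (one token per character) used only so the example runs without `tiktoken`; the generator is restated so the block is self-contained.

```python
class CharTokenizer:
    """Hypothetical stand-in tokenizer: one token per character."""

    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)


def chunk_generator(text, n, tokenizer):
    # Restated from the cell above so this sketch runs on its own
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Search backwards from 1.5 * n tokens for a sentence end
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # No sentence end found in the window: fall back to n tokens
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j


tok = CharTokenizer()
text = "One. Two two. Three three three. Four."
chunks = [tok.decode(c) for c in chunk_generator(text, 10, tok)]
print(chunks)
# ['One. Two two.', ' Three thr', 'ee three. Four.']
```

The first chunk ends on a full stop. The second finds no sentence end between 0.5 * n and 1.5 * n tokens, so it falls back to exactly n tokens and cuts mid-word. No tokens are lost either way: joining the chunks reproduces the input.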


## Request API

In [15]:
def request_api(document, prompt_postfix, max_tokens):
    prompt = prompt_postfix.replace('<document>', document)
    #print(f'>>> prompt : {prompt}')

    response = openai.Completion.create(
        deployment_id=deployment_id,
        prompt=prompt,
        temperature=0.5,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=1,
        presence_penalty=1,
        stop='###',
    )

    return response['choices'][0]['text']
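
The prompt is assembled by substituting the transcript chunk for the `<document>` placeholder. The sketch below shows only that substitution step, without calling the API. `build_prompt`, the sample transcript line, and the simplified template are illustrative assumptions, not part of the notebook.

```python
def build_prompt(document: str, template: str) -> str:
    # Same substitution that request_api performs before calling the API
    return template.replace("<document>", document)


# Simplified template: the chunk comes first, then a '###' separator,
# then the instruction and the cue the model completes.
template = (
    " <document>\n"
    "###\n"
    "Summarise the transcript of a podcast above into a synopsis.\n"
    "Synopsis : "
)

prompt = build_prompt("Host: Welcome to the show.", template)
print(prompt)
```

The same `###` string that separates the document from the instruction is passed as the `stop` sequence, so generation ends if the model starts to emit another separator.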

## Generate Synopsis

In [16]:
def get_synopsis(content, prompt_postfix):
    import tiktoken

    synopsis_chunks = []
    n = 2000 # target tokens per chunk (a chunk may reach 1.5 * n)
    max_tokens = 1000 # max tokens for response

    tokenizer = tiktoken.get_encoding('p50k_base')

    # Generate chunks
    chunks = chunk_generator(content, n, tokenizer)

    # Decode chunk of text
    text_chunks = [tokenizer.decode(chunk) for chunk in chunks]

    # Request api
    for chunk in text_chunks:
        synopsis_chunks.append(request_api(chunk, prompt_postfix, max_tokens))
        #print(chunk)
        #print('>>> synopsis: \n' + synopsis_chunks[-1])

    # Synopsis
    synopsis = ' '.join(synopsis_chunks)

    return synopsis
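
Because a chunk can grow to 1.5 * n tokens, it is worth checking that the largest possible request still fits the model's context window. The sketch below is a rough budget check under stated assumptions: `CONTEXT_WINDOW = 4097` is the commonly cited limit for `text-davinci-003` (shared by prompt and completion), and the 40-token figure for the instruction text is a hypothetical estimate, not a measured value.

```python
# Assumption: prompt and completion together must fit in 4097 tokens
CONTEXT_WINDOW = 4097


def worst_case_request_tokens(n: int, instruction_tokens: int, max_tokens: int) -> int:
    # The chunk generator may return up to int(1.5 * n) tokens per chunk
    largest_chunk = int(1.5 * n)
    return largest_chunk + instruction_tokens + max_tokens


# n and max_tokens as used in get_synopsis; 40 is a hypothetical
# estimate for the instruction part of the prompt
total = worst_case_request_tokens(n=2000, instruction_tokens=40, max_tokens=1000)
fits = total <= CONTEXT_WINDOW
print(total, fits)
# 4040 True
```

With these numbers the headroom is small, so a longer instruction or a larger `max_tokens` would call for a smaller `n`.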

### Genre : Comedy

With no further information provided in the prompt, the response is rather formal and dry for the genre of comedy.

In [17]:
fname = "../data/comedy-booking-online-transcript.txt"

with open(fname, 'r', encoding='utf-8') as f:
    content = f.readlines()

# convert list to str
content = ' '.join(content) 
#print(content)

In [18]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nSummarise the transcript of a podcast above into a synopsis. 
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"Michael McKintyre talks about the difficulties of buying tickets online and how it can take an hour or more to get through all the steps. He also complains about having to enter his e-mail address, confirm it, and agree to various conditions. He reminisces about when passwords only had one part and jokes that he doesn't remember his own security questions. Finally, he expresses confusion over why people would pay for a restricted view ticket as it is cheaper than getting a full view seat."

Adding more information to the prompt may help achieve a more desirable result:
- genre
- desired style

Be creative and explicit about the desired outcome when designing the prompt.
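
One way to keep genre and style explicit is to build the prompt from named parts. The helper below is a hypothetical sketch (`make_prompt_postfix` and its parameters are not part of the notebook); it produces a template in the same shape as the hand-written ones, with the `<document>` placeholder first and the `Synopsis :` cue last.

```python
from typing import Optional


def make_prompt_postfix(task: str, genre: Optional[str] = None,
                        style: Optional[str] = None) -> str:
    # Hypothetical helper: assemble the template line by line
    lines = [" <document>", "###", task]
    if genre:
        lines.append(f"Genre: {genre}.")
    if style:
        lines.append(f"Style: {style}.")
    lines.append("Synopsis : ")
    return "\n".join(lines)


prompt_postfix = make_prompt_postfix(
    task="Create a synopsis of the podcast transcript above.",
    genre="stand-up comedy",
    style="playful, builds anticipation",
)
print(prompt_postfix)
```

The resulting string can be passed to `get_synopsis` in place of a hand-written `prompt_postfix`.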

In [19]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nCreate a synopsis to capture audience curiosity and heighten anticipation. This is a stand-up comedy. 
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"Michael McKintyre is back with a new stand-up comedy show, and he's sure to have you in stitches! Join him as he hilariously recounts his struggles with booking tickets online, trying to remember passwords, security questions and more. From the hilarious absurdity of having Angola on the list of countries when buying tickets for a local cinema, to being asked for your email address just to buy a toaster from Argos - Michael will make you laugh at all the ridiculousness that comes with modern life."

### Genre : Informational

In [20]:
fname = "../data/ft-interview-transcription.txt"

with open(fname, 'r', encoding='utf-8') as f:
    content = f.readlines()

# convert list to str
content = ' '.join(content) 
#print(content)

In [21]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nGenerate a synopsis from the transcription of an interview.  
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"Robert Armstrong, the FT’s US financial commentator, discusses the collapse of Silicon Valley Bank and why it is not a repeat of 2008. He explains that two factors led to SVB's downfall: bad decisions at the bank and a rapid increase in interest rates. Rob also talks about how banks operate and how SVB was dealing with an influx of deposits while their costs were rising due to increased interest rates. He then goes on to explain the government's role in protecting depositors' money as well as why there is a two-tier regulatory system for banking in the US. Finally, he emphasizes that panic should be avoided and that most individual banks are in good shape. \nIn this interview, Robert Armstrong discusses the two-tier system of regulation for banks in the United States since 2008 and how it affects banking today. He explains that while regulators may have to think more carefully about risk posed by securities on balance sheets, a perfectly safe banking system is impossible. He suggests 

In [22]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nGenerate a short synopsis from the transcription of an interview that triggers curiosity and includes a thought-provoking question. Add "Let's find out!" at the end.  
  \nSynopsis : 
"""
#print(prompt_postfix)

get_synopsis(content, prompt_postfix)

"Robert Armstrong, the FT's US financial commentator, has been closely following the collapse of Silicon Valley Bank. He explains why this case is different from 2008 and how two factors - bad decisions at SVB and a rapid increase in interest rates - led to its downfall. With deposits quadrupling in a couple of years, their profits started to disappear as their cost of funding rose. Will other banks be affected? Let's find out! In this Behind the Money episode, Robert Armstrong shares his insights on the lessons learned from the 2008 financial crisis and its implications for banking regulation. He talks about why there is still a risk of bank failures despite stringent regulations, and suggests that entrepreneurs should be asking their banks key questions to assess their capital structure. What does all this mean for the future of banking? Let's find out!"