# Leveraging the Speech-to-Text Capabilities of OpenAI

This project uses OpenAI's Whisper model to transcribe audio files so that valuable information can be extracted from them. The extraction is done by passing the transcript as content in a subsequent call to the GPT-3.5 Turbo model.

## Libraries are installed

In [None]:
! pip install langchain
! pip install openai

Collecting langchain
  Downloading langchain-0.1.9-py3-none-any.whl (816 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.21 (from langchain)
  Downloading langchain_community-0.0.24-py3-none-any.whl (1.7 MB)
Collecting langchain-core<0.2,>=0.1.26 (from langchain)
  Download

## Google Drive is mounted because the MP3 files to be processed are stored there

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## The OpenAI API key is stored as an environment variable

In [None]:
import os
os.environ['OPENAI_API_KEY'] = 'YOUR_API_KEY'  # replace with your own key; do not share it
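Hardcoding the key in a cell means it is saved with the notebook and leaks whenever the notebook is shared. A minimal sketch, assuming you are happy to type the key once per session: the hypothetical helper `ensure_openai_key` below uses Python's standard `getpass` to prompt for the key only when `OPENAI_API_KEY` is not already set.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var="OPENAI_API_KEY", ask=getpass):
    """Hypothetical helper: set the API key from an interactive prompt if it is missing."""
    if not os.environ.get(env_var):
        # getpass hides the typed characters, so the key is not echoed into the cell output
        os.environ[env_var] = ask(f"Enter your {env_var}: ")
    return os.environ[env_var]
```

Calling `ensure_openai_key()` instead of assigning the key literally keeps it out of the saved notebook; the `ask` parameter only exists so the prompt can be swapped out.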

## The OpenAI client used for Whisper is instantiated

In [None]:
# Audio file used below: /content/gdrive/MyDrive/GenAIEne2024/Mock Call 21 Technical Support Sample Call (1).mp3
from openai import OpenAI
# OpenAI() reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

## Audio is loaded

In [None]:
from IPython.display import Audio, display
# Render an inline audio player for the MP3 stored on Google Drive
audio = Audio(data="/content/gdrive/MyDrive/GenAIEne2024/Mock Call 21 Technical Support Sample Call (1).mp3", autoplay=False)
display(audio)

## Audio is sent to Whisper for transcription

In [None]:
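audio_path = "/content/gdrive/MyDrive/GenAIEne2024/Mock Call 21 Technical Support Sample Call (1).mp3"
# Open the MP3 in binary mode; the with-block closes the file once the request is done
with open(audio_path, "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
transcrito = transcript.text  # plain-text transcription
transcrito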

"Thank you for calling Internet Services. This is Chalene. How may I help you today? This is Linda. Our internet stopped since 9 and I have an online class at 2. So I need this internet fixed right now. Well thank you for letting us know about this immediately Linda so I can help you get back online ASAP. May I have the phone number associated with your internet service? The phone number is 855-3232. Thank you. So that's 855-3232. Correct. And can you confirm the name of the account please? It's Linda Bloom. That's me. Thanks so much. You're welcome. Linda, are you near to your modem right now? Yes, I'm sitting next to it. What do you want me to do? Okay, great. First, we'll go ahead and check the lights on your modem. Can you please tell me which lights are on right now? The lights look normal as if the service is working. Thank you. That sounds interesting. Let me run a quick line check to confirm the modem's connectivity, okay? Okay. Linda, any chance that you guys experience any po

## Transcript is printed

In [None]:
print(transcrito)

Thank you for calling Internet Services. This is Chalene. How may I help you today? This is Linda. Our internet stopped since 9 and I have an online class at 2. So I need this internet fixed right now. Well thank you for letting us know about this immediately Linda so I can help you get back online ASAP. May I have the phone number associated with your internet service? The phone number is 855-3232. Thank you. So that's 855-3232. Correct. And can you confirm the name of the account please? It's Linda Bloom. That's me. Thanks so much. You're welcome. Linda, are you near to your modem right now? Yes, I'm sitting next to it. What do you want me to do? Okay, great. First, we'll go ahead and check the lights on your modem. Can you please tell me which lights are on right now? The lights look normal as if the service is working. Thank you. That sounds interesting. Let me run a quick line check to confirm the modem's connectivity, okay? Okay. Linda, any chance that you guys experience any pow
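Before longer recordings are sent to Whisper, it is worth checking their size: OpenAI's speech-to-text guide states that file uploads to the transcription endpoint are limited to 25 MB. A minimal sketch, assuming that limit still applies and treating 25 MB as 25 × 1024 × 1024 bytes (an assumption); `check_whisper_upload` and `MAX_UPLOAD_BYTES` are hypothetical names.

```python
import os

# Assumption: the documented 25 MB limit, interpreted here as 25 * 1024 * 1024 bytes
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_whisper_upload(audio_path, max_bytes=MAX_UPLOAD_BYTES):
    """Hypothetical helper: fail early, before uploading, if the file is too large."""
    size = os.path.getsize(audio_path)
    if size > max_bytes:
        raise ValueError(
            f"{audio_path} is {size / 1024 / 1024:.1f} MB; "
            "split or compress it before sending it to Whisper"
        )
    return size
```

Calling `check_whisper_upload(audio_path)` right before `client.audio.transcriptions.create` turns an oversized upload into a clear local error instead of a failed API request.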

## A GPT-3.5 Turbo chat client is created. The prompt asks the model to extract the names and phone numbers mentioned in the transcribed conversation

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

# Chat model used for every extraction step in this notebook
chat = ChatOpenAI(model='gpt-3.5-turbo')
prompt = """
You are a useful AI assistant that performs named-entity recognition tasks.
You are given a conversation transcript; extract the names of the people involved as
well as their phone numbers (if any).
"""
# invoke() replaces the deprecated chat(...) call style
result = chat.invoke(
    [
        SystemMessage(content=prompt),
        HumanMessage(content=transcrito)
    ]
)
print(result.content)

People involved:
1. Chalene
2. Linda Bloom

Phone number:
- Linda Bloom's phone number: 855-3232
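Model extractions can miss or invent details, so a cheap cross-check against the raw transcript is useful. A minimal sketch, assuming phone numbers appear in the `NNN-NNNN` form heard in this call, optionally preceded by a three-digit area code (other formats would need a different pattern); `find_phone_numbers` is a hypothetical helper.

```python
import re

# Assumption: numbers look like "855-3232", optionally with a leading "NNN-" area code
PHONE_PATTERN = re.compile(r"\b(?:\d{3}-)?\d{3}-\d{4}\b")


def find_phone_numbers(text):
    """Return the distinct phone-like strings in the order they first appear."""
    seen = []
    for match in PHONE_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

`find_phone_numbers(transcrito)` can then be compared with the numbers the model reported; a number the model lists that the regex never finds is worth a second look.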


## The same task, but this time the information is requested in a semi-structured (JSON) format

In [None]:
prompt = """
You are a useful AI assistant that performs named-entity recognition tasks.
You are given a conversation transcript. You must extract the names of the people involved as
well as their phone numbers (if any), and the reason why they are calling.
You MUST reply with valid JSON in the following format. Never reply with anything else:
{
  "speakers":[
    {
      "speaker name":<The person's name>,
      "speaker phone":<The person's phone number, or null if none was given>
    }
  ],
  "reason":<The reason why they are calling>
}
"""
result = chat.invoke(
    [
        SystemMessage(content=prompt),
        HumanMessage(content=transcrito)
    ]
)
print(result.content)

{
  "speakers": [
    {
      "speaker name": "Chalene",
      "speaker phone": null
    },
    {
      "speaker name": "Linda Bloom",
      "speaker phone": "855-3232"
    }
  ],
  "reason": "Linda called Internet Services because her internet stopped working and she needed it fixed immediately for her online class."
}


## The response is parsed as JSON so that specific fields can be extracted on their own

In [None]:
import json
test = json.loads(result.content)
test

{'speakers': [{'speaker name': 'Chalene', 'speaker phone': None},
  {'speaker name': 'Linda Bloom', 'speaker phone': '855-3232'}],
 'reason': 'Linda called Internet Services because her internet stopped working and she needed it fixed immediately for her online class.'}
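`json.loads` only succeeds when the reply is bare JSON. Chat models sometimes wrap the object in a Markdown code fence or put text such as `data = ` in front of it, and `json.loads` then raises `json.JSONDecodeError`. A minimal sketch of a more forgiving parser, assuming the reply contains exactly one top-level JSON object; `parse_llm_json` is a hypothetical helper.

```python
import json


def parse_llm_json(reply):
    """Hypothetical helper: parse the JSON object embedded in a model reply."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the model reply")
    # Keep only the span from the first "{" to the last "}", dropping any fences or prefixes
    return json.loads(reply[start:end + 1])
```

`test = parse_llm_json(result.content)` can replace the direct `json.loads` call without changing anything downstream.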

## Display the resulting keys

In [None]:
test.keys()

dict_keys(['speakers', 'reason'])

## Accessing a single field of the parsed JSON

In [None]:
test['reason']

'Linda called Internet Services because her internet stopped working and she needed it fixed immediately for her online class.'

## A new prompt that also asks for a summary of what each person said during the conversation

In [None]:
prompt = """
You are a useful AI assistant that performs named-entity recognition tasks.
You are given a conversation transcript. You must extract the names of the people involved,
their phone numbers (if any), and a summary of what each person said.
You MUST reply with valid JSON in the following format. Never reply with anything else:
{
  "speakers":[
    {
      "speaker name":<The person's name>,
      "speaker phone":<The person's phone number, or null if none was given>,
      "speaker summary":<A list of short bullet points summarizing what this person said during the conversation>
    }
  ]
}
"""
result = chat.invoke(
    [
        SystemMessage(content=prompt),
        HumanMessage(content=transcrito)
    ]
)
print(result.content)

{
  "speakers":[
    {
      "speaker name": "Chalene",
      "speaker phone": "Not provided",
      "speaker summary": [
        "Introduced herself as Chalene from Internet Services",
        "Offered assistance to Linda with her internet connectivity issue",
        "Guided Linda through troubleshooting steps for modem/router"
      ]
    },
    {
      "speaker name": "Linda Bloom",
      "speaker phone": "855-3232",
      "speaker summary": [
        "Reported internet connectivity issue since 9 AM and the urgency due to an online class at 2 PM",
        "Confirmed phone number associated with the internet service",
        "Confirmed physical connection of devices and rebooted modem/router as guided by Chalene",
        "Expressed gratitude for assistance and learned about the benefits of power cycling modems"
      ]
    }
  ]
}


## The response is parsed into a JSON object

In [None]:
test = json.loads(result.content)
test

{'speakers': [{'speaker name': 'Chalene',
   'speaker phone': 'Not provided',
   'speaker summary': ['Introduced herself as Chalene from Internet Services',
    'Offered assistance to Linda with her internet connectivity issue',
    'Guided Linda through troubleshooting steps for modem/router']},
  {'speaker name': 'Linda Bloom',
   'speaker phone': '855-3232',
   'speaker summary': ['Reported internet connectivity issue since 9 AM and the urgency due to an online class at 2 PM',
    'Confirmed phone number associated with the internet service',
    'Confirmed physical connection of devices and rebooted modem/router as guided by Chalene',
    'Expressed gratitude for assistance and learned about the benefits of power cycling modems']}]}

## Testing the JSON document structure

In [None]:
test['speakers']

[{'speaker name': 'Chalene',
  'speaker phone': 'Not provided',
  'speaker summary': ['Introduced herself as Chalene from Internet Services',
   'Offered assistance to Linda with her internet connectivity issue',
   'Guided Linda through troubleshooting steps for modem/router']},
 {'speaker name': 'Linda Bloom',
  'speaker phone': '855-3232',
  'speaker summary': ['Reported internet connectivity issue since 9 AM and the urgency due to an online class at 2 PM',
   'Confirmed phone number associated with the internet service',
   'Confirmed physical connection of devices and rebooted modem/router as guided by Chalene',
   'Expressed gratitude for assistance and learned about the benefits of power cycling modems']}]
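Chalene's phone comes back as the string `"Not provided"` in this run but as JSON `null` in the earlier one, so the same fact can arrive in different shapes. A minimal sketch of normalizing it, assuming the placeholder strings below are the ones worth catching; `normalize_phone`, `normalize_speakers` and `MISSING_PHONE_VALUES` are hypothetical names.

```python
# Assumption: placeholders the model may use when no number was mentioned
MISSING_PHONE_VALUES = {"", "not provided", "n/a", "none", "null", "unknown"}


def normalize_phone(value):
    """Map missing-value placeholders to None and strip whitespace from real numbers."""
    if value is None:
        return None
    cleaned = str(value).strip()
    return None if cleaned.lower() in MISSING_PHONE_VALUES else cleaned


def normalize_speakers(speakers):
    """Return copies of the speaker dicts with a normalized 'speaker phone' field."""
    return [{**s, "speaker phone": normalize_phone(s.get("speaker phone"))} for s in speakers]
```

Applying `normalize_speakers(test['speakers'])` before building the DataFrame gives every missing number the same representation.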

## The speakers are turned into a pandas DataFrame

In [None]:
import pandas as pd
df = pd.DataFrame(test['speakers'])
df.head()

|   | speaker name | speaker phone | speaker summary |
|---|---|---|---|
| 0 | Chalene | Not provided | [Introduced herself as Chalene from Internet S... |
| 1 | Linda Bloom | 855-3232 | [Reported internet connectivity issue since 9 ... |


## The DataFrame is then exported to an Excel sheet

In [None]:
df.to_excel('/content/gdrive/MyDrive/GenAIEne2024/Summary.xlsx', index=False)
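Excel cells hold single values such as text or numbers, not Python lists, so the `speaker summary` lists are easier to read once turned into plain text. A minimal sketch, assuming a newline-separated bullet list is the desired cell content; `summaries_to_text` is a hypothetical helper that works on the speaker dicts before they reach pandas.

```python
def summaries_to_text(speakers, bullet="- "):
    """Hypothetical helper: join each 'speaker summary' list into one newline-separated string."""
    rows = []
    for speaker in speakers:
        row = dict(speaker)  # copy so the parsed JSON is left untouched
        summary = row.get("speaker summary")
        if isinstance(summary, list):
            row["speaker summary"] = "\n".join(bullet + item for item in summary)
        rows.append(row)
    return rows
```

`pd.DataFrame(summaries_to_text(test['speakers'])).to_excel(...)` would then write one readable text cell per speaker.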

## The transcription logic is wrapped in a function to streamline the process

In [None]:
def transcribir(audio_path, debug=False):
  # Send an audio file to Whisper and return the transcript as plain text
  with open(audio_path, "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
      model="whisper-1",
      file=audio_file
    )
  transcrito = transcript.text
  if debug:
    print(transcrito)  # optionally echo the transcript for inspection
  return transcrito
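With `transcribir` in place, a whole folder of recordings can be processed in one loop. A minimal sketch, assuming all recordings sit in one Drive folder as `.mp3` files; `transcribe_folder` is a hypothetical helper, and the transcription function is passed in as an argument so the loop can be exercised without calling the API.

```python
from pathlib import Path


def transcribe_folder(folder, transcribe, pattern="*.mp3"):
    """Hypothetical helper: map each matching file name to its transcript."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        # `transcribe` is expected to behave like transcribir: path in, text out
        results[path.name] = transcribe(str(path))
    return results
```

For example, `transcribe_folder('/content/gdrive/MyDrive/GenAIEne2024', transcribir)` would return a dict keyed by file name, ready to feed into the extraction prompts.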

## Testing with a new audio file

In [None]:
transcrito = transcribir("/content/gdrive/MyDrive/GenAIEne2024/multi-people-zoom.mp3")
transcrito

"Hey, Paul. Thanks for being here on time. Paul? Hey, Paul, can you hear me? I can't hear you. I can hear you. Can you hear me? Hey, guys. Hey, Tyler. Sorry I'm late. I'm having a hard time connecting. One second. Paul's having a sound issue. I can't hear you. Try adjusting your output settings. Can you hear me? It's the gear icon. Tyler, are you on hotel Wi-Fi? Yeah, I am. Never mind. I got it. I just had to change a few settings. Great. Great. Maybe we can get started then. Oh, great. I think your mic is picking up your speakers. My mic? Do you have headphones? Do you want me to put them on? No, I want you to smell them. No, I want you to put them on. Hey, Beth. Hey, John. Sorry I'm late. I had to download a new version of the platform. You should plan extra time for the updates. There's pretty much one every time. Sounds like someone just joined. Hey, guys, it's John. I had to call in. I'm stuck in traffic. Have I missed anything yet? No. It would have been nice for you to join the 

## Applying the speaker-summary prompt to the newly transcribed audio file

In [None]:
prompt = """
You are a useful AI assistant that performs named-entity recognition tasks.
You are given a conversation transcript. You must extract the names of the people involved,
their phone numbers (if any), and a summary of what each person said.
You MUST reply with valid JSON in the following format. Never reply with anything else:
{
  "speakers":[
    {
      "speaker name":<The person's name>,
      "speaker phone":<The person's phone number, or null if none was given>,
      "speaker summary":<A list of short bullet points summarizing what this person said during the conversation>
    }
  ]
}
"""
result = chat.invoke(
    [
        SystemMessage(content=prompt),
        HumanMessage(content=transcrito)
    ]
)
print(result.content)
# Parse the JSON reply and tabulate one row per speaker
test = json.loads(result.content)
df = pd.DataFrame(test['speakers'])
df.head()

{
  "speakers":[
    {
      "speaker name":"Paul",
      "speaker phone":null,
      "speaker summary":[
        "Having sound connection issues",
        "Assisting with adjusting output settings"
      ]
    },
    {
      "speaker name":"Tyler",
      "speaker phone":null,
      "speaker summary":[
        "Connecting via hotel Wi-Fi",
        "Providing financial report updates",
        "Experiencing technical difficulties"
      ]
    },
    {
      "speaker name":"Beth",
      "speaker phone":null,
      "speaker summary":[
        "Downloading a new platform version",
        "Being asked about adjustments"
      ]
    },
    {
      "speaker name":"John",
      "speaker phone":null,
      "speaker summary":[
        "Joining the call while stuck in traffic",
        "Not missing any updates",
        "Noticing technical issues"
      ]
    }
  ]
}


|   | speaker name | speaker phone | speaker summary |
|---|---|---|---|
| 0 | Paul |  | [Having sound connection issues, Assisting wit... |
| 1 | Tyler |  | [Connecting via hotel Wi-Fi, Providing financi... |
| 2 | Beth |  | [Downloading a new platform version, Being ask... |
| 3 | John |  | [Joining the call while stuck in traffic, Not ... |
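The speaker-extraction step (prompt, model call, JSON parsing) is repeated for each recording, so it can be bundled into one function, much as `transcribir` does for transcription. A minimal sketch; `extract_speakers` is a hypothetical helper, and `ask_model` is an assumed callable that takes a system prompt and a transcript and returns the model's text reply.

```python
import json


def extract_speakers(transcript, system_prompt, ask_model):
    """Hypothetical helper: run the speaker-extraction prompt and return the speaker list."""
    reply = ask_model(system_prompt, transcript)
    data = json.loads(reply)
    # Fall back to an empty list if the model omitted the "speakers" key
    return data.get("speakers", [])
```

In the notebook, `ask_model` could be `lambda system, user: chat.invoke([SystemMessage(content=system), HumanMessage(content=user)]).content`, and `pd.DataFrame(extract_speakers(transcrito, prompt, ask_model))` builds the same kind of table as above.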


## Creating a function to play an audio file directly in the notebook

In [None]:
def play_audio(audio_path):
  audio = Audio(data=audio_path, autoplay=False)
  display(audio)

## Playing the audio file

In [None]:
play_audio("/content/gdrive/MyDrive/GenAIEne2024/angry ANGRY BT customer! Very Funny! (1).mp3")

## Testing the transcription of a new audio file

In [None]:
transcrito = transcribir("/content/gdrive/MyDrive/GenAIEne2024/angry ANGRY BT customer! Very Funny! (1).mp3")
transcrito

"What the hell do you want? Mr and Mrs Carter! What the hell do you want? We're calling today to ensure you're getting the best value and service BT have got on offer to you. This is a fucking phone! Call up your pissing ass and get me off this! This is an ex-Directly phone and that includes fucking British Telecom! We pay the fucking bills, now get the fuck off my phone line! Do you understand? I understand sir. Good, well make sure it's fucking written down and don't ring me again! Otherwise I'll come and sling your scrawny fucking neck and arm him physically! Do you comprehend? Yeah, all it takes is a couple of seconds. I've told you to go and fuck off! Do you comprehend? Yeah, no problems, thank you for your time. Go on, don't ever ring again! Right, thank you for your time and thank you for using BT. Got him. Jesus Christ, Diane, could you do me a favour and listen to that last call for me please?"

## Extracting information with a new prompt

In [None]:
prompt = """
You are a useful AI assistant that summarizes conversations. These conversations are
normally related to tech-support services. In addition to the summary, you provide a score that
evaluates how good or bad the customer experience was. The score can be either good or bad.
You also provide the reason why that score was given.
You also provide recommendations about how the customer experience could be improved for that call.
Lastly, you provide a list of bullet points with the conversation's highlights.
The summary must not be a bullet point list, but a single paragraph instead. Follow this structure:
-----------------------------------------------------------------------------
SUMMARY: <The conversation summary>
SCORE: <The conversation score>
REASON FOR SCORE: <The reason for that score>
RECOMMENDATIONS: <Bullet point list for recommendations>
CONVERSATION HIGHLIGHTS: <Bullet point list for conversation highlights>
"""
result = chat.invoke([
    SystemMessage(content=prompt),
    HumanMessage(content=transcrito)
])
print(result.content)

SUMMARY: The customer expressed frustration and anger towards a telemarketer from BT, demanding to be taken off their call list and using profanity throughout the conversation. The telemarketer eventually apologized and assured the customer they would not be contacted again. 
SCORE: Bad
REASON FOR SCORE: The customer experience was poor due to the aggressive language and tone used by the customer. 
RECOMENDATIONS: 
- Train representatives on how to handle irate customers professionally and calmly.
- Implement a more efficient system for customers to opt-out of marketing calls to avoid such confrontations in the future.
CONVERSATION HIGHLIGHTS:
- Customer expressing frustration and anger towards the telemarketer.
- Telemarketer apologizing and promising not to contact the customer again.
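The scorer replies in labeled plain text rather than JSON, so turning it into fields takes a small parser. A minimal sketch, assuming every section starts on its own line with one of the requested labels followed by a colon; `parse_scorer_reply` and `SECTION_LABELS` are hypothetical names. Both spellings of the recommendations label are accepted, since the sample reply above uses `RECOMENDATIONS`.

```python
import re

# Assumption: the labels requested in the prompts, plus the misspelled RECOMENDATIONS variant
SECTION_LABELS = [
    "SUMMARY",
    "SCORE",
    "REASON FOR SCORE",
    "RECOMMENDATIONS",
    "RECOMENDATIONS",
    "PATIENT SYMPTOMS",
    "CONVERSATION HIGHLIGHTS",
]
# A label only counts when it starts a line and is followed by a colon
_LABEL_RE = re.compile(r"^(" + "|".join(map(re.escape, SECTION_LABELS)) + r"):", re.MULTILINE)


def parse_scorer_reply(reply):
    """Hypothetical helper: split a labeled reply into {label: text}."""
    matches = list(_LABEL_RE.finditer(reply))
    sections = {}
    for i, match in enumerate(matches):
        # Each section runs until the next label (or the end of the reply)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(reply)
        label = match.group(1).replace("RECOMENDATIONS", "RECOMMENDATIONS")
        sections[label] = reply[match.end():end].strip()
    return sections
```

For example, `parse_scorer_reply(result.content)['SCORE']` would return just the good/bad verdict, which makes it easy to aggregate scores across many calls.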


## Wrapping the scoring prompt in a reusable function that summarizes and scores a conversation, extracts its highlights, and recommends what could have gone better

In [None]:
def scorer(transcrito):
  # Summarize a support call, score it as good or bad, and suggest improvements
  prompt = """
  You are a useful AI assistant that summarizes conversations. These conversations are
  normally related to tech-support services. In addition to the summary, you provide a score that
  evaluates how good or bad the customer experience was. The score can be either good or bad.
  You also provide the reason why that score was given.
  You also provide recommendations about how the customer experience could be improved for that call.
  Lastly, you provide a list of bullet points with the conversation's highlights.
  The summary must not be a bullet point list, but a single paragraph instead. Follow this structure:
  -----------------------------------------------------------------------------
  SUMMARY: <The conversation summary>
  SCORE: <The conversation score>
  REASON FOR SCORE: <The reason for that score>
  RECOMMENDATIONS: <Bullet point list for recommendations>
  CONVERSATION HIGHLIGHTS: <Bullet point list for conversation highlights>
  """
  result = chat.invoke([
      SystemMessage(content=prompt),
      HumanMessage(content=transcrito)
  ])
  return result.content

## Testing with technical support call #1

In [None]:
transcrito = transcribir("/content/gdrive/MyDrive/GenAIEne2024/Mock Call 21 Technical Support Sample Call (1).mp3")
print(scorer(transcrito))

SUMMARY: Linda called Internet Services because her internet was not working, and she had an online class to attend. The customer service representative, Chalene, guided Linda through checking the modem lights, confirming connections, and eventually rebooting the modem, which resolved the issue. Chalene explained the importance of power cycling the modem periodically to maintain optimal performance. Linda's internet was restored, and she was satisfied with the assistance provided.

SCORE: Good
REASON FOR SCORE: The customer service representative was attentive, guided Linda effectively through troubleshooting steps, and provided helpful information on maintaining internet performance.
RECOMENDATIONS:
- Continue providing clear and detailed instructions to customers during troubleshooting steps.
- Offer proactive tips for maintaining internet equipment to prevent future issues.
- Ensure customers feel supported and valued throughout the interaction.

CONVERSATION HIGHLIGHTS:
- Linda rep

## Testing with technical support call #2

In [None]:
transcrito = transcribir("/content/gdrive/MyDrive/GenAIEne2024/esp-techsuport.mp3", debug=True)
print(scorer(transcrito))

Bienvenido a Atención a clientes HP. Si desea conocer nuestra política de privacidad, marque cero o visite la página www.hp.com.mx. Su llamada es muy importante para nosotros. Por favor, espere en la línea. Para soporte técnico, marque uno. Para saber acerca de su visita técnica, marque dos. Para una atención más personalizada, marque tres o espere en la línea. Para repetir el menú, marque gato. En breves momentos, un ejecutivo la atenderá con gusto. Hola, buenas tardes. Bienvenido a Soporte HP. Mi nombre es David. ¿En qué puedo ayudar? Hola, buenas tardes. Hablo para solucionar un problema con mi laptop. Lo que pasa es que no se conecta a ninguna red y no tiene audio. ¿Anteriormente tenías este problema? No, apenas lo presento. ¿Nota algo diferente en su computadora? Sí, noto que la pantalla de inicio cambió un poco. Sí, me parece que usé dentro en modo seguro, sin darse cuenta. Lo transferiré en un momento con un técnico. ¿Puede esperar en la línea? Ok, muchas gracias. Bienvenido al 

## A new prompt to extract information from a conversation between a patient and a doctor

In [None]:
def scorer_medico(transcrito):
  # Summarize a medical consultation, score it, and list the patient's symptoms
  prompt = """
  You are a useful AI assistant that summarizes conversations. These conversations are
  normally related to medical services. In addition to the summary, you provide a score that
  evaluates how good or bad the customer experience was. The score can be either good or bad.
  You also provide the reason why that score was given.
  You also provide a list of the patient's symptoms.
  You also provide recommendations about how the customer experience could be improved for that call.
  Lastly, you provide a list of bullet points with the conversation's highlights.
  The summary must not be a bullet point list, but a single paragraph instead. Follow this structure:
  -----------------------------------------------------------------------------
  SUMMARY: <The conversation summary>
  SCORE: <The conversation score>
  REASON FOR SCORE: <The reason for that score>
  PATIENT SYMPTOMS: <The patient's symptoms>
  RECOMMENDATIONS: <Bullet point list for recommendations>
  CONVERSATION HIGHLIGHTS: <Bullet point list for conversation highlights>
  """
  result = chat.invoke([
      SystemMessage(content=prompt),
      HumanMessage(content=transcrito)
  ])
  return result.content
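`scorer` and `scorer_medico` differ only in the domain named in the prompt and in the extra PATIENT SYMPTOMS section. A minimal sketch of building both prompts from one template, assuming the remaining wording can be shared across domains; `build_scorer_prompt` is a hypothetical helper, not part of the original notebook.

```python
def build_scorer_prompt(domain, extra_sections=None):
    """Hypothetical helper: assemble a summarize-and-score prompt for a given domain.

    extra_sections maps an output label to the instruction describing it,
    e.g. {"PATIENT SYMPTOMS": "a list of the patient's symptoms"}.
    """
    extra_sections = extra_sections or {}
    lines = [
        "You are a useful AI assistant that summarizes conversations. These conversations are",
        f"normally related to {domain}. In addition to the summary, you provide a score that",
        "evaluates how good or bad the customer experience was. The score can be either good or bad.",
        "You also provide the reason why that score was given.",
    ]
    # Domain-specific instructions go before the shared recommendations/highlights text
    lines += [f"You also provide {text}." for text in extra_sections.values()]
    lines += [
        "You also provide recommendations about how the customer experience could be improved for that call.",
        "Lastly, you provide a list of bullet points with the conversation's highlights.",
        "The summary must not be a bullet point list, but a single paragraph instead. Follow this structure:",
        "-" * 77,
        "SUMMARY: <The conversation summary>",
        "SCORE: <The conversation score>",
        "REASON FOR SCORE: <The reason for that score>",
    ]
    # Domain-specific output sections sit between the score and the recommendations
    lines += [f"{label}: <{text}>" for label, text in extra_sections.items()]
    lines += [
        "RECOMMENDATIONS: <Bullet point list for recommendations>",
        "CONVERSATION HIGHLIGHTS: <Bullet point list for conversation highlights>",
    ]
    return "\n".join(lines)
```

`build_scorer_prompt("medical services", {"PATIENT SYMPTOMS": "a list of the patient's symptoms"})` yields a prompt close to the one inside `scorer_medico`, while `build_scorer_prompt("tech-support services")` covers `scorer`; a single scoring function could then take the prompt as a parameter.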

## Testing the new function

In [None]:
transcrito = transcribir("/content/gdrive/MyDrive/GenAIEne2024/medico-esp.mp3", debug=True)
print(scorer_medico(transcrito))

Sofía, ¿cómo estás? Bien, doctor, ¿cómo estás? Pase, por favor. Muchas gracias. Soy el doctor Enríquez. Toma asiento. Gracias. Siéntate, por favor. Gracias. Bueno, Sofía, como te decía, soy el doctor Enríquez. Yo voy a ser el médico que te va a atender hoy. ¿Tenemos 10 minutos para esta consulta? No. Cuéntame, ¿cómo puedo ayudarte? Bueno, yo voy a ser el médico que te va a atender hoy. Cuéntame, ¿cómo puedo ayudarte? Desde ayer he estado sintiendo como que ardor al orinar. Y luego no sé qué pasa en el baño. Ajá. Principalmente eso. Y no son... Eso es lo principal, porque he tenido como fiebre o algún problema. Son simplemente no sé qué pasó y me da mucha ansiedad en el baño frecuentemente. Mientras estoy en clases y me hace molesto y me arde al orinar. Entiendo. ¿Hay algo más que te preocupa? No. Principalmente eso es mi preocupación. Que, o sea, me molesta tener que ir al baño muchas veces y que me arda. Vale. ¿Y qué crees que puede haber pasado? ¿O qué crees que tiene de estos molest