
## Summarize Conversations Using Large Language Models - For Text and Audio Conversations

The Flow:
- Set up the environment: installs, APIs, parameters, packages, and clients
- Create some baseline data: a text transcript of a call to customer service
- Use Vertex AI LLMs
    - Continue a chat using an LLM as the Customer Service Agent
    - Summarize a chat using an LLM
- Audio Processing
    - Turn a chat transcript into an audio file for use in this example, using different voices for the customer and the agent.
    - Transcribe the audio with speaker diarization (differentiate who is speaking).
- Use an LLM to create a summary of the chat transcript



---
## Colab Setup

To run this notebook in Colab, click [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Summarize%20Conversations%20-%20Text%20and%20Audio.ipynb) and run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow the prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    # The Text-To-Speech and Speech-To-Text clients use Application Default Credentials - authorize them:
    !gcloud auth application-default login --quiet
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.  Also, the Cloud Speech-To-Text and Cloud Text-To-Speech APIs need to be enabled (if not already enabled).

### Installs (If Needed)

In [None]:
install = False
try: import google.cloud.speech
except ImportError:
    print('You need to pip install google-cloud-speech, ... commencing')
    !pip install google-cloud-speech -U -q
    install = True
try: import google.cloud.texttospeech
except ImportError:
    print('You need to pip install google-cloud-texttospeech, ... commencing')
    !pip install google-cloud-texttospeech -U -q
    install = True
try: import google.cloud.aiplatform
except ImportError:
    print('You need to pip install google-cloud-aiplatform (VERTEX AI), ... commencing')
    !pip install google-cloud-aiplatform -U -q
    install = True

### API Enablement

In [79]:
!gcloud services enable speech.googleapis.com
!gcloud services enable texttospeech.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, resume running the code with the cell after this one.

In [None]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

In [2]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [108]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'summarize-audio-conversation'

# change the following if the GCS bucket has a different name than the PROJECT_ID
GCS_BUCKET = PROJECT_ID

In [324]:
import IPython.display
import wave
import io

import vertexai.language_models
from vertexai.language_models import InputOutputTextPair
from google.cloud import texttospeech as tts
from google.cloud import speech as stt

In [325]:
vertexai.init(project = PROJECT_ID, location = REGION)
tts_client = tts.TextToSpeechClient()
stt_client = stt.SpeechClient()

---
## Baseline Data For Example

Create a chat dialogue as text!  For this example I am using the `InputOutputTextPair` class from the Vertex AI SDK - [link](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.language_models.InputOutputTextPair).  Create the conversation as a list of interactions:

```python
conversation = [
    InputOutputTextPair(
        input_text = '',
        output_text = ''
    ),
    ...,
    InputOutputTextPair(
        input_text = '',
        output_text = ''
    )
]
```

In [85]:
conversation = [
    InputOutputTextPair(
        input_text = "Hello",
        output_text = "Thank you for calling Google Cloud Support. How may I assist you today?"
    ),
    InputOutputTextPair(
        input_text = "Hi, I am trying to create a Google Cloud account and use the free credits, but I am not sure where to start.",
        output_text = "Sure, I can definitely help you with that. May I know if you already have a Google account?"
    ),
    InputOutputTextPair(
        input_text = "Yes, I do.",
        output_text = "Great. The first step would be to sign in to the Google Cloud Console. Do you know how to access it?"
    ),
    InputOutputTextPair(
        input_text = "No, I don't. Can you guide me?",
        output_text = "Sure. Please go to console.cloud.google.com and sign in with your Google account credentials. Once you are signed in, you will be directed to the Google Cloud Console dashboard."
    ),
    InputOutputTextPair(
        input_text = "Okay, I have signed in. What should I do next?",
        output_text = "Now, you'll need to create a project. A project is a collection of resources, such as Compute Engine instances and Cloud Storage buckets. To create a project, click on the 'Projects' tab and then click on the 'Create project' button. You'll need to provide a name for your project and then select a region. Once you've created your project, you'll be able to start using the free credits."
    ),
    InputOutputTextPair(
        input_text = "Great, I've created my project. Now what?",
        output_text = "Now, you'll need to enable the free trial. To do this, click on the 'Billing' tab and then click on the 'Enable free trial' button. You'll need to provide your credit card information and then click on the 'Submit' button."
    ),
    InputOutputTextPair(
        input_text = "Alright, I think I can do that.",
        output_text = "Excellent. You've now successfully created a Google Cloud account. You can start using the free credits that are available by clicking on the 'Credits' tab."
    ),
    InputOutputTextPair(
        input_text = "That sounds easy enough. How much free credit do I get?",
        output_text = "You'll receive $300 worth of free credit that you can use on any Google Cloud services for up to 12 months."
    ),
    InputOutputTextPair(
        input_text = "Is there anything else I should know?",
        output_text = "Yes, just be sure to keep an eye on your usage during your free trial period. Once your credits are used up, you'll start being charged for any additional usage. But don't worry, you'll receive notifications when you're approaching your credit limit."
    ),
    InputOutputTextPair(
        input_text = "Thank you so much for your help!",
        output_text = "You're welcome. Is there anything else I can help you with today?"
    ),
    InputOutputTextPair(
        input_text = "No, that's all. Thank you again!",
        output_text = "You're welcome. Have a great day and enjoy using Google Cloud!"
    ),
]

---
## Using Vertex AI LLMs

### LLM Chat Agent

In [86]:
chat_model = vertexai.language_models.ChatModel.from_pretrained("chat-bison@latest")

Truncate the full conversation in `conversation` from above by dropping the last three exchanges, and use the rest to start chatting with the LLM Chat Agent!

In [87]:
conversation_so_far = conversation[0:-3]
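With 11 exchanges in `conversation`, the slice `[0:-3]` keeps the first 8 as chat examples and holds back the last 3, whose customer inputs are sent to the chat agent in the cells that follow. A stdlib sketch of that split, using plain tuples as a hypothetical stand-in for `InputOutputTextPair` and `demo_` names so it does not overwrite the notebook's variables:

```python
# Hypothetical stand-in: 11 (input_text, output_text) tuples instead of InputOutputTextPair objects
demo_conversation = [(f"customer turn {i}", f"agent turn {i}") for i in range(11)]

demo_so_far = demo_conversation[0:-3]    # first 8 exchanges become the chat examples
demo_held_out = demo_conversation[-3:]   # last 3 exchanges: their inputs are replayed to the agent

print(len(demo_so_far), len(demo_held_out))   # 8 3
print([pair[0] for pair in demo_held_out])    # ['customer turn 8', 'customer turn 9', 'customer turn 10']
```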

In [88]:
chat = chat_model.start_chat(
    context = 'I am a customer service agent.',
    examples = conversation_so_far
)

Now, continue the conversation with the same inputs as before, but let the Chat Agent reply:

In [89]:
print(conversation[-3].input_text)
response = chat.send_message(conversation[-3].input_text)
response.text

Is there anything else I should know?


" Yes, there are a few other things you should know. First, the free trial is only available to new customers. Second, the free trial is only available for a limited time. Third, the free trial is subject to Google's terms and conditions."

Continue!

In [90]:
print(conversation[-2].input_text)
response = chat.send_message(conversation[-2].input_text)
response.text

Thank you so much for your help!


" You're welcome. Is there anything else I can help you with today?"

In [91]:
print(conversation[-1].input_text)
response = chat.send_message(conversation[-1].input_text)
response.text

No, that's all. Thank you again!


" You're welcome. Have a great day!"

### LLM To Summarize The Conversation

In [92]:
chat.message_history

[ChatMessage(content='Is there anything else I should know?', author='user'),
 ChatMessage(content=" Yes, there are a few other things you should know. First, the free trial is only available to new customers. Second, the free trial is only available for a limited time. Third, the free trial is subject to Google's terms and conditions.", author='bot'),
 ChatMessage(content='Thank you so much for your help!', author='user'),
 ChatMessage(content=" You're welcome. Is there anything else I can help you with today?", author='bot'),
 ChatMessage(content="No, that's all. Thank you again!", author='user'),
 ChatMessage(content=" You're welcome. Have a great day!", author='bot')]

Combine the conversation history in `conversation_so_far` with the agent-assisted continuation in `chat.message_history`:

In [93]:
chat_history = ''.join(
    [f"\nspeaker 1: {c.input_text}\nspeaker 2: {c.output_text}" for c in conversation_so_far] +
    [f"\nspeaker {1 + int(m.author == 'bot')}: {m.content}" for m in chat.message_history]
)
print(chat_history)


speaker 1: Hello
speaker 2: Thank you for calling Google Cloud Support. How may I assist you today?
speaker 1: Hi, I am trying to create a Google Cloud account and use the free credits, but I am not sure where to start.
speaker 2: Sure, I can definitely help you with that. May I know if you already have a Google account?
speaker 1: Yes, I do.
speaker 2: Great. The first step would be to sign in to the Google Cloud Console. Do you know how to access it?
speaker 1: No, I don't. Can you guide me?
speaker 2: Sure. Please go to console.cloud.google.com and sign in with your Google account credentials. Once you are signed in, you will be directed to the Google Cloud Console dashboard.
speaker 1: Okay, I have signed in. What should I do next?
speaker 2: Now, you'll need to create a project. A project is a collection of resources, such as Compute Engine instances and Cloud Storage buckets. To create a project, click on the 'Projects' tab and then click on the 'Create project' button. You'll n
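The expression `1 + int(m.author == 'bot')` turns the `author` field into a speaker number: `int(True)` is 1, so `'bot'` becomes speaker 2 and `'user'` becomes speaker 1, matching the customer/agent labels used for `conversation_so_far`. A stdlib sketch, with a hypothetical `Message` dataclass standing in for the SDK's `ChatMessage` and made-up message text:

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for the SDK's ChatMessage (only the two fields used here)."""
    content: str
    author: str  # 'user' or 'bot'


history = [
    Message("Is there anything else I should know?", "user"),
    Message("Keep an eye on your usage.", "bot"),
]

# int(True) == 1 and int(False) == 0, so 'bot' -> speaker 2, anything else -> speaker 1
lines = [f"speaker {1 + int(m.author == 'bot')}: {m.content}" for m in history]
print("\n".join(lines))
```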

Generate a summary with an LLM:

In [94]:
textgen_model = vertexai.language_models.TextGenerationModel.from_pretrained('text-bison@latest')

In [95]:
summary = textgen_model.predict(f"Summarize the following conversation\n{chat_history}")

In [96]:
summary.safety_attributes

{'Finance': 0.2}

In [97]:
print(summary.text)

 The conversation is about how to create a Google Cloud account and use the free credits. The user is guided through the process of signing in to the Google Cloud Console, creating a project, enabling the free trial, and using the free credits. The user is also informed about the terms and conditions of the free trial.


---
## Audio

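How do we recreate the summary process above when the source is audio?  Speech-To-Text turns the audio into a transcript that an LLM can then summarize.  Since this example has no recorded audio, it first uses Text-To-Speech to create the audio conversation and then uses Speech-To-Text to transcribe it.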

Use multiple voices and then detect the different voices as speakers in the audio.

### First, Create Audio

We don't have audio, but we do have the actual transcript.  Here, the [Cloud Text-To-Speech API](https://cloud.google.com/text-to-speech/docs/before-you-begin) is used to create an audio conversation.

In [285]:
voice_1 = tts.VoiceSelectionParams(language_code = "en-US", name = 'en-US-Studio-O')
voice_2 = tts.VoiceSelectionParams(language_code = "en-US", name = 'en-US-Studio-M')
audio = tts.AudioConfig(audio_encoding = tts.AudioEncoding.LINEAR16, speaking_rate = 1, sample_rate_hertz = 16000)

Create a WAV file for each speaker's turn in the conversation:

In [286]:
waves = []
for c in conversation:
    input_1 = tts.SynthesisInput(text = c.input_text)
    response_1 = tts_client.synthesize_speech(input = input_1, voice = voice_1, audio_config = audio)
    input_2 = tts.SynthesisInput(text = c.output_text)
    response_2 = tts_client.synthesize_speech(input = input_2, voice = voice_2, audio_config = audio)
    waves += [response_1.audio_content, response_2.audio_content]

In [287]:
len(waves), type(waves[0])

(22, bytes)
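The 22 clips are 11 exchanges times 2 turns each, alternating customer (`voice_1`) and agent (`voice_2`), which is why the concatenated audio alternates speakers. A pure-Python sketch of that ordering, with hypothetical placeholder text and the voice names as plain strings:

```python
# Hypothetical stand-in: 11 (customer, agent) exchanges as plain tuples
pairs = [(f"customer {i}", f"agent {i}") for i in range(11)]

# voice_1 ('en-US-Studio-O') reads the customer, voice_2 ('en-US-Studio-M') reads the agent
VOICES = ("en-US-Studio-O", "en-US-Studio-M")

# Flatten each exchange into two turns, keeping conversation order
turns = [(VOICES[k], text) for pair in pairs for k, text in enumerate(pair)]

print(len(turns))   # 22
print(turns[:2])    # [('en-US-Studio-O', 'customer 0'), ('en-US-Studio-M', 'agent 0')]
```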

Open all the wave files and get parameters and data:

In [297]:
wave_data = []
for w in waves:
    w_data = wave.open(io.BytesIO(w))
    wave_data.append([w_data.getparams(), w_data.readframes(w_data.getnframes())])
    w_data.close()

Concatenate all the wave data into a single `.wav` file:

In [304]:
output = wave.open('audio_conversation.wav', 'wb')
output.setparams(wave_data[0][0])
for i in range(len(wave_data)):
    output.writeframes(wave_data[i][1])
output.close()
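Because `setparams(wave_data[0][0])` reuses the first clip's parameters, the concatenation assumes every clip shares the same channel count, sample width, and sample rate. That holds here because every request used the same `AudioConfig`. A self-contained sketch of the same technique that checks this assumption explicitly, using synthetic sine-tone clips as hypothetical stand-ins for `response.audio_content` (`make_clip` and `concat_wavs` are illustrative names, not library functions):

```python
import io
import math
import struct
import wave

RATE = 16000  # matches sample_rate_hertz in the AudioConfig above


def make_clip(freq_hz, seconds):
    """Hypothetical stand-in for response.audio_content: a mono 16-bit WAV as bytes."""
    n = int(RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * freq_hz * t / RATE)))
        for t in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # LINEAR16 = 2 bytes per sample
        w.setframerate(RATE)
        w.writeframes(frames)
    return buf.getvalue()


def concat_wavs(clips):
    """Concatenate WAV byte strings, refusing clips whose format differs from the first."""
    out_buf = io.BytesIO()
    first = None
    with wave.open(out_buf, "wb") as out:
        for clip in clips:
            with wave.open(io.BytesIO(clip), "rb") as w:
                fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
                if first is None:
                    first = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != first:
                    raise ValueError(f"format mismatch: {fmt} vs {first}")
                out.writeframes(w.readframes(w.getnframes()))
    return out_buf.getvalue()  # header is finalized when the writer closes


combined = concat_wavs([make_clip(220, 0.5), make_clip(330, 0.25)])
with wave.open(io.BytesIO(combined), "rb") as w:
    nframes, rate = w.getnframes(), w.getframerate()
print(nframes, rate)  # 12000 16000
```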

In [306]:
IPython.display.Audio("audio_conversation.wav")

Move the file to GCS:

In [307]:
!gsutil mv audio_conversation.wav gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/

Copying file://audio_conversation.wav [Content-Type=audio/x-wav]...
Removing file://audio_conversation.wav...                                       

Operation completed over 1 objects/3.5 MiB.                                      


### Now, Turn Audio To Text

And recognize the different speakers!

[google.cloud.speech_v1.types.RecognitionConfig](https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.RecognitionConfig)

In [326]:
audio = stt.RecognitionAudio(uri = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/audio_conversation.wav')

In [327]:
diarization_config = stt.SpeakerDiarizationConfig(
    enable_speaker_diarization = True,
    min_speaker_count = 2,
    max_speaker_count = 2,
)

In [328]:
config = stt.RecognitionConfig(
    encoding = stt.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz = 16000,
    language_code = "en-US",
    model = "latest_long",
    audio_channel_count = 1,
    enable_automatic_punctuation = True,
    enable_word_time_offsets = True,
    diarization_config = diarization_config
)
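`sample_rate_hertz`, `audio_channel_count`, and the LINEAR16 encoding in this config should agree with the audio itself. A minimal stdlib sketch that reads those values from a WAV header, shown on a hypothetical in-memory silent clip; in the notebook one could pass it the bytes of `audio_conversation.wav` before the `gsutil mv` (the `wav_format` helper is illustrative):

```python
import io
import wave


def wav_format(wav_bytes: bytes) -> dict:
    """Read the header fields that the RecognitionConfig settings should agree with."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return {
            "sample_rate_hertz": w.getframerate(),
            "audio_channel_count": w.getnchannels(),
            "bits_per_sample": 8 * w.getsampwidth(),  # LINEAR16 means 16-bit samples
        }


# Hypothetical stand-in audio: 0.1 s of silence at 16 kHz, 16-bit mono (same format as the TTS output)
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 1600)

print(wav_format(buf.getvalue()))
# {'sample_rate_hertz': 16000, 'audio_channel_count': 1, 'bits_per_sample': 16}
```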

In [329]:
operation = stt_client.long_running_recognize(config = config, audio = audio)
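A rough size check shows why `long_running_recognize` is used here instead of the synchronous `recognize` method, which is documented for audio of up to about one minute. The arithmetic assumes 16 kHz, 16-bit, mono LINEAR16, ignores the WAV header, and takes gsutil's rounded "3.5 MiB" at face value:

```python
# Back-of-the-envelope duration of audio_conversation.wav from its reported size
size_bytes = 3.5 * 1024 * 1024        # "3.5 MiB" from the gsutil output (rounded)
bytes_per_second = 16000 * 2 * 1      # sample rate * bytes per sample * channels
duration_s = size_bytes / bytes_per_second

print(f"{duration_s:.1f} s")          # 114.7 s, i.e. close to two minutes
```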

In [330]:
response = operation.result(timeout = 90)

In [371]:
type(response)

google.cloud.speech_v1.types.cloud_speech.LongRunningRecognizeResponse

Check out the structure of this response:

[google.cloud.speech_v1.types.LongRunningRecognizeResponse](https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.LongRunningRecognizeResponse)

In [374]:
len(response.results), type(response.results[-1])

(3, google.cloud.speech_v1.types.cloud_speech.SpeechRecognitionResult)

`response.results` is a list of this type:

[google.cloud.speech_v1.types.SpeechRecognitionResult](https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.SpeechRecognitionResult)

In [380]:
len(response.results[-1].alternatives), type(response.results[-1].alternatives[-1])

(1, google.cloud.speech_v1.types.cloud_speech.SpeechRecognitionAlternative)

And these `alternatives` are of this structure:

[google.cloud.speech_v1.types.SpeechRecognitionAlternative](https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.SpeechRecognitionAlternative)

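Note: with diarization enabled, the last entry in `results` (index = -1) holds the word-by-word list for the entire audio, with a `speaker_tag` on each word.  Read it from that result's top alternative (`alternatives[0]`), as in the cell below.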

In [381]:
response.results[-1].alternatives[0].words[0]

start_time {
}
end_time {
  nanos: 200000000
}
word: "Hello,"
speaker_tag: 1
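The `start_time` and `end_time` fields are Duration values, printed above in protobuf text form where omitted fields are zero: `start_time {}` is 0 s and `end_time { nanos: 200000000 }` is 0.2 s. A sketch of the conversion; passing `(seconds, nanos)` as plain integers is an assumption about how the fields are unpacked, and if the Python client returns them as `datetime.timedelta` objects, `total_seconds()` gives the same number:

```python
from datetime import timedelta


def to_seconds(seconds: int, nanos: int) -> float:
    """Convert a protobuf Duration's (seconds, nanos) fields to float seconds."""
    return seconds + nanos / 1e9


# The word shown above: start_time {} (all fields 0), end_time { nanos: 200000000 }
start = to_seconds(0, 0)
end = to_seconds(0, 200_000_000)
print(start, end)  # 0.0 0.2

# If the field surfaces as datetime.timedelta instead:
print(timedelta(microseconds=200_000).total_seconds())  # 0.2
```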

Make a chat transcript with the speakers marked:

In [391]:
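chat_transcript = ''
speaker = 0             # speaker tag of the current turn (0 = no turn yet)
speaker_transcript = ''
for word in response.results[-1].alternatives[0].words:
    if word.speaker_tag != speaker:
        # the speaker changed: close out the previous turn and start a new one
        if speaker > 0: chat_transcript += f"\nspeaker {speaker}: {speaker_transcript}"
        speaker = word.speaker_tag
        speaker_transcript = word.word
    else:
        speaker_transcript += f" {word.word}"
# close out the final turn, which the loop itself never writes
if speaker > 0: chat_transcript += f"\nspeaker {speaker}: {speaker_transcript}"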
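The same grouping can also be written with `itertools.groupby`, which yields each run of consecutive words sharing a speaker tag, including the final run, so no separate flush step is needed. This sketch also keeps each turn's start and end times. It uses hypothetical `(speaker_tag, word, start_seconds, end_seconds)` tuples in place of the API's word objects:

```python
from itertools import groupby

# Hypothetical stand-in for response.results[-1].alternatives[0].words
words = [
    (1, "Hello,", 0.0, 0.2),
    (2, "thank", 0.9, 1.1),
    (2, "you", 1.1, 1.3),
    (1, "Hi,", 4.0, 4.3),
]

turns = []
for tag, group in groupby(words, key=lambda w: w[0]):  # runs of consecutive words, same tag
    group = list(group)
    text = " ".join(w[1] for w in group)
    turns.append((tag, group[0][2], group[-1][3], text))  # (tag, turn start, turn end, text)

sketch_transcript = "".join(f"\nspeaker {tag}: {text}" for tag, _, _, text in turns)
print(sketch_transcript)

for tag, start, end, text in turns:
    print(f"[{start:5.1f}-{end:5.1f}s] speaker {tag}: {text}")
```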

In [392]:
print(chat_transcript)


speaker 1: Hello,
speaker 2: thank you for calling Google Cloud support. How may I assist you today?
speaker 1: Hi, I'm trying to create a Google Cloud account and use the free credits, but I am not sure where to start.
speaker 2: Sure I can definitely help you with that. May I know if you already have a Google account. Yes,
speaker 1: I do.
speaker 2: Great. The first step would be to sign into the Google Cloud console. Do you know how to access it?
speaker 1: No, I don't. Can you guide me?
speaker 2: Sure. Please go to console.cloud.google.com and sign in with your Google account credentials. Once you are signed in, you will be directed to the Google Cloud, console dashboard.
speaker 1: Okay, I have signed in, what should I do next?
speaker 2: Now you'll need to create a project. A project is a collection of resources such as compute engine instances and cloud storage buckets, to create a project, click on the projects Tab and then click on the create project button. You'll need to 

## Summarize The Audio Transcript

As with the text conversation above, the transcript can be summarized with an LLM:

In [398]:
chat_transcript_summary = textgen_model.predict(f"Summarize the following conversation\n{chat_transcript}")

In [399]:
chat_transcript_summary.text

' The conversation is between a customer and a Google Cloud support agent. The customer is trying to create a Google Cloud account and use the free credits. The support agent guides the customer through the process of creating an account, creating a project, and enabling the free trial. The support agent also provides information on how much free credit the customer receives and how to keep track of usage.'

You can even get a summary from each speaker's point of view:

In [401]:
textgen_model.predict(f"Summarize the following conversation from the point of view of each speaker:\n{chat_transcript}", max_output_tokens = 400)

 Speaker 1: 
I'm trying to create a Google Cloud account and use the free credits, but I'm not sure where to start. I already have a Google account. 
The agent guided me on how to sign in to the Google Cloud console and create a project. 
I was then guided on how to enable the free trial and how much credit I get. 
The agent also reminded me to keep an eye on my usage during my free trial period.

Speaker 2: 
The user is trying to create a Google Cloud account and use the free credits. 
I guided them on how to sign in to the Google Cloud console, create a project, and enable the free trial. 
I also reminded them to keep an eye on their usage during their free trial period.
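Both summary prompts in this section follow one pattern: an instruction line, a newline, then the transcript. A hypothetical helper (`summary_prompt` is not part of the Vertex AI SDK) that builds either prompt string for `textgen_model.predict`:

```python
def summary_prompt(transcript: str, per_speaker: bool = False) -> str:
    """Hypothetical helper: build the plain or per-speaker summary prompt used above."""
    if per_speaker:
        instruction = "Summarize the following conversation from the point of view of each speaker:"
    else:
        instruction = "Summarize the following conversation"
    return f"{instruction}\n{transcript}"


prompt = summary_prompt("\nspeaker 1: Hello\nspeaker 2: Hi", per_speaker=True)
print(prompt)
```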