The exchange I'm implementing is the Australian Taxation Office's (ATO) mobile phone support.

Intents to handle:

- Check the progress of your tax return
- Request your existing tax file number
- Tax return preparation
- Linking myGov to myTax

Phrases to try:

- When do I get my tax return? Tax return progress?
- What is my tax file number? What is my TFN?
- I need help preparing my tax return. How can I do my tax return?
- I need to link myGov to myTax. How can I link my myGov account to myTax?
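The intents and example phrases above are the same information the few-shot prompt further down spells out by hand. As a sketch, they could live in one data structure that generates the prompt, so adding a phrase or an intent only means editing the dictionary. The names `INTENT_EXAMPLES`, `FALLBACK_INTENT` and `build_intent_prompt` are hypothetical, and the JSON-key instruction is an assumption added so the reply is easier to parse.

```python
# Hypothetical catalogue: each intent maps to the example phrases listed above.
INTENT_EXAMPLES = {
    "Check the progress of your tax return": [
        "When do I get my tax return?", "Tax return progress?"],
    "Request your existing tax file number": [
        "What is my tax file number?", "What is my TFN?"],
    "Tax return preparation": [
        "I need help preparing my tax return.", "How can I do my tax return?"],
    "Linking myGov to myTax": [
        "I need to link myGov to myTax.", "How can I link my myGov account to myTax?"],
}
FALLBACK_INTENT = "Irrelevant"


def build_intent_prompt(text: str) -> str:
    """Assemble a few-shot classification prompt from INTENT_EXAMPLES."""
    labels = list(INTENT_EXAMPLES) + [FALLBACK_INTENT]
    parts = [f"There are {len(labels)} types of intents: " + ", ".join(labels) + "."]
    # One "If the text is like ..." sentence per intent, quoting its examples
    for intent, examples in INTENT_EXAMPLES.items():
        quoted = " or ".join(f'"{example}"' for example in examples)
        parts.append(f'If the text is like {quoted}, the intent is "{intent}".')
    parts.append(f'Otherwise, the intent is "{FALLBACK_INTENT}".')
    # Assumption: naming the JSON key makes the model's output predictable
    parts.append('Respond in JSON with a single key "intent".')
    parts.append(f"Tag the following text with one of the intents: {text}")
    return " ".join(parts)
```

For example, `build_intent_prompt(transcription_intent)` would produce a prompt for the recorded request instead of a fixed test phrase.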

In [23]:
import sounddevice as sd
import soundfile as sf
import time
import numpy as np
import whisper
import ollama
import json
import logging
from transformers import pipeline

In [3]:
# Function to play an audio file and wait until playback finishes

def playaudio(file):
    
    # Read the audio file as a NumPy array (frames x channels) plus its sample rate
    data, fs = sf.read(file, dtype='float32')

    # Play the audio through the default output device
    sd.play(data, fs)

    # Block until playback is finished
    sd.wait()
    
    # Return the samples (Jupyter displays them when this call is the last line of a cell)
    return data
    
# Play the greeting
playaudio('audio/greetings.wav')

array([[0.00039673, 0.00039673],
       [0.00033569, 0.00033569],
       [0.00027466, 0.00027466],
       ...,
       [0.00021362, 0.00018311],
       [0.00021362, 0.00021362],
       [0.00021362, 0.00021362]], dtype=float32)

In [4]:
# Function to record audio from the default input device and save it to a file

def record_audio(file, duration):
    # Set the sampling frequency
    fs = 44100
    # Record audio using sounddevice
    data = sd.rec(int(duration * fs), samplerate=fs, channels=2)
    print("Recording started...")
    # Wait until the recording is done
    status = sd.wait()
    print("Recording stopped.")
    # Save the recorded audio to a file using soundfile
    sf.write(file, data, fs)
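`record_audio` asks `sd.rec` for `int(duration * fs)` frames. A quick calculation shows what the 3-second recording below allocates, assuming sounddevice's default float32 sample type (4 bytes per sample). Whisper resamples its input to 16 kHz mono when it loads a file, so recording at `fs = 16000` with `channels=1` would probably lose nothing for transcription while using about a fifth of the memory. The helper `recording_buffer` is hypothetical.

```python
from typing import Tuple


def recording_buffer(duration_s: float, fs: int = 44100, channels: int = 2,
                     bytes_per_sample: int = 4) -> Tuple[int, Tuple[int, int], int]:
    """Return (frames, array shape, size in bytes) for a float32 recording."""
    frames = int(duration_s * fs)                       # frame count record_audio passes to sd.rec
    shape = (frames, channels)                          # sd.rec returns a frames x channels array
    size_bytes = frames * channels * bytes_per_sample   # float32 uses 4 bytes per sample
    return frames, shape, size_bytes


print(recording_buffer(3))                          # current settings: 44.1 kHz stereo
print(recording_buffer(3, fs=16000, channels=1))    # Whisper's native rate, mono
```

The first call gives 132,300 frames (about 1.06 MB); the second gives 48,000 frames (192,000 bytes).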

In [8]:
# Record the customer's request (3 seconds)

record_audio('audio/intent.wav', 3)

Recording started...
Recording stopped.


In [27]:
# Transcribe the customer's request with Whisper (base model, on the CPU)

model = whisper.load_model("base").to('cpu')

intent = model.transcribe('audio/intent.wav')
print(intent['text'])

 What is my text file number?
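Whisper heard "tax file number" as "text file number" here, and the pre-recorded link question below comes out as "my gov to my tax". A small post-processing step could map such domain terms back before logging and classification. This is a hypothetical sketch: the correction table is an assumption built only from the two mis-hearings seen in this notebook, and a real deployment would need a larger, tested list. The lookahead stops "my tax return" and "my tax file" from being turned into "myTax".

```python
import re

# Hypothetical correction table for domain terms Whisper may mis-hear.
# Rules are applied in order, so "text file number" is fixed before the myTax rule runs.
CORRECTIONS = {
    r"\btext file number\b": "tax file number",
    r"\bmy gov\b": "myGov",
    r"\bmy tax\b(?!\s+(?:return|file))": "myTax",  # leave "my tax return" / "my tax file" alone
}


def normalise_transcript(text: str) -> str:
    """Strip surrounding whitespace and apply the correction table case-insensitively."""
    cleaned = text.strip()
    for pattern, replacement in CORRECTIONS.items():
        cleaned = re.sub(pattern, replacement, cleaned, flags=re.IGNORECASE)
    return cleaned


print(normalise_transcript(" What is my text file number?"))
print(normalise_transcript(" I need to link my gov to my tax."))
```

An alternative worth trying is Whisper's `initial_prompt` option, e.g. `model.transcribe(path, initial_prompt="myGov, myTax, tax file number, TFN")`, which biases the decoder towards those spellings instead of fixing them afterwards.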


In [6]:
# Transcribe the four pre-recorded questions (one per intent), which is handy for testing each intent

intent_check = model.transcribe("audio/check_question.wav")
intent_prepare = model.transcribe("audio/prepare_question.wav")
intent_request = model.transcribe("audio/request_question.wav")
intent_link = model.transcribe("audio/link_question.wav")

print(intent_prepare['text'])
print(intent_request['text'])
print(intent_check['text'])
print(intent_link['text'])

 I need help preparing my tax return.
 What is my tax file number?
 When do I get my tax return?
 I need to link my gov to my tax.
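With the four pre-recorded questions available, the whole pipeline can be checked against the intent each file is meant to express. A minimal sketch, assuming a hypothetical `classify(path)` function that transcribes a file and returns an intent label (for example by combining `model.transcribe` with the Ollama prompt further down); `EXPECTED` and `evaluate` are also hypothetical names.

```python
from typing import Callable, Dict

# Expected intent for each pre-recorded file (file names from the cell above).
EXPECTED = {
    "audio/check_question.wav": "Check the progress of your tax return",
    "audio/prepare_question.wav": "Tax return preparation",
    "audio/request_question.wav": "Request your existing tax file number",
    "audio/link_question.wav": "Linking myGov to myTax",
}


def evaluate(classify: Callable[[str], str],
             expected: Dict[str, str] = EXPECTED) -> float:
    """Return the fraction of files whose predicted intent matches the expected label."""
    correct = sum(classify(path) == label for path, label in expected.items())
    return correct / len(expected)
```

With only four files this is a smoke test rather than an accuracy estimate, but it catches regressions when the prompt or model changes.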


In [25]:
# Log the transcription to a file

transcription_intent = intent['text']

# Set the file name and format of the log file
log_file = "transcript.log"
log_format = "%(asctime)s %(message)s"

# Configure the logging settings
logging.basicConfig(filename=log_file, format=log_format, level=logging.INFO)

# Write a message to the log file
logging.info(transcription_intent)
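`logging.basicConfig` only configures the root logger once: it does nothing if the root logger already has handlers, which is easy to hit in a notebook where cells are re-run or another library has set up logging first (Python 3.8+ also accepts `force=True`). A sketch using a dedicated logger instead, assuming a hypothetical logger name `"transcripts"` and helper `make_transcript_logger`:

```python
import logging
import os


def make_transcript_logger(path: str = "transcript.log") -> logging.Logger:
    """Return a logger that appends timestamped transcripts to `path`."""
    logger = logging.getLogger("transcripts")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep transcripts out of the root logger's handlers
    target = os.path.abspath(path)
    # Only attach a file handler once, so re-running the cell does not duplicate lines
    already_attached = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == target
        for h in logger.handlers
    )
    if not already_attached:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Usage in the cell above would be `make_transcript_logger().info(transcription_intent)`.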

In [26]:
# Classify the sentiment of the customer's request

classifier = pipeline(task="sentiment-analysis", model='distilbert-base-uncased-finetuned-sst-2-english')

sentiment = classifier(transcription_intent)
print(sentiment)

# Print "operator" (i.e. escalate to a human) if the request is strongly negative
if sentiment[0]['label'] == 'NEGATIVE' and sentiment[0]['score'] > 0.99:
    print("operator")

[{'label': 'NEGATIVE', 'score': 0.9984287619590759}]
operator
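The escalation rule can be wrapped in a function so the threshold is explicit and testable. Note that the neutral question "What is my text file number?" scored 0.998 NEGATIVE above, so a model fine-tuned on SST-2 movie-review sentiment is only a rough proxy for customer frustration; the 0.99 threshold is a tuning assumption, not a validated value. `should_escalate` is a hypothetical name.

```python
from typing import Dict, List


def should_escalate(result: List[Dict], threshold: float = 0.99) -> bool:
    """Escalate when the pipeline's top label is NEGATIVE with a score above threshold."""
    top = result[0]  # the sentiment pipeline returns one dict per input string
    return top["label"] == "NEGATIVE" and top["score"] > threshold
```

In the notebook this would read `if should_escalate(sentiment): print("operator")`.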


In [12]:
# Use few-shot prompting to classify the customer's request into one of the 4 intents (or "Irrelevant")

# The text being tagged is a hard-coded test phrase; use transcription_intent instead to classify the recorded request
intent_prompt = (
    'There are 5 types of intents: Check the progress of your tax return, Request your existing tax file number, '
    'Tax return preparation, Linking myGov to myTax, and Irrelevant. '
    'If the text is like "When do I get my tax return?" or "Tax return progress", the intent is "Check the progress of your tax return". '
    'If the text is like "What is my tax file number?" or "What is my TFN?", the intent is "Request your existing tax file number". '
    'If the text is like "I need help preparing my tax return" or "How can I do my tax return?", the intent is "Tax return preparation". '
    'If the text is like "I need to link my myGov to myTax" or "How can I link myGov to myTax", the intent is "Linking myGov to myTax". '
    'Otherwise, the intent is "Irrelevant". '
    'Respond in JSON with a single key "intent". '
    'Tag the following text with one of the intents: How can I do my tax return'
)

# format='json' asks Ollama to constrain the model's reply to valid JSON
r = ollama.generate(
    model='gemma:2b',
    prompt=intent_prompt,
    format='json')

answers = json.loads(r['response'])

print(answers)

{'intent': 'Tax return preparation'}
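Even with `format='json'`, the model chooses the JSON keys and the label text itself, so `answers['intent']` can raise `KeyError` or hold a label outside the list (for example a misspelling such as "Irrrelevant"). A sketch of a validating parser, with the hypothetical names `ALLOWED_INTENTS` and `parse_intent`, that falls back to "Irrelevant" for anything unexpected:

```python
import json

ALLOWED_INTENTS = {
    "Check the progress of your tax return",
    "Request your existing tax file number",
    "Tax return preparation",
    "Linking myGov to myTax",
}


def parse_intent(raw: str, default: str = "Irrelevant") -> str:
    """Extract a known intent label from the model's JSON reply, else return default."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return default  # the reply was not valid JSON
    if not isinstance(data, dict):
        return default  # e.g. the model returned a bare string or a list
    label = str(data.get("intent", "")).strip()
    return label if label in ALLOWED_INTENTS else default
```

In the cell above this would be `intent = parse_intent(r['response'])`, which always yields a label the dispatch code knows how to handle.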


In [13]:
# Play the confirmation question that matches the classified intent

intent_label = answers.get('intent')  # None if the model used a different JSON key

if intent_label == 'Check the progress of your tax return':
    playaudio('audio/check.wav')
elif intent_label == 'Request your existing tax file number':
    playaudio('audio/request.wav')
elif intent_label == 'Tax return preparation':
    playaudio('audio/prepare.wav')
elif intent_label == 'Linking myGov to myTax':
    playaudio('audio/link.wav')
else:
    print('confused')
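The if/elif chain that picks the confirmation audio can also be written as a lookup table, which keeps each intent next to its audio file and makes adding an intent a one-line change. A sketch with the hypothetical names `CONFIRMATION_AUDIO` and `confirmation_audio_for`, using the file paths from the cell above:

```python
from typing import Optional

# Intent label -> confirmation question audio (paths from the cell above)
CONFIRMATION_AUDIO = {
    "Check the progress of your tax return": "audio/check.wav",
    "Request your existing tax file number": "audio/request.wav",
    "Tax return preparation": "audio/prepare.wav",
    "Linking myGov to myTax": "audio/link.wav",
}


def confirmation_audio_for(intent: Optional[str]) -> Optional[str]:
    """Return the audio path for a known intent, or None for anything else."""
    return CONFIRMATION_AUDIO.get(intent) if intent is not None else None
```

In the notebook it would be used as `path = confirmation_audio_for(answers.get('intent'))`, then `playaudio(path)` when `path` is not None and `print('confused')` otherwise.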

In [14]:
# Record the customer's answer to the confirmation question (3 seconds)

record_audio('audio/response.wav', 3)

Recording started...
Recording stopped.


In [28]:
# Transcribe the customer's answer

model = whisper.load_model("base").to('cpu')

response = model.transcribe('audio/response.wav')
transcription_response = response['text']
print(transcription_response)

 Yes, it is.


In [17]:
# Ask the LLM whether the transcribed answer means yes or no (the reply is free text)
response = ollama.generate(model='gemma:2b', prompt='Classify the following as saying yes or no: '+transcription_response)

print(response)

{'model': 'gemma:2b', 'created_at': '2024-03-10T11:54:16.7771726Z', 'response': 'The statement "Yes, it is" is a yes. It is a statement that is true and indicates the present truth.', 'done': True, 'context': [106, 1645, 108, 212107, 573, 2412, 685, 6508, 7778, 689, 793, 235292, 139, 3553, 235269, 665, 603, 235265, 107, 108, 106, 2516, 108, 651, 6218, 664, 3553, 235269, 665, 603, 235281, 603, 476, 7778, 235265, 1165, 603, 476, 6218, 674, 603, 1382, 578, 14939, 573, 2835, 7873, 235265, 107, 108], 'total_duration': 4609359400, 'load_duration': 7641200, 'prompt_eval_duration': 211771000, 'eval_count': 26, 'eval_duration': 4376480000}


In [22]:
# If the LLM's reply is affirmative, play the closing thank-you audio and print the identified intent

for word in ['yes', 'yep', 'yeah', 'true', 'yea']:
    if word in response['response'].lower():
        playaudio('audio/thanks.wav')
        print(answers['intent'])
        break
else:
    print('confused')

Tax return preparation
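The final check looks for substrings such as 'yes' or 'true' anywhere in the model's free-text reply, so a reply like "No, that is not true" would still count as a yes because it contains "true". A sketch of a stricter check that compares whole words and returns None when the reply contains both affirmative and negative words; the word lists and the name `is_affirmative` are assumptions.

```python
import re
from typing import Optional

# Hypothetical word lists; extend them from real customer answers
YES_WORDS = {"yes", "yep", "yeah", "yea", "true", "correct"}
NO_WORDS = {"no", "nope", "nah", "false", "incorrect", "not"}


def is_affirmative(text: str) -> Optional[bool]:
    """True for a clear yes, False for a clear no, None if ambiguous or neither."""
    words = set(re.findall(r"[a-z]+", text.lower()))  # whole words only, punctuation dropped
    has_yes = bool(words & YES_WORDS)
    has_no = bool(words & NO_WORDS)
    if has_yes and not has_no:
        return True
    if has_no and not has_yes:
        return False
    return None  # both or neither: ask the customer again or hand over to an operator
```

Because Whisper already returned "Yes, it is.", `is_affirmative(transcription_response)` could also be applied to the transcript directly, which may make the second LLM call unnecessary for simple answers.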
