The exchange I'm implementing is the Australian Taxation Office's (ATO) phone support line.

Intents to handle:

- Check the progress of your tax return
- Request your existing tax file number
- Tax return preparation
- Linking myGov to myTax

Phrases to try:

- When do I get my tax return? Tax return progress?
- What is my tax file number? What is my TFN?
- I need help preparing my tax return. How can I do my tax return?
- I need to link myGov to myTax. How can I link my myGov account to myTax?
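The classification prompt later in this notebook repeats these intents and phrases by hand, so the two can easily drift apart. One option is to keep them in a single dictionary and generate the few-shot prompt from it. This is a minimal sketch: `INTENT_EXAMPLES` and `build_intent_prompt` are hypothetical names, and the JSON instruction assumes the downstream code reads an `"intent"` key.

```python
# Hypothetical single source of truth: intent label -> example customer phrases
INTENT_EXAMPLES = {
    "Check the progress of your tax return": [
        "When do I get my tax return?",
        "Tax return progress?",
    ],
    "Request your existing tax file number": [
        "What is my tax file number?",
        "What is my TFN?",
    ],
    "Tax return preparation": [
        "I need help preparing my tax return.",
        "How can I do my tax return?",
    ],
    "Linking myGov to myTax": [
        "I need to link myGov to myTax.",
        "How can I link my myGov account to myTax?",
    ],
}
FALLBACK_INTENT = "Irrelevant"


def build_intent_prompt(transcription: str) -> str:
    """Assemble a few-shot classification prompt from INTENT_EXAMPLES."""
    lines = []
    for intent, phrases in INTENT_EXAMPLES.items():
        quoted = " or ".join(f'"{p}"' for p in phrases)
        lines.append(f'For questions like {quoted}, the intent is "{intent}".')
    # Iterating a dict yields its keys, i.e. the intent labels
    labels = ", ".join([*INTENT_EXAMPLES, FALLBACK_INTENT])
    lines.append(f'Otherwise, the intent is "{FALLBACK_INTENT}".')
    lines.append(f"Tag the following transcription with exactly one of these intents: {labels}.")
    # Plain (non-f) string so the braces stay literal
    lines.append('Respond in JSON as {"intent": "<label>"}.')
    lines.append(f"Transcription: {transcription}")
    return "\n".join(lines)
```

For example, `build_intent_prompt("Link myGov to myTax.")` returns a prompt that can be passed straight to `ollama.generate(..., prompt=...)`.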

In [7]:
import sounddevice as sd
import soundfile as sf
import time
import numpy as np
import whisper
import ollama
import json
import logging
from transformers import pipeline

In [2]:
# Play an audio file and block until playback has finished

def playaudio(file):
    # Read the audio file as a NumPy array of shape (frames, channels)
    data, fs = sf.read(file, dtype='float32')

    # Start playback through the default output device (non-blocking)
    sd.play(data, fs)

    # Block until playback has finished
    sd.wait()

    return data

# Play the greeting
playaudio('audio/greetings.wav')

array([[0.00039673, 0.00039673],
       [0.00033569, 0.00033569],
       [0.00027466, 0.00027466],
       ...,
       [0.00021362, 0.00018311],
       [0.00021362, 0.00021362],
       [0.00021362, 0.00021362]], dtype=float32)

In [73]:
# Record audio from the default input device and save it to a file

def record_audio(file, duration):
    # Sampling frequency in Hz
    fs = 44100
    # Start a non-blocking stereo recording of `duration` seconds
    data = sd.rec(int(duration * fs), samplerate=fs, channels=2)
    print("Recording started...")
    # Block until the recording is done
    sd.wait()
    print("Recording stopped.")
    # Save the recorded audio to a file using soundfile
    sf.write(file, data, fs)
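`sd.rec` allocates `int(duration * fs)` frames up front, so the buffer grows with both sample rate and channel count. Whisper's `load_audio` resamples everything to 16 kHz mono before transcribing, so a 44.1 kHz stereo recording mostly carries data that is discarded. The arithmetic below is a small illustration assuming float32 samples (4 bytes each); `recording_frames` is a hypothetical helper, not part of sounddevice.

```python
def recording_frames(duration_s, fs):
    """Frames that sd.rec(int(duration * fs), ...) would allocate."""
    return int(duration_s * fs)


BYTES_PER_SAMPLE = 4  # assumption: float32 samples

# record_audio's settings (44.1 kHz stereo) vs. Whisper's input format (16 kHz mono)
for fs, channels in [(44100, 2), (16000, 1)]:
    frames = recording_frames(3, fs)
    size = frames * channels * BYTES_PER_SAMPLE
    print(f"{fs} Hz x {channels} ch: {frames} frames, {size} bytes")
```

For a 3-second clip this prints 132300 frames (1058400 bytes) for the current settings versus 48000 frames (192000 bytes) at 16 kHz mono, so recording with `samplerate=16000, channels=1` would be a reasonable simplification.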

In [74]:
#This is to record the customer's intent

record_audio('audio/intent.wav', 3)

Recording started...
Recording stopped.


In [3]:
#This is to transcribe the customer's intent.

model = whisper.load_model("base")

intent = model.transcribe('audio/intent.wav')
print(intent['text'])



 Link my gov to my tax.
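Whisper returns the text with a leading space and writes the product names as ordinary words ("my gov", "my tax"). A light normalization pass before logging and classification keeps the transcript closer to the intent labels. This is a sketch with assumed rewrite rules; `normalize_transcript` is a hypothetical helper, and the negative lookahead is there so "my tax return" and "my tax file number" are left untouched.

```python
import re

# Assumed rewrite rules for product names that Whisper splits into separate words
_PRODUCT_NAME_RULES = [
    (re.compile(r"\bmy\s*gov\b", re.IGNORECASE), "myGov"),
    # Rewrite "my tax" only when it is NOT followed by "return" or "file"
    (re.compile(r"\bmy\s*tax\b(?!\s+(?:return|file))", re.IGNORECASE), "myTax"),
]


def normalize_transcript(text: str) -> str:
    """Strip surrounding whitespace and restore the myGov/myTax spellings."""
    text = text.strip()
    for pattern, replacement in _PRODUCT_NAME_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Applied to the output above, `normalize_transcript(intent['text'])` gives `"Link myGov to myTax."`.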


In [8]:
# Transcribe the four pre-recorded intent questions (useful for testing each intent without recording)

intent_check = model.transcribe("audio/check_question.wav")
intent_prepare = model.transcribe("audio/prepare_question.wav")
intent_request = model.transcribe("audio/request_question.wav")
intent_link = model.transcribe("audio/link_question.wav")

print(intent_prepare['text'])
print(intent_request['text'])
print(intent_check['text'])
print(intent_link['text'])



 I need help preparing my tax return.
 What is my tax file number?
 When do I get my tax return?
 I need to link my gov to my tax.
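The four calls above can also be collected into one dictionary, which makes it easier to loop over the test clips later (for example, to run each one through the intent classifier). A minimal sketch: `TEST_CLIPS` and `transcribe_clips` are hypothetical names, and the function only assumes an object whose `transcribe(path)` returns a dict with a `'text'` key, as Whisper's model does.

```python
# The four pre-recorded test questions used above, keyed by a short name
TEST_CLIPS = {
    "check": "audio/check_question.wav",
    "request": "audio/request_question.wav",
    "prepare": "audio/prepare_question.wav",
    "link": "audio/link_question.wav",
}


def transcribe_clips(model, clips):
    """Transcribe each clip; Whisper prefixes the text with a space, so strip it."""
    return {name: model.transcribe(path)["text"].strip() for name, path in clips.items()}
```

With the Whisper model loaded earlier: `transcripts = transcribe_clips(model, TEST_CLIPS)`.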


In [9]:
#This is for logging the transcription

transcription_intent = intent['text']

# Set the file name and format of the log file
log_file = "transcript.log"
log_format = "%(asctime)s %(message)s"

# Configure the logging settings
logging.basicConfig(filename=log_file, format=log_format, level=logging.INFO)

# Write a message to the log file
logging.info(transcription_intent)
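One caveat with `logging.basicConfig`: it does nothing if the root logger already has handlers (unless `force=True` is passed, Python 3.8+), so re-running this cell with different settings has no effect in the same kernel. A sketch of an alternative, assuming a dedicated logger is acceptable: the logger name `"transcripts"` and the helpers `get_transcript_logger` and `log_transcript` are hypothetical, and each entry is tagged with the dialogue stage so intents and confirmation responses can be told apart.

```python
import logging


def get_transcript_logger(path="transcript.log"):
    """Return a dedicated logger that appends timestamped transcripts to `path`."""
    logger = logging.getLogger("transcripts")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep transcripts out of the root logger's output
    # Attach the file handler only once, so re-running the cell does not duplicate lines
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s [%(stage)s] %(message)s"))
        logger.addHandler(handler)
    return logger


def log_transcript(logger, stage, text):
    """Log one transcription labelled with its stage, e.g. 'intent' or 'response'."""
    logger.info(text, extra={"stage": stage})
```

Usage: `log_transcript(get_transcript_logger(), "intent", transcription_intent)`. Note that the file path is fixed by the first call; later calls reuse the existing handler.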

In [10]:
#This is to identify the sentiment of the customer's intent

classifier = pipeline("sentiment-analysis")
sentiment = classifier(transcription_intent)
print(sentiment)

# Print "operator" (i.e. escalate to a human) if the sentiment is strongly negative
if sentiment[0]['label'] == 'NEGATIVE' and sentiment[0]['score'] > 0.99:
    print("operator")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9828671813011169}]
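Note that the neutral request "Link my gov to my tax." was scored NEGATIVE at about 0.98, just under the 0.99 cutoff, so the threshold does most of the work here. Pulling the rule into a small function makes it easy to test and tune. `should_escalate` is a hypothetical helper that assumes the pipeline's output format shown above (a list with one `{'label', 'score'}` dict per input string).

```python
def should_escalate(result, threshold=0.99):
    """True when the top sentiment is NEGATIVE with a score above `threshold`."""
    top = result[0]  # one dict per input string; a single string was classified
    return top["label"] == "NEGATIVE" and top["score"] > threshold
```

With the output above, `should_escalate(sentiment)` is `False`, so no operator is requested.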


In [None]:
# Use few-shot prompting to classify the customer's request into one of the 4 predefined intents

r = ollama.generate(
    model='gemma:2b',
    prompt=(
        'For questions like "When do I get my tax return?" or "Tax return progress", the intent is "Check the progress of your tax return". '
        'For questions like "What is my tax file number?" or "What is my TFN?", the intent is "Request your existing tax file number". '
        'For questions like "I need help preparing my tax return" or "How can I do my tax return?", the intent is "Tax return preparation". '
        'For questions like "I need to link myGov to myTax" or "How can I link myGov to myTax", the intent is "Linking myGov to myTax". '
        'Otherwise, the question is irrelevant. '
        'Tag the following transcription with one of the following intents: Check the progress of your tax return, Request your existing tax file number, Tax return preparation, Linking myGov to myTax, or Irrelevant. '
        'Respond in JSON as {"intent": "<intent>"}. '
        'Transcription: ' + transcription_intent
    ),
    format='json')

answers = json.loads(r['response'])

print(answers)
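`format='json'` only guarantees that the model returns valid JSON; it does not guarantee which key it uses or that the label is spelled exactly like one of the four intents. A defensive parser can map the reply onto a known label and fall back to "Irrelevant" otherwise. A sketch: `VALID_INTENTS` and `parse_intent` are hypothetical names, and the `"intent"` key is an assumption about what the prompt asks for.

```python
import json

VALID_INTENTS = {
    "Check the progress of your tax return",
    "Request your existing tax file number",
    "Tax return preparation",
    "Linking myGov to myTax",
}


def parse_intent(raw_response: str) -> str:
    """Map the model's JSON reply to one known intent label, or 'Irrelevant'."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return "Irrelevant"
    if not isinstance(payload, dict):
        return "Irrelevant"
    label = str(payload.get("intent", "")).strip().rstrip(".")
    # Compare case-insensitively, but return the canonical spelling
    for intent in VALID_INTENTS:
        if label.lower() == intent.lower():
            return intent
    return "Irrelevant"
```

Usage: `intent_label = parse_intent(r['response'])`.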

In [88]:
# Play the confirmation question that matches the classified intent

intent_label = answers.get('intent')  # None if the model used a different key

if intent_label == 'Check the progress of your tax return':
    playaudio('audio/check.wav')
elif intent_label == 'Request your existing tax file number':
    playaudio('audio/request.wav')
elif intent_label == 'Tax return preparation':
    playaudio('audio/prepare.wav')
elif intent_label == 'Linking myGov to myTax':
    playaudio('audio/link.wav')
else:
    print('confused')

confused
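The `if`/`elif` chain above can also be expressed as a lookup table, so adding a fifth intent means adding one entry rather than another branch. A sketch: `CONFIRMATION_AUDIO` and `confirm_intent` are hypothetical names, and the player is passed in as a function (for example the notebook's `playaudio`) so the logic can be exercised without audio hardware.

```python
# Confirmation clip for each intent (file names taken from the cell above)
CONFIRMATION_AUDIO = {
    "Check the progress of your tax return": "audio/check.wav",
    "Request your existing tax file number": "audio/request.wav",
    "Tax return preparation": "audio/prepare.wav",
    "Linking myGov to myTax": "audio/link.wav",
}


def confirm_intent(intent, play):
    """Play the confirmation clip for `intent` via `play(path)`; return False if unknown."""
    path = CONFIRMATION_AUDIO.get(intent)
    if path is None:
        return False
    play(path)
    return True
```

Usage: `if not confirm_intent(answers.get('intent'), playaudio): print('confused')`.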


In [89]:
#This is to record the customer's response to the confirmation question

record_audio('audio/response.wav', 3)

Recording started...
Recording stopped.


In [None]:
#This is to transcribe the response
response = model.transcribe('audio/response.wav')
transcription_response = response['text']
print(transcription_response)

In [None]:
# Classify the response as affirmative or negative
rr = ollama.generate(
    model='gemma:2b',
    prompt='Tag the following transcription as affirmative or negative. '
           'Respond in JSON as {"is_affirmative": true} or {"is_affirmative": false}. '
           'Transcription: ' + transcription_response,
    format='json')

answerss = json.loads(rr['response'])

print(answerss)
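The final cell reads `answerss['is_affirmative']`, which fails with a `KeyError` if the model picks another key, and a string such as `"true"` would not compare equal to `True`. A small parser can normalize the reply to `True`, `False`, or `None` (unclear, so the system could ask again). A sketch assuming the `{"is_affirmative": ...}` shape; `parse_confirmation` and the accepted yes/no words are hypothetical choices.

```python
import json

_YES = {"true", "yes", "affirmative"}
_NO = {"false", "no", "negative"}


def parse_confirmation(raw_response):
    """Return True/False from a reply like {"is_affirmative": true}, or None if unclear."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    value = payload.get("is_affirmative")
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        word = value.strip().lower()
        if word in _YES:
            return True
        if word in _NO:
            return False
    return None
```

Usage: `confirmed = parse_confirmation(rr['response'])`.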

In [70]:
# Play the closing "thank you" audio and print the confirmed intent

if answerss.get('is_affirmative') is True:
    playaudio('audio/thanks.wav')
    print(answers['intent'])
else:
    print('confused')

Check the progress of your tax return
