## Voice-Enabled Therapist Chatbot

This program uses OpenAI's "gpt-3.5-turbo" chat model (which powers ChatGPT), OpenAI's "whisper-1" speech-to-text model, and Google's Text-to-Speech (gTTS) library to create a simple therapist chatbot that interacts with the user via voice input and output. The full conversation history is also displayed as text.
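
As a rough illustration of how the conversation history becomes the displayed text, the sketch below renders a list of chat messages as "role: content" entries and skips the system prompt. The helper name `format_transcript` and the sample messages are assumptions made for this example; the notebook itself does this inline inside `transcribe`.

```python
def format_transcript(messages):
    """Render every non-system message as 'role: content' followed by a blank line."""
    return "".join(
        f"{m['role']}: {m['content']}\n\n"
        for m in messages
        if m["role"] != "system"
    )


# Hypothetical sample history, for illustration only
history = [
    {"role": "system", "content": "You are a therapist."},
    {"role": "user", "content": "I feel anxious."},
    {"role": "assistant", "content": "What is on your mind?"},
]

print(format_transcript(history))
```

Hiding the system message keeps the prompt that configures the chatbot out of the visible transcript.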

The program uses the gradio library to create a user interface that allows the user to record their voice input using their microphone. After the user speaks, the chatbot responds with a text message that is converted into speech and played back to the user.
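
The voice round trip described above can be sketched as one function that takes the three services as plain callables. This is a minimal sketch, not the notebook's actual code: `voice_turn` and the `fake_*` stand-ins are hypothetical names, and the stand-ins return canned values so the flow can be followed without an API key or a microphone.

```python
def voice_turn(audio_path, history, speech_to_text, chat, text_to_speech):
    """Run one turn: transcribe the recording, get a reply, and synthesize it."""
    user_text = speech_to_text(audio_path)                 # voice -> text
    history.append({"role": "user", "content": user_text})
    reply = chat(history)                                  # text -> reply text
    history.append({"role": "assistant", "content": reply})
    return text_to_speech(reply)                           # reply text -> audio file path


# Hypothetical stand-ins for Whisper, gpt-3.5-turbo, and gTTS
def fake_stt(path):
    return "I had a rough day."


def fake_chat(history):
    return "That sounds hard. What happened?"


def fake_tts(text):
    return "reply.mp3"


history = [{"role": "system", "content": "You are a therapist. Respond to all input in 25 words or less."}]
reply_audio = voice_turn("recording.wav", history, fake_stt, fake_chat, fake_tts)
print(reply_audio)
print(len(history))
```

Passing the services in as arguments is only a choice made for this sketch; it lets each stage be swapped or tested in isolation.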

Much of the code is taken or adapted from:
https://github.com/hackingthemarkets/chatgpt-api-whisper-api-voice-assistant/blob/main/therapist.py

Video:
https://www.youtube.com/watch?v=Si0vFx_dJ5Y

In [None]:
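# The code below uses the pre-1.0 openai interface (openai.ChatCompletion,
# openai.Audio) and the gradio 3.x gr.Audio(source=...) argument, so pin both.
! pip install "openai<1.0"
! pip install "gradio<4"
! pip install gTTS pyttsx3 playsound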

In [None]:
# Imports 
import openai
import gradio as gr
from gtts import gTTS #Import Google Text to Speech
from playsound import playsound
from IPython.display import Audio, display # Audio player class and display() helper from IPython
import subprocess
import os
import warnings
warnings.filterwarnings("ignore")

# OpenAI key required
openai.api_key = "YOUR KEY HERE"

# Create list of messages, starting with initial message to the system
messages=[]
messages.append({"role": "system", "content": 'You are a therapist. Respond to all input in 25 words or less.'})

def transcribe(audio):
  """
  Transcribes the user's audio input using OpenAI's Whisper model,
  generates a response from the chatbot using gpt-3.5-turbo, converts the
  response into speech using the gTTS library, updates the conversation
  history, and returns the updated conversation history as a string.

  Parameters:
  audio (str): The filepath of the audio file containing the user's input.

  Returns:
  str: A string containing the updated conversation history, with each message formatted as "role: content" and separated by two newlines. 
  """
  # Declare messages a global variable (not local to the function)
  global messages  

  # Convert the user's recording to WAV with ffmpeg (-y overwrites a leftover
  # file instead of prompting), transcribe it, and append the text to messages
  subprocess.run(["ffmpeg", "-y", "-i", audio, "3.wav"], capture_output=True)
  with open("3.wav", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
  # print(transcript) # For validation only
  messages.append({"role": "user", "content": transcript["text"]})
  os.remove("3.wav") # Delete the temporary file

  # Get the therapist's response (an "assistant" message), append to messages
  response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
  assistant_message = response["choices"][0]["message"]
  messages.append(assistant_message)

  # Create audio from therapist's text response
  msg = assistant_message["content"]
  # print(msg) # For validation 
  talk_file=make_into_speech(msg)
  display(Audio(talk_file, autoplay=True)) 

  # Update the rolling chat transcript 
  chat_transcript = ""
  for message in messages:
      if message['role'] != 'system':
          chat_transcript += message['role'] + ": " + message['content'] + "\n\n"

  return chat_transcript

def make_into_speech(words):
  """
  Converts a string to speech using the gTTS library, saves the speech as an
  MP3 file (the format gTTS writes), and returns the filepath of the saved file.

  Parameters:
  words (str): The input string to convert to speech.

  Returns:
  str: The filepath of the saved MP3 file.

  Example:
  >>> make_into_speech('Hello, how are you today?')
  '2.mp3'
  """
  tts = gTTS(words) # Provide the string to convert to speech
  tts.save('2.mp3') # gTTS writes MP3 data, so use an .mp3 extension
  sound_file = '2.mp3'
  return sound_file

# Launch the interface.  Using debug=True in the gradio launch function enables 
# the automatic display/activation of the audio player in the console; this autoplay 
# capability is not yet available in gradio without a workaround. 
ui = gr.Interface(fn=transcribe, inputs=gr.Audio(source="microphone", type="filepath",
                  label="Record Here -- Hit Clear to Start New Response"), 
                  outputs=[gr.Text(label="Chat Transcript")])
ui.launch(debug=True)  