# OpenAI and Naver Clova API Setup

This code block is part of the setup process for an AI Korean tutor project, which utilizes both the OpenAI API for chat completion and the Naver Clova API for speech-to-text (STT) and text-to-speech (TTS) functionalities.

```python
# Import necessary libraries
import os
from openai import OpenAI  # Import OpenAI library
from dotenv import load_dotenv, find_dotenv  # Import dotenv for environment variable management

# Load environment variables from a .env file
# This enables the script to access sensitive information (like API keys) securely.
_ = load_dotenv(find_dotenv())

# Specify the GPT model to be used for the OpenAI API
gpt_model_name = "gpt-3.5-turbo-1106"  # This sets the GPT model name for OpenAI's Chat Completion.

# Load environment variables for Naver Cloud
# These variables are required to authenticate and interact with Naver Cloud's APIs.
ncloud_client_id = os.environ['NCLOUD_CLIENT_ID']  # Retrieves Naver Cloud client ID from environment variables
ncloud_client_secret = os.environ['NCLOUD_CLIENT_SECRET']  # Retrieves Naver Cloud client secret

# Initialize the OpenAI client with the API key
# This creates a client object for interacting with OpenAI's API.
client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],  # Retrieves API key from environment variables for OpenAI
)
```


## Explanation

- **Import Libraries**: The script begins by importing the required libraries. `os` provides operating-system functionality such as reading environment variables, the `openai` package is the Python SDK for OpenAI's API, and `dotenv` (installed as the `python-dotenv` package) loads environment variables from a `.env` file.

- **Loading Environment Variables**: `find_dotenv()` locates the `.env` file and `load_dotenv()` copies the variables it defines into `os.environ` (by default without overriding variables that are already set). This keeps sensitive data such as API keys out of the source code.

- **GPT Model Specification**: The `gpt_model_name` variable selects the OpenAI model. It is set to `"gpt-3.5-turbo-1106"`, a dated snapshot of GPT-3.5 Turbo that supports JSON mode (`response_format={"type": "json_object"}`), which `get_gpt_response` relies on.

- **Naver Cloud Credentials**: The script reads the Naver Cloud client ID and client secret from environment variables. These credentials authenticate every request to the Naver Clova APIs (both STT and TTS).

- **OpenAI Client Initialization**: Finally, an instance of the `OpenAI` client is initialized with the API key from the environment variables. This client object will be used to make requests to the OpenAI API.
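Each `os.environ[...]` lookup above raises a bare `KeyError` at the first missing variable. A minimal sketch of a fail-fast check that reports every missing variable at once; `require_env` and `REQUIRED` are hypothetical names, not part of `python-dotenv` or the original notebook.

```python
import os

# Names the notebook reads from the environment (taken from the setup cell).
REQUIRED = ("OPENAI_API_KEY", "NCLOUD_CLIENT_ID", "NCLOUD_CLIENT_SECRET")


def require_env(names, environ=os.environ):
    """Return {name: value} for every name, or fail listing all missing ones.

    Hypothetical helper: empty strings are treated as missing too.
    """
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

In the notebook it would be called right after `load_dotenv(find_dotenv())`, for example `config = require_env(REQUIRED)`.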

# Naver Cloud Speech Recognition Integration

This code block is designed to interact with Naver Cloud's Speech Recognition API, specifically for converting speech to text (`STT`). This functionality is a crucial component of the AI Korean tutor project.

```python
# Import the requests library for making HTTP requests
import requests

# Define a function to get the transcript of an audio file
def get_transcript(file_path):
    # Naver Cloud (CLOVA) Speech Recognition endpoint; lang=Kor selects Korean
    url = "https://naveropenapi.apigw.ntruss.com/recog/v1/stt?lang=Kor"

    # Define the headers for the API request
    headers = {
        "X-NCP-APIGW-API-KEY-ID": ncloud_client_id,  # Naver Cloud client ID
        "X-NCP-APIGW-API-KEY": ncloud_client_secret,  # Naver Cloud client secret
        "Content-Type": "application/octet-stream"  # Raw audio bytes in the body
    }

    # Open the audio file in binary mode; `with` closes it after the upload
    with open(file_path, 'rb') as data:
        # Make a POST request to the Naver Cloud API
        response = requests.post(url, data=data, headers=headers)

    # Check if the request was successful
    if response.status_code == 200:
        return response.json()['text']  # Return the transcribed text
    print("Error : " + response.text)  # Print the error if the request failed
    return None
```


## Explanation

- **Function Definition**: The `get_transcript` function is defined to handle the STT process. It takes `file_path` as an input, which is the path to the audio file to be transcribed.

- **API URL and Headers**: The URL for the Naver Cloud STT API is constructed, and the necessary headers are defined. These headers include the API keys for authentication and the content type for the data being sent.

- **Making the API Request**: The script uses `requests.post` to send the audio file to the Naver Cloud API for transcription. The audio file is opened in binary mode and sent as data in the POST request.

- **Response Handling**: If the status code is 200, the function returns the transcribed text from the `text` field of the JSON response. Otherwise it prints the error body and returns `None`, so callers should check the result before using it.
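The request `get_transcript` sends has three parts: the endpoint with a `lang` query parameter, two authentication headers, and the raw audio bytes as the body. The sketch below builds the same request with the standard library's `urllib.request` without sending it, which makes those parts easy to inspect. `build_stt_request` is a hypothetical helper, and any `lang` value other than `Kor` is an assumption to check against the CLOVA Speech Recognition documentation.

```python
import urllib.request

STT_URL = "https://naveropenapi.apigw.ntruss.com/recog/v1/stt"


def build_stt_request(audio_bytes, client_id, client_secret, lang="Kor"):
    """Build (but do not send) the POST request that get_transcript sends."""
    return urllib.request.Request(
        f"{STT_URL}?lang={lang}",  # language as a query parameter, as in the original URL
        data=audio_bytes,  # raw audio bytes as the request body
        headers={
            "X-NCP-APIGW-API-KEY-ID": client_id,
            "X-NCP-APIGW-API-KEY": client_secret,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` should return the same JSON body that `get_transcript` parses, with the transcript in its `text` field.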

# Function to Generate Tutor's Response Using GPT

This function, `get_gpt_response`, generates the AI Korean tutor's reply from the student's transcribed speech and the conversation history.

```python
# Predefined prompt that sets the context for the AI's role
system_prompt = """
You are an experienced Korean tutor who graduated from Seoul National University.
You are talking to a student who wants to practice speaking Korean.
Help them practice speaking Korean by talking to your student and
try to teach your student how to say what they would like to say.
The answer must be formatted as a JSON object with a single key "response"
that contains your reply to the student.
"""

def get_gpt_response(transcript, history):
    # Format the system message for context setting
    system_message = {
        "role": "system",
        "content": system_prompt.replace("\n", " ")  # Collapse newlines into spaces
    }

    # Prepare the message list combining the system message and conversation history
    message_list = [system_message]
    message_list.extend(history)
    message_list.append({"role": "user", "content": transcript})  # Add the latest user input

    # Get the AI response using the OpenAI Chat Completion API
    response = client.chat.completions.create(
        model=gpt_model_name,  # Specifies the GPT model to use
        response_format={"type": "json_object"},  # JSON mode: the reply is a JSON object
        messages=message_list  # Provides the context and conversation history
    )

    # Return the AI's message content (a JSON string)
    return response.choices[0].message.content
```

## Explanation

- **System Prompt**: This sets the context for the AI, defining its role as a Korean tutor. The prompt guides every response, and because JSON mode is enabled it must also mention JSON and describe the fields the reply should contain.

- **Function Parameters**: `transcript` is the latest user input (student's speech), and `history` contains previous messages in the conversation.

- **Message Formatting**: The system message, the conversation history, and the new user input are combined into a list of messages, each with a `role` (`system`, `user`, or `assistant`) and `content`.

- **AI Response Generation**: The `client.chat.completions.create` method is used to generate a response from the AI based on the provided context and conversation history.

- **Response Handling**: The function returns the content of the AI's reply as a raw JSON string; parsing it and extracting the `response` field is left to the caller, `talk_to_gpt`.
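The API call is the only part of `get_gpt_response` that needs the network. A minimal sketch of the two pure steps around it, assuming the reply carries the `response` key that `talk_to_gpt` reads; `build_messages` and `parse_reply` are hypothetical helpers. JSON mode makes the reply valid JSON (unless it is cut off by the token limit) but does not enforce particular keys, so the sketch checks for the key explicitly.

```python
import json


def build_messages(system_prompt, history, transcript, max_history=10):
    """Assemble the `messages` list: system prompt, recent history, new input."""
    messages = [{"role": "system", "content": system_prompt.replace("\n", " ")}]
    messages.extend(history[-max_history:])  # keep only the most recent turns
    messages.append({"role": "user", "content": transcript})
    return messages


def parse_reply(content, key="response"):
    """Extract the tutor's text from a JSON-mode reply."""
    data = json.loads(content)  # valid JSON is guaranteed, a schema is not
    if key not in data:
        raise KeyError(f"model reply has no {key!r} field: {content}")
    return data[key]
```

Keeping these steps separate from `client.chat.completions.create` lets the prompt assembly and the parsing be tested without an API key.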

# Naver Cloud Text-to-Speech (TTS) Integration

This code block integrates the Naver Cloud TTS API, which converts text into speech. In the AI Korean tutor project, it turns the tutor's text replies into spoken Korean audio.

```python
import urllib.error
import urllib.parse
import urllib.request

# Define a function to convert text to speech
def get_audio(text):
    # URL-encode the input text
    encoded_text = urllib.parse.quote(text)
    # Form-encoded request body: voice parameters followed by the text
    tts_parameters = "speaker=nara&volume=0&speed=0&pitch=0&format=wav&text=" + encoded_text
    # API URL (CLOVA Voice Premium endpoint)
    tts_url = "https://naveropenapi.apigw.ntruss.com/tts-premium/v1/tts"

    # Create a request object with the API URL
    request = urllib.request.Request(tts_url)
    # Add headers for authentication
    request.add_header("X-NCP-APIGW-API-KEY-ID", ncloud_client_id)
    request.add_header("X-NCP-APIGW-API-KEY", ncloud_client_secret)

    try:
        # Passing data makes this a POST; urllib sends it with
        # Content-Type: application/x-www-form-urlencoded by default
        with urllib.request.urlopen(request, data=tts_parameters.encode('utf-8')) as response:
            rescode = response.getcode()

            # Check if the request was successful
            if rescode == 200:
                print("Saving TTS WAV file...")
                response_body = response.read()
                file_path = 'speech.wav'
                with open(file_path, 'wb') as f:
                    f.write(response_body)
                return file_path
            print(f"Error Code: {rescode}")
    except urllib.error.HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx responses instead of returning them
        print(f"Error Code: {e.code}: {e.read().decode('utf-8', errors='replace')}")
    except Exception as e:
        print(f"An error occurred: {e}")
    return None
```



## Explanation

- **Function Definition**: The `get_audio` function converts the given text to speech using the Naver Cloud TTS API.

- **Preparing Data**: The input text is URL-encoded and appended to a form-encoded parameter string (`speaker`, `volume`, `speed`, `pitch`, `format`). These parameters can be adjusted as needed; `speaker=nara` selects the voice.

- **API Request Setup**: The request object is created with the TTS API URL, and the authentication headers are added. Passing `data` to `urlopen` makes the request a POST.

- **Error Handling**: The `try-except` block catches any exception raised while sending the request, reading the response, or writing the file, and prints it instead of crashing. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so API errors normally surface there rather than in the status-code check.

- **Response Processing**: If the response code is 200 (success), the audio data in the response body is saved as a WAV file and its path is returned. Otherwise, the function prints the error and returns `None`.

- **Saving the Audio File**: The received audio data is written to `speech.wav`, overwriting any previous file of that name. This file path can be changed as needed.
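Instead of concatenating the parameter string by hand, the form body can be built from a dictionary with `urllib.parse.urlencode`, which escapes every value. A sketch under that assumption; `build_tts_body` is a hypothetical helper, and the parameter names and defaults are the ones `get_audio` uses. `urlencode` encodes spaces as `+` rather than `%20`; both are standard in `application/x-www-form-urlencoded` bodies.

```python
from urllib.parse import urlencode


def build_tts_body(text, speaker="nara", volume=0, speed=0, pitch=0, fmt="wav"):
    """Return the form-encoded TTS request body as bytes."""
    params = {
        "speaker": speaker,  # voice name
        "volume": volume,
        "speed": speed,
        "pitch": pitch,
        "format": fmt,       # audio container of the response
        "text": text,        # escaped by urlencode, including non-ASCII text
    }
    return urlencode(params).encode("utf-8")
```

The result could be passed as `data=` to `urllib.request.urlopen` in place of `tts_parameters.encode('utf-8')`.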

# Audio Playback with PyAudio

This code block plays an audio file (specifically a WAV file) using the PyAudio library. In the AI Korean tutor, it plays the tutor's synthesized reply aloud.

```python
import os
import wave

import pyaudio

def play_audio(wav_file):
    # Open the .wav file
    wf = wave.open(wav_file, 'rb')

    # Create a PyAudio object
    p = pyaudio.PyAudio()

    # Open a stream with the correct settings for the .wav file
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)

    # Read and play the audio in chunks of 1024 frames
    chunk_size = 1024
    data = wf.readframes(chunk_size)
    while data:
        stream.write(data)
        data = wf.readframes(chunk_size)

    # Close the stream, the PyAudio object and the WAV file
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf.close()  # Close before deleting (an open file cannot be removed on Windows)

    # Delete the temporary TTS file after playback
    os.remove(wav_file)
```


## Explanation

- **Function Definition**: The `play_audio` function is defined to handle audio playback. It takes a path to a WAV file as its argument.

- **Opening the WAV File**: The WAV file is opened in read-binary mode (`'rb'`), which allows the wave module to read the audio data.

- **PyAudio Object**: A PyAudio object (`p`) is created to manage audio streams.

- **Audio Stream Creation**: An audio stream is opened with settings that match the audio file (format, channels, and frame rate). These settings are derived from the WAV file itself.

- **Chunked Audio Playback**: The audio is read and played in chunks of 1024 frames (a frame holds one sample per channel, so each chunk is `1024 * sampwidth * nchannels` bytes). Streaming in chunks avoids loading the entire file into memory.

- **Stream Management**: After the entire file is played, the audio stream and the PyAudio object are closed and terminated to free resources, and the WAV file is then deleted with `os.remove`, since it is only a temporary TTS output.
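Because `wave`'s `readframes(n)` counts frames, each chunk in `play_audio` is `n * sampwidth * nchannels` bytes. The sketch below checks this with an in-memory silent WAV file (the frame count, sample width and rate are arbitrary test values, and `chunk_sizes` is a hypothetical helper) and returns the byte size of each chunk read.

```python
import io
import wave


def chunk_sizes(n_frames=3000, sampwidth=2, channels=1, rate=16000, chunk=1024):
    """Write a silent WAV into memory, then read it back chunk by chunk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * (n_frames * sampwidth * channels))
    buf.seek(0)

    sizes = []
    with wave.open(buf, "rb") as wf:
        data = wf.readframes(chunk)  # same loop shape as play_audio
        while data:
            sizes.append(len(data))  # bytes = frames * sampwidth * channels
            data = wf.readframes(chunk)
    return sizes
```

For 3,000 mono 16-bit frames it yields two full chunks of 2,048 bytes and a final partial chunk of 1,904 bytes.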

# Main Function to Facilitate Conversation with the AI Tutor

The function `talk_to_gpt` orchestrates the process of converting user speech to text, obtaining a response from the AI tutor, and then converting this response back to speech.

```python
import json

# History list to keep track of the conversation
history = []

def talk_to_gpt(file_path):
    # Transcribe user speech to text
    user_transcript = get_transcript(file_path)
    if not user_transcript:
        print("No transcript; skipping this turn.")
        return

    # Get the GPT tutor's response to the user's transcript
    # Uses only the last 10 messages (5 user/assistant exchanges) for context
    gpt_response = get_gpt_response(user_transcript, history[-10:])

    # Parse the JSON-formatted response and extract the "response" field
    gpt_response = json.loads(gpt_response)['response']
    print(gpt_response)

    # Update the conversation history with user and assistant messages
    history.extend([
        {"role": "user", "content": user_transcript},
        {"role": "assistant", "content": gpt_response}
    ])

    # Synthesize the GPT tutor's response with TTS and play it
    audio_file_path = get_audio(gpt_response)
    if audio_file_path:
        play_audio(audio_file_path)
```

## Explanation

- **Speech-to-Text Conversion**: The `get_transcript` function is used to convert the user's speech (from the audio file at `file_path`) into text.

- **AI Response Generation**: The `get_gpt_response` function generates a response from the AI tutor based on the user's transcript and recent conversation history.

- **JSON Parsing**: The AI tutor's reply is a JSON string; `json.loads` parses it and the `response` field is extracted as the text to speak.

- **Conversation History Management**: The conversation history is updated with the latest user and assistant (AI tutor) messages. This history is used for context in subsequent interactions.

- **Text-to-Speech Playback**: The AI tutor's reply is synthesized into a `.wav` file by `get_audio` and then played aloud by `play_audio`.
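`talk_to_gpt` depends on the network, the microphone and the speakers, which makes it hard to test. A hypothetical variant, `run_turn`, receives each step as a function and the history as an argument, so the control flow can be exercised with stubs. It also skips the turn when transcription fails and skips playback when synthesis fails, since both helpers can return `None`.

```python
import json


def run_turn(audio_path, history, stt, chat, tts, play, max_messages=10):
    """One conversation turn with injected STT, chat, TTS and playback steps."""
    transcript = stt(audio_path)
    if not transcript:  # failed or empty transcription: nothing to answer
        return None
    # Only the most recent messages are sent as context
    reply = json.loads(chat(transcript, history[-max_messages:]))["response"]
    history.extend([
        {"role": "user", "content": transcript},
        {"role": "assistant", "content": reply},
    ])
    speech_path = tts(reply)
    if speech_path:  # synthesis can fail and return None
        play(speech_path)
    return reply
```

In the notebook it might be called as `run_turn(path, history, get_transcript, get_gpt_response, get_audio, play_audio)`.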

# Audio Recording Class for User Input

The `AudioRecorder` class encapsulates the functionality needed to record audio from the user, which can then be processed for speech-to-text conversion.

```python
import threading

import numpy as np
import sounddevice as sd
import wavio

class AudioRecorder:
    def __init__(self):
        self.is_recording = False      # Flag to control recording state
        self.audio_data = []           # List to store audio frames
        self.fs = 44100                # Sample rate (in Hz)
        self.channels = 1              # Number of audio channels
        self._thread = None            # Background recording thread

    def start_recording(self):
        self.is_recording = True
        self.audio_data = []
        # Start recording in a separate thread
        self._thread = threading.Thread(target=self.record)
        self._thread.start()

    def stop_recording(self):
        self.is_recording = False      # Ask the recording loop to exit
        if self._thread is not None:
            self._thread.join()        # Wait for the last chunk before saving

    def record(self):
        # Set up the audio input stream with 16-bit integer samples,
        # matching sampwidth=2 in save()
        with sd.InputStream(samplerate=self.fs, channels=self.channels,
                            dtype='int16') as stream:
            while self.is_recording:
                data, _ = stream.read(1024)  # Read 1024 frames from the input stream
                self.audio_data.append(data)  # Append data to the audio_data list

    def save(self, filename='output.wav'):
        # Save the recorded audio to a file
        if self.audio_data:
            wav_data = np.concatenate(self.audio_data, axis=0)  # Concatenate all audio frames
            wavio.write(filename, wav_data, self.fs, sampwidth=2)  # Write to WAV file
            print("Recording saved to", filename)
            return filename
        print("No recording data to save.")
        return None
```


## Explanation

- **Initialization**: Sets up initial variables like sample rate, channels, and recording state.

- **Start and Stop Recording**: Methods to control the start and stop of audio recording.

- **Multithreading for Recording**: Uses a separate thread to handle audio input, ensuring the main program remains responsive.

- **Audio Data Collection**: Continuously reads audio data from the microphone and stores it in a list.

- **Saving the Recording**: Concatenates the recorded audio frames and saves them as a WAV file. This file can then be used for further processing like speech-to-text.
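The threading pattern behind `start_recording` and `stop_recording` can be sketched independently of `sounddevice`. In this minimal version, `ChunkRecorder` is a hypothetical class and `read_chunk` a hypothetical stand-in for `stream.read`; a `threading.Event` plays the role of the `is_recording` flag, and `stop()` joins the worker so that no chunk is appended after the caller starts saving the data.

```python
import threading


class ChunkRecorder:
    """Collect chunks from read_chunk() on a background thread until stopped."""

    def __init__(self, read_chunk):
        self._read_chunk = read_chunk  # hypothetical source: one chunk per call
        self._stop = threading.Event()
        self._thread = None
        self.chunks = []

    def start(self):
        self._stop.clear()
        self.chunks = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            self.chunks.append(self._read_chunk())

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()  # wait for the in-flight read to finish
        return self.chunks
```

With a real input stream, each `read_chunk` call blocks until enough audio is available, so the join waits at most one chunk's duration.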


# Interactive Interface for Audio Recording and Processing

This section of the code creates an interactive interface using IPython widgets to control the audio recording and initiate conversation with the AI tutor.

```python
import os

import ipywidgets as widgets
from IPython.display import display

# Initialize the audio recorder
recorder = AudioRecorder()

# Create buttons for starting and stopping the recording
start_button = widgets.Button(description="Start Recording")
stop_button = widgets.Button(description="Stop Recording")

def on_start_clicked(b):
    # Function to handle start button click
    recorder.start_recording()  # Start recording audio
    print("Recording started...")

def on_stop_clicked(b):
    # Function to handle stop button click
    print("Recording stopped and saved.")
    recorder.stop_recording()  # Stop recording audio
    file_name = recorder.save()  # Save the recorded audio; None if nothing was recorded
    if file_name is None:
        return
    talk_to_gpt(file_name)  # Process the audio file through the AI tutor
    os.remove(file_name)  # Remove the temporary audio file

# Assign the click event handlers to the buttons
start_button.on_click(on_start_clicked)
stop_button.on_click(on_stop_clicked)

# Display the buttons in the Jupyter Notebook interface
display(start_button, stop_button)
```


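Sample output from one session. The student says "안녕하세요 만나서 반갑습니다" ("Hello, nice to meet you"), and the tutor replies "Hello! Nice to meet you too. Shall I help you practice Korean?". Lines such as `response ==` and `save TTS wav` come from print statements that differ from the code above.

```text
Recording started...
Recording stopped and saved.
Recording saved to output.wav
response == {"text":"안녕하세요 만나서 반갑습니다"}
response json == {'text': '안녕하세요 만나서 반갑습니다'}
user_transcript == 안녕하세요 만나서 반갑습니다
안녕하세요! 저도 만나서 반가워요. 한국어 연습을 도와드릴까요?
save TTS wav
```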


## Explanation

- **Button Widgets**: Two buttons are created using `ipywidgets` for starting and stopping the audio recording.

- **Event Handlers**: Functions `on_start_clicked` and `on_stop_clicked` are defined to handle the respective button clicks.
    - `on_start_clicked` starts the audio recording.
    - `on_stop_clicked` stops the recording, saves the audio, processes it through the AI tutor (`talk_to_gpt`), and then cleans up the temporary file.

- **Display Widgets**: The `display` function from `IPython.display` is used to render the buttons in the Jupyter Notebook.
