### Step 1: Understanding the interface
In this project, the goal is to create an interface that allows communication with a voice assistant, and a backend to manage the sending and receiving of responses.

The frontend uses HTML, CSS, and JavaScript with popular libraries: Bootstrap for basic styling, Font Awesome for icons, and jQuery for efficient handling of actions. The user interface is similar to other voice assistant applications, such as Google Assistant. The code for the interface is provided, since the focus of the course is on building the voice assistant and integrating it with various services and APIs. As you go through the provided code, you will learn about its important parts and how the frontend and backend interact, giving you a good understanding of how to create this simple web page.

Run the following commands in the terminal to clone the project outline and move into its directory:

```
git clone https://github.com/ibm-developer-skills-network/translator-with-voice-and-watsonx
cd translator-with-voice-and-watsonx
```

**HTML, CSS, and JavaScript**\
The `index.html` file defines the layout and structure of the web interface. It loads the external libraries (jQuery, Bootstrap, and Font Awesome icons), as well as the CSS (`style.css`) and JavaScript (`script.js`) files that control the styling and interactivity of the interface.

The `style.css` file is responsible for customizing the visual appearance of the page's components. It also handles the loading animation using CSS keyframes. Keyframes are a way of defining the values of an animation at various points in time, allowing for a smooth transition between different styles and creating dynamic animations.

The `script.js` file handles the page's interactivity and functionality. It contains most of the code, including the functions for switching between light and dark mode, sending messages, and displaying new messages on the screen. It also lets users record audio.


### Step 2: Understanding the server
The server is how the application will run and communicate with all our services. Flask is a web development framework for Python and can be used as a backend for the application. It is a lightweight and simple framework that makes it quick and easy to build web applications.

With Flask, you can create web pages and applications without requiring a lot of complex coding or using additional tools or libraries. You can create your own routes and handle user requests, and it also allows you to connect to external APIs and services to retrieve or send data.

This guided project uses Flask to handle the backend of your voice assistant. This means that you will be using Flask to create routes and handle HTTP requests and responses. When a user interacts with the voice assistant through the frontend interface, the request will be sent to the Flask backend. Flask will then process the request and send it to the appropriate service.

The code provided gives the outline for the server in the `server.py` file.

At the top of the file, there are several import statements. These bring in external libraries and modules that are used in the current file. For example, `speech_to_text` is a function defined in the `worker.py` file, while `ibm_watson_machine_learning` is a package that must be installed to use Watsonx's flan-ul2 model. These imports give you access to the functionality of those packages, modules, and libraries, making it easy to interact with the Speech-to-Text and flan-ul2 models in your code.

Underneath the imports, the Flask application is initialized and a CORS policy is set. A CORS (Cross-Origin Resource Sharing) policy allows or blocks web pages from making requests to a domain other than the one that served the page. Currently, it is set to `*`, which allows requests from any origin.

The `server.py` file consists of three functions defined as routes, plus the code that starts the server.

Replace the first route in `server.py` with the code below:

```
@app.route('/', methods=['GET'])
def index():
    return render_template('index.html')
```

**Function explanation**\
When a user loads the application, the browser first sends a GET request to the `/` endpoint, which triggers the `index` function above. The function returns `render_template('index.html')`, which renders the `index.html` file: the frontend interface.

The second and third routes will be used to process all requests and handle sending information between the applications.

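Finally, the application is started with the `app.run` command on port `8000`, with the host set to `0.0.0.0`. Unlike `localhost` (`127.0.0.1`), which only accepts connections from the same machine, `0.0.0.0` makes the server listen on all network interfaces, which is what allows the app to be reached from outside its Docker container.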

The next sections will take you through the process of completing the `process_message_route` and `speech_to_text_route` functions in this file and help you understand how to use the packages and endpoints.

### Step 3: Running the application
Docker allows for the creation of “containers” that package an application and its dependencies together. This allows the application to run consistently across different environments, as the container includes everything it needs to run. Additionally, using a Docker image to create and run applications can simplify the deployment process, as the image can be easily distributed and run on any machine that has Docker installed. This ensures that the application runs in the same way in development, testing, and production environments.

The `git clone` from Step 1 already includes a `Dockerfile` and a `requirements.txt` for this application. These files are used to build an image with the dependencies already installed. The `Dockerfile` is short: it creates a Python environment, copies all the files from the local directory into the container, installs the required packages, and then starts the application with the `python` command.

Three different containers need to run simultaneously for the application to work with the Text-to-Speech and Speech-to-Text capabilities.

**Starting the application**\
This image is quick to build because the application is small. The commands below first build the image (running the instructions in the `Dockerfile`) and tag (name) it `voice-translator-powered-by-watsonx`, then run it in the foreground, mapping port `8000`. You'll need to run these commands again every time you change one of the files.

```
docker build . -t voice-translator-powered-by-watsonx
docker run -p 8000:8000 voice-translator-powered-by-watsonx
```

### Step 4: Integrating Watsonx API
It's time to give your voice assistant a brain! With the power of Watsonx's API, we can pass the transcribed text and receive responses that answer your questions.

**Authenticating for programmatic access**\
In this project, you do not need to add your own `Watsonx_API` and `Project_id` to the `worker.py` code below. You can simply set `project_id="skills-network"` and leave `Watsonx_API` blank, because in this Cloud IDE environment you have already been granted access to the API without your own `Watsonx_API` and `Project_id`.

**But it's important to note that this access method is exclusive to this Cloud IDE environment. If you are interested in using the model/API outside this environment (for example, in a local environment), detailed instructions and further information are available in this [tutorial](https://medium.com/the-power-of-ai/ibm-watsonx-ai-the-interface-and-api-e8e1c7227358).**

```
# To call watsonx's LLM, we need to import the IBM Watson Machine Learning library
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes, DecodingMethods
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

# Placeholders for Watsonx_API and Project_id in case you need to use the code outside this environment
Watsonx_API = "Your WatsonX API"
Project_id = "Your Project ID"

# Define the credentials
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    # "apikey": "API_KEY"  # needed outside this Cloud IDE environment
}

# Define the project id
# project_id = "PROJECT_ID"
project_id = "skills-network"

# Specify model_id that will be used for inferencing
model_id = ModelTypes.FLAN_UL2

# Define the model parameters
parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 1024
}

# Define the LLM
model = Model(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id
)
```

**Watsonx process message function**\
We will be updating the function called `watsonx_process_message`, which will take in a prompt and pass it to Watsonx's flan-ul2 API to receive a response. Essentially, it's the equivalent of pressing the send button to get a response from ChatGPT.

Go ahead and update the `watsonx_process_message` function in the `worker.py` file with the following.
```
def watsonx_process_message(user_message):
    # Set the prompt for Watsonx API
    prompt = f"""Respond to the query: ```{user_message}```"""
    response_text = model.generate_text(prompt=prompt)
    print("watsonx response:", response_text)
    return response_text
```

**Prompt refinement**\
We can further optimize our translation assistant. Since this is a translator, users shouldn't have to type "translate" every time. To address this, we make the prompt in the `watsonx_process_message` function more explicit.

For example, to focus on translating sentences from English into Spanish, replace the prompt in the function with this:

```
prompt = f"""You are an assistant helping translate sentences from English into Spanish.
    Translate the query to Spanish: ```{user_message}```."""
```

This revised prompt makes it evident that the user intends to translate a sentence into Spanish, eliminating the need to explicitly mention "translate."

If your translation needs to involve languages other than Spanish, you can easily adapt the prompt. Simply replace "Spanish" in the prompt with the name of your required target language. This modification simplifies the user interaction and ensures that the translator remains user-friendly for various language pairs.
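If you switch target languages often, the prompt can be built by a small helper instead of being edited by hand. The sketch below is an assumption, not part of the provided project: `build_translation_prompt` and its `target_language` parameter are hypothetical names, and the wording simply mirrors the Spanish prompt above with the language substituted in.

```python
def build_translation_prompt(user_message, target_language="Spanish"):
    # Hypothetical helper: same wording as the Spanish prompt,
    # with the target language passed in instead of hard-coded.
    return f"""You are an assistant helping translate sentences from English into {target_language}.
    Translate the query to {target_language}: ```{user_message}```."""


print(build_translation_prompt("Good morning", "French"))
```

Inside `watsonx_process_message`, the prompt line would then become something like `prompt = build_translation_prompt(user_message, "French")`.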

**Function explanation**\
The function is really simple, thanks to the easy-to-use `ibm_watson_machine_learning` library.

We call Watsonx's API with the `model.generate_text` function, passing in the prompt we need a response for. Remember that `model` refers to the LLM we defined earlier, including its generation `parameters`.

You can tweak these parameters to suit your needs and learn more about them in the IBM Watsonx Prompt Lab, where you can test all parameters in real time, as shown below:
![image](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0PPIEN/params.gif)

Finally, we return the `response_text` which stores the answer to our prompt.

### Step 5: Integrating Watson Speech-to-Text
Speech-to-Text functionality is a technology that converts speech into text using machine learning. It is useful for accessibility, productivity, convenience, multilingual support, and cost-effective solutions for a wide range of applications. For example, being able to take a user's voice as input for a chat application.

Using the Watson Speech-to-Text service available in this environment, you can convert speech to text with a simple API call. The result can then be passed to the Watsonx API to generate a response.

**Starting Speech-to-Text**\
Skills Network provides its own Watson Speech-to-Text image that is run automatically in this environment. To access it, use this endpoint URL: `https://sn-watson-stt.labs.skills.network`

You can test that it works by running this query:

`curl https://sn-watson-stt.labs.skills.network/speech-to-text/api/v1/models`

You should see a list of the models (languages) it can recognize. An example output is shown below.

```
{
   "models": [
       {
         "name": "es-LA_Telephony",
         "language": "es-LA",
         "description": "Latin American Spanish telephony model for narrowband audio (8kHz)",
          ...
      },
      {
         "name": "en-US_Multimedia",
         "language": "en-US",
         "description": "US English multimedia model for broadband audio (16kHz or more)",
          ...
      }
   ]
}
```

Next, test the service with a `/recognize` request. First, download an example audio file with this command:

`curl "https://github.com/watson-developer-cloud/doc-tutorial-downloads/raw/master/speech-to-text/0001.flac" -sLo example.flac`

Send the audio file to the service:

`curl "https://sn-watson-stt.labs.skills.network/speech-to-text/api/v1/recognize" --header "Content-Type: audio/flac" --data-binary @example.flac`

Example response:

```
{
   "result_index": 0,
   "results": [
      {
         "final": true,
         "alternatives": [
            {
               "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through colorado on sunday ",
               "confidence": 0.99
            }
         ]
      }
   ]
}
```

To use a different model, add the `model` query parameter to the request. The audio format can also be changed as long as the `Content-Type` header matches. For example:

```
curl "https://sn-watson-stt.labs.skills.network/speech-to-text/api/v1/recognize?model=es-LA_Telephony" --header "Content-Type: audio/flac" --data-binary @example.flac

{
   "result_index": 0,
   "results": [
      {
         "final": true,
         "alternatives": [
            {
               "transcript": "s ",
               "confidence": 0.39
            }
         ]
      }
   ]
}
```
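The `params` dictionary passed to `requests.post` in the `speech_to_text` function below ends up as this same `model` query parameter. As a rough standard-library illustration, the sketch builds the recognize URL the way the curl command above does; `recognize_url` is a hypothetical helper, and the base URL is the Skills Network endpoint from this step.

```python
from urllib.parse import urlencode

STT_BASE_URL = "https://sn-watson-stt.labs.skills.network"


def recognize_url(model, base_url=STT_BASE_URL):
    # Hypothetical helper: append the model as a query parameter, much like
    # requests encodes params={'model': ...} onto the request URL.
    query = urlencode({"model": model})
    return f"{base_url}/speech-to-text/api/v1/recognize?{query}"


print(recognize_url("es-LA_Telephony"))
```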


**Implementation**\
We will be updating a function called `speech_to_text` in the `worker.py` file that takes in audio data received from the browser and passes it to the Watson Speech-to-Text API.

The `speech_to_text` function will take in audio data as a parameter, make an API call to the Watson Speech-to-Text API using the requests library, and return the transcription of the audio data.

Remember to replace the `...` for the `base_url` variable with the URL for your Speech-to-Text model (for example, `https://sn-watson-stt.labs.skills.network`).

```
import requests

def speech_to_text(audio_binary):

    # Set up Watson Speech-to-Text HTTP API URL
    base_url = '...'
    api_url = base_url + '/speech-to-text/api/v1/recognize'

    # Set up parameters for our HTTP request
    params = {
        'model': 'en-US_Multimedia',
    }

    # Set up the body of our HTTP request
    body = audio_binary

    # Send an HTTP POST request
    response = requests.post(api_url, params=params, data=body).json()

    # Parse the response to get our transcribed text
    text = 'null'
    if response.get('results'):
        print('Speech-to-Text response:', response)
        text = response.get('results').pop().get('alternatives').pop().get('transcript')
        print('recognised text: ', text)
    return text
```

**Function explanation**\
The requests library imported at the top of our `worker.py` file is a simple HTTP request library that we will be using to make API calls to the Watson Speech-to-Text API.

The function simply takes `audio_binary` as the only parameter and then sends it in the body of the HTTP request.

To make an HTTP POST request to the Watson Speech-to-Text API, we need the following three elements:

* **URL** of the API: This is defined as `api_url` in our code and points to Watson's Speech-to-Text service
* **Parameters**: This is defined as `params` in our code. It's just a dictionary having one key-value pair i.e. `'model': 'en-US_Multimedia'` which tells Watson that we want to use the US English model for processing our speech
* **Body** of the request: this is defined as `body` and is equal to `audio_binary` since we are sending the audio data inside the body of our POST request.

We then use the requests library to send this HTTP request, passing in the URL, params, and data (body), and then call `.json()` to parse the API's JSON response into a Python dictionary, which is easy to work with.

The structure of the response looks like this (compare it with the example response earlier in this step):
```
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "Recognised text from your speech"
        }
      ]
    }
  ]
}
```

Therefore, we check whether the response contains any results. If it does, we take the last item in `results`, then the last item in its `alternatives` list, and read its `transcript` string. Then we return this text.
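The parsing step can be tried offline against the example response from earlier in this step. The sketch below is a variant, not the project's code: `extract_transcript` is a hypothetical helper that joins the top (first) alternative of every result, since longer recordings can be split into several results, instead of taking only the last one, and it returns `None` when there are no results.

```python
import json
from typing import Optional


def extract_transcript(response: dict) -> Optional[str]:
    # Hypothetical helper: collect the best (first) alternative of every result.
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    # No results usually means no speech was recognised.
    return " ".join(parts) if parts else None


example = json.loads("""
{
  "result_index": 0,
  "results": [
    {"final": true,
     "alternatives": [{"transcript": "several tornadoes touched down as a line of severe thunderstorms swept through colorado on sunday ",
                       "confidence": 0.99}]}
  ]
}
""")
print(extract_transcript(example))
print(extract_transcript({"results": []}))  # None
```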

*Small tip: notice the print statements such as `print('Speech-to-Text response:', response)`. It's always a good idea to print out the data you receive from an external source, such as an API in this case, as it really helps with debugging if something goes wrong.*

### Step 6: Integrating Watson Text-to-Speech
Time to give your assistant a voice using Text-to-Speech functionality.

Once we have processed the user's message using Watsonx, we'll add the final worker function, which converts that response to speech. The assistant then reads the response out loud to you for a more personal feel, just like other virtual assistants such as Google Assistant, Alexa, and Siri.

**Starting Text-to-Speech**\
Skills Network provides its own Watson Text-to-Speech image that is run automatically in this environment. To access it, use this endpoint URL: `https://sn-watson-tts.labs.skills.network`

You can test that it works by running this query:

`curl https://sn-watson-tts.labs.skills.network/text-to-speech/api/v1/voices`

You should see a list of the voices this service can use. An example output is shown below.

```
{
   "voices": [
      {
         "name": "en-US_OliviaV3Voice",
         "language": "en-US",
         "gender": "female",
         "description": "Olivia: American English female voice. Dnn technology.",
         ...
      },
      {
         "name": "es-ES_EnriqueV3Voice",
         "language": "es-ES",
         "gender": "male",
         "description": "Enrique: Castilian Spanish (español castellano) male voice. Dnn technology.",
         ...
      },
      ...
   ]
}
```

Next, try sending an example text (for example, "Hello world") in JSON format to the `/synthesize` endpoint. The command below saves the returned audio as "output.wav" in the current directory ("translator-with-voice-and-watsonx"):

`curl "https://sn-watson-tts.labs.skills.network/text-to-speech/api/v1/synthesize" --header "Content-Type: application/json" --data '{"text":"Hello world"}' --header "Accept: audio/wav" --output output.wav`

To use a different voice, add the `voice` query parameter to the request. To change the audio format, change the `Accept` header. For example:

`curl "https://sn-watson-tts.labs.skills.network/text-to-speech/api/v1/synthesize?voice=es-LA_SofiaV3Voice" --header "Content-Type: application/json" --data '{"text":"Hola! Hoy es un dia muy bonito."}' --header "Accept: audio/mp3" --output hola.mp3`

After executing the above command, you'll find the output file named "hola.mp3" in the "translator-with-voice-and-watsonx" directory.

**Text-to-Speech function**\
In the `worker.py` file, the `text_to_speech` function sends text to Watson's Text-to-Speech API and returns the spoken audio.

This function is similar to `speech_to_text`, as we use the requests library again to make an HTTP request. Again, remember to replace the `...` for the `base_url` variable with the URL for your Text-to-Speech model (for example, `https://sn-watson-tts.labs.skills.network`).

**Function explanation**\
The function takes `text` and `voice` as parameters. It adds `voice` as a query parameter to the `api_url` if it is neither empty nor the default, and sends the `text` in the body of the HTTP request.

As before, to make an HTTP POST request to the Watson Text-to-Speech API, we need the following three elements:

* **URL** of the API: This is defined as `api_url` in our code and points to Watson's Text-to-Speech service. This time we also append a `voice` parameter to the `api_url` if the user has sent a preferred voice in their request.
* **Headers**: This is defined as `headers` in our code. It's a dictionary with two key-value pairs. The first, `'Accept': 'audio/wav'`, tells Watson that we want the synthesized audio back in WAV format. The second, `'Content-Type': 'application/json'`, tells Watson that the body we send is JSON.
* **Body** of the request: This is defined as `json_data` and is a dictionary containing the `'text': text` key-value pair; this text is what gets converted to speech.
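Because the `text_to_speech` code itself is not reproduced here, the sketch below shows one way to assemble the three elements just described, using only the standard library so it can be checked without a network call. `build_tts_request` is a hypothetical helper, not the project's code; the headers and the rule of skipping an empty or `"default"` voice follow the description above, and the base URL is the Skills Network endpoint from this step.

```python
from urllib.parse import urlencode

TTS_BASE_URL = "https://sn-watson-tts.labs.skills.network"


def build_tts_request(text, voice="", base_url=TTS_BASE_URL):
    # Hypothetical helper: returns (url, headers, json_data) for /synthesize.
    url = base_url + "/text-to-speech/api/v1/synthesize"
    # Only add a voice when the user picked one other than the default.
    if voice and voice != "default":
        url += "?" + urlencode({"voice": voice})
    headers = {
        "Accept": "audio/wav",               # format of the audio we want back
        "Content-Type": "application/json",  # the body we send is JSON
    }
    json_data = {"text": text}
    return url, headers, json_data


url, headers, json_data = build_tts_request("Hola", "es-LA_SofiaV3Voice")
print(url)
```

The three values would then be sent with a call such as `requests.post(url, headers=headers, json=json_data)`.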

We then use the requests library to send this HTTP request, passing in the URL, headers, and JSON body. Unlike the Speech-to-Text response, this response is not JSON, so we don't call `.json()` on it.

The body of the response is the binary audio data (WAV, as requested by the `Accept` header), which the requests library exposes as `response.content`.

Therefore, we return `response.content`, which contains the audio data received.



### Step 7: Putting everything together by creating Flask API endpoints
Now by using the functions we defined in the previous sections, we can connect everything and complete the assistant.

The changes in this section will be for the `server.py` file.

The outline has already taken care of the imports for the functions from the `worker.py` file to the `server.py` file. This allows the `server.py` file to access these imported functions from the `worker.py` file.

`from worker import speech_to_text, text_to_speech, watsonx_process_message`

Now we will update two Flask routes: one for converting the user's speech to text (`speech_to_text_route`), and the other for processing their message and converting Watsonx's response back to speech (`process_message_route`).

**Speech-to-Text route**\
This function is simple: it converts the user's speech to text using the `speech_to_text` function we defined in a previous section and returns the result. Replace the `speech_to_text_route` function with the code below:

```
@app.route('/speech-to-text', methods=['POST'])
def speech_to_text_route():
    print("processing Speech-to-Text")
    audio_binary = request.data # Get the user's speech from their request
    text = speech_to_text(audio_binary) # Call speech_to_text function to transcribe the speech

    # Return the response to user in JSON format
    response = app.response_class(
        response=json.dumps({'text': text}),
        status=200,
        mimetype='application/json'
    )
    print(response)
    print(response.data)
    return response
```

**Function explanation**\
We start by storing the `request.data` in a variable called `audio_binary`, as we are sending the binary data of audio in the body of the request from the frontend. Then we use our previously defined function `speech_to_text` and pass in the `audio_binary` as a parameter to it. We store the return value in a new variable called text.

As our frontend expects a JSON response, we create one with Flask's `app.response_class` (the `Response` class used by the app), passing in three arguments:

1. **response**: This is the actual data that we want to send in the body of our HTTP response. We will be using `json.dumps` function and will pass in a simple dictionary containing only one key-value pair -`'text': text`
2. **status**: This is the status code of the HTTP response; we will set it to 200 which essentially means the response is OK and that the request has succeeded.
3. **mimetype**: This is the media type of our response; for JSON it is written as `'application/json'` in HTTP requests and responses.

We then return the response.

**Process message route**\
This function accepts a user's message in text form along with their preferred voice. It uses our previously defined helper functions to call the Watsonx API to process the message, converts the response to speech using Watson's Text-to-Speech API, and returns both back to the user. Replace the `process_message_route` function with the code below:

```
@app.route('/process-message', methods=['POST'])
def process_message_route():
    user_message = request.json['userMessage'] # Get user's message from their request
    print('user_message', user_message)

    voice = request.json['voice'] # Get user's preferred voice from their request
    print('voice', voice)

    # Call watsonx_process_message function to process the user's message and get a response back
    watsonx_response_text = watsonx_process_message(user_message)

    # Clean the response to remove any empty lines
    watsonx_response_text = os.linesep.join([s for s in watsonx_response_text.splitlines() if s])

    # Call our text_to_speech function to convert the Watsonx API's response to speech
    watsonx_response_speech = text_to_speech(watsonx_response_text, voice)

    # Convert watsonx_response_speech to a base64 string so it can be sent back in the JSON response
    watsonx_response_speech = base64.b64encode(watsonx_response_speech).decode('utf-8')

    # Send a JSON response back to the user containing their message's response in both text and speech formats
    response = app.response_class(
        response=json.dumps({"watsonxResponseText": watsonx_response_text, "watsonxResponseSpeech": watsonx_response_speech}),
        status=200,
        mimetype='application/json'
    )

    print(response)
    return response
```

**Function explanation**\
We will start by storing the user's message in `user_message` by using `request.json['userMessage']`. Similarly, we will also store the user's preferred voice in `voice` by using `request.json['voice']`.

We will then use the helper function we defined earlier to process the user's message by calling `watsonx_process_message(user_message)` and storing the response in `watsonx_response_text`. We then clean this response to remove any empty lines with a one-line list comprehension: `os.linesep.join([s for s in watsonx_response_text.splitlines() if s])`.
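To see what the cleaning one-liner does, the sketch below runs it on a made-up multi-line string. It also shows a hypothetical stricter variant, not used in the project, that additionally drops lines containing only whitespace, which the original `if s` check keeps. Note that `os.linesep` is `"\n"` on Linux (as inside the Docker container) but `"\r\n"` on Windows.

```python
import os

raw = "Hello.\n\nHow are you?\n   \nGoodbye."

# The one-liner from process_message_route: drop lines that are empty strings.
cleaned = os.linesep.join([s for s in raw.splitlines() if s])

# Hypothetical stricter variant: also drop whitespace-only lines.
stricter = os.linesep.join([s for s in raw.splitlines() if s.strip()])

print(repr(cleaned))
print(repr(stricter))
```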

Once we have this response cleaned, we will now use another helper function we defined earlier to convert it to speech. Therefore, we will call `text_to_speech` and pass in the two required parameters which are `watsonx_response_text` and `voice`. We will store the function's return value in a variable called `watsonx_response_speech`.

Because `watsonx_response_speech` contains binary audio data (bytes), we can't put it directly inside JSON, which can only store text. Instead, we use base64 encoding, which converts any binary data to a textual representation. So we simply use `base64.b64encode(watsonx_response_speech).decode('utf-8')` and store the result back in `watsonx_response_speech`.
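The round trip can be checked with the standard library alone. The sketch below uses a few stand-in bytes rather than real audio: it shows that `json.dumps` rejects raw bytes, that the base64 string embeds cleanly, and that whatever receives the JSON can reverse the steps to recover the original bytes.

```python
import base64
import json

audio_bytes = b"RIFF\x24\x00\x00\x00WAVE"  # stand-in bytes, not real audio

# Raw bytes cannot be serialized to JSON.
try:
    json.dumps({"watsonxResponseSpeech": audio_bytes})
except TypeError as err:
    print("bytes are not JSON serializable:", err)

# Encode bytes -> base64 bytes -> str, which JSON can hold.
encoded = base64.b64encode(audio_bytes).decode("utf-8")
payload = json.dumps({"watsonxResponseSpeech": encoded})

# The receiver reverses the steps to get the original bytes back.
decoded = base64.b64decode(json.loads(payload)["watsonxResponseSpeech"])
print(decoded == audio_bytes)  # True
```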

Now we have everything ready for our response, so we use the same `app.response_class` and pass in the three required arguments. The `status` and `mimetype` are exactly the same as in `speech_to_text_route`. For the `response`, we use `json.dumps` as before and pass in a dictionary containing `"watsonxResponseText": watsonx_response_text` and `"watsonxResponseSpeech": watsonx_response_speech`.

We then return the `response`.


### Step 8: Testing your personal assistant
The assistant is now complete and ready to use.

Now that we've updated the code considerably, it is a good time to rebuild our Docker image and test that it's working as expected in this environment.

Assuming the Speech-to-Text and Text-to-Speech URLs (`base_url`) are set correctly, you just need to rebuild the image for the application and rerun it so it includes all the latest changes.

This step assumes that you have no running container for the application. If you do, press Ctrl+C to stop the container.

```
docker build . -t voice-translator-powered-by-watsonx
docker run -p 8000:8000 voice-translator-powered-by-watsonx
```

