## Requirements

The ffmpeg executable must be installed to run this script. Follow the installation steps in the repository of the <a href='https://github.com/jiaaro/pydub?tab=readme-ov-file#installation'>pydub library</a>.
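
Before running the notebook, it can help to confirm that the executable is actually reachable. The following is a minimal sketch using only the standard library; `find_executable` is a hypothetical helper name, and checking `ffprobe` as well is an assumption (it normally ships alongside ffmpeg).

```python
import shutil

def find_executable(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)

# Report whether each tool can be found from the current environment.
for tool in ("ffmpeg", "ffprobe"):
    path = find_executable(tool)
    print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, revisit the installation steps linked above and make sure its folder is on `PATH`.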

If calling `play()` raises an error, open the .py file that defines `play` and replace line 14 of the code with the following:
```{python}
with NamedTemporaryFile("w+b", suffix=".wav", delete=False) as f:
```
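
One plausible reason this edit helps, stated here as an assumption rather than something the pydub repository confirms: with the default `delete=True`, the temporary file is removed as soon as it is closed, and on Windows it usually cannot be reopened by another process (such as an audio player) while it is still open. With `delete=False` the file survives the `with` block and has to be removed by hand. A minimal standard-library sketch of that pattern, where `write_temp_wav` is a hypothetical helper:

```python
import os
from tempfile import NamedTemporaryFile

def write_temp_wav(data: bytes) -> str:
    """Write bytes to a temporary .wav file that is NOT deleted on close."""
    with NamedTemporaryFile("w+b", suffix=".wav", delete=False) as f:
        f.write(data)
    # The file is closed here but still exists, so another program can open it.
    return f.name

path = write_temp_wav(b"placeholder bytes, not real audio")
print(os.path.exists(path))

# With delete=False, cleanup is the caller's responsibility.
os.remove(path)
```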

## Installing the libraries

In [None]:
!pip install speechrecognition
!pip install ibm_watson_machine_learning
!pip install langchain
!pip install python-dotenv
!pip install ipython
!pip install ipykernel
!pip install gtts
!pip install pydub
!pip install langchain-community

## Loading the libraries

In [1]:
import speech_recognition as sr
from gtts import gTTS
from pydub import AudioSegment
from pydub.playback import play

from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM


None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
To install langchain-ibm run `pip install -U langchain-ibm`.


## Defining environment variables

In [2]:
import getpass
import os
from dotenv import load_dotenv
load_dotenv('../.env')

try:
    REGION = os.environ["RUNTIME_ENV_REGION"]
except KeyError:
    # Set your region here if you are not running this notebook in the watsonx.ai Jupyter environment
    # us-south, eu-de, etc.
    REGION = "us-south"

try:
    api_key = os.environ["api_key"]
except KeyError:
    # Enter api key here if not running this notebook in the watsonx.ai Jupyter environment
    api_key = getpass.getpass("Please enter your WML api key (hit enter): ")

credentials = {
    "url": "https://" + REGION + ".ml.cloud.ibm.com",
    "apikey": api_key
}


In [3]:
import os

try:
    project_id = os.environ["project_id"]
except KeyError:
    # Enter project ID here if not running this notebook in the watsonx.ai Jupyter environment
    project_id = getpass.getpass("Please enter your WML project_id (hit enter): ")

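The cells above fall back to an interactive `getpass` prompt when a value is missing. For non-interactive runs, one option is to fail early and name the missing settings instead. This is only a sketch under that assumption: `load_settings` is a hypothetical helper, and the values passed in the example are placeholders, not real credentials.

```python
import os

def load_settings(env=os.environ):
    """Build the credentials dict and project id, failing early if a value is missing."""
    # Same default region as the notebook cell above.
    region = env.get("RUNTIME_ENV_REGION", "us-south")
    missing = [key for key in ("api_key", "project_id") if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    credentials = {
        "url": f"https://{region}.ml.cloud.ibm.com",
        "apikey": env["api_key"],
    }
    return credentials, env["project_id"]

# A plain dict stands in for the environment here (placeholder values).
demo_credentials, demo_project_id = load_settings(
    {"api_key": "placeholder-key", "project_id": "placeholder-project"}
)
print(demo_credentials["url"])
```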

We process the audio and use Google's speech recognition model to transcribe it to text.

In [4]:
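def convert_audio_to_wav(ogg_file, wav_file, format='ogg'):
    """Convert an audio file (OGG by default) to WAV using pydub."""
    audio = AudioSegment.from_file(ogg_file, format=format)
    audio.export(wav_file, format="wav")

def audio_to_text(audio_file):
    """Transcribe a WAV file to Spanish text; return an error message on failure."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        # Read the whole file before sending it to the recognizer.
        audio_data = recognizer.record(source)
        try:
            text = recognizer.recognize_google(audio_data, language='es-ES')
            return text
        except sr.UnknownValueError:
            # The speech could not be understood.
            return "No se pudo entender el audio"
        except sr.RequestError as e:
            # The recognition service could not be reached or refused the request.
            return f"Error de solicitud; {e}"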

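To sanity-check the file that `convert_audio_to_wav` writes before handing it to the recognizer, the standard-library `wave` module can report its basic properties. The sketch below is an illustration only: `describe_wav` and `write_silence` are hypothetical helpers, and a one-second silent file stands in for a real recording.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return channel count, sample rate (Hz) and duration (s) of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }

def write_silence(path, seconds=1, rate=16000):
    """Write a mono, 16-bit silent WAV file (stand-in for a real recording)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)

demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path)
print(describe_wav(demo_path))
```

In the notebook, `describe_wav("data/audio.wav")` could be called right after the conversion step; a duration of zero would suggest the conversion did not produce usable audio.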

We define the LLM that generates a response to the transcribed audio, then convert that response to audio with Google's text-to-speech library.

In [12]:
def get_model(model_id, credentials, project_id):
    # Sampling parameters: short answers (at most 50 new tokens) that stop at the first period.
    parameters = {
        GenParams.DECODING_METHOD: DecodingMethods.SAMPLE,
        GenParams.MAX_NEW_TOKENS: 50,
        GenParams.MIN_NEW_TOKENS: 1,
        GenParams.TEMPERATURE: 0.5,
        GenParams.TOP_K: 50,
        GenParams.TOP_P: 1,
        GenParams.STOP_SEQUENCES: ["."],
    }
    model = Model(
        model_id=model_id,
        params=parameters,
        credentials=credentials,
        project_id=project_id,
    )

    # Wrap the watsonx model so it exposes the LangChain LLM interface.
    llm = WatsonxLLM(model=model)
    return llm

def generate_response(model, prompt):
    response = model.invoke(prompt)
    return response

def text_to_audio(text, output_file):
    tts = gTTS(text=text, lang='es')
    tts.save(output_file)


llm = get_model(ModelTypes.MIXTRAL_8X7B_INSTRUCT_V01_Q, credentials, project_id)

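The `GenParams.STOP_SEQUENCES: ["."]` setting is what keeps answers to a single sentence. The pure-Python sketch below only illustrates the idea of cutting text at the first stop sequence; it is not how watsonx.ai implements it. `truncate_at_stop` is a hypothetical helper, and keeping the stop sequence in the result is an assumption chosen to match the response printed at the end of this notebook, which ends with a period.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut `text` right after the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            # Keep the stop sequence itself in the result.
            cut = min(cut, idx + len(stop))
    return text[:cut]

sample = "Generative AI creates new content. It is trained on large datasets."
print(truncate_at_stop(sample, ["."]))
```

A side effect worth keeping in mind is that a period inside an abbreviation or a decimal number would also end the answer early.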


We combine all the previous functions and play the generated audio.

In [13]:
def process_audio(origin_file, wav_file, llm, response_audio_file):
    convert_audio_to_wav(origin_file, wav_file, format='ogg')
    text = audio_to_text(wav_file)
    print(f"Texto reconocido: {text}")
    
    response = generate_response(llm, text)
    print(f"Respuesta generada: {response}")
    
    text_to_audio(response, response_audio_file)
    response_audio = AudioSegment.from_mp3(response_audio_file)
    
    play(response_audio)
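
Note that `process_audio` passes whatever `audio_to_text` returns straight to the LLM, including the messages "No se pudo entender el audio" and "Error de solicitud; ...". One possible variant, sketched here with stand-in callables so it runs without audio files, network access, or credentials, skips the LLM when transcription failed. `run_pipeline` and `ERROR_PREFIXES` are hypothetical names, not part of the notebook.

```python
# Prefixes of the error messages returned by audio_to_text in this notebook.
ERROR_PREFIXES = ("No se pudo entender el audio", "Error de solicitud")

def run_pipeline(audio_path, transcribe, generate, speak):
    """Transcribe, generate and speak; skip the LLM when transcription failed."""
    text = transcribe(audio_path)
    if text.startswith(ERROR_PREFIXES):
        print(f"Transcription failed: {text}")
        return None
    response = generate(text)
    speak(response)
    return response

# Stand-ins for audio_to_text, generate_response and text-to-speech playback.
spoken = []
result = run_pipeline(
    "data/audio.wav",
    transcribe=lambda path: "qué es la Inteligencia Artificial generativa",
    generate=lambda prompt: f"(respuesta a: {prompt})",
    speak=spoken.append,
)
print(result)
```

Passing the steps as arguments also makes it straightforward to test the flow with stubs before wiring in the real functions.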

In [14]:
# Names of the input OGG file, the intermediate WAV file, and the output MP3 file
origin_file = "data/audio.ogg"
wav_file = "data/audio.wav"
response_audio_file = "data/response.mp3"

# Process the audio
process_audio(origin_file=origin_file, wav_file=wav_file, llm=llm, response_audio_file=response_audio_file)

Texto reconocido: qué es la Inteligencia Artificial generativa
Respuesta generada: 

La Inteligencia Artificial generativa (IA generativa) es una subcategoría de la Inteligencia Artificial (IA) que se centra en la creación de contenido nuevo y original.
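
The generated response above starts with blank lines, and a model could in principle return only whitespace, which a text-to-speech call is likely to reject. A small cleanup step before `text_to_audio` may help. This is a sketch under those assumptions: `prepare_for_tts` and its fallback message are hypothetical.

```python
import re

def prepare_for_tts(text, fallback="Lo siento, no tengo una respuesta."):
    """Collapse runs of whitespace and fall back to a fixed message if nothing is left."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    return cleaned if cleaned else fallback

print(prepare_for_tts("\n\nLa Inteligencia Artificial generativa es una subcategoría de la IA.\n"))
```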
