# Voice chatbot with Deep Learning

Note: this notebook is meant to be run in Google Colaboratory.

## 1 - The problem to solve

<img src="imagenes_chatbot/idea_general_chatbot.png">

## 2 - Chatbot components

We will use *wav2vec2* for speech-to-text conversion and *BlenderBot* to generate the conversation:

<img src="imagenes_chatbot/chatbot_detallado.png">

Both *wav2vec2* and *BlenderBot* are based on [Transformer networks](https://youtu.be/Wp8NocXW_C4):

<img src="imagenes_chatbot/red-transformer.png">
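The central operation inside every Transformer layer is attention: each position computes a weighted average of value vectors, with weights given by the softmax of scaled query-key dot products, `softmax(Q @ K.T / sqrt(d_k)) @ V`. A minimal numpy sketch of single-head scaled dot-product attention (no masking, no multiple heads and no learned projection matrices, all of which real Transformer layers add):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    Returns the output (n_queries, d_v) and the attention weights (n_queries, n_keys).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Subtract the row maximum before exponentiating, for numerical stability
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V, weights

# Toy self-attention: 4 tokens with 8-dimensional embeddings used as Q, K and V
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(x, x, x)
print(out.shape, w.shape)  # (4, 8) (4, 4)
```

Each output row is a mixture of all value vectors, which is what lets every position "look at" the whole sequence in a single layer.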

## 3 - Speech-to-text conversion with *wav2vec2*

[*wav2vec2*](https://arxiv.org/pdf/2006.11477.pdf) was developed by Facebook in 2020:

<img src="imagenes_chatbot/wav2vec2.png">
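wav2vec2 first turns the raw waveform into a sequence of latent frames with a stack of 1-D convolutions (the feature encoder); the Transformer and the CTC output layer then work frame by frame. The sketch below assumes the kernel sizes and strides of the default `Wav2Vec2Config` in Hugging Face Transformers and applies the output-length formula of an unpadded convolution layer by layer. The overall stride is 5·2⁶ = 320 samples, i.e. roughly one frame every 20 ms at 16 kHz:

```python
# Assumed feature-encoder configuration (defaults of Hugging Face's Wav2Vec2Config)
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)

def num_frames(num_samples: int) -> int:
    """Number of feature-encoder output frames for a waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Output length of a 1-D convolution without padding
        length = (length - kernel) // stride + 1
    return length

print(num_frames(16000))  # 49 frames for 1 s of audio at 16 kHz
print(num_frames(30720))  # 95
```

With 30720 samples (about 1.9 s at 16 kHz) this gives 95 frames, which matches the middle dimension of the logits shape `torch.Size([1, 95, 32])` printed further down in this section.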

In [2]:
# wav2vec2 and BlenderBot
!pip install transformers
# Microphone capture (GitHub no longer serves the git:// protocol, so use https)
!pip install git+https://github.com/ricardodeazambuja/colab_utils.git
# Audio pre-processing
!pip install librosa


In [4]:
!pip install torch


In [None]:
# Import libraries
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
from colab_utils import getAudio
import librosa
import numpy as np

# Load the pretrained wav2vec2 model (English, fine-tuned on 960 h of LibriSpeech) and its processor
w2v2 = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
w2v2_processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")

In [3]:
# Capture audio from the microphone (at 48 kHz)
audio, sr = getAudio()
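wav2vec2 expects 16 kHz audio, while the microphone capture is at 48 kHz, so the signal has to be downsampled by a factor of 3. `librosa.resample` does this with its own high-quality resampler; purely to illustrate the idea (a simplified, swapped-in technique, not librosa's implementation), a numpy-only sketch first low-pass filters below the new Nyquist frequency (8 kHz) to avoid aliasing and then keeps every third sample:

```python
import numpy as np

def decimate(x: np.ndarray, factor: int, num_taps: int = 101) -> np.ndarray:
    """Downsample x by an integer factor: windowed-sinc low-pass filter, then keep every factor-th sample."""
    # Cutoff at the new Nyquist frequency, in cycles per original sample
    cutoff = 0.5 / factor
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    taps /= taps.sum()  # unity gain at 0 Hz
    # A symmetric, odd-length filter with mode="same" keeps the output aligned with the input
    filtered = np.convolve(x, taps, mode="same")
    return filtered[::factor]

# 1 s of a 440 Hz tone sampled at 48 kHz, downsampled to 16 kHz
sr_in, sr_out = 48000, 16000
t = np.arange(sr_in) / sr_in
x = np.sin(2 * np.pi * 440 * t)
y = decimate(x, sr_in // sr_out)
print(len(y))  # 16000
```

The number of samples drops by exactly the ratio of the two rates, and frequencies below 8 kHz (where most of the speech information lies) are kept.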

In [4]:
# Change the sampling rate to 16 kHz (required by wav2vec2)
audio_float = audio.astype(np.float32)
# librosa >= 0.10 requires orig_sr and target_sr as keyword arguments
audio_16k = librosa.resample(audio_float, orig_sr=sr, target_sr=16000)
print(f'Audio size after resampling to 16 kHz: {audio_16k.shape}')

# Speech to text
entrada = w2v2_processor(audio_16k, sampling_rate=16000, return_tensors="pt").input_values
print(f'wav2vec2 input size: {entrada.shape}')
with torch.no_grad():  # inference only, no gradients needed
    probabilidades = w2v2(entrada).logits  # unnormalized scores (logits), one row per ~20 ms frame
print(f'Logits array size (wav2vec2 output): {probabilidades.shape}')
predicciones = torch.argmax(probabilidades, dim=-1)  # most likely token for each frame
print(f'Predictions array size: {predicciones.shape}')
transcripcion = w2v2_processor.decode(predicciones[0])
print(transcripcion)

Audio size after resampling to 16 kHz: (30720,)
wav2vec2 input size: torch.Size([1, 30720])
Logits array size (wav2vec2 output): torch.Size([1, 95, 32])
Predictions array size: torch.Size([1, 95])
I
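The last two steps turn the frame-wise logits into text: `argmax` picks the most likely token for each of the 95 frames, and `w2v2_processor.decode` then performs CTC decoding. A simplified pure-Python sketch of that greedy CTC step, assuming the usual conventions (consecutive repeated tokens are merged, the blank/padding token is dropped, and `|` marks word boundaries); the 4-token vocabulary is hypothetical, the real model uses 32 tokens:

```python
import numpy as np

def ctc_greedy_decode(logits, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding of a (frames, vocab_size) array of logits."""
    ids = np.argmax(logits, axis=-1).tolist()  # best token per frame
    chars = []
    prev = -1  # no previous token yet
    for token_id in ids:
        # Keep a token only if it differs from the previous frame and is not the blank
        if token_id != prev and token_id != blank_id:
            chars.append(vocab[token_id])
        prev = token_id
    return "".join(chars).replace(word_delimiter, " ").strip()

# Hypothetical toy vocabulary: index 0 is the CTC blank (<pad>)
vocab = ["<pad>", "|", "H", "I"]
frame_ids = [2, 2, 0, 3, 3, 1, 1, 3, 0]   # one token id per frame
logits = np.eye(len(vocab))[frame_ids]    # one-hot "logits", shape (9, 4)
print(ctc_greedy_decode(logits, vocab))   # HI I
```

The blank token is what allows genuine double letters: the frame sequence `H <pad> H` decodes to `HH`, while `H H H` collapses to a single `H`.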


## 4 - *BlenderBot*



[*BlenderBot*](https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot/) was also developed by Facebook in 2020, with the goal of enabling more human-like, natural interaction:

<img src="imagenes_chatbot/blenderbot.png">

In [5]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load the distilled 400M-parameter BlenderBot and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
blender = AutoModelForSeq2SeqLM.from_pretrained("facebook/blenderbot-400M-distill")


In [6]:
blender.generate

In [7]:
# Initial test
entradaBlender = tokenizer([transcripcion], return_tensors='pt')
print(f'Input sentence: {transcripcion}')
print(f'BlenderBot input: {entradaBlender}')
ids_respuesta = blender.generate(**entradaBlender)
print(f'BlenderBot output: {ids_respuesta}')
respuesta = tokenizer.batch_decode(ids_respuesta)
print(f'Output after the tokenizer: {respuesta}')

Input sentence: I
BlenderBot input: {'input_ids': tensor([[281,   2]]), 'attention_mask': tensor([[1, 1]])}
BlenderBot output: tensor([[   1,  946,  304,  360,  463,  286, 1272,   38,  281,  360,  265, 1784,
          298,  265, 2382,   21,  281,  913,  494,  394,  602,   21,    2]])
Output after the tokenizer: ['<s> Do you have any pets? I have a dog and a cat. I love them so much.</s>']
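Besides `input_ids`, the tokenizer returns an `attention_mask`. With a single sentence the mask is all ones, but when several sentences of different lengths are batched (e.g. with `tokenizer([...], padding=True, return_tensors='pt')`), the shorter ones are padded and the mask marks real tokens (1) versus padding (0) so the model ignores the padding. A pure-Python sketch of right-padding a batch; the token ids and the padding id 0 are hypothetical, the real values come from the tokenizer:

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token-id lists to a common length and build the matching attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[5, 2], [7, 8, 9, 2]])  # hypothetical token ids
print(batch)
# {'input_ids': [[5, 2, 0, 0], [7, 8, 9, 2]], 'attention_mask': [[1, 1, 0, 0], [1, 1, 1, 1]]}
```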


In [8]:
# Remove the start-of-sentence and end-of-sentence tokens
respuesta = respuesta[0].replace('<s>','').replace('</s>','')
print(f'Output in the correct format: {respuesta}')

Output in the correct format:  Do you have any pets? I have a dog and a cat. I love them so much.
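The `replace` calls keep the leading space that BlenderBot puts before its reply (visible in the output above). Hugging Face tokenizers can also drop special tokens directly with `tokenizer.batch_decode(ids_respuesta, skip_special_tokens=True)`. When post-processing the decoded string by hand, a small hypothetical helper (not part of the original notebook) can remove a list of special tokens and trim the whitespace in one place:

```python
def clean_reply(text: str, special_tokens=("<s>", "</s>", "<pad>")) -> str:
    """Remove special tokens from a decoded string and trim surrounding whitespace."""
    for token in special_tokens:
        text = text.replace(token, "")
    return text.strip()

print(clean_reply("<s> Do you have any pets? I have a dog and a cat. I love them so much.</s>"))
# Do you have any pets? I have a dog and a cat. I love them so much.
```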


In [9]:
# Create a short test chat (typed input)
NFRASES = 5
nfrase = 1
while nfrase <= NFRASES:
  frase = input('-> MIGUEL: ')
  entradaBlender = tokenizer([frase], return_tensors='pt')
  ids_respuesta = blender.generate(**entradaBlender)
  respuesta = tokenizer.batch_decode(ids_respuesta)
  respuesta = respuesta[0].replace('<s>','').replace('</s>','')
  print(f'-> BLENDERBOT: {respuesta}')

  nfrase += 1

-> MIGUEL: Hi, I'm Dario. What is your name?
-> BLENDERBOT:  My name is samantha, nice to meet you. Do you have any hobbies?
-> MIGUEL: I´m watching the olympic games now
-> BLENDERBOT:  Are you a fan of the Olympics? I love watching them, especially the winter games.
-> MIGUEL: Yes, I like it. I like hockey on ice in the winter games, and ski
-> BLENDERBOT:  That sounds like a lot of fun. Do you live in a place where it snows a lot?
-> MIGUEL: I live in Argentina, where do you live?
-> BLENDERBOT:  I live on the east coast of the united states. I have never been to argentina.
-> MIGUEL: I have never been to united states either
-> BLENDERBOT:  Neither have I, but I would love to go someday. Have you ever been?
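In this loop each call to `blender.generate` sees only the latest sentence, so BlenderBot has no memory of earlier turns. That may explain contradictions such as claiming to live on the east coast of the United States and, one turn later, never having been there. A common remedy is to feed the model a short window of recent turns. A minimal sketch of such a history buffer; the `HistoryBuffer` class, the two-space separator between turns and the 3-turn window are assumptions for illustration, and the window must stay small because the model only accepts a limited input length:

```python
from collections import deque

class HistoryBuffer:
    """Hypothetical helper: keeps the last `max_turns` utterances and joins them into one model input."""

    def __init__(self, max_turns: int = 3, separator: str = "  "):
        self.turns = deque(maxlen=max_turns)  # the oldest turn is dropped automatically
        self.separator = separator  # assumption: how turns are concatenated for the model

    def add(self, utterance: str) -> None:
        self.turns.append(utterance.strip())

    def context(self) -> str:
        return self.separator.join(self.turns)

history = HistoryBuffer(max_turns=3)
history.add("I live in Argentina, where do you live?")
history.add("I live on the east coast of the united states. I have never been to argentina.")
history.add("I have never been to united states either")
print(history.context())
```

Inside the loop one would call `history.add(frase)`, pass `history.context()` to the tokenizer instead of `frase`, and call `history.add(respuesta)` after generating the reply.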


## 5 - *wav2vec2* + *BlenderBot* and testing the chatbot

Now we chain audio capture -> wav2vec2 -> BlenderBot inside a loop:

In [None]:
NFRASES = 5
nfrase = 1

while nfrase <= NFRASES:
  input()     # Wait for Enter to be pressed before starting the recording

  # Capture audio and resample it to 16 kHz
  audio, sr = getAudio()
  audio_float = audio.astype(np.float32)
  audio_16k = librosa.resample(audio_float, orig_sr=sr, target_sr=16000)

  # Speech to text
  entrada = w2v2_processor(audio_16k, sampling_rate=16000, return_tensors="pt").input_values
  with torch.no_grad():  # inference only
    probabilidades = w2v2(entrada).logits
  predicciones = torch.argmax(probabilidades, dim=-1)
  frase = w2v2_processor.decode(predicciones[0])

  # Print the transcription
  print(f'-> MIGUEL: {frase}')

  # BlenderBot
  entradaBlender = tokenizer([frase], return_tensors='pt')
  ids_respuesta = blender.generate(**entradaBlender)
  respuesta = tokenizer.batch_decode(ids_respuesta)
  respuesta = respuesta[0].replace('<s>','').replace('</s>','')
  print(f'-> BLENDERBOT: {respuesta}')

  nfrase += 1




-> MIGUEL: HI ARE YOU DOING ARE YOU THERE
-> BLENDERBOT:  No, I am not.  I am at work.  What are you up to?



-> MIGUEL: IAM JUST HERE AT HAME SITTING SITTED ON THE COUCH AND WATCHING PSALM AND NEPHLIX
-> BLENDERBOT:  WHAT HAPPENED? DID YOU KNOW IT WAS YOUR FRIENDS?



-> MIGUEL: MY FRIENDS WHAT HAPPENED WITH MY FRIENDS
-> BLENDERBOT:  I'm sorry to hear that.  What happened?  I hope it wasn't too bad.



-> MIGUEL: THING BUT HAPPENED A DO YOU WANT TO HAVE VENER LATER AFTER WORK
-> BLENDERBOT:  I don't think I want to go back to school. I feel like I'm wasting my time.



-> MIGUEL: WHEN ARE YOU COMING BACK HOME
-> BLENDERBOT:  I am going to the beach!  I am so excited.  I have never been on a cruise before.
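The *wav2vec2* model used here transcribes in upper case and without punctuation (as in the transcriptions above), whereas the typed chat in section 4 used ordinary sentence case, and at least one reply here came back in upper case as well. A simple, hypothetical normalization step before the text goes to BlenderBot could lowercase the transcription and restore basic capitalization; it cannot add punctuation or fix misrecognized words such as "NEPHLIX":

```python
def normalize_transcription(text: str) -> str:
    """Turn an all-caps ASR transcription into sentence case (hypothetical pre-processing step)."""
    words = text.lower().split()  # lowercase and collapse repeated spaces
    # The English pronoun "I" is always capitalized
    words = ["I" if w == "i" else w for w in words]
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:]

print(normalize_transcription("HI ARE YOU DOING ARE YOU THERE"))  # Hi are you doing are you there
```

In the loop it would be applied as `frase = normalize_transcription(frase)` right before calling the tokenizer.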
