<a href="https://colab.research.google.com/github/mperetto/Python-Chat-with-audio/blob/main/Chat_with_audio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chat with audio interviews
This ongoing project aims to build a chat app that lets users upload an audio file, view its transcript as a chat history in the form 'speaker_1: ... speaker_2: ...', and ask questions about the interview.

## Project Structure
The project is divided into five main steps.

1.   Upload the audio file and convert it to .wav format (using ffmpeg) if it is not already a .wav file.
2.   Diarize the audio file with the pyannote model to identify the speakers.
3.   Extract the text from the audio with the Whisper model.
4.   Match the text segments produced by Whisper with the speaker timings from the diarization.
5.   Display the extracted text in chat format and allow Q&A.
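To make the target of step 5 concrete, here is a minimal sketch of how speaker-labelled segments could be rendered in the 'speaker_1: ... speaker_2: ...' format. The function name `format_chat`, the `speaker` key, and the sample data are assumptions made for illustration; they are not part of the notebook.

```python
from itertools import groupby


def format_chat(labelled_segments):
    """Merge consecutive segments from the same speaker into one chat line each.

    `labelled_segments` is assumed to be a list of dicts with at least the
    keys "speaker" and "text" (a hypothetical structure for this sketch).
    """
    lines = []
    # groupby only groups *adjacent* items, which is what a chat view needs:
    # a speaker who talks again later gets a new line.
    for speaker, group in groupby(labelled_segments, key=lambda seg: seg["speaker"]):
        text = " ".join(seg["text"].strip() for seg in group)
        lines.append(f"{speaker}: {text}")
    return lines


# Hypothetical sample data, not real transcription output.
sample = [
    {"speaker": "speaker_1", "text": " Welcome to the show."},
    {"speaker": "speaker_1", "text": " How are you today?"},
    {"speaker": "speaker_2", "text": " Fine, thanks."},
]
chat_lines = format_chat(sample)
print("\n".join(chat_lines))
```

Merging adjacent segments keeps the chat view compact, since a transcriber may split one speaker turn into several short segments.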



### Installing the dependencies

In [None]:
!apt update && apt install -y ffmpeg
!pip install -q -U openai-whisper
!pip install -q https://github.com/pyannote/pyannote-audio/archive/refs/heads/develop.zip

### Import and set up the pyannote pipeline

In [None]:
from pyannote.audio import Pipeline
import torch

pipeline = Pipeline.from_pretrained(
    'pyannote/speaker-diarization@2.1',
    use_auth_token='YOUR_TOKEN_HERE' # Hugging Face access token; to get one, run this cell and follow the printed instructions
)
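# Run on the GPU when one is available, otherwise fall back to the CPU
pipeline.to(torch.device('cuda' if torch.cuda.is_available() else 'cpu'))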

### Import audio file and convert it if necessary

In [None]:
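import subprocess

path = "audio.m4a"

# Convert to .wav with ffmpeg unless the file already has a .wav extension
if not path.lower().endswith('.wav'):
  # -y overwrites an existing audio.wav without prompting; check=True raises if ffmpeg fails
  subprocess.run(['ffmpeg', '-i', path, 'audio.wav', '-y'], check=True)
  path = 'audio.wav'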

### Compute the diarization

In [None]:
diarization = pipeline(path, num_speakers=2)

In [None]:
diarization

### Import Whisper model and extract the text

In [None]:
import whisper

model = whisper.load_model("medium", download_root='/content/')
result = model.transcribe(path)

In [None]:
result["text"]

In [None]:
result["segments"][:10]
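The notebook stops before step 4, matching the Whisper segments with the diarization timings. One possible approach, sketched here under stated assumptions, is to give each text segment the speaker whose turn overlaps it the longest. The helper names `overlap` and `assign_speakers`, the `"unknown"` fallback label, and the sample values are hypothetical. The comments show how the inputs might be derived from `diarization` and `result`.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds shared by two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="unknown"):
    """Label each text segment with the speaker whose turn overlaps it most.

    segments: list of dicts with "start", "end" and "text" keys
              (the shape of Whisper's result["segments"] entries).
    turns:    list of (start, end, speaker) tuples from the diarization.
    """
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Copy the segment so the input list is left untouched.
        labelled.append({**seg, "speaker": best_speaker})
    return labelled


# In the notebook, the inputs could be built like this (assumption):
#   turns = [(turn.start, turn.end, speaker)
#            for turn, _, speaker in diarization.itertracks(yield_label=True)]
#   segments = result["segments"]
# Hypothetical sample values are used here so the sketch runs on its own.
turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
segments = [
    {"start": 0.5, "end": 3.5, "text": " Welcome."},
    {"start": 3.0, "end": 8.0, "text": " Thanks for having me."},
    {"start": 10.0, "end": 11.0, "text": " (music)"},
]
labelled = assign_speakers(segments, turns)
for seg in labelled:
    print(f'{seg["speaker"]}: {seg["text"].strip()}')
```

Choosing the longest overlap is a simple heuristic: a segment that spans a speaker change is attributed entirely to the dominant speaker, so splitting on word-level timestamps may be worth exploring if that proves too coarse.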