### Speech Summarizer

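Building a speech summarizer that transcribes spoken input and then summarizes it is straightforward with pretrained models and open-source libraries. This notebook uses Mozilla DeepSpeech for transcription and the transformer-based facebook/bart-large-cnn model for summarization.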

#### Transcribe speech into text

Use the DeepSpeech model to convert a speech file (e.g., speech.wav) into a text file, transcript.txt.

The speech file should meet the following specifications:
1. <b>wav</b> file format
2. sample rate of <b>16 kHz</b>
3. <b>mono</b> audio (a single channel)
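
The specifications above can be checked before transcription. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav_spec` and its parameters are illustrative assumptions, not part of the original notebook.

```python
import wave

def check_wav_spec(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found; an empty list means the file matches.

    Hypothetical helper: checks only sample rate and channel count.
    wave.open raises wave.Error if the file is not a readable WAV file.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != expected_rate:
            problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
        if channels != expected_channels:
            problems.append(f"found {channels} channels, expected {expected_channels}")
    return problems
```

Calling `check_wav_spec("input/speech.wav")` before running the transcription cell should return an empty list for a file that meets the requirements.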

#### Variables

Variables for Transcription

In [1]:
DEEPSPEECH_MODEL_FILE=r"models/deepspeech-0.9.3-models.tflite" # download the model to this location
INPUT_AUDIO_FILE=r"input/speech.wav"
OUTPUT_TRANSCRIPT_FILE=r"output/transcript.txt"

Variables for Summarisation

In [2]:
PYTORCH_MODEL_URL=r"https://cdn-lfs.huggingface.co/facebook/bart-large-cnn/2ac2745c02ac987d82c78a14b426de58d5e4178ae8039ba1c6881eccff3e82f1"
PYTORCH_MODEL_FILE=r"models/bart-large-cnn/pytorch_model.bin"
PYTORCH_MODEL_DIR=r"models/bart-large-cnn" # forward slashes, consistent with the other paths

#### Download models

###### Download models for transcription

Download the model from <a href="https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.tflite">this link</a> and, if it is <b>not already present</b>, move it into the models directory of the downloaded repository.
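
Because the transcription model is downloaded by hand, it can help to confirm that the expected files are in place before running later cells. A minimal sketch, assuming a hypothetical helper named `missing_files`:

```python
from pathlib import Path

def missing_files(paths):
    """Return the entries of `paths` that do not exist as regular files.

    Hypothetical helper for illustration; it does not download anything.
    """
    return [p for p in paths if not Path(p).is_file()]
```

For example, `missing_files([DEEPSPEECH_MODEL_FILE, PYTORCH_MODEL_FILE])` returns an empty list once both models have been downloaded to the locations set in the variables cells.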

###### Download models for summarisation

In [3]:
!curl -o {PYTORCH_MODEL_FILE} {PYTORCH_MODEL_URL}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0 1549M    0 5665k    0     0  7076k      0  0:03:44 --:--:--  0:03:44 7081k
  0 1549M    0 14.3M    0     0  8178k      0  0:03:14  0:00:01  0:03:13 8177k
  1 1549M    1 22.8M    0     0  8363k      0  0:03:09  0:00:02  0:03:07 8364k
  1 1549M    1 26.5M    0     0  6057k      0  0:04:22  0:00:04  0:04:18 6058k
  1 1549M    1 26.5M    0     0  5552k      0  0:04:45  0:00:04  0:04:41 5552k
  1 1549M    1 30.0M    0     0  5152k      0  0:05:08  0:00:05  0:05:03 4854k
  2 1549M    2 32.4M    0     0  4881k      0  0:05:25  0:00:06  0:05:19 3699k
  2 1549M    2 39.2M    0     0  5150k      0  0:05:08  0:00:07  0:05:01 3351k
  3 1549M    3 48.0M    0     0  5594k      0  0:04:43  0:00:08  0:04:35 5111k
  3 1549M    3 56.6M    0     0  5921k      0  0:04

#### Run transcription

In [4]:
!deepspeech --model {DEEPSPEECH_MODEL_FILE} --audio {INPUT_AUDIO_FILE} > {OUTPUT_TRANSCRIPT_FILE}

Loading model from file models/deepspeech-0.9.3-models.tflite
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
Loaded model in 0.0461s.
Running inference.
Inference took 49.304s for 67.892s audio file.
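
The timing line in the log can be read as a real-time factor: processing time divided by audio duration. A small sketch of that arithmetic, using the two numbers from the log above (the function name `real_time_factor` is an illustrative assumption):

```python
def real_time_factor(inference_seconds, audio_seconds):
    """Processing time divided by audio duration.

    A value below 1.0 means transcription ran faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds

# Values copied from the DeepSpeech log output
rtf = real_time_factor(49.304, 67.892)
print(f"real-time factor: {rtf:.3f}")
```

For this run the factor is roughly 0.73, so the recording was transcribed somewhat faster than its playback length on the machine that produced the log.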


#### Load generated transcript into memory

In [5]:
# Read the transcript written by the DeepSpeech CLI and trim surrounding whitespace
with open(OUTPUT_TRANSCRIPT_FILE, encoding="utf-8") as file:
    transcribed_text = file.read().strip()

In [6]:
transcribed_text

"there are certain things that a necessary to ensure that the future is good and some of those things are in the long term having long term sustainable transportand sustainable energy generation and to be a space exploring civilization and for humanity to be out their among the stars and be a multi planetry species i think the being a multi planetory species and being out there among the stars is important for the long terme survival of humanity and that's one reason kind of like life insurance for life collectively life as we know it but then the part that i find personally most motivating is that it creates a sense of adventure and it makes people excited about the future if you con sider two futures one way we are forever confine to earth until eventually something terrible happens ot another future where we are out there on many planets may be even going beyond the solar system i think that second version is incredibly exciting and inspiring at there need to be reasons to get up in

#### Summarize transcribed speech

##### Import libraries

In [7]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

##### Load models

In [8]:
tokenizer = AutoTokenizer.from_pretrained(PYTORCH_MODEL_DIR)
model = AutoModelForSeq2SeqLM.from_pretrained(PYTORCH_MODEL_DIR)

##### Summarize

In [9]:
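# Tokenize the transcript into PyTorch tensors.
# Note: the "summarize: " prefix is a T5-style convention; bart-large-cnn was not
# trained with task prefixes, so it is probably unnecessary here.
input_ids = tokenizer(f"summarize: {transcribed_text}", return_tensors='pt').input_ids
# Generate summary token IDs using the model's default generation settings
outputs = model.generate(input_ids)
# Decode the generated IDs back into text, dropping special tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))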

Being a multi planetry species is important for the long terme survival of humanity. It creates a sense of adventure and it makes people excited about the future. There need to be reasons to get up in the morning you know life can't just be about solving problems otherwise what's the point.
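
The transcript in this example is short, but bart-large-cnn accepts at most 1024 input tokens, so longer recordings may need to be split before summarization. The sketch below is one simple approach and not part of the original notebook: it splits on word count as a rough proxy for token count, and both the helper name `split_into_chunks` and the default of 400 words are assumptions to tune.

```python
def split_into_chunks(text, max_words=400):
    """Split text into consecutive chunks of at most `max_words` words.

    Word count only approximates token count, so `max_words` should stay
    comfortably below the model's token limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk could then be passed through the tokenizer and `model.generate` in the same way as the full transcript, and the partial summaries joined together.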
