### Speech Summarizer

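Building a speech summarizer that transcribes and summarizes speech input is quite simple with pretrained models and open-source libraries: DeepSpeech converts the speech to text, and the transformer-based BART model (bart-large-cnn) summarizes the transcript.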

The speech file <b>must</b> meet the following specifications:
1. <b>WAV</b> file format
2. a sample rate of <b>16 kHz</b>
3. a single (<b>mono</b>) audio channel
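To catch a file that does not meet these specifications before the long transcription step, a quick check with Python's standard `wave` module can help. This is a minimal sketch: `check_wav_spec` is a hypothetical helper name, and the 16-bit sample-width check is an extra assumption based on DeepSpeech consuming 16-bit PCM audio, not one of the requirements listed above.

```python
import wave

def check_wav_spec(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in the WAV file; an empty list means it matches the spec."""
    problems = []
    # wave.open raises wave.Error if the file is not a valid PCM WAV file
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels} (mono)")
    # Assumption: DeepSpeech expects 16-bit samples, i.e. a sample width of 2 bytes
    if width != 2:
        problems.append(f"sample width is {width * 8} bits, expected 16 bits")
    return problems
```

Once the Variables cells have run, `check_wav_spec(INPUT_AUDIO_FILE)` should return an empty list for a compliant file.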

<b>Note:</b> If required, update the variables under the <b>Variables</b> section to configure the inputs before running the notebook.

The variable <b>INPUT_AUDIO_FILE</b> holds the path of the input audio file to be summarised.

The last cell <b>outputs</b> the summary of the input audio file to <b>OUTPUT_SUMMARISED_FILE</b>.

To run the notebook cell by cell, click on a cell and click the <b>Run</b> button below the <b>Menu</b> bar. To run all cells, select <b>Cell --> Run All</b> from the <b>Menu</b> bar.

#### Variables

In [1]:
INPUT_AUDIO_FILE=r"input/speech.wav"

Default variables for Transcription

In [2]:
DEEPSPEECH_MODEL_FILE=r"models/deepspeech-0.9.3-models.tflite"  # download the DeepSpeech model to this location
OUTPUT_TRANSCRIPT_FILE=r"output/transcript.txt"

Default variables for Summarisation

In [3]:
PYTORCH_MODEL_URL=r"https://cdn-lfs.huggingface.co/facebook/bart-large-cnn/2ac2745c02ac987d82c78a14b426de58d5e4178ae8039ba1c6881eccff3e82f1"
PYTORCH_MODEL_FILE=r"models/bart-large-cnn/pytorch_model.bin"
PYTORCH_MODEL_DIR=r"models/bart-large-cnn"  # forward slashes, consistent with the other paths
OUTPUT_SUMMARISED_FILE=r"output/summarised_text.txt"
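The paths above point into `models/` and `output/` folders. Neither the shell redirect `>` in the transcription cell nor `open(..., "w")` in the summarisation cell creates a missing folder, so it can help to create them up front. A minimal sketch; `ensure_parent_dirs` is a hypothetical helper, not part of the original notebook.

```python
import os

def ensure_parent_dirs(*paths):
    """Create the parent folder of each path if it does not exist yet."""
    for path in paths:
        parent = os.path.dirname(path)
        # A bare file name has no parent folder to create
        if parent:
            os.makedirs(parent, exist_ok=True)
```

In the notebook it would be called as `ensure_parent_dirs(DEEPSPEECH_MODEL_FILE, OUTPUT_TRANSCRIPT_FILE, PYTORCH_MODEL_FILE, OUTPUT_SUMMARISED_FILE)`.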

#### Download models

###### Download models for summarisation

In [4]:
!curl -L --create-dirs -o {PYTORCH_MODEL_FILE} {PYTORCH_MODEL_URL}

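When run without `--fail`, curl saves whatever the server returns to `PYTORCH_MODEL_FILE`, including an HTTP error page if the CDN URL stops working. A quick sanity check before loading the model can save a confusing error later. This is a minimal sketch: `looks_like_weights_file` is a hypothetical helper, and the 100 MB threshold is an assumption chosen well below the roughly 1.5 GB file size.

```python
import os

def looks_like_weights_file(path, min_bytes=100 * 1024 * 1024):
    """Heuristically check that path holds downloaded weights rather than an error page."""
    if not os.path.isfile(path):
        return False
    # Error pages are tiny compared with the model weights
    if os.path.getsize(path) < min_bytes:
        return False
    with open(path, "rb") as f:
        first_byte = f.read(1)
    # HTML/XML error pages start with "<"; PyTorch checkpoints are binary (pickle or zip)
    return first_byte != b"<"
```

For example, `looks_like_weights_file(PYTORCH_MODEL_FILE)` should be `True` after a successful download.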

#### Transcribe speech into text

Use the DeepSpeech model to convert a speech file (e.g., speech.wav) into the text file transcript.txt.

In [5]:
!deepspeech --model {DEEPSPEECH_MODEL_FILE} --audio {INPUT_AUDIO_FILE} > {OUTPUT_TRANSCRIPT_FILE}


Loading model from file models/deepspeech-0.9.3-models.tflite
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
Loaded model in 0.0792s.
Running inference.
Inference took 109.990s for 67.892s audio file.
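The `deepspeech` command-line call above can also be made from Python through the `deepspeech` package's `Model` class, which avoids the shell redirect. This is a minimal sketch, assuming the deepspeech 0.9.x Python package and numpy are installed; `read_pcm16` and `transcribe` are hypothetical helper names. Like the CLI call above, it enables no external scorer.

```python
import wave

def read_pcm16(audio_path):
    """Return the raw 16-bit PCM frames of a WAV file as bytes."""
    with wave.open(audio_path, "rb") as wav:
        return wav.readframes(wav.getnframes())

def transcribe(model_path, audio_path):
    """Transcribe a 16 kHz mono 16-bit WAV file with the deepspeech Python package."""
    # Imported lazily so read_pcm16 works even without deepspeech/numpy installed
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    # DeepSpeech's stt() takes the audio as a numpy array of int16 samples
    audio = np.frombuffer(read_pcm16(audio_path), dtype=np.int16)
    return model.stt(audio)
```

Usage would be `transcribed_text = transcribe(DEEPSPEECH_MODEL_FILE, INPUT_AUDIO_FILE)`.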


#### Load generated transcript into memory

In [6]:
# Read the transcript written by DeepSpeech and drop the trailing newline
with open(OUTPUT_TRANSCRIPT_FILE, encoding="utf-8") as file:
    transcribed_text = file.read().strip()

In [7]:
transcribed_text

"there are certain things that a necessary to ensure that the future is good and some of those things are in the long term having long term sustainable transportand sustainable energy generation and to be a space exploring civilization and for humanity to be out their among the stars and be a multi planetry species i think the being a multi planetory species and being out there among the stars is important for the long terme survival of humanity and that's one reason kind of like life insurance for life collectively life as we know it but then the part that i find personally most motivating is that it creates a sense of adventure and it makes people excited about the future if you con sider two futures one way we are forever confine to earth until eventually something terrible happens ot another future where we are out there on many planets may be even going beyond the solar system i think that second version is incredibly exciting and inspiring at there need to be reasons to get up in

#### Summarize transcribed speech

##### Import libraries

In [8]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

##### Load models

In [9]:
tokenizer = AutoTokenizer.from_pretrained(PYTORCH_MODEL_DIR)
model = AutoModelForSeq2SeqLM.from_pretrained(PYTORCH_MODEL_DIR)
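Loading from `PYTORCH_MODEL_DIR` works only if that folder also contains the model configuration and tokenizer files (for BART, `config.json`, `vocab.json` and `merges.txt`), which the download cell does not fetch. If they are missing, one option is to fetch everything by its Hugging Face Hub ID once and save it locally. This is a minimal sketch; `download_bart` is a hypothetical helper that needs the `transformers` package and network access.

```python
def download_bart(model_dir, model_id="facebook/bart-large-cnn"):
    """Fetch the tokenizer and model from the Hugging Face Hub and store them in model_dir."""
    # Imported lazily; requires the transformers package
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    # save_pretrained writes the config, tokenizer files and weights that
    # from_pretrained(model_dir) reads back later
    tokenizer.save_pretrained(model_dir)
    model.save_pretrained(model_dir)
    return tokenizer, model
```

Calling `tokenizer, model = download_bart(PYTORCH_MODEL_DIR)` once makes later runs of the cell above work offline.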

##### Summarize

In [10]:
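# "summarize: " is a T5-style task prefix; BART-large-CNN does not require it
# Truncate to BART's 1024-token input limit so long transcripts do not raise an error
inputs = tokenizer(f"summarize: {transcribed_text}", return_tensors="pt", truncation=True, max_length=1024)
# generate() uses the generation settings stored with the model (beam search, length limits)
outputs = model.generate(**inputs)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
with open(OUTPUT_SUMMARISED_FILE, "w") as file:
    file.write(summary)
print(summary)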

Being a multi planetry species is important for the long terme survival of humanity. It creates a sense of adventure and it makes people excited about the future. There need to be reasons to get up in the morning you know life can't just be about solving problems otherwise what's the point.
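BART-large-CNN accepts at most 1024 input tokens, so the transcript of a much longer recording would not fit in a single call. One common workaround, sketched here as an assumption rather than part of the original notebook, is to split the transcript into word-based chunks, summarise each chunk, and join the partial summaries. `chunk_words` and `summarise_long_text` are hypothetical helpers, and the 600-word chunk size is an assumed margin that usually stays under 1024 tokens for English text.

```python
def chunk_words(text, max_words=600):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarise_long_text(text, tokenizer, model, max_words=600):
    """Summarise each chunk separately and join the partial summaries."""
    summaries = []
    for chunk in chunk_words(text, max_words):
        # truncation is a safety net in case a chunk still exceeds the token limit
        inputs = tokenizer(chunk, return_tensors="pt", truncation=True, max_length=1024)
        outputs = model.generate(**inputs)
        summaries.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return " ".join(summaries)
```

With the tokenizer and model loaded above, `summarise_long_text(transcribed_text, tokenizer, model)` produces one summary per chunk, joined with spaces.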
