## Minute Master -- Have meetings to remember!
By AI Diomio

Our idea is a tool that can:

1. Caption a saved meeting, translating from any language to English.
2. Provide a high-level summary.
3. Provide the full content and transcript in English.

With further iterations and specific training, it could also:

4. List all participants.
5. List all the action items mentioned in the meeting.
6. Present the list of shoutouts given throughout the meeting.
7. Provide a chatbot to answer specific questions about the meeting content.



We use Hugging Face Transformers to summarize the meeting, and we experimented with other pipeline tasks to explore what else can be done with meeting information.

In [4]:
!pip install datasets evaluate transformers[sentencepiece]



With zero-shot classification we could label actions and shoutouts.

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


First, we test the "action" label with a single sentence.

In [28]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
classifier(
    "Take as next step is to set up the agenda",
    candidate_labels=["education", "politics", "business","action"],
)


No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'Take as next step is to set up the agenda',
 'labels': ['action', 'business', 'politics', 'education'],
 'scores': [0.7674700617790222,
  0.14732205867767334,
  0.06242470443248749,
  0.022783188149333]}
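
To turn a result like this into a yes/no decision, one option is to accept the top label only when its score clears a threshold. The sketch below works on a dictionary shaped like the pipeline output above; `top_label` and the default threshold of 0.5 are illustrative assumptions, not part of the Transformers API.

```python
def top_label(result, threshold=0.5):
    """Return the best-scoring label, or None if its score is below threshold."""
    # Pair each score with its label and keep the highest-scoring pair.
    score, label = max(zip(result["scores"], result["labels"]))
    return label if score >= threshold else None


# Result copied from the cell output above.
result = {
    "sequence": "Take as next step is to set up the agenda",
    "labels": ["action", "business", "politics", "education"],
    "scores": [0.7674700617790222, 0.14732205867767334,
               0.06242470443248749, 0.022783188149333],
}

print(top_label(result))                 # action
print(top_label(result, threshold=0.9))  # None
```

A stricter threshold trades missed action items for fewer false positives; the right value would need tuning on real meeting transcripts.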

Even with a subtle phrase such as "Don't be afraid to try some fish", the model picks "action" as the top label.

In [None]:
classifier(
    "Don't be afraid to try some fish",
    candidate_labels=["education", "politics", "business","action"],
)

{'sequence': "Don't be afraid to try some fish",
 'labels': ['action', 'education', 'business', 'politics'],
 'scores': [0.75529944896698,
  0.19551649689674377,
  0.032925304025411606,
  0.016258763149380684]}

The model can recognize a mention and an action in the same sentence: when a person's name is added to the candidate labels, the name and "action" rank first and second.

In [None]:
classifier(
    "You can talk to Sandra",
    candidate_labels=["education", "politics", "business","action","Sandra"],
)

{'sequence': 'You can talk to Sandra',
 'labels': ['Sandra', 'action', 'business', 'education', 'politics'],
 'scores': [0.8845289945602417,
  0.06953760981559753,
  0.021512506529688835,
  0.01748407818377018,
  0.006936794612556696]}

We also tested a shoutout. The model ranks the person's name and the "shoutout" label highest, so we could detect a shoutout addressed to someone specific.

In [None]:
classifier(
    "Thanks to  Andres for his hard work",
    candidate_labels=["education", "politics", "business","action","shoutout","Andres"],
)

{'sequence': 'Thanks to  Andres for his hard work',
 'labels': ['Andres',
  'shoutout',
  'action',
  'business',
  'education',
  'politics'],
 'scores': [0.6564475297927856,
  0.2949400246143341,
  0.024663686752319336,
  0.015491814352571964,
  0.006172672379761934,
  0.002284292597323656]}
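
Because names and categories share one label list, the result has to be separated again before it can answer "who received which kind of mention". A minimal sketch, assuming the list of participant names is known in advance; `split_mention` is a hypothetical helper, not a library function.

```python
def split_mention(result, names):
    """Split a zero-shot result into (best category, best mentioned name)."""
    # Rank (score, label) pairs from highest to lowest score.
    ranked = sorted(zip(result["scores"], result["labels"]), reverse=True)
    person = next((label for _, label in ranked if label in names), None)
    category = next((label for _, label in ranked if label not in names), None)
    return category, person


# Result copied from the cell output above.
result = {
    "sequence": "Thanks to  Andres for his hard work",
    "labels": ["Andres", "shoutout", "action", "business", "education", "politics"],
    "scores": [0.6564475297927856, 0.2949400246143341, 0.024663686752319336,
               0.015491814352571964, 0.006172672379761934, 0.002284292597323656],
}

print(split_mention(result, names={"Andres", "Sandra"}))  # ('shoutout', 'Andres')
```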

We use Gradio to build the interface for the models.

In [5]:
!pip install gradio

Collecting gradio
  Downloading gradio-3.39.0-py3-none-any.whl (19.9 MB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.1.0-py3-none-any.whl (14 kB)
Collecting fastapi (from gradio)
  Downloading fastapi-0.100.1-py3-none-any.whl (65 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.3.1.tar.gz (5.5 kB)
  Preparing metadata (setup.py) ... done
Collecting gradio-client>=0.3.0 (from gradio)
  Downloading gradio_client-0.3.0-py3-none-any.whl (294 kB)
Collecting httpx (from gradio)
  Downloading httpx-0.24.1-py3-none-any.whl (75 kB)

We use OpenAI Whisper to translate the video's audio into English and add the translated captions to the video. This gives us the transcript and content of the meeting, which we use as input for the pipeline models.

In [7]:
!pip install openai-whisper==20230117

Collecting openai-whisper==20230117
  Using cached openai-whisper-20230117.tar.gz (1.2 MB)
  Preparing metadata (setup.py) ... done
Collecting ffmpeg-python==0.2.0 (from openai-whisper==20230117)
  Using cached ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Building wheels for collected packages: openai-whisper
  Building wheel for openai-whisper (setup.py) ... done
  Created wheel for openai-whisper: filename=openai_whisper-20230117-py3-none-any.whl size=1178612 sha256=47ba24d1095bd474979f90834d136d3507dc44e0d242d5be3c0c7545a90cfb91
  Stored in directory: /root/.cache/pip/wheels/64/4d/1a/ad5530800c07d2409dc8dfd0a26ea5068f10f14c0060142b8a
Successfully built openai-whisper
Installing collected packages: ffmpeg-python, openai-whisper
Successfully installed ffmpeg-python-0.2.0 openai-whisper-20230117


We also used the pytube library to download a YouTube Short and started testing with videos in Spanish.
You can try that too!

In [8]:
!pip install git+https://github.com/pytube/pytube

Collecting git+https://github.com/pytube/pytube
  Cloning https://github.com/pytube/pytube to /tmp/pip-req-build-qwagnlk_
  Running command git clone --filter=blob:none --quiet https://github.com/pytube/pytube /tmp/pip-req-build-qwagnlk_
  Resolved https://github.com/pytube/pytube to commit a32fff39058a6f7e5e59ecd06a7467b71197ce35
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pytube
  Building wheel for pytube (setup.py) ... done
  Created wheel for pytube: filename=pytube-15.0.0-py3-none-any.whl size=57579 sha256=bb0648db4c3ca7fca6c8f50718cc81c2cc01ead6446f5857da209298253f88c5
  Stored in directory: /tmp/pip-ephem-wheel-cache-br9t25a6/wheels/b0/a9/7d/d3579227a695fdd15288c35657b3332ef0d71430ca7f685769
Successfully built pytube
Installing collected packages: pytube
Successfully installed pytube-15.0.0


Copy the URL of any YouTube video into the assignment below, then run the following code block. We recommend downloading a Short, since our implementation does not process videos longer than 30 seconds. Also, please rename the downloaded video so that its name contains no spaces or non-alphanumeric characters.

In [10]:
video_url= 'https://youtube.com/shorts/7xn0eIb6dhw?feature=share'

In [11]:
from pytube import YouTube

yt = YouTube(video_url)
# Download the highest-resolution progressive (audio + video) MP4 stream.
yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first().download()

'/content/¿Cómo deben ser las reuniones de trabajo ⌚ 100 EFECTIVAS.mp4'
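
The downloaded file name above contains spaces, accents, and symbols, which is why renaming is recommended. The renaming can be sketched with the standard library alone; `safe_filename` is a hypothetical helper, and replacing every non-alphanumeric run with an underscore is just one possible convention.

```python
import os
import re
import unicodedata


def safe_filename(path):
    """Return path with its file name reduced to ASCII letters, digits and underscores."""
    directory, name = os.path.split(path)
    stem, ext = os.path.splitext(name)
    # Decompose accented letters (ó -> o + accent mark) and drop anything non-ASCII.
    ascii_stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of remaining non-alphanumeric characters into one underscore.
    clean = re.sub(r"[^A-Za-z0-9]+", "_", ascii_stem).strip("_")
    # Fall back to a generic name if nothing usable is left.
    return os.path.join(directory, (clean or "video") + ext)


print(safe_filename("¿Cómo deben ser las reuniones de trabajo ⌚ 100 EFECTIVAS.mp4"))
# Como_deben_ser_las_reuniones_de_trabajo_100_EFECTIVAS.mp4
```

Calling `os.rename(old_path, safe_filename(old_path))` would then apply the new name on disk.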

The code below provides the functions to process the video and generate the captioned video, along with audio and transcript.

In [12]:
import whisper
import os
import sys
import subprocess



from whisper.utils import write_vtt


model = whisper.load_model("small")

def video2mp3(video_file, output_ext="mp3"):
    filename, ext = os.path.splitext(video_file)
    subprocess.call(["ffmpeg", "-y", "-i", video_file, f"{filename}.{output_ext}"],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.STDOUT)
    return f"{filename}.{output_ext}"

def translated_transcript( audio_file ):
    options = dict(beam_size=5, best_of=5,fp16=False)
    translate_options = dict(task="translate", **options)
    result = model.transcribe(audio_file,**translate_options)
    return result

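def create_output_video(audio_file, input_video, transcript):
    # Strip only the final extension so dots elsewhere in the path are preserved.
    base_path = os.path.splitext(audio_file)[0]
    subtitle = base_path + ".vtt"
    output_video = base_path + "_subtitled.mp4"

    # Write the subtitles and close the file before ffmpeg reads it.
    with open(subtitle, "w", encoding="utf-8") as vtt:
        write_vtt(transcript["segments"], file=vtt)

    # Pass arguments as a list so the shell never splits the paths.
    # The subtitles filter still parses its own argument, so keep the
    # file name free of spaces and special characters.
    subprocess.call(["ffmpeg", "-y", "-i", input_video,
                     "-vf", f"subtitles={subtitle}", output_video])
    return output_video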






100%|███████████████████████████████████████| 461M/461M [00:05<00:00, 86.6MiB/s]


Here we ran some tests to obtain the audio and transcript from a downloaded video.

In [11]:
audio_meeting3 = video2mp3('/content/reuniones_efectivas.mp4', output_ext="mp3")

In [12]:
transcript3 = translated_transcript(audio_meeting3)


In [13]:
transcript3

{'text': " How to make a 100% effective meeting? First of all, establish a maximum duration. In Nodrizatec we do it for about 20 minutes or so. On the other hand, establish a meeting responsible that is in charge of moderating the topics and above all, of inviting participants, like Truco, send them a Google Calendar so they can schedule it. It is essential that you all have it clear before attending the points of the day and the objectives of the meeting. And as important is that you register an Act so that you can consult them in the future. Do you have a derivative task? Point it out to be able to give them a follow-up. Stop, stop, stop, stop, stop, stop. Do you want to know much more about this topic? I'll click here.",
 'segments': [{'id': 0,
   'seek': 0,
   'start': 0.0,
   'end': 3.0,
   'text': ' How to make a 100% effective meeting?',
   'tokens': [1012, 281, 652, 257, 2319, 4, 4942, 3440, 30],
   'temperature': 0.0,
   'avg_logprob': -0.3991865705150042,
   'compression_rati
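
The `segments` list in the transcript above is what ends up in the `.vtt` subtitle file. As a rough illustration of that file's layout, the sketch below formats segments by hand. It is not Whisper's own `write_vtt` implementation; `vtt_timestamp` and `segments_to_vtt` are hypothetical helper names, and only the `start`, `end`, and `text` keys are used.

```python
def vtt_timestamp(seconds):
    """Format a number of seconds as an hh:mm:ss.mmm timestamp."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments):
    """Build WebVTT text from segment dictionaries with start, end and text keys."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


# First segment copied from the transcript output above.
segments = [{"start": 0.0, "end": 3.0, "text": " How to make a 100% effective meeting?"}]
print(segments_to_vtt(segments))
```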

In [25]:
content=transcript3['text']

We wanted the raw content of the meeting, so here it is:

In [26]:
content

" How to make a 100% effective meeting? First of all, establish a maximum duration. In Nodrizatec we do it for about 20 minutes or so. On the other hand, establish a meeting responsible that is in charge of moderating the topics and above all, of inviting participants, like Truco, send them a Google Calendar so they can schedule it. It is essential that you all have it clear before attending the points of the day and the objectives of the meeting. And as important is that you register an Act so that you can consult them in the future. Do you have a derivative task? Point it out to be able to give them a follow-up. Stop, stop, stop, stop, stop, stop. Do you want to know much more about this topic? I'll click here."

In [14]:
subtitled_video4 = create_output_video(audio_meeting3,'/content/shortVidMp4.mp4',transcript3)

In [21]:
content = transcript3.get('text')

In [22]:
type(content)

str

In [29]:
classifier(
    content,
    candidate_labels=["current","future"],
)

{'sequence': " How to make a 100% effective meeting? First of all, establish a maximum duration. In Nodrizatec we do it for about 20 minutes or so. On the other hand, establish a meeting responsible that is in charge of moderating the topics and above all, of inviting participants, like Truco, send them a Google Calendar so they can schedule it. It is essential that you all have it clear before attending the points of the day and the objectives of the meeting. And as important is that you register an Act so that you can consult them in the future. Do you have a derivative task? Point it out to be able to give them a follow-up. Stop, stop, stop, stop, stop, stop. Do you want to know much more about this topic? I'll click here.",
 'labels': ['current', 'future'],
 'scores': [0.6220754981040955, 0.37792450189590454]}
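
The cell above labels the whole transcript at once. To find individual action items or shoutouts, one could instead classify sentence by sentence. The sketch below is an assumption about how that might look: `split_sentences` uses a simple punctuation rule rather than a real sentence tokenizer, and `classify` stands for any callable with the same signature as the `classifier` pipeline used above.

```python
import re


def split_sentences(text):
    """Split text after '.', '?' or '!' when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


def label_sentences(text, classify, labels):
    """Classify each sentence separately and return one result per sentence."""
    return [classify(sentence, candidate_labels=labels)
            for sentence in split_sentences(text)]


# Excerpt of the transcript above.
sample = ("Do you have a derivative task? Point it out to be able to give "
          "them a follow-up. Stop, stop, stop, stop, stop, stop.")

for sentence in split_sentences(sample):
    print(sentence)
```

With the pipeline loaded, `label_sentences(content, classifier, ["action", "shoutout"])` would return one result dictionary per sentence.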

Here we test whether a question-answering model could retrieve specific information from the meeting.

In [31]:
qa_model = pipeline("question-answering")
question = "What is our plan?"
context = content
qa_model(question = question, context = context)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.06811316311359406,
 'start': 293,
 'end': 308,
 'answer': 'Google Calendar'}

In [33]:
qa_model = pipeline("question-answering")
question = "What did they talk about?"
context = content
qa_model(question = question, context = context)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.31947848200798035,
 'start': 293,
 'end': 308,
 'answer': 'Google Calendar'}
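
Both questions return "Google Calendar" with a low score, which suggests the model is guessing. A chatbot built on this pipeline would probably want to decline rather than answer in such cases. A minimal sketch, where `confident_answer` and its default cutoff of 0.5 are assumptions for illustration:

```python
def confident_answer(result, min_score=0.5):
    """Return the answer text only when the model's score reaches min_score."""
    return result["answer"] if result["score"] >= min_score else None


# Results copied from the two cell outputs above.
first = {"score": 0.06811316311359406, "start": 293, "end": 308, "answer": "Google Calendar"}
second = {"score": 0.31947848200798035, "start": 293, "end": 308, "answer": "Google Calendar"}

print(confident_answer(first))                  # None
print(confident_answer(second, min_score=0.3))  # Google Calendar
```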

## Summarization

The main model we run is a summarization of the meeting content.

In [34]:
summarizer = pipeline("summarization")
summarizer(content)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' How to make a 100% effective meeting? Establish a maximum duration and a meeting responsible that is in charge of moderating the topics and inviting participants . Send participants a Google Calendar so they can schedule it . Do you have a derivative task? Point out to be able to give them a follow-up .'}]

In [35]:
summary = summarizer(content)

In [36]:
summary

[{'summary_text': ' How to make a 100% effective meeting? Establish a maximum duration and a meeting responsible that is in charge of moderating the topics and inviting participants . Send participants a Google Calendar so they can schedule it . Do you have a derivative task? Point out to be able to give them a follow-up .'}]

In [43]:
final_summary = summary[0]['summary_text']

In [44]:
final_summary

' How to make a 100% effective meeting? Establish a maximum duration and a meeting responsible that is in charge of moderating the topics and inviting participants . Send participants a Google Calendar so they can schedule it . Do you have a derivative task? Point out to be able to give them a follow-up .'
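
This works for a 30-second clip, but a full meeting transcript may be longer than a summarization model accepts in one call. One common workaround is to summarize the text in chunks and join the partial summaries. The sketch below assumes a budget of 400 words per chunk, an arbitrary value rather than a documented limit; `summarize` stands for any callable that returns a list like the `summarizer` output above.

```python
def chunk_words(text, max_words=400):
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long(text, summarize, max_words=400):
    """Summarize each chunk separately and join the partial summaries."""
    parts = [summarize(chunk)[0]["summary_text"].strip()
             for chunk in chunk_words(text, max_words)]
    return " ".join(parts)


print(chunk_words("one two three four five", max_words=2))
# ['one two', 'three four', 'five']
```

Splitting on word counts can cut a sentence in half, so splitting on sentence boundaries first would likely give cleaner summaries.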

We ran a named-entity recognition (NER) example to see how it could help extract information, such as participants or people mentioned in the meeting.

In [45]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)


Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity': 'B-PER', 'score': 0.9990139, 'index': 4, 'word': 'Wolfgang', 'start': 11, 'end': 19}, {'entity': 'B-LOC', 'score': 0.999645, 'index': 9, 'word': 'Berlin', 'start': 34, 'end': 40}]
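
To go from this token-level output to a participant list, the person tokens need to be collected and, where a name spans several tokens, merged. The sketch below is one way to do that by hand, assuming the `B-PER` / `I-PER` tags and `##` word-piece prefix shown by this model; `person_names` is a hypothetical helper.

```python
def person_names(ner_results):
    """Collect person names from token-level NER output, merging adjacent tokens."""
    names, last_index = [], None
    for token in ner_results:
        if not token["entity"].endswith("PER"):
            continue
        word, index = token["word"], token["index"]
        adjacent = last_index is not None and index == last_index + 1
        if word.startswith("##") and adjacent:
            names[-1] += word[2:]      # word-piece continuation of the same word
        elif token["entity"] == "I-PER" and adjacent:
            names[-1] += " " + word    # next word of the same name
        else:
            names.append(word)         # start of a new name
        last_index = index
    return names


# Output copied from the cell above.
ner_results = [
    {"entity": "B-PER", "score": 0.9990139, "index": 4, "word": "Wolfgang", "start": 11, "end": 19},
    {"entity": "B-LOC", "score": 0.999645, "index": 9, "word": "Berlin", "start": 34, "end": 40},
]

print(person_names(ner_results))  # ['Wolfgang']
```

The pipeline can also group tokens itself when created with `aggregation_strategy="simple"`, which may make a manual merge like this unnecessary.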


In [None]:
context

'We have been experimenting with different digital channels as social networks and advertising online to reach new customers. What is our main goal in terms of growth this year? Our main goal this year is to increase our sales by 15% in comparison with the same period of last year.'

In [None]:
question2 = "How are we going to increase our sales?"

In [None]:
qa_model(question= question2, context = context)

{'score': 0.4818035662174225, 'start': 226, 'end': 232, 'answer': 'by 15%'}

## Final code

1. Video tools
2. Gradio App

**Video Tools**

This is an improved version of the code shown earlier.

In [13]:
import whisper
import os
import sys
import subprocess
from transformers import pipeline


from whisper.utils import write_vtt


model = whisper.load_model("small")

def video2mp3(video_file, output_ext="mp3"):
    filename, ext = os.path.splitext(video_file)
    subprocess.call(["ffmpeg", "-y", "-i", video_file, f"{filename}.{output_ext}"],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.STDOUT)
    return f"{filename}.{output_ext}"

def translated_transcript( audio_file ):
    options = dict(beam_size=5, best_of=5,fp16=False)
    translate_options = dict(task="translate", **options)
    result = model.transcribe(audio_file,**translate_options)
    return result

def summarize_video(transcript):
   summarizer = pipeline("summarization")
   summary = summarizer(transcript)
   return summary

def create_output_video(input_video):
    audio_file = video2mp3(input_video, output_ext="mp3")
    transcript = translated_transcript(audio_file)
    content = transcript['text']
    summary = summarize_video(content)[0]['summary_text']

    # Strip only the final extension so dots elsewhere in the path are preserved.
    base_path = os.path.splitext(audio_file)[0]
    subtitle = base_path + ".vtt"
    output_video = base_path + "_subtitled.mp4"

    # Write the subtitles and close the file before ffmpeg reads it.
    with open(subtitle, "w", encoding="utf-8") as vtt:
        write_vtt(transcript["segments"], file=vtt)

    # Pass arguments as a list so the shell never splits the paths.
    # The subtitles filter still parses its own argument, so keep the
    # file name free of spaces and special characters.
    subprocess.call(["ffmpeg", "-y", "-i", input_video,
                     "-vf", f"subtitles={subtitle}", output_video])
    return output_video, summary, content

## Gradio Application

This UI displays the results of the functions defined above.

In [None]:
import gradio as gr
title = "Minute Master -- Your usual meetings, but better!"

block = gr.Blocks()

with block:

    with gr.Group():
        with gr.Box():
            with gr.Row():

                inp_video = gr.Video(
                    label="Input Video",
                    mirror_webcam = False
                )
                op_video = gr.Video(label="Captioned Video")
                summary = gr.TextArea(label="Summary")
                content = gr.TextArea(label="Content in English")
        btn = gr.Button("Generate Meeting Summary")
        btn.click(create_output_video, inputs=[inp_video], outputs=[op_video,summary,content])

        gr.HTML('''
        <div class="footer">
                    <p>Model by <a href="https://github.com/openai/whisper">OpenAI</a> - Gradio App by Ai Diomio
                    </p>
        </div>
        ''')

block.launch(server_name="0.0.0.0",server_port=7860,debug = True, share=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://978615d158b5ec6bb6.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1392, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1097, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 703, in wrapper
    response = f(*args, **kwargs)
  Fi

## How to run it successfully
The code above raises an error that we could not inspect in full here. For that reason we built a container, available at
docker.io/slancheros/minute_master. Run the container and the app is forwarded to port 7862 on localhost.

Recommendations:
1. Use a video of about 30 seconds (a YouTube Short should work).
2. The video file name should contain no spaces or special characters.

Please find the repository here:

https://github.com/slancheros/whisperAPI