## Meeting Assistance
The task is to:
1. Take audio input from a meeting.
2. Generate minutes from it.
3. Generate action items from it.

I will use a frontier model to convert the audio into text.<br>
I will use an open-source model to generate the minutes.<br>
The result is streamed back as Markdown, including actionable items.


In [None]:
!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0

In [None]:
## imports
import os
import requests
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig
import torch

In [None]:
## model details
AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Llama-3.1-8B-Instruct"

In [None]:
## connect to google drive
# drive.mount('/content/drive')
audio_file = "/content/meeting.mp3"


In [None]:
## login to hugging face
hf_token = userdata.get("HF_TOKEN")
login(hf_token, add_to_git_credential=True)

In [None]:
## sign in to OpenAI
OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
openai_client = OpenAI(api_key=OPENAI_API_KEY)
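
`userdata.get` only works inside Google Colab. As a minimal sketch, assuming the same secret names are also exported as environment variables when running elsewhere, a small helper could try Colab first and then fall back to the environment. The name `get_secret` is hypothetical and not part of any library.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata, else from the environment.

    Hypothetical helper: the fallback to os.environ is an assumption about
    how the notebook might be run outside Colab.
    """
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        # Not on Colab, or the secret is missing or not shared with the notebook
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} was not found")
    return value
```

With this in place, `hf_token = get_secret("HF_TOKEN")` would behave the same on Colab and fail loudly elsewhere if the variable is not set.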

In [None]:
## convert audio to text
# open the file in a context manager so the handle is closed after the upload
with open(audio_file, "rb") as audio:
    transcription = openai_client.audio.transcriptions.create(
        model=AUDIO_MODEL,
        file=audio,
        response_format="text"
    )
print(transcription)

I'm guessing in the session here today is there anyone who's not had a phone screening call yet? Everyone's had a phone screening call, everyone's got, okay, Srishti hasn't had a phone screening call yet. Anyone else who hasn't yet had a phone screening interview? No, all right. Just as revision for everyone else, right? So, what are the things you need to know about in a phone screening interview? Phone screening interviews usually are the first step, right? So, whenever you submit your online application to a recruitment company or to many bigger companies, the first thing they'll do is that they either have a recruiter or a talent acquisition professional who would give you a call, right, to validate the information that you've mentioned in your resume. These calls can be as quick as five minutes, they can go for as long as 30 minutes, right? This is one thing that you need to need to know about, that in phone screening interviews, right, you can't see the other person. So, all that
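
Long meetings can produce large audio files, and the transcription endpoint rejects uploads above a size limit. The sketch below checks the file size before calling the API. The 25 MB figure is my understanding of the limit at the time of writing and should be treated as an assumption to verify against the current API documentation; `check_upload_size` is a hypothetical helper name.

```python
import os

# Assumed upload limit for the transcription endpoint; verify before relying on it
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes, above the {limit}-byte limit; "
            "split or compress the recording first"
        )
    return size
```

Calling `check_upload_size(audio_file)` just before the transcription cell would surface an oversized recording early, rather than as an API error.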

In [None]:
## define the system and user prompt and define the messages
SYSTEM_PROMPT = """
You are a very helpful assistant that produces minutes from transcripts, with a summary, key discussion points,
takeaways and action items with owners, in markdown.
"""

USER_PROMPT = f"""
Below is a transcript of a meeting where we were being taught about phone screening and in-person one-on-one interviews.
Please write minutes in markdown, including a summary with key discussion points, takeaways and action items.
{transcription}
"""

messages = [
    {"role":"system", "content": SYSTEM_PROMPT},
    {"role":"user", "content": USER_PROMPT}
]
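
Because the user prompt is an f-string, the transcript is baked in when the cell runs, so a new transcription needs the prompt cell re-run as well. One way to avoid a stale prompt is to build the messages in a function. This is a sketch only: `build_messages`, its `topic` parameter, and the shortened prompt wording are illustrative assumptions, not the notebook's exact prompts.

```python
# Shortened, illustrative system prompt (not the exact text used above)
MINUTES_SYSTEM_PROMPT = (
    "You produce meeting minutes from transcripts, with a summary, "
    "key discussion points, takeaways and action items with owners, in markdown."
)


def build_messages(transcript: str, topic: str) -> list:
    """Return chat messages for a given transcript and a short topic description."""
    user_prompt = (
        f"Below is a transcript of a meeting about {topic}.\n"
        "Please write minutes in markdown, including a summary with "
        "key discussion points, takeaways and action items.\n"
        f"{transcript}"
    )
    return [
        {"role": "system", "content": MINUTES_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```

For example, `build_messages(transcription, "phone screening interviews")` returns a list in the same system/user shape as `messages` above.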

In [None]:
## quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
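
To see why 4-bit loading matters on a Colab GPU, here is a rough back-of-the-envelope estimate of weight memory. It assumes a nominal 8 billion parameters and ignores quantization constants, layers that stay in higher precision, activations and the KV cache, so the real footprint will be somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # nominal parameter count, an assumption for illustration

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit weights
print(f"bf16: {bf16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
# prints "bf16: 16.0 GB, 4-bit: 4.0 GB"
```

The 16 GB figure roughly matches the size of the checkpoint shards that get downloaded, while the 4-bit weights fit comfortably on a single smaller GPU.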

In [None]:
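## using LLAMA for the text generation

# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    LLAMA,
    trust_remote_code=True)
# Llama has no dedicated pad token, so reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token

# apply the chat template, append the assistant header so the model
# starts its reply, and move the token ids to the GPU
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt").to("cuda")

# stream tokens to the output as they are generated
streamer = TextStreamer(tokenizer)

# load the model with the 4-bit quantization config
model = AutoModelForCausalLM.from_pretrained(
    LLAMA,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# generate output; a single unpadded prompt attends to every token,
# so the attention mask is all ones
outputs = model.generate(
    inputs,
    attention_mask=torch.ones_like(inputs),
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=5000,
    streamer=streamer
)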


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a very helpful assistant that produces minutes from transcripts, with summary, key discussion points, 
takeaways and action items with owenrs, in markdown<|eot_id|><|start_header_id|>user<|end_header_id|>

Below is a transcript of a meeting where we were being taught about phone screening and in person one-one interviews.
please write minutes in markdown, including a summaries with key discussion points, takeaways and action items.
I'm guessing in the session here today is there anyone who's not had a phone screening call yet? Everyone's had a phone screening call, everyone's got, okay, Srishti hasn't had a phone screening call yet. Anyone else who hasn't yet had a phone screening interview? No, all right. Just as revision for everyone else, right? So, what are the things you need to know about in a phone screening interview? Phone screening interviews usu

In [None]:
response = tokenizer.decode(outputs[0])

In [None]:
display(Markdown(response))

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a very helpful assistant that produces minutes from transcripts, with summary, key discussion points, 
takeaways and action items with owenrs, in markdown<|eot_id|><|start_header_id|>user<|end_header_id|>

Below is a transcript of a meeting where we were being taught about phone screening and in person one-one interviews.
please write minutes in markdown, including a summaries with key discussion points, takeaways and action items.
I'm guessing in the session here today is there anyone who's not had a phone screening call yet? Everyone's had a phone screening call, everyone's got, okay, Srishti hasn't had a phone screening call yet. Anyone else who hasn't yet had a phone screening interview? No, all right. Just as revision for everyone else, right? So, what are the things you need to know about in a phone screening interview? Phone screening interviews usually are the first step, right? So, whenever you submit your online application to a recruitment company or to many bigger companies, the first thing they'll do is that they either have a recruiter or a talent acquisition professional who would give you a call, right, to validate the information that you've mentioned in your resume. These calls can be as quick as five minutes, they can go for as long as 30 minutes, right? This is one thing that you need to need to know about, that in phone screening interviews, right, you can't see the other person. So, all that you're trying to do is that you're trying to pick up all your communication as well, all the cues there are non-verbal. So, at this point in time, what I advise is that keep your answers concise and keep them to the point, right? That is one of the first things that we want to do. In one-on-one digital interviews, when you can see someone on Teams, when you can see someone in person, right, there's a lot more wiggle room because you can see that, hey, my answer is going on for too long and the other person is getting bored or you can see the interviewer, you can see that they want you to wrap up that answer or you can see them confused, right? In these interviews, in our phone screening interviews, you can't see any of those. So, again, a good rule of<|eot_id|><|start_header_id|>assistant<|end_header_id|>

**Meeting Minutes**
===============

**Summary**
----------

Today's session focused on the importance of phone screening and in-person one-on-one interviews. The speaker emphasized the need to be concise and attentive during phone screening interviews due to the lack of non-verbal cues.

**Key Discussion Points**
------------------------

*   Phone screening interviews are usually the first step in the hiring process.
*   The length of phone screening interviews can vary from 5 to 30 minutes.
*   In phone screening interviews, it's essential to keep answers concise and to the point due to the absence of non-verbal cues.
*   In contrast, in-person one-on-one interviews provide more wiggle room due to the presence of visual cues.

**Takeaways**
------------

*   Be prepared to provide concise answers during phone screening interviews.
*   Pay attention to the interviewer's cues, even though they may not be visible.
*   Practice your communication skills to effectively convey your thoughts during phone screening interviews.

**Action Items**
--------------

*   Review and practice phone screening interview techniques.
*   Be prepared to answer questions concisely and to the point.
*   Pay attention to the interviewer's cues and adjust your communication style accordingly.

**Next Steps**
--------------

*   Srishti will schedule a phone screening interview with the speaker to practice the skills discussed during the session.
*   The speaker will provide additional resources and support to help the team prepare for future interviews.<|eot_id|>
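
The decoded output above still contains the prompt and the Llama chat markers. As a minimal sketch, assuming the decoded text uses the `<|start_header_id|>assistant<|end_header_id|>` and `<|eot_id|>` markers seen in this output, the minutes alone can be pulled out with plain string handling. `extract_assistant_reply` is a hypothetical helper name.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def extract_assistant_reply(decoded: str) -> str:
    """Return only the text of the last assistant turn, without chat markers."""
    # rpartition yields the whole string as the last element if the header is absent
    reply = decoded.rpartition(ASSISTANT_HEADER)[2]
    return reply.replace(END_OF_TURN, "").strip()


sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "transcript text<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "**Meeting Minutes**<|eot_id|>"
)
print(extract_assistant_reply(sample))
# prints "**Meeting Minutes**"
```

Passing `extract_assistant_reply(response)` to `Markdown` would render just the minutes. An alternative is to slice the prompt tokens off `outputs[0]` before decoding and pass `skip_special_tokens=True` to `tokenizer.decode`.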