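First, we need to install the Hugging Face Transformers library, which gives us access to state-of-the-art speech and language models. We also need PyTorch and Accelerate; Accelerate is required for the `device_map` argument used when loading the Llama model. If you haven’t installed them yet, run:

```
pip install transformers torch accelerate
```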

Now, let’s import the necessary libraries and choose the device for computation. If a CUDA-capable GPU is available, the code uses it automatically for better performance.

## Decoding an MP3 (or other compressed audio file) into a NumPy array for transcription requires FFmpeg. Follow the steps in this guide to install it: [How to install FFmpeg on Windows](https://www.wikihow.com/Install-FFmpeg-on-Windows). On Ubuntu, run `sudo apt install ffmpeg`.
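Before running the rest of the notebook, it helps to confirm that Python can actually find FFmpeg. Here is a minimal sketch. It assumes FFmpeg was installed so that the `ffmpeg` executable is on your `PATH`. `ffmpeg_available` is a hypothetical helper name and is not part of any library.

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    # shutil.which returns the full path to the executable, or None if not found
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("FFmpeg found:", shutil.which("ffmpeg"))
else:
    # On Windows, a freshly edited PATH is usually only visible to newly started processes
    print("FFmpeg not found; install it and restart the notebook kernel.")
```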

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
```

# SPEECH-TO-TEXT WITH WHISPER

# If your recording is an MP4 (or another video format), convert it to MP3 first

```python
import subprocess

def convert_mp4_to_mp3(input_file, output_file):
    """Extract the audio track of a video file (e.g. MP4) and save it as MP3 using ffmpeg."""
    command = [
        "ffmpeg",
        "-y",                     # Overwrite the output file if it already exists
        "-i", input_file,         # Input file
        "-vn",                    # Drop the video stream
        "-acodec", "libmp3lame",  # MP3 encoder
        "-b:a", "192k",           # Audio bitrate
        output_file,              # Output file
    ]
    try:
        subprocess.run(command, check=True)
        print(f"Conversion successful: {output_file}")
    except FileNotFoundError:
        print("ffmpeg was not found. Make sure it is installed and on your PATH.")
    except subprocess.CalledProcessError as e:
        print(f"Error during conversion: {e}")

# Example usage
convert_mp4_to_mp3("meeting.mp4", "meeting.mp3")
```



Conversion successful: output.mp3


size=   17682kB time=00:12:34.34 bitrate= 192.0kbits/s speed= 145x    
video:0kB audio:17681kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.003999%
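If you have several recordings, or files in other video containers such as `.mkv` or `.mov`, you may want to build the ffmpeg argument list separately and inspect it before running anything. Here is a sketch under that assumption. `build_ffmpeg_mp3_command` is a hypothetical helper. The `-y` flag tells ffmpeg to overwrite an existing output file without prompting.

```python
from pathlib import Path

def build_ffmpeg_mp3_command(input_file, output_file=None, bitrate="192k"):
    """Return the ffmpeg arguments that extract a video's audio track as MP3.

    If output_file is None, the input name is reused with a .mp3 extension.
    """
    if output_file is None:
        output_file = Path(input_file).with_suffix(".mp3")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output without prompting
        "-i", str(input_file),    # input video
        "-vn",                    # drop the video stream
        "-acodec", "libmp3lame",  # MP3 encoder
        "-b:a", bitrate,          # audio bitrate
        str(output_file),         # output audio file
    ]

# Inspect the command first; run it with subprocess.run(cmd, check=True) when ready
cmd = build_ffmpeg_mp3_command("meeting.mp4")
print(" ".join(cmd))
```

Passing a list of arguments to `subprocess.run`, rather than a single shell string, avoids quoting problems with file names that contain spaces, such as `Recording 2.flac`.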


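Next, we’ll use OpenAI’s Whisper model to transcribe the audio. We use `openai/whisper-small` for efficiency. A larger checkpoint such as `openai/whisper-large-v3` usually improves accuracy, at the cost of speed and memory. Because the recording is in French, we pass `"language"` and `"task": "translate"` through `generate_kwargs` so that Whisper outputs an English translation. Long recordings are processed in 30-second chunks (`chunk_length_s`) that overlap by `stride_length_s` seconds on each side. `return_timestamps=True` attaches a time range to each chunk.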

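```python
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    chunk_length_s=30,        # split long audio into 30-second windows
    stride_length_s=5,        # overlap (in seconds) on each side of a window
    return_timestamps=True,   # return a (start, end) timestamp per chunk
    device=device,
    generate_kwargs={"language": "french", "task": "translate"},  # French speech -> English text
)
```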

Device set to use cuda
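To get a feel for what `chunk_length_s=30` and `stride_length_s=5` do, here is a simplified sketch of the sliding-window idea. Each 30-second window overlaps its neighbours by 5 seconds on each side, so consecutive windows start 20 seconds apart. The overlapping parts give the pipeline context for stitching the chunk transcripts together. This is an illustrative approximation in plain Python, not the pipeline's actual implementation, which works on audio samples rather than seconds. `chunk_windows` is a hypothetical name.

```python
def chunk_windows(duration_s, chunk_s=30.0, stride_s=5.0):
    """Return (start, end) windows in seconds covering duration_s.

    Each window is chunk_s long (the last one may be shorter), and consecutive
    windows overlap by 2 * stride_s, i.e. stride_s on each side.
    """
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if start + chunk_s >= duration_s:  # this window reaches the end of the audio
            break
        start += step
    return windows

# For example, a 754-second (12:34) recording
windows = chunk_windows(754)
print(len(windows), windows[:3], windows[-1])
```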


Now, let’s process an audio file. The model will transcribe the French speech and translate it into English.

```python
transcription = pipe("Recording 2.flac")

# Format each chunk as "(start, end)-->text" so the transcript is easier to read
formatted_lyrics = ""
for line in transcription["chunks"]:
    text = line["text"]
    ts = line["timestamp"]  # (start, end) in seconds; end can be None for the last chunk
    formatted_lyrics += f"{ts}-->{text}\n"

print(formatted_lyrics.strip())
```

Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.


(0.0, 145.0)--> and The and and and and and and and and and and and and and It's ok, it's ok Hello. How are you? I'm fine. How are you? I'm fine. How are you? I'm fine. I'm fine. Hello. I'm with Maximilian Moro next to you. He's not connected to the NB. Hello. Hello. Hello. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm sorry. I'm aligned. So this small thing and this small thing are aligned. So you see they drew the green lines here they
(145.0, 227.0)--> want to check that this part and this part are parallel and here they want to check I think the form what do you think about it? I get that? I guess that it's out. That can be out. It's patch-corner efficient. Maybe both can do. But this one, this one especially, it's quite... I don't think patch-corner can do this. But that's actually really good. I'm not sure because it comes in logically and patch code is valid This one is simple, I'm not sure about this one I think the probl
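Raw `(start, end)` tuples in seconds are hard to scan in a long meeting. As the warning above shows, Whisper may also fail to predict the final end timestamp, in which case that value is `None`. Here is a sketch of a formatter that renders timestamps as HH:MM:SS and tolerates a missing end. It assumes each chunk is a dict with `"timestamp"` and `"text"` keys, as in `transcription["chunks"]`. `fmt_time` and `format_chunks` are hypothetical helper names.

```python
def fmt_time(seconds):
    """Format a time in seconds as HH:MM:SS, or a placeholder if it is None."""
    if seconds is None:
        return "--:--:--"
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def format_chunks(chunks):
    """Render pipeline chunks as '[start --> end] text' lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{fmt_time(start)} --> {fmt_time(end)}] {chunk['text'].strip()}")
    return "\n".join(lines)

# Sample data with the same structure as transcription["chunks"]
sample = [
    {"timestamp": (0.0, 145.0), "text": " Hello. How are you?"},
    {"timestamp": (145.0, None), "text": " I'm fine."},
]
print(format_chunks(sample))
```

With the real output, call `print(format_chunks(transcription["chunks"]))`.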

```python
# Save the transcription to a text file so we can reuse it later
with open("transcription.txt", "w", encoding="utf-8") as file:
    file.write(formatted_lyrics.strip())

print("Transcription saved to transcription.txt")
```

Transcription saved to transcription.txt


# SUMMARIZING WITH LLAMA
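Now, let’s take it a step further! We’ll use Meta’s Llama model `meta-llama/Llama-3.2-3B-Instruct` to summarize the transcript. This model is gated on the Hugging Face Hub. Before the weights can be downloaded, you must accept Meta’s license on the model page and authenticate, for example with `huggingface-cli login`.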

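```python
DEFAULT_MODEL = "meta-llama/Llama-3.2-3B-Instruct"

# Load the weights in bfloat16 to halve memory use compared with float32
model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_MODEL,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    device_map=device,  # requires the `accelerate` package
)

tokenizer = AutoTokenizer.from_pretrained(DEFAULT_MODEL)
# Llama has no dedicated padding token, so reuse the end-of-sequence token
tokenizer.pad_token_id = tokenizer.eos_token_id
```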

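Why load the model with `torch_dtype=torch.bfloat16`? A rough estimate is that the weights alone need (number of parameters) × (bytes per parameter). Assuming roughly 3 billion parameters for the 3B model (an approximation; the model card gives the exact count), bfloat16 at 2 bytes per parameter needs half the memory of float32 at 4 bytes. The estimate ignores activations, the KV cache used during generation, and framework overhead, so real usage is higher.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

NUM_PARAMS = 3e9  # assumption: ~3 billion parameters for Llama-3.2-3B

for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype_name:>8}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```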

We’ll define a conversation prompt instructing Llama to write a detailed summary of the meeting transcript.
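`apply_chat_template` expects the conversation as a list of dictionaries. Each dictionary has a `role` (`"system"`, `"user"` or `"assistant"`) and a `content` string. The tokenizer's template then turns this list into the prompt format the model was trained on. Here is a small sketch of a helper that builds such a list, assuming you want to reuse the same instruction for different transcripts. `build_summary_conversation` is a hypothetical name.

```python
def build_summary_conversation(transcript, instruction="Write a detailed summary of the meeting transcript."):
    """Return a chat-style message list: a system instruction followed by the transcript."""
    transcript = transcript.strip()
    if not transcript:
        raise ValueError("transcript is empty")
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": transcript},
    ]

messages = build_summary_conversation("(0.0, 145.0)--> Hello. How are you?\n")
print(messages)
```

To request a summary in simple English instead, pass a different `instruction`. Then hand the result to `tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True)`.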

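```python
conversation = [
    {"role": "system", "content": "Write a detailed summary of the meeting transcript."},
    {"role": "user", "content": formatted_lyrics.strip()},
]

# add_generation_prompt=True appends the assistant header, so the model starts its
# answer directly instead of first generating the "assistant" role label itself
prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the BOS token, so don't add special tokens again
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=1000,
        pad_token_id=tokenizer.pad_token_id,  # avoids the "Setting pad_token_id" warning
    )

# Decode only the newly generated tokens, skipping the prompt
processed_text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(processed_text)
```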

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


assistant

The provided text appears to be a transcript of a meeting between multiple individuals, including Maximilian Moro, Carlos, and others. The meeting seems to be discussing various topics, including:

1. **Control and inspection of components**: The group discusses the importance of controlling and inspecting components, particularly in the context of a map or a system. They talk about the need for a camera to capture images of the components and the importance of aligning them.
2. **Camera setup and testing**: The group discusses the setup and testing of cameras, including the use of different types of cameras and the importance of testing for parallelism and alignment.
3. **Image processing and analysis**: The group talks about the need for image processing and analysis, including the use of software to detect defects and anomalies in the images.
4. **Feasibility studies**: The group discusses the importance of feasibility studies, including the creation of a set of images for each control and the testing of these images to determine the conditions of success.
5. **Camera placement and ergonomics**: The group talks about the importance of camera placement and ergonomics, including the need to position cameras to capture the best images and to reduce the risk of errors.
6. **Camera calibration and maintenance**: The group discusses the importance of camera calibration and maintenance, including the need to change batteries and perform regular checks on the cameras.
7. **Image quality and resolution**: The group talks about the importance of image quality and resolution, including the need to capture high-quality images to ensure accurate inspection and analysis.
8. **Labeling and annotation**: The group discusses the importance of labeling and annotating images, including the need to clearly label and identify components and defects.

Some specific points mentioned in the transcript include:

* The need to use a camera with a good resolution to capture high-quality images
* The importance of aligning images to ensure accurate inspection and analysis
* The need to test cameras for parallelism and alignment
* The importance of feasibility studies to determine the conditions of success for each control
* The need to position cameras to capture the best images and reduce the risk of errors
* The importance of regular camera calibration and maintenance
* The need to clearly label and annotate images to ensure accurate inspection and analysis

Overall, the meeting appears to be focused on improving the inspection and analysis process using cameras and image processing techniques, and ensuring that the cameras are properly set up and maintained to capture high-quality images.