I just can't make it sync; any ideas would be helpful. Hours on hours, days after days, and nothing. #670

search620 · 2024-01-19T23:15:19Z

I am using this audio file as an example:
https://www.thepodcastexchange.ca/s/McDonalds_LNG_061019.wav

No matter what I try, I cannot get the end of the transcript to sync with the audio.

Original Transcription:
Start: 00:00:00,009, End: 00:00:19,940, Text: Hey, Canadians, this is for you guys and for McDonald's. And so I'm going to read it in my most Canadian voice possible. It's impossible to be bummed out when you think about a McDonald's Happy Meal, eh? I think everyone has a nostalgic attachment to them, including me and Christian. I can't tell you how many of the little toys we collected. Right, Christian?
Start: 00:00:20,606, End: 00:00:48,659, Text: All the time. And I love your accent so far. Thanks. And well, Canadians are very sarcastic, so I appreciate that. And we fought over the toys even with our fists were flying. You broke my nose on one occasion. I did. Oh man, that was actually really bad. Really dark. McDonald's cares about families, especially in Canada. So they care about the ingredients in their happy meals. The hamburgers are made with 100% beef from where else? Canada. Canada.
Start: 00:00:49,683, End: 00:01:05,793, Text: from the plains of Canada with no artificial preservatives, additives, or fillers, and Happy Meal chicken McNuggets. I wonder if the chickens are Canadian. And grilled chicken snack wraps. That's hard to say. And made with 100%... Yes, they are. Canadian-raised seasoned chicken.
Start: 00:01:05,793, End: 00:01:24,939, Text: No artificial flavors, colors, or preservatives. Enjoy quality ingredients while spending quality time with your family, whether they're Canadian or friends visiting from out of the country. Happy Meals start at just $3.99. $3.99. That's not that many loonies or toonies. Only at McDonald's.

Aligned Transcription:
Start: 00:00:00,029, End: 00:00:03,611, Text: Hey, Canadians, this is for you guys and for McDonald's.
Start: 00:00:03,770, End: 00:00:06,932, Text: And so I'm going to read it in my most Canadian voice possible.
Start: 00:00:07,493, End: 00:00:11,074, Text: It's impossible to be bummed out when you think about a McDonald's Happy Meal, eh?
Start: 00:00:11,654, End: 00:00:15,678, Text: I think everyone has a nostalgic attachment to them, including me and Christian.
Start: 00:00:16,138, End: 00:00:18,638, Text: I can't tell you how many of the little toys we collected.
Start: 00:00:19,039, End: 00:00:19,559, Text: Right, Christian?
Start: 00:00:20,725, End: 00:00:21,486, Text: All the time.
Start: 00:00:22,027, End: 00:00:23,507, Text: And I love your accent so far.
Start: 00:00:23,928, End: 00:00:24,288, Text: Thanks.
Start: 00:00:24,908, End: 00:00:28,070, Text: And well, Canadians are very sarcastic, so I appreciate that.
Start: 00:00:28,769, End: 00:00:32,351, Text: And we fought over the toys even with our fists were flying.
Start: 00:00:32,491, End: 00:00:33,953, Text: You broke my nose on one occasion.
Start: 00:00:34,353, End: 00:00:34,612, Text: I did.
Start: 00:00:34,713, End: 00:00:36,334, Text: Oh man, that was actually really bad.
Start: 00:00:36,374, End: 00:00:36,933, Text: Really dark.
Start: 00:00:38,634, End: 00:00:41,095, Text: McDonald's cares about families, especially in Canada.
Start: 00:00:41,496, End: 00:00:43,798, Text: So they care about the ingredients in their happy meals.
Start: 00:00:44,357, End: 00:00:45,859, Text: The hamburgers are made with 100% beef from where else?
Start: 00:00:45,899, End: 00:00:46,139, Text: Canada.
Start: 00:00:46,219, End: 00:00:46,439, Text: Canada.
Start: 00:00:49,704, End: 00:00:57,469, Text: from the plains of Canada with no artificial preservatives, additives, or fillers, and Happy Meal chicken McNuggets.
Start: 00:00:57,488, End: 00:00:58,670, Text: I wonder if the chickens are Canadian.
Start: 00:00:59,090, End: 00:01:00,511, Text: And grilled chicken snack wraps.
Start: 00:01:00,670, End: 00:01:01,350, Text: That's hard to say.
Start: 00:01:01,750, End: 00:01:03,332, Text: And made with 100%... Yes, they are.
Start: 00:01:03,371, End: 00:01:05,073, Text: Canadian-raised seasoned chicken.
Start: 00:01:05,933, End: 00:01:08,194, Text: No artificial flavors, colors, or preservatives.
Start: 00:01:08,594, End: 00:01:16,016, Text: Enjoy quality ingredients while spending quality time with your family, whether they're Canadian or friends visiting from out of the country.
Start: 00:01:16,637, End: 00:01:17,617, Text: Happy Meals start at just $3.99.
Start: 00:01:17,658, End: 00:01:17,617, Text: $3.99.
Start: 00:01:17,677, End: 00:01:19,778, Text: That's not that many loonies or toonies.
Start: 00:01:19,798, End: 00:01:20,259, Text: Only at McDonald's.
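One concrete symptom is visible in the aligned output above: the `$3.99.` segment ends at 00:01:17,617, before its own start at 00:01:17,658. The WhisperX README's limitations section notes that words made only of characters missing from the alignment model's dictionary (its examples are "2014." and "£13.60") cannot be aligned and get no timing, which is a plausible cause here. Below is a minimal check that flags such segments, assuming each segment is a dict with `start` and `end` in seconds, as `aligned_result['segments']` provides; `find_suspect_segments` is a hypothetical helper, not part of WhisperX.

```python
from typing import Dict, List, Optional


def find_suspect_segments(segments: List[Dict]) -> List[int]:
    """Return indices of segments whose timing looks broken.

    A segment is flagged if it is missing a start/end time, ends before it
    starts, or starts before the previous timed segment ended.
    """
    suspects: List[int] = []
    prev_end: Optional[float] = None
    for i, seg in enumerate(segments):
        start, end = seg.get("start"), seg.get("end")
        if start is None or end is None or end < start:
            suspects.append(i)  # missing or inverted timing
        elif prev_end is not None and start < prev_end:
            suspects.append(i)  # overlaps the previous segment
        if end is not None:
            prev_end = end
    return suspects
```

Running `print(find_suspect_segments(aligned_result['segments']))` after alignment would list the segment indices worth inspecting; on just the "Happy Meals", "$3.99." and "loonies" segments it returns `[1]`.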

As you can see, something here is not working. I don't even understand how the "alignment" works, or where it gets its timestamps from if the original transcription doesn't provide this information.
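For context on where the timestamps come from: `whisperx.align` does not read word timings out of Whisper. It runs the separate wav2vec2-style CTC model loaded by `load_align_model`, which outputs a probability for each character in every short audio frame, and then force-aligns the already-known transcript text to those frames; a character's frame position becomes its timestamp. The sketch below illustrates that general technique (Viterbi forced alignment over a trellis) on a toy emission matrix. It is not WhisperX's code: it simplifies CTC by letting each token occupy exactly one frame, and the 20 ms frame length is an assumption about wav2vec2-style models.

```python
import math
from typing import List, Sequence


def forced_align(log_probs: Sequence[Sequence[float]], tokens: Sequence[int], blank: int = 0) -> List[int]:
    """Viterbi-align a known token sequence to per-frame CTC log-probabilities.

    Simplified CTC: each token is emitted in exactly one frame and every other
    frame is 'blank'. Returns the frame index at which each token is emitted.
    """
    num_frames, num_tokens = len(log_probs), len(tokens)
    if num_tokens > num_frames:
        raise ValueError("more tokens than frames")
    neg_inf = float("-inf")
    # trellis[t][j]: best log-score after t frames with the first j tokens emitted
    trellis = [[neg_inf] * (num_tokens + 1) for _ in range(num_frames + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frames):
        for j in range(num_tokens + 1):
            stay = trellis[t][j] + log_probs[t][blank]  # frame t is blank
            advance = (trellis[t][j - 1] + log_probs[t][tokens[j - 1]]) if j > 0 else neg_inf
            trellis[t + 1][j] = max(stay, advance)
    # Backtrack from the final cell to recover the frame where each token was emitted.
    frames = [0] * num_tokens
    j = num_tokens
    for t in range(num_frames, 0, -1):
        if j == 0:
            break
        stay = trellis[t - 1][j] + log_probs[t - 1][blank]
        advance = trellis[t - 1][j - 1] + log_probs[t - 1][tokens[j - 1]]
        if advance >= stay:
            frames[j - 1] = t - 1
            j -= 1
    return frames


def frames_to_seconds(frames: Sequence[int], frame_seconds: float = 0.02) -> List[float]:
    """Convert frame indices to seconds (0.02 s per frame is an assumption)."""
    return [round(f * frame_seconds, 3) for f in frames]


if __name__ == "__main__":
    # Toy vocabulary: 0 = blank, 1 = 'a', 2 = 'b'; the "audio" says a at frame 1 and b at frame 4.
    hi, lo = math.log(0.9), math.log(0.05)
    emissions = [[hi if c == k else lo for c in range(3)] for k in (0, 1, 0, 0, 2, 0)]
    frames = forced_align(emissions, [1, 2])
    print(frames, frames_to_seconds(frames))  # [1, 4] [0.02, 0.08]
```

This also suggests why characters the model has no symbol for (digits, "$") are a problem: there is no column in the emission matrix to align them to, so they get no timing of their own.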

This is the script I am using to build the SRT file:

```python
import whisperx
import gc
import os

# Setup
device = "cuda"
audio_file = r"C:\Users\HOME\Desktop\Testing Whisper\7.wav"
batch_size = 12
compute_type = "float16"
language_code = "en"
model = whisperx.load_model("large-v3", device, compute_type=compute_type)
print_progress = True

# Load and transcribe audio
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size, print_progress=print_progress, language=language_code)

# Function to format time in SRT format (HH:MM:SS,mmm)
def format_time(seconds):
    # Convert to whole milliseconds first so float error cannot truncate the last digit
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millisec = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millisec:03}"

# Print Original Transcription
print("Original Transcription:")
for segment in result['segments']:
    print(f"Start: {format_time(segment['start'])}, End: {format_time(segment['end'])}, Text: {segment['text']}")

# Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=language_code, device=device)
aligned_result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)

# Print Aligned Transcription
print("\nAligned Transcription:")
for segment in aligned_result['segments']:
    print(f"Start: {format_time(segment['start'])}, End: {format_time(segment['end'])}, Text: {segment['text']}")

# Function to convert transcription result to SRT format
def convert_to_srt(transcription_segments, srt_file_path):
    with open(srt_file_path, 'w', encoding='utf-8') as file:
        for i, segment in enumerate(transcription_segments, start=1):
            start_time = format_time(segment['start'])
            end_time = format_time(segment['end'])
            text = segment['text'].strip()  # drop stray surrounding whitespace
            file.write(f"{i}\n{start_time} --> {end_time}\n{text}\n\n")

# Main script
srt_file_name = os.path.splitext(audio_file)[0] + '.srt'
convert_to_srt(aligned_result['segments'], srt_file_name)

# Clean up if necessary
gc.collect()
```
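Before writing the SRT, it may help to repair inverted or overlapping cues rather than write them as-is. A minimal sketch, assuming the same segment dicts (`start`, `end`, `text`); `sanitize_segments` is a hypothetical helper, not part of WhisperX, and folding a broken cue into the previous one is just one possible policy.

```python
from typing import Dict, List


def sanitize_segments(segments: List[Dict]) -> List[Dict]:
    """Return new segment dicts whose times never overlap or run backwards.

    A cue that would end at or before its (clamped) start is folded into the
    previous cue instead of being written as an empty or inverted cue.
    """
    cleaned: List[Dict] = []
    for seg in segments:
        start, end = float(seg["start"]), float(seg["end"])
        text = seg["text"].strip()
        if cleaned:
            prev = cleaned[-1]
            # Never let a cue start before the previous cue has ended.
            start = max(start, prev["end"])
            if end <= start:
                # Inverted or zero-length cue: append its text to the previous cue.
                prev["text"] = f"{prev['text']} {text}".strip()
                continue
        # Copy the dict so the caller's segments are not mutated; guard against negative duration.
        cleaned.append({**seg, "start": start, "end": max(end, start), "text": text})
    return cleaned
```

It would slot into the script above as `convert_to_srt(sanitize_segments(aligned_result['segments']), srt_file_name)`. This only tidies cue timing; it does not fix the underlying alignment.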


Purfview mentioned this issue on Jan 22, 2024: Support for WhisperX standalone (Purfview/whisper-standalone-win#174, closed).
