<a href="https://colab.research.google.com/github/stipid/videotools/blob/main/whisper-timestamp-simple.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

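We install the latest version of whisper-timestamped directly from GitHub; it usually includes the most recent fixes for newer environments such as current Colab runtimes.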

In [1]:
# Run in a new Colab cell
!pip install -q git+https://github.com/linto-ai/whisper-timestamped.git

  Preparing metadata (setup.py) ... done
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.1/48.1 kB 4.8 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 803.2/803.2 kB 30.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 801.7/801.7 kB 43.2 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 363.4/363.4 MB 3.9 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.8/13.8 MB 105.4 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.6/24.6 MB 77.3 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

To make testing easy, we download a sample audio file from the web.

In [None]:
# Run in a new Colab cell
# Use /raw/ rather than /blob/: a /blob/ URL returns GitHub's HTML viewer page, not the MP3 file
!curl -L "https://github.com/linto-ai/whisper-timestamped/raw/master/tests/data/apollo11.mp3" -o "apollo11.mp3"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  179k    0  179k    0     0   266k      0 --:--:-- --:--:-- --:--:--  266k
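The same download can be done from Python. The key detail is that a GitHub `/blob/` link points at the HTML viewer page, while the file bytes are served from `raw.githubusercontent.com`. The sketch below rewrites a blob URL into its raw form; `github_blob_to_raw` and `download` are hypothetical helpers, and the rewrite assumes the branch name contains no `/`.

```python
import urllib.request
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> str:
    """Rewrite https://github.com/<owner>/<repo>/blob/<ref>/<path>
    into https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.
    Assumes <ref> is a single path component (no '/' in the branch name)."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"


def download(url: str, dest: str) -> None:
    """Save the resource at url to dest (requires network access)."""
    urllib.request.urlretrieve(url, dest)


blob_url = "https://github.com/linto-ai/whisper-timestamped/blob/master/tests/data/apollo11.mp3"
raw_url = github_blob_to_raw(blob_url)
print(raw_url)
# download(raw_url, "apollo11.mp3")
```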


The environment is now set up, so we can import the library and use it directly.

In [None]:
# Run in a new Colab cell
import json
import os

import torch
import whisper_timestamped as whisper

# Install ffmpeg if it's not already installed (Whisper uses it to decode audio)
if not os.path.exists('/usr/bin/ffmpeg'):
    !apt update -qq && apt install -y -qq ffmpeg

# Load the model
# Available sizes include 'tiny', 'base', 'small', 'medium', 'large', 'large-v2', 'large-v3'
# Use the GPU ("cuda") when the Colab runtime has one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("large-v3", device=device)

# print(model)

# Load the audio
audio = whisper.load_audio("DiaryofaWimpyKid03.mp3")

# Transcribe and align word-level timestamps
result = whisper.transcribe(model, audio, language="en")

# Print the full result to verify it
print(json.dumps(result, indent=2, ensure_ascii=False))

# Or view each word's timestamps in a more readable form
for segment in result['segments']:
    for word in segment['words']:
        start_time = word['start']
        end_time = word['end']
        text = word['text']
        print(f"[{start_time:.2f}s -> {end_time:.2f}s] {text}")

100%|█████████████████████████████████████| 2.88G/2.88G [01:41<00:00, 30.3MiB/s]
100%|██████████| 15232/15232 [00:54<00:00, 277.06frames/s]

{
  "text": " September. Tuesday. First of all, let me get something straight. This is a journal, not a diary. I know what it says on the cover, but when Mom went out to buy this thing, I specifically told her to get one that didn't say diary on it. Great. All I need is for some jerk to catch me carrying this book around and get the wrong idea. The other thing I want to clear up right away is that this was Mom's idea, not mine. But if she thinks I'm going to write down my feelings in here or whatever, she's crazy. So just don't expect me to be all dear diary this and dear diary that. The only reason I agreed to do this at all is because I figure later on, when I'm rich and famous, I'll have better things to do than answer people's stupid questions all day long. So this book is going to come in handy. Like I said, I'll be famous one day, but for now, I'm stuck in middle school with a bunch of morons. Let me just say for the record that I think middle school is the dumbest idea ever inve
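The word loop above only prints timings. If you want subtitles directly from the in-memory `result` instead of going through the CLI, each entry of `result["segments"]` carries `start`, `end` and `text` fields that map naturally onto SRT cues. The sketch below is a hypothetical helper (not part of whisper-timestamped) that emits one cue per segment; the two sample segments are made up for illustration.

```python
from typing import Any, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict[str, Any]]) -> str:
    """Build an SRT document with one numbered cue per segment."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(cues)


# Hypothetical sample shaped like result["segments"]
sample = [
    {"start": 0.0, "end": 1.5, "text": " September."},
    {"start": 1.5, "end": 62.25, "text": " Tuesday."},
]
print(segments_to_srt(sample))
# In the notebook: open("out.srt", "w").write(segments_to_srt(result["segments"]))
```

Switching to one cue per word is the same loop over `segment["words"]`, which gives karaoke-style subtitles at the cost of many very short cues.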




In [4]:
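model_size = "large-v2"
language = "en"  # set to "auto" to let Whisper detect the language
# Only pass --language when a specific language is requested
language_param = f"--language {language}" if language != "auto" else ""

new_directory = "output"
output_format = '--output_format "srt,json,vtt"'
filename = "Diary of a Wimpy Kid 10.mp3"
# Build the whisper_timestamped CLI command, print it, then run it in the shell
run = f'whisper_timestamped "{filename}" --model {model_size} {language_param} --output_dir {new_directory} {output_format}'
print(run)
!{run}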

whisper_timestamped "Diary of a Wimpy Kid 10.mp3" --model large-v2 --language en --output_dir output --output_format "srt,json,vtt"
100% 13288/13288 [00:36<00:00, 368.18frames/s]
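Building the command as one f-string works, but it relies on hand-placed quotes around a filename that contains spaces. An alternative is to pass an argument list to `subprocess.run`, which avoids shell quoting entirely. The sketch below reuses only the flags from the cell above; `build_whisper_ts_args` and `run_whisper_ts` are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List, Optional


def build_whisper_ts_args(
    filename: str,
    model_size: str = "large-v2",
    language: Optional[str] = "en",
    output_dir: str = "output",
    formats: str = "srt,json,vtt",
) -> List[str]:
    """Build the whisper_timestamped CLI call as an argument list (no shell quoting needed)."""
    args = ["whisper_timestamped", filename, "--model", model_size]
    # Omit --language for "auto" (or None) so the language is auto-detected
    if language and language != "auto":
        args += ["--language", language]
    args += ["--output_dir", output_dir, "--output_format", formats]
    return args


def run_whisper_ts(args: List[str]) -> None:
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(args, check=True)


args = build_whisper_ts_args("Diary of a Wimpy Kid 10.mp3")
print(shlex.join(args))  # shell-safe rendering, for logging only
# run_whisper_ts(args)
```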


In [5]:
import platform
import torch
import torchaudio

print("--- System Information ---")
!cat /etc/os-release | grep "PRETTY_NAME"
!uname -r
print("\n")

print("--- Hardware Information ---")
print("CPU Info:")
!lscpu | grep "Model name"
print("\nGPU Info:")
!nvidia-smi --query-gpu=gpu_name,driver_version,memory.total --format=csv,noheader
print("\nRAM Info:")
!free -h | grep "Mem:"
print("\nDisk Info:")
!df -h /
print("\n")


print("--- Software Information ---")
print(f"Python Version: {platform.python_version()}")
print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"cuDNN version: {torch.backends.cudnn.version()}")
print(f"Torchaudio Version: {torchaudio.__version__}")
print("\n")


print("--- All Installed Packages (sample) ---")
# Only show packages matching 'torch', 'whisper', 'numpy' or 'transformers', to keep the list short
!pip list | grep -E 'torch|whisper|numpy|transformers'

--- System Information ---
PRETTY_NAME="Ubuntu 22.04.4 LTS"
6.1.123+


--- Hardware Information ---
CPU Info:
Model name:                           Intel(R) Xeon(R) CPU @ 2.00GHz

GPU Info:
Tesla T4, 550.54.15, 15360 MiB

RAM Info:
Mem:            12Gi       935Mi       7.4Gi       2.0Mi       4.4Gi        11Gi

Disk Info:
Filesystem      Size  Used Avail Use% Mounted on
overlay         113G   45G   69G  40% /


--- Software Information ---
Python Version: 3.11.13
PyTorch Version: 2.6.0+cu124
CUDA available: True
CUDA version: 12.4
cuDNN version: 90100
Torchaudio Version: 2.6.0+cu124


--- All Installed Packages (sample) ---
numpy                                 2.0.2
openai-whisper                        20250625
sentence-transformers                 4.1.0
torch                                 2.6.0+cu124
torchao                               0.10.0
torchaudio                            2.6.0+cu124
torchdata                             0.11.0
torchsummary                          1.5.

In [None]:
!ls -l DiaryofaWimpyKid02.mp3
!ffprobe DiaryofaWimpyKid02.mp3

-rw-r--r-- 1 root root 2469339 Jul  7 04:23 DiaryofaWimpyKid02.mp3
ffprobe version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libt
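The full `ffprobe` banner above is noisy when all you need is the file's length. `-v error` suppresses the banner, and `-show_entries format=duration -of json` prints just the container duration as JSON (the value is a string, hence the `float` conversion). The sketch below wraps that in hypothetical helpers; the sample duration in the example is made up.

```python
import json
import shutil
import subprocess


def parse_ffprobe_duration(ffprobe_json: str) -> float:
    """Extract the container duration in seconds from ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    return float(data["format"]["duration"])


def probe_duration(path: str) -> float:
    """Run ffprobe quietly on path and return its duration in seconds."""
    if shutil.which("ffprobe") is None:
        raise RuntimeError("ffprobe not found on PATH")
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ffprobe_duration(out)


# Example with the shape of ffprobe's JSON output (the value is made up)
print(parse_ffprobe_duration('{"format": {"duration": "152.320000"}}'))
# print(probe_duration("DiaryofaWimpyKid02.mp3"))
```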