<a href="https://colab.research.google.com/github/k2-fsa/colab/blob/master/sherpa-onnx/sherpa_onnx_whisper_large_v3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This Colab notebook shows how to run Whisper large-v3 with sherpa-onnx on CPU and on GPU with CUDA.

# Build sherpa-onnx from source with CUDA support

In [1]:
import torch
a = torch.rand(1000).cuda()
print(a.device)

cuda:0


In [2]:
%%shell

git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx

mkdir -p build
cd build
cmake \
  -DBUILD_SHARED_LIBS=ON \
  -DSHERPA_ONNX_ENABLE_GPU=ON \
  ..

make -j2 sherpa-onnx-offline

Cloning into 'sherpa-onnx'...
remote: Enumerating objects: 13138, done.
remote: Counting objects: 100% (4115/4115), done.
remote: Compressing objects: 100% (1285/1285), done.
remote: Total 13138 (delta 3344), reused 3066 (delta 2813), pack-reused 9023
Receiving objects: 100% (13138/13138), 6.19 MiB | 11.24 MiB/s, done.
Resolving deltas: 100% (8647/8647), done.
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- No CMAKE_BUILD_TYPE given, default to Release
  Compiling for NVIDIA GPU is enabled.  Please make sure cudatoolkit

 



In [3]:
%%shell
git lfs install
git clone https://huggingface.co/csukuangfj/sherpa-onnx-whisper-large-v3

ls -lh sherpa-onnx-whisper-large-v3

Git LFS initialized.
Cloning into 'sherpa-onnx-whisper-large-v3'...
remote: Enumerating objects: 21, done.
remote: Counting objects: 100% (17/17), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 21 (delta 1), reused 0 (delta 0), pack-reused 4 (from 1)
Unpacking objects: 100% (21/21), 1.00 MiB | 1.44 MiB/s, done.
Filtering content: 100% (4/4), 5.75 GiB | 54.10 MiB/s, done.
total 5.8G
-rw-r--r-- 1 root root 2.8M Jul 12 16:10 large-v3-decoder.onnx
-rw-r--r-- 1 root root 3.0G Jul 12 16:12 large-v3-decoder.weights
-rw-r--r-- 1 root root 745K Jul 12 16:10 large-v3-encoder.onnx
-rw-r--r-- 1 root root 2.8G Jul 12 16:11 large-v3-encoder.weights
-rw-r--r-- 1 root root 798K Jul 12 16:10 large-v3-tokens.txt
drwxr-xr-x 2 root root 4.0K Jul 12 16:10 test_wavs




## Run with CPU

In [4]:
%%shell

ls -lh sherpa-onnx/build/bin/sherpa-onnx-offline

-rwxr-xr-x 1 root root 2.6M Jul 12 16:10 sherpa-onnx/build/bin/sherpa-onnx-offline




In [5]:
%%shell

exe=$PWD/sherpa-onnx/build/bin/sherpa-onnx-offline

echo $exe
cd sherpa-onnx-whisper-large-v3

time $exe \
  --whisper-encoder=./large-v3-encoder.onnx \
  --whisper-decoder=./large-v3-decoder.onnx \
  --tokens=./large-v3-tokens.txt \
  --num-threads=2 \
  ./test_wavs/0.wav


/content/sherpa-onnx/build/bin/sherpa-onnx-offline
/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 /content/sherpa-onnx/build/bin/sherpa-onnx-offline --whisper-encoder=./large-v3-encoder.onnx --whisper-decoder=./large-v3-decoder.onnx --tokens=./large-v3-tokens.txt --num-threads=2 ./test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="./large-v3-encoder.onnx", decoder="./large-v3-decoder.onnx", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), telespeech_
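
The CPU run above and the CUDA runs later in this colab use the same command line; only the `--provider` flag differs. The following is a minimal sketch of assembling that command from Python. The helper name `build_offline_command` and its parameters are hypothetical and not part of sherpa-onnx; the flags themselves are the ones used in the cells of this colab.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_offline_command(
    exe: str,
    model_dir: str,
    wav: str,
    num_threads: int = 2,
    provider: Optional[str] = None,
) -> List[str]:
    """Assemble the sherpa-onnx-offline argument list used in this colab.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    model = Path(model_dir)
    cmd = [
        exe,
        f"--whisper-encoder={model / 'large-v3-encoder.onnx'}",
        f"--whisper-decoder={model / 'large-v3-decoder.onnx'}",
        f"--tokens={model / 'large-v3-tokens.txt'}",
    ]
    # Leave --provider out for the CPU run, as the CPU cell does.
    if provider is not None:
        cmd.append(f"--provider={provider}")
    cmd.append(f"--num-threads={num_threads}")
    cmd.append(wav)
    return cmd


if __name__ == "__main__":
    cmd = build_offline_command(
        "/content/sherpa-onnx/build/bin/sherpa-onnx-offline",
        "sherpa-onnx-whisper-large-v3",
        "sherpa-onnx-whisper-large-v3/test_wavs/0.wav",
        provider="cuda",
    )
    # Print a shell-quoted command line for inspection.
    print(shlex.join(cmd))
```

To actually run the command, the list can be passed to `subprocess.run(cmd, check=True)`.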



## Install CUDA 11.8

sherpa-onnx uses onnxruntime 1.18.1.

According to
https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html,
we need to install CUDA 11.8.

Please follow
https://k2-fsa.github.io/k2/installation/cuda-cudnn.html#cuda-11-8
to install it.

In [6]:
%%shell

mkdir -p /star-fj/fangjun/software/cuda-11.8.0



In [7]:
%%shell

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run

chmod +x cuda_11.8.0_520.61.05_linux.run

./cuda_11.8.0_520.61.05_linux.run \
  --silent \
  --toolkit \
  --installpath=/star-fj/fangjun/software/cuda-11.8.0 \
  --no-opengl-libs \
  --no-drm \
  --no-man-page

--2024-07-12 16:13:55--  https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4336730777 (4.0G) [application/octet-stream]
Saving to: ‘cuda_11.8.0_520.61.05_linux.run’


2024-07-12 16:15:23 (47.3 MB/s) - ‘cuda_11.8.0_520.61.05_linux.run’ saved [4336730777/4336730777]





In [8]:
%%shell

wget https://huggingface.co/csukuangfj/cudnn/resolve/main/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz

tar xvf cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz --strip-components=1 -C /star-fj/fangjun/software/cuda-11.8.0

--2024-07-12 16:19:03--  https://huggingface.co/csukuangfj/cudnn/resolve/main/cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz
Resolving huggingface.co (huggingface.co)... 18.164.174.23, 18.164.174.17, 18.164.174.118, ...
Connecting to huggingface.co (huggingface.co)|18.164.174.23|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/csukuangfj/cudnn/a6d9887267e28590c9db95ce65cbe96a668df0352338b7d337e0532ded33485c?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz%3B+filename%3D%22cudnn-linux-x86_64-8.9.1.23_cuda11-archive.tar.xz%22%3B&response-content-type=application%2Fx-xz&Expires=1721060343&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcyMTA2MDM0M319LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9jc3VrdWFuZ2ZqL2N1ZG5uL2E2ZDk4ODcyNjdlMjg1OTBjOWRiOTVjZTY1Y2JlOTZhNjY4ZGYwMzUyMzM4YjdkMzM3ZTA1MzJkZWQzMzQ4NWM%7EcmVzcG9



In [13]:
%%shell

exe=$PWD/sherpa-onnx/build/bin/sherpa-onnx-offline

echo $exe
cd sherpa-onnx-whisper-large-v3


time $exe \
  --whisper-encoder=./large-v3-encoder.onnx \
  --whisper-decoder=./large-v3-decoder.onnx \
  --tokens=./large-v3-tokens.txt \
  --provider=cuda \
  --num-threads=2 \
  ./test_wavs/0.wav || true


/content/sherpa-onnx/build/bin/sherpa-onnx-offline
/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 /content/sherpa-onnx/build/bin/sherpa-onnx-offline --whisper-encoder=./large-v3-encoder.onnx --whisper-decoder=./large-v3-decoder.onnx --tokens=./large-v3-tokens.txt --provider=cuda --num-threads=2 ./test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="./large-v3-encoder.onnx", decoder="./large-v3-decoder.onnx", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=



In [14]:
%%shell

ldd ./sherpa-onnx/build/_deps/onnxruntime-src/lib/libonnxruntime_providers_cuda.so

	linux-vdso.so.1 (0x00007ffc8fbd3000)
	libcublasLt.so.11 => not found
	libcublas.so.11 => not found
	libcudnn.so.8 => /lib/x86_64-linux-gnu/libcudnn.so.8 (0x00007fe2ec000000)
	libcurand.so.10 => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 (0x00007fe2e5600000)
	libcufft.so.10 => not found
	libcudart.so.11.0 => not found
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe2ec285000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe2ec280000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe2ec27b000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe2e53d4000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe2ebf19000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe2ec259000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe2e51ab000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe30b553000)
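
The `ldd` listing above marks every shared library that the dynamic loader cannot resolve with `=> not found`. As a small sketch, those entries can be extracted programmatically. The function name `missing_libraries` is hypothetical, and the sample text is a subset of the lines printed above.

```python
import re
from typing import List

# Matches lines such as "\tlibcublas.so.11 => not found".
_NOT_FOUND = re.compile(r"^\s*(\S+)\s+=>\s+not found\s*$")


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found', in order."""
    missing = []
    for line in ldd_output.splitlines():
        match = _NOT_FOUND.match(line)
        if match:
            missing.append(match.group(1))
    return missing


SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffc8fbd3000)
\tlibcublasLt.so.11 => not found
\tlibcublas.so.11 => not found
\tlibcudnn.so.8 => /lib/x86_64-linux-gnu/libcudnn.so.8 (0x00007fe2ec000000)
\tlibcufft.so.10 => not found
\tlibcudart.so.11.0 => not found
"""

if __name__ == "__main__":
    print(missing_libraries(SAMPLE))
    # ['libcublasLt.so.11', 'libcublas.so.11', 'libcufft.so.10', 'libcudart.so.11.0']
```

All four missing libraries carry CUDA 11 version suffixes, which is why the CUDA 11.8 toolkit installed earlier has to be made visible to the loader.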




In [15]:
%%shell

cd /star-fj/fangjun/software/cuda-11.8.0
find . -name libcublasLt.so.11

find . -name libcublas.so.11


./targets/x86_64-linux/lib/libcublasLt.so.11
./targets/x86_64-linux/lib/libcublas.so.11
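
The `find` commands above locate the missing libraries inside the CUDA 11.8 install directory. A rough Python equivalent is sketched below, together with a helper that builds an `LD_LIBRARY_PATH` value from the directories found. Both function names are hypothetical. Note that changing `LD_LIBRARY_PATH` only affects processes started afterwards (for example via `subprocess.run(..., env=env)`), not the Python process that is already running.

```python
import os
from pathlib import Path
from typing import Iterable, List


def library_dirs(cuda_root: str, names: Iterable[str]) -> List[str]:
    """Return the distinct directories under cuda_root that contain any of names."""
    root = Path(cuda_root)
    if not root.is_dir():
        return []
    found: List[str] = []
    for name in names:
        # rglob walks the tree, much like `find . -name <name>`.
        for path in sorted(root.rglob(name)):
            parent = str(path.parent)
            if parent not in found:
                found.append(parent)
    return found


def prepend_to_ld_library_path(dirs: List[str], current: str = "") -> str:
    """Put dirs in front of an existing colon-separated LD_LIBRARY_PATH value."""
    kept = [p for p in current.split(":") if p and p not in dirs]
    return ":".join(list(dirs) + kept)


if __name__ == "__main__":
    # Install path used in this colab.
    cuda_root = "/star-fj/fangjun/software/cuda-11.8.0"
    dirs = library_dirs(cuda_root, ["libcublasLt.so.11", "libcublas.so.11"])
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = prepend_to_ld_library_path(
        dirs, env.get("LD_LIBRARY_PATH", "")
    )
    print(env["LD_LIBRARY_PATH"])
```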




In [16]:
%%shell

export CUDA_HOME=/star-fj/fangjun/software/cuda-11.8.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH

export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
export CUDA_TOOLKIT_ROOT=$CUDA_HOME
export CUDA_BIN_PATH=$CUDA_HOME
export CUDA_PATH=$CUDA_HOME
export CUDA_INC_PATH=$CUDA_HOME/targets/x86_64-linux
export CFLAGS="-I$CUDA_HOME/targets/x86_64-linux/include $CFLAGS"

exe=$PWD/sherpa-onnx/build/bin/sherpa-onnx-offline

echo $exe
cd sherpa-onnx-whisper-large-v3

time $exe \
  --whisper-encoder=./large-v3-encoder.onnx \
  --whisper-decoder=./large-v3-decoder.onnx \
  --tokens=./large-v3-tokens.txt \
  --provider=cuda \
  --num-threads=2 \
  ./test_wavs/0.wav

/content/sherpa-onnx/build/bin/sherpa-onnx-offline
/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 /content/sherpa-onnx/build/bin/sherpa-onnx-offline --whisper-encoder=./large-v3-encoder.onnx --whisper-decoder=./large-v3-decoder.onnx --tokens=./large-v3-tokens.txt --provider=cuda --num-threads=2 ./test_wavs/0.wav 

OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="./large-v3-encoder.onnx", decoder="./large-v3-decoder.onnx", language="", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=



In [17]:
%%shell

nvidia-smi

Fri Jul 12 16:23:02 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   78C    P0              42W /  70W |    105MiB / 15360MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
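
The `CUDA Version` field in the `nvidia-smi` header is the highest CUDA version the installed driver supports, not the version of the toolkit on disk. Here the driver reports 12.2, so it can run programs built against the CUDA 11.8 toolkit installed above. A small sketch for extracting those header fields follows; the function name `parse_nvidia_smi_header` is hypothetical, and the sample line is copied from the output above.

```python
import re
from typing import Dict, Optional

_HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_nvidia_smi_header(text: str) -> Optional[Dict[str, str]]:
    """Extract tool, driver, and driver-supported CUDA versions from nvidia-smi output."""
    match = _HEADER.search(text)
    return match.groupdict() if match else None


SAMPLE_HEADER = (
    "| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   "
    "CUDA Version: 12.2     |"
)

if __name__ == "__main__":
    print(parse_nvidia_smi_header(SAMPLE_HEADER))
    # {'smi': '535.104.05', 'driver': '535.104.05', 'cuda': '12.2'}
```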
                                                                    

