whisper : add support for new distilled Whisper models #1424

Merged (2 commits, merged Nov 5, 2023)

@ggerganov (Owner) commented on Nov 3, 2023

ref #1414

Initial support for https://huggingface.co/distil-whisper

Currently, the chunk-based transcription strategy is not implemented, so quality may be sub-optimal when using the distilled models with whisper.cpp.

# clone OpenAI whisper and whisper.cpp
git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp

# get the models
cd whisper.cpp/models
git clone https://huggingface.co/distil-whisper/distil-medium.en
git clone https://huggingface.co/distil-whisper/distil-large-v2

# convert to ggml
python3 ./convert-h5-to-ggml.py ./distil-medium.en/ ../../whisper .
mv ggml-model.bin ggml-medium.en-distil.bin

python3 ./convert-h5-to-ggml.py ./distil-large-v2/ ../../whisper .
mv ggml-model.bin ggml-large-distil.bin
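A quick way to sanity-check the converted files is to read their header. This is a minimal Python sketch that assumes the ggml header layout written by the conversion script: a 4-byte "ggml" magic followed by eleven little-endian int32 hyperparameters in the order listed. That layout is not described in this thread, and the script name read_hparams.py is hypothetical. Both distilled checkpoints should report n_text_layer = 2, since Distil-Whisper keeps the full encoder but only two decoder layers.

```python
import struct
import sys

# Assumed header layout (not stated in this thread): a uint32 magic
# followed by eleven little-endian int32 hyperparameters in this order.
GGML_MAGIC = 0x67676D6C  # the bytes "ggml" read as a little-endian uint32
HPARAM_NAMES = (
    "n_vocab", "n_audio_ctx", "n_audio_state", "n_audio_head",
    "n_audio_layer", "n_text_ctx", "n_text_state", "n_text_head",
    "n_text_layer", "n_mels", "ftype",
)


def read_hparams(path):
    """Return the hyperparameters stored at the start of a ggml model file."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        if magic != GGML_MAGIC:
            raise ValueError(f"{path}: bad magic 0x{magic:08x}")
        count = len(HPARAM_NAMES)
        values = struct.unpack(f"<{count}i", f.read(4 * count))
    return dict(zip(HPARAM_NAMES, values))


if __name__ == "__main__":
    # Print the decoder depth of every model file given on the command line.
    for model_path in sys.argv[1:]:
        hp = read_hparams(model_path)
        print(model_path, "decoder layers:", hp["n_text_layer"])
```

Run it from the whisper.cpp root, e.g. python3 read_hparams.py models/ggml-medium.en-distil.bin models/ggml-large-distil.bin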

Then, from the whisper.cpp root directory, run the transcription as usual:

make -j && ./main -m models/ggml-medium.en-distil.bin -f samples/gb0.wav

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing 'samples/gb0.wav' (2037686 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =     3.52 MB, ( 1726.62 / 21845.34)

[00:00:00.000 --> 00:00:30.000]   Good morning. This Tuesday is election day. After months of spirited debate and vigorous campaigning. The time has come for Americans to make important decisions about our nation's future. I encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy. But as the campaigns come to a close, Republicans, Democrats and independents can find common ground on at least one point. Our system of
[00:00:30.000 --> 00:01:00.000]   representative democracy is one of America's greatest strengths. The United States was founded on the belief that all men are created equal. Every election day, millions of Americans of all races, religions, and backgrounds step into voting booths throughout the nation. Whether they are rich or poor, old or young, each of them has an equal share in choosing the path that our country will take. And every ballot they cast is a reminder that our founding principles are alive and well. Voting is one of the great privileges of America. Voting
[00:01:00.000 --> 00:01:30.000]   American citizenship. And it has always required brave defenders. As you head to the polls next week, remember the sacrifices that had been made by generations of Americans in uniform to preserve our way of life. From Bunker Hill to Baghdad, the men and women of American armed forces have been devoted guardians of our democracy. All of us owe them and their families a special debt of gratitude on Election Day. Americans should also remember the important example that our elections set throughout the world.
[00:01:30.000 --> 00:02:00.000]   Young democracies from Georgia and Ukraine to Afghanistan and Iraq and look to the united States for proof that self-government can endure and nations that still live under tyranny and oppression can find hope and inspiration in our commitment to liberty. For more than two centuries Americans have demonstrated the ability of free people to choose their own leaders. Our nation has flourished because of its commitment to trusting the wisdom of our citizenry. In this year's election, we will see this tradition continue.
[00:02:00.000 --> 00:02:30.000]   that we are blessed to live in a free nation guided by the will of the people. Thank you for listening.


whisper_print_timings:     load time =   367.80 ms
whisper_print_timings:     fallbacks =   8 p /   0 h
whisper_print_timings:      mel time =    64.09 ms
whisper_print_timings:   sample time =  1023.11 ms /  2007 runs (    0.51 ms per run)
whisper_print_timings:   encode time =  2848.93 ms /     5 runs (  569.79 ms per run)
whisper_print_timings:   decode time =  4086.64 ms /  1986 runs (    2.06 ms per run)
whisper_print_timings:   prompt time =    39.07 ms /    13 runs (    3.01 ms per run)
whisper_print_timings:    total time =  8500.04 ms
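The figures in this log are self-consistent, which a little arithmetic confirms. This sketch assumes 16 kHz input (Whisper's sample rate) and that whisper.cpp runs the encoder once per 30-second window, which is what the 5 encode runs and the 30-second transcript segments suggest.

```python
import math

SAMPLE_RATE = 16_000  # Whisper models consume 16 kHz mono audio
WINDOW_SEC = 30.0     # assumed: one encoder pass per 30-second window

n_samples = 2_037_686  # from the "main: processing" line
total_ms = 8500.04     # "total time" reported by whisper_print_timings

duration_sec = n_samples / SAMPLE_RATE
n_windows = math.ceil(duration_sec / WINDOW_SEC)
speed_vs_realtime = duration_sec / (total_ms / 1000.0)

print(f"audio length : {duration_sec:.1f} s")
print(f"30 s windows : {n_windows}")
print(f"speed        : {speed_vs_realtime:.1f}x real time")
```

This gives an audio length of 127.4 s and 5 windows, matching the "127.4 sec" and "5 runs" in the log, and roughly 15x real-time speed for the medium.en distilled model on this machine.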

The same run with the large distilled model:

make -j && ./main -m models/ggml-large-distil.bin -f samples/gb0.wav

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing 'samples/gb0.wav' (2037686 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

ggml_metal_add_buffer: allocated 'kv_self_1       ' buffer, size =     4.39 MB, ( 3278.36 / 21845.34)

[00:00:01.000 --> 00:00:30.000]   Good today, this Tuesday is election day. After months of spirited debate and vigorous campaigning, the time has come for Americans to make importing decisions about our nation's future. I encourage all Americans to go to the polls and vote. Election season brings out the spirit of competition between our political parties. And that competition is an essential part of a healthy democracy, but as the campaigns come to a close, Republicans, Democrats, an independents can find common ground on at least one point.
[00:00:30.740 --> 00:01:00.000]   Democratic democracy is one of America's greatest strengths. The United States was founded on the belief that all men are created equal. Every election day, millions of Americans of all races, religions and backgrounds step into voting boosts throughout t nation, whether they are their rich or poor, old or young. Each of them has an equal share in choosing the path that our country will take. And every ballot they cast is a reminder that our founding principles are alive and well. Voting is one of the grepilagu the
[00:01:00.700 --> 00:01:30.000]   American citizenship, and it has always required brave defenders. As you head to the polls next week, remember the sacrifices that have been made by generations of Americans in uniform to preserve our way of life, from Bucker Hill to Baghdad. The men and women of American armed forces have been devoted guardians of our democracy. All of us owe them and their families a special debt of gratitude on Election Day. Americans should also remember the important example that our elections set throughout the world.
[00:01:30.760 --> 00:02:00.000]   Young democracies from Georgia and Ukraine to Afghanistan and Iraq can look to the United States for proof that self-government can endure, and nations that still live under tyranny and oppression can find hope and inspiration in our comitement to liberty. For more than two centuries, Americans have demonstrated the ability of free people to choose their own leaders. Our nation has flourished because of its commitment to trusting the wisdom of our citizenry. In thi year's election, we will see this tradition continue.
[00:02:01.000 --> 00:02:30.000]   that we are blessed to live in a free nation guided by the will of the people. Thank you for listening.


whisper_print_timings:     load time =   628.78 ms
whisper_print_timings:     fallbacks =   8 p /   0 h
whisper_print_timings:      mel time =    61.44 ms
whisper_print_timings:   sample time =  1195.35 ms /  2339 runs (    0.51 ms per run)
whisper_print_timings:   encode time =  4966.54 ms /     5 runs (  993.31 ms per run)
whisper_print_timings:   decode time =  5783.05 ms /  2318 runs (    2.49 ms per run)
whisper_print_timings:   prompt time =    49.82 ms /    13 runs (    3.83 ms per run)
whisper_print_timings:    total time = 12764.79 ms
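To compare the two runs side by side, this small sketch divides the large-distil timings by the medium.en-distil timings. All values are copied from the two whisper_print_timings blocks; nothing else is assumed.

```python
# Timings (ms) copied from the two whisper_print_timings blocks.
medium_distil = {"load": 367.80, "encode": 2848.93, "decode": 4086.64, "total": 8500.04}
large_distil = {"load": 628.78, "encode": 4966.54, "decode": 5783.05, "total": 12764.79}

# Ratio > 1 means the large distilled model took longer for that stage.
ratios = {stage: large_distil[stage] / medium_distil[stage] for stage in medium_distil}

for stage, ratio in ratios.items():
    print(f"{stage:>6}: large-distil takes {ratio:.2f}x as long")
```

Overall the large distilled model is about 1.5x slower and its encoder about 1.74x slower, while the per-run decode cost rises only from 2.06 to 2.49 ms; both distilled models keep two decoder layers, which plausibly keeps that gap small.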

@nchudleigh (Contributor) commented on Nov 4, 2023

Benchmarked on an M1 Pro; looks promising:

| Commit | Model | Hardware | Recording Length (s) | Threads | Processor Count | Load Time (ms) | Sample Time (ms) | Encode Time (ms) | Decode Time (ms) | Sample Time per Run (ms) | Encode Time per Run (ms) | Decode Time per Run (ms) | Total Time (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| b8c93c5 | tiny.en | Apple M1 Pro | 28.225 | 8 | 1 | 45.83 | 56.11 | 51.28 | 234.15 | 0.43 | 51.28 | 1.8 | 459.06 |
| b8c93c5 | base.en | Apple M1 Pro | 28.225 | 8 | 1 | 78.47 | 54.31 | 86.47 | 352.43 | 0.4 | 86.47 | 2.61 | 648.84 |
| b8c93c5 | medium-distil | Apple M1 Pro | 28.225 | 8 | 1 | 366.75 | 49.91 | 607.9 | 254.47 | 0.43 | 607.9 | 2.19 | 1371.29 |
| b8c93c5 | small.en | Apple M1 Pro | 28.225 | 8 | 1 | 227.93 | 55.84 | 237.69 | 814.98 | 0.4 | 237.69 | 5.91 | 1427.06 |
| b8c93c5 | medium.en | Apple M1 Pro | 28.225 | 8 | 1 | 582.13 | 56.07 | 663.41 | 1788.39 | 0.42 | 663.41 | 13.45 | 3198.35 |
| b8c93c5 | medium | Apple M1 Pro | 28.225 | 8 | 1 | 594.81 | 56.27 | 668.03 | 1857.32 | 0.4 | 668.03 | 13.46 | 3331.47 |
| b8c93c5 | large-distil | Apple M1 Pro | 28.225 | 8 | 1 | 695.16 | 233.35 | 1099.83 | 1259.63 | 0.49 | 1099.83 | 2.65 | 3384.95 |
| b8c93c5 | large | Apple M1 Pro | 28.225 | 8 | 1 | 1724.06 | 55.01 | 1200.55 | 2870.04 | 0.42 | 1200.55 | 21.91 | 6039.54 |
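The table's headline numbers can be turned into speed-ups with a short sketch. It compares each distilled model with its full-size counterpart using only values from the table; the exact checkpoint version behind the "large" row is not stated, so treat that comparison as approximate.

```python
# (total time ms, decode time per run ms), copied from the M1 Pro table.
bench = {
    "medium.en": (3198.35, 13.45),
    "medium-distil": (1371.29, 2.19),
    "large": (6039.54, 21.91),
    "large-distil": (3384.95, 2.65),
}
TOTAL, DECODE_PER_RUN = 0, 1  # column indices into each tuple


def speedup(baseline, candidate, column):
    """How many times faster `candidate` is than `baseline` in one column."""
    return bench[baseline][column] / bench[candidate][column]


for base, distil in (("medium.en", "medium-distil"), ("large", "large-distil")):
    print(f"{distil} vs {base}: "
          f"total {speedup(base, distil, TOTAL):.2f}x, "
          f"decode/run {speedup(base, distil, DECODE_PER_RUN):.2f}x")
```

This gives roughly 2.3x (medium) and 1.8x (large) in total time, and about 6x and 8x per decoder run, while encode time barely moves (607.9 vs 663.41 ms, 1099.83 vs 1200.55 ms). That pattern is consistent with Distil-Whisper keeping the full encoder and shrinking only the decoder.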

@royshil commented on Nov 5, 2023

The Distil-Whisper models on HF now provide prebuilt GGML files (so no need to convert?), e.g.:

@ggerganov merged commit 39cfad0 into master on Nov 5, 2023
67 of 68 checks passed
vonstring pushed a commit to vonstring/whisper.cpp that referenced this pull request Nov 7, 2023
* whisper : add support for new distilled Whisper models

* whisper : print log when using distilled models
@vgarleanu commented:

Apologies for resurrecting a merged PR, @ggerganov, but what's the reasoning behind disabling timestamps when the model is distilled?

@ggerganov (Owner, Author) commented:

AFAIK distilled models are not trained with timestamps, so the inference should not try to predict them.

@vgarleanu commented:

> AFAIK distilled models are not trained with timestamps, so the inference should not try to predict those

I see. I find that interesting, though: when I commented out the line that disables timestamps, distil-whisper generated word-level timestamps, and they were accurate.
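The behavior under discussion can be modeled in a few lines. This is a hypothetical Python sketch, not whisper.cpp's C++ source: FullParams, looks_distilled and apply_distil_defaults are made-up names, and detecting a distilled model by its two decoder layers is an assumption. Only the log text is taken from this thread.

```python
from dataclasses import dataclass


@dataclass
class FullParams:
    """Hypothetical stand-in for the decoding parameters (only the relevant flag)."""
    no_timestamps: bool = False


def looks_distilled(n_text_layer: int) -> bool:
    # Assumed heuristic: Distil-Whisper checkpoints keep only two decoder layers.
    return n_text_layer == 2


def apply_distil_defaults(params: FullParams, n_text_layer: int) -> FullParams:
    """Force timestamp-free decoding when the model looks distilled."""
    if looks_distilled(n_text_layer) and not params.no_timestamps:
        print("using distilled model - forcing no_timestamps")
        params.no_timestamps = True
    return params
```

Commenting out the forcing line, as described above, corresponds to skipping the `if` body. In this sketch the log message appears only when the check matches, so a model that fails the check decodes with timestamps and prints nothing.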

@NightMachinery commented:

I downloaded the latest distilled model:

wget https://huggingface.co/distil-whisper/distil-large-v3-ggml/resolve/main/ggml-distil-large-v3.bin -P ./models

But when running it with:

./stream -m models/ggml-distil-large-v3.bin -t 6 --step 0 --length 30000 -vth 0.6

I don't see the "using distilled model - forcing no_timestamps" message. Is this expected behavior? Is it using the so-called chunked algorithm?


5 participants