whisper : add support for large v3 #1444

Merged
merged 4 commits into from Nov 7, 2023


ggerganov
Owner

@ggerganov ggerganov commented Nov 7, 2023

NOTE: re-download ggml-large.bin to get the v3 version

  • ggml-large.bin is the new v3 model
  • ggml-large-v2.bin is the old v2 model
./models/download-ggml-model.sh large

This should be ready to merge.

I ran some anecdotal tests using the audio samples in this repo, and it seems that v3 tends to repeat lines more often than v2. This could be a problem on the whisper.cpp side, but I ran one of the audio samples with the original OpenAI whisper and it repeats in a similar way:

$ ▶ whisper samples/hp0.wav --model large
/opt/homebrew/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:11.840]  Henry F. Phillips, from Wikipedia, the free encyclopedia, at en.wikipedia.org
[00:11.840 --> 00:26.140]  Henry F. Phillips, 1890-1958
[00:26.140 --> 00:38.160]  A U.S. businessman from Portland, Oregon, has the honor of having the Phillips head screw and screwdriver named after him.
[00:39.280 --> 00:52.120]  The importance of the cross-head screw design lies in its self-centering property, useful on automated production lines that use powered screwdrivers.
[00:53.760 --> 00:56.120]  Phillips' major contribution was in...
[00:56.140 --> 01:04.640]  driving the cross-head concept forward, to the point where it was adopted by screwmakers and automobile companies.
[01:05.580 --> 01:10.380]  Although he received patents for the design in 1936,
[01:10.380 --> 01:17.720]  U.S. Patent No. 2,046,343
[01:17.720 --> 01:24.100]  U.S. Patents 2,046,837
[01:24.100 --> 01:25.380]  to 2,046,837
[01:26.140 --> 01:30.100]  to 2,046,842
[01:30.100 --> 01:32.340]  to 2,046,840
[01:32.340 --> 01:38.160]  to 2,046,840
[01:38.160 --> 01:47.880]  The American Screw Company was responsible for devising a means of manufacturing the Life FAQ function of Phillips hard come to proving praise by itsọi
[01:47.880 --> 01:50.960]  and licensed their method.
[01:50.960 --> 01:53.580]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:53.580 --> 01:55.500]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:55.500 --> 01:56.000]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:56.000 --> 02:02.240]  since it calls for a relatively complex, recessed socket shape in the head of the screw,
[02:03.080 --> 02:08.160]  as distinct from the simple milled slot of a slotted-type screw.
[02:08.740 --> 02:17.740]  The Phillips Screw Company and the American Screw Company went on to devise the posidrive screw,
[02:18.420 --> 02:25.260]  which differs from the Phillips in that it is designed to accommodate greater torque than the Phillips.
[02:26.000 --> 02:32.920]  An image accompanied this article, captioned, Phillips Screw Head.
[02:34.400 --> 02:39.440]  The following is an infobox which accompanies this article.
[02:40.660 --> 02:45.660]  Infobox, part of the series on screw drive types.
[02:47.160 --> 02:51.560]  Slotted, commonly, erroneously, flathead.
[02:52.820 --> 02:55.220]  Phillips, crosshead.

Anyway, we can't draw any conclusions from this single case, so I will merge this for now and see what people report.
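To compare v2 and v3 on more than a single sample, the kind of repetition visible in the transcript above could be counted automatically. The following is a minimal sketch, not part of whisper.cpp or OpenAI whisper: it assumes the `[start --> end]  text` line format printed above, and `parse_segments` and `count_consecutive_repeats` are hypothetical helper names introduced only for illustration.

```python
import re

# Matches lines such as "[01:50.960 --> 01:53.580]  some text".
# An optional leading "hh:" is allowed for longer recordings (assumption).
SEGMENT_RE = re.compile(
    r"^\[(?:\d+:)?\d{2}:\d{2}\.\d{3} --> (?:\d+:)?\d{2}:\d{2}\.\d{3}\]\s*(.*)$"
)

# A few segments copied from the hp0.wav transcript above.
SAMPLE = """\
[01:30.100 --> 01:32.340]  to 2,046,840
[01:32.340 --> 01:38.160]  to 2,046,840
[01:47.880 --> 01:50.960]  and licensed their method.
[01:50.960 --> 01:53.580]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:53.580 --> 01:55.500]  Other screw makers of the 1930s dismissed the Phillips concept since...
[01:55.500 --> 01:56.000]  Other screw makers of the 1930s dismissed the Phillips concept since...
"""


def parse_segments(output: str) -> list[str]:
    """Return the text of every timestamped segment line, in order."""
    texts = []
    for line in output.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if match:
            texts.append(match.group(1).strip())
    return texts


def count_consecutive_repeats(texts: list[str]) -> int:
    """Count segments whose text is identical to the segment right before it."""
    return sum(1 for prev, cur in zip(texts, texts[1:]) if cur and cur == prev)


if __name__ == "__main__":
    segments = parse_segments(SAMPLE)
    print(f"{count_consecutive_repeats(segments)} repeated of {len(segments)} segments")
```

Running the same count over v2 and v3 output for the same audio files would give a rough, comparable number. It only catches exact consecutive duplicates, so near-repeats such as the drifting patent numbers above would need a fuzzier comparison.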


Edit: I ran one more example with the original whisper, and this one even produces wrong characters (starting at 01:27.220):

$ ▶ whisper tests/es-0-16khz.wav --model large
/opt/homebrew/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: Spanish
[00:00.000 --> 00:06.720]  Hola, ¿cómo están todos? Mi nombre es Julián Birrueta Mendoza y en este podcast les vengo
[00:06.720 --> 00:11.780]  a hablar sobre la contaminación del agua. Bueno, empezaré por decir que el ser humano
[00:11.780 --> 00:16.840]  no está midiendo las consecuencias de sus actos. No hay duda que uno de los mayores
[00:16.840 --> 00:21.060]  problemas a los que se enfrentan muchas poblaciones actualmente es la contaminación del agua.
[00:22.740 --> 00:27.220]  Principalmente porque, como bien sabemos, el agua prácticamente es fundamental para
[00:27.220 --> 00:31.340]  la vida, por lo que la contaminación puede ser algo muy negativo para el desarrollo
[00:31.340 --> 00:36.900]  tanto económico como social de los pueblos o de las poblaciones próximas en ese lugar
[00:36.900 --> 00:41.500]  contaminado. Los comienzos de la contaminación, como lo
[00:41.500 --> 00:46.100]  definen muchos expertos en la materia, la contaminación del agua es causada por las
[00:46.100 --> 00:50.760]  actividades humanas. Es un fenómeno ambiental de importancia, el cual se comienza a producir
[00:50.760 --> 00:56.040]  desde los primeros intentos de industrialización para transformarse luego en un problema tan
[00:56.040 --> 00:57.040]  habitual como generalización.
[00:57.220 --> 01:03.340]  Generalmente, la contaminación del agua se produce a través de la introducción directa
[01:03.340 --> 01:11.340]  o indirecta en los acuíferos o cauces de agua, ríos, mares, lagos, océanos, etcétera,
[01:11.340 --> 01:15.180]  o de diversas sustancias que pueden ser consideradas como contaminantes.
[01:15.180 --> 01:22.580]  Pero existen dos formas principales de contaminación del agua. Una de ellas tiene que ver con la
[01:22.580 --> 01:27.200]  contaminación natural del agua, que se corresponde con el ciclo natural de esta contaminación.
[01:27.220 --> 01:29.440]  El régimen de contaminación es basicamente並bada sobre la contaminación y su contenido
[01:29.440 --> 01:33.020]  es declarado como contaminante como un tipo de fuente asiática que dañaría la血as
[01:33.020 --> 01:57.020]  de включar o envianges y reducir la bud
[01:57.220 --> 02:04.200]  Bueno amigos, yo los invito a que no contaminen el agua y que sepan cuidar la naturaleza.
[02:05.100 --> 02:08.840]  Los saluda su buen amigo y compañero Julián Virreta.
[02:10.040 --> 02:10.460]  Nos vemos.

I'm not sure if I'm doing something wrong. It would be helpful if people could confirm this.
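For anyone trying to confirm this on their own files, the wrong characters in the Spanish transcript above can be flagged mechanically. This is a rough sketch using only the Python standard library; the function name `unexpected_script_chars` is hypothetical, and treating every non-Latin letter as suspicious is an assumption that only makes sense for languages written in Latin script.

```python
import unicodedata


def unexpected_script_chars(text: str, expected_prefix: str = "LATIN") -> list[str]:
    """Return the letters in `text` whose Unicode name does not start with `expected_prefix`.

    Digits, punctuation, and whitespace are ignored; accented Latin letters
    such as "ñ" or "í" are accepted because their names start with "LATIN".
    """
    found = []
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith(expected_prefix):
            found.append(ch)
    return found


if __name__ == "__main__":
    # Fragments in the style of the segments starting at 01:27.220 above.
    for line in [
        "la contaminación del agua es causada por las",
        "que dañaría la 血as",
    ]:
        print(repr(line), "->", unexpected_script_chars(line))
```

A check like this says nothing about whether the Latin-script text is correct; it only highlights segments where the decoder drifted into another script.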

@bobqianic bobqianic mentioned this pull request Nov 7, 2023
@ggerganov
Owner Author

ggerganov commented Nov 7, 2023

I cannot push to HuggingFace - any idea what is wrong?

git push
batch response: Authorization error.
Uploading LFS objects:   0% (0/2), 0 B | 0 B/s, done.
error: failed to push some refs to 'https://huggingface.co/ggerganov/whisper.cpp'

I have created a write-access token and used huggingface-cli login, but the push keeps getting rejected.

Edit: this fixed the issue: https://discuss.huggingface.co/t/cant-push-to-new-space/35319/24

Pushing the new model to https://huggingface.co/ggerganov/whisper.cpp

@ggerganov ggerganov merged commit 2cdfc4e into master Nov 7, 2023
72 of 73 checks passed
@neurostar

Thanks for quickly supporting the v3 model!
Tested on an M2 MacBook Air; it works great with Metal (fp16, q5).
CoreML conversion does not work yet. Hugging Face transformers needs to be updated (they just committed a patch).

@LeiHao0

LeiHao0 commented Nov 7, 2023

I found the same thing: v3 repeats more than v2, even on VAD-processed audio.

@arabcoders also mentioned in the official whisper repo that it duplicates more in Japanese:
openai/whisper#1762
It seems something is wrong inside the model, and I hope it can be fixed soon. For now I have rolled back to v2.

Again, thank you for the quick support of the v3 model and for keeping v2 as well. This is the only whisper project that works like a charm on my Mac Studio with Metal/MPS and even CoreML enabled!

vonstring pushed a commit to vonstring/whisper.cpp that referenced this pull request Nov 7, 2023
* whisper : add support for large v3

* bench : fix build + fix go bindings

* bench : fix n_mels

* models : update readme
@monk1337

monk1337 commented Nov 7, 2023

Awesome, thanks for the quick support ;)

@emcodem

emcodem commented Nov 8, 2023

In my first tests I also found that large v3 repeats or hallucinates a LOT more than v2. I'm not sure it was a good idea to make v3 the default large model, at least not without the obviously needed changes to mitigate the new repetitions.

felrock pushed a commit to felrock/whisper.cpp that referenced this pull request Nov 18, 2023
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023

5 participants