"Субтитры подогнал «Симон»" - dirty datasets for Russian subtitles. #2131
LocalVoidPictures started this conversation in General
large-v2 hallucinates much less than the original large model, so please try it. If that does not help, it is worth reading up on hallucinations; there are several posts on the subject. P.S. Do the 7 minutes in the video include any speech, or just silence or music?
On multiple occasions, I have received "nonsensical" results, such as the one in the title (literally, "Subtitles by Simon"). A simple online search turns up this line frequently. The phrase has nothing to do with the actual transcription, and including it in the dataset leads to wildly incorrect output.
Funnily enough, I only get this when using the large model, not medium or tiny.
Where do we report this?
Here's a sample of what I'm getting sometimes: