-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.ass file using UTF-16LE encoding with a BOM cannot be detected #11482
Comments
Can be reproduced, it seems that the file decoding failed. The UTF-8 encoding with a BOM will also fail. Update: When I try to load the subtitle using ffplay, it also fails. so it's most likely an ffmpeg issue and should be reported there. |
If use This appears to be an issue with FFmpeg’s ass_probe, which rejects all files that don’t start with optionally preceded. |
After that, I tried a dirty patch to fix the issue in FFmpeg, which can also fix another related issue #2846 From 4e88f228ec52d84fa84cfb892da51911b100150e Mon Sep 17 00:00:00 2001
From: dyphire <qimoge@gmail.com>
Date: Wed, 27 Sep 2023 22:16:48 +0800
Subject: [PATCH] avformat/assdec: more lenient probing
---
libavformat/assdec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavformat/assdec.c b/libavformat/assdec.c
index 6b37af15e8..e2ad3f9a34 100644
--- a/libavformat/assdec.c
+++ b/libavformat/assdec.c
@@ -34,7 +34,7 @@ typedef struct ASSContext {
static int ass_probe(const AVProbeData *p)
{
- char buf[13];
+ char buf[1000];
FFTextReader tr;
ff_text_init_buf(&tr, p->buf, p->buf_size);
@@ -43,7 +43,7 @@ static int ass_probe(const AVProbeData *p)
ff_text_read(&tr, buf, sizeof(buf));
- if (!memcmp(buf, "[Script Info]", 13))
+ if (strstr(buf, "[Script Info]") || strstr(buf, "[V4 Styles]") || strstr(buf, "[V4+ Styles]"))
return AVPROBE_SCORE_MAX;
return 0;
-- test build for windows: https://github.com/dyphire/mpv-winbuild/actions/runs/4493477864 I'm not satisfied with the way this patch works, but it's surprising that this issue has been around for over 7 years and still hasn't been fixed in FFmpeg. |
I reported this to trac https://trac.ffmpeg.org/ticket/10583 While the above diff should work, ffmpeg already has code for stripping bom marks so that patch is unlikely to be accepted upstream since it doesn't address the root cause of the issue. |
There are no bugs, this file just has a double BOM. I've verified that the first 4 bytes are There are no bugs, however it might be possible to hack lavf to load these types of files, but I'd prefer not to submit such a patch. Are there a ton of files like this?
This is not true, I tested utf-8, utf16be and utf16le BOMs they all work correctly. You can also work around it like so
|
In fact, there are many subtitles files with this issue, and I have also seen feedback elsewhere. However, what I am more concerned about is the issue #2846 |
Even if you make ffmpeg probe such broken files correctly, it's pointless because libass doesn't know how to parse PlayResX/PlayResY without the section header. So your subtitles will appear broken anyway. I could make such a change to lavf if libass became dumber in parsing like VSFilter, but currently there's not much point.
Hopefully it gets merged lol https://ffmpeg.org/pipermail/ffmpeg-devel/2023-September/314767.html Also if you have a sample of a broken UTF-8 BOM file it'd be helpful /cc @TheOneric |
The broken UTF-8 BOM file was only encoded from the example file, so it should be the same bug.
As mentioned in the comments above libass/libass#325 (comment), libass’ own whole-file parser handles files without This is an ffmpeg quirk rather than a libass issue. There are thousands or more of these subtitle files on the internet (I suspect they were generated by similar scripts), and it seems foolish to not attempt to fix this compatibility issue. |
It can detect that it's an ass file, and playback dialogue correctly. However it won't read options without a header, unlike VSFilter. You just get this message after fixing the ffmpeg bug, making the whole thing kind of pointless
|
I don't think this is completely meaningless. If the ass tag is not used in these subtitles (It is likely that the srt subtitles were converted using the wrong script), then it has almost no impact.
In addition, there are extreme examples here libass/libass#325 (comment), these ass files completely lack PlayResX and PlayResY. libass and vsfilter will do the same thing. |
I'll look into this in a few days, but it'd be better if libass attempted to find "PlayResX/Y:" even if it couldn't find the "[Script Info]" section header. Of course, if there's a valid reason to not do this, then that's fair as well. |
I noticed that only a simple libass patch is needed to parse missing "PlayResX/Y:" when the "[Script Info]" header does not exist. From 4cfbbe2f70c02e156785de9bc12d124b6914bd29 Mon Sep 17 00:00:00 2001
From: dyphire <qimoge@gmail.com>
Date: Wed, 27 Sep 2023 22:48:04 +0800
Subject: [PATCH] Try to parse script properties even if missing "[Script Info]"
---
libass/ass.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libass/ass.c b/libass/ass.c
index 0679483..c0734d2 100644
--- a/libass/ass.c
+++ b/libass/ass.c
@@ -1143,7 +1143,7 @@ mem_fail:
static int process_line(ASS_Track *track, char *str)
{
skip_spaces(&str);
- if (!ass_strncasecmp(str, "[Script Info]", 13)) {
+ if (!ass_strncasecmp(str, "[Script Info]", 13) || !ass_strncasecmp(str, "ScriptType:", 11)) {
track->parser_priv->state = PST_INFO;
} else if (!ass_strncasecmp(str, "[V4 Styles]", 11)) {
track->parser_priv->state = PST_STYLES;
-- |
The following is an example file:
[WLGO&PPX][Itazura_na_Kiss][DVDRip][05][x264_AAC][880x480][FB7AD9EF].gb.zip
After saving as a file using e.g. UTF-8 encoding, the problem disappears, but it'll be more convenient if UTF-16LE with a BOM is supported.
The text was updated successfully, but these errors were encountered: