raise SRTParseError(expected_start, actual_start, unmatched_content) #82

Closed
sevospl opened this issue Jun 9, 2020 · 6 comments
Labels
bug Something isn't working


sevospl commented Jun 9, 2020

Using Python 3.6.9 and the latest version of ffsubsync. Subtitles were downloaded using subliminal. They are in Polish, which usually means UTF-8. Syncing works sometimes but fails at other times.

One example:

Traceback (most recent call last):
  File "/root/subsync/bin/ffsubsync", line 8, in <module>
    sys.exit(main())
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/ffsubsync.py", line 261, in main
    return run(args)
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/ffsubsync.py", line 117, in run
    for scale_factor in framerate_ratios
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/ffsubsync.py", line 117, in <listcomp>
    for scale_factor in framerate_ratios
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/sklearn_shim.py", line 212, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/sklearn_shim.py", line 177, in _fit
    **fit_params_steps[name])
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/sklearn_shim.py", line 368, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/sklearn_shim.py", line 40, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/subtitle_parser.py", line 107, in fit
    raise exc
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/subtitle_parser.py", line 96, in fit
    start_seconds=self.start_seconds),
  File "/root/subsync/lib/python3.6/site-packages/ffsubsync/subtitle_parser.py", line 44, in _preprocess_subs
    next_sub = GenericSubtitle.wrap_inner_subtitle(next(subs))
  File "/root/subsync/lib/python3.6/site-packages/srt.py", line 362, in parse
    _raise_if_not_contiguous(srt, expected_start, len(srt))
  File "/root/subsync/lib/python3.6/site-packages/srt.py", line 387, in _raise_if_not_contiguous
    raise SRTParseError(expected_start, actual_start, unmatched_content)
srt.SRTParseError: Expected contiguous start of match or end of input at char 0, but started at char 60576

Any idea how it can be fixed?

sevospl added the out-of-sync label Jun 9, 2020
smacke added the bug label and removed the out-of-sync label Jun 9, 2020
Owner

smacke commented Jun 9, 2020

Hi there, thanks for filing an issue. I would need the subtitles file to investigate further, could I trouble you to link it here?

Author

sevospl commented Jun 9, 2020

I just sent you the subtitle by e-mail. I hope this helps ;)

Owner

smacke commented Jun 9, 2020

Thank you @sevospl ! Will look into this.

Owner

smacke commented Jun 9, 2020

Hi @sevospl, I checked and those files look like they are not valid unicode. (Reading them using default open() / read() in python gives UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 77: invalid start byte.)
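To illustrate the decoding failure described above: in Windows-1250 the byte 0xb3 encodes the Polish letter "ł", but it is not a valid first byte of a UTF-8 sequence. The sketch below is not how ffsubsync detects encodings (it uses cchardet for that); `decode_subtitle_bytes` and its fallback list are hypothetical names chosen for illustration, using only the standard library.

```python
def decode_subtitle_bytes(raw, candidates=("utf-8", "cp1250")):
    """Try each candidate encoding in order; return (text, encoding_used).

    The candidate list is an assumption for Polish subtitles, not
    something ffsubsync itself does.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


# A Polish word stored as Windows-1250 bytes; "ł" becomes the byte 0xb3.
sample = "Miłość".encode("cp1250")
text, used = decode_subtitle_bytes(sample)
print(used, text)
```

Trying UTF-8 first is a deliberate choice: valid UTF-8 is rarely produced by accident, whereas almost any byte string decodes under a single-byte code page.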

ffsubsync uses the cchardet package to automatically infer encodings, and in this case the encoding was inferred as Windows-1250. This seems to be correct as far as I can tell; it looks like the real issue is that these subtitles are in MicroDVD format and not srt. We use pysubs2 for non-srt formats, but it also has trouble parsing these files when I rename the extension, and gives the error ValueError: could not convert string to float: 'movie info: RMVB 636x264 25.0fps 459.9 MB|/SubEdit b.4043'. So somehow pysubs2 is having trouble parsing the framerate out of the MicroDVD header; I created #84 to track.
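A rough sketch of telling the two formats apart by content rather than by file extension. MicroDVD cues start with two frame numbers in braces, while srt cues carry a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line. The function names and regular expressions here are illustrative assumptions, not part of ffsubsync or pysubs2. The second helper mimics the float conversion that fails in the error above, assuming the file's first line was a `{1}{1}` header followed by the "movie info" text.

```python
import re

# Assumed line shapes: "{start_frame}{end_frame}text" for MicroDVD,
# and the standard timing line for srt.
MICRODVD_LINE = re.compile(r"^\{\d+\}\{\d+\}")
SRT_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def sniff_subtitle_format(text, max_lines=20):
    """Guess 'microdvd', 'srt', or 'unknown' from the first few lines."""
    for line in text.splitlines()[:max_lines]:
        line = line.strip()
        if MICRODVD_LINE.match(line):
            return "microdvd"
        if SRT_TIMING.match(line):
            return "srt"
    return "unknown"


def microdvd_header_fps(first_line):
    """Return the framerate if the line looks like '{1}{1}<number>', else None."""
    match = re.match(r"^\{1\}\{1\}(.*)$", first_line.strip())
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        # e.g. a free-text "movie info: ..." header instead of a number
        return None
```

For example, `microdvd_header_fps("{1}{1}25")` yields a framerate, whereas a header whose payload is free text yields `None` instead of raising.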

There are some other issues as well related to ffsubsync brittleness regarding input and output subtitle formats (tracking in #83); if you are OK with using substation alpha as your synced format in the meantime, please do the following for a workaround:

  1. Rename your unsynced subtitles to end in .sub instead of .srt.
  2. Replace the first line of the .sub file with the following string: {1}{1}25
  3. Upgrade ffsubsync to grab some hotfixes I implemented while investigating this issue: pip install --upgrade ffsubsync
  4. Use substation alpha as the output format, i.e. do ffs vid.mkv -i unsynced.sub -o synced.ssa.
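Steps 1 and 2 above could be scripted roughly as follows. This is a sketch under assumptions: `prepare_microdvd_copy` is a hypothetical helper, it writes a `.sub` copy next to the original instead of renaming it, and it replaces the first line unconditionally, so it is only appropriate when that line is a header and not a real subtitle. It works on raw bytes so the file's encoding (Windows-1250 here) is left untouched.

```python
from pathlib import Path


def prepare_microdvd_copy(src, fps=25):
    """Write a .sub copy of `src` whose first line is '{1}{1}<fps>'.

    Hypothetical helper: assumes the first line of the input is a
    header that is safe to replace.
    """
    src = Path(src)
    dst = src.with_suffix(".sub")
    raw = src.read_bytes()
    first, _sep, rest = raw.partition(b"\n")
    # Preserve the file's existing line-ending style.
    newline = b"\r\n" if first.endswith(b"\r") else b"\n"
    header = "{{1}}{{1}}{}".format(fps).encode("ascii")
    dst.write_bytes(header + newline + rest)
    return dst
```

After that, steps 3 and 4 are the `pip install --upgrade ffsubsync` and `ffs` commands exactly as listed.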

If you end up trying the workaround, please let me know how it works!

Author

sevospl commented Jun 9, 2020

So, I tried it with another movie. I didn't even have to add the 25 FPS string; it was already in there. I did what you suggested: changed the extension from .srt to .sub, then ran ffs and got .ssa as output. It got synced up within a second. Awesome!

So what you noticed is that subliminal "tricked" me into thinking it's a .SRT file while it was .SUB from the very beginning - and that's how ffsubsync got confused too. I tried sub -> srt instead of ssa but it didn't work out.

I guess I'll have to convert ssa to srt manually for now. I hope it can be automated somehow, e.g. by detecting the kind of subtitles and converting (if possible) to srt. :)

Thanks for the help!

Owner

smacke commented Jun 9, 2020

No problem! Thanks for the useful test cases; they'll help me in the future to make ffsubsync more robust to a variety of (possibly malformed) subtitle formats. I'll go ahead and close this issue for now since we're tracking the related issues elsewhere now.

smacke closed this as completed Jun 9, 2020