Closed caption export tool #633

Closed
oyvindln opened this issue Apr 19, 2021 · 10 comments
Labels
enhancement ld-decode-tools An issue only affecting the ld-decode-tools

Comments

@oyvindln
Contributor

It would be useful to be able to export the closed captions to a subtitle format, maybe with ld-export-metadata. I'm not sure whether this is out of scope for this project (or whether such a feature already exists somewhere; I couldn't find one).

@atsampson
Collaborator

Yes, that sounds like a job for ld-export-metadata to me.

@simoninns simoninns added enhancement ld-decode-tools An issue only affecting the ld-decode-tools labels Dec 12, 2021
simoninns added a commit that referenced this issue Dec 12, 2021
closed caption data from the input metadata into a plain text file.
@simoninns
Collaborator

I've done some initial work on this but, like everything laserdisc, it's not a straightforward problem. The CC data includes lots of commands that tell the subtitling hardware what to do, such as scroll up, move cursor, go back, and so on. None of this is readily represented in a textual output.

So, for now, I've added code which strips most commands, except for a few which are output as spaces or newlines. Tested on my Cinderella LD, it produces a pleasing, very readable text file.

Command line is something like:

ld-export-metadata --closed-captions /home/sdi/Decodes/cc.txt /home/sdi/Decodes/cinder/cindys1.tbc.json

Where cc.txt is the output plain-text file.
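As a rough illustration of the stripping approach described above (not the actual ld-export-metadata code), here is a minimal Python sketch. It assumes the caption data arrives as pairs of line-21 bytes with the parity bit still set, that control codes are the pairs whose first byte falls in 0x10-0x1F once parity is removed, and that the carriage-return command is the only one worth keeping as a newline. The function names are made up for the example.

```python
def strip_parity(byte: int) -> int:
    """Drop the odd-parity bit (bit 7) carried by line-21 caption bytes."""
    return byte & 0x7F


def captions_to_text(byte_pairs):
    """Turn (byte0, byte1) caption pairs into rough plain text.

    Assumption for this sketch: control-code pairs are simply dropped,
    except the carriage-return command, which becomes a newline.
    """
    out = []
    last_control = None
    for raw0, raw1 in byte_pairs:
        b0, b1 = strip_parity(raw0), strip_parity(raw1)
        if b0 == 0 and b1 == 0:
            last_control = None
            continue  # null padding: nothing was transmitted
        if 0x10 <= b0 <= 0x1F:
            # Control codes are normally sent twice in a row;
            # ignore the immediate repeat.
            if (b0, b1) == last_control:
                last_control = None
                continue
            last_control = (b0, b1)
            if b0 in (0x14, 0x1C) and b1 == 0x2D:  # carriage return
                out.append("\n")
            continue
        last_control = None
        for b in (b0, b1):
            # The basic caption character set is close to ASCII, but a few
            # positions differ; this sketch treats it as plain ASCII.
            if 0x20 <= b <= 0x7E:
                out.append(chr(b))
    return "".join(out)
```

Feeding it a control code sent twice, the text "Hi", a doubled carriage return, padding and "OK" gives two lines of text with the commands removed.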

Of course, this isn't even close to what a subtitle file probably should look like so I'm open to suggestions as to what the correct format should be (along with links to the formatting standard). Once that's agreed, we can work it towards that.

Please test and then make some suggestions! (Remember that, thanks to American lawyers, subtitle files are auto-copyright and belong to the owner of the motion picture, so don't include any as examples without randomizing a bit).

simoninns added a commit that referenced this issue Dec 12, 2021
ld-export-metadata can now copy closed caption data from the input metadata into a plain text file.
@Gamnn
Contributor

Gamnn commented Dec 12, 2021

https://github.com/CCExtractor/ccextractor might be a good resource for this.

@simoninns
Collaborator

I just checked the GitHub wiki for that project, and the only documentation covers building it; the format description pages are blank. Any other suggestions that come with documentation? Or do you know of any input format documents for the linked project?

@Gamnn
Contributor

Gamnn commented Dec 13, 2021

There were some docs in https://github.com/CCExtractor/ccextractor/tree/master/docs

But from there I followed some links to http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML

The .scc file format which it describes seems to be the standard for closed captions.
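For anyone who wants a feel for the layout that page describes, here is a small Python sketch of writing SCC text. It assumes the usual structure: a `Scenarist_SCC V1.0` header line, then blank-line-separated entries, each made of a timecode, a tab, and space-separated four-hex-digit words whose bytes carry odd parity. The helper names are invented for the example and are not taken from ld-decode or CCExtractor.

```python
def with_odd_parity(byte: int) -> int:
    """Set bit 7 so that the byte has an odd number of 1 bits."""
    seven = byte & 0x7F
    ones = bin(seven).count("1")
    return seven | 0x80 if ones % 2 == 0 else seven


def scc_line(timecode: str, byte_pairs) -> str:
    """Format one SCC entry: timecode, tab, then hex words."""
    words = " ".join(
        f"{with_odd_parity(a):02x}{with_odd_parity(b):02x}"
        for a, b in byte_pairs
    )
    return f"{timecode}\t{words}"


def scc_document(entries) -> str:
    """Build a whole SCC file from (timecode, byte_pairs) entries."""
    parts = ["Scenarist_SCC V1.0"]
    for timecode, pairs in entries:
        parts.append("")  # blank line before each entry
        parts.append(scc_line(timecode, pairs))
    return "\n".join(parts) + "\n"
```

For example, `scc_line("00:00:01:00", [(0x14, 0x20), (0x48, 0x69)])` yields the timecode followed by the words `9420 c8e9`.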

@simoninns
Collaborator

Thanks; the .SCC description makes a lot of sense. Since the current code is pulling the text and commands out of the visible line data, it should be possible to do. The time-code extraction seems to be the odd bit: the spec doesn't really say why you line-break and insert a new code... I'll have a play.

@atsampson
Collaborator

There's also .srt format if you want something really low-tech.

@simoninns
Collaborator

I think SCC is the way to go, since it removes the need to translate the CC data into captions ourselves. Looking at @Gamnn's links, you output SCC (which is basically timecoded raw CC data) and then the CCExtractor tool converts it into other formats like .srt.

The CC protocol is pretty complex - so letting another tool interpret it is a lot less work I think.

Anyway; I'll give the SCC stuff a go and see if I can use CCExtractor to get something useful from it... otherwise I can make a simple output and drop it as .srt.

@simoninns
Collaborator

OK, I've removed the raw text output and replaced it with SCC-formatted output. The included timecodes are relative to the input video file (i.e. they are calculated from the field number rather than the VBI timecode/frame number). I'd guess that outputting both relative and VBI-based timecodes would be a good future feature though.
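To make the "calculated from the field number" idea concrete, here is a minimal Python sketch of one way such a relative timecode could be derived. It is an illustration only, not the ld-export-metadata implementation, and it rests on several assumptions: field numbers start at 1, two fields make one frame, and frames are counted non-drop-frame at a nominal 30 per second (real NTSC runs at about 29.97 fps, and drop-frame SCC timecodes are written with a `;` before the frame count).

```python
def field_to_timecode(field_number: int, fps: int = 30) -> str:
    """Convert a 1-based field number to an HH:MM:SS:FF timecode.

    Assumptions for this sketch: two fields per frame, non-drop-frame
    counting, and a nominal integer frame rate.
    """
    if field_number < 1:
        raise ValueError("field numbers are assumed to start at 1")
    frame = (field_number - 1) // 2  # fields 1 and 2 form frame 0
    ff = frame % fps
    total_seconds = frame // fps
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"
```

With these assumptions, field 1 maps to `00:00:00:00` and field 61 (the first field of frame 30) maps to `00:00:01:00`.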

It turns out that CCExtractor is not the right tool for consuming the SCC file. I found ttconv, which happens to have a nice web-based UI too:

https://ttconv.sandflow.com/

If you take the .scc output from ld-export-metadata and set ttconv to output .srt, it gives you a nice VLC-compatible file back:

Screenshot from 2021-12-13 15-54-02

...and VLC seems to be totally happy with the result:

Screenshot from 2021-12-13 15-41-49

The code is simple right now, but I have limited test discs with CC; so please test.

@simoninns
Collaborator

Added wiki documentation:

https://github.com/happycube/ld-decode/wiki/Working-with-subtitles

Closing this issue as complete now. Please test and report any suggestions/bugs/problems as new issues. Thanks!
