info on storing/using font files in attachments #115

Merged
merged 10 commits into from Jan 30, 2022

Conversation

dericed
Contributor

@dericed dericed commented Mar 18, 2017

No description provided.

@dericed
Contributor Author

dericed commented Mar 18, 2017

The practice of storing fonts in mkv seems to be community knowledge, but there's almost no hint of the practice in the specs. This adds some expectations about that here. IIUC a font may or may not use AttachmentLink.

Some questions:

  • Does this feature anticipate constraints on the font used? On the format of the font? Restrictions on the MIME type?
  • Is my draft correct that AttachmentLink is not needed if the font format calls the attachment by name?
  • If the local system and the attachments both contain a font of the same name that a subtitle track uses, but there is no AttachmentLink, then which should be used: local or attachment?

@mbunkus
Contributor

mbunkus commented Mar 18, 2017

For the record: mkvmerge doesn't link attachments to tracks at the moment. I don't know of software that does.

To your questions:

  1. There are no constraints. I don't think we've ever talked about this, but I'd have voted not to pose constraints and to keep it flexible if we had.
  2. Yes, the name should be enough, and that's in fact how players operate. Note that anything but the name wouldn't work anyway, as subtitle tracks can refer to more than just one font. Therefore the decision which font (and thereby which attachment) to use must be placed inband, inside the blocks (block group/simple block), not just as a static link in the attachment.
  3. I'd say that the attachment should have priority; the system font should only be used as a fallback. That offers the creator the most control over the presentation.

@dericed
Contributor Author

dericed commented Mar 19, 2017

So then when is AttachmentLink intended to be used?

@dericed dericed force-pushed the attached-fonts branch 2 times, most recently from ecd5d4f to 74b7b53 Compare March 19, 2017 16:06
@dericed
Contributor Author

dericed commented Mar 19, 2017

So far I could only find 2 files that use AttachmentLink, but in both cases it is invalid since the element is storing the value 0 and each file contains no attachments.

http://www.archive.org/download/scope2010/scope.mkv
http://www.archive.org/download/mohammed1988/ABDUL.MAJID.ALZINDANI.mkv

Both use libDivXMediaFormat 3.0.0.0386 as a MuxingApp.

@mbunkus
Contributor

mbunkus commented Mar 19, 2017

To be honest I don't have a good example/use case for AttachmentLink. I guess that it could be used to determine which attachments have to be kept and which can be dropped when you re-mux a file skipping one or more tracks (as it's a link from the track headers to the attachments; so a re-muxer can throw away all attachments that aren't referenced via AttachmentLink elements in at least one of the tracks that are kept).
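The re-muxing use case described above can be sketched as follows. This is only an illustration, not how mkvmerge or any other muxer behaves: the `Track` and `Attachment` classes and the `attachments_to_keep` function are hypothetical, and the rule that an attachment linked by no track at all is always kept is an assumption added here, since most files carry no AttachmentLink elements.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    number: int
    # FileUID values taken from the track's AttachmentLink elements.
    attachment_links: list = field(default_factory=list)


@dataclass
class Attachment:
    file_uid: int
    file_name: str


def attachments_to_keep(tracks, kept_numbers, attachments):
    """Return the attachments a re-muxer would retain (hypothetical helper)."""
    linked_by_any = {uid for t in tracks for uid in t.attachment_links}
    linked_by_kept = {
        uid
        for t in tracks
        if t.number in kept_numbers
        for uid in t.attachment_links
    }
    return [
        a
        for a in attachments
        # Assumption: an attachment that no track links to is kept, because
        # nothing says which track (if any) depends on it.
        if a.file_uid not in linked_by_any or a.file_uid in linked_by_kept
    ]
```

With this rule, an attachment is dropped only when every track that links to it has been skipped.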

@robUx4
Contributor

robUx4 commented Apr 26, 2017

Yes, the AttachmentLink is mostly for remuxing purposes. If it's not used at all now we could drop it. It's only useful if people actually used it a lot.
As for the name to use for the font there's the filename and the font name (Comic Sans vs lame.ttf). I think the name used in SSA/ASS is the font name, not the filename. And the attachment only has the FileName mandatory. Maybe the Description should be mandatory for fonts (again, how is it used now) and have the actual font name. On the other hand it's probably hard for muxers that would need to know how to parse TTF, OTF, etc formats...

@dericed
Contributor Author

dericed commented May 13, 2017

If an MKV file has an attached font and an SRT subtitle track (which makes no reference to the font name), then what is the player recommended to do?

Since AttachmentLink has almost no in-the-wild use, I can remove some of the language about it in my pull request, but on the other hand it seems difficult to associate fonts and subtitle tracks when the filename/fontname isn't available in both.

@mbunkus
Contributor

mbunkus commented May 14, 2017

I think what actual players do, at the moment, is to extract all attached fonts to the file system somewhere, register them with the OS, play back the file (at that point the font names will be known to the OS), and remove the temporarily stored fonts afterwards. The assumption is that if a font is attached, it is highly likely that it's used in the subtitles.
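A rough sketch of the extract-then-clean-up part of that workflow, assuming the attachments have already been read into `(file_name, payload)` pairs. The `extracted_fonts` name is hypothetical, and the OS registration step is left out because it is platform specific.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def extracted_fonts(attachments):
    """Write attached fonts to a temporary directory for the playback duration.

    `attachments` is an iterable of (file_name, payload_bytes) pairs.
    """
    with tempfile.TemporaryDirectory(prefix="mkv-fonts-") as tmp:
        paths = []
        for index, (file_name, payload) in enumerate(attachments):
            # Keep only the last path component so a hostile FileName
            # cannot escape the temporary directory.
            safe_name = Path(file_name).name or "font"
            path = Path(tmp) / f"{index}-{safe_name}"
            path.write_bytes(payload)
            paths.append(path)
        # The caller registers the fonts and plays the file here; the
        # directory and its contents are removed when the block exits.
        yield paths
```

The index prefix is there only so that two attachments with the same FileName do not overwrite each other.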

There doesn't seem to be a real need for a strong link between track and attachments at the moment.

@retokromer
Contributor

retokromer commented May 14, 2017

Yes, for subtitles and, very rarely (I guess we did it only once and not in Matroska), for intertitle versions in different languages.

Contributor

@robUx4 robUx4 left a comment

clarify the use of AttachmentLink

@robUx4
Contributor

robUx4 commented May 26, 2017

Updated to remove references to AttachmentLink, we should mark it as deprecated.

robUx4 added a commit that referenced this pull request May 26, 2017
Following the discussions on #115
@robUx4 robUx4 mentioned this pull request May 26, 2017
@dericed
Contributor Author

dericed commented May 26, 2017

LGTM

@dericed
Contributor Author

dericed commented May 26, 2017

LGTM, but I don't understand why there were originally upper limits of 999.9 and 9999.9.

@robUx4
Contributor

robUx4 commented May 27, 2017 via email

@retokromer
Contributor

LGTM

@robUx4 robUx4 added the spec_main Main Matroska spec document target label Dec 18, 2018
@retokromer
Contributor

What is the status on this?

(FYI: In the meantime I found different alphabets for subtitles in different Indian languages. And there are also top to bottom "subtitles", in addition to the left to right and right to left ones.)

@mcr mcr self-assigned this Mar 31, 2020
Contributor

@mbunkus mbunkus left a comment

I consider the wording here to be very ambiguous, as there's talk about "fonts stored in Attachments that match the names…". Is this talking about a font name? An Attachment name? The file name or the logical name encoded somewhere within the font file?

Personally I'd rather see a section go into detail and describe the possibilities here:

Depending on the font format in question, each font file can contain a name which will be referred to as Font Name from now on. This Font Name can be different than the Attachment's FileName, even when disregarding the extension.
In order to select a font for display, a Matroska Reader SHOULD consider both the Font Name and the base name of the Attachment's FileName, preferring the former when there are multiple matches. If none of the Attachments are a match, the Matroska Reader SHOULD attempt to find a system font whose Font Name matches the one used in the subtitle track.
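The lookup order proposed above could be sketched like this. The `select_font` function and the `(file_name, font_names)` tuple layout are hypothetical, and the case-insensitive comparison is an assumption that the proposed wording does not state.

```python
from pathlib import PurePath


def select_font(requested, attachments, system_fonts):
    """Pick a font for `requested`, the name used by the subtitle track.

    `attachments` is a list of (file_name, font_names) pairs, where
    font_names are the names read from inside the font file.
    """
    wanted = requested.casefold()  # assumption: names compare case-insensitively

    # 1. Prefer an attachment whose internal Font Name matches.
    for file_name, font_names in attachments:
        if any(name.casefold() == wanted for name in font_names):
            return ("attachment", file_name)

    # 2. Otherwise try the base name of the attachment's FileName.
    for file_name, _ in attachments:
        if PurePath(file_name).stem.casefold() == wanted:
            return ("attachment", file_name)

    # 3. Fall back to a system font with a matching name.
    for name in system_fonts:
        if name.casefold() == wanted:
            return ("system", name)
    return None
```

Running the Font Name pass over all attachments before the FileName pass is what gives the Font Name priority when both kinds of match exist.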

In addition to different wording, we might also have to state that the procedure should only be used for certain MIME types — so that we don't expect players to try to interpret every attachment as a font. Font MIME types are slightly problematic, though, as official MIME types for fonts haven't been around that long, and a lot of players only support the unofficial ones. For that reason mkvmerge is still using the old and non-standard application/x-truetype-font. See also this MKVToolNix FAQ entry.

@mbunkus
Contributor

mbunkus commented May 30, 2021

For that reason mkvmerge is still using the old and non-standard application/x-truetype-font.

This is wrong and only due to a bug. It uses the proper font/… types with up-to-date libmagic.

@robUx4 robUx4 self-requested a review August 27, 2021 06:00
@robUx4 robUx4 force-pushed the attached-fonts branch 2 times, most recently from 39bc9ef to da03658 Compare August 27, 2021 06:59
@robUx4
Contributor

robUx4 commented Aug 27, 2021

I rewrote the last paragraph to

  • not assume the font name is stored in attachments; in fact it's not. A font file may contain more than one font variant?
  • allow restricting font loading to fonts with an AttachmentLink, if any is found (usually not).
  • file fonts have priority over system fonts (they may have the same name but different content)
  • list MIME types that should be interpreted as fonts

That should address @mbunkus's issue in #115 (review).

@Liisachan

It seems that application/x-truetype-font is also sometimes used for font/otf, not only for font/ttf. Not sure exactly which writing apps did/do this, though.

The situation is rather confusing and it may be helpful to document the historical background - why a player may need to handle various mimetypes if it wants to support embedded fonts. Two basic points are:

  1. The standard top-level mimetype font is relatively new, defined in 2017. The standard itself was unstable for a while (hence e.g. application/font-sfnt).
  2. Matroska has existed since 2003 and font-embedding in Matroska was already practical as early as 2004. Back then, a Matroska Team member explicitly asked us to use application/x-truetype-font https://forum.doom9.org/showthread.php?s=&threadid=72086

So, nearly 100% of font-embedded MKVs created between 2004 and 2020-ish use legacy mimetype(s); not because they are "broken" but simply because they were created before the standard was established.

Btw, it's true that a single font file may have multiple font names. (1) Obvious for TTC. (2) Font Family name vs. Full font name: e.g. "Times New Roman" + {\b1} and "Times New Roman Bold" may work identically in SSA/ASS. (3) A CJK font generally has a CJK name and an ASCII name, e.g. SimHei (ASCII font name) = 黑体 (Chinese font name).

If the font name(s) of an attached font file are necessary for some reason, one has to look into the attached file itself (the 'name' table of TTF) - because font names are not stored as Matroska elements, and the font file name can be quite random (e.g. lucon.ttf -> Lucida Console).
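As a minimal sketch of looking into the 'name' table, assuming a single-font sfnt file (TTF or OTF) held in memory: the `font_names` function is hypothetical, it collects only nameID 1 (family) and nameID 4 (full name), it rejects TTC collections instead of walking them, and decoding Macintosh-platform strings as Mac Roman is an assumption that does not hold for every encoding.

```python
import struct


def font_names(data, name_ids=(1, 4)):
    """Return family/full names from the 'name' table of one sfnt font."""
    if data[:4] == b"ttcf":
        raise ValueError("TTC collections are not handled in this sketch")

    # sfnt header: version (4 bytes), numTables (2), then 6 bytes of search
    # hints, followed by 16-byte table records.
    num_tables = struct.unpack_from(">H", data, 4)[0]
    for i in range(num_tables):
        tag, _checksum, table_offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i
        )
        if tag == b"name":
            break
    else:
        return set()  # no 'name' table present

    # name table header: format, record count, offset to string storage.
    _format, count, string_offset = struct.unpack_from(">HHH", data, table_offset)
    names = set()
    for i in range(count):
        platform, _encoding, _language, name_id, length, offset = struct.unpack_from(
            ">HHHHHH", data, table_offset + 6 + 12 * i
        )
        if name_id not in name_ids:
            continue
        start = table_offset + string_offset + offset
        raw = data[start:start + length]
        # Unicode (0) and Windows (3) platform strings are UTF-16BE.
        # Assumption: Macintosh (1) strings are treated as Mac Roman.
        codec = "mac_roman" if platform == 1 else "utf-16-be"
        names.add(raw.decode(codec, errors="replace"))
    return names
```

Because a font usually carries several name records per nameID (one per platform and language), the result is a set, which also covers the CJK case of a localized name alongside an ASCII name.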

dericed and others added 9 commits October 26, 2021 10:10
…ated as it's not used.

And since it's main goal was to remove attachments when removing tracks that reference them.
* allow restricting font loading to fonts with an AttachmentLink, if any is found (usually not).
* file fonts have priority over system fonts (they may have the same name but different content)
and what a writer should do

According to the findings from #518
and mention "font/otf" for "application/x-truetype-font"
+ don't assume to use a font it has to be installed (VLC doesn't)
@robUx4
Contributor

robUx4 commented Oct 26, 2021

Personally I'd rather see a section go into detail and describe the possibilities here:

I integrated @mbunkus's text into the paragraph and reordered things. Now we have a proper Font Name definition (although it says we may also use the filename).

@robUx4 robUx4 requested a review from mbunkus October 26, 2021 08:12
@Liisachan

dc90789
An official MIME type application/font-sfnt is missing
https://www.iana.org/assignments/media-types/application/font-sfnt
which is DEPRECATED in favor of font/sfnt but still valid. Around 2019, mkvmerge used this exotic mime type on Mac (thanks to libmagic).

@robUx4
Contributor

robUx4 commented Nov 14, 2021

I added application/font-sfnt and application/font-woff.

I did not add application/font-tdpfr, which seems web-only and rare. We can't really "recommend" people to support it. The "TrueDoc Portable Font Resource" format doesn't seem that common and doesn't seem to be a good candidate for embedding in a file.

A TrueDoc Portable for resource (or PFR) is a platform independent scalable font object which is produce by a character shape player. Input may be either TrueType or Type 1 of any flavor on either Windows, Mac, or Unix. TrueDoc Portable Font Resources provide good compression ratios, are platform independent, and because they are not in an native font format (TrueType or Type 1) they can not be easily installed.

It is in fact a web-only thing that required special browser/plugin support.

@Liisachan

I added application/font-sfnt and application/font-woff.

I did not add application/font-tdpfr, which seems web-only and rare. We can't really "recommend" people to support it. The "TrueDoc Portable Font Resource" format doesn't seem that common and doesn't seem to be a good candidate for embedding in a file.

A TrueDoc Portable for resource (or PFR) is a platform independent scalable font object which is produce by a character shape player. Input may be either TrueType or Type 1 of any flavor on either Windows, Mac, or Unix. TrueDoc Portable Font Resources provide good compression ratios, are platform independent, and because they are not in an native font format (TrueType or Type 1) they can not be easily installed.

It is in fact a web-only thing that required special browser/plugin support.

Thanks, that makes sense. Actually, font/woff and font/woff2 added in dc90789 are also "web-font" formats and, afaik, not (yet?) supported by any media players either, although they might be supported in the future to make attachments smaller. Given that no players support them at all, "SHOULD support" may realistically be a bit too strong for woff and woff2.

That said, since SHOULD is not MUST, technically that is not a big problem and everything is looking good to me now. Thanks very much for carefully having documented this confusing, complicated situation. Really appreciated, and this should be really helpful for various developers. E.g. see https://trac.ffmpeg.org/ticket/9419
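For developers reading along, a player-side check might look like the sketch below. The set contains only the MIME types named in this conversation, which is not necessarily the complete list that ended up in the spec text, and the `is_font_attachment` name and the case-insensitive comparison are assumptions.

```python
# MIME types mentioned in this thread as identifying font attachments.
# Assumption: this is illustrative, not the authoritative list from the spec.
FONT_MIME_TYPES = frozenset({
    "font/ttf",
    "font/otf",
    "font/sfnt",
    "font/woff",
    "font/woff2",
    "application/x-truetype-font",  # legacy, seen for both TTF and OTF
    "application/font-sfnt",        # deprecated in favor of font/sfnt
    "application/font-woff",
})


def is_font_attachment(mime_type):
    """Return True if an attachment's MIME type marks it as a font."""
    return mime_type.strip().lower() in FONT_MIME_TYPES
```

Since the legacy application/x-truetype-font type is used for both TrueType and OpenType files, a reader still has to inspect the file contents to learn the actual font format.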

@robUx4 robUx4 dismissed mbunkus’s stale review January 30, 2022 08:08

The changes have been integrated and the list of MIME types mentioned, application/x-truetype-font being valid for font/ttf and font/otf

@robUx4 robUx4 merged commit 36ab550 into master Jan 30, 2022
@robUx4 robUx4 deleted the attached-fonts branch January 30, 2022 08:09