-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement parsing for iTunes metadata atoms #182
Conversation
also write a helper for bool
/// is parsed. | ||
#[derive(Debug, Default, Clone)] | ||
pub struct UserdataBox { | ||
pub meta: Option<MetadataBox> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While calling the udta
box userdata
in the MediaContext
is probably a reasonable decision, I'm not sure if the field for the nested meta
box should be named metadata
, since "metadata" can be ambiguous between track metadata, and metadata tags relating to the media the file or stream contains. Since the name of the box itself is meta
, and it's a relatively self-explanatory FourCC, keeping it as meta
makes sense, but naming it metadata
would also be more consistent.
If you have an opinion either way, would be happy to change it.
@kinetiknz could I get a review please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! This generally looks good; a few nits and one important issue/query to resolve.
mp4parse/src/lib.rs
Outdated
Ok(()) | ||
} | ||
|
||
fn read_bool_data<T: Read>(src: &mut BMFFBox<T>) -> Result<Option<bool>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename these new read_* additions to read_ilst_* or move them inside read_ilst to make it clear they're ILST-specific.
mp4parse/src/lib.rs
Outdated
fn read_data<T: Read>(src: &mut BMFFBox<T>) -> Result<Vec<u8>> { | ||
// Skip past the padding bytes | ||
skip(&mut src.content, src.head.offset as usize)?; | ||
let size = src.content.limit() as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra space after =.
mp4parse/src/lib.rs
Outdated
) | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop double newlines here and below.
mp4parse/src/lib.rs
Outdated
} | ||
|
||
|
||
fn read_multiple_u8_data<T: Read>(src: &mut BMFFBox<T>) -> Result<Vec<Vec<u8>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we could make most of read_multiple_u8_data and read_u8_data common.
mp4parse/src/lib.rs
Outdated
@@ -773,6 +947,11 @@ fn read_moov<T: Read>(f: &mut BMFFBox<T>, context: &mut MediaContext) -> Result< | |||
debug!("{:?}", pssh); | |||
vec_push(&mut context.psshs, pssh)?; | |||
} | |||
BoxType::UserdataBox => { | |||
let udta = read_udta(&mut b)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My primary concern with these additions is that the main user of mp4parse-rust (Gecko) is exposed to new forms of parse failure if invalid udta (or children) are encountered. We don't want to reject MP4s we previously accepted due to content Gecko isn't interested in.
I think a simple solution to this is to make MediaContext::userdata a Result<Option<Userdata>>
and avoid bubbling an error result up from here instead. That does make the MediaContext API slightly odd, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of Result<Option<UserdataBox>>
, I elected to make it Option<Result<UserdataBox>>
instead, since Result<Option<_>>
doesn't implement Default
.
I considered just ignoring the error and returning None
if udta parsing fails, so as to not introduce any weirdness into the MediaContext API, but that seems to me that it would conflate the lack of an udta atom with a corrupt udta atom.
Also make use of read_ilst_multiple_u8_data in read_ilst_u8_data
47f8d95
to
275b1e0
Compare
Thanks, looks good! |
With PR mozilla#182, mp4parse will now attempt to read metadata, including potential large items like cover art. With read_buf's previous limit of 1MB, this would be fairly trivial to hit with a large cover art image. This addresses a parsing failure reported in BMO 1609573. In the event that 10MB is still too restrictive, it probably makes sense to audit and split read_buf uses into small and large size classes.
With PR #182, mp4parse will now attempt to read metadata, including potential large items like cover art. With read_buf's previous limit of 1MB, this would be fairly trivial to hit with a large cover art image. This addresses a parsing failure reported in BMO 1609573. In the event that 10MB is still too restrictive, it probably makes sense to audit and split read_buf uses into small and large size classes.
Fixes #150.
Metadata in .MP4 files can be stored in either iTunes-style, or 3GPP-style formats. The most commonly found today is the iTunes-style format, which this Pull Request implements parsing for. This is useful for mp4parse to serve as a Rust library that can read these tags off MP4 audio and video files, and can be used as a foundation for a pure-Rust TagLib-like library, along with other metadata-parsing libraries available on Cargo.
iTunes metadata is stored within a Metadata
meta
atom within a root-level User dataudta
atom. While thismeta
atom may contain other atoms such askeys
, we are mostly concerned with theilst
List atom, which holds a list of various named atoms that represent metadata.Each metadata box or entry within
ilst
has a unique key and is parsed as a full box, within which adata
atom that contains the actual data, in either a string or binaryu8
format, is contained. Each metadata entry may contain only onedata
atom, with the exception of the cover art albumcovr
, which may have multipledata
atoms that each contain JPEG or PNG data. With the exception of©lyr
andldes
, string metadata tags are limited to 256 characters in length, while binary metadata often have idiosyncratic formats that must be special-cased.Parsing for all if not most common tags have been implemented, with the exception of the iTunes specific iTMS
xxID
entries, due to lack of documentation on these tags.The
udta
andmeta
boxes are not exposed in the C API, since Firefox does not have a current need to use these atoms. They are currently only exposed in the Rust API.This public domain picture of a black hole is embedded within
metadata.mp4
andmetadata_gnre.mp4
for testing thecovr
atom. The test files themselves are just copies ofminimal.mp4
with a bunch of tags added to them.A huge thanks to the AtomicParsley project for documentation on the format of individual tag boxes.