Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing non-mp4 file as mp4 allocates way too many objects #109

Closed
lazka opened this issue Jul 4, 2014 · 12 comments
Closed

Parsing non-mp4 file as mp4 allocates way too many objects #109

lazka opened this issue Jul 4, 2014 · 12 comments
Labels

Comments

@lazka
Copy link
Member

@lazka lazka commented Jul 4, 2014

Originally reported by: Christoph Reiter (Bitbucket: lazka, GitHub: lazka)


From sidnei.d...@gmail.com on April 09, 2012 16:06:02

In the problematic case I've found, the file was mostly comprised of '\x00', which triggered a problematic case with Atom parsing.

* The atom 'length' is parsed as zero and the atom name as '\x00\x00\x00\x00'
* A new Atom object is created for every 8 bytes of the source file until it hits a non-zero length, which with some luck might seek to EOF.

In the memory dump I have, it ended up creating ~130Mb worth of Atom objects in memory, for what seems to be a ~5Mb file with mostly '\x00' in it.

Original issue: http://code.google.com/p/mutagen/issues/detail?id=109


@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From sidnei.d...@gmail.com on April 09, 2012 14:49:24

- Created file with:
  dd if=/dev/zero of=output.m4a  bs=1M  count=10

- Loaded Atoms with:
  from mutagen.mp4 import Atoms
  atoms = Atoms(open("output.m4a", "rb"))

- Dumped memory with meliae.scanner.dump_all_objects
Total 3941125 objects, 50 types, Total size = 523.1MiB (548541255 bytes)
 Index   Count   %      Size   % Cum     Max Kind
     0 1310720  33 450887680  82  82     344 Atom
     1 1314725  33  54024927   9  92    4871 str
     2 1310910  33  31461840   5  97      24 int
     3     134   0  11024656   2  9911007840 list
     4     262   0    391312   0  99   49432 dict

- Added '__slots__' to Atom:
Total 3941132 objects, 50 types, Total size = 193.1MiB (202511553 bytes)
 Index   Count   %      Size   % Cum     Max Kind
     0 1310720  33 104857600  51  51      80 Atom
     1 1314730  33  54025145  26  78    4871 str
     2 1310910  33  31461840  15  93      24 int
     3     134   0  11024656   5  9911007840 list
     4     262   0    391312   0  99   49432 dict

- Changed to ignore atoms with atom.length = 0 (not sure this is valid anyway?):
Total 9005 objects, 49 types, Total size = 1.4MiB (1450768 bytes)
 Index   Count   %      Size   % Cum     Max Kind
     0     262   2    391312  26  26   49432 dict
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From sidnei.d...@gmail.com on April 09, 2012 14:58:16

Two suggested patches.

Attachment: slots.diff ignore-zero-length-atom.diff

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From sidnei.d...@gmail.com on April 09, 2012 16:13:56

FYI, previous comment taken from: http://wiki.multimedia.cx/index.php?title=QuickTime_container#Byte_Ordering """
An atom consists of a size, a type, and a data payload. An atom is laid out as follows:

bytes 0-3    atom size (including 8-byte size and type preamble)
bytes 4-7    atom type
bytes 8..n   data

The 4 bytes allotted for the atom size field limit the maximum size of an atom to 4 GB. Quicktime also has a provision to allow atoms with 64-bit atom size fields by setting the size field 1 and adding the 8-byte size field after the atom type:

bytes 0-3    always 0x00000001
bytes 4-7    atom type
bytes 8-15   atom size (including 16-byte size and type preamble)
bytes 16..n  data

This is a logical exception since an atom always needs to be at least 8 bytes in length to account for the preamble. Therefore, if the size field is 1, load the 64-bit atom size from just after the atom type field.

If, on the other hand, the size field is 0, then the atom extends to the end of the file.
"""
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From sidnei.d...@gmail.com on April 09, 2012 16:00:57

Turns out a zero-length atom is supposed to mean 'I'm the last atom, and my length extends to EOF'. Here's an updated patch.

Attachment: zero-length-atom.diff

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reiter.christoph@gmail.com on April 10, 2012 00:20:23

- you forgot to seek back
 - zero size is only allowed for top level atoms, so you can raise an exception if that's not the case (unfortunately that info is not available, so maybe extend Atom to take a parent=None ?)
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reiter.christoph@gmail.com on April 10, 2012 04:18:17

The atom name could still be in _CONTAINERS with a zero length.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From sidnei.d...@gmail.com on April 10, 2012 04:09:54

Uhm, why seek back? The last line in Atom.__init__ will seek to self.offset + self.length, which will put it back to EOF again. I guess in the case it's a top-level container atom?

I'm wary to add a parent attribute, since that might end up causing a reference cycle (not sure that's what you mean anyway). Maybe having level=0 instead of parent=None, and passing level=self.level+1 to children?
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reiter.christoph@gmail.com on April 10, 2012 04:49:16

Right, that should be changed to raise an exception.
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From sidnei.d...@gmail.com on April 10, 2012 04:30:08

Re-reading the information above, it also seems that if size is < 8 then it must be either 0 or 1, and anything else is invalid?
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From sidnei.d...@gmail.com on April 10, 2012 05:01:43

Here's an updated patch with the requested changes.

Attachment: zero-length-atom.diff

@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reiter.christoph@gmail.com on July 24, 2012 05:01:47

Changes: I removed the __slots__ again, since the creation of many objects shouldn't happen anymore and I don't want to risk breaking anyone’s code..
@lazka

This comment has been minimized.

Copy link
Member Author

@lazka lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From reiter.christoph@gmail.com on July 24, 2012 04:53:44

This issue was closed by revision r115 .

Status: Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.