Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance issues with large binary files #11

Merged
merged 1 commit into from Feb 17, 2021

Conversation

MartinBonner
Copy link

Specifically, pack and unpack PropWords properties with list
comprehensions rather than appending each word to a bytes object
(creating a new object each time), or to an array.

Also use struct.Struct to avoid having to compile the format millions of
times.

The changes to writing out the dtb make a factor of at least x1000 when
writing out a 20Mb file ("at least" because I lost patience). The
changes to reading in are much less dramatic - but it's still at least
x3. I can read in and write out the 20Mb file in about 1.5s now.

Specifically, pack and unpack PropWords properties with list
comprehensions rather than appending each word to a bytes object
(creating a new object each time), or to an array.

Also use struct.Struct to avoid having to compile the format millions of
times.

The changes to writing out the dtb make a factor of at least x1000 when
writing out a 20Mb file ("at least" because I lost patience).  The
changes to reading in are much less dramatic - but it's still at least
x3.  I can read in and write out the 20Mb file in about 1.5s now.
@coveralls
Copy link

coveralls commented Feb 17, 2021

Pull Request Test Coverage Report for Build 42

  • 5 of 5 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.01%) to 86.152%

Totals Coverage Status
Change from base Build 40: -0.01%
Covered Lines: 871
Relevant Lines: 1011

💛 - Coveralls

@@ -41,11 +42,10 @@ def new_property(name: str, raw_value: bytes) -> object:
elif len(raw_value) and len(raw_value) % 4 == 0:
obj = PropWords(name)
# Extract words from raw value
for i in range(0, len(raw_value), 4):
obj.append(unpack(">I", raw_value[i:i + 4])[0])
obj.data = [BIGENDIAN_WORD.unpack(raw_value[i:i + 4])[0] for i in range(0, len(raw_value), 4)]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two changes here. Firstly, use BIGENDIAN_WORD.unpack so that we don't have to compile the format a few million times. If we use Struct at all, this cannot be controversial (it speeds up reading big files by x2).

Secondly, rather than use .append on what we know is an initially empty list, construct the list with a single comprehension and then assign it. It could be argued that this is breaking the encapsulation of PropWords - if so, I can write a mutator for PropWords which takes the list and assigns it (it's worth another x2 on performance).

return obj

elif len(raw_value) and len(raw_value):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a necessary part of the change - it just seemed very odd, so I cleaned it up.

@@ -342,8 +342,7 @@ def to_dtb(self, strings: str, pos: int = 0, version: int = Header.MAX_VERSION):
strpos = len(strings)
strings += self.name + '\0'
blob = pack('>III', DTB_PROP, len(self.data) * 4, strpos)
for word in self.data:
blob += pack('>I', word)
blob += bytes().join([BIGENDIAN_WORD.pack(word) for word in self.data])
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the critical change - it makes the write operation complete in a small number of seconds rather than at least 10's of minutes.

@molejar molejar merged commit 310bb49 into molejar:master Feb 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants