feat: PublishSlides: reduced memory consumption #53
This PR reduces the memory consumption of PublishSlides on .pptx files.
The PptxContent copy routine deduplicates media. Previously, every image/media part was loaded into memory and kept there (its content was copied into a byte array that served as the cache), and each cache lookup compared the new media against every cached item byte by byte. Now a SHA-256 hash is computed over each media part's content and only that hash is stored in memory (256 bits, i.e. 32 bytes, per media item), so a cache lookup is keyed on content type + hash. This removes media byte content from memory and speeds up cache lookup, at the cost of computing a hash for each media part.
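To illustrate the idea (this is not the PR's actual C# code), here is a minimal Python sketch of a media dedup cache keyed on content type + SHA-256 digest. It assumes media can be read as a stream and hashed in chunks, so the full content never has to sit in memory; `MediaCache`, `get_or_add`, `sha256_of_stream` and `CHUNK_SIZE` are hypothetical names chosen for the example.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Tuple

# Hypothetical chunk size: media is read 64 KiB at a time, never loaded whole.
CHUNK_SIZE = 64 * 1024


def sha256_of_stream(stream: BinaryIO) -> bytes:
    """Hash a media stream incrementally and return the 32-byte (256-bit) digest."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        h.update(chunk)
    return h.digest()


class MediaCache:
    """Hypothetical dedup cache: maps (content type, SHA-256 digest) -> part URI.

    Only the 32-byte digest is kept per media item, instead of its full bytes,
    and a lookup is a single dictionary access instead of byte-by-byte
    comparisons against every cached item.
    """

    def __init__(self) -> None:
        self._parts: Dict[Tuple[str, bytes], str] = {}

    def get_or_add(self, content_type: str, stream: BinaryIO,
                   new_part_uri: str) -> Tuple[str, bool]:
        """Return (uri, reused). Reuses an existing part if identical content
        with the same content type was already copied; otherwise registers
        new_part_uri. The stream is consumed, so a caller that still needs to
        copy the bytes must seek(0) first."""
        key = (content_type, sha256_of_stream(stream))
        existing = self._parts.get(key)
        if existing is not None:
            return existing, True
        self._parts[key] = new_part_uri
        return new_part_uri, False


# Usage: the second identical PNG maps to the first part; a different
# content type with the same bytes does not.
cache = MediaCache()
data = b"\x89PNG fake image bytes"
uri1, reused1 = cache.get_or_add("image/png", io.BytesIO(data), "/ppt/media/image1.png")
uri2, reused2 = cache.get_or_add("image/png", io.BytesIO(data), "/ppt/media/image2.png")
uri3, reused3 = cache.get_or_add("image/jpeg", io.BytesIO(data), "/ppt/media/image3.jpeg")
```

Keying on content type as well as the digest mirrors the description above; a SHA-256 collision between different media is practically impossible, which is what makes it safe to drop the byte-by-byte comparison.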
This improvement also benefits every operation that involves CopyMedia/Images, not only PublishSlides.
Rough performance stats on my 1.8 GB .pptx file (measured with dotMemory):
Peak memory is retrieved via `Process.GetCurrentProcess().PeakWorkingSet64`.