Memory leak in insert_pdf since version 1.18.15 #1351
I know there is supposed to be a minor memory leak in |
Thank you for these investigations!
More technical background:
My test script tries to simulate your use case: prepend some page to each of 100 PDFs.
================================================================================
PyMuPDF 1.18.14
Memory deltas (MB) for doc.insert_pdf('sdw_2021_3.pdf')
Method profile: insert 1 source page into 100 target PDFs.
--------------------------------------------------------------------------------
Memory delta per method exec 0.124
================================================================================
Duration: 0 sec
================================================================================
PyMuPDF 1.18.15
Memory deltas (MB) for doc.insert_pdf('sdw_2021_3.pdf')
Method profile: insert 1 source page into 100 target PDFs.
--------------------------------------------------------------------------------
Memory delta per method exec 0.125
================================================================================
Duration: 0 sec
================================================================================
PyMuPDF 1.19.1
Memory deltas (MB) for doc.insert_pdf('sdw_2021_3.pdf')
Method profile: insert 1 source page into 100 target PDFs.
--------------------------------------------------------------------------------
Memory delta per method exec 0.124
================================================================================
Duration: 0 sec
The delta of 124 KB is the additional memory occupied after a PDF has received a new front page. This number depends on the source and target PDFs and on the source page(s). Key observation: there is no difference between PyMuPDF versions. I would welcome your comments. |
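The test script itself is not shown in this thread. As a rough sketch of how a "memory delta per method exec" figure could be obtained, the helper below averages the growth in traced memory over a number of calls. It uses the standard-library `tracemalloc` module as a stand-in, which is an assumption: `tracemalloc` only sees allocations made through Python's memory allocators, and the name `mean_delta_mb` is hypothetical.

```python
import gc
import tracemalloc


def mean_delta_mb(func, runs=100):
    """Return the average growth in traced memory (MB) per call of func.

    Hypothetical helper, not part of PyMuPDF. Only memory allocated
    through Python's allocators is counted.
    """
    gc.collect()  # start from a clean state
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(runs):
            func()
        gc.collect()  # drop garbage so only retained memory is counted
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (after - before) / runs / (1024 * 1024)
```

To approximate the use case here, `func` would be a function that opens one target PDF, calls `insert_pdf` on it with the source document, and saves the result; comparing the returned value across PyMuPDF versions then mirrors the profile above.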
Thank you for your answer @JorjMcKie, |
Hi @JorjMcKie,
diff --git a/fitz/fitz.i b/fitz/fitz.i
index fc5fd80..023afb0 100644
--- a/fitz/fitz.i
+++ b/fitz/fitz.i
@@ -4138,7 +4144,25 @@ if basestate:
                 return self
             def __exit__(self, *args):
-                self.close()
+                if hasattr(self, "_reset_page_refs"):
+                    self._reset_page_refs()
+                if hasattr(self, "Graftmaps"):
+                    for k in self.Graftmaps.keys():
+                        self.Graftmaps[k] = None
+                if hasattr(self, "this") and self.thisown:
+                    try:
+                        self.__swig_destroy__(self)
+                    except:
+                        pass
+                    self.thisown = False
+
+                self.Graftmaps = {}
+                self.ShownPages = {}
+                self.InsertedImages = {}
+                self.stream = None
+                self._reset_page_refs = DUMMY
+                self.__swig_destroy__ = DUMMY
+                self.is_closed = True
         %}
     }
 };
|
Not much at least - I tried that one, too. |
Try it. That code is actually just a copy of what happens under |
It does not seem like an exact copy of what is in |
what are they? |
From
def close(self) -> None:
    """Close document."""
    if self.is_closed:
        raise ValueError("document closed")
    if hasattr(self, "_outline") and self._outline:
        self._dropOutline(self._outline)
        self._outline = None
    self._reset_page_refs()
    self.metadata = None
    self.stream = None
    self.is_closed = True
    self.FontInfos = []
    for k in self.Graftmaps.keys():
        self.Graftmaps[k] = None
    self.Graftmaps = {}
    self.ShownPages = {}
    self.InsertedImages = {}
    val = _fitz.Document_close(self)
    self.thisown = False
    return val

def __exit__(self, *args):
    if hasattr(self, "_reset_page_refs"):
        self._reset_page_refs()
    if hasattr(self, "Graftmaps"):
        for k in self.Graftmaps.keys():
            self.Graftmaps[k] = None
    if hasattr(self, "this") and self.thisown:
        try:
            self.__swig_destroy__(self)
        except:
            pass
        self.thisown = False
    self.Graftmaps = {}
    self.ShownPages = {}
    self.InsertedImages = {}
    self.stream = None
    self._reset_page_refs = DUMMY
    self.__swig_destroy__ = DUMMY
    self.is_closed = True
Do you know why we should not simply call |
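One detail visible in the two listings: `close()` raises `ValueError` when the document is already closed, so an `__exit__` that calls it unconditionally would fail if the user closes the document inside the `with` block. The toy class below sketches one way an `__exit__` could delegate to `close()` while guarding against that case. `Resource` and its `cache` attribute are hypothetical stand-ins, not PyMuPDF's `Document`, and this is not necessarily what version 1.19.3 does.

```python
class Resource:
    """Toy stand-in for a document-like object (hypothetical)."""

    def __init__(self):
        self.is_closed = False
        self.cache = {"graft": object()}  # plays the role of per-document caches

    def close(self):
        # Mirrors the behaviour shown above: closing twice is an error.
        if self.is_closed:
            raise ValueError("document closed")
        self.cache = {}  # release cached references
        self.is_closed = True

    def __enter__(self):
        return self

    def __exit__(self, *args):
        # Delegate to close(), but tolerate an explicit close()
        # having already happened inside the with block.
        if not self.is_closed:
            self.close()
```

With this shape there is a single cleanup path, so anything released by `close()` is also released when the object is used as a context manager.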
I am reverting this in the new version 1.19.3. There are preliminary wheels here. |
New version 1.19.3 is being uploaded to PyPI. |
Great, thanks! |
Describe the bug
When I upgraded to version 1.18.15, I experienced a significant memory leak.
I am using only insert_pdf to add one cover page to a lot of PDF files.
To Reproduce
I don't have a code snippet, but if you simply run insert_pdf multiple times, I think you'll be able to reproduce it.
Your configuration
I am on Debian 11
I also tested:
And:
Additional context
Here is the output of tracemalloc for a hundred calls of insert_pdf for different versions.
First, version 1.18.14, where there is no memory leak:
Version 1.18.15, first appearance of the memory leak:
Version 1.18.19, also with memory leak:
And finally, version 1.19.1, still with memory leak:
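The tracemalloc listings did not survive in this copy of the issue. For readers who want to produce a comparable report, here is a minimal sketch that takes a snapshot before and after a batch of calls and returns the source lines whose retained memory changed the most. The helper name `top_growth` and its parameters are assumptions for illustration; only standard-library `tracemalloc` calls are used.

```python
import tracemalloc


def top_growth(func, runs=100, limit=5):
    """Return the `limit` largest per-line memory changes after `runs` calls.

    Hypothetical helper: compares tracemalloc snapshots taken before
    and after repeatedly calling func.
    """
    tracemalloc.start(10)  # keep up to 10 frames per allocation traceback
    try:
        before = tracemalloc.take_snapshot()
        for _ in range(runs):
            func()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() sorts by absolute size difference, largest first
    return after.compare_to(before, "lineno")[:limit]
```

Printing each returned entry shows the file, line, and size difference, so running the same function under each PyMuPDF version makes it possible to see whether a given line keeps growing.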