[OCR] Memory leak #725
Comments
Thanks! Let me know if there's anything that I can do to help find the cause.
This ticket is going to be trickier to fix. I think the simplest way for me to go is to disable the duplicated-PDF detection and import a PDF without text many times.
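The "repeat the same import many times and watch memory" approach described above can be sketched generically. This is only an illustration, not Paperwork's code: `measure_growth`, `leaky_import` and `clean_import` are hypothetical names, and the two "import" functions are stand-ins that merely allocate memory. Note also that `tracemalloc` only sees allocations made through Python's allocator, so a leak inside a native OCR library would likely need a process-level measurement (such as resident set size) instead.

```python
import tracemalloc


def measure_growth(task, repetitions=50):
    """Return the net Python-level bytes still allocated after repeated calls."""
    tracemalloc.start()
    try:
        task()  # warm-up call so one-time setup costs are not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(repetitions):
            task()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical stand-ins for "import one PDF"; they are NOT Paperwork functions.
_retained = []


def leaky_import():
    # Keeps a reference to every buffer, so memory grows on each call.
    _retained.append(bytearray(100_000))


def clean_import():
    # The buffer is released as soon as the function returns.
    bytearray(100_000)
```

A growth figure that scales with `repetitions` points at something being retained per import; a figure that stays roughly flat suggests the leak lies elsewhere (or outside what `tracemalloc` can observe).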
Some additional info: it already output that it was opening the same PDF multiple times (IIRC, about 15 times in a row for each document; it's impossible to see in the edited log, sorry about that). If it could keep the PDF names (and would allow naming PDFs), would that help with this? (I'd love that... because it would allow the collection to work even when Paperwork doesn't work, but it might be a policy thing...)
It's not surprising actually. So I took the lazy but reliable way: I use lazy initialization. Every time Paperwork needs a piece of information from the PDF (number of pages, page rendering, page text extraction, etc.), it opens the file, gets the information, and closes it. The disk cache is taking the hit... :/
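For readers unfamiliar with the pattern, here is a minimal sketch of the "open, read, close on every request" approach described above. It is an assumption-laden illustration and not Paperwork's actual implementation: `LazyDocument`, `size`, `header` and `demo` are hypothetical names, and a plain binary file stands in for a real PDF library handle.

```python
import os
import tempfile
from contextlib import contextmanager


class LazyDocument:
    """Hypothetical document wrapper that never keeps the file open."""

    def __init__(self, path):
        self.path = path
        self.open_count = 0  # number of times the file has been opened

    @contextmanager
    def _opened(self):
        # Open the file only for the duration of one request, then close it.
        self.open_count += 1
        handle = open(self.path, "rb")
        try:
            yield handle
        finally:
            handle.close()

    def size(self):
        with self._opened() as handle:
            handle.seek(0, os.SEEK_END)
            return handle.tell()

    def header(self, length=8):
        with self._opened() as handle:
            return handle.read(length)


def demo():
    # Write a placeholder file (not a valid PDF) just to exercise the class.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(b"%PDF-1.4\n% placeholder, not a real PDF\n")
        path = tmp.name
    try:
        doc = LazyDocument(path)
        return doc.size(), doc.header(), doc.open_count
    finally:
        os.remove(path)
```

Each request costs one extra open and close, which is why the same file shows up as opened many times in a row; the trade-off is that no handle stays alive between requests.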
It's a design thing. Giving documents a title is one of the most requested features, but this is beside the point of Paperwork. Paperwork is about being lazy, not about spending time sorting and naming documents.
Argh, yes, sounds like an annoying issue... :-/ Thank you very much for putting thought into this and making a report! Keeping file names would indeed help a lot to make the program more useful to me. I understand about the design. Yet I'm asking myself whether the requesting users perhaps have a point, especially if they are numerous, as you say...? As long as it's optional, it would help those who ask for it, and could be ignored by those who don't. When I look at the file thumbnails on the left, I feel a bit blind or lost, not knowing what is what (the thumbnails don't help much with certificates or invoices, they all look the same). Especially if you don't know the specific search term (like, hmm, was that certificate about ultrasound or sonography?), or have tagged things badly, or the OCR didn't go well. But I know that it appears to be fashionable now to make GNOME apps as feature-free and uncustomizable as possible, so people who have grown up with their mobile phones can still just tap and use the app, even on a tablet with no keyboard. And that's a valid reason. Happy New Year and thank you very much!
It looks like there is a memory leak in the OCR process.
See #724 (comment) for reference