Parsing document with large ObjectStreams uses lots of memory #836

Closed
fancycode opened this issue Mar 20, 2024 · 0 comments · Fixed by #837

fancycode commented Mar 20, 2024

Tested with latest master (3282d8a) and the test script below:

package main

import (
	"log"
	"os"
	"runtime"

	"github.com/pdfcpu/pdfcpu/pkg/pdfcpu"
	"github.com/pdfcpu/pdfcpu/pkg/pdfcpu/model"
)

func main() {
	log.SetFlags(log.Flags() | log.Lmicroseconds)
	fp, err := os.Open("prezz_2016.pdf")
	if err != nil {
		log.Fatal(err)
	}
	defer fp.Close()

	conf := model.NewDefaultConfiguration()
	log.Printf("Parsing ...")
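	var start, end runtime.MemStats
	// Force a collection first so the baseline counts only live memory.
	runtime.GC()
	runtime.ReadMemStats(&start)
	pdf, err := pdfcpu.Read(fp, conf)
	// Collect again so HeapAlloc reflects what the parsed context still holds.
	runtime.GC()
	runtime.ReadMemStats(&end)
	if err != nil {
		log.Fatal(err)
	}
	// Report the growth in live heap and in heap memory obtained from the OS.
	log.Printf("Done, uses %d MiBytes heap memory, %d MiBytes system memory",
		(end.HeapAlloc-start.HeapAlloc)/(1024*1024),
		(end.HeapSys-start.HeapSys)/(1024*1024),
	)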

	if err := pdf.EnsurePageCount(); err != nil {
		log.Fatal(err)
	}

	log.Printf("Parsed %d pages", pdf.PageCount)
}
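
The script reports HeapAlloc and HeapSys deltas after a forced GC, which shows how much memory the parsed context keeps alive but not how much was allocated and discarded along the way. As a hedged sketch (the `measure` helper and `memDelta` type are hypothetical, not part of pdfcpu), the same runtime.MemStats approach can also report TotalAlloc and Mallocs, which only ever grow and therefore capture short-lived garbage too. The workload below is a stand-in; in the script above it would be the pdfcpu.Read call.

```go
package main

import (
	"fmt"
	"runtime"
)

// memDelta holds differences in selected runtime.MemStats fields
// measured around a single function call.
type memDelta struct {
	Retained   int64  // change in HeapAlloc after a forced GC (live heap)
	TotalAlloc uint64 // bytes allocated during the call, including garbage
	Mallocs    uint64 // heap objects allocated during the call
}

// measure runs fn and reports how much heap it left reachable and how much
// it allocated in total. fn returns whatever should count as "retained";
// because measure returns it, it stays alive through the second GC.
func measure(fn func() any) (any, memDelta) {
	var before, after runtime.MemStats
	runtime.GC() // start from a collected heap so the baseline is stable
	runtime.ReadMemStats(&before)
	result := fn()
	runtime.GC() // drop garbage so HeapAlloc reflects only live data
	runtime.ReadMemStats(&after)
	return result, memDelta{
		Retained:   int64(after.HeapAlloc) - int64(before.HeapAlloc),
		TotalAlloc: after.TotalAlloc - before.TotalAlloc,
		Mallocs:    after.Mallocs - before.Mallocs,
	}
}

func main() {
	// Stand-in workload: allocate 64 MiB of temporary buffers, keep only 1 MiB.
	_, d := measure(func() any {
		var keep []byte
		for i := 0; i < 64; i++ {
			buf := make([]byte, 1<<20)
			if i == 0 {
				keep = buf
			}
		}
		return keep
	})
	fmt.Printf("retained ~%d MiB, allocated ~%d MiB, %d mallocs\n",
		d.Retained>>20, d.TotalAlloc>>20, d.Mallocs)
}
```

A large gap between the two figures points at allocation churn during parsing; a large Retained value points at data the resulting context keeps referencing.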

Example file: prezz_2016.pdf

Original source:
http://www.sistemapiemonte.it/eXoRisorse/dwd/servizi/OperePubbliche/prezzario/prezz_2016.pdf

Output on my machine (Ubuntu 20.04, Go 1.22.1):

$ time go run test.go 
2024/03/20 14:14:37.969259 Parsing ...
2024/03/20 14:14:49.661571 Done, uses 4244 MiBytes heap memory, 6783 MiBytes system memory
2024/03/20 14:14:49.661610 Parsed 1133 pages

real	0m12,381s
user	0m19,722s
sys	0m2,471s
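
For context on the title: an object stream (/Type /ObjStm, PDF 1.5+) packs several objects into one compressed stream. After decoding, it begins with /N pairs of integers (object number, byte offset relative to /First), and the objects themselves start at byte /First. The sketch below uses hypothetical helpers `parseObjStmHeader` and `objectBytes` (not pdfcpu's API, and not a description of what #837 changed) to read that header once and return each object as a subslice of the decoded buffer, so extracting objects does not copy the stream. It assumes offsets are in increasing order and, for simplicity, splits on Go's whitespace definition rather than the full PDF whitespace set.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// objStmEntry is one header entry of a decoded object stream:
// an object number and its byte offset relative to /First.
type objStmEntry struct {
	ObjNr  int
	Offset int
}

// parseObjStmHeader reads the n integer pairs at the start of a decoded
// object stream; n and first come from the stream dictionary's /N and /First.
func parseObjStmHeader(data []byte, n, first int) ([]objStmEntry, error) {
	if n < 0 || first < 0 || first > len(data) {
		return nil, fmt.Errorf("invalid /N %d or /First %d for %d bytes", n, first, len(data))
	}
	fields := bytes.Fields(data[:first])
	if len(fields) < 2*n {
		return nil, fmt.Errorf("header has %d integers, want %d", len(fields), 2*n)
	}
	entries := make([]objStmEntry, n)
	for i := 0; i < n; i++ {
		objNr, err := strconv.Atoi(string(fields[2*i]))
		if err != nil {
			return nil, err
		}
		off, err := strconv.Atoi(string(fields[2*i+1]))
		if err != nil {
			return nil, err
		}
		entries[i] = objStmEntry{ObjNr: objNr, Offset: off}
	}
	return entries, nil
}

// objectBytes returns object i as a subslice of data (no copy). Object i
// ends where object i+1 begins, or at the end of the stream for the last one.
func objectBytes(data []byte, first int, entries []objStmEntry, i int) ([]byte, error) {
	start := first + entries[i].Offset
	end := len(data)
	if i+1 < len(entries) {
		end = first + entries[i+1].Offset
	}
	if start < first || start > end || end > len(data) {
		return nil, fmt.Errorf("object %d: bad range [%d:%d]", entries[i].ObjNr, start, end)
	}
	return bytes.TrimSpace(data[start:end]), nil
}

func main() {
	// Decoded stream with /N 2 and /First 10: a 10-byte header, then two objects.
	data := []byte("10 0 11 6 (hi)  <</A 1>>")
	entries, err := parseObjStmHeader(data, 2, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, e := range entries {
		obj, err := objectBytes(data, 10, entries, i)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("obj %d: %s\n", e.ObjNr, obj)
	}
}
```

Returning subslices keeps the decoded stream alive as long as any extracted object is referenced, so this trades one shared buffer against many small copies; which is cheaper depends on how long the parsed objects are kept.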