Skip to content

redskal/pdf_parser

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pdf_parser by flotzilla - updated by redskal/@sam_phisher

I was looking for a simple PDF parser library and found this one, but it only worked from files on-disk. My use case required me to process data in-memory, so I've adapted it.

It's pretty much a copy-pasta of the relevant functions which were then modified slightly. Hopefully I'll get around to reimplementing this properly from scratch, or with some major refactoring. For now, it serves the purpose for my PoC.

Usage should be something like:

import "github.com/redskal/pdf_parser"

pdf, err := pdf_parser.ParsePdfInMem(byteArrayOfPdfFile)

pdf.GetTitle()
pdf.GetAuthor()
pdf.GetCreator()
pdf.GetISBN()
pdf.GetPublishers() []string
pdf.GetLanguages() []string
pdf.GetDescription()
pdf.GetPagesCount()

Original README is below.


Pdf metadata parser

Go library for parsing pdf metadata information

License

MIT

Usage

import "github.com/flotzilla/pdf_parser"

// parse file
pdf, errors := pdf_parser.ParsePdf("filepath/file.pdf")

// main functions
pdf.GetTitle()
pdf.GetAuthor()
pdf.GetCreator()
pdf.GetISBN()
pdf.GetPublishers() []string
pdf.GetLanguages() []string
pdf.GetDescription()
pdf.GetPagesCount()

Using with custom github.com/sirupsen/logrus logger

import "github.com/flotzilla/pdf_parser"

l := logger.New()
l.SetOutput(os.Stdout)
lg.SetFormatter(&logger.JSONFormatter{})

SetLogger(lg)
file, _ := filepath.Abs("filepath/file.pdf")
pdf, err := ParsePdf(file)

About

go pdf parser library

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Go 100.0%