When PDFs are merged with pdftk, bookmarks and other metadata are lost. This script converts the raw data dumped by pdftk into the standard pdfmark format.

The pdfmark file can then be applied back to the PDF with the Ghostscript command gs.

See the reference for further discussion.


This script owes a lot to W. Trevor King, especially for Unicode handling; his Python script is a more complete solution, see


pdftk input.pdf dump_data >
pdfbokmark.rb < > pdfmarks # may update pdfmarks for broken pages
pdftk A=book-cover.pdf B=sdcamp.zh.pdf cat A3-4 B3-end A7 output merged.pdf
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=result.pdf merged.pdf pdfmarks


This is sample raw data as dumped by pdftk:

## generated
InfoKey: Creator
InfoValue: Cloud API Docs Plugin
InfoKey: Title
InfoValue: Cloud Files&#8482; Developer Guide
InfoKey: Producer
InfoValue: Apache FOP Version 1.0
InfoKey: CreationDate
InfoValue: D:20111115123218-06'00'
PdfID0: e941cffd7c16fbaba852f26754a562f8
PdfID1: e941cffd7c16fbaba852f26754a562f8
NumberOfPages: 51
BookmarkTitle: Cloud Files&#8482; Developer Guide
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkTitle: Table of Contents
BookmarkLevel: 1
BookmarkPageNumber: 3
BookmarkTitle: 1. Overview
BookmarkLevel: 1
BookmarkPageNumber: 8
BookmarkTitle: 1.1.&#160;Intended Audience
BookmarkLevel: 2
BookmarkPageNumber: 8
BookmarkTitle: 1.2.&#160;Document Change History
BookmarkLevel: 2
BookmarkPageNumber: 9
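
Reading this raw data can be sketched roughly as below. This is only a minimal illustration, not the script's actual code: the helper name parse_dump_data and the hash layout are hypothetical, and it assumes every record is a `Name: value` line as in the sample above.

```ruby
# Hypothetical helper (not from the original script): parse pdftk dump_data
# text into an info hash and a list of bookmark hashes.
def parse_dump_data(text)
  info = {}
  bookmarks = []
  key = nil
  text.each_line do |line|
    name, value = line.chomp.split(': ', 2)
    next if value.nil? # skip lines that are not "Name: value" records
    case name
    when 'InfoKey'
      key = value
    when 'InfoValue'
      info[key] = value if key
      key = nil
    when 'BookmarkTitle'
      bookmarks << { title: value } # a title starts a new bookmark record
    when 'BookmarkLevel'
      bookmarks.last[:level] = value.to_i if bookmarks.last
    when 'BookmarkPageNumber'
      bookmarks.last[:page] = value.to_i if bookmarks.last
    end
  end
  [info, bookmarks]
end

sample = <<~DATA
  InfoKey: Title
  InfoValue: Cloud Files&#8482; Developer Guide
  NumberOfPages: 51
  BookmarkTitle: 1. Overview
  BookmarkLevel: 1
  BookmarkPageNumber: 8
  BookmarkTitle: 1.1.&#160;Intended Audience
  BookmarkLevel: 2
  BookmarkPageNumber: 8
DATA

info, bookmarks = parse_dump_data(sample)
p info
p bookmarks
```

Fields the sketch does not know about (PdfID0, NumberOfPages, and so on) are simply ignored.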

The output should look something like the following; see the pdfmark Reference Manual for details.

[ /Title (Document title)
  /Author (Author name)
  /Subject (Subject description)
  /Keywords (comma, separated, keywords)
  /ModDate (D:20061204092842)
  /CreationDate (D:20061204092842)
  /Creator (application name or creator note)
  /Producer (PDF producer name or note)
  /DOCINFO pdfmark
[/Title (Cloud Files&#8482; Developer Guide) /Page 1 /OUT pdfmark
[/Count 3 /Title (Chapter 1) /Page 1 /OUT pdfmark
[/Count -2 /Title (Section 1.1) /Page 2 /OUT pdfmark
[/Title (Section 1.1.1) /Page 3 /OUT pdfmark
[/Title (Section 1.2.2) /Page 3 /OUT pdfmark
[/Count -1 /Title (Section 1.2) /Page 4 /OUT pdfmark
[/Title (Section 1.2.1) /Page 4 /OUT pdfmark
[/Title (Section 1.3) /Page 3 /OUT pdfmark
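
In the /OUT entries above, /Count is the number of immediate children of a bookmark (a negative value makes the entry start out closed). One way to derive it from the flat level list is sketched below. The function names are hypothetical, the bookmark hashes are an assumed intermediate form, and the sketch always emits a positive count (entries open), which may differ from what the script does.

```ruby
# Escape the characters that are special inside a PostScript literal string.
def escape_ps(str)
  str.gsub(/[\\()]/) { |c| "\\#{c}" }
end

# Hypothetical helper: build the DOCINFO pdfmark from an info hash.
def docinfo_pdfmark(info)
  pairs = info.map { |key, value| "/#{key} (#{escape_ps(value)})" }
  '[ ' + (pairs + ['/DOCINFO pdfmark']).join("\n  ")
end

# Hypothetical helper: one /OUT pdfmark line per bookmark.
# /Count is the number of direct children, found by scanning forward
# until a bookmark at the same or a shallower level appears.
def outline_pdfmarks(bookmarks)
  bookmarks.each_with_index.map do |mark, i|
    children = 0
    bookmarks.drop(i + 1).each do |later|
      break if later[:level] <= mark[:level]
      children += 1 if later[:level] == mark[:level] + 1
    end
    count = children.zero? ? '' : "/Count #{children} "
    "[#{count}/Title (#{escape_ps(mark[:title])}) /Page #{mark[:page]} /OUT pdfmark"
  end
end

bookmarks = [
  { title: '1. Overview', level: 1, page: 8 },
  { title: '1.1. Intended Audience', level: 2, page: 8 },
  { title: '1.2. Document Change History', level: 2, page: 9 }
]
puts docinfo_pdfmark('Title' => 'Developer Guide')
puts outline_pdfmarks(bookmarks)
# First outline line printed:
# [/Count 2 /Title (1. Overview) /Page 8 /OUT pdfmark
```

Parentheses and backslashes in titles are escaped because they would otherwise end or corrupt the PostScript string.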

Titles that contain Unicode characters are tricky to handle.
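
A common approach, sketched here under assumptions rather than taken from the script, is to decode the numeric entities that pdftk emits (such as &#8482;) and write any non-ASCII title as a hex string in UTF-16BE with a leading byte order mark, which is how PDF represents Unicode text strings. The function names are hypothetical, the input is assumed to be UTF-8, and only decimal `&#NNNN;` entities are handled.

```ruby
# Hypothetical helper: turn decimal numeric entities like &#8482; into characters.
def decode_entities(str)
  str.gsub(/&#(\d+);/) { [Regexp.last_match(1).to_i].pack('U') }
end

# Hypothetical helper: format a title as a pdfmark string operand.
# ASCII-only titles stay readable as (literal) strings; anything else
# becomes a <hex> string holding BOM + UTF-16BE bytes.
def pdf_text_string(str)
  text = decode_entities(str)
  if text.ascii_only?
    escaped = text.gsub(/[\\()]/) { |c| "\\#{c}" }
    "(#{escaped})"
  else
    hex = ("\uFEFF" + text).encode('UTF-16BE').unpack('H*').first.upcase
    "<#{hex}>"
  end
end

puts pdf_text_string('Table of Contents') # (Table of Contents)
puts pdf_text_string('A&#8482;')          # <FEFF00412122>
```

The result replaces the parenthesized title, for example `[/Title <FEFF00412122> /Page 1 /OUT pdfmark`.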


