Add more thorough searching for :MediaBox entry #468

Closed
wants to merge 1 commit into from

4 participants

Kris Hicks Brad Ediger James Healy Gregory Brown
Kris Hicks

This is a solution, though perhaps not the best one, for this issue in prawnpdf/prawn: #386

I think it might be relevant to change the search in document.state.store to be the following:

def media_box
  if dictionary.data.has_key?(:MediaBox)
    dictionary.data[:MediaBox]
  else
--  document.state.store.detect { |ref|
++  document.state.store.to_a.reverse.detect { |ref|
      ref.data.has_key?(:MediaBox)
    }.data[:MediaBox]
  end
end

But I think that would only matter if you want to search the previous pages in the order they were added (if my understanding of how ObjectStore works is correct).

I'm happy to take suggestions or a full rewrite from someone who knows the internals of Prawn better.

Cheers.

Brad Ediger
Collaborator

The code looks fine to me. @yob any concerns or suggestions?

James Healy
Collaborator
yob commented

If I'm reading the diff correctly it's just searching for a :MediaBox entry anywhere in the file and using that?

That will work fine for most PDFs with consistent page sizes, but only through luck.

Page objects are leaf nodes in a tree of Pages objects, and walking up the tree until a MediaBox is found is the correct thing to do.

Kris Hicks

Indeed, it does just search for any MediaBox entry.

@yob: I'm curious to know what a better solution is. Given no MediaBox entry for a PDF, should a proper exception be raised stating that the PDF cannot be used as a template, or is there something else that can be done to work around the fact that it doesn't have one?

James Healy
Collaborator
yob commented

Am I reading the diff wrong? It looks like it replaces the search up the tree with a "select any MediaBox from the document" search.

Are you certain the PDF you're trying to use as a template has no MediaBox, even in the parent Pages objects? That's quite rare, but I guess possible.

James Healy
Collaborator
yob commented

Missing a MediaBox is against the spec, but Adobe Acrobat allows it and assumes the MediaBox is 0,0,612,792 (US Letter).

I think we should keep the inheritance of page attributes as per the spec and assume US Letter in the very rare case that a PDF without a MediaBox is used as a template.
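
A minimal sketch of that policy, using plain Ruby hashes as stand-ins for the page dictionaries. The constant `DEFAULT_MEDIA_BOX` and the helper `media_box_for` are hypothetical names, not part of Prawn, and `:Parent` points directly at the parent hash here, whereas in Prawn the parent is a reference whose dictionary is reached through `.data`.

```ruby
# Fallback described above as Acrobat's assumption: US Letter in PDF points.
DEFAULT_MEDIA_BOX = [0, 0, 612, 792].freeze

# Hypothetical helper: look for :MediaBox on the page dictionary, then on each
# ancestor via :Parent, and fall back to the default instead of returning nil.
def media_box_for(page_dict)
  dict = page_dict
  while dict
    return dict[:MediaBox] if dict.key?(:MediaBox)
    dict = dict[:Parent]
  end
  DEFAULT_MEDIA_BOX
end

# A page whose whole ancestry lacks a :MediaBox entry.
orphan = { :Type => :Page, :Parent => { :Type => :Pages } }
media_box_for(orphan)  # => [0, 0, 612, 792]
```

With this shape the inheritance rules stay intact, and the default only applies once the root of the page tree has been reached without finding an entry.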

Brad Ediger
Collaborator

Thanks for catching that, @yob. Yes, upon further inspection, that is my reading of the diff as well. The change does not appear to be correct, as it could pick up the MediaBoxes of siblings (or unrelated branches) in the page tree.
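
To illustrate the difference, here is a small sketch with plain hashes standing in for the dictionaries and an array standing in for the object store. The names and layout are assumptions made for the example, not Prawn's actual data structures.

```ruby
# Two branches with different page sizes; each leaf inherits from its parent.
a4_branch     = { :Type => :Pages, :MediaBox => [0, 0, 595, 842] }
letter_branch = { :Type => :Pages, :MediaBox => [0, 0, 612, 792] }
a4_page       = { :Type => :Page, :Parent => a4_branch }
letter_page   = { :Type => :Page, :Parent => letter_branch }

# Stand-in for the object store: every dictionary, in insertion order.
store = [a4_branch, a4_page, letter_branch, letter_page]

# Document-wide search, as in the diff: the first :MediaBox found anywhere.
global = store.detect { |d| d.key?(:MediaBox) }[:MediaBox]

# Walking up from the page itself until an ancestor defines :MediaBox.
dict = letter_page
dict = dict[:Parent] until dict.nil? || dict.key?(:MediaBox)
inherited = dict && dict[:MediaBox]

global     # => [0, 0, 595, 842] (taken from the unrelated A4 branch)
inherited  # => [0, 0, 612, 792]
```

For documents where every page has the same size both lookups agree, which is why the document-wide search appears to work in the common case.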

Brad Ediger
Collaborator

"I think we should keep the inheritance of page attributes as per the spec and assume US Letter in the very rare case that a PDF without a MediaBox is used as a template."

Yes, I agree. So does that "very rare case" cover the issue that this ticket was opened to cover, @krishicks?

Kris Hicks

I'm happy with that result. I wasn't really happy with the fix I proposed, as it still allowed for the case where the method would return nil. The method should never return nil. Making it return [0,0,612,792] as a default sounds fantastic. I'll update the pull request later.

Gregory Brown

Can someone confirm for me whether this is a purely template-related issue, or whether it has uses outside of templates as well?

Gregory Brown

Closing out as a templates-related issue that needs revision. Please re-open if and when revisions are made //cc @cheba

Kris Hicks

This is a templates-only problem: the MediaBox can only be missing when a template is used.

I see a few problems:

  • Page#init_from_object throws away options set in start_new_page, such as size, layout, margins. This means that it's not even possible to hint at the page size when it is known but the MediaBox is missing
  • Page#inherited_dictionary_value may return nil, which causes Document#generate_margin_box to explode

And a new problem I've discovered with another PDF:

  • The MediaBox returned by PDF::Reader may be a PDF::Reader::Reference which needs to be dereferenced
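
A rough sketch of how the last two points could be handled together. `Ref`, `OBJECTS`, `deref` and `safe_media_box` are hypothetical stand-ins invented for this example: `Ref` plays the role of PDF::Reader::Reference and `OBJECTS` the role of the reader's object table. None of this is Prawn's or PDF::Reader's actual API.

```ruby
# Hypothetical stand-ins for an indirect reference and the object table.
Ref = Struct.new(:id)
OBJECTS = { 7 => [0, 0, 420, 595] }

# Default assumed when no MediaBox can be found (US Letter in PDF points).
US_LETTER = [0, 0, 612, 792].freeze

# Resolve an indirect reference to its target; pass other values through.
def deref(value)
  value.is_a?(Ref) ? OBJECTS[value.id] : value
end

# Never return nil: dereference what was found, else fall back to US Letter.
def safe_media_box(dict)
  deref(dict[:MediaBox]) || US_LETTER
end

safe_media_box({ :MediaBox => Ref.new(7) })        # => [0, 0, 420, 595]
safe_media_box({ :MediaBox => [0, 0, 595, 842] })  # => [0, 0, 595, 842]
safe_media_box({})                                 # => [0, 0, 612, 792]
```

A caller that computes margins from the result then always receives a four-element array rather than nil or an unresolved reference.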

Kris Hicks (krishicks) referenced this pull request in: Better MediaBox Support #593 (Closed)

Commits on Apr 2, 2013
  1. Kris Hicks
Showing with 28 additions and 24 deletions.
  1. +8 −18 lib/prawn/core/page.rb
  2. +20 −6 spec/template_spec.rb
26 lib/prawn/core/page.rb
@@ -34,7 +34,7 @@ def initialize(document, options={})
   def layout
     return @layout if @layout
-    mb = dictionary.data[:MediaBox]
+    mb = media_box
     if mb[3] > mb[2]
       :portrait
     else
@@ -139,7 +139,7 @@ def imported_page?
   end
   def dimensions
-    return inherited_dictionary_value(:MediaBox) if imported_page?
+    return media_box if imported_page?
     coords = Prawn::Document::PageGeometry::SIZES[size] || size
     [0,0] + case(layout)
@@ -185,23 +185,13 @@ def init_new_page(options)
     @stamp_dictionary = nil
   end
-  # some entries in the Page dict can be inherited from parent Pages dicts.
-  #
-  # Starting with the current page dict, this method will walk up the
-  # inheritance chain return the first value that is found for key
-  #
-  # inherited_dictionary_value(:MediaBox)
-  # => [ 0, 0, 595, 842 ]
-  #
-  def inherited_dictionary_value(key, local_dict = nil)
-    local_dict ||= dictionary.data
-
-    if local_dict.has_key?(key)
-      local_dict[key]
-    elsif local_dict.has_key?(:Parent)
-      inherited_dictionary_value(key, local_dict[:Parent].data)
+  def media_box
+    if dictionary.data.has_key?(:MediaBox)
+      dictionary.data[:MediaBox]
     else
-      nil
+      document.state.store.detect { |ref|
+        ref.data.has_key?(:MediaBox)
+      }.data[:MediaBox]
     end
   end
26 spec/template_spec.rb
@@ -161,12 +161,21 @@
     fonts.size.should == 2
   end
-  it "should correctly import a template file that is missing a MediaBox entry" do
-    filename = "#{Prawn::DATADIR}/pdfs/page_without_mediabox.pdf"
+  context "when the template is missing a MediaBox entry" do
+    it "should correctly import the template" do
+      filename = "#{Prawn::DATADIR}/pdfs/page_without_mediabox.pdf"
-    @pdf = Prawn::Document.new(:template => filename)
-    str = @pdf.render
-    str[0,4].should == "%PDF"
+      @pdf = Prawn::Document.new(:template => filename)
+      str = @pdf.render
+      str[0,4].should == "%PDF"
+    end
+
+    it "should allow you to create a new page manually" do
+      filename = "#{Prawn::DATADIR}/pdfs/page_without_mediabox.pdf"
+
+      @pdf = Prawn::Document.new(:template => filename, :skip_page_creation => true)
+      lambda { @pdf.start_new_page }.should_not raise_error
+    end
   end
   context "with the template as a stream" do
@@ -346,6 +355,11 @@
       text = PDF::Inspector::Text.analyze(@pdf.render)
       text.strings.first.should == "This is template page 2"
     end
-  end
+    it "should work when the template is missing a MediaBox entry" do
+      filename = "#{Prawn::DATADIR}/pdfs/page_without_mediabox.pdf"
+      @pdf = Prawn::Document.new(:skip_page_creation => true)
+      lambda { @pdf.start_new_page(:template => filename) }.should_not raise_error
+    end
+  end
 end