Counting error messages as layers #374

Closed
RyanSpittler opened this issue Aug 9, 2016 · 4 comments

Comments

@RyanSpittler
commented Aug 9, 2016

We are using MiniMagick to convert PDF files into individual PNG images. Part of doing that is iterating over the "pages" (aka layers) of the original file and converting each page into its own image. Here's what we're doing in the uploader:

  def generate_pngs
    image = ::MiniMagick::Image.open(current_path)
    current_name = File.basename(current_path).chomp(File.extname(current_path))
    @process_dir = "#{CarrierWave.root}/#{cache_dir}/#{SecureRandom.hex.first(8)}"

    Dir.mkdir @process_dir

    Rails.logger.info {"Generating pngs from file: #{current_name}"}
    image.pages.each_with_index do |page, index|
      padded_index = (index + 1).to_s.rjust(2, '0')
      MiniMagick::Tool::Convert.new do |convert|
        convert.density "200"
        convert << page.path
        convert << "#{@process_dir}/#{current_name}-#{padded_index}.png"
      end
    end

    # do some other stuff for our database

  rescue ::MiniMagick::Error, ::MiniMagick::Invalid => e
    default = I18n.translate(:"errors.messages.mini_magick_processing_error", :e => e, :locale => :en)
    message = I18n.translate(:"errors.messages.mini_magick_processing_error", :e => e, :default => default)
    raise CarrierWave::ProcessingError, message
  end

One of the PDFs we received from our customers was an Epson Scan. When we attempted to process this PDF as shown above, it raised the following error while iterating over the individual pages:

Failed to manipulate with MiniMagick, maybe it is not an image? Original Error: `convert -density 200 /path/temporary_file.pdf[1] /path/new_file.png` failed with error: convert.im6: Postscript delegate failed `/path/temporary_file.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/677. convert.im6: no images defined `/path/new_file.png' @ error/convert.c/ConvertImageCommand/3044.

Note: The error is wrapped in a CarrierWave error, but it was rescued from one of the MiniMagick errors.

We got the original file and were able to reproduce the issue. The original has only a single page, and this is reflected in the metadata, which is also where we found that it was an Epson Scan. Manipulating the file in the Rails console revealed part of the problem:

app(dev)> image = ::MiniMagick::Image.open("/path/original_file.pdf")
 **** Warning: can't process font stream, loading font by the name.
 **** Error reading a content stream. The page may be incomplete.
 **** File did not complete the page properly and may be damaged.

 **** This file had errors that were repaired or ignored.
 **** The file was produced by:
 **** >>>> ��EPSON Scan <<<<
 **** Please notify the author of the software that produced this
 **** file that it does not conform to Adobe's published PDF
 **** specification.

#<MiniMagick::Image:0x0000000c74b088 @path="/path/temporary_file.pdf", @tempfile=#<Tempfile:/path/temporary_file.pdf (closed)>, @info=#<MiniMagick::Image::Info:0x0000000c74b060 @path="/path/temporary_file.pdf", @info={}>>

There was apparently an issue with one of the fonts. But what are those special characters? Digging deeper:

app(dev)> image.pages
[
 [0] #<MiniMagick::Image:0x0000000c7376f0 @path="/path/temporary_file.pdf[0]", @tempfile=nil, @info=#<MiniMagick::Image::Info:0x0000000c7376c8 @path="/path/temporary_file.pdf[0]", @info={}>>,
 [1] #<MiniMagick::Image:0x0000000c737600 @path="/path/temporary_file.pdf[1]", @tempfile=nil, @info=#<MiniMagick::Image::Info:0x0000000c7375d8 @path="/path/temporary_file.pdf[1]", @info={}>>,
 [2] #<MiniMagick::Image:0x0000000c737510 @path="/path/temporary_file.pdf[2]", @tempfile=nil, @info=#<MiniMagick::Image::Info:0x0000000c7374e8 @path="/path/temporary_file.pdf[2]", @info={}>>
]
app(dev)> image.identify
"Can't find the font file /usr/share/fonts/truetype/fonts-japanese-mincho.ttf\nCan't find the font file /usr/share/fonts/truetype/fonts-japanese-mincho.ttf\n/path/temporary_file.pdf PDF 612x792 612x792+0+0 16-bit Bilevel DirectClass 61KB 0.000u 0:00.000"

Note: Each of the above outputs contained the lengthy warning about fonts, repaired errors, and the Epson Scan, which were removed for brevity.

From MiniMagick::Image:

def layers
  layers_count = identify.lines.count
  layers_count.times.map do |idx|
    MiniMagick::Image.new("#{path}[#{idx}]")
  end
end
alias pages layers
alias frames layers

So, the .pages method returns three pages, even though it should return just one. This happens because it calls .identify, which returns two error lines followed by the one line of actual image information, and every line is counted as a page.
Ideally, MiniMagick should only count the lines that describe an actual page. If the error about fonts is necessary, it should be relayed to the caller (for example on STDERR) instead of being counted as a layer/page/frame.
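To make the miscount concrete, here is a minimal sketch in plain Ruby that reuses the identify output quoted above. `naive_layer_count` mirrors what `layers` does with `identify.lines.count`. `filtered_layer_count` is a hypothetical helper of mine, not part of MiniMagick, and it assumes that every real page line starts with the image path, optionally followed by an `[index]` suffix. That assumption may not hold for every image type.

```ruby
# Output of `identify` as reported in this issue: two warning lines on stdout,
# followed by the single line that describes the actual page.
IDENTIFY_OUTPUT =
  "Can't find the font file /usr/share/fonts/truetype/fonts-japanese-mincho.ttf\n" \
  "Can't find the font file /usr/share/fonts/truetype/fonts-japanese-mincho.ttf\n" \
  "/path/temporary_file.pdf PDF 612x792 612x792+0+0 16-bit Bilevel DirectClass 61KB 0.000u 0:00.000"

# Mirrors the counting in `layers`: every output line is treated as a layer.
def naive_layer_count(identify_output)
  identify_output.lines.count
end

# Hypothetical alternative (an assumption, not MiniMagick's behaviour):
# count only lines that start with the image path, optionally followed by
# a "[index]" suffix and then whitespace.
def filtered_layer_count(identify_output, path)
  pattern = /\A#{Regexp.escape(path)}(\[\d+\])?\s/
  identify_output.lines.count { |line| line.match?(pattern) }
end

puts naive_layer_count(IDENTIFY_OUTPUT)                                 # prints 3
puts filtered_layer_count(IDENTIFY_OUTPUT, "/path/temporary_file.pdf") # prints 1
```

The path-prefix filter is only a heuristic; it is shown here to illustrate the difference between counting lines and counting pages.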

Workaround

Call .verbose on the image before anything calls .identify; later .identify calls then return the correct result. For example:

app(dev)> image.verbose
"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/path/file_name1" "-f/path/file_name2" "-f/path/file_name3"

/path/file_name PNG 612x792 612x792+0+0 8-bit DirectClass 261KB 0.010u 0:00.010
/path/temporary_file.pdf PDF 612x792 612x792+0+0 16-bit DirectClass 261KB 0.000u 0:00.000
/path/temporary_file.pdf PDF 612x792 612x792+0+0 16-bit DirectClass 393KB 0.100u 0:00.030
#<MiniMagick::Image:0x0000000c74b088 @path="/path/temporary_file.pdf", @tempfile=#<Tempfile:/path/temporary_file.pdf (closed)>, @info=#<MiniMagick::Image::Info:0x0000000c74b060 @path="/path/temporary_file.pdf", @info={}>>
app(dev)> image.pages
[
 [0] #<MiniMagick::Image:0x0000000c727660 @path="/path/temporary_file.pdf[0]", @tempfile=nil, @info=#<MiniMagick::Image::Info:0x0000000c727638 @path="/path/temporary_file.pdf[0]", @info={}>>
]

Note: Removed the warning portion for brevity.

I'm not certain what is being changed. Because MiniMagick passes missing methods to ImageMagick as command-line options, .verbose becomes -verbose when the image is processed. For some reason, this fixes all subsequent calls to .identify, including the one .pages uses to count pages. You can call it as a standalone step or chain it, since .verbose returns the image.
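As a rough sketch of how the workaround could be folded into the uploader, the helper below calls .verbose before .pages. `pages_after_verbose` is a hypothetical name of mine, not part of MiniMagick or CarrierWave. It only assumes that the object it receives responds to `verbose` and `pages`, as the image in the console session above does.

```ruby
# Hypothetical helper (not part of MiniMagick or CarrierWave).
# `image` is expected to respond to `verbose` and `pages`, like the
# MiniMagick::Image in the console session above.
def pages_after_verbose(image)
  # Forwarded to ImageMagick as -verbose. Per the report above, later
  # identify calls on this image then stop including the warning lines.
  image.verbose
  image.pages
end

# Possible use inside generate_pngs (requires MiniMagick and ImageMagick):
#
#   image = ::MiniMagick::Image.open(current_path)
#   pages_after_verbose(image).each_with_index do |page, index|
#     # convert each page as before
#   end
```

Since the reason the workaround helps is not understood, this should be treated as a stopgap rather than a fix.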

@RyanSpittler

Author

commented Aug 9, 2016

I'm working on getting an example Epson Scan PDF without any sensitive information.

@joao


commented Sep 21, 2016

Did you ever solve this error?
Also getting it on production servers (Linux), but not on development (Mac).

@janko

Member

commented Dec 3, 2016

@RyanSpittler Thank you for the awesome detailed analysis!!

Hmm, MiniMagick knows not to include stderr in the output, so this could only mean that the ImageMagick Postscript warnings were printed to stdout, which is a bit unfortunate. The problem is that IIRC identify output can differ on different types of images, so I'm not sure what would be a reliable way to parse out the lines that are actual files.
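To illustrate the stdout/stderr point, here is a small self-contained sketch using `Open3.capture3` from Ruby's standard library. It is not MiniMagick's actual code. The child process is a hypothetical stand-in for `identify` that prints one warning and one image line to stdout and another warning to stderr, with the messages borrowed from the report above.

```ruby
require "open3"
require "rbconfig"

# Hypothetical stand-in for `identify`: one warning and one image line on
# stdout, plus one warning on stderr.
CHILD_SCRIPT = <<~'RUBY'
  $stdout.puts "Can't find the font file /usr/share/fonts/truetype/fonts-japanese-mincho.ttf"
  $stdout.puts "/path/temporary_file.pdf PDF 612x792 612x792+0+0 16-bit Bilevel DirectClass 61KB 0.000u 0:00.000"
  $stderr.puts "**** Warning: can't process font stream, loading font by the name."
RUBY

# Runs the stand-in with the current Ruby interpreter and captures the two
# streams separately.
def run_fake_identify
  stdout, stderr, status = Open3.capture3(RbConfig.ruby, "-e", CHILD_SCRIPT)
  { stdout: stdout, stderr: stderr, success: status.success? }
end

result = run_fake_identify
puts "stdout lines: #{result[:stdout].lines.count}"
puts "stderr lines: #{result[:stderr].lines.count}"
```

The warning written to stderr is easy to set aside, but the one written to stdout is indistinguishable from real output unless each line is parsed, which is the difficulty described above.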

I'm aware a lot of time has passed since you reported the issue, but it would be awesome to get my hands on a PDF that produces this behaviour, so that I can come up with a fix for this.

As for .verbose fixing the issue: it means that mogrify -verbose was run on the image, which shouldn't change anything, but it might be that ImageMagick automatically fixes issues like the font loading when run through mogrify.

Btw, I found it funny how many nested errors there are here:

Failed to manipulate with MiniMagick, maybe it is not an image? Original Error:
  └── `convert -density 200 /path/temporary_file.pdf[1] /path/new_file.png` failed with error:
    └── convert.im6: Postscript delegate failed `/path/temporary_file.pdf':
      └── No such file or directory @ error/pdf.c/ReadPDFImage/677. convert.im6:
        └── no images defined `/path/new_file.png' @ error/convert.c/ConvertImageCommand/3044.
@janko

Member

commented Mar 28, 2017

I will close this issue because even if I'm able to reproduce it, it would be really ugly to try to parse out the errors from stdout. MiniMagick separates stdout from stderr, which means that ImageMagick prints these warnings to stdout, so this should be fixed in ImageMagick itself.

@janko janko closed this Mar 28, 2017