Fix comparations for image colorspace literals #132

bashkirtsevich · 2018-03-25T15:44:07Z

See issue #131
Python: 3.6.3

pietermarsman · 2019-08-04T11:42:55Z

Hi @bashkirtsevich, thanks for your contribution. This is a very late response...

We don't have any tests that run this code, so I am not sure whether this breaks other kinds of PDFs. I can imagine that the colorspace is sometimes a list and sometimes not. Do you know if that is the case?

If you want this to be merged, you should also add a test. That ensures this code can actually be run (which I believe it can, from looking at it) and that we don't change it later without thinking.

Do you have time to work on this?

bashkirtsevich · 2019-08-04T17:17:00Z

In my PR I only fix logical mistakes, such as the use of is and in.
We are using this library with my patch, and it finally works on a lot of PDF documents.
Merging is not necessary. Maybe this PR can be helpful as a knowledge base.

pietermarsman · 2019-08-19T11:04:29Z

I've just checked whether any of our current tests use the ImageWriter, and none do.

But we do have some sample PDFs that contain images. This is the result of evaluating image.colorspace at the start of ImageWriter.export_image().

nonfree/dmca.pdf

[/'Indexed', /'DeviceRGB', 255, <PDFObjRef:86>]

nonfree/175.pdf

[/'DeviceRGB']
[/'DeviceRGB']

Both PDFs have a list of colorspaces instead of a single value. Our current check, e.g. image.colorspace is LITERAL_DEVICE_RGB, is therefore wrong: it compares a list against a single literal and never matches.
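
To illustrate why the identity check fails, here is a minimal sketch. The Literal class and the lit() helper are simplified stand-ins written for this example, not pdfminer.six's actual classes; the only assumption carried over is that literals are interned, so each name maps to one shared object.

```python
class Literal:
    """Hypothetical stand-in for an interned PDF name object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "/%r" % self.name


_interned = {}


def lit(name):
    """Return the single shared Literal for this name (interning)."""
    return _interned.setdefault(name, Literal(name))


LITERAL_DEVICE_RGB = lit("DeviceRGB")

# Shape reported above for nonfree/175.pdf: a one-element list.
colorspace = [lit("DeviceRGB")]

# Identity compares the list object itself against the literal.
identity_check = colorspace is LITERAL_DEVICE_RGB
# Membership looks for the literal among the list's elements.
membership_check = LITERAL_DEVICE_RGB in colorspace

print(identity_check, membership_check)
```

If the colorspace can also be a bare literal rather than a list (the open question raised earlier in this thread), the value would need to be normalized to a list before the membership test.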

pietermarsman · 2019-08-19T11:13:08Z

I've checked the code from this PR with nonfree/dmca.pdf and it improves the output! With this PR the code detects that it is an RGB image and should be written to disk as a .bmp. With the old code the image is not recognized as RGB and is written as a generic .img file that cannot be easily opened.

pietermarsman

I would like to have some tests for this code but @bashkirtsevich indicated that he does not have time to work on this.

I've checked the code with PDFs that contain images and that works!

So, I suggest we merge this into develop.

chengmengli06 · 2020-05-20T11:26:24Z

@pietermarsman I have a PDF where some images are written as generic ".img" files. How can I convert these images to JPG format? I used pdf2txt; the first 3 bytes of the stream are H\x89\xec, which is 4889ec in hex.
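
One quick way to inspect such a file is to compare its leading bytes with known file signatures. The sketch below is illustrative only: sniff() is a hypothetical helper, and the table deliberately lists just two well-known signatures (JPEG and PNG).

```python
# Well-known leading byte signatures; deliberately incomplete.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}


def sniff(head):
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


head = b"H\x89\xec"  # the first three bytes quoted above
print(head.hex())
print(sniff(head))
```

The quoted bytes match neither signature, which suggests the stream is not a complete JPEG or PNG file, although this check alone cannot say what the data is.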

pietermarsman · 2020-05-21T14:56:34Z

pdfminer.six writes to a .img file if it cannot infer the type of the image. It supports jpeg (.jpg), jbig2 (.jb2) and grayscale (.bmp) images. The bytes of an unrecognized image type are written to a file with this extension: '.%d.%dx%d.img' % (image.bits, width, height). From the filename you can infer the number of bits, the width and the height of the image. I don't know how you should open it.
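
Recovering those values from a filename could look like the sketch below. parse_img_name() is a hypothetical helper, and the "X1" prefix is an invented image name; only the suffix pattern comes from the format string quoted above.

```python
import re

# Mirrors the '.%d.%dx%d.img' suffix: .<bits>.<width>x<height>.img
_IMG_SUFFIX = re.compile(r"\.(\d+)\.(\d+)x(\d+)\.img$")


def parse_img_name(filename):
    """Recover (bits, width, height) from a '.%d.%dx%d.img' suffix."""
    match = _IMG_SUFFIX.search(filename)
    if match is None:
        raise ValueError("no '.<bits>.<width>x<height>.img' suffix: %r" % filename)
    bits, width, height = (int(group) for group in match.groups())
    return bits, width, height


# Build an example name with the same format string, then parse it back.
example = "X1" + ".%d.%dx%d.img" % (8, 640, 480)
print(parse_img_name(example))
```

These three numbers describe the raw sample layout, but they do not say which colorspace or compression the bytes use, so they are a starting point rather than a full decoding recipe.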

Please share if you find a way to open the images. Perhaps we can add it to pdfminer.six.

Fix comparations for image colorspace literals

3c44bba

tataganesh changed the base branch from master to develop November 8, 2018 19:07

eladkehat added a commit to eladkehat/yapdfminer that referenced this pull request May 1, 2019

Fix comparations for image colorspace literals

a7c6ea3

Original: pdfminer/pdfminer.six#132

pietermarsman approved these changes Aug 19, 2019

pietermarsman added the type: bug label Oct 15, 2019

pietermarsman added 5 commits October 15, 2019 15:48

Added: test for extracting images from pdfs

b013d9d

Merge branch 'develop' into master

e7bf910

Use absolute path to test pdfs in new test

cd1462a

Merge branch 'master' of github.com:bashkirtsevich/pdfminer.six

5868e3d

Use mkdtemp and shutil instead of TemporaryDirectory to make python2 …

eb5f717

…compatible

pietermarsman merged commit 4df6d4e into pdfminer:develop Oct 15, 2019

pietermarsman mentioned this pull request Oct 15, 2019

Wrong image saving #131

Closed
