Compatibility with ghostscript 9.28 #425

Closed
spwhitton opened this issue Sep 5, 2019 · 9 comments

@spwhitton
Contributor

It seems that ocrmypdf is not compatible with ghostscript 9.28. I am seeing test suite errors like these when I try to run ocrmypdf's test suite in Debian unstable (with the latest pikepdf):

/usr/lib/python3/dist-packages/ocrmypdf/exec/ghostscript.py:297: SubprocessOutputError
----------------------------- Captured stderr call -----------------------------
Scan: 100%|██████████| 1/1 [00:00<00:00, 376.24page/s]
OCR: 100%|██████████| 1.0/1.0 [00:07<00:00,  7.63s/page]
------------------------------ Captured log call -------------------------------
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:280    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
                                     GPL Ghostscript RELEASE CANDIDATE 1 9.28: Setting Overprint Mode to 1
                                      not permitted in PDF/A-2, overprint mode not set
                                     
                                     Error: /invalidfileaccess in --file--
                                     Operand stack:
                                        --nostringval--   --nostringval--   (/usr/lib/python3/dist-packages/ocrmypdf/data/sRGB.icc)   (r)
                                     Execution stack:
                                        %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1974   1   3   %oparray_pop   1973   1   3   %oparray_pop   1961   1   3   %oparray_pop   1817   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--
                                     Dictionary stack:
                                        --dict:736/1123(ro)(G)--   --dict:1/20(G)--   --dict:76/200(L)--
                                     Current allocation mode is local
                                     Last OS error: Permission denied
                                     Current file position is 576
                                     GPL Ghostscript RELEASE CANDIDATE 1 9.28: Unrecoverable error, exit code 1

I can provide the full log by e-mail if you need that (it's big).

@jbarlow83
Collaborator

Should be fixed in v9.0.3. I didn't test your exact configuration, but the change removes the external file access that Ghostscript was complaining about.

@spwhitton
Contributor Author

Thank you for v9.0.3. Unfortunately, other failures remain even with v9.0.3:

============================= test session starts ==============================
platform linux -- Python 3.7.4+, pytest-4.6.5, py-1.8.0, pluggy-0.12.0
rootdir: /tmp/autopkgtest.zPRMaR/build.urp/src, inifile: setup.cfg, testpaths: tests
plugins: helpers-namespace-2019.1.8, cov-2.7.1
collected 237 items

tests/test_completion.py x.                                              [  0%]
tests/test_ghostscript.py ..                                             [  1%]
tests/test_graft.py ..                                                   [  2%]
tests/test_hocrtransform.py .                                            [  2%]
tests/test_lept.py ..........                                            [  7%]
tests/test_main.py .F................................................... [ 29%]
ss...............ss......................s..............                 [ 53%]
tests/test_metadata.py .....ssss....ss...                                [ 60%]
tests/test_optimize.py ....sss                                           [ 63%]
tests/test_page_numbers.py ...............                               [ 70%]
tests/test_pdfinfo.py ..............                                     [ 75%]
tests/test_qpdf.py .                                                     [ 76%]
tests/test_rotation.py FssF..Fssssssssssssssss.                          [ 86%]
tests/test_stdio.py ..ss...                                              [ 89%]
tests/test_tess4.py ......                                               [ 91%]
tests/test_unpaper.py ......                                             [ 94%]
tests/test_userunit.py ...                                               [ 95%]
tests/test_validation.py ..........                                      [100%]

=================================== FAILURES ===================================
_________________________________ test_deskew __________________________________

spoof_tesseract_noop = {'ADTTMP': '/tmp/autopkgtest.zPRMaR/autopkgtest_tmp', 'ADT_ARTIFACTS': '/tmp/autopkgtest.zPRMaR/test-suite-artifacts',...TS': '/tmp/autopkgtest.zPRMaR/test-suite-artifacts', 'AUTOPKGTEST_TMP': '/tmp/autopkgtest.zPRMaR/autopkgtest_tmp', ...}
resources = PosixPath('/tmp/autopkgtest.zPRMaR/build.urp/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_deskew0')

    def test_deskew(spoof_tesseract_noop, resources, outdir):
        # Run with deskew
        deskewed_pdf = check_ocrmypdf(
            resources / 'skew.pdf', outdir / 'skew.pdf', '-d', env=spoof_tesseract_noop
        )
    
        # Now render as an image again and use Leptonica to find the skew angle
        # to confirm that it was deskewed
        log = logging.getLogger()
    
        deskewed_png = outdir / 'deskewed.png'
    
        ghostscript.rasterize_pdf(
            deskewed_pdf,
            deskewed_png,
            xres=150,
            yres=150,
            raster_device='pngmono',
            log=log,
            pageno=1,
        )
    
        pix = Pix.open(deskewed_png)
        skew_angle, _skew_confidence = pix.find_skew()
    
        print(skew_angle)
>       assert -0.5 < skew_angle < 0.5, "Deskewing failed"
E       TypeError: '<' not supported between instances of 'float' and 'NoneType'

tests/test_main.py:116: TypeError
----------------------------- Captured stdout call -----------------------------
None
----------------------------- Captured stderr call -----------------------------
Scan: 100%|██████████| 1/1 [00:00<00:00, 345.52page/s]
OCR: 100%|██████████| 1.0/1.0 [00:00<00:00,  2.42page/s]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
WARNING  ocrmypdf:_pipeline.py:743 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
_________________________ test_monochrome_correlation __________________________

resources = PosixPath('/tmp/autopkgtest.zPRMaR/build.urp/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_monochrome_correlation0')

    def test_monochrome_correlation(resources, outdir):
        # Verify leptonica: check that an incorrect rotated image has poor
        # correlation with reference
        corr = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'cardinal.pdf',
            reference_pageno=1,  # north facing page
            test_pdf=resources / 'cardinal.pdf',
            test_pageno=3,  # south facing page
        )
        assert corr < 0.10
        corr = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'cardinal.pdf',
            reference_pageno=2,
            test_pdf=resources / 'cardinal.pdf',
            test_pageno=2,
        )
>       assert corr > 0.90
E       assert 0.0 > 0.9

tests/test_rotation.py:98: AssertionError
------------------------------ Captured log call -------------------------------
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
_______________ test_autorotate_threshold[1-correlation > 0.80] ________________

spoof_tesseract_cache = {'ADTTMP': '/tmp/autopkgtest.zPRMaR/autopkgtest_tmp', 'ADT_ARTIFACTS': '/tmp/autopkgtest.zPRMaR/test-suite-artifacts',...TS': '/tmp/autopkgtest.zPRMaR/test-suite-artifacts', 'AUTOPKGTEST_TMP': '/tmp/autopkgtest.zPRMaR/autopkgtest_tmp', ...}
threshold = '1', correlation_test = 'correlation > 0.80'
resources = PosixPath('/tmp/autopkgtest.zPRMaR/build.urp/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_autorotate_threshold_1_co0')

    @pytest.mark.parametrize(
        'threshold, correlation_test',
        [
            ('1', 'correlation > 0.80'),  # Low thresh -> always rotate -> high corr
            ('99', 'correlation < 0.10'),  # High thres -> never rotate -> low corr
        ],
    )
    def test_autorotate_threshold(
        spoof_tesseract_cache, threshold, correlation_test, resources, outdir
    ):
        out = check_ocrmypdf(
            resources / 'cardinal.pdf',
            outdir / 'out.pdf',
            '--rotate-pages-threshold',
            threshold,
            '-r',
            # '-v',
            # '1',
            env=spoof_tesseract_cache,
        )
    
        correlation = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'cardinal.pdf',
            reference_pageno=1,
            test_pdf=outdir / 'out.pdf',
            test_pageno=3,
        )
>       assert eval(correlation_test)  # pylint: disable=w0123
E       AssertionError: assert False
E        +  where False = eval('correlation > 0.80')

tests/test_rotation.py:155: AssertionError
----------------------------- Captured stderr call -----------------------------
Scan: 100%|██████████| 4/4 [00:00<00:00, 407.68page/s]
OCR: 100%|██████████| 4.0/4.0 [00:01<00:00,  3.90page/s]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
ERROR    ocrmypdf:ghostscript.py:167    3:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    2:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    4:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    3:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    2:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    4:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:281 GPL Ghostscript RELEASE CANDIDATE 1 9.28: Setting Overprint Mode to 1
                                      not permitted in PDF/A-2, overprint mode not set
                                     
                                        **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
                                        **** Error: Recursive XObject detected, ignoring "Im0", object number 14
                                                    Output may be incorrect.
                                        **** Error: Recursive XObject detected, ignoring "Im0", object number 14
                                                    Output may be incorrect.
                                        **** Error: Recursive XObject detected, ignoring "Im0", object number 14
                                                    Output may be incorrect.
WARNING  ocrmypdf:_pipeline.py:743 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
__________________________ test_rotate_deskew_timeout __________________________

resources = PosixPath('/tmp/autopkgtest.zPRMaR/build.urp/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_rotate_deskew_timeout0')

    def test_rotate_deskew_timeout(resources, outdir):
        check_ocrmypdf(
            resources / 'rotated_skew.pdf',
            outdir / 'deskewed.pdf',
            '--rotate-pages',
            '--rotate-pages-threshold',
            '0',
            '--deskew',
            '--tesseract-timeout',
            '0',
            '--pdf-renderer',
            'sandwich',
        )
    
        correlation = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'ccitt.pdf',
            reference_pageno=1,
            test_pdf=outdir / 'deskewed.pdf',
            test_pageno=1,
        )
    
        # Confirm that the page still got deskewed
>       assert correlation > 0.50
E       assert 0.0 > 0.5

tests/test_rotation.py:219: AssertionError
----------------------------- Captured stderr call -----------------------------
Scan: 100%|██████████| 1/1 [00:00<00:00, 389.26page/s]
OCR: 100%|██████████| 1.0/1.0 [00:00<00:00,  1.94page/s]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
WARNING  ocrmypdf:_pipeline.py:743 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
======== 4 failed, 198 passed, 34 skipped, 1 xfailed in 404.87 seconds =========
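
The TypeError in test_deskew above arises because the printed skew_angle is None, so the chained comparison never gets to report "Deskewing failed". A minimal sketch of a guard that would make this case fail with a clearer message follows. The helper name check_skew and its tolerance parameter are hypothetical and not part of ocrmypdf's test suite; the 0.5 default simply mirrors the bounds in the assertion shown in the traceback.

```python
def check_skew(skew_angle, tolerance=0.5):
    """Assert that a skew angle was measured and lies within +/- tolerance.

    Hypothetical helper for illustration only. In the run above the skew
    measurement evidently produced None, which a bare chained comparison
    turns into a TypeError instead of a readable assertion failure.
    """
    # Fail early, with an explicit message, when no angle was measured.
    assert skew_angle is not None, "No skew angle measured; rendered page may be empty"
    # Same bounds check as the original assertion, with the value in the message.
    assert -tolerance < skew_angle < tolerance, (
        f"Deskewing failed: skew angle {skew_angle}"
    )
```

With this guard, a None result surfaces as an AssertionError that names the likely cause rather than as a comparison TypeError.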

@jbarlow83
Collaborator

I looked into this and my conclusion is that Ghostscript 9.28rc1 is quite broken.

All of those errors Ghostscript is producing are nonsense. The one that actually causes the trouble is this error: Recursive XObject detected, ignoring "Im0", object number. Because Ghostscript (incorrectly, afaict) discards the image Im0 and renders without it, these tests fail. These errors are independent of ocrmypdf. The file in question, cardinal.pdf, passes validation with qpdf, verapdf and Acrobat.
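
To separate the one significant error from the repeated noise in logs like those above, a small script can tally the Ghostscript error lines and list any XObjects reported as ignored. This is only a sketch: the function names summarize_gs_errors and dropped_xobjects are made up for illustration, and the patterns are based solely on the message text visible in the captured logs in this thread.

```python
import re
from collections import Counter

# Matches Ghostscript's "**** Error ..." diagnostic lines as they appear in the
# captured logs. The optional colon covers both "Error:" and "Error reading".
GS_ERROR = re.compile(r"\*{4}\s+Error:?\s+(?P<message>.+?)\s*$")

# Matches the specific error that names the XObject Ghostscript skipped.
RECURSIVE_XOBJECT = re.compile(r'Recursive XObject detected, ignoring "([^"]+)"')


def summarize_gs_errors(log_text):
    """Count each distinct Ghostscript error message found in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = GS_ERROR.search(line)
        if match:
            counts[match.group("message")] += 1
    return counts


def dropped_xobjects(log_text):
    """Return the sorted, de-duplicated names of XObjects reported as ignored."""
    return sorted(set(RECURSIVE_XOBJECT.findall(log_text)))
```

Running dropped_xobjects over a saved copy of the pytest output shows at a glance whether a page was rendered without one of its images.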

Ghostscript 9.28 rc2 was released upstream but has not made it to Debian. Might as well wait for that before raising the issue with Artifex, since it may go away.
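
When comparing runs against different release candidates, it helps to confirm which build actually produced a log. The sketch below extracts the version and release candidate number from the banner Ghostscript prints in its messages, for example "GPL Ghostscript RELEASE CANDIDATE 1 9.28". The names GsBuild and parse_gs_banner are hypothetical, and the assumption that a final release prints the same banner without the RELEASE CANDIDATE part is not confirmed by anything in this thread.

```python
import re
from typing import NamedTuple, Optional


class GsBuild(NamedTuple):
    version: str
    release_candidate: Optional[int]  # None when the banner names no candidate


# Banner format taken from the logs in this thread. The form without
# "RELEASE CANDIDATE n" is an assumption about final releases.
BANNER = re.compile(
    r"GPL Ghostscript (?:RELEASE CANDIDATE (?P<rc>\d+) )?"
    r"(?P<version>\d+\.\d+(?:\.\d+)?)"
)


def parse_gs_banner(text):
    """Return the GsBuild named by the first banner in text, or None."""
    match = BANNER.search(text)
    if match is None:
        return None
    rc = match.group("rc")
    return GsBuild(match.group("version"), int(rc) if rc else None)
```

A test harness could use the release_candidate field to label or skip results from pre-release builds rather than treating them as regressions.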

@jbarlow83
Collaborator

jbarlow83 commented Sep 6, 2019

I reported the issue against 9.28rc1 as Debian bug 939530. I don't know how to link that bug to the related ocrmypdf issue in Debian's bug tracker.

If -rc2 doesn't resolve it then I'll take it up with Artifex.

@spwhitton
Contributor Author

Thanks. I've done the linking!

@spwhitton
Contributor Author

Alright, against rc2, only four tests fail:

============================= test session starts ==============================
platform linux -- Python 3.7.4+, pytest-4.6.5, py-1.8.0, pluggy-0.12.0
rootdir: /tmp/autopkgtest.anuEUK/build.Mo8/src, inifile: setup.cfg, testpaths: tests
plugins: helpers-namespace-2019.1.8, cov-2.7.1
collected 237 items

tests/test_completion.py x.                                              [  0%]
tests/test_ghostscript.py ..                                             [  1%]
tests/test_graft.py ..                                                   [  2%]
tests/test_hocrtransform.py .                                            [  2%]
tests/test_lept.py ..........                                            [  7%]
tests/test_main.py .F................................................... [ 29%]
ss...............ss......................s..............                 [ 53%]
tests/test_metadata.py .....ssss....ss...                                [ 60%]
tests/test_optimize.py ....sss                                           [ 63%]
tests/test_page_numbers.py ...............                               [ 70%]
tests/test_pdfinfo.py ..............                                     [ 75%]
tests/test_qpdf.py .                                                     [ 76%]
tests/test_rotation.py FssF..Fssssssssssssssss.                          [ 86%]
tests/test_stdio.py ..ss...                                              [ 89%]
tests/test_tess4.py ......                                               [ 91%]
tests/test_unpaper.py ......                                             [ 94%]
tests/test_userunit.py ...                                               [ 95%]
tests/test_validation.py ..........                                      [100%]

=================================== FAILURES ===================================
_________________________________ test_deskew __________________________________

spoof_tesseract_noop = {'ADTTMP': '/tmp/autopkgtest.anuEUK/autopkgtest_tmp', 'ADT_ARTIFACTS': '/tmp/autopkgtest.anuEUK/test-suite-artifacts',...TS': '/tmp/autopkgtest.anuEUK/test-suite-artifacts', 'AUTOPKGTEST_TMP': '/tmp/autopkgtest.anuEUK/autopkgtest_tmp', ...}
resources = PosixPath('/tmp/autopkgtest.anuEUK/build.Mo8/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_deskew0')

    def test_deskew(spoof_tesseract_noop, resources, outdir):
        # Run with deskew
        deskewed_pdf = check_ocrmypdf(
            resources / 'skew.pdf', outdir / 'skew.pdf', '-d', env=spoof_tesseract_noop
        )
    
        # Now render as an image again and use Leptonica to find the skew angle
        # to confirm that it was deskewed
        log = logging.getLogger()
    
        deskewed_png = outdir / 'deskewed.png'
    
        ghostscript.rasterize_pdf(
            deskewed_pdf,
            deskewed_png,
            xres=150,
            yres=150,
            raster_device='pngmono',
            log=log,
            pageno=1,
        )
    
        pix = Pix.open(deskewed_png)
        skew_angle, _skew_confidence = pix.find_skew()
    
        print(skew_angle)
>       assert -0.5 < skew_angle < 0.5, "Deskewing failed"
E       TypeError: '<' not supported between instances of 'float' and 'NoneType'

tests/test_main.py:116: TypeError
----------------------------- Captured stdout call -----------------------------
None
----------------------------- Captured stderr call -----------------------------
Scan: 100%|██████████| 1/1 [00:00<00:00, 297.74page/s]
OCR: 100%|██████████| 1.0/1.0 [00:00<00:00,  2.43page/s]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
WARNING  ocrmypdf:_pipeline.py:743 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
_________________________ test_monochrome_correlation __________________________

resources = PosixPath('/tmp/autopkgtest.anuEUK/build.Mo8/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_monochrome_correlation0')

    def test_monochrome_correlation(resources, outdir):
        # Verify leptonica: check that an incorrect rotated image has poor
        # correlation with reference
        corr = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'cardinal.pdf',
            reference_pageno=1,  # north facing page
            test_pdf=resources / 'cardinal.pdf',
            test_pageno=3,  # south facing page
        )
        assert corr < 0.10
        corr = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'cardinal.pdf',
            reference_pageno=2,
            test_pdf=resources / 'cardinal.pdf',
            test_pageno=2,
        )
>       assert corr > 0.90
E       assert 0.0 > 0.9

tests/test_rotation.py:98: AssertionError
------------------------------ Captured log call -------------------------------
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
_______________ test_autorotate_threshold[1-correlation > 0.80] ________________

spoof_tesseract_cache = {'ADTTMP': '/tmp/autopkgtest.anuEUK/autopkgtest_tmp', 'ADT_ARTIFACTS': '/tmp/autopkgtest.anuEUK/test-suite-artifacts',...TS': '/tmp/autopkgtest.anuEUK/test-suite-artifacts', 'AUTOPKGTEST_TMP': '/tmp/autopkgtest.anuEUK/autopkgtest_tmp', ...}
threshold = '1', correlation_test = 'correlation > 0.80'
resources = PosixPath('/tmp/autopkgtest.anuEUK/build.Mo8/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_autorotate_threshold_1_co0')

    @pytest.mark.parametrize(
        'threshold, correlation_test',
        [
            ('1', 'correlation > 0.80'),  # Low thresh -> always rotate -> high corr
            ('99', 'correlation < 0.10'),  # High thres -> never rotate -> low corr
        ],
    )
    def test_autorotate_threshold(
        spoof_tesseract_cache, threshold, correlation_test, resources, outdir
    ):
        out = check_ocrmypdf(
            resources / 'cardinal.pdf',
            outdir / 'out.pdf',
            '--rotate-pages-threshold',
            threshold,
            '-r',
            # '-v',
            # '1',
            env=spoof_tesseract_cache,
        )
    
        correlation = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'cardinal.pdf',
            reference_pageno=1,
            test_pdf=outdir / 'out.pdf',
            test_pageno=3,
        )
>       assert eval(correlation_test)  # pylint: disable=w0123
E       AssertionError: assert False
E        +  where False = eval('correlation > 0.80')

tests/test_rotation.py:155: AssertionError
----------------------------- Captured stderr call -----------------------------
Scan: 100%|██████████| 4/4 [00:00<00:00, 550.72page/s]
OCR: 100%|██████████| 4.0/4.0 [00:01<00:00,  3.54page/s]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
ERROR    ocrmypdf:ghostscript.py:167    3:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    2:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    4:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    2:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    4:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    3:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:281 GPL Ghostscript RELEASE CANDIDATE 2 9.28: Setting Overprint Mode to 1
                                      not permitted in PDF/A-2, overprint mode not set
                                     
                                        **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
                                        **** Error: Recursive XObject detected, ignoring "Im0", object number 14
                                                    Output may be incorrect.
                                        **** Error: Recursive XObject detected, ignoring "Im0", object number 14
                                                    Output may be incorrect.
                                        **** Error: Recursive XObject detected, ignoring "Im0", object number 14
                                                    Output may be incorrect.
WARNING  ocrmypdf:_pipeline.py:743 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
ERROR    root:ghostscript.py:167    **** Error reading a content stream. The page may be incomplete.
                                                Output may be incorrect.
                                    **** Error: File did not complete the page properly and may be damaged.
                                                Output may be incorrect.
__________________________ test_rotate_deskew_timeout __________________________

resources = PosixPath('/tmp/autopkgtest.anuEUK/build.Mo8/src/tests/resources')
outdir = PosixPath('/tmp/pytest-of-spwhitton/pytest-0/test_rotate_deskew_timeout0')

    def test_rotate_deskew_timeout(resources, outdir):
        check_ocrmypdf(
            resources / 'rotated_skew.pdf',
            outdir / 'deskewed.pdf',
            '--rotate-pages',
            '--rotate-pages-threshold',
            '0',
            '--deskew',
            '--tesseract-timeout',
            '0',
            '--pdf-renderer',
            'sandwich',
        )
    
        correlation = check_monochrome_correlation(
            outdir,
            reference_pdf=resources / 'ccitt.pdf',
            reference_pageno=1,
            test_pdf=outdir / 'deskewed.pdf',
            test_pageno=1,
        )
    
        # Confirm that the page still got deskewed
>       assert correlation > 0.50
E       assert 0.0 > 0.5

tests/test_rotation.py:219: AssertionError
----------------------------- Captured stderr call -----------------------------
Scan: 100%|██████████| 1/1 [00:00<00:00, 359.41page/s]
OCR: 100%|██████████| 1.0/1.0 [00:00<00:00,  1.95page/s]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
ERROR    ocrmypdf:ghostscript.py:167    1:    **** Error reading a content stream. The page may be incomplete.
                                                    Output may be incorrect.
                                        **** Error: File did not complete the page properly and may be damaged.
                                                    Output may be incorrect.
WARNING  ocrmypdf:_pipeline.py:743 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
======== 4 failed, 198 passed, 34 skipped, 1 xfailed in 408.89 seconds =========

@jbarlow83
Collaborator

Looks identical to 9.28rc1 to me.

I tried to report it, but bugs.ghostscript.com has been down (HTTPS misconfigured?) for about half a day.

@jbarlow83
Collaborator

Reported as
https://bugs.ghostscript.com/show_bug.cgi?id=701552

They traced it to how Debian was compiling its version of Ghostscript. I believe this means all of the upstream packages should work.

@spwhitton
Contributor Author

spwhitton commented Sep 18, 2019 via email
