Support Chinese characters in PDF testing #435

mdmintz · 2019-11-29T05:28:21Z

Support Chinese characters in PDF testing

Updated methods:

    def get_pdf_text(self, pdf, page=None, maxpages=None,
                     password=None, codec='utf-8', wrap=False, nav=False,
                     override=False):
        """ Gets text from a PDF file.
            PDF can be either a URL or a file path on the local file system.
            @Params
            pdf - The URL or file path of the PDF file.
            page - The page number (or a list of page numbers) of the PDF.
                    If a page number is provided, looks only at that page.
                        (1 is the first page, 2 is the second page, etc.)
                    If no page number is provided, returns all PDF text.
            maxpages - Instead of providing a page number, you can provide
                       the number of pages to use from the beginning.
            password - If the PDF is password-protected, enter it here.
            codec - The character encoding to use for the extracted text.
                    (The default codec used by this method is 'utf-8'.)
            wrap - Replaces ' \n' with ' ' so that individual sentences
                   from a PDF don't get broken up into separate lines when
                   getting converted into text format.
            nav - If PDF is a URL, navigates to the URL in the browser first.
                  (Not needed because the PDF will be downloaded anyway.)
            override - If the PDF file to be downloaded already exists in the
                       downloaded_files/ folder, that PDF will be used
                       instead of downloading it again. """

    def assert_pdf_text(self, pdf, text, page=None, maxpages=None,
                        password=None, codec='utf-8', wrap=True, nav=False,
                        override=False):
        """ Asserts text in a PDF file.
            PDF can be either a URL or a file path on the local file system.
            @Params
            pdf - The URL or file path of the PDF file.
            text - The expected text to verify in the PDF.
            page - The page number of the PDF to use (optional).
                    If a page number is provided, looks only at that page.
                        (1 is the first page, 2 is the second page, etc.)
                    If no page number is provided, looks at all the pages.
            maxpages - Instead of providing a page number, you can provide
                       the number of pages to use from the beginning.
            password - If the PDF is password-protected, enter it here.
            codec - The character encoding to use for the extracted text.
                    (The default codec used by this method is 'utf-8'.)
            wrap - Replaces ' \n' with ' ' so that individual sentences
                   from a PDF don't get broken up into separate lines when
                   getting converted into text format.
            nav - If PDF is a URL, navigates to the URL in the browser first.
                  (Not needed because the PDF will be downloaded anyway.)
            override - If the PDF file to be downloaded already exists in the
                       downloaded_files/ folder, that PDF will be used
                       instead of downloading it again. """
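
The `wrap` option is easiest to see on a small string. The sketch below is not the SeleniumBase implementation; `unwrap_lines` is a hypothetical helper that only mimics the documented replacement of ' \n' with ' ', and the sample text is made up for illustration. It shows why a phrase that spans a line break may only be found after unwrapping, and that Chinese characters pass through unchanged.

```python
# Hypothetical helper (not part of SeleniumBase): mimics the documented
# effect of wrap=True by replacing ' \n' with ' '.
def unwrap_lines(text):
    return text.replace(" \n", " ")


# Made-up sample of text as it might come out of a PDF, with sentences
# broken across lines after a trailing space.
extracted = "The quick brown \nfox jumps.\n你好，世界 \n再见"
unwrapped = unwrap_lines(extracted)

# A phrase that spans a line break is only found after unwrapping.
print("brown fox" in extracted)   # False
print("brown fox" in unwrapped)   # True
print("你好，世界 再见" in unwrapped)  # True
```

Note that a newline not preceded by a space (as after "jumps." above) is left in place, which matches the docstring's description of replacing only ' \n'.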

mdmintz added 9 commits November 29, 2019 00:13

Allow the use of Chinese characters in PDF testing

c64665b

Update a test

9c8e97d

Add a test for verifying PDF testing with Chinese characters

4efd4df

Use pdfminer.six instead of pypdf2 for reading PDF files

c05b5b5

Update the version of pytest-html

94e13f4

Version 1.33.8

adbe519

Update method_summary

16275eb

Fix pytest-html compatibility

6f765fb

Configure the junit_family option explicitly in pytest.ini

7452015

mdmintz merged commit 360aba3 into master Nov 29, 2019

mdmintz deleted the handle-pdfs-with-chinese-characters branch November 29, 2019 05:55

mdmintz changed the title ~~Allow the use of Chinese characters in PDF testing~~ Support Chinese characters in PDF testing Dec 21, 2019
