
Support Chinese characters in PDF testing #435

Merged
mdmintz merged 9 commits into master from handle-pdfs-with-chinese-characters
Nov 29, 2019

mdmintz (Member) commented Nov 29, 2019

Support Chinese characters in PDF testing

  • Updated methods:
    def get_pdf_text(self, pdf, page=None, maxpages=None,
                     password=None, codec='utf-8', wrap=False, nav=False,
                     override=False):
        """ Gets text from a PDF file.
            PDF can be either a URL or a file path on the local file system.
            @Params
            pdf - The URL or file path of the PDF file.
            page - The page number (or a list of page numbers) of the PDF.
                    If a page number is provided, looks only at that page.
                        (1 is the first page, 2 is the second page, etc.)
                    If no page number is provided, returns all PDF text.
            maxpages - Instead of providing a page number, you can provide
                       the number of pages to use from the beginning.
            password - If the PDF is password-protected, enter it here.
            codec - The character encoding to use for the PDF text.
                    (The default codec used by this method is 'utf-8'.)
            wrap - Replaces ' \n' with ' ' so that individual sentences
                   from a PDF don't get broken up into separate lines when
                   getting converted into text format.
            nav - If PDF is a URL, navigates to the URL in the browser first.
                  (Not needed because the PDF will be downloaded anyway.)
            override - If the PDF file to be downloaded already exists in the
                       downloaded_files/ folder, that PDF will be used
                       instead of downloading it again. """

    def assert_pdf_text(self, pdf, text, page=None, maxpages=None,
                        password=None, codec='utf-8', wrap=True, nav=False,
                        override=False):
        """ Asserts text in a PDF file.
            PDF can be either a URL or a file path on the local file system.
            @Params
            pdf - The URL or file path of the PDF file.
            text - The expected text to verify in the PDF.
            page - The page number of the PDF to use (optional).
                    If a page number is provided, looks only at that page.
                        (1 is the first page, 2 is the second page, etc.)
                    If no page number is provided, looks at all the pages.
            maxpages - Instead of providing a page number, you can provide
                       the number of pages to use from the beginning.
            password - If the PDF is password-protected, enter it here.
            codec - The character encoding to use for the PDF text.
                    (The default codec used by this method is 'utf-8'.)
            wrap - Replaces ' \n' with ' ' so that individual sentences
                   from a PDF don't get broken up into separate lines when
                   getting converted into text format.
            nav - If PDF is a URL, navigates to the URL in the browser first.
                  (Not needed because the PDF will be downloaded anyway.)
            override - If the PDF file to be downloaded already exists in the
                       downloaded_files/ folder, that PDF will be used
                       instead of downloading it again. """
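
The `wrap` option is the one difference in defaults between the two methods (`False` for `get_pdf_text`, `True` for `assert_pdf_text`), so a small illustration may help. The sketch below is not the library's implementation. It only mimics the documented replacement of ' \n' with ' ' using two hypothetical helpers, `wrap_pdf_text` and `text_in_pdf_text`, and a made-up sample string that includes Chinese characters.

```python
def wrap_pdf_text(text: str) -> str:
    """Hypothetical helper mirroring the documented `wrap` behavior:
    every ' \\n' (space + newline) becomes a single space."""
    return text.replace(" \n", " ")


def text_in_pdf_text(expected: str, pdf_text: str, wrap: bool = True) -> bool:
    """Hypothetical helper: substring check, optionally after wrapping.
    The default wrap=True follows the default of assert_pdf_text."""
    if wrap:
        pdf_text = wrap_pdf_text(pdf_text)
    return expected in pdf_text


# Made-up sample of text as it might be extracted from a PDF:
# one sentence is broken across two lines, followed by Chinese text.
raw = "SeleniumBase supports \nPDF testing.\n你好，世界"

# With wrapping, the sentence broken across lines is found.
print(text_in_pdf_text("supports PDF testing", raw, wrap=True))

# Without wrapping, the line break splits the phrase, so it is not found.
print(text_in_pdf_text("supports PDF testing", raw, wrap=False))

# Chinese characters are ordinary str content once the text is decoded.
print(text_in_pdf_text("你好", raw))
```

Under these assumptions, an assertion on a phrase that spans a line break only succeeds when wrapping is enabled, which is presumably why `assert_pdf_text` wraps by default while `get_pdf_text` returns the text unmodified.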

mdmintz merged commit 360aba3 into master on Nov 29, 2019
mdmintz deleted the handle-pdfs-with-chinese-characters branch on November 29, 2019 05:55
mdmintz changed the title from "Allow the use of Chinese characters in PDF testing" to "Support Chinese characters in PDF testing" on Dec 21, 2019