extremely low efficiency FPDF._out implement #185

Open
YouJiacheng opened this issue Aug 5, 2021 · 3 comments

YouJiacheng commented Aug 5, 2021

In fpdf (not fpdf2), self.buffer is a str, not a bytearray, so every append in _out copies the entire buffer:

def _out(self, s):
    #Add a line to the document
    if PY3K and isinstance(s, bytes):
        # manage binary data as latin1 until PEP461-like function is implemented
        s = s.decode("latin1")          
    elif not PY3K and isinstance(s, unicode):
        s = s.encode("latin1")    # default encoding (font name and similar)      
    elif not isinstance(s, basestring):
        s = str(s)
    if(self.state==2):
        self.pages[self.page]+=s+"\n"
    else:
        self.buffer+=s+"\n"  # self.buffer is a str, so each += copies the whole buffer: O(N^2) time overall
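
To illustrate why this pattern is quadratic: a str is immutable, so `self.buffer += ...` builds a new string and copies everything accumulated so far, while a bytearray is extended in place. (CPython can sometimes avoid the copy for a local str variable, but as far as I know not for an attribute such as self.buffer.) Below is a minimal sketch that does not depend on fpdf; `StrBuffer`, `ByteArrayBuffer`, `fill`, and `benchmark` are hypothetical names used only for illustration.

```python
import timeit


class StrBuffer:
    """Mimics the original pattern: the buffer is an immutable str attribute."""

    def __init__(self):
        self.buffer = ""

    def out(self, s):
        # Builds a brand-new str on each call, copying the existing buffer.
        self.buffer += s + "\n"


class ByteArrayBuffer:
    """Same interface, but the buffer is a mutable bytearray."""

    def __init__(self):
        self.buffer = bytearray()

    def out(self, s):
        # bytearray += extends the existing object in place.
        self.buffer += s.encode("latin1") + b"\n"


def fill(buffer_cls, n_chunks, chunk):
    """Append `chunk` n_chunks times and return the final buffer."""
    buf = buffer_cls()
    for _ in range(n_chunks):
        buf.out(chunk)
    return buf.buffer


def benchmark(sizes=(1000, 2000, 4000), chunk="x" * 500):
    """Print wall-clock time for both buffer types at each size."""
    for n in sizes:
        t_str = timeit.timeit(lambda: fill(StrBuffer, n, chunk), number=1)
        t_ba = timeit.timeit(lambda: fill(ByteArrayBuffer, n, chunk), number=1)
        print(f"{n} chunks: str {t_str:.3f}s, bytearray {t_ba:.3f}s")
```

Calling `benchmark()` should show the str timings growing faster than linearly as the chunk count doubles, while the bytearray timings stay roughly proportional to it; exact numbers depend on the machine and Python version.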

YouJiacheng changed the title from "extremely low efficient FPDF._out implement" to "extremely low efficiency FPDF._out implement" on Aug 5, 2021

@YouJiacheng (Author)

I fixed it by subclassing FPDF. Anyone who runs into the same problem can use the following code:

from fpdf import FPDF

# Fix 1: self.buffer becomes a bytearray; each page is still built by str concatenation
class FPDF_fixed1(FPDF):
    def __init__(self, orientation='P', unit='mm', format='A4'):
        super().__init__(orientation=orientation, unit=unit, format=format)
        self.buffer = bytearray()

    def _out(self, s):
        if self.state == 2:
            # Page content: pages are still str, so convert the input to str
            if isinstance(s, bytes):
                s = s.decode('latin1')
            elif not isinstance(s, str):
                s = str(s)
            self.pages[self.page] += s + '\n'
        else:
            # Document-level output: append bytes to the bytearray in place
            if not isinstance(s, bytes):
                if not isinstance(s, str):
                    s = str(s)
                s = s.encode('latin1')
            self.buffer += s + b'\n'

    def output(self, name=''):
        # Only writes to a file; name must be a valid path
        if self.state < 3:
            self.close()
        with open(name, 'wb') as f:
            f.write(self.buffer)

# Fix 2: bytearray everywhere (buffer and pages), but without support for
# compression or page-number aliases; override _putpages to restore them
class FPDF_fixed2(FPDF):
    def __init__(self, orientation='P', unit='mm', format='A4'):
        super().__init__(orientation=orientation, unit=unit, format=format)
        self.buffer = bytearray()

    def _out(self, s):
        # Accept bytearray as well as bytes, since page content is stored as a
        # bytearray; str(bytearray) would produce its repr, not its contents
        if not isinstance(s, (bytes, bytearray)):
            if not isinstance(s, str):
                s = str(s)
            s = s.encode('latin1')
        if self.state == 2:
            self.pages[self.page] += s + b'\n'
        else:
            self.buffer += s + b'\n'

    def output(self, name=''):
        # Only writes to a file; name must be a valid path
        if self.state < 3:
            self.close()
        with open(name, 'wb') as f:
            f.write(self.buffer)

    def _beginpage(self, orientation):
        super()._beginpage(orientation)
        self.pages[self.page] = bytearray()

    def set_compression(self, compress):  # disabled: compression is always off
        return super().set_compression(False)

    def alias_nb_pages(self, alias='{nb}'):  # disabled: page-count alias is ignored
        pass
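
If the per-page content has to stay a str (as in FPDF_fixed1), a common Python idiom for avoiding repeated concatenation is to collect the pieces in a list and join them once at the end. This is only a sketch of that idiom, not code from fpdf; `PageChunks` and its methods are hypothetical names.

```python
class PageChunks:
    """Accumulates page content as a list of str pieces (hypothetical helper)."""

    def __init__(self):
        self._chunks = []

    def out(self, s):
        # Normalization similar to _out above: bytes are decoded as latin1,
        # and anything else that is not a str is converted with str().
        if isinstance(s, (bytes, bytearray)):
            s = s.decode("latin1")
        elif not isinstance(s, str):
            s = str(s)
        # Appending to a list does not copy the earlier content.
        self._chunks.append(s + "\n")

    def getvalue(self):
        # A single pass over all pieces builds the final string.
        return "".join(self._chunks)
```

Using this inside FPDF would presumably also mean overriding the code that reads self.pages, so it is a larger change than the two subclasses above.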

@YouJiacheng (Author)

After fixing that problem, the time to convert ~500 JPGs of ~1 MB each to a PDF dropped from 1200 s (*) to 3 s.

(*) I only measured the original code on 50 JPGs of ~1 MB each, which took 12 s; the 1200 s figure is an extrapolation from that measurement.
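
The 1200 s figure is consistent with scaling the 12 s measurement quadratically: 10 times as many images means roughly 100 times the work. A small sketch of that arithmetic, assuming the run time is dominated by the O(N^2) buffer copying (`extrapolate_quadratic` is a hypothetical helper):

```python
def extrapolate_quadratic(t_measured, n_measured, n_target):
    """Scale a measured time to a larger input, assuming O(N^2) cost."""
    return t_measured * (n_target / n_measured) ** 2


# 50 images took 12 s; estimate the time for 500 images.
print(extrapolate_quadratic(12, 50, 500))  # 1200.0
```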

Lucas-C commented Aug 23, 2021

PyFPDF is no longer maintained; you may want to check out PyFPDF/fpdf2.

Its FPDF._out method is a lot faster thanks to its use of a byte buffer.
