Speed Question #478

Open
joereddington-public opened this issue Jan 31, 2019 · 2 comments

@joereddington-public

Hey - so this might be a question about python or it might be about python-pptx.

I'm looking at the following code:

from pptx import Presentation

slide = Presentation("tests/testinputs/CK20V2.pptx").slides[1]
for i in range(1000):
    x = slide.shapes[0].top


#!/usr/bin/python
# -*- coding: utf-8 -*-
import unittest
from pptx import Presentation

class TestPPTX(unittest.TestCase):

    def test_one(self):
        prs = Presentation("tests/testinputs/CK20V2.pptx")
        for i in range(10000):
            x = prs.slides[0].shapes[0].top

    def test_two(self):
        prs = Presentation("tests/testinputs/CK20V2.pptx")
        shape = {}
        shape['top'] = 1231231301
        for i in range(10000):
            x = shape['top']

if __name__ == '__main__':
    unittest.main()

When I run the tests I get:

Josephs-Mini:newFactory joepublic$ python mcve.py TestPPTX.test_one
.
----------------------------------------------------------------------
Ran 1 test in 15.947s

OK
Josephs-Mini:newFactory joepublic$ python mcve.py TestPPTX.test_two
.
----------------------------------------------------------------------
Ran 1 test in 3.317s

OK
Josephs-Mini:newFactory joepublic$ 

The difference in speed makes me think that the presentation isn't being entirely loaded from disk when I open it. Is there a way to speed up the access?
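One way to check whether the slowdown comes from reading the file is to time the open and the loop separately. This is a minimal, library-independent sketch: `time_phases` is a hypothetical helper, and the dictionary passed in the demo is only a stand-in for a loaded presentation.

```python
import time
from typing import Any, Callable, Tuple


def time_phases(load: Callable[[], Any],
                access: Callable[[Any], Any],
                n: int) -> Tuple[float, float]:
    """Hypothetical helper: return (seconds in load, seconds in n calls to access)."""
    start = time.perf_counter()
    obj = load()                  # one-time cost, e.g. opening and parsing the file
    loaded = time.perf_counter()
    for _ in range(n):
        access(obj)               # repeated cost: only the per-access work
    done = time.perf_counter()
    return loaded - start, done - loaded


# Demo with a stand-in "document" (a plain dict), not a real presentation.
load_s, loop_s = time_phases(lambda: {"top": 1231231301}, lambda d: d["top"], 10_000)
print(f"load: {load_s:.4f}s  loop: {loop_s:.4f}s")
```

With python-pptx installed, `load` could be `lambda: Presentation("tests/testinputs/CK20V2.pptx")` and `access` could be `lambda prs: prs.slides[0].shapes[0].top`. If the load time stays fixed while the loop time grows with `n`, the cost is in each access rather than in opening the file.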

@joereddington-public
Author

(I did 100,000 iterations, and it's pretty clear:)

Josephs-Mini:newFactory joepublic$ python mcve.py TestPPTX.test_two
.
----------------------------------------------------------------------
Ran 1 test in 2.416s

OK
Josephs-Mini:newFactory joepublic$ python mcve.py TestPPTX.test_one
.
----------------------------------------------------------------------
Ran 1 test in 123.790s

OK
Josephs-Mini:newFactory joepublic$ 
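As a rough worked estimate from these numbers, treating `test_two`'s time as mostly the cost of opening the file (its loop is only dictionary lookups), and assuming the open costs about the same in both tests, the cost per iteration of `test_one`'s loop is about:

$$\frac{123.790\,\text{s} - 2.416\,\text{s}}{100{,}000} \approx 1.21\,\text{ms}$$

The 10,000-iteration runs give a similar figure: $(15.947 - 3.317)\,\text{s} / 10{,}000 \approx 1.26\,\text{ms}$. The open time itself varies between runs (3.317 s vs 2.416 s), so these are approximations only.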

@scanny
Owner

scanny commented Jan 31, 2019

Hi Joe, I'm not clear about the real-life problem here, so I'm not getting what we're trying to solve for.

But in abstract terms, I think the biggest part of the performance difference is de-referencing time. Every time Python needs to navigate a dot-separator '.' (operator really) in a value expression, it incurs a certain cost. For example, in shapes[0].top, Python not only needs to perform an indexed lookup (shapes[0]), it then also needs to find the "top" member of the object that lookup returns.
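The per-iteration de-referencing can be made visible with the standard-library `dis` module. This sketch counts `LOAD_ATTR` instructions (one per `.` in the expression) in two hypothetical functions that mirror the loop bodies; the `[0]` index operations add further subscript instructions on top of these, and every one of them runs again on each loop iteration.

```python
import dis


def chained(prs):
    # Same expression as the loop body in test_one.
    return prs.slides[0].shapes[0].top


def hoisted(shape):
    # Loop body when a reference to the shape is kept outside the loop.
    return shape.top


def count_attr_loads(func) -> int:
    """Count LOAD_ATTR instructions, i.e. attribute look-ups by name."""
    return sum(1 for ins in dis.get_instructions(func) if ins.opname == "LOAD_ATTR")


print(count_attr_loads(chained))  # 3: .slides, .shapes, .top
print(count_attr_loads(hoisted))  # 1: .top
```

The instruction count is only part of the story: if an attribute is a property, each `LOAD_ATTR` on it also runs that property's Python code.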

These de-referencing operations take time, and they can be a significant portion of the time Python spends doing anything, which is generally true of a dynamic language. This is the main overhead that Cython removes when used to optimize Python code.

So if you're interested in speed, especially in tight loops, you need to maintain a reference to each "anchor" object you need. Something like this would perform much better, I'm sure:

def test_three(self):
    prs = Presentation("tests/testinputs/CK20V2.pptx")
    shape = prs.slides[0].shapes[0]
    for i in range(10000):
        x = shape.top
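Beyond the attribute look-ups, an indexed access like `shapes[0]` may do real work every time it runs. The stand-in below is not python-pptx code: `FakeShapes`, `FakeShape`, and the `created` counter are hypothetical, and the assumption that indexing builds a fresh proxy object on each call is modeled on a typical wrapper over XML elements. Counting constructions shows how much work hoisting the shape out of the loop, as in `test_three`, can save.

```python
class FakeShape:
    """Hypothetical stand-in for a shape proxy wrapping one XML element."""
    created = 0  # counts how many proxies have been built

    def __init__(self, element: dict) -> None:
        FakeShape.created += 1
        self._element = element

    @property
    def top(self) -> int:
        # A property runs Python code on every access, not just a field read.
        return int(self._element["y"])


class FakeShapes:
    """Hypothetical stand-in for a slide's shape collection."""

    def __init__(self, elements: list) -> None:
        self._elements = elements

    def __getitem__(self, idx: int) -> FakeShape:
        # Assumption: indexing wraps the element in a brand-new proxy each time.
        return FakeShape(self._elements[idx])


def loop_chained(shapes: FakeShapes, n: int) -> int:
    x = 0
    for _ in range(n):
        x = shapes[0].top  # index + new proxy + property on every iteration
    return x


def loop_hoisted(shapes: FakeShapes, n: int) -> int:
    shape = shapes[0]  # proxy built once, outside the loop (as in test_three)
    x = 0
    for _ in range(n):
        x = shape.top  # only the property runs per iteration
    return x


shapes = FakeShapes([{"y": "1231231301"}])

FakeShape.created = 0
loop_chained(shapes, 1000)
print(FakeShape.created)  # 1000

FakeShape.created = 0
loop_hoisted(shapes, 1000)
print(FakeShape.created)  # 1
```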

There could be other factors here as well, such as which access operations incur re-parsing of the XML and which don't, but the first step would be to minimize de-referencing in a tight loop.
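The difference between an access that re-parses XML and one that only searches an already-parsed tree can be sketched with the standard library's `xml.etree.ElementTree`. This is an illustration under assumptions: python-pptx itself uses lxml, and the namespace-free `<off y=...>` markup and the function names here are hypothetical simplifications, not the real DrawingML or the library's internals.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for a shape's position XML.
XML = '<sp><spPr><xfrm><off x="100" y="1231231301"/></xfrm></spPr></sp>'
PATH = "./spPr/xfrm/off"


def top_reparsing(xml_text: str) -> int:
    """Parse the XML text again on every call, then read the y offset."""
    return int(ET.fromstring(xml_text).find(PATH).get("y"))


def top_from_tree(root: ET.Element) -> int:
    """Search an already-parsed tree; no text parsing happens here."""
    return int(root.find(PATH).get("y"))


root = ET.fromstring(XML)  # parse once, up front
top = top_from_tree(root)  # or read the value once and reuse it

t_reparse = timeit.timeit(lambda: top_reparsing(XML), number=10_000)
t_tree = timeit.timeit(lambda: top_from_tree(root), number=10_000)
t_cached = timeit.timeit(lambda: top, number=10_000)
print(f"re-parse: {t_reparse:.4f}s  tree: {t_tree:.4f}s  cached: {t_cached:.4f}s")
```

Timings will vary by machine, so none are shown here; the point is that each step down the list does less work per access.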
