Improve label-position-tolerance performance #1387

Closed
herm opened this Issue Aug 14, 2012 · 10 comments

Member

herm commented Aug 14, 2012

The current approach is not very efficient. When a placement can't be made at the intended position, up to 200(!) slightly different positions are tried, usually by moving the text one pixel and trying again. I think the displacement should grow exponentially: instead of trying ±1, ±2, ±3, ±4, ±5, …, try ±1, ±2, ±4, ±8, ±16, …
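
To make the difference concrete, here is a minimal Python sketch of the two search orders. It is not Mapnik's placement code: the function names and the tolerance of 100 pixels (chosen so the one-pixel search tries 200 displaced positions) are assumptions for illustration only.

```python
def linear_offsets(tolerance):
    """Yield 0, +1, -1, +2, -2, ... in one-pixel steps up to +/-tolerance."""
    yield 0  # the intended position is always tried first
    for d in range(1, tolerance + 1):
        yield d
        yield -d


def exponential_offsets(tolerance):
    """Yield 0, +1, -1, +2, -2, +4, -4, ... doubling while within tolerance."""
    yield 0
    d = 1
    while d <= tolerance:
        yield d
        yield -d
        d *= 2


# Hypothetical tolerance of 100 pixels in each direction.
linear = list(linear_offsets(100))
exponential = list(exponential_offsets(100))

print(len(linear), "candidate positions with one-pixel steps")
print(len(exponential), "candidate positions with doubling steps")
print(exponential)
```

The doubling variant checks far fewer candidates, at the cost of skipping positions between the powers of two, so a label may end up farther from its intended position than strictly necessary.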

http://mapnik.org/news/2012/08/13/gsoc2012-status8/

herm was assigned Aug 14, 2012

Owner

springmeyer commented Mar 13, 2013

this actually produces decent visual result changes in our tests: https://gist.github.com/springmeyer/5156913

Member

herm commented Mar 15, 2013

The speedup from this change is probably not very big. It reduces the time by a factor of 2 in the best case. In the worst case the slowdown is by a factor of 50. The average is probably a very small speedup.

The real solution is to fix the whole placement finder, as done in the harfbuzz branch. I will work on creating an up-to-date version of it in the next few days. Then we can talk about merging it.

Owner

springmeyer commented Mar 15, 2013

@herm - awesome. Have you done any profiling comparing master to harfbuzz yet?

Member

herm commented Mar 31, 2013

I ran 100 iterations of the text-related visual tests, rendered to an in-memory image instead of a file. All test results are from after 089ca7d.

Master:

 Performance counter stats for './test.py -d cairo -d grid -d agg !text -s 1 -r 100':

      68669.237165 task-clock                #    0.865 CPUs utilized          
            16,079 context-switches          #    0.234 K/sec                  
               325 cpu-migrations            #    0.005 K/sec                  
           159,774 page-faults               #    0.002 M/sec                  
   122,631,814,663 cycles                    #    1.786 GHz                     [49.94%]
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   129,143,172,067 instructions              #    1.05  insns per cycle         [74.97%]
    24,247,906,692 branches                  #  353.112 M/sec                   [75.02%]
     1,037,539,499 branch-misses             #    4.28% of all branches         [75.05%]

      79.404409129 seconds time elapsed

Harfbuzz branch with dummy shaper (basically the same shaping functionality as in master):

 Performance counter stats for './test.py -d cairo -d grid -d agg !text -s 1 -r 100':

      61914.419763 task-clock                #    0.999 CPUs utilized          
             1,194 context-switches          #    0.019 K/sec                  
               446 cpu-migrations            #    0.007 K/sec                  
           159,420 page-faults               #    0.003 M/sec                  
   110,761,366,591 cycles                    #    1.789 GHz                     [50.01%]
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   129,055,099,567 instructions              #    1.17  insns per cycle         [75.01%]
    24,243,730,658 branches                  #  391.568 M/sec                   [74.99%]
     1,015,644,683 branch-misses             #    4.19% of all branches         [75.00%]

      61.996422817 seconds time elapsed

Harfbuzz branch with harfbuzz as the shaping engine:

 Performance counter stats for './test.py -d cairo -d grid -d agg !text -s 1 -r 100':

     122198.873020 task-clock                #    0.998 CPUs utilized          
             1,851 context-switches          #    0.015 K/sec                  
               677 cpu-migrations            #    0.006 K/sec                  
           240,196 page-faults               #    0.002 M/sec                  
   218,636,785,426 cycles                    #    1.789 GHz                     [50.01%]
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   287,510,551,842 instructions              #    1.32  insns per cycle         [75.00%]
    63,404,604,562 branches                  #  518.864 M/sec                   [75.00%]
     1,865,460,731 branch-misses             #    2.94% of all branches         [74.99%]

     122.423250823 seconds time elapsed

So with the same level of shaping features as master, the new code is about 27% faster, but with harfbuzz as the shaping engine it is about 50% slower than master. Some individual tests are faster in master and some are slower.

I will look into what causes the tests to run slower.
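
For reference, the wall-clock comparison can be reproduced from the "seconds time elapsed" lines above. This is only a sketch: the comment does not state which counter the percentages were based on, and the dictionary keys are labels made up here.

```python
# Elapsed wall-clock seconds copied from the three perf runs above.
# The keys are hypothetical labels, not names used by the test suite.
elapsed = {
    "master": 79.404409129,
    "harfbuzz_dummy_shaper": 61.996422817,
    "harfbuzz_shaper": 122.423250823,
}


def percent_change(baseline, other):
    """Change in elapsed time relative to the baseline, in percent."""
    return (other - baseline) / baseline * 100.0


dummy_change = percent_change(elapsed["master"], elapsed["harfbuzz_dummy_shaper"])
harfbuzz_change = percent_change(elapsed["master"], elapsed["harfbuzz_shaper"])
master_vs_dummy = elapsed["master"] / elapsed["harfbuzz_dummy_shaper"]

print(f"dummy shaper vs master: {dummy_change:+.1f}% elapsed time")
print(f"harfbuzz vs master:     {harfbuzz_change:+.1f}% elapsed time")
print(f"master takes {master_vs_dummy:.2f}x as long as the dummy shaper run")
```

By this measure the dummy-shaper run takes roughly 22% less time than master (master takes about 1.28 times as long), and the harfbuzz run takes roughly 54% more time than master.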

Member

herm commented Mar 31, 2013

Using tcmalloc via LD_PRELOAD results in a 6% performance improvement. Malloc performance was critical in almost all profiles I have seen so far. So we should consider shipping with tcmalloc as the default allocator.

Owner

springmeyer commented Mar 31, 2013

I've seen that jemalloc usually beats tcmalloc on Linux, but it does not change which parts of the code take the most time relative to each other.

Owner

springmeyer commented Jul 29, 2014

not seeing anything more to do here.

Owner

springmeyer commented Aug 22, 2014

re-opening, seeing some poor label-position tolerance behavior with shields in master

springmeyer reopened this Aug 22, 2014

Owner

springmeyer commented Aug 22, 2014

New issue for the label-position-tolerance issue I was seeing: #2384.

Keeping this one open, however, since general performance is still a concern and the idea still stands: increase the tolerance used internally either logarithmically or exponentially on successive attempts, to reduce the total number of checks.

Owner

springmeyer commented Sep 5, 2014

nah, closing; will handle this at #2384.

springmeyer closed this Sep 5, 2014
