Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline squared distance calculation #290

Closed
wants to merge 2 commits into from

Conversation

@Komzpa
Copy link
Member

commented Sep 4, 2018

distance2d_sqr_pt_pt is hot call in many spots, but is not getting inlined as it's defined in other .c file.

Komzpa added 2 commits Sep 4, 2018

@strk strk closed this in 5eb3014 Sep 6, 2018

@dbaston

This comment has been minimized.

Copy link
Member

commented Sep 6, 2018

Can you share your process on identifying this as a "hot call" ?

@Komzpa

This comment has been minimized.

Copy link
Member Author

commented Sep 6, 2018

@dbaston
I open PGDG-shipped lwgeom and disassemble it:

objdump -S /usr/lib/liblwgeom-2.5.so.0.0.0 | less

I look at the function I care about (lwgeom_cluster_2d_kmeans) and see that all the sub-functions of it got inlined into main one. Let's see not inlined ones:

objdump -S /usr/lib/liblwgeom-2.5.so.0.0.0 |grep -E "(>:|callq)" | less
0000000000060e30 <lwgeom_cluster_2d_kmeans@@Base>:
   60e90:       e8 3b f7 fc ff          callq  305d0 <lwalloc@@Base>
   60ea2:       e8 f9 e8 fa ff          callq  f7a0 <memset@plt>
   60eaa:       e8 21 f7 fc ff          callq  305d0 <lwalloc@@Base>
   60ec4:       e8 d7 e8 fa ff          callq  f7a0 <memset@plt>
   60ecc:       e8 ff f6 fc ff          callq  305d0 <lwalloc@@Base>
   60ede:       e8 ed f6 fc ff          callq  305d0 <lwalloc@@Base>
   60eeb:       e8 e0 f6 fc ff          callq  305d0 <lwalloc@@Base>
   60efd:       e8 9e e8 fa ff          callq  f7a0 <memset@plt>
   60f0c:       e8 8f e8 fa ff          callq  f7a0 <memset@plt>
   60f19:       e8 82 e8 fa ff          callq  f7a0 <memset@plt>
   60f5c:       e8 5f b2 fb ff          callq  1c1c0 <lwgeom_is_empty@@Base>
   60f68:       e8 23 ac fb ff          callq  1bb90 <lwgeom_get_type@@Base>
   60f79:       e8 c2 d4 fe ff          callq  4e440 <lwgeom_centroid@@Base>
   60f89:       e8 32 b2 fb ff          callq  1c1c0 <lwgeom_is_empty@@Base>
   60fad:       e8 0e a4 fb ff          callq  1b3c0 <lwgeom_as_lwpoint@@Base>
   60fb8:       e8 c3 9b fb ff          callq  1ab80 <getPoint2d_cp@@Base>
   61030:       e8 bb f5 fc ff          callq  305f0 <lwfree@@Base>
   61038:       e8 b3 f5 fc ff          callq  305f0 <lwfree@@Base>
   61042:       e8 a9 f5 fc ff          callq  305f0 <lwfree@@Base>
   6104c:       e8 9f f5 fc ff          callq  305f0 <lwfree@@Base>
   61068:       e8 53 a3 fb ff          callq  1b3c0 <lwgeom_as_lwpoint@@Base>
   610c6:       e8 35 3c fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
   610d8:       e8 23 3c fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
   6110b:       e8 d0 e3 fa ff          callq  f4e0 <fmax@plt>
   611cb:       e8 a0 f1 fc ff          callq  30370 <lwnotice@@Base>
   61256:       e8 75 f3 fc ff          callq  305d0 <lwalloc@@Base>
   61265:       e8 66 f3 fc ff          callq  305d0 <lwalloc@@Base>
   612a1:       ff d2                   callq  *%rdx
   612c6:       e8 85 e6 fa ff          callq  f950 <memcpy@plt>
   612f4:       e8 07 3a fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
   6130c:       e8 ef 39 fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
   61354:       e8 47 e4 fa ff          callq  f7a0 <memset@plt>
   61421:       e8 5a e4 fa ff          callq  f880 <memcmp@plt>
   61447:       e8 a4 f1 fc ff          callq  305f0 <lwfree@@Base>
   61451:       e8 9a f1 fc ff          callq  305f0 <lwfree@@Base>
   614a0:       e8 8b ef fc ff          callq  30430 <lwerror@@Base>
   614b1:       e8 ba a9 fb ff          callq  1be70 <lwgeom_free@@Base>
   614cb:       e8 a0 ee fc ff          callq  30370 <lwnotice@@Base>
   614e7:       e8 84 ee fc ff          callq  30370 <lwnotice@@Base>
   614f1:       e8 fa f0 fc ff          callq  305f0 <lwfree@@Base>
   614fb:       e8 f0 f0 fc ff          callq  305f0 <lwfree@@Base>
   61514:       e8 17 ef fc ff          callq  30430 <lwerror@@Base>
   6151e:       e8 cd f0 fc ff          callq  305f0 <lwfree@@Base>
   61526:       e8 c5 f0 fc ff          callq  305f0 <lwfree@@Base>
   61530:       e8 bb f0 fc ff          callq  305f0 <lwfree@@Base>
   6153a:       e8 b1 f0 fc ff          callq  305f0 <lwfree@@Base>
   61544:       e8 a7 f0 fc ff          callq  305f0 <lwfree@@Base>
   6155c:       e8 6f f0 fc ff          callq  305d0 <lwalloc@@Base>
   61593:       e8 68 37 fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
   61653:       e8 a8 36 fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
   6165d:       e8 0e e3 fa ff          callq  f970 <fmin@plt>
   616e2:       e8 09 ef fc ff          callq  305f0 <lwfree@@Base>
   61706:       e8 65 e0 fa ff          callq  f770 <__assert_fail@plt>
   61725:       e8 46 e0 fa ff          callq  f770 <__assert_fail@plt>
   61744:       e8 27 e0 fa ff          callq  f770 <__assert_fail@plt>
   61763:       e8 08 e0 fa ff          callq  f770 <__assert_fail@plt>
   61782:       e8 e9 df fa ff          callq  f770 <__assert_fail@plt>

From knowing the code I see that distance2d_sqr_pt_pt is called on every point to every centroid (k*n times) on every loop iteration, and is expected to be really fast. The structure gets packed and then unpacked:

0000000000014d00 <distance2d_sqr_pt_pt@@Base>:
   14d00:       f2 0f 10 06             movsd  (%rsi),%xmm0
   14d04:       f2 0f 10 4e 08          movsd  0x8(%rsi),%xmm1
   14d09:       f2 0f 5c 07             subsd  (%rdi),%xmm0
   14d0d:       f2 0f 5c 4f 08          subsd  0x8(%rdi),%xmm1
   14d12:       f2 0f 59 c0             mulsd  %xmm0,%xmm0
   14d16:       f2 0f 59 c9             mulsd  %xmm1,%xmm1
   14d1a:       f2 0f 58 c1             addsd  %xmm1,%xmm0
   14d1e:       c3                      retq   
   14d1f:       90                      nop

.....
<in kmeans:>
   612e5:       48 8b 33                mov    (%rbx),%rsi
   612e8:       4c 89 e7                mov    %r12,%rdi
   612eb:       41 be 01 00 00 00       mov    $0x1,%r14d
   612f1:       45 31 ff                xor    %r15d,%r15d
   612f4:       e8 07 3a fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
   612f9:       66 0f 28 c8             movapd %xmm0,%xmm1
   612fd:       0f 1f 00                nopl   (%rax)
   61300:       4a 8b 34 f3             mov    (%rbx,%r14,8),%rsi
   61304:       4c 89 e7                mov    %r12,%rdi
   61307:       f2 0f 11 0c 24          movsd  %xmm1,(%rsp)
   6130c:       e8 ef 39 fb ff          callq  14d00 <distance2d_sqr_pt_pt@@Base>
....

after change, disassembly shows that loop got vectorized:

objdump -S liblwgeom/.libs/liblwgeom-3.0.so.0.0.0  | less
....
   7ac57:       0f 84 33 04 00 00       je     7b090 <lwgeom_cluster_2d_kmeans+0x8c0>
   7ac5d:       49 8b 55 00             mov    0x0(%r13),%rdx
   7ac61:       c5 fb 10 58 08          vmovsd 0x8(%rax),%xmm3
   7ac66:       c5 fb 10 20             vmovsd (%rax),%xmm4
   7ac6a:       31 c9                   xor    %ecx,%ecx
   7ac6c:       b8 01 00 00 00          mov    $0x1,%eax
   7ac71:       c5 fb 10 42 08          vmovsd 0x8(%rdx),%xmm0
   7ac76:       c5 fb 10 12             vmovsd (%rdx),%xmm2
   7ac7a:       c5 fb 5c c3             vsubsd %xmm3,%xmm0,%xmm0
   7ac7e:       c5 eb 5c d4             vsubsd %xmm4,%xmm2,%xmm2
   7ac82:       c5 fb 59 c0             vmulsd %xmm0,%xmm0,%xmm0
   7ac86:       c4 e2 f9 99 d2          vfmadd132sd %xmm2,%xmm0,%xmm2
   7ac8b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   7ac90:       49 8b 54 c5 00          mov    0x0(%r13,%rax,8),%rdx
   7ac95:       c5 fb 10 4a 08          vmovsd 0x8(%rdx),%xmm1
   7ac9a:       c5 fb 10 02             vmovsd (%rdx),%xmm0
   7ac9e:       c5 f3 5c cb             vsubsd %xmm3,%xmm1,%xmm1
   7aca2:       c5 fb 5c c4             vsubsd %xmm4,%xmm0,%xmm0
   7aca6:       c5 f3 59 c9             vmulsd %xmm1,%xmm1,%xmm1
   7acaa:       c4 e2 f1 99 c0          vfmadd132sd %xmm0,%xmm1,%xmm0
   7acaf:       c5 f9 2e d0             vucomisd %xmm0,%xmm2
   7acb3:       c5 fb 5d d2             vminsd %xmm2,%xmm0,%xmm2
   7acb7:       0f 47 c8                cmova  %eax,%ecx
   7acba:       48 83 c0 01             add    $0x1,%rax
   7acbe:       41 39 c7                cmp    %eax,%r15d
   7acc1:       77 cd                   ja     7ac90 <lwgeom_cluster_2d_kmeans+0x4c0>

That's basically it.

@dbaston

This comment has been minimized.

Copy link
Member

commented Sep 6, 2018

Cool, thanks. I've also found callgrind to be instructive for this kind of work. I think a similar issue is present on the point accessors like getPoint2d_cp.

@Komzpa

This comment has been minimized.

Copy link
Member Author

commented Sep 6, 2018

These need some attention:

kom@nucat:~$ objdump -S -C --no-show-raw-insn /usr/lib/liblwgeom-2.5.so.0  |cut -f2 |grep -E "callq"| sort | uniq -c | sort -nr| head -n40
    669 callq  30430 <lwerror@@Base>
    274 callq  305d0 <lwalloc@@Base>
    254 callq  f6b0 <__stack_chk_fail@plt>
    250 callq  305f0 <lwfree@@Base>
    208 callq  1ab80 <getPoint2d_cp@@Base>
    182 callq  fdb0 <__sprintf_chk@plt>
    135 callq  f470 <GEOSGeom_destroy@plt>
    106 callq  1c1c0 <lwgeom_is_empty@@Base>
    106 callq  1be70 <lwgeom_free@@Base>
    104 callq  305b0 <lwtype_name@@Base>
    102 callq  53770 <lwgeom_split@@Base+0xa90>
     88 callq  1a750 <getPoint4d_p@@Base>
     80 callq  f680 <strlen@plt>
     76 callq  30370 <lwnotice@@Base>
     71 callq  18cf0 <getPoint_internal@@Base>
     61 callq  18eb0 <ptarray_append_point@@Base>
     58 callq  f770 <__assert_fail@plt>
     57 callq  10260 <stringbuffer_aprintf@@Base>
     55 callq  3a9f0 <geographic_point_init@@Base>
     53 callq  54580 <lwt_be_lastErrorMessage@@Base>
     53 callq  4ceb0 <lwgeom_geos_error@@Base+0x200>
     52 callq  177b0 <ptarray_free@@Base>
     51 callq  10110 <stringbuffer_append@@Base>
     44 callq  36ec0 <lwprint_double@@Base>
     44 callq  21a10 <lwcollection_construct_empty@@Base>
     43 callq  4d6e0 <LWGEOM2GEOS@@Base>
     43 callq  175f0 <ptarray_construct_empty@@Base>
     40 callq  1ab10 <getPoint2d_p@@Base>                                                                                                                                                                                                                                                   
     39 callq  1bba0 <lwgeom_has_z@@Base>                                                                                                                                                                                                                                                   
     37 callq  1b5d0 <lwline_as_lwgeom@@Base>                                                                                                                                                                                                                                               
     36 callq  2fc60 <wkt_parser_collection_finalize@@Base>                                                                                                                                                                                                                                 
     35 callq  30c90 <lw_segment_side@@Base>                                                                                                                                                                                                                                                
     34 callq  f950 <memcpy@plt>                                                                                                                                                                                                                                                            
     32 callq  fcf0 <sqrt@plt>                                                                                                                                                                                                                                                              
     31 callq  1be00 <lwgeom_is_collection@@Base>                                                                                                                                                                                                                                           
     31 callq  1acb0 <ptarray_set_point4d@@Base>                                                                                                                                                                                                                                            
     28 callq  3ac60 <geog2cart@@Base>                                                                                                                                                                                                                                                      
     28 callq  305e0 <lwrealloc@@Base>                                                                                                                                                                                                                                                      
     28 callq  22440 <lwcollection_add_lwgeom@@Base>                                                                                                                                                                                                                                        
     28 callq  21eb0 <lwcollection_free@@Base>     
@Komzpa Komzpa referenced this pull request Sep 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.