[BUG] Consider removing spdlog dependency for substantial compile time improvements #1300

Open
ahendriksen opened this issue Feb 23, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@ahendriksen
Contributor

Describe the bug

Including the spdlog headers is quite expensive. Just adding #include <spdlog/spdlog.h> to an empty file adds 2.8 seconds to the compilation time. For the pairwise distance kernels, removing the spdlog include can reduce compile times by 50%.

Steps/Code to reproduce bug

time nvcc -arch sm_70 -I/path/to/spdlog/include -x cu -c <(echo '// empty file')
real    0m1.042s

time nvcc -arch sm_70 -I/path/to/spdlog/include -x cu -c <(echo '#include <spdlog/spdlog.h> ')
real    0m3.840s

Expected behavior
A smaller increase in compile time. For context, including <string> adds on the order of 100 ms to the compilation time:

time nvcc -arch sm_70 -I/path/to/spdlog/include -x cu -c <(echo '#include <string> ')
real    0m1.160s
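
Single-shot `time` measurements like the ones above can be noisy, so it may help to repeat each compile a few times and compare medians. The following is a minimal sketch of such a harness, not something from the original report. The helper name `median_wall_time` is hypothetical, and the stand-in command (the Python interpreter running `pass`) is only there so the sketch runs on a machine without nvcc.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(cmd, repeats=3):
    """Run `cmd` `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken compile is not timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in command (assumption): replace with the real compiler invocation, e.g.
# ["nvcc", "-arch", "sm_70", "-I/path/to/spdlog/include", "-x", "cu", "-c", "some_file.cu"]
# where some_file.cu is a file you created containing the include under test.
baseline = median_wall_time([sys.executable, "-c", "pass"])
print(f"median wall time: {baseline:.3f} s")
```

Running the helper once for an empty file and once for a file containing only the spdlog include gives two medians whose difference is the include cost.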

Additional context

RMM
RMM also uses spdlog. In practice, the compile time improvements will only materialize if RMM also removes its spdlog dependency.

Reason
The reason that compilation takes much longer is that spdlog, when used as a header-only library, instantiates a large number of templates in every translation unit. This happens in pattern_formatter::handle_flag_, which is instantiated here. Just adding back the spdlog header doubles the compile time of both cicc (device side) and gcc (host side).

Precompiled-library
Another option is to not use spdlog as a header-only library. The effect can be simulated by defining SPDLOG_COMPILED_LIB. When this is defined, including spdlog adds only about 0.5 seconds (1.52 s versus 1.05 s for an empty file):

time nvcc -DSPDLOG_COMPILED_LIB -arch sm_70 -I/home/ahendriksen/projects/raft-spdlog-issue/cpp/build/_deps/spdlog-src/include -x cu -c <(echo '#include <spdlog/spdlog.h> ')
real    0m1.520s

time nvcc -DSPDLOG_COMPILED_LIB -arch sm_70 -I/home/ahendriksen/projects/raft-spdlog-issue/cpp/build/_deps/spdlog-src/include -x cu -c <(echo '//empty file')
real    0m1.053s
@ahendriksen
Contributor Author

ahendriksen commented Feb 23, 2023

Cross-link to RMM issue: rapidsai/rmm#1222

@cjnolet
Member

cjnolet commented Mar 8, 2023

Thanks for doing this comparison @ahendriksen!

Have you, by chance, compared the end-to-end runtimes before and after the change to using the spdlog compiled lib? I'm attaching two ninja_log files, one before and one after.

I haven't done any further analysis on these files other than to notice that the end-to-end compile time only seemed to go down by about 1.5 minutes. That being said, there are a couple of stragglers that took quite some time to compile (ivf-flat, for example) which don't yet have specializations, so I think we can address those separately to reduce the compile times further.

ninja_log_spdlog.zip

Also attached are the patches for the changes to RAFT and RMM to get them to use spdlog's compiled binary.

spdlog_compiled_patches.zip

@ahendriksen
Contributor Author

Good point. I have analyzed your ninja logs and share results below.

Some caveats:

  1. Are you sure that the ninja logs you sent are for a recent branch? I can still see the uint32_t specializations for the distance library.
  2. The ninja log files contained previous builds. Some files in both the headers and compiled log were compiled twice. The analysis below is only for the last (second) build.
  3. In the headers build, an additional 8 files were compiled that took an additional ~120 seconds (which is negligible).
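
To illustrate caveat 2, here is a minimal sketch of how a ninja log can be reduced to one duration per output, keeping only the most recent entry. It assumes the usual `.ninja_log` layout (a `# ninja log v5` header followed by tab-separated start time in ms, end time in ms, mtime, output path, and command hash) and that later lines correspond to later builds. The function name `parse_ninja_log` and the sample log contents are made up for illustration and are not taken from the attached logs.

```python
def parse_ninja_log(text):
    """Return {output_path: seconds}, keeping only the last entry per output."""
    durations = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the version header and blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        start_ms, end_ms, _mtime, output = fields[:4]
        # A later line for the same output overwrites the earlier one,
        # so a file that was compiled twice is counted once (last build).
        durations[output] = (int(end_ms) - int(start_ms)) / 1000.0
    return durations


# Hypothetical log in which logger.cpp.o was built twice.
sample = (
    "# ninja log v5\n"
    "0\t17700\t0\ttest/core/logger.cpp.o\tabc\n"
    "0\t5000\t0\ttest/core/span.cpp.o\tdef\n"
    "100\t5300\t0\ttest/core/logger.cpp.o\tabc\n"
)
times = parse_ninja_log(sample)
print(times)
```

Keying the dictionary by output path is what discards the earlier build of a file that was compiled twice.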

As you point out, looking at total compile time is not always useful because of stragglers. Therefore, I have looked at the compile times per translation unit and the sum of the compile times per translation unit.

Summary of results:

  • The sum of compile times was reduced by 10%.
  • The median reduction in compile time per translation unit is 11 seconds
  • Because the absolute saving per translation unit is roughly constant, translation units that already took a long time to compile saw the smallest (relative) improvement
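
As a rough illustration of how these summary numbers can be derived, the sketch below compares two path-to-seconds mappings. It is an assumption about the approach, not the script used for this analysis. The helper name `compare` is hypothetical, and the inputs are a handful of rows copied from the results below, so the median printed here is not the 11 seconds reported for the full data set.

```python
from statistics import median


def compare(header, compiled):
    """Return (path, header_s, compiled_s, change_s, change_pct) rows for shared paths."""
    rows = []
    for path in header.keys() & compiled.keys():
        h, c = header[path], compiled[path]
        rows.append((path, h, c, c - h, 100.0 * (c - h) / h))
    rows.sort(key=lambda r: r[4])  # largest relative improvement first
    return rows


# A few rows from the results in this comment (seconds).
header = {
    "CMakeFiles/CORE_TEST.dir/test/core/logger.cpp.o": 17.7,
    "CMakeFiles/MATRIX_TEST.dir/test/matrix/select_k.cu.o": 469.2,
    "CMakeFiles/CORE_TEST.dir/test/core/span.cpp.o": 4.4,
}
compiled = {
    "CMakeFiles/CORE_TEST.dir/test/core/logger.cpp.o": 5.2,
    "CMakeFiles/MATRIX_TEST.dir/test/matrix/select_k.cu.o": 416.2,
}

rows = compare(header, compiled)
only_header = sorted(header.keys() - compiled.keys())
median_change = median(r[3] for r in rows)

# Relative change of the two reported sums of compile times.
total_change_pct = 100.0 * (36580.8 - 40334.0) / 40334.0

for path, h, c, d, pct in rows:
    print(f"{path:<60} {h:8.1f} {c:8.1f} {d:8.1f} {pct:7.1f}%")
print("only in header build:", only_header)
print(f"median change: {median_change:.1f} s, total change: {total_change_pct:.1f}%")
```

With the two reported sums, the relative change comes out at about -9.3%, which matches the roughly 10% stated above.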

All results: (python script to generate is included below)

Sum of compile times for compiled spdlog:    36580.8 seconds
Sum of compile times for header-only spdlog: 40334.0 seconds

Compile times for paths only found in headers (seconds):
CMakeFiles/CORE_TEST.dir/test/core/nvtx.cpp.o                                      15.7
CMakeFiles/CORE_TEST.dir/test/core/span.cpp.o                                       4.4
CMakeFiles/CORE_TEST.dir/test/core/math_device.cu.o                                29.2
CMakeFiles/CORE_TEST.dir/test/core/operators_host.cpp.o                             3.9
CMakeFiles/CORE_TEST.dir/test/core/interruptible.cu.o                              24.1
CMakeFiles/CORE_TEST.dir/test/core/memory_type.cpp.o                                1.9
CMakeFiles/CORE_TEST.dir/test/core/span.cu.o                                       28.2
CMakeFiles/CORE_TEST.dir/test/core/math_host.cpp.o                                  4.1

Comparison of compile times between headers and compiled: 
path                                                                         header (s)  compiled (s)  change (s) change (%)
CMakeFiles/CORE_TEST.dir/test/core/logger.cpp.o                                    17.7           5.2       -12.5     -70.6%
CMakeFiles/CORE_TEST.dir/test/test.cpp.o                                           18.7           5.8       -12.9     -69.1%
CMakeFiles/CORE_TEST.dir/test/core/handle.cpp.o                                    22.6          11.3       -11.3     -50.1%
CMakeFiles/UTILS_TEST.dir/test/util/cudart_utils.cpp.o                             20.8          10.6       -10.2     -49.1%
CMakeFiles/UTILS_TEST.dir/test/util/pow2_utils.cu.o                                23.0          12.8       -10.2     -44.4%
istance_lib.dir/src/distance/random/rmat_rectangular_generator_int64_double.cu.o   28.6          16.9       -11.7     -40.9%
distance_lib.dir/src/distance/random/rmat_rectangular_generator_int64_float.cu.o   27.4          16.3       -11.1     -40.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/svd.cu.o                                    45.6          27.8       -17.8     -39.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/axpy.cu.o                                   41.8          25.8       -16.0     -38.2%
CMakeFiles/UTILS_TEST.dir/test/core/seive.cu.o                                     21.4          13.3        -8.0     -37.7%
CMakeFiles/CORE_TEST.dir/test/core/mdspan_utils.cu.o                               32.7          20.7       -11.9     -36.5%
CMakeFiles/MATRIX_TEST.dir/test/matrix/argmin.cu.o                                 42.9          27.5       -15.5     -36.1%
CMakeFiles/LABEL_TEST.dir/test/label/merge_labels.cu.o                             40.6          26.3       -14.3     -35.2%
CMakeFiles/MATRIX_TEST.dir/test/matrix/reverse.cu.o                                42.0          27.3       -14.7     -35.0%
CMakeFiles/CORE_TEST.dir/test/core/mdarray.cu.o                                    41.8          27.2       -14.6     -34.9%
distance/specializations/detail/l2_sqrt_unexpanded_double_double_double_int.cu.o   34.7          22.8       -12.0     -34.4%
istance/distance/specializations/detail/russel_rao_double_double_double_int.cu.o   35.8          23.6       -12.3     -34.2%
CMakeFiles/STATS_TEST.dir/test/stats/cov.cu.o                                      46.6          30.8       -15.9     -34.0%
LABEL_TEST                                                                          0.3           0.2        -0.1     -33.8%
CMakeFiles/STATS_TEST.dir/test/stats/rand_index.cu.o                               36.0          23.9       -12.0     -33.5%
t_distance_lib.dir/src/distance/random/rmat_rectangular_generator_int_float.cu.o   27.1          18.1        -9.0     -33.4%
CMakeFiles/UTILS_TEST.dir/test/util/device_atomics.cu.o                            25.2          16.8        -8.4     -33.3%
CMakeFiles/LINALG_TEST.dir/test/linalg/divide.cu.o                                 41.7          27.8       -13.9     -33.3%
CMakeFiles/MATRIX_TEST.dir/test/sparse/spectral_matrix.cu.o                        38.7          25.9       -12.8     -33.1%
CMakeFiles/MATRIX_TEST.dir/test/matrix/argmax.cu.o                                 44.8          29.9       -14.8     -33.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/multiply.cu.o                               40.4          27.1       -13.3     -32.9%
CMakeFiles/LINALG_TEST.dir/test/linalg/strided_reduction.cu.o                      40.0          26.9       -13.2     -32.9%
CMakeFiles/MATRIX_TEST.dir/test/matrix/diagonal.cu.o                               43.4          29.3       -14.0     -32.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/dot.cu.o                                    41.1          27.8       -13.2     -32.3%
CMakeFiles/STATS_TEST.dir/test/stats/sum.cu.o                                      39.4          26.7       -12.6     -32.1%
rc/distance/distance/specializations/detail/kernels/gram_matrix_base_double.cu.o   35.8          24.3       -11.5     -32.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/cholesky_r1.cu.o                            38.8          26.4       -12.4     -32.0%
CMakeFiles/LINALG_TEST.dir/test/linalg/map_then_reduce.cu.o                        44.1          30.1       -14.1     -31.9%
CMakeFiles/STATS_TEST.dir/test/stats/mean_center.cu.o                              47.7          32.5       -15.2     -31.9%
ance/distance/specializations/detail/l2_unexpanded_double_double_double_int.cu.o   35.7          24.4       -11.3     -31.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/subtract.cu.o                               42.5          29.0       -13.4     -31.6%
CMakeFiles/MATRIX_TEST.dir/test/matrix/norm.cu.o                                   41.3          28.3       -13.0     -31.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/coalesced_reduction.cu.o                    43.9          30.3       -13.7     -31.1%
_distance_lib.dir/src/distance/random/rmat_rectangular_generator_int_double.cu.o   26.9          18.6        -8.3     -31.0%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_l1.cu.o                            47.7          33.0       -14.7     -30.8%
CMakeFiles/RANDOM_TEST.dir/test/random/permute.cu.o                                49.1          34.0       -15.1     -30.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/eig_sel.cu.o                                42.8          29.6       -13.2     -30.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/triangular.cu.o                             40.9          28.3       -12.6     -30.7%
CMakeFiles/SPARSE_TEST.dir/test/sparse/convert_coo.cu.o                            39.7          27.5       -12.2     -30.7%
CMakeFiles/STATS_TEST.dir/test/stats/entropy.cu.o                                  39.6          27.5       -12.1     -30.6%
CMakeFiles/SPARSE_TEST.dir/test/sparse/spgemmi.cu.o                                37.5          26.0       -11.4     -30.5%
CMakeFiles/SPARSE_TEST.dir/test/sparse/row_op.cu.o                                 40.9          28.5       -12.4     -30.4%
ce_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_uint8_t.cu.o   39.8          27.8       -12.1     -30.3%
CMakeFiles/STATS_TEST.dir/test/stats/stddev.cu.o                                   43.7          30.4       -13.2     -30.3%
CMakeFiles/STATS_TEST.dir/test/stats/mean.cu.o                                     40.3          28.1       -12.1     -30.1%
ir/src/distance/distance/specializations/detail/l1_double_double_double_int.cu.o   34.5          24.2       -10.3     -29.8%
eFiles/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_float.cu.o   37.9          26.7       -11.2     -29.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/gemm_layout.cu.o                            44.4          31.3       -13.2     -29.6%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_canberra.cu.o                      49.0          34.5       -14.4     -29.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/gemv.cu.o                                   40.4          28.5       -11.9     -29.5%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_jensen_shannon.cu.o                46.7          33.0       -13.7     -29.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/eig.cu.o                                    44.1          31.2       -12.9     -29.3%
iles/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_uint8_t.cu.o   36.7          26.0       -10.7     -29.3%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_hamming.cu.o                       49.6          35.1       -14.5     -29.2%
CMakeFiles/RANDOM_TEST.dir/test/random/make_blobs.cu.o                             48.9          34.6       -14.3     -29.2%
CMakeFiles/LINALG_TEST.dir/test/linalg/transpose.cu.o                              41.7          29.6       -12.1     -29.0%
istance/distance/specializations/detail/l2_unexpanded_float_float_float_int.cu.o   47.2          33.6       -13.6     -28.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_euc_unexp.cu.o                     49.4          35.2       -14.2     -28.7%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_hellinger.cu.o                     45.4          32.4       -13.0     -28.6%
iles/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_uint8_t.cu.o   36.6          26.2       -10.4     -28.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/ternary_op.cu.o                             43.9          31.5       -12.4     -28.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/add.cu.o                                    46.1          33.0       -13.0     -28.3%
dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_double.cu.o   36.3          26.1       -10.3     -28.2%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_transpose.cu.o                          39.1          28.1       -11.0     -28.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce_rows_by_key.cu.o                     41.9          30.2       -11.7     -28.0%
Files/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_int8_t.cu.o   36.9          26.6       -10.3     -28.0%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_euc_exp.cu.o                       47.6          34.4       -13.2     -27.7%
CMakeFiles/SPARSE_TEST.dir/test/sparse/sort.cu.o                                   42.9          31.1       -11.8     -27.4%
CMakeFiles/STATS_TEST.dir/test/stats/information_criterion.cu.o                    39.6          28.8       -10.8     -27.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/degree.cu.o                                 38.7          28.2       -10.5     -27.1%
CMakeFiles/MATRIX_TEST.dir/test/matrix/math.cu.o                                   46.9          34.2       -12.7     -27.1%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_eucsqrt_exp.cu.o                   47.1          34.4       -12.7     -27.0%
.dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_float.cu.o   45.8          33.4       -12.3     -26.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_russell_rao.cu.o                   48.2          35.3       -13.0     -26.9%
tance_lib.dir/src/distance/distance/specializations/fused_l2_nn_float_int64.cu.o   50.7          37.1       -13.6     -26.8%
CMakeFiles/RANDOM_TEST.dir/test/random/rng_int.cu.o                                50.0          36.7       -13.3     -26.6%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/kmeans_fit_float.cu.o        67.7          49.7       -18.0     -26.6%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_to_dense.cu.o                           37.5          27.5        -9.9     -26.5%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/kmeans_fit_double.cu.o       66.5          48.9       -17.6     -26.5%
ance_lib.dir/src/distance/distance/specializations/fused_l2_nn_double_int64.cu.o   38.0          27.9       -10.1     -26.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/add.cu.o                                    39.4          29.0       -10.4     -26.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/unary_op.cu.o                               51.6          37.9       -13.6     -26.4%
CMakeFiles/SPARSE_TEST.dir/test/sparse/convert_csr.cu.o                            47.3          34.9       -12.4     -26.3%
stance_lib.dir/src/distance/distance/specializations/fused_l2_nn_double_int.cu.o   36.6          27.0        -9.6     -26.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/reduce.cu.o                                 44.1          32.5       -11.6     -26.3%
CMakeFiles/raft_distance_lib.dir/src/distance/distance/pairwise_distance.cu.o      30.0          22.1        -7.9     -26.2%
MakeFiles/raft_distance_lib.dir/src/distance/cluster/update_centroids_float.cu.o   37.3          27.5        -9.8     -26.2%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_minkowski.cu.o                     47.2          34.8       -12.3     -26.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/epsilon_neighborhood.cu.o             44.0          32.5       -11.5     -26.1%
istance/distance/specializations/detail/kernels/polynomial_kernel_float_int.cu.o   47.4          35.0       -12.3     -26.0%
ce/distance/specializations/detail/l2_sqrt_unexpanded_float_float_float_int.cu.o   46.6          34.5       -12.1     -25.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_correlation.cu.o                   45.3          33.6       -11.7     -25.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_row_slice.cu.o                          36.4          27.0        -9.4     -25.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_kl_divergence.cu.o                 47.6          35.4       -12.3     -25.8%
CMakeFiles/SPARSE_DIST_TEST.dir/test/sparse/dist_coo_spmv.cu.o                     61.3          45.5       -15.8     -25.7%
CMakeFiles/STATS_TEST.dir/test/stats/kl_divergence.cu.o                            36.5          27.1        -9.4     -25.7%
eFiles/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_float.cu.o   35.0          26.0        -9.0     -25.7%
Files/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_int8_t.cu.o   36.0          26.7        -9.2     -25.7%
CMakeFiles/LINALG_TEST.dir/test/linalg/binary_op.cu.o                              45.4          33.8       -11.6     -25.6%
istance/distance/specializations/detail/russel_rao_float_float_float_uint32.cu.o   41.1          30.6       -10.5     -25.5%
c/distance/distance/specializations/detail/russel_rao_float_float_float_int.cu.o   42.7          31.8       -10.8     -25.4%
CORE_TEST                                                                           0.7           0.5        -0.2     -25.1%
nce_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_int8_t.cu.o   36.4          27.3        -9.1     -24.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/norm.cu.o                                   40.9          30.8       -10.2     -24.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/filter.cu.o                                 54.0          40.6       -13.3     -24.7%
CMakeFiles/STATS_TEST.dir/test/stats/weighted_mean.cu.o                            52.2          39.3       -12.9     -24.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/matrix.cu.o                                 44.9          33.8       -11.0     -24.6%
akeFiles/raft_distance_lib.dir/src/distance/cluster/update_centroids_double.cu.o   37.5          28.3        -9.2     -24.6%
CMakeFiles/CORE_TEST.dir/test/core/operators_device.cu.o                           35.7          26.9        -8.8     -24.6%
ance_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_float.cu.o   36.0          27.2        -8.8     -24.5%
CMakeFiles/STATS_TEST.dir/test/stats/histogram.cu.o                                39.2          29.6        -9.6     -24.5%
CMakeFiles/RANDOM_TEST.dir/test/random/rmat_rectangular_generator.cu.o             39.8          30.1        -9.7     -24.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/power.cu.o                                  42.1          31.9       -10.2     -24.2%
CMakeFiles/RANDOM_TEST.dir/test/random/multi_variable_gaussian.cu.o                50.8          38.6       -12.2     -24.0%
CMakeFiles/LINALG_TEST.dir/test/linalg/map.cu.o                                    51.6          39.3       -12.4     -24.0%
CMakeFiles/MATRIX_TEST.dir/test/matrix/gather.cu.o                                 51.2          38.9       -12.2     -23.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_chebyshev.cu.o                     45.6          34.7       -10.9     -23.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/rsvd.cu.o                                   53.5          40.8       -12.7     -23.7%
CMakeFiles/RANDOM_TEST.dir/test/random/sample_without_replacement.cu.o             65.2          50.0       -15.1     -23.2%
CMakeFiles/STATS_TEST.dir/test/stats/dispersion.cu.o                               43.3          33.3       -10.0     -23.1%
stance/distance/specializations/detail/l2_expanded_double_double_double_int.cu.o   46.1          35.5       -10.6     -23.0%
CMakeFiles/LABEL_TEST.dir/test/label/label.cu.o                                    42.2          32.5        -9.6     -22.9%
b.dir/src/distance/distance/specializations/detail/l1_float_float_float_int.cu.o   45.2          34.9       -10.3     -22.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_cos.cu.o                           47.9          37.1       -10.9     -22.7%
e/distance/specializations/detail/l2_sqrt_expanded_double_double_double_int.cu.o   46.9          36.3       -10.6     -22.6%
CMakeFiles/STATS_TEST.dir/test/stats/contingencyMatrix.cu.o                        63.0          48.8       -14.2     -22.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/mean_squared_error.cu.o                     37.5          29.1        -8.4     -22.4%
CMakeFiles/STATS_TEST.dir/test/stats/adjusted_rand_index.cu.o                      71.0          55.3       -15.7     -22.1%
CMakeFiles/STATS_TEST.dir/test/stats/minmax.cu.o                                   41.7          32.5        -9.2     -22.1%
stance/distance/specializations/detail/kernels/polynomial_kernel_double_int.cu.o   33.8          26.4        -7.3     -21.7%
CMakeFiles/RANDOM_TEST.dir/test/random/rng_discrete.cu.o                           50.7          39.6       -11.0     -21.7%
src/distance/distance/specializations/detail/kernels/gram_matrix_base_float.cu.o   44.2          34.6        -9.6     -21.7%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce_cols_by_key.cu.o                     48.3          37.9       -10.5     -21.6%
istance_lib.dir/src/distance/distance/specializations/fused_l2_nn_float_int.cu.o   46.1          36.3        -9.8     -21.3%
CMakeFiles/LINALG_TEST.dir/test/linalg/sqrt.cu.o                                   40.3          31.8        -8.5     -21.2%
distance/specializations/detail/l2_sqrt_unexpanded_float_float_float_uint32.cu.o   43.4          34.3        -9.2     -21.1%
CMakeFiles/RANDOM_TEST.dir/test/random/rng.cu.o                                    62.2          49.1       -13.1     -21.0%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_deserialize.cu.o     53.6          42.3       -11.3     -21.0%
CMakeFiles/RANDOM_TEST.dir/test/random/make_regression.cu.o                        54.4          43.0       -11.4     -20.9%
CMakeFiles/STATS_TEST.dir/test/stats/meanvar.cu.o                                  40.4          32.0        -8.4     -20.8%
ir/src/distance/distance/specializations/detail/l1_float_float_float_uint32.cu.o   45.0          35.9        -9.1     -20.3%
ance/distance/specializations/detail/l2_unexpanded_float_float_float_uint32.cu.o   43.0          34.4        -8.6     -19.9%
CMakeFiles/SOLVERS_TEST.dir/test/sparse/mst.cu.o                                   65.3          52.4       -12.9     -19.8%
CMakeFiles/STATS_TEST.dir/test/stats/trustworthiness.cu.o                          78.1          62.7       -15.4     -19.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/slice.cu.o                                  38.6          31.1        -7.5     -19.5%
CMakeFiles/MATRIX_TEST.dir/test/matrix/columnSort.cu.o                             57.7          46.4       -11.3     -19.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/faiss_mr.cu.o                         56.7          45.7       -11.1     -19.5%
CMakeFiles/STATS_TEST.dir/test/stats/r2_score.cu.o                                 70.4          56.9       -13.5     -19.2%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_build.cu.o          156.9         126.9       -30.0     -19.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/matrix_vector.cu.o                          69.5          56.3       -13.2     -19.0%
CMakeFiles/SOLVERS_TEST.dir/test/cluster/cluster_solvers_deprecated.cu.o           69.8          56.9       -13.0     -18.6%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/cluster_cost_double.cu.o     39.9          32.7        -7.3     -18.2%
CMakeFiles/raft_distance_lib.dir/src/distance/distance/fused_l2_min_arg.cu.o       37.1          30.4        -6.7     -18.2%
CMakeFiles/STATS_TEST.dir/test/stats/completeness_score.cu.o                       65.8          53.9       -11.9     -18.1%
CMakeFiles/STATS_TEST.dir/test/stats/homogeneity_score.cu.o                        63.2          51.8       -11.4     -18.0%
CMakeFiles/SOLVERS_TEST.dir/test/lap/lap.cu.o                                      51.8          42.6        -9.3     -17.9%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/kmeans.cu.o                              136.6         112.3       -24.3     -17.8%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/haversine.cu.o                        47.7          39.2        -8.5     -17.7%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/cluster_cost_float.cu.o      40.0          33.0        -7.0     -17.4%
CMakeFiles/SPARSE_TEST.dir/test/sparse/symmetrize.cu.o                             52.5          43.4        -9.1     -17.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/normalize.cu.o                              84.0          69.5       -14.5     -17.3%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/cluster_solvers.cu.o                      86.0          71.4       -14.6     -17.0%
nce_lib.dir/src/distance/distance/specializations/detail/hellinger_expanded.cu.o   62.8          52.2       -10.6     -16.9%
aft_distance_lib.dir/src/distance/distance/specializations/detail/chebyshev.cu.o   71.4          59.4       -12.0     -16.8%
CMakeFiles/STATS_TEST.dir/test/stats/v_measure.cu.o                                60.2          50.5        -9.7     -16.2%
CMakeFiles/STATS_TEST.dir/test/stats/accuracy.cu.o                                 66.7          56.3       -10.4     -15.6%
CMakeFiles/DISTANCE_TEST.dir/test/distance/gram.cu.o                               56.1          47.5        -8.7     -15.4%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_serialize.cu.o       45.1          38.4        -6.7     -14.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/fused_l2_nn.cu.o                        75.3          64.2       -11.1     -14.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce.cu.o                                 68.3          58.2       -10.1     -14.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/matrix_vector_op.cu.o                       70.5          60.2       -10.3     -14.6%
libraft_distance.so                                                                 1.4           1.2        -0.2     -14.3%
CMakeFiles/CORE_TEST.dir/test/core/numpy_serializer.cu.o                           75.9          65.2       -10.7     -14.1%
nce_lib.dir/src/distance/distance/specializations/detail/hamming_unexpanded.cu.o   67.3          58.2        -9.2     -13.6%
CMakeFiles/STATS_TEST.dir/test/stats/mutual_info_score.cu.o                        61.0          52.7        -8.3     -13.6%
CMakeFiles/MATRIX_TEST.dir/test/matrix/linewise_op.cu.o                            80.1          69.3       -10.7     -13.4%
CMakeFiles/STATS_TEST.dir/test/stats/silhouette_score.cu.o                         71.2          62.1        -9.1     -12.8%
neighbors/specializations/detail/ivfpq_compute_similarity_float_no_smem_lut.cu.o  265.5         233.6       -32.0     -12.0%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_one_2d.cu.o  110.0          97.0       -13.0     -11.8%
neighbors/specializations/detail/ivfpq_compute_similarity_float_no_basediff.cu.o  245.2         216.5       -28.7     -11.7%
nce/distance/specializations/detail/jensen_shannon_double_double_double_int.cu.o  217.1         192.0       -25.1     -11.6%
STATS_TEST                                                                          0.5           0.5        -0.1     -11.5%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_no_basediff.cu.o  274.3         243.3       -31.0     -11.3%
CMakeFiles/MATRIX_TEST.dir/test/matrix/select_k.cu.o                              469.2         416.2       -53.0     -11.3%
CMakeFiles/SPARSE_DIST_TEST.dir/test/sparse/distance.cu.o                          81.7          72.5        -9.2     -11.3%
CMakeFiles/STATS_TEST.dir/test/stats/regression_metrics.cu.o                       67.6          60.3        -7.3     -10.8%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_one_3d.cu.o  116.7         104.2       -12.5     -10.7%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_no_basediff.cu.o  271.2         242.6       -28.7     -10.6%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/kmeans_balanced.cu.o                     206.4         184.5       -21.8     -10.6%
stance/distance/specializations/detail/l2_expanded_float_float_float_uint32.cu.o  186.3         166.7       -19.7     -10.6%
e/distance/specializations/detail/l2_sqrt_expanded_float_float_float_uint32.cu.o  184.9         165.4       -19.5     -10.5%
CMakeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/brute_force.cu.o       260.7         233.4       -27.3     -10.5%
t_distance_lib.dir/src/distance/distance/specializations/detail/correlation.cu.o   79.6          71.3        -8.3     -10.4%
NEIGHBORS_TEST                                                                      1.0           0.9        -0.1     -10.1%
/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_uint8_t_uint64_t.cu.o 1043.6         937.8       -105.8     -10.1%
istance/distance/specializations/detail/kl_divergence_float_float_float_int.cu.o  197.7         178.0       -19.7     -10.0%
/distance/distance/specializations/detail/l2_expanded_float_float_float_int.cu.o  190.1         172.0       -18.2     -9.6%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/knn.cu.o                             367.5         332.7       -34.8     -9.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/norm.cu.o                                   74.8          67.7        -7.1     -9.5%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_half_fast.cu.o  259.8         235.8       -24.0     -9.3%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_build_index.cu.o     137.8         125.1       -12.7     -9.2%
ance/distance/specializations/detail/l2_sqrt_expanded_float_float_float_int.cu.o  180.8         164.3       -16.5     -9.1%
raft_distance_lib.dir/src/distance/distance/specializations/detail/canberra.cu.o  250.0         227.7       -22.3     -8.9%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ball_cover.cu.o                      131.1         119.5       -11.6     -8.9%
ance/distance/specializations/detail/kl_divergence_double_double_double_int.cu.o  258.0         235.1       -22.8     -8.9%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_fast.cu.o  275.1         251.5       -23.6     -8.6%
stance/neighbors/specializations/detail/ivfpq_compute_similarity_float_fast.cu.o  253.1         232.1       -21.0     -8.3%
r/src/distance/neighbors/specializations/detail/ivfpq_search_float_uint32_t.cu.o  836.5         767.3       -69.2     -8.3%
r/src/distance/neighbors/specializations/detail/ivfpq_search_float_uint64_t.cu.o  964.1         888.0       -76.2     -7.9%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_no_smem_lut.cu.o  278.9         258.2       -20.7     -7.4%
Files/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_long_float_int.cu.o  360.4         334.9       -25.6     -7.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/refine.cu.o                          163.3         152.2       -11.2     -6.8%
keFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_int_float_false.cu.o  282.6         264.0       -18.6     -6.6%
ance/distance/specializations/detail/kl_divergence_float_float_float_uint32.cu.o  197.3         184.6       -12.8     -6.5%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_no_smem_lut.cu.o  299.8         280.6       -19.1     -6.4%
CLUSTER_TEST                                                                        0.4           0.3        -0.0     -6.3%
/neighbors/specializations/detail/ivfpq_compute_similarity_half_no_smem_lut.cu.o  255.8         241.3       -14.4     -5.6%
s/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_uint32_t_float_int.cu.o  334.2         315.7       -18.6     -5.6%
CMakeFiles/UTILS_TEST.dir/test/util/bitonic_sort.cu.o                             234.7         221.7       -13.0     -5.5%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_knn_query.cu.o       880.9         832.3       -48.6     -5.5%
nce/distance/specializations/detail/jensen_shannon_float_float_float_uint32.cu.o  200.0         189.4       -10.6     -5.3%
akeFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_int_float_true.cu.o  287.1         272.0       -15.2     -5.3%
akeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_uint8_t_uint64_t.cu.o  747.2         708.5       -38.7     -5.2%
DISTANCE_TEST                                                                       0.4           0.3        -0.0     -5.0%
CMakeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/knn_graph.cu.o         134.6         128.1        -6.6     -4.9%
s/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_int8_t_uint64_t.cu.o 1034.0         983.8       -50.2     -4.9%
s/raft_distance_lib.dir/src/distance/distance/specializations/detail/cosine.cu.o  336.6         320.3       -16.3     -4.9%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_all_knn_query.cu.o   941.0         897.8       -43.2     -4.6%
ance_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_float.cu.o  517.6         494.4       -23.2     -4.5%
eFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_long_float_false.cu.o  276.6         264.2       -12.3     -4.5%
libraft_nn.so                                                                       0.3           0.3        -0.0     -4.3%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_two_3d.cu.o  168.3         161.3        -7.0     -4.1%
keFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_long_float_true.cu.o  287.2         275.4       -11.8     -4.1%
iles/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_long_float_uint.cu.o  354.7         340.2       -14.5     -4.1%
ance/distance/specializations/detail/lp_unexpanded_double_double_double_int.cu.o  622.1         597.9       -24.2     -3.9%
CMakeFiles/install.util                                                             0.3           0.2        -0.0     -3.8%
MakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_int8_t_uint64_t.cu.o  730.8         703.0       -27.8     -3.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_adj.cu.o                          224.7         216.3        -8.4     -3.7%
/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_uint32_t_float_uint.cu.o  333.7         321.3       -12.4     -3.7%
ance/distance/specializations/detail/lp_unexpanded_float_float_float_uint32.cu.o  487.5         470.5       -17.0     -3.5%
ir/src/distance/neighbors/specializations/detail/ivfpq_search_float_int64_t.cu.o  951.0         918.2       -32.8     -3.4%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/linkage.cu.o                            1166.6        1126.6       -40.0     -3.4%
stance/distance/specializations/detail/jensen_shannon_float_float_float_int.cu.o  387.6         375.7       -11.9     -3.1%
istance/distance/specializations/detail/lp_unexpanded_float_float_float_int.cu.o  482.4         469.2       -13.1     -2.7%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_fast.cu.o  279.4         272.1        -7.2     -2.6%
/neighbors/specializations/detail/ivfpq_compute_similarity_half_no_basediff.cu.o  243.4         237.2        -6.1     -2.5%
CMakeFiles/SOLVERS_TEST.dir/test/linalg/eigen_solvers.cu.o                        815.6         795.9       -19.7     -2.4%
ce_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_uint8_t.cu.o  811.9         792.4       -19.5     -2.4%
nce_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_int8_t.cu.o  797.5         781.1       -16.4     -2.1%
LINALG_TEST                                                                         1.4           1.4        -0.0     -1.6%
es/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_float_uint64_t.cu.o  991.4         975.7       -15.7     -1.6%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_uint32_t.cu.o  617.6         607.8        -9.8     -1.6%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_two_2d.cu.o  165.6         163.1        -2.4     -1.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_int64_t.cu.o   732.0         725.8        -6.2     -0.8%
akeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/connect_components.cu.o  362.0         359.1        -2.9     -0.8%
SPARSE_DIST_TEST                                                                    0.1           0.1        -0.0     -0.7%
UTILS_TEST                                                                          0.1           0.1        +0.0     +0.0%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/fused_l2_knn.cu.o                    911.6         915.8        +4.2     +0.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/selection.cu.o                       389.8         392.1        +2.3     +0.6%
SOLVERS_TEST                                                                        0.1           0.1        +0.0     +0.7%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_flat.cu.o                   1515.9        1533.0       +17.1     +1.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_uint64_t.cu.o  712.4         727.7       +15.3     +2.2%
CMakeFiles/UTILS_TEST.dir/test/util/integer_utils.cpp.o                             2.3           2.3        +0.1     +2.5%
SPARSE_NEIGHBORS_TEST                                                               0.2           0.2        +0.0     +7.1%
RANDOM_TEST                                                                         0.5           0.5        +0.0     +8.7%
MATRIX_TEST                                                                         0.4           0.5        +0.1     +13.3%
SPARSE_TEST                                                                         0.4           0.6        +0.2     +43.0%
from pathlib import Path
from collections import Counter

def parse_ninja_log(log_path):
    text = Path(log_path).read_text()
    start, end, mtime, path, cmd = list(zip(*[line.split("\t") for line in text.splitlines()[1:]]))
    start = list(map(int, start))
    end = list(map(int, end))
    seconds = [(e - s) / 1000. for e, s in zip(end, start)]
    mtime = list(map(int, mtime))

    return dict(
        start=start,
        end=end,
        seconds=seconds,
        mtime=mtime,
        path=path,
        cmd=cmd
    )

def discard_earlier_builds(d):
    prev_end = 0
    start_index = 0
    # Within a single build, end must be monotonically increasing. If we find an
    # end value that is lower than the end value on the previous row, we know
    # that a new build has started.
    for i, end in enumerate(d['end']):
        if end < prev_end:
            start_index = i
        prev_end = end

    return {k: v[start_index:] for k, v in d.items()}

def print_duplicates(d):
    # d is a dict returned by parse_ninja_log
    print(f"  # {'path':<60}     sec  cmd hash             sec  other cmd hash")
    dup_paths = sorted(set(p for p, count in Counter(d['path']).items() if count > 1))
    for i, p in enumerate(dup_paths):
        print(f"{i:3d} {p[-60:]:<60}: ", end="")
        for p_other, sec, cmd in zip(d['path'], d['seconds'], d['cmd']):
            if p == p_other:
                print(f"{sec:6.1f}  ({cmd})", end="")
        print()

compiled = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_compiled")
headers = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_headers")

compiled = discard_earlier_builds(compiled)
headers = discard_earlier_builds(headers)

# Print sum of compile times of each translation unit:
print(f"Sum of compile times for compiled spdlog:    {sum(compiled['seconds']):.1f} seconds")
print(f"Sum of compile times for header-only spdlog: {sum(headers['seconds']):.1f} seconds\n")

compiled_times = dict(zip(compiled['path'], compiled['seconds']))
headers_times = dict(zip(headers['path'], headers['seconds']))

print("Compile times for paths only found in headers (seconds):")
for p in set(headers['path']) - set(compiled['path']):
    print(f"{p[-80:]:<80} {headers_times[p]:6.1f}")


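# Compare compile time per path between compiled and headers.
# Only paths present in both logs are compared; this avoids a KeyError
# for targets that were built in one configuration only.
results = [(path, headers_times[path], compiled_times[path])
           for path in compiled_times if path in headers_times]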
# Add relative change as a percentage
results = [(p, hsec, csec, csec - hsec, 100. * (csec / hsec - 1)) for p, hsec, csec in results]
# Sort by relative change
results = sorted(results, key=lambda x: x[4])

# Print results
print("\nComparison of compile times between headers and compiled: ")
print(f"{'path':<70}       header (s)  compiled (s)  change (s) change (%)")
for p, hsec, csec, diff, rel in results:
    print(f"{p[-80:]:<80} {hsec:6.1f}        {csec:6.1f}       {diff:+5.1f}     {rel:+4.1f}%")
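
As a small sanity check of the two core steps in the script above, the sketch below runs the build-boundary detection on made-up end timestamps and recomputes the relative change for one row of the table. The helper names `last_build_start` and `relative_change` and the timestamp values are illustrative assumptions, not part of the original script. The 184.9 s and 165.4 s figures are taken from the `l2_sqrt_expanded_float_float_float_uint32` row.

```python
def last_build_start(end_times):
    """Return the index of the first log entry of the most recent build.

    Assumption (same as in the script above): within one build the end
    timestamps never decrease, so a drop marks the start of a new build.
    """
    start_index = 0
    prev_end = 0
    for i, end in enumerate(end_times):
        if end < prev_end:
            start_index = i
        prev_end = end
    return start_index


def relative_change(header_sec, compiled_sec):
    """Relative change in percent when going from header-only to compiled."""
    return 100.0 * (compiled_sec / header_sec - 1)


# Synthetic end timestamps in ms: two builds appended to the same log.
ends = [1200, 3400, 9000, 800, 2500, 7700]
start = last_build_start(ends)
print(ends[start:])  # entries belonging to the most recent build

# One row from the table: header-only 184.9 s, compiled 165.4 s.
rel = relative_change(184.9, 165.4)
print(f"{rel:+4.1f}%")
```

Negative percentages mean the translation unit compiled faster with the precompiled spdlog library than with header-only spdlog.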

@cjnolet
Member

cjnolet commented Mar 12, 2023

I'm proposing that RMM allow the user to set whether the compiled or header-only spdlog target is used. I would honestly prefer if we just defaulted to compiled everywhere except for users who "really" want fully header-only operation.

@ahendriksen
Contributor Author

Thanks for looking into this Corey! I agree it is a good idea to consider using the precompiled spdlog library. If we go the precompiled route, would this require adding a runtime dependency on spdlog in the conda package as well? We currently do not seem to have a Conda dependency on spdlog.
