πŸ† The winner code for ACM SIGMOD 2023 Programming Contest, can build highly accurate KNN graphs efficiently

for0nething/SIGMOD-Programming-Contest-2023

SIGMOD Programming Contest 2023

This is the code for the champion solution of the ACM SIGMOD Programming Contest 2023. The core idea of the approach can be found in the poster.

Using this code, a K-nearest-neighbor (KNN) graph with near-perfect recall can be built for tens of millions of high-dimensional vectors in a very short time.

Description

The code is based on KGraph, an open-source implementation of NN-descent, and greatly improves both the effectiveness and efficiency of the original algorithm. The enhanced code was then used to tackle the contest problem.
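For intuition, the core NN-descent idea is that a neighbor of a neighbor is likely to also be a neighbor: each point's list is refined by examining candidates reachable through its current (and reverse) neighbors. The sketch below is an illustrative toy version under Euclidean distance, not the optimized implementation in nn-descent/; the function names and the brute-force candidate handling are mine.

```cpp
// Toy sketch of one NN-descent refinement pass (illustration only; the real,
// optimized code lives in the nn-descent/ folder of this repository).
#include <algorithm>
#include <cassert>
#include <vector>

using Point = std::vector<float>;

// Squared Euclidean distance (square root is unnecessary for ranking).
static float dist2(const Point& a, const Point& b) {
    float s = 0.f;
    for (size_t i = 0; i < a.size(); ++i) { float d = a[i] - b[i]; s += d * d; }
    return s;
}

// One pass: for every point, gather its neighbors and reverse neighbors,
// expand to their neighbors/reverse neighbors, and keep the K closest.
std::vector<std::vector<int>> refine(const std::vector<Point>& pts,
                                     const std::vector<std::vector<int>>& knn,
                                     int K) {
    const int n = (int)pts.size();
    std::vector<std::vector<int>> rev(n), out(n);
    for (int u = 0; u < n; ++u)
        for (int v : knn[u]) rev[v].push_back(u);  // reverse edges
    for (int u = 0; u < n; ++u) {
        std::vector<int> base = knn[u];
        base.insert(base.end(), rev[u].begin(), rev[u].end());
        std::vector<int> cand = base;
        for (int c : base) {  // neighbors-of-neighbors, both directions
            cand.insert(cand.end(), knn[c].begin(), knn[c].end());
            cand.insert(cand.end(), rev[c].begin(), rev[c].end());
        }
        std::sort(cand.begin(), cand.end());
        cand.erase(std::unique(cand.begin(), cand.end()), cand.end());
        cand.erase(std::remove(cand.begin(), cand.end(), u), cand.end());
        std::sort(cand.begin(), cand.end(), [&](int a, int b) {
            return dist2(pts[u], pts[a]) < dist2(pts[u], pts[b]);
        });
        if ((int)cand.size() > K) cand.resize(K);
        out[u] = cand;
    }
    return out;
}
```

The real implementation avoids this brute-force candidate scan (e.g. by sampling and by skipping already-compared pairs), which is where the efficiency gains come from.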

Folder Structure

.
β”œβ”€β”€ nn-descent                          # An optimized implementation of nn-descent
β”œβ”€β”€ knn-construction-kgraph.cc          # Code for solving the contest problem
β”œβ”€β”€ io.h                                # Load dataset and save the result
β”œβ”€β”€ run.sh                              # Shell to automatically compile and run the code
β”œβ”€β”€ makefile                            # The makefile to compile the contest code
└── README.md                           

Quick Start

  • Step 1: Install the necessary dependencies: Intel MKL, Eigen, OpenMP, and Boost.

    • Intel MKL: Version 2023.0 was used in the contest. A few related environment variables need to be set properly; the ones I used are listed here for reference:
      • export LD_LIBRARY_PATH=/home/jiayi/disk/mklBLAS/mkl/latest/lib/intel64:$LD_LIBRARY_PATH
      • export C_INCLUDE_PATH=/home/jiayi/disk/mklBLAS/mkl/latest/include:$C_INCLUDE_PATH
      • export MKLROOT=/home/jiayi/disk/mklBLAS/mkl/latest/
      • export MKL_LIBRARY_DIRS=/home/jiayi/disk/mklBLAS/mkl/latest/lib/intel64
    • Eigen: Eigen is a C++ template library for linear algebra, released under the MPL2 license. Since it is a header-only library, you can install it easily by following the steps on its website.
    • OpenMP: We used OpenMP 4.5 in the contest. On Ubuntu it typically ships with the default compiler.
    • Boost: Boost provides free, peer-reviewed, portable C++ source libraries. We used Boost 1.65.1 in the contest.
    • Note: The parameters in makefile and /nn-descent/CMakeLists.txt that refer to the MKL and Eigen paths must be adjusted to match the installation on your machine.
  • Step 2: Install the optimized nn-descent implementation.

    • cd nn-descent
    • ./autoInstall.sh
  • Step 3: Compile and run the code for the contest problem. First switch to the root directory of this repository.

    • ./run.sh
    • Note that the compiled program accepts the same arguments as the example code provided by the contest, so you can switch datasets in the same way.
    • The parameters that achieved the best result in the contest are already set in knn-construction-kgraph.cc, so you do not need to change anything.
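If you want to inspect a dataset outside the provided pipeline, a loader can be sketched as below. The binary layout assumed here — a uint32 vector count followed by n fixed-dimension float32 vectors — is my reading of the contest's format, and load_bin and DIM are hypothetical names; io.h in this repository is the authoritative loader, so check it before relying on this sketch.

```cpp
// Hedged sketch of reading a contest-style binary dataset: assumed layout is
// a uint32 count n, then n vectors of DIM float32 values each. Consult io.h
// in this repository for the authoritative format.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr size_t DIM = 100;  // assumed dimensionality (per the contest data)

std::vector<std::vector<float>> load_bin(const char* path) {
    FILE* f = std::fopen(path, "rb");
    assert(f != nullptr);
    uint32_t n = 0;
    assert(std::fread(&n, sizeof(n), 1, f) == 1);  // leading vector count
    std::vector<std::vector<float>> data(n, std::vector<float>(DIM));
    for (auto& v : data)  // each vector is DIM contiguous float32 values
        assert(std::fread(v.data(), sizeof(float), DIM, f) == DIM);
    std::fclose(f);
    return data;
}
```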

Results

Building the contest's 100-NN graph for 10M 100-dimensional vectors, tested on an Azure Standard F32s_v2 instance:

Recall: 0.987

Building Runtime: 1854s
