Skip to content

Enhance cosine similarity compution #267

@omarmahamid

Description

@omarmahamid

name: Feature request
about: Suggest an idea for this project

cosine similarity is in this way:

Image

instead of computing 3 dot times O(3N) we can enhance it the factor of linear time by 3 to be O(N)

I did a JMH benchmark:

package com.microsoft.semantickernel.data;

import com.microsoft.semantickernel.data.vectorsearch.VectorOperations;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.CommandLineOptionException;
import org.openjdk.jmh.runner.options.CommandLineOptions;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;


@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 2)
public class VectorOperationsBenchmark {

    List<Float> vector1;
    List<Float> vector2;

    @Setup(Level.Trial)
    public void setUp(){
        Random rand = new Random();
        vector1 = new ArrayList<>();
        vector2 = new ArrayList<>();

        for (int i = 0; i < 100; i++) {
            vector1.add(rand.nextFloat());
            vector2.add(rand.nextFloat());
        }
    }


    @Benchmark
    public void benchmarkVectorOperations() {
        VectorOperations.cosineSimilarity(vector1, vector2);
    }


    @Benchmark
    public void benchmarkVectorOperationsOptimized() {
        VectorOperations.optimizeCosineSimilarity(vector1, vector2);
    }

    public static void main(String[] args) throws RunnerException, CommandLineOptionException {
        Options opt = new OptionsBuilder()
            .parent(new CommandLineOptions(args))
            .timeUnit(TimeUnit.NANOSECONDS)
            .include(VectorOperationsBenchmark.class.getSimpleName())
            .build();

        new Runner(opt).run();
    }

}

and the result is:

Benchmark                                                     Mode  Cnt    Score   Error  Units
VectorOperationsBenchmark.benchmarkVectorOperations           avgt    2  202.292          ns/op
VectorOperationsBenchmark.benchmarkVectorOperationsOptimized  avgt    2   96.457          ns/op

it's a major performance for huge vectors.


Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions