cosine produces values larger than 1 #5

@rcannood

Due to floating-point rounding errors, the cosine similarity returned by the current implementation can be slightly larger than 1. This is undesirable: the inverse cosine (cos^-1) is only defined on [-1, 1], so computing it on such values produces NaNs:

> acos(sim)
2 x 2 Matrix of class "dsyMatrix"
         [,1]     [,2]
[1,]      NaN 0.390607
[2,] 0.390607      NaN

Capping the values at 1 makes everything work as intended:

> sim@x[sim@x > 1] <- 1
> acos(sim)
2 x 2 Matrix of class "dsyMatrix"
         [,1]     [,2]
[1,] 0.000000 0.390607
[2,] 0.390607 0.000000
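
The same capping idea can be sketched in base R without touching the sparse matrix slots. This is only an illustration: `clamp_unit` is a hypothetical helper name, not a function of the package, and the similarity here is computed on a dense matrix with plain base R rather than with `simil()`. The overshoot is simulated with `.Machine$double.eps`, since whether the dense computation itself overshoots depends on the platform.

```r
# Hypothetical helper (not part of any package): clamp values into [-1, 1]
# so that acos() never receives an out-of-range input.
clamp_unit <- function(s) pmax(pmin(s, 1), -1)

# Column-wise cosine similarity of a small dense matrix, in base R.
x <- matrix(c(1, 2, 5, 3), ncol = 2)
norms <- sqrt(colSums(x^2))
sim <- crossprod(x) / outer(norms, norms)

# Simulate a value that rounding error pushed just above 1.
overshoot <- 1 + .Machine$double.eps
bad <- suppressWarnings(acos(overshoot))  # NaN
good <- acos(clamp_unit(overshoot))       # 0

# Clamping before acos() keeps every angle finite.
angles <- acos(clamp_unit(sim))
```

Clamping on both sides also guards against values slightly below -1, which the one-sided assignment above does not cover.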

The problem can be reproduced with the following test:

test_that("cosine similarity can't be larger than 1", {
  x <- Matrix::Matrix(c(1, 2, 5, 3), ncol = 2, sparse = TRUE)
  sim <- simil(x, y = NULL, method = "cosine")
  expect_true(all(sim <= 1))
})
