Bug: average_hash returns inconsistent hash lengths #21

Closed
dwachsmuth opened this issue Sep 7, 2021 · 6 comments

Comments

@dwachsmuth

I have been attempting to use OpenImageR's hash functions in a package designed to facilitate large-scale image comparisons, and everything has been smooth with one major exception.

When I call average_hash with default settings, 95% of the time I get the expected length-64 hash result (in binary mode), while 5% of the time the hash is length-56 (i.e. one row or column of the underlying 8x8 matrix seems to have been dropped), and very occasionally the hash is length-49 (which I imagine means a row AND a column have been dropped).

By comparing these defective hashes with hashes of other images that are perceptually identical but bitwise different (and which get the full 64 bits), it is clear that the last 8 elements of the hash are disappearing. (That is, the first 56 bits of the two hashes are the same, while one hash has 8 additional bits and the other simply ends.)
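The observed lengths match the grid arithmetic. An average hash stores one bit per cell of the downscaled grey grid, so an 8 x 8 grid gives 64 bits, a grid missing one row or column gives 56, and a grid missing both gives 49. The sketch below is a simplified average hash in base R. It is a generic illustration, not OpenImageR's implementation, and `average_hash_bits()` is a hypothetical name. It is applied to grids of each of those sizes.

```r
# Simplified average hash for illustration only (hypothetical helper, not
# OpenImageR's implementation): one bit per grid cell, TRUE when the cell is
# brighter than the grid mean.
average_hash_bits <- function(grid) {
  as.vector(grid > mean(grid))  # flattened column by column
}

set.seed(1)
full    <- matrix(runif(64), nrow = 8, ncol = 8)  # intended 8 x 8 grid
one_off <- full[, 1:7]                            # one column lost: 8 x 7
both    <- full[1:7, 1:7]                         # a row AND a column lost: 7 x 7

lengths(list(average_hash_bits(full),
             average_hash_bits(one_off),
             average_hash_bits(both)))
# [1] 64 56 49
```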

This does not happen with phash, which, in several hundred thousand tests, has always returned a length-64 hash.

I have no idea what is causing this, but here is a reproducible example using a pair of images I have uploaded.

```r
library(OpenImageR)

tmp_64 <- tempfile(fileext = ".jpg")
tmp_49 <- tempfile(fileext = ".jpg")

# These are a pair of files which illustrate the issue
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_64.jpg", destfile = tmp_64)
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_49.jpg", destfile = tmp_49)

img_64 <- OpenImageR::readImage(tmp_64)
img_49 <- OpenImageR::readImage(tmp_49)

grey_64 <- rgb_2gray(img_64)
grey_49 <- rgb_2gray(img_49)

# With default arguments, both of these calls should return a length-64 binary hash
hash_64 <- average_hash(grey_64, MODE = "binary")
hash_49 <- average_hash(grey_49, MODE = "binary")

# But this isn't true; the second image is only length-49
length(hash_64) == length(hash_49)
```
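Hashes of different lengths cannot be compared bit for bit, so a large-scale comparison pipeline may want to catch truncated hashes explicitly. Without a check, R's `!=` silently recycles the shorter vector, raising at most a warning (and none when one length divides the other). The sketch below uses a hypothetical `hamming_distance()` helper that is not part of OpenImageR. It flattens its inputs with `as.vector()` on the assumption that a binary hash may come back as a matrix; later in this thread, `dim()` is called on one.

```r
# Hypothetical helper (not part of OpenImageR): Hamming distance between two
# binary hashes that refuses to compare hashes of different lengths, so a
# truncated hash is reported instead of producing a misleading distance.
hamming_distance <- function(h1, h2) {
  h1 <- as.vector(h1)
  h2 <- as.vector(h2)
  if (length(h1) != length(h2)) {
    stop(sprintf("hash lengths differ (%d vs %d)", length(h1), length(h2)))
  }
  sum(h1 != h2)
}

a <- c(1, 0, 1, 1)
b <- c(1, 1, 1, 0)
hamming_distance(a, b)             # 2
try(hamming_distance(a, b[1:3]))   # error: hash lengths differ (4 vs 3)
```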
@mlampros
Owner

mlampros commented Sep 7, 2021

Hi @dwachsmuth, and thanks for reporting this issue.

When you specify a hash_size of 8, the gray image is internally resized to (8 x 8). This can be a problem with the 'nearest' resize method (the way it works is described in this SO thread).
The function internally calls 'floor()', so it rounds down, and this seems to be a problem when the target 'width' and 'height' are very small relative to the input image size and the input has an odd number of rows or columns. I could use 'ceil()' instead, but I'd like the code to stay close to how the algorithm works.
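The thread does not show the resize code itself, so the following is only a hypothetical illustration of how rounding down can cost a row or column. Suppose an output dimension were computed as `floor(n_in * scale)` with `scale = target / n_in`. Floating-point error can then put the product just below the integer. For a 49-pixel side, `49 * (8 / 49)` is slightly less than 8 in double precision, so `floor()` returns 7. Fixing the output size first and mapping each output index back to a source index avoids this by construction. `nearest_indices()` and `nearest_resize()` are hypothetical names.

```r
# Hypothetical illustration, not OpenImageR's actual code.
target <- 8
n_in   <- 49                      # odd input side length
scale  <- target / n_in
floor(n_in * scale)               # 7: the product is just below 8
round(n_in * scale)               # 8

# Nearest-neighbour selection that fixes the output size first, then maps
# each output index back to a 1-based source index.
nearest_indices <- function(n_in, n_out) {
  floor((seq_len(n_out) - 1) * n_in / n_out) + 1
}

nearest_resize <- function(img, height, width) {
  img[nearest_indices(nrow(img), height),
      nearest_indices(ncol(img), width), drop = FALSE]
}

img <- matrix(runif(49 * 37), nrow = 49, ncol = 37)
dim(nearest_resize(img, 8, 8))    # 8 8
```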
Therefore I've modified the function so that the image dimensions match, and I've added a warning in case the output image dimensions differ from the 'width' and 'height' parameters specified by the user. I'd be glad if you could test it with all your images and report back any issues before I submit the updated version to CRAN (most likely next week).
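A warning of the kind described here could look like the following sketch. `check_resize_dims()` is a hypothetical helper, not OpenImageR's API. It assumes rows correspond to `height` and columns to `width`.

```r
# Hypothetical guard: warn when a resized image does not have the requested
# dimensions (rows = height, columns = width, by assumption).
check_resize_dims <- function(resized, width, height) {
  got <- dim(resized)[1:2]
  expected <- as.integer(c(height, width))
  if (!identical(as.integer(got), expected)) {
    warning(sprintf("resized image is %d x %d, expected %d x %d (rows x cols)",
                    got[1], got[2], expected[1], expected[2]))
  }
  invisible(resized)
}

ok  <- matrix(0, nrow = 8, ncol = 8)
bad <- matrix(0, nrow = 8, ncol = 7)
check_resize_dims(ok,  width = 8, height = 8)   # silent
check_resize_dims(bad, width = 8, height = 8)   # warns: 8 x 7, expected 8 x 8
```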
As an alternative, you could also use the 'bilinear' method, which returns the correct output dimensions for your images (see the following code snippet). To test the following code, you have to install the latest version from GitHub using:

```r
remotes::install_github('mlampros/OpenImageR')
```

```r
library(OpenImageR)

tmp_64 <- tempfile(fileext = ".jpg")
tmp_49 <- tempfile(fileext = ".jpg")

# These are a pair of files which illustrate the issue
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_64.jpg", destfile = tmp_64)
download.file("https://upgo.lab.mcgill.ca/resources/hash_test_49.jpg", destfile = tmp_49)

img_64 <- OpenImageR::readImage(tmp_64)
img_49 <- OpenImageR::readImage(tmp_49)

grey_64 <- rgb_2gray(img_64)
grey_49 <- rgb_2gray(img_49)

# Both images should now give an 8 x 8 binary hash with either resize method
hash_64 <- average_hash(grey_64, MODE = "binary", hash_size = 8, resize = "nearest")
dim(hash_64)

hash_64_bil <- average_hash(grey_64, MODE = "binary", hash_size = 8, resize = "bilinear")
dim(hash_64_bil)

hash_49 <- average_hash(grey_49, MODE = "binary", hash_size = 8, resize = "nearest")    # returns the correct output but gives a warning
# hash_49 <- suppressWarnings(average_hash(grey_49, MODE = "binary", hash_size = 8, resize = "nearest"))
dim(hash_49)

hash_49_bil <- average_hash(grey_49, MODE = "binary", hash_size = 8, resize = "bilinear")
dim(hash_49_bil)

# TRUE if all four hashes have the same dimensions
lst_dims <- list(dim(hash_64), dim(hash_64_bil), dim(hash_49), dim(hash_49_bil))
all(unlist(lapply(lst_dims, function(x) lst_dims[[1]] == x)))
```
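The maintainer reports that the 'bilinear' method returns the correct dimensions for these images. A minimal bilinear resize in base R suggests one reason such a method returns the requested size. The output grid is allocated first, and each output pixel is interpolated from the four surrounding source pixels, so no rounding step can shrink it. This is a generic sketch, not OpenImageR's code, and `bilinear_resize()` is a hypothetical name.

```r
# Generic bilinear resize sketch (not OpenImageR's implementation).
# The output is always height x width because it is allocated up front.
bilinear_resize <- function(img, height, width) {
  n_r <- nrow(img); n_c <- ncol(img)
  # Map output pixel centres to fractional 1-based source coordinates,
  # clamped to the valid range.
  r  <- pmin(pmax((seq_len(height) - 0.5) * n_r / height + 0.5, 1), n_r)
  cc <- pmin(pmax((seq_len(width)  - 0.5) * n_c / width  + 0.5, 1), n_c)
  r0 <- floor(r);  c0 <- floor(cc)
  r1 <- pmin(r0 + 1, n_r); c1 <- pmin(c0 + 1, n_c)
  fr <- r - r0;  fc <- cc - c0            # fractional offsets
  out <- matrix(0, nrow = height, ncol = width)
  for (i in seq_len(height)) {
    # Interpolate along columns on the two neighbouring source rows,
    # then blend those two rows.
    top    <- (1 - fc) * img[r0[i], c0] + fc * img[r0[i], c1]
    bottom <- (1 - fc) * img[r1[i], c0] + fc * img[r1[i], c1]
    out[i, ] <- (1 - fr[i]) * top + fr[i] * bottom
  }
  out
}

img <- matrix(runif(49 * 37), nrow = 49, ncol = 37)
dim(bilinear_resize(img, 8, 8))   # 8 8
```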

@dwachsmuth
Author

Hi @mlampros, many thanks for the speedy reply! Your explanation makes a lot of sense, but I'm unable to install the dev version of the package because I keep running into a compile error that I haven't been able to get around. I'm happy to wait until the CRAN binary is ready, and then I will re-run my code on my test image set and report the results to you.

@mlampros
Owner

mlampros commented Sep 9, 2021

I'll write a few tests for the modified code and then submit the new version to CRAN, within the next 2 or 3 days. Once the new version is accepted by CRAN, I'll notify you.

@mlampros
Owner

@dwachsmuth I submitted the updated version a week ago, and today it seems that CRAN testing has completed on all operating systems, so you can download the new version from CRAN.

@github-actions

This is Robo-lampros because the Human-lampros is lazy. This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. Feel free to re-open a closed issue and the Human-lampros will respond.

@github-actions github-actions bot added the stale label Sep 29, 2021
@github-actions

github-actions bot commented Oct 6, 2021

This issue was automatically closed because it was stale. Feel free to re-open a closed issue and the Human-lampros will respond.

@github-actions github-actions bot closed this as completed Oct 6, 2021