This repository has been archived by the owner on Mar 20, 2019. It is now read-only.
Commit
add get_file_info method and MovieHasher to search for movies based on the OpenSubtitles file hash
Showing 4 changed files with 86 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# MovieHasher based on the one found here: | ||
# http://trac.opensubtitles.org/projects/opensubtitles/wiki/HashSourceCodes | ||
# | ||
# This computes a hash for the movie file that can be used for lookups on TMDB. | ||
# The hash is the file size plus a 64-bit checksum of the first and last 64 KiB of the file (the two chunks overlap when the file is smaller than 128 KiB). | ||
# Make sure to uncomment and run the tests for this before making any changes | ||
module TMDBParty | ||
module MovieHasher | ||
CHUNK_SIZE = 64 * 1024 # in bytes | ||
|
||
def self.compute_hash(file) | ||
filesize = file.size | ||
hash = filesize | ||
|
||
# Read 64 KiB, split it into 64-bit integers, and add each | ||
# to the hash. Do this for the beginning and the end of the file. | ||
# Q = unsigned long long = 64 bits, native byte order | ||
file.read(CHUNK_SIZE).unpack("Q*").each do |n| | ||
hash = hash + n & 0xffffffffffffffff # to remain as 64 bit number | ||
end | ||
|
||
file.seek([0, filesize - CHUNK_SIZE].max, IO::SEEK_SET) | ||
|
||
# And again for the end of the file | ||
file.read(CHUNK_SIZE).unpack("Q*").each do |n| | ||
hash = hash + n & 0xffffffffffffffff | ||
end | ||
|
||
sprintf("%016x", hash) | ||
end | ||
end | ||
end |
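The arithmetic in `compute_hash` can be checked on a tiny in-memory input. The block below is a minimal sketch, not the commit's code: `HashSketch` is a hypothetical name, and it swaps in two assumptions of its own, an explicit little-endian `"Q<*"` directive instead of the native-endian `"Q*"`, and a guard for empty input, where `read` returns `nil`.

```ruby
require 'stringio'

# Hypothetical helper, not part of the commit: same arithmetic as
# MovieHasher.compute_hash, but little-endian and nil-safe (assumptions).
module HashSketch
  CHUNK_SIZE = 64 * 1024        # in bytes
  MASK = 0xffffffffffffffff     # keeps the running sum within 64 bits

  def self.compute_hash(io)
    size = io.size
    hash = size
    # The first chunk starts at 0; the last chunk starts CHUNK_SIZE before
    # the end, clamped to 0 for small inputs (the two chunks then overlap).
    [0, [0, size - CHUNK_SIZE].max].each do |offset|
      io.seek(offset, IO::SEEK_SET)
      chunk = io.read(CHUNK_SIZE) || "" # read returns nil on empty input
      chunk.unpack("Q<*").each { |n| hash = (hash + n) & MASK }
    end
    format("%016x", hash)
  end
end

# One 64-bit word with value 1: size 8, and the word is counted twice
# because the first and last chunks overlap, so the sum is 8 + 1 + 1.
one = StringIO.new([1].pack("Q<"))
puts HashSketch.compute_hash(one) # => 000000000000000a
```

Any IO-like object that responds to `size`, `seek`, and `read` works here, which is why a `StringIO` is enough for a quick check.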
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
require 'spec_helper' | ||
|
||
# Download the sample files from here: | ||
# http://trac.opensubtitles.org/projects/opensubtitles/wiki/HashSourceCodes | ||
# and uncomment these tests before making any changes | ||
describe TMDBParty::MovieHasher do | ||
# Not sure of the best way to test this without real files to work with. | ||
it "should compute hash" do | ||
pending | ||
File.open('breakdance.avi') do |file| | ||
TMDBParty::MovieHasher.compute_hash(file).should == "8e245d9679d31e12" | ||
end | ||
end | ||
|
||
it "should compute hash on large file" do | ||
pending | ||
File.open('dummy.bin') do |file| | ||
TMDBParty::MovieHasher.compute_hash(file).should == "61f7751fc2a72bfb" | ||
end | ||
end | ||
end |
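One possible way to exercise the hasher without the sample downloads is to generate small fixture files whose expected hash can be worked out by hand. The sketch below is an assumption about how that might look, not something the commit does: `hash_of_fixture` is a hypothetical helper, and `compute_hash` is copied from the commit only so the block runs on its own. It uses plain Ruby rather than RSpec; in a real spec each check would sit in its own `it` block.

```ruby
require 'tempfile'

# Copied from the commit so this block is self-contained.
module TMDBParty
  module MovieHasher
    CHUNK_SIZE = 64 * 1024 # in bytes

    def self.compute_hash(file)
      filesize = file.size
      hash = filesize
      file.read(CHUNK_SIZE).unpack("Q*").each do |n|
        hash = hash + n & 0xffffffffffffffff
      end
      file.seek([0, filesize - CHUNK_SIZE].max, IO::SEEK_SET)
      file.read(CHUNK_SIZE).unpack("Q*").each do |n|
        hash = hash + n & 0xffffffffffffffff
      end
      sprintf("%016x", hash)
    end
  end
end

# Hypothetical helper: writes `bytes` to a temp file and hashes it.
def hash_of_fixture(bytes)
  Tempfile.create("movie_hasher_fixture") do |file|
    file.binmode
    file.write(bytes)
    file.flush
    file.rewind # compute_hash starts reading at the current position
    TMDBParty::MovieHasher.compute_hash(file)
  end
end

# 200,000 zero bytes: every 64-bit word is 0, so the hash is the size alone.
zeros = "\0".b * 200_000
# Same size, but the first word is 5; the last 64 KiB is still all zeros.
marked = [5].pack("Q") + "\0".b * (200_000 - 8)

puts hash_of_fixture(zeros)  # => 0000000000030d40 (200_000 in hex)
puts hash_of_fixture(marked) # => 0000000000030d45 (200_000 + 5)
```

Generated fixtures like these only check the arithmetic; the published sample files remain the way to confirm agreement with the OpenSubtitles reference hashes.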