Duplicate finder in python.... http://amrutlar.com/projects/python_d…
# Basic requirements

1) A CPU that supports at least SSE3/SSSE3 (for the hadd instruction) if you use the SSE-enabled version; if not, remove the `#define SSE` from the `c_sim.c` file
2) Cython
3) Python 2.6.6 onward
4) Numpy
5) PIL

# Supported features

1) File-based hashing (md5), plus a byte-by-byte compare to verify the hash match
2) SIM (resample each image to 32x32, then compute the difference); if the difference is less than a certain amount, the images are classified as similar

# Todo

1) SIFT/SURF?
2) Various types of perceptual hashes
3) Wavelet....
4) SVD
5) Various scaling algorithms; the SIM feature currently uses a simple average-value algorithm to scale down the image
6) Image filtering, ex:
   a) SIM does not deal well with images that are overall a solid block of a single/majority color
   b) SIM also does not deal well with line drawings (because they average out to white)
   c) Need to find a way to identify these types of images and filter them out, if possible, before feeding them to the various image processing algorithms

TODO:

1) Look into whether I even need floats; may want to convert all data to uint8, including uint16, so I can then use the "MPSADBW" SSE4.1 instruction, which basically takes a whole bunch of integers and computes the "sum of absolute differences"
2) Look into uint8 loading instructions to get the data loaded as fast as possible (`_mm_sad_epu8`). Clean up the `#pragma`; the array and other stuff in there is no longer needed, so it makes sense to clean it up
3) Look into some sort of data structure that lets you identify/discard already-compared duplicates, to cut down on compares a bit if possible
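
The file-based feature listed under "Supported features" (md5 hashing followed by a byte-by-byte compare) can be sketched roughly as below. This is only an illustration using the Python standard library, not the project's actual code; the function names `md5_of_file` and `find_duplicates` and the `chunk_size` parameter are made up for the example.

```python
import filecmp
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by MD5, then confirm each group byte by byte.

    Hypothetical helper: returns a list of groups, each a list of
    paths whose contents were verified identical.
    """
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[md5_of_file(path)].append(path)

    groups = []
    for candidates in by_hash.values():
        # Equal hashes only make files candidates; split the candidates
        # into sets whose contents really are identical.
        verified = []
        for path in candidates:
            for group in verified:
                # shallow=False forces a comparison of the file contents.
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                verified.append([path])
        groups.extend(g for g in verified if len(g) > 1)
    return groups
```

Calling `find_duplicates` with a list of file paths returns only the groups that contain two or more identical files. Hashing first keeps the expensive byte-by-byte compare limited to files that already share a digest.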
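
The SIM idea (shrink each image to 32x32 with a simple average, then compare the difference against a cutoff) can be illustrated in plain Python. This sketch assumes grayscale images given as lists of rows and deliberately avoids Numpy, PIL, and the Cython/SSE code so that it stays self-contained. The names `downscale`, `sim_difference`, and `is_similar` are hypothetical, and so is the `threshold` default, since the README only says "a certain amount". Using the mean absolute difference as the measure is also an assumption.

```python
def downscale(pixels, size=32):
    """Shrink a grayscale image (list of equal-length rows) to size x size
    by averaging the source pixels that fall into each output cell."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for oy in range(size):
        y0 = oy * height // size
        y1 = max((oy + 1) * height // size, y0 + 1)  # at least one row
        row = []
        for ox in range(size):
            x0 = ox * width // size
            x1 = max((ox + 1) * width // size, x0 + 1)  # at least one column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / float(len(block)))
        out.append(row)
    return out


def sim_difference(a, b, size=32):
    """Mean absolute difference between the downscaled versions of a and b."""
    small_a = downscale(a, size)
    small_b = downscale(b, size)
    total = sum(
        abs(pa - pb)
        for row_a, row_b in zip(small_a, small_b)
        for pa, pb in zip(row_a, row_b)
    )
    return total / float(size * size)


def is_similar(a, b, threshold=10.0):
    """Classify two images as similar; the threshold value is an assumption."""
    return sim_difference(a, b) < threshold
```

Because every output cell is an average, images dominated by one color, or thin line drawings on a white background, tend to collapse to nearly the same 32x32 grid, which matches the weakness noted in the Todo list.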