
poor performance compared to simple size, head, tail and inode pre filtering #4

Open
trim21 opened this issue Apr 20, 2023 · 1 comment

trim21 (Contributor) commented Apr 20, 2023

I have a 633.9 GiB dataset on one filesystem and tried this project on it. It looks like it takes more than 5 minutes (I didn't let it finish).

I wrote a simple single-threaded Python script that just filters on file info (size, head bytes, tail bytes) and then filters out paths sharing the same inode; after that, only 9 GiB of files need a full hash 🤔
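Roughly like this (a minimal sketch, not my actual script; the 4 KiB head/tail size and the names are just illustrative):

```python
import os
import stat
import sys
from collections import defaultdict

CHUNK = 4096  # head/tail bytes to compare; an arbitrary choice


def candidates(root):
    """Group regular files by size, then by (head, tail) bytes,
    keeping one path per inode; survivors still need a full hash."""
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_size[st.st_size].append((path, st))

    to_hash = []
    for size, entries in by_size.items():
        if len(entries) < 2:
            continue  # unique size -> cannot have a duplicate
        seen_inodes = set()
        by_ends = defaultdict(list)
        for path, st in entries:
            key = (st.st_dev, st.st_ino)
            if key in seen_inodes:
                continue  # hard link to a file we already kept
            seen_inodes.add(key)
            try:
                with open(path, 'rb') as f:
                    head = f.read(CHUNK)
                    f.seek(max(size - CHUNK, 0))
                    tail = f.read(CHUNK)
            except OSError:
                continue  # unreadable file; skip in this sketch
            by_ends[(head, tail)].append(path)
        for group in by_ends.values():
            if len(group) > 1:
                to_hash.extend(group)
    return to_hash


if __name__ == "__main__":
    paths = candidates(sys.argv[1])
    total = sum(os.path.getsize(p) for p in paths)
    print(f"{len(paths)} files ({total / 2**30:.1f} GiB) still need a full hash")
```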

Then I tried this project, and it seems it needs to hash 40 GiB of files:

```
INFO:pydupes:Size filter reduced file count to: 20 (40.2GiB)
```
erikreed (Owner) commented:

Late response, but that log statement refers only to the size filter, not necessarily to what gets hashed. I didn't add an explicit log of the total size that needs hashing, since that requires a full pass over all the files to compute.
