
poor performance compared to simple size, head, tail and inode pre filtering #4

Open
trim21 opened this issue Apr 20, 2023 · 1 comment

trim21 (Contributor) commented Apr 20, 2023

I have a 633.9 GiB dataset on one filesystem and tried this project on it. It looks like it takes more than 5 minutes (I didn't let it finish).

I wrote a simple single-threaded Python script that just filters on file info (size, head bytes, tail bytes) and then filters out paths sharing the same inode; after that, only 9 GiB of files need a full hash 🤔
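Roughly like this (a minimal sketch, not my actual script; the 4 KiB head/tail size and the names are just illustrative):

```python
import os
import stat
import sys
from collections import defaultdict

CHUNK = 4096  # head/tail bytes to compare; an arbitrary choice


def candidates(root):
    """Group regular files by size, then by (head, tail) bytes,
    keeping one path per inode; survivors still need a full hash."""
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_size[st.st_size].append((path, st))

    to_hash = []
    for size, entries in by_size.items():
        if len(entries) < 2:
            continue  # unique size -> cannot have a duplicate
        seen_inodes = set()
        by_ends = defaultdict(list)
        for path, st in entries:
            key = (st.st_dev, st.st_ino)
            if key in seen_inodes:
                continue  # hard link to a file we already kept
            seen_inodes.add(key)
            try:
                with open(path, 'rb') as f:
                    head = f.read(CHUNK)
                    f.seek(max(size - CHUNK, 0))
                    tail = f.read(CHUNK)
            except OSError:
                continue  # unreadable file; skip in this sketch
            by_ends[(head, tail)].append(path)
        for group in by_ends.values():
            if len(group) > 1:
                to_hash.extend(group)
    return to_hash


if __name__ == "__main__":
    paths = candidates(sys.argv[1])
    total = sum(os.path.getsize(p) for p in paths)
    print(f"{len(paths)} files ({total / 2**30:.1f} GiB) still need a full hash")
```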

Then I tried this project, and it seems it needs to hash 40 GiB of files:

```
INFO:pydupes:Size filter reduced file count to: 20 (40.2GiB)
```
erikreed (Owner) commented:

Late response, but that log statement refers only to the size filter, not necessarily to what gets hashed. I didn't add an explicit log of the total size that needs hashing, since that requires a full pass over all the files to compute.
