Feature: Add support for finding duplicates #11385

Open
Worgle123 opened this issue Feb 21, 2023 · 13 comments

Worgle123 commented Feb 21, 2023

The Idea:

I have a pretty massive number of files, and I have been constantly frustrated by the default duplicate finder, which only appears to find duplicate names. I would love to see this feature find its way to Files, as at the moment I have to use clunky paid software, which doesn't even appear to do a great job. Something that looked at file contents, or even just size, would be amazing!

How it should work.

The feature should have several layers of scanning. It could be initiated automatically if certain parameters (like name, or name and then size) were activated, from the right-click menu of a folder/disk, or from the right-click menu when the user has selected 1 or more files (in which case it would only scan for duplicates of the selected files).

It could start by finding all files with similar, not just identical, names, and could then be set up to run a more resource-intensive comparison of file sizes to find duplicates. It would only progress to comparing file sizes after it had found the files with similar names, and it would only compare those files. The size comparison could have adjustable sensitivity to reduce its performance impact. Maybe the system could automatically set this to a recommended sensitivity depending on the file types. For example, when searching only through images, there is generally a decent-sized difference between files, so sensitivity could be turned down. Text documents, though, tend to have relatively consistent file sizes, so close to maximum sensitivity would be needed. If it is not too hard, it might be able to pick a recommended average sensitivity when a scan covers a mix of file types. Even if that would not work when the mix is large, it would help if, say, there were just 1 or 2 rogue files in a scan, so that the system would not break just for the sake of 1 or 2 files.

It would also have to be able to judge whether files are in the same format or not, as otherwise there could be accidental clashes (for example, between .CR3 and .JPEG files).

If it found any matches, it could then progress to scanning the actual contents of the files, or (probably better) just open them in a separate window for the user to judge the similarity. After the scan completed, it would open a grid/list view of all located duplicates (where you could check/uncheck suspected duplicates) and ask the user for a next step (e.g. delete, rename, or move to a different location).
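
The first layer described above (similar names, same format) could be sketched roughly as follows. This is only an illustration in Python, not how Files works or would work: the function names `name_similarity` and `candidate_pairs` and the `0.75` threshold are assumptions made up for the example, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from pathlib import PurePath


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two file stems, ignoring case."""
    stem_a = PurePath(a).stem.lower()
    stem_b = PurePath(b).stem.lower()
    return SequenceMatcher(None, stem_a, stem_b).ratio()


def candidate_pairs(names, threshold=0.75):
    """Yield pairs with the same extension whose stems are similar enough.

    `threshold` is a hypothetical sensitivity setting: higher means stricter.
    """
    names = sorted(names)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            same_ext = PurePath(first).suffix.lower() == PurePath(second).suffix.lower()
            if same_ext and name_similarity(first, second) >= threshold:
                yield first, second


if __name__ == "__main__":
    files = ["holiday.jpg", "holiday (1).jpg", "holiday.cr3", "report.docx"]
    for pair in candidate_pairs(files):
        print(pair)
```

Only the pairs that survive this cheap name check would go on to the more expensive size or content comparison. Comparing every name against every other is quadratic, so a real scan over a large folder would need something smarter.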

In a Nutshell

Will add an advanced duplicate finder.

Files Version

2.0.31.0

Windows Version

Windows 11, version 22H2

Comments

Could be both auto (when a file with the same name is added to a folder) and have an option to scan specific file areas.

@yaira2 yaira2 changed the title Advanced duplicate finder. Feature: Advanced support for finding duplicates Feb 21, 2023
@yaira2 yaira2 changed the title Feature: Advanced support for finding duplicates Feature: Add support for finding duplicates Feb 21, 2023
@Worgle123 Worgle123 changed the title Feature: Add support for finding duplicates Feature: Add better support for finding duplicates Feb 21, 2023
@Worgle123
Author

Worgle123 commented Feb 21, 2023

Maybe it could also check for file size similarities if the names were not just identical, but similar?

@yaira2 yaira2 changed the title Feature: Add better support for finding duplicates Feature: Add support for finding duplicates Feb 21, 2023
@cinqmilleans
Contributor

This is a very useful feature that brings real added value. Unfortunately, having tried several such tools, I think it is too complex to integrate. There are too many options to handle. You have to display millions of results and still keep a user-friendly interface. Scans can take several hours. It is a piece of software in itself, but unfortunately there are no good free ones.

@Worgle123
Author

Worgle123 commented Feb 23, 2023

> This is a very useful feature that brings real added value. Unfortunately, having tried several such tools, I think it is too complex to integrate. There are too many options to handle. You have to display millions of results and still keep a user-friendly interface. Scans can take several hours. It is a piece of software in itself, but unfortunately there are no good free ones.

Interesting that it would be so complex. Maybe further down the road, though? It could be built up slowly and improved over time, with more features. Any improvement would be very welcome here! Remember, great things come from large amounts of time and effort. Thanks!

@cinqmilleans
Contributor

Of course, but you have to be aware of the work to be done. You will need a powerful database. You shouldn't interfere with the rest of the application either (in terms of performance and UX). Maybe it would be better to make it a spinoff app.

@manfromarce
Contributor

In my opinion, it may be reasonable to implement only manual scans via a context menu item that opens a separate window (such as Properties) and to limit it to one scan at a time. That way it would be helpful but wouldn't impact performance and the app in general as much as automatic background scans would. It would also be less complex, e.g. I don't think CCleaner's duplicate finder maintains a database.

@Worgle123
Author

> In my opinion, it may be reasonable to implement only manual scans via a context menu item that opens a separate window (such as Properties) and to limit it to one scan at a time. That way it would be helpful but wouldn't impact performance and the app in general as much as automatic background scans would. It would also be less complex, e.g. I don't think CCleaner's duplicate finder maintains a database.

Just saw this now (6 March). I agree that 1 scan at a time would be a better proposal. How would you propose it scan for duplicates? I thought that it could have a size filter with varying levels of sensitivity. Maybe it could even set a computer-proposed sensitivity.

Thanks!

@Josh65-2201
Member

It could be possible to use a hash value to check, but that would only find exact duplicates in file data.
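
As a rough illustration of the hash idea (a sketch in Python, not a proposal for how Files should implement it): a hash is a short fingerprint computed from a file's bytes, so two files with identical contents get the same fingerprint. The helper names `file_digest` and `exact_duplicates` and the choice of SHA-256 are assumptions for the example; files are grouped by size first so that only same-sized files need to be read and hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under `root` whose contents hash identically."""
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

As noted, this only catches byte-for-byte copies: a re-saved or re-encoded image would hash differently even if it looks the same.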

@manfromarce
Contributor

It could offer different toggleable options such as:
✅ exact duplicates
✅ same extension, similar size
✅ same extension, similar name and size
but I don't know what would be a good default tolerance

@Worgle123
Author

> It could be possible to use a hash value to check, but that would only find exact duplicates in file data.

Could you explain this please? Excuse my ignorance, but what are hash values?

@Worgle123
Author

> It could offer different toggleable options such as:
> ✅ exact duplicates
> ✅ same extension, similar size
> ✅ same extension, similar name and size
> but I don't know what would be a good default tolerance

I agree with you. As for tolerance, maybe it could have different sensitivities for different extensions? .docx, .txt, and similar files would need more accurate scans, as their sizes are generally pretty similar, but with images, sizes tend to vary more. I believe I already suggested something similar in the original post.
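
A per-extension tolerance could look something like the sketch below (Python, purely illustrative). The `SIZE_TOLERANCE` table, its values, and the `similar_size` helper are all made-up assumptions, not tested defaults; the only point is that documents get a strict tolerance and images a looser one, and that files with different extensions never match.

```python
from pathlib import PurePath

# Hypothetical relative size tolerances per extension (0.05 means "within 5%").
# These numbers are placeholders for illustration, not recommended defaults.
SIZE_TOLERANCE = {".txt": 0.0, ".docx": 0.01, ".jpg": 0.05, ".jpeg": 0.05}
DEFAULT_TOLERANCE = 0.02  # assumed fallback for extensions not listed above


def similar_size(name_a: str, size_a: int, name_b: str, size_b: int) -> bool:
    """Return True if both files share an extension and their sizes are within tolerance."""
    ext_a = PurePath(name_a).suffix.lower()
    ext_b = PurePath(name_b).suffix.lower()
    if ext_a != ext_b:
        return False  # e.g. a .CR3 and a .JPEG are never treated as duplicates
    tolerance = SIZE_TOLERANCE.get(ext_a, DEFAULT_TOLERANCE)
    larger = max(size_a, size_b)
    if larger == 0:
        return True  # two empty files of the same type
    return abs(size_a - size_b) / larger <= tolerance
```

With these placeholder values, two .jpg files of 1000 and 960 bytes would count as similar, while two .docx files with the same sizes would not.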

@Jay-o-Way
Contributor

For those who need it: CCleaner has this function, and I bet a number of other tools have it too.

@yaira2
Member

yaira2 commented May 30, 2023

It might be useful to notify the user of duplicates in the downloads folder.

@Worgle123
Author

Worgle123 commented May 31, 2023 via email

Projects
Status: 📋 Planning stage

6 participants