Feature request: warn about duplicate files names #107

Closed
vrusinov opened this issue Mar 17, 2015 · 10 comments

@vrusinov

I understand the difficulty of handling duplicate files and agree that it might not be worth the effort.

However, there are simple things we can do to make life with duplicate files easier. The first thing we could do is detect duplicate files on Google Drive's side and warn the user about them before/after a pull. This way it would at least be clear why drive is trying to pull the same files again & again.
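The detection step requested above can be sketched as a grouping pass over a remote listing. This is a minimal illustration, not drive's actual implementation: the `RemoteFile` type and the `FindNameClashes` function are hypothetical names, and the listing is assumed to have been fetched already.

```go
package main

import (
	"fmt"
	"sort"
)

// RemoteFile is a hypothetical, simplified view of one remote entry.
type RemoteFile struct {
	ID     string // remote file ID, unique even when names clash
	Parent string // path of the containing remote folder
	Name   string // file name as shown on Google Drive
}

// FindNameClashes groups files by their full remote path (Parent/Name) and
// returns only the paths that are shared by more than one file ID.
func FindNameClashes(files []RemoteFile) map[string][]string {
	byPath := make(map[string][]string)
	for _, f := range files {
		p := f.Parent + "/" + f.Name
		byPath[p] = append(byPath[p], f.ID)
	}
	// Drop paths that map to a single file; deleting while ranging is safe in Go.
	for p, ids := range byPath {
		if len(ids) < 2 {
			delete(byPath, p)
		}
	}
	return byPath
}

func main() {
	files := []RemoteFile{
		{ID: "a1", Parent: "somedir", Name: "notes.txt"},
		{ID: "b2", Parent: "somedir", Name: "notes.txt"},
		{ID: "c3", Parent: "somedir", Name: "todo.txt"},
	}
	clashes := FindNameClashes(files)

	// Sort the paths so warnings are printed in a stable order.
	paths := make([]string, 0, len(clashes))
	for p := range clashes {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("warning: %d remote files share the path %q: %v\n",
			len(clashes[p]), p, clashes[p])
	}
}
```

Keying on the full path rather than the bare name means that files with the same name in different folders are not reported, which matches the narrow scope of this request.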

@odeke-em
Owner

Yes indeed. I've had to reiterate this to lots of issue reporters, so I think this is a good request if drive can warn them too.

@odeke-em odeke-em self-assigned this Mar 20, 2015
@raintonr

I like the philosophy of not trying to do 2 way sync, but think some kind of dupe detection here is a must. The use case I have in mind is for backup of photos to Google Drive. So there would always be a push from desktop/laptop to drive.

However, I just want to be able to run a single command and have all the new images I've placed across all picture folders uploaded to drive. Duplicates should be skipped.

Ah... aside from the case where my RAW photo files will never change, but the XMP (the file where non-destructive edit changes are written) may. So in reality I need a push, ignoring .NEF files if they exist, but overwriting .XMP files if they have changed.

Is that the kind of thing this client is designed for, or should I look elsewhere?

BTW, on the subject of duplicate detection, does Google allow for finding the MD5 or another hash of a file on their side? Then, if the client has a file that looks the same, a quick hash check should confirm whether it actually is the same.

@odeke-em
Owner

@raintonr I think you are talking about an actual duplicate file, which could exist more than once under different names but with the same checksum. And yes, Google allows for checksum querying. This issue is about clashing file names/paths: Google Drive allows files with the same name in the same folder, but your file system does not. This means that any file name clashes within the same directory result in overwriting by the latest file.
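The checksum check mentioned above can be sketched as follows. The Drive API reports an MD5 digest (the `md5Checksum` field) for files with binary content. The sketch assumes that value has already been fetched as a hex string, and the names `MD5Hex` and `MatchesRemote` are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// MD5Hex streams r through MD5 and returns the lowercase hex digest,
// so large files do not need to be held in memory.
func MD5Hex(r io.Reader) (string, error) {
	h := md5.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// MatchesRemote reports whether the local bytes hash to remoteMD5,
// a hex digest assumed to come from the remote file's metadata.
func MatchesRemote(data []byte, remoteMD5 string) bool {
	sum, err := MD5Hex(bytes.NewReader(data))
	if err != nil {
		return false
	}
	// Compare case-insensitively in case the remote digest is uppercase.
	return strings.EqualFold(sum, remoteMD5)
}

func main() {
	remoteMD5 := "5d41402abc4b2a76b9719d911017c592" // MD5 of "hello"
	fmt.Println("same content:", MatchesRemote([]byte("hello"), remoteMD5))
	fmt.Println("same content:", MatchesRemote([]byte("hello!"), remoteMD5))
}
```

A matching checksum identifies identical content regardless of name, which is a different problem from the name clashes this issue is about.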

@odeke-em odeke-em changed the title Feature request: warn about duplicate files Feature request: warn about duplicate files names Mar 20, 2015
@raintonr

@odeke-em, no not really. When I push my 'Pictures' tree I'd like this to happen:

  • If GDrive has a file with the same checksum & name as the one I'm pushing in the same folder then do nothing.
  • If GDrive has a file with the same name but different checksum then overwrite their copy with mine. Optionally don't do this if mine is older than theirs and give a warning.
  • If GDrive has a file with same checksum & name as the one I'm pushing but it's in a different folder on their side then move their copy to the location I'm pushing to, to avoid a second upload. Optionally make that move a copy.

Does that sound reasonable?
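The three rules above can be sketched as a single decision function. This is only an illustration of the proposal, not how drive behaves: `Entry`, `Action`, and `Decide` are hypothetical names, the optional "mine is older" check is left out, and the function returns on the first same-folder match, so it does not handle several remote files sharing one path.

```go
package main

import "fmt"

// Action is the step a push would take for one local file.
type Action int

const (
	Upload     Action = iota // no usable remote copy: upload the file
	Skip                     // same folder, name and checksum: do nothing
	Overwrite                // same folder and name, different checksum
	MoveRemote               // same name and checksum, but in another folder
)

func (a Action) String() string {
	switch a {
	case Skip:
		return "skip"
	case Overwrite:
		return "overwrite"
	case MoveRemote:
		return "move remote copy"
	default:
		return "upload"
	}
}

// Entry is a hypothetical, simplified file record used on both sides.
type Entry struct {
	Folder string
	Name   string
	MD5    string
}

// Decide applies the three rules to one local file against a remote listing.
func Decide(local Entry, remote []Entry) Action {
	elsewhere := false
	for _, r := range remote {
		if r.Name != local.Name {
			continue
		}
		if r.Folder == local.Folder {
			if r.MD5 == local.MD5 {
				return Skip
			}
			return Overwrite
		}
		if r.MD5 == local.MD5 {
			elsewhere = true // identical copy exists in a different folder
		}
	}
	if elsewhere {
		return MoveRemote
	}
	return Upload
}

func main() {
	remote := []Entry{
		{Folder: "Pictures/2015", Name: "a.nef", MD5: "111"},
		{Folder: "Old", Name: "b.nef", MD5: "222"},
	}
	local := Entry{Folder: "Pictures/2015", Name: "b.nef", MD5: "222"}
	fmt.Println(Decide(local, remote))
}
```

Same-folder matches take priority over copies found elsewhere, so an identical file already in place is never moved.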

@vrusinov
Author

What you are talking about is a wider issue. My feature request is much narrower, and I'd like to keep it that way.

What I'd like to see is that when you pull (drive pull somedir), drive warns you if the remote somedir has files with the same name.
This is only about pull, and only for better visibility.

Handling duplicate files during push is much more complicated and risky because of the potential for data loss. I'd rather we take smaller steps.

@odeke-em
Owner

@raintonr

> If GDrive has a file with the same checksum & name as the one I'm pushing in the same folder then do nothing.

drive already does this.

> If GDrive has a file with the same name but different checksum then overwrite their copy with mine. Optionally don't do this if mine is older than theirs and give a warning.

drive already does this. Whether the option applies depends on whether there is a conflict.

> If GDrive has a file with same checksum & name as the one I'm pushing but it's in a different folder on their side then move their copy to the location I'm pushing to, to avoid a second upload. Optionally make that move a copy.

That is a mode that would have to be integrated, but then what happens to people who have performed a drive copy or who purposefully keep the same file in different directories? Note also that this mode would involve some heavy work, because there is no telling how long the checksumming will take.

This is why I was saying that you are talking about a different type of duplicate file handling. Yours amounts to broadcasting a checksum to drive and then performing a move. All @vrusinov is asking for is handling of files with the same name, which is what the title says.

@raintonr

@odeke-em, thank you for the clarification. All good then (for me). I just suggested the other options that might be useful for others.

@vrusinov, sorry for the slight hijack.

@odeke-em
Owner

@raintonr actually it will be useful. A couple of iterations from now I'll be providing this mode as an option for pull and push. I have been fighting duplicate files on file systems for the last 3 years so your suggestion strikes close to home. If I don't release soon, make sure to remind me with a fresh issue explaining this mode. Otherwise, please watch drive for a couple of releases from now ;)

@odeke-em
Owner

odeke-em commented May 1, 2015

Addressed by #166; please reopen if this persists.

@gitnito

gitnito commented Aug 24, 2015

Since most of these clashes happen when you have a Google document with the same name as a regular file (like .txt, .pdf, etc.), why not handle it the way the Windows gdrive client does? It basically saves the Google doc file on the local filesystem as "filename.gdoc". Google docs files are easy to detect anyway, since they are just links to a URL. In fact, it is precisely because they are just links that these clashes happen.

For example, let's say you have a gdoc with very important data that you want to have access to when you are offline. One way I deal with this is by saving a PDF version of it in the same folder as the gdoc, so that I can read the PDF from my local drive when offline. This will cause a name clash. The solution is not perfect, since you may still have two non-gdoc files in drive with the same name, but I bet it will solve 99% of the cases.
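The renaming idea can be sketched as a mapping from a remote name and MIME type to a local file name. This is a rough illustration, not drive's behavior: `LocalName` and the extension table are hypothetical, and only ".gdoc" comes from the comment above (the other extensions are assumptions). Google-native files are recognised here by the `application/vnd.google-apps.` MIME type prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefix shared by the MIME types of Google-native files.
const googleAppsPrefix = "application/vnd.google-apps."

// docExt maps a Google-native kind to a local extension.
// Only ".gdoc" is taken from the discussion; the rest are assumed.
var docExt = map[string]string{
	"document":     ".gdoc",
	"spreadsheet":  ".gsheet",
	"presentation": ".gslides",
}

// LocalName returns the name to use on the local file system. Regular
// files keep their name; known Google-native kinds get an extra extension
// so they cannot clash with a regular file of the same name.
func LocalName(name, mimeType string) string {
	if !strings.HasPrefix(mimeType, googleAppsPrefix) {
		return name
	}
	kind := strings.TrimPrefix(mimeType, googleAppsPrefix)
	if ext, ok := docExt[kind]; ok {
		return name + ext
	}
	return name // folders and unknown kinds are left unchanged
}

func main() {
	fmt.Println(LocalName("report", "application/vnd.google-apps.document"))
	fmt.Println(LocalName("report", "application/pdf"))
}
```

As the comment notes, this only separates Google-native files from regular ones; two regular files with the same name in one remote folder would still map to the same local path.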
