Google Drive external storage has some missing files in his root folder #16267

Closed
jvillafanez opened this Issue May 12, 2015 · 26 comments

Projects

None yet

8 participants

@jvillafanez
Contributor

Steps to reproduce

No steps. Just checked the files in google drive and in ownCloud and some of them were missing (more files in Google Drive than in ownCloud)

Expected behaviour

All files are shown. (Any known restrictions?)

Actual behaviour

Some files in the root folder are missing in ownCloud

Server configuration

Operating system: ubuntu 14.04

Web server: apache 2.4.7

Database: mysql

PHP version: 5.5.9

ownCloud version: OC 8.1 beta 1

Updated from an older ownCloud or fresh install: fresh install

Additional notes

  • I've moved one of the missing files from Google Drive to another folder, and the file appeared in ownCloud inside that folder.
  • I've also uploaded a file both from ownCloud and from Google Drive and there wasn't any issue.
  • I haven't read about any limitation in the official ownCloud documentation (I might be overlooked it)
@jvillafanez jvillafanez added this to the 8.1-current milestone May 12, 2015
@DeepDiver1975
Member

possible explanation: gdrive allows multiple files in the same folder with the same name - ownCloud doesn't.

I'm setting this to 8.2 where we will focus on external storages @karlitschek @MTRichards

@DeepDiver1975 DeepDiver1975 modified the milestone: 8.2-next, 8.1-current May 12, 2015
@jvillafanez
Contributor

That's probably the case for some files, but anyway I'd expect that at least one of those files appears, which isn't the case. In addition, there is another file missing that doesn't fit in the explanation.

The files are old (from 2013 or older) without any activity I think.

@MTRichards
Contributor

ok.

@DeepDiver1975
Member

ok.

thx

@MorrisJobke MorrisJobke added the bug label Jun 25, 2015
@PVince81
Collaborator

@jvillafanez were these google docs files (office files) or generic files ?
I think I saw some issues that google docs' own file sometimes cannot be retrieved properly, or had no extension.

@jvillafanez
Contributor

At the day of today, the files missing are the one that appear several times in gdrive, as pointed out on #16267 (comment)

@Xenopathic
Member

What are we going to do about this? ownCloud assumes the file path is unique, changing it to use arbitrary IDs, potentially for files with the same path, would completely change how ownCloud handles files. And we can't just arbitrarily select one of the duplicate files to show, that'd be madness

@DeepDiver1975
Member

What are we going to do about this? ownCloud assumes the file path is unique, changing it to use arbitrary IDs, potentially for files with the same path, would completely change how ownCloud handles files. And we can't just arbitrarily select one of the duplicate files to show, that'd be madness

I honestly vote for won't fix in this situation - at least for now.

@cmonteroluque @karlitschek @MTRichards I guess we don't want to rearchitect our file system handling just because of gdrive???

@karlitschek
Member

let's look into that while looking into flysystem later

@DeepDiver1975
Member

let's look into that while looking into flysystem later

so 9.0 then - right?

@karlitschek
Member

yes

@DeepDiver1975 DeepDiver1975 modified the milestone: 9.0-next, 8.2-current Sep 24, 2015
@MTRichards
Contributor

yeah, we have no choice here.

@cmonteroluque
Contributor

based on our conversation yesterday, yes

@PVince81
Collaborator

Flysystem doesn't have a GDrive support yet, see thephpleague/flysystem#192

@PVince81
Collaborator

From what I see, it seems that whoever developed the GDrive plugin already knew about duplicate files: https://github.com/owncloud/core/blob/v8.2.2/apps/files_external/lib/google.php#L123 and https://github.com/owncloud/core/blob/v8.2.2/apps/files_external/lib/google.php#L278

So there's a chance that this has worked in the past and might need some minor tweaks.

Ideally I'd say the best would be to only return the file with the most recent timestamp.

@PVince81 PVince81 self-assigned this Feb 11, 2016
@PVince81
Collaborator

I'll give it a go.

@PVince81
Collaborator

Steps to reproduce:

  1. Create a local file "bacon.txt" with some contents (mine is from http://baconipsum.com)
  2. Upload the file directly to Google drive
  3. Edit the local file "bacon.txt", save
  4. Upload that file again to Google drive (it doesn't ask you to overwrite and simply shows it side by side with the old one!)
  5. Configure GDrive ext storage as "/GoogleDrive" in ownCloud
  6. Using Webdav/cadaver, access "/GoogleDrive" and list the files

What I see in the log:

{"reqId":"CbgzsQ8JQTDvQO59eIAr","remoteAddr":"127.0.0.1","app":"files_external","message":"Ignoring duplicate file name: bacon.txt on Google Drive for Google user: Vincent Petry","level":1,"time":"2016-02-11T11:36:13+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/GoogleDrive\/"}
{"reqId":"JSKkQZ7rnpPXzcPjb3RC","remoteAddr":"127.0.0.1","app":"files_external","message":"Ignoring duplicate file name: bacon.txt on Google Drive for Google user: Vincent Petry","level":1,"time":"2016-02-11T11:36:23+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/GoogleDrive\/"}
{"reqId":"JSKkQZ7rnpPXzcPjb3RC","remoteAddr":"127.0.0.1","app":"files_external","message":"Ignoring duplicate file name: bacon.txt on Google Drive for Google user: Vincent Petry","level":1,"time":"2016-02-11T11:36:25+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/GoogleDrive\/"}

Ok, so the duplicate file was detected and possibly discarded on purpose.

Digging into the code...

@PVince81
Collaborator

Yep, the file is ignored on purpose: https://github.com/owncloud/core/blob/v8.2.2/apps/files_external/lib/google.php#L284

Do we want to change the behavior to make it use the most recent files ?
It is not guaranteed that whoever uploaded the duplicate file to Google Drive intended it to be the exact same file... so there is a slight risk of messing up the user's data.

@PVince81
Collaborator

From googling last time it seems that Google's own sync client would download duplicate files as "file (1).txt" "file (2).txt" to the local filesystem. However ownCloud cannot do such mapping because we don't support file name aliases.

I don't think we should implement the "use most recent file" logic because some people might purposefully have multiple versions of files on GDrive. There is also a risk of overwriting the wrong one when the user changes it in ownCloud.

I'd vote for either won't fix like @DeepDiver1975 said, or move to backlog in case we ever implement file name alias support.

@PVince81
Collaborator

AHA, I think I found something.

Actually if you have a file name with duplicates, like "bacon.txt" then the skipping logic will also skip the next file in the folder "hotpot.jpg" even though that file has no duplicate.

And the log says:

{"reqId":"ujExDXPlyj\/AMPpQ8kQH","remoteAddr":"127.0.0.1","app":"PHP","message":"Undefined offset: 1 at \/srv\/www\/htdocs\/owncloud\/lib\/private\/files\/stream\/dir.php#44","level":3,"time":"2016-02-15T14:38:51+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/"}

Just observed this on 8.1.5.

@PVince81
Collaborator

Cannot reproduce on 8.2.2.

So the issue is as follows:

  1. Create two files "bacon.txt" in Google Drive
  2. Upload more files that alphabetically follow "b", for example "h.txt", "t.txt", etc
  3. Setup GDrive in ownCloud
  4. Browse the folder

In 8.1.5: the next file after "b" is missing, in my example "h.txt" would be missing from the listing
In 8.2.2: all files appear (except the duplicate ones, by design)

So the bug is that it was skipping more files than it was supposed to.
Let's see whether it's an easy backport...

@PVince81
Collaborator

Okay, that PR here seems to fix the issue: #14779
This PR touches all external storages so I'm not willing to backport this.

However this pointed out to another possible 8.1 specific solution: it looks like opendir or our dir wrapper is having trouble with arrays where some indices are missing (due to removing duplicates).

I tried reindexing the array before passing it there and it works. PR will follow (8.1 specific!)

@PVince81
Collaborator

8.1 specific fix is here: #22409

Other versions are fine.
I think this is the original issue that @jvillafanez was talking about. Not the fact that we ignore duplicate files by design.

@PVince81
Collaborator

To clarify: this is not about fixing the duplicate files. By design we skip them. However, the mechanism that does the skipping was skipping too much. The fix for that is in #22409 for 8.1, it works properly in OC >= 8.2 already.

Since this is specific to 8.1, moving the milestone to 8.1.6.

@cmonteroluque

@cmonteroluque
Contributor

@PVince81 thanks

@PVince81
Collaborator
PVince81 commented Mar 7, 2016

Fixed through #22409

@PVince81 PVince81 closed this Mar 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment