Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'drive' is trying to re-push a subset of files when it should not #236

Closed
js-ojus opened this issue Jun 16, 2015 · 20 comments
Closed

'drive' is trying to re-push a subset of files when it should not #236

js-ojus opened this issue Jun 16, 2015 · 20 comments

Comments

@js-ojus
Copy link

js-ojus commented Jun 16, 2015

Description

Even though the files are identical in the local system and Google Drive, when drive push is run repeatedly, it sometimes reports that some local files need to be pushed.

Steps to reproduce

  1. Add several files to a mounted directory.
  2. Issue drive push.
  3. Add several more files to the directory.
  4. Issue drive push.

The last step may have to be repeated several times.

Expected

Pushed files that are identical in the local system and on Google Drive should not be pushed again. Checking md5sum verifies that the files are indeed identical.

Observed

A subset of the files get listed for push, and are marked M.

Operating system

Ubuntu 14.04.1 LTS 64-bit

'drive' version

0.2.3

@js-ojus
Copy link
Author

js-ojus commented Jun 16, 2015

While I do not receive the JSON error reported therein, probably #191 is related to this problem. I get additional copies of files when I agree to push again, too!

And, this problem could be specific to Linux/amd64, as well!

@js-ojus
Copy link
Author

js-ojus commented Jun 17, 2015

Hello @odeke-em!

I have verified that the subset of files that drive reports as M, is consistent across runs, even across reboots. I have also verified that MD5 checksums match in both locations for each of those files. Yet, the result is the same even when I issue the following.

$ drive push -ignore-checksum=false

@odeke-em
Copy link
Owner

Hello @js-ojus thank you for reporting this and welcome to drive. My apologies for the late reply. Thank you for the investigation. To start debugging what could be wrong, would you mind:

  • specifying the mimeTypes of the files, anything else about these files? Duplicates remotely? Being opened somewhere else? Being touched by a program periodically?
  • try to drive list, drive list --matches <prefix_pattern>, drive stat <target_dir>.

If interesting results don't come out of that and if your files are not confidential, could you please send to me a zip of them, if you don't mind?

@js-ojus
Copy link
Author

js-ojus commented Jun 18, 2015

@odeke-em : Thanks for (re-)welcoming me! I am an existing user :-). I even reported #92 a while ago.

Mime Types

Here are what file answers.

Images: JPEG image data, EXIF standard
Movies 1: RIFF (little-endian) data, AVI, 640 x 480, 10.00 fps, video: Motion JPEG, audio: uncompressed PCM (mono, 11024 Hz)
Movies 2: ISO Media, MPEG v4 system, 3GPP

Duplicates remotely?

Duplicates get created when I agree to re-push.

Opened somewhere else?

As the upload happened, I previewed some of them for verification. But, those were not the files marked for re-push!

Being touched by a program periodically?

No. In fact, the problem occurred as soon as the push operation concluded. I re-ran drive push, and it marked some files for pushing again.

@js-ojus
Copy link
Author

js-ojus commented Jun 18, 2015

Hello @odeke-em!

I have just now upgraded to 0.2.4 by compiling from source (latest commit). The behaviour remains the same. The same subset of files continues to be marked M for a re-push.

All of drive list, drive list --matches <> and drive stat <dir> function properly.

As I have already mentioned, the MD5 checksums match for both the versions (offline and online).

To answer your last question, the pictures and videos are of the family and extended family, so I'm afraid I cannot share them. My apologies!

@js-ojus
Copy link
Author

js-ojus commented Jun 18, 2015

Giving it some more thought, here is what I propose.

  • When the checksums match, push and pull should not attempt a re-transfer or an overwrite.
  • A corresponding informational message can be printed for each such skipped file.
  • The above print can be made optional, introducing a verbose mode.
  • Alternatively, a new mark can be introduced to indicate that the file has identical checksums, but potentially differing timestamps.
  • Files with identical checksums can still be overwritten using either -ignore-checksum=true or -force.

Please let me know your thoughts.

@odeke-em
Copy link
Owner

Aha gotcha @js-ojus my apologies, I hadn't recognized your name.
No worries if you can't share the photos. So please make sure as well at that issue #191 does not manifest. Also drive version would have given better information on the build.
drive already handles the suggestions in your thoughts, as you'll find in the implementation.

I can't seem to reproduce your issue but due to the recurrance of it, am suspecting something might not be registering with file sizes and matching checksums. I can't start to guess what the issue could be off the top of my head :( without any error messages or suspicions.
For a sledge-hammer solution, I'd suggest re-installing go and please make sure it is installed properly and not gccgo or xgo or an Ubuntu installation from an outdated PPA. I've encountered weird and undefined program behaviour when using gccgo and old installations for code that was easy to inspect and verify correctness.
Sorry that there doesn't seem to be progress in solving this issue.

@js-ojus
Copy link
Author

js-ojus commented Jun 18, 2015

@odeke-em : Here are my responses.

  • Go installation : The installation of go was done by directly downloading the binary distribution of 1.4.2 for Linux/amd64 from golang.org. [This is what I always do for go, by the way.]
  • drive version : This command somehow does not print full information.
drive version: 0.2.4
Commit Hash: <CURRENT_COMMIT>
Go Version: <GO_VERSION>
OS: <OS_INFO>

I shall try to look into the implementation, too. I have not done that so far, on account of my own schedule being full.

@odeke-em
Copy link
Owner

To get the full build information, do go get github.com/odeke-em/drive/drive-gen && drive-gen. The README also mentions how to do this.

@js-ojus
Copy link
Author

js-ojus commented Jun 18, 2015

Hello @odeke-em!

This afternoon (I live in India), I spent some time debugging this. I have narrowed down the proximal cause of my woes! For each of the files marked for re-push, no corresponding index file exists in .gd/indices.

Why are indices missing for those files specifically? I have not comprehended that yet. But, this - hopefully - gives you a clue.

@odeke-em
Copy link
Owner

That's cool, you are living in my time future. Here it is still morning :)
That's great! Actually sounds like a plausible problem. Now can you check what your permissions are in the indices path? Make sure you never ran init as sudo.

Please try out this patch to help in debugging

diff --git a/config/config.go b/config/config.go
index b7e11ae..25a0f67 100644
--- a/config/config.go
+++ b/config/config.go
@@ -103,7 +103,9 @@ func (c *Context) SerializeIndex(index *Index, p string) (err error) {
    if data, err = json.Marshal(index); err != nil {
        return
    }
-   return ioutil.WriteFile(IndicesAbsPath(p, index.FileId), data, 0600)
+   indicesPath := IndicesAbsPath(p, index.FileId)
+   fmt.Println("indicesPath: ", indicesPath)
+   return ioutil.WriteFile(indicesPath, data, 0600)
 }

 func (c *Context) Write() (err error) {
diff --git a/src/push.go b/src/push.go
index 9185b8b..24c8cc7 100644
--- a/src/push.go
+++ b/src/push.go
@@ -334,6 +334,7 @@ func (g *Commands) remoteMod(change *Change) (err error) {
        return
    }
    index := rem.ToIndex()
+   g.log.Logln("index", index)
    wErr := g.context.SerializeIndex(index, g.context.AbsPathOf(""))

    // TODO: Should indexing errors be reported?

@js-ojus
Copy link
Author

js-ojus commented Jun 19, 2015

@odeke-em : Some progress here. Even before I apply the patch, I have been able to reproduce the error!

After pushing a part of a new directory, drive encountered an authentication time-out error, because my ISP went down for several seconds. There was no crash.

When I re-ran drive, it completed the process, but reported some of the files to be re-pushed. I checked the indices directory, and indeed the indices for those files are missing.

I have not looked into the source yet, but it could probably be a case of the goroutine that encountered the error not flushing the pending writes of the indices, before exiting.

@odeke-em
Copy link
Owner

Good morning @js-ojus, now that is interesting. Thanks for the debugging and heads up. Am taking notes for future error handling as we can see this was a silent error.

@js-ojus
Copy link
Author

js-ojus commented Jun 29, 2015

Re-posting from #249 to re-establish the context

Creation of indices has taken place as intended. Good thus far!

Now, running drive push continues to mark the same set of files for a re-push. I have investigated it a little. The reason is that the 'mtime' on GDrive is set to whatever was the UTC at the time of the original push, not that of the pushed file itself.

To illustrate using an example, the local file has 'mtime' 2007-11-08 18:30:02, while the corresponding copy in GDrive has 2015-06-15 17:32:56 (the time when push uploaded this file).

For those files whose indices were created properly upon the first push, the local 'mtime' and the remote one match perfectly.

It will be good if push and pull gained a capability (sub-command) to update the target's (remote file's metadata in the case of push, local file's index in the case of pull) 'mtime' when MD5 sums match.

Alternatively, such a sub-command could exist for index itself. E.g. drive index push and drive index pull.

The overriding criteria here is that MD5 sums match, but 'mtime's do not! Other cases are already handled quite well.

Should you think that it is reasonable, I can create an issue for the same.

@odeke-em
Copy link
Owner

Thanks @js-ojus for reposting, I'll do the same here:

Unless some bug just crept, drive always updates the mtime derived from local file itself and this has
been working since I made this fork, you can even try this out yourself by moving in and old file, pushing
it upstream, making sure there are no more changes, then doing a drive touch <remote_file>, inspecting
with your browser, as well as stating and diffing. Then re-run drive push <remote_file> and the older
date that was present locally is re-updated. Also at the end of every pull or push index files are updated.

Please let me know what you think.

Btw were you able to apply the patch I had provided?

@js-ojus
Copy link
Author

js-ojus commented Jun 29, 2015

Do you sleep at all, @odeke-em? :-)

I shall get back to you shortly.

@odeke-em
Copy link
Owner

Haha, oh yes I do sleep, I'll actually head to bed soon after releasing drive v0.2.5 in a few ;) Thanks for the work and investigations.

@js-ojus
Copy link
Author

js-ojus commented Jun 29, 2015

@odeke-em : Glad to report that things work as they should. Upon actually pushing again, there was no further data transfer; only the remote 'mtime' was updated, as you stated above.

Now, re-issuing push does nothing, as intended.

Thus, in combination with PR #256, the normal push behaviour has made drive usable again for me!

Thanks for the great work! I now close this issue.

@js-ojus js-ojus closed this as completed Jun 29, 2015
@odeke-em
Copy link
Owner

Woot woot, this is good to know. Your investigation was very useful in helping get this resolved, thank you again and cheers!

@odeke-em odeke-em added this to the v0.2.5 milestone Jun 29, 2015
@odeke-em
Copy link
Owner

odeke-em commented Aug 6, 2015

Please get the latest from master. Uploading is faster, there is an retry mechanism on encountering retryable errors as well. Also after passing in drive push --v you'll see the step by step uploads, when done, when uploading etc. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants