
Weird behavior during rsync and failure #100

Closed
mathieuruellan opened this issue Oct 12, 2016 · 16 comments

@mathieuruellan

mathieuruellan commented Oct 12, 2016

Hello, I'm using the latest version of svfs (0.8.2, i386) on Debian stable.
I mount hubiC via FUSE like this:

hubicfuse on /mnt/hubic type fuse.svfs (rw,nosuid,nodev,relatime,user_id=1000,group_id=100,default_permissions,allow_other,user=mathieu)

I'm doing an rsync like this:

rsync -rtW --inplace --progress /mnt/readynas/backup/biniou/2016.10/ /mnt/hubic/backup/biniou/2016.10/

The command fails. After a timeout error (visible in debug mode), it continues at an implausible upload rate (my bandwidth is capped at 600 KB/s).

DEBU[2016-10-12T18:14:12+02:00] -> [ID=0xc43] Write 131064           source=fuse
DEBU[2016-10-12T18:14:12+02:00] <- Write [ID=0xc44 Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 131072 @268042232 fl=WriteLockOwner lock=3159737928114442182 ffl=OpenWriteOnly  source=fuse
DEBU[2016-10-12T18:14:12+02:00] -> [ID=0xc44] Write 131072           source=fuse
DEBU[2016-10-12T18:14:12+02:00] <- Write [ID=0xc45 Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 8 @268173304 fl=WriteLockOwner lock=3159737928114442182 ffl=OpenWriteOnly  source=fuse
DEBU[2016-10-12T18:14:12+02:00] -> [ID=0xc45] Write 8                source=fuse
DEBU[2016-10-12T18:14:12+02:00] <- Write [ID=0xc46 Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 131064 @268173312 fl=WriteLockOwner lock=3159737928114442182 ffl=OpenWriteOnly  source=fuse
DEBU[2016-10-12T18:14:13+02:00] -> [ID=0xc46] Write 131064           source=fuse
DEBU[2016-10-12T18:14:13+02:00] <- Write [ID=0xc47 Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 131072 @268304376 fl=WriteLockOwner lock=3159737928114442182 ffl=OpenWriteOnly  source=fuse
DEBU[2016-10-12T18:14:13+02:00] -> [ID=0xc47] Write 131072           source=fuse
DEBU[2016-10-12T18:14:13+02:00] <- Write [ID=0xc48 Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 8 @268435448 fl=WriteLockOwner lock=3159737928114442182 ffl=OpenWriteOnly  source=fuse
DEBU[2016-10-12T18:14:13+02:00] -> [ID=0xc48] Write 8                source=fuse
DEBU[2016-10-12T18:14:13+02:00] <- Write [ID=0xc49 Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 131064 @268435456 fl=WriteLockOwner lock=3159737928114442182 ffl=OpenWriteOnly  source=fuse
DEBU[2016-10-12T18:14:32+02:00] -> [ID=0xc49] Write error=EIO: Timeout when reading or writing data  source=fuse
DEBU[2016-10-12T18:14:32+02:00] <- Flush [ID=0xc4a Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x1 fl=0x0 lk=0x2bd9a4e794970fc6  source=fuse
DEBU[2016-10-12T18:14:32+02:00] -> [ID=0xc4a] Flush                  source=fuse
DEBU[2016-10-12T18:14:32+02:00] <- Release [ID=0xc4b Node=0x5 Uid=0 Gid=0 Pid=0] 0x1 fl=OpenReadOnly rfl=0 owner=0x0  source=fuse
DEBU[2016-10-12T18:14:32+02:00] -> [ID=0xc4b] Release                source=fuse
DEBU[2016-10-12T18:14:32+02:00] <- Write [ID=0xc4c Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 131064 @268435456 fl=WriteLockOwner lock=3159737928114442182 ffl=OpenWriteOnly  source=fuse
DEBU[2016-10-12T18:14:47+02:00] -> [ID=0xc4c] Write error=EIO: Timeout when reading or writing data  source=fuse
DEBU[2016-10-12T18:14:47+02:00] <- Flush [ID=0xc4d Node=0x5 Uid=1000 Gid=100 Pid=7479] 0x2 fl=0x0 lk=0x2bd9a4e794970fc6  source=fuse
DEBU[2016-10-12T18:14:47+02:00] -> [ID=0xc4d] Flush                  source=fuse
    268,795,904  25%   79.31kB/s    2:49:08  DEBU[2016-10-12T18:14:47+02:00] <- Release [ID=0xc4e Node=0x5 Uid=0 Gid=0 Pid=0] 0x2 fl=OpenWriteOnly rfl=0x2 owner=0xc8253648  source=fuse
DEBU[2016-10-12T18:14:47+02:00] -> [ID=0xc4e] Release                source=fuse
  1,073,741,824 100%    2.05MB/s    0:08:18 (xfr#1, to-chk=61/63)
2016.10.01.FULL.10.dar
  1,073,741,824 100%   34.95MB/s    0:00:29 (xfr#2, to-chk=60/63)
2016.10.01.FULL.11.dar
  1,073,741,824 100%   36.16MB/s    0:00:28 (xfr#3, to-chk=59/63)
2016.10.01.FULL.12.dar
  1,073,741,824 100%   35.74MB/s    0:00:28 (xfr#4, to-chk=58/63)
2016.10.01.FULL.13.dar
  1,073,741,824 100%   35.59MB/s    0:00:28 (xfr#5, to-chk=57/63)
2016.10.01.FULL.14.dar
  1,073,741,824 100%   38.07MB/s    0:00:26 (xfr#6, to-chk=56/63)
2016.10.01.FULL.15.dar
  1,073,741,824 100%   39.27MB/s    0:00:26 (xfr#7, to-chk=55/63)
2016.10.01.FULL.16.dar
  1,073,741,824 100%   38.56MB/s    0:00:26 (xfr#8, to-chk=54/63)
2016.10.01.FULL.17.dar
  1,073,741,824 100%   39.44MB/s    0:00:25 (xfr#9, to-chk=53/63)


...

DB.PIWIGO.2016.10.12.sql.gz
             20 100%    0.03kB/s    0:00:00 (xfr#62, to-chk=0/63)
rsync: write failed on "/mnt/hubic/backup/biniou/2016.10/2016.10.01.FULL.1.dar": Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.1]
@mathieuruellan changed the title from "Weird behavior during rsync. failure" to "Weird behavior during rsync and failure" on Oct 12, 2016
@xlucas
Contributor

xlucas commented Oct 12, 2016

Hello, with hubiC you should try lowering the default segment_size mount option to something like 64 (MB).

What happens here is: once the uploaded size reaches 256 MB (the default segment size), SVFS moves the uploaded part to a segment container server-side and creates a manifest referring to it before continuing to transfer data. When this happens, you will see a noticeable pause in the upload: that's the time taken for the data to be copied elsewhere. Unfortunately, such operations are slow at the moment, so a timeout is reached before completion.

Lowering this value at mount time will help these operations complete faster.
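
For illustration, such a mount could use an fstab entry in the style of the one posted further down in this thread (credentials and paths are placeholders), with segment_size added to the options:

hubicfuse /mnt/hubic svfs hubic_auth=XXXXXXXXX,hubic_token=XXXXXXXX,container=default,allow_other,default_permissions,uid=mathieu,user,segment_size=64 0 0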

@xlucas added the usage label on Oct 12, 2016
@mathieuruellan
Author

That workaround is not really restrictive (on the customer side).
Thank you.

@romanrm

romanrm commented Oct 13, 2016

Still, it is strange that rsync proceeds at dozens of MB/s after getting the failure (and apparently copies into nowhere). I have encountered this too.

@mathieuruellan
Author

It's not better with segment_size=64 :(

Here is my fstab line:
hubicfuse /mnt/hubic svfs hubic_auth=XXXXXXXXX,hubic_token=XXXXXXXX,container=default,allow_other,default_permissions,uid=mathieu,user,extra_attr,segment_size=64 0 0

@xlucas
Contributor

xlucas commented Oct 21, 2016

What if you lower it even more? Does the problem disappear at some point, or not at all?

@romanrm

romanrm commented Oct 21, 2016

I would suggest also testing another approach, i.e. not using segments at all by setting segment_size to 5120 (and not uploading files larger than 5 GB).
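
For instance, reusing the fstab entry from the earlier comment (credentials are placeholders), that would look like:

hubicfuse /mnt/hubic svfs hubic_auth=XXXXXXXXX,hubic_token=XXXXXXXX,container=default,allow_other,default_permissions,uid=mathieu,user,extra_attr,segment_size=5120 0 0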

@mathieuruellan
Author

Thank you!
I will try both suggestions and give you feedback afterwards!

@mathieuruellan
Author

Well, I've been trying with segment_size set to 5120 since this morning, and no failure so far.

@mathieuruellan
Author

Failure after 20h of upload.

@xlucas
Contributor

xlucas commented Nov 7, 2016

Were your files all < 5 GB?

@mathieuruellan
Author

Yes! My backup files are split into 1 GB files.

@mathieuruellan
Author

mathieuruellan commented Nov 7, 2016

Since the last settings change, it's really better: it still crashes, but less often.

@xlucas
Contributor

xlucas commented Nov 22, 2016

At this point, I'm afraid the issue is more related to the current performance of the hubiC infrastructure than to a problem in svfs.

@ghost

ghost commented Nov 30, 2016

How can we get rsync to hard-fail in this situation? The I/O error makes rsync return an error exit code, but it still keeps pushing data to nowhere until there is nothing left to copy.

For example, if rsync is used in a script to copy 1 TB of data and an error occurs after copying 1 GB, it will not stop: it will keep copying the remaining 999 GB (which consumes time and I/O resources) to what is effectively /dev/null, before realising something went wrong and eventually trying again.

Does this come from rsync (which would need an option to stop properly) or from svfs?

@xlucas
Contributor

xlucas commented Jan 6, 2017

One way to get around this would be to poll for an "object moved" event after a COPY request is sent to Swift, and let the user control the polling timeout with an option.
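
A minimal sketch of that idea in Go, assuming a hypothetical segmentCopied helper (not svfs's actual API) that asks Swift whether the copied object is visible yet; the polling timeout and interval would be the user-controlled options:

package main

import (
	"errors"
	"fmt"
	"time"
)

var errCopyTimeout = errors.New("timed out waiting for segment copy")

// segmentCopied is a hypothetical helper: a real implementation would issue
// a HEAD request to Swift to check whether the copied object exists yet.
func segmentCopied(container, object string) (bool, error) {
	return true, nil // stub for illustration
}

// waitForCopy polls after a COPY request until the destination object shows
// up or the user-configurable pollTimeout expires.
func waitForCopy(container, object string, pollTimeout, pollInterval time.Duration) error {
	deadline := time.Now().Add(pollTimeout)
	for time.Now().Before(deadline) {
		ok, err := segmentCopied(container, object)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		time.Sleep(pollInterval)
	}
	return errCopyTimeout
}

func main() {
	// Example: wait up to 5 minutes, checking every 2 seconds (names are illustrative).
	if err := waitForCopy("default_segments", "backup/part-0001", 5*time.Minute, 2*time.Second); err != nil {
		fmt.Println("copy not confirmed:", err)
	}
}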

@xlucas
Contributor

xlucas commented Feb 28, 2017

The issue actually comes from the underlying Swift library SVFS uses.

In particular, SVFS exposes several options coming from this layer as mount options, such as connect_timeout, request_timeout, user_agent, etc.

It turns out that the (very) badly named connection timeout of this library is actually a timeout covering the entire request processing (from connecting to sending the request, getting a reply, etc.), while the equally badly named request timeout is only used while reading a request body.

So, for instance, passing connect_timeout=5m should fix the issue.
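
For example, appending it to the fstab entry shown earlier in this thread (credentials are placeholders) would give:

hubicfuse /mnt/hubic svfs hubic_auth=XXXXXXXXX,hubic_token=XXXXXXXX,container=default,allow_other,default_permissions,uid=mathieu,user,extra_attr,segment_size=64,connect_timeout=5m 0 0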

I'm working on submitting a PR to rename these timeouts and separate connection from processing.
In the meantime, the SVFS default for the connect_timeout mount option will be changed from 15s to 5m.
