AWS EBS stuck attachments DOS libstorage server. #773

Open
Nomon opened this issue Mar 17, 2017 · 12 comments

Nomon commented Mar 17, 2017

Summary

When the Docker driver on host A tries to attach an EBS volume with a corrupted filesystem (corrupted superblock), the attachment never transitions from the attaching to the attached state. This causes VolumeDriver.Mount on that host to return a 500 error when the attachment request to the libStorage server times out (1 minute). The libStorage server responds with the running task info along with the 408. From that moment on, every other EBS operation on the REX-Ray server times out and responds with a 408 and a new task ID. Each call effectively does nothing more than create a task on the server that is queued for execution but never executed (because the server ranges over an unbuffered channel to execute tasks in sequence).

I have run into this issue several times; the corrupted volume prevents even functional volumes from attaching to other hosts. The /tasks API on the server returns all the tasks that keep piling up.

Next time this happens I will get a dump of the goroutines to see where they are blocking, but I am fairly confident it is due to the serial execution of tasks and the ebs driver's waitVolumeComplete, which blocks forever because the volume in EBS is stuck in the attaching state. There are multiple cases where volumes on AWS get stuck in the attaching state; the AWS docs recommend rebooting the instance as the initial fix.
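
To illustrate the failure mode, here is a minimal, hypothetical Go sketch (not the actual libStorage task runner; the task struct and channel setup are assumptions for illustration): a single goroutine ranges over an unbuffered channel and executes tasks strictly in sequence, so one task that never returns starves every task queued behind it.

package main

import (
	"fmt"
	"time"
)

// task models a unit of work queued by an API call such as volumeAttach.
type task struct {
	id  int
	run func()
}

func main() {
	// Unbuffered channel: a sender blocks until the runner receives its task.
	tasks := make(chan task)

	// Single runner goroutine: tasks execute strictly one after another.
	go func() {
		for t := range tasks {
			t.run() // if this call never returns, no later task ever starts
		}
	}()

	// Task 0 stands in for an attach whose waitVolumeComplete never finishes
	// because the EBS volume is stuck in the "attaching" state.
	tasks <- task{id: 0, run: func() { select {} }}

	// Every later request queues a task that is never executed; each sender
	// also blocks forever on the unbuffered channel.
	for i := 1; i <= 3; i++ {
		i := i
		go func() {
			tasks <- task{id: i, run: func() { fmt.Println("task", i, "done") }}
		}()
	}

	time.Sleep(2 * time.Second)
	fmt.Println("no task output: task 0 is still blocking the runner")
}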

Version

REX-Ray
-------
Binary: /usr/bin/rexray
Flavor: client+agent+controller
SemVer: 0.8.1
OsArch: Linux-x86_64
Branch: v0.8.1
Commit: 30e9082dd9917f0d218ca981f886d701110ce5f5
Formed: Sat, 25 Feb 2017 03:00:28 UTC

libStorage
----------
SemVer: 0.5.1
OsArch: Linux-x86_64
Branch: v0.8.1
Commit: 35c7b6d96d5f17aa0c0379924615ae22c1ad3d45
Formed: Sat, 25 Feb 2017 02:59:00 UTC

Expected Behavior

A failing mount on a single host should not prevent other hosts from using different volumes.

Steps To Reproduce

I have a few corrupted EBS volumes that do this consistently. If you are unable to reproduce it, let me know and I will create an IAM policy allowing your AWS account to mount the volumes in question.

Here are the relevant kernel ring buffer entries from the host running the Docker integration, which uses the libStorage server to do the attachment:

[Fri Mar 17 12:24:22 2017] blk_update_request: 55 callbacks suppressed
[Fri Mar 17 12:24:22 2017] end_request: I/O error, dev xvdh, sector 4456448
[Fri Mar 17 12:24:22 2017] Buffer I/O error on device xvdh, logical block 557056
[Fri Mar 17 12:24:22 2017] lost page write due to I/O error on xvdh
[Fri Mar 17 12:24:22 2017] JBD2: Error -5 detected when updating journal superblock for xvdh-8.
[Fri Mar 17 12:24:22 2017] Aborting journal on device xvdh-8.
[Fri Mar 17 12:24:22 2017] end_request: I/O error, dev xvdh, sector 4456448
[Fri Mar 17 12:24:22 2017] Buffer I/O error on device xvdh, logical block 557056
[Fri Mar 17 12:24:22 2017] lost page write due to I/O error on xvdh
[Fri Mar 17 12:24:22 2017] JBD2: Error -5 detected when updating journal superblock for xvdh-8.
[Fri Mar 17 12:24:22 2017] end_request: I/O error, dev xvdh, sector 0
[Fri Mar 17 12:24:22 2017] Buffer I/O error on device xvdh, logical block 0
[Fri Mar 17 12:24:22 2017] lost page write due to I/O error on xvdh
[Fri Mar 17 12:24:22 2017] EXT4-fs error (device xvdh): ext4_put_super:797: Couldn't clean up the journal
[Fri Mar 17 12:24:22 2017] EXT4-fs (xvdh): Remounting filesystem read-only
[Fri Mar 17 12:24:22 2017] EXT4-fs (xvdh): previous I/O error to superblock detected
[Fri Mar 17 12:24:22 2017] end_request: I/O error, dev xvdh, sector 0
[Fri Mar 17 12:24:22 2017] Buffer I/O error on device xvdh, logical block 0
[Fri Mar 17 12:24:22 2017] lost page write due to I/O error on xvdh
[Fri Mar 17 12:24:22 2017] block xvdh: releasing disk

Here is the log of the rexray-server timing out the attachment:

time="2017-03-17T12:24:22Z" level=info msg="http request" host="tcp://:7979" route=volumesForService server=orchid-dutchess-tj time=1489753462812 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added route middleware" host="tcp://:7979" middleware=schema-validator route=volumesForService server=orchid-dutchess-tj time=1489753462812 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added route middleware" host="tcp://:7979" middleware=storage-session-handler route=volumesForService server=orchid-dutchess-tj time=1489753462812 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added route middleware" host="tcp://:7979" middleware=service-validator route=volumesForService server=orchid-dutchess-tj time=1489753462812 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=transaction-handler route=volumesForService server=orchid-dutchess-tj time=1489753462812 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=local-devices-handler route=volumesForService server=orchid-dutchess-tj time=1489753462812 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=instanceIDs-handler route=volumesForService server=orchid-dutchess-tj time=1489753462812 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=error-handler route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=transaction-handler route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=logging-handler route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=query-params-handler route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="query param" host="tcp://:7979" key=attachments len(value)=1 route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false value=[0] 
time="2017-03-17T12:24:22Z" level=debug msg="http header" Libstorage-Tx="txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d, txCR=1489753462" host="tcp://:7979" route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false 
time="2017-03-17T12:24:22Z" level=debug msg="http header" Libstorage-Instanceid=[ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2] host="tcp://:7979" route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=debug msg="http header" Libstorage-Localdevices=[ebs=/dev/xvda::/dev/xvda,/dev/xvdf::/dev/xvdf,/dev/xvdg::/dev/xvdg,/dev/xvdh::/dev/xvdh] host="tcp://:7979" route=volumesForService server=orchid-dutchess-tj time=1489753462813 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=debug msg="getting storage service" host="tcp://:7979" route=volumesForService server=orchid-dutchess-tj service=ebs time=1489753462813 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=debug msg="using cached ebs service" cacheKey=0058cb369d1c07f0131ecebb8f5a16ec 
time="2017-03-17T12:24:22Z" level=debug msg="getting task service" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumesForService server=orchid-dutchess-tj service=ebs storageDriver=ebs time=1489753462814 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=info msg="executing task" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumesForService server=orchid-dutchess-tj service=ebs storageDriver=ebs task=0 time=1489753462814 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=debug msg="querying volumes" attachments=0 host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumesForService server=orchid-dutchess-tj service=ebs storageDriver=ebs task=0 time=1489753462814 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=debug msg="getting task service" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumesForService server=orchid-dutchess-tj service=ebs storageDriver=ebs time=1489753462816 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=debug msg="skipping response schema validation; disabled" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumesForService server=orchid-dutchess-tj service=ebs storageDriver=ebs task=0 time=1489753462983 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=debug msg="task completed" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumesForService server=orchid-dutchess-tj service=ebs storageDriver=ebs task=0 time=1489753462983 tls=false txCR=1489753462 txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d 
time="2017-03-17T12:24:22Z" level=info msg="172.18.2.223 - - [17/Mar/2017:12:24:22 +0000] \"GET /volumes/ebs?attachments=0 HTTP/1.1\" 200 26302" 
time="2017-03-17T12:24:22Z" level=info 
time="2017-03-17T12:24:22Z" level=info msg="    -------------------------- HTTP REQUEST (SERVER) --------------------------" 
time="2017-03-17T12:24:22Z" level=info msg="    GET /volumes/ebs?attachments=0 HTTP/1.1" 
time="2017-03-17T12:24:22Z" level=info msg="    Host: 172.18.2.138:7979" 
time="2017-03-17T12:24:22Z" level=info msg="    Accept-Encoding: gzip" 
time="2017-03-17T12:24:22Z" level=info msg="    Libstorage-Instanceid: ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" 
time="2017-03-17T12:24:22Z" level=info msg="    Libstorage-Localdevices: ebs=/dev/xvda::/dev/xvda,/dev/xvdf::/dev/xvdf,/dev/xvdg::/dev/xvdg,/dev/xvdh::/dev/xvdh" 
time="2017-03-17T12:24:22Z" level=info msg="    Libstorage-Tx: txID=eda54ed0-b1ce-4ac6-7b36-9de6f108359d, txCR=1489753462" 
time="2017-03-17T12:24:22Z" level=info msg="    User-Agent: Go-http-client/1.1" 
time="2017-03-17T12:24:22Z" level=info msg="    " 
time="2017-03-17T12:24:22Z" level=info msg="    -------------------------- HTTP RESPONSE (SERVER) -------------------------" 
time="2017-03-17T12:24:22Z" level=info msg="    Content-Type=application/json" 
time="2017-03-17T12:24:22Z" level=info 
time="2017-03-17T12:24:22Z" level=info msg="    {" 

... lots of EBS volumes
... second request with attachments=1 and loads of volumes in response

Attach volume call:
time="2017-03-17T12:24:23Z" level=info msg="http request" host="tcp://:7979" route=volumeAttach server=orchid-dutchess-tj time=1489753463819 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added route middleware" host="tcp://:7979" middleware=post-args-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463819 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added route middleware" host="tcp://:7979" middleware=schema-validator route=volumeAttach server=orchid-dutchess-tj time=1489753463819 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added route middleware" host="tcp://:7979" middleware=storage-session-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added route middleware" host="tcp://:7979" middleware=service-validator route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=transaction-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=local-devices-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=instanceIDs-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=error-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=transaction-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=logging-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463820 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="added global middleware" host="tcp://:7979" middleware=query-params-handler route=volumeAttach server=orchid-dutchess-tj time=1489753463821 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="query param" host="tcp://:7979" key=attach len(value)=1 route=volumeAttach server=orchid-dutchess-tj time=1489753463821 tls=false value=[] 
time="2017-03-17T12:24:23Z" level=debug msg="http header" Libstorage-Tx="txID=cb4adca0-188d-4e42-6974-0d7adf2f8366, txCR=1489753463" host="tcp://:7979" route=volumeAttach server=orchid-dutchess-tj time=1489753463821 tls=false 
time="2017-03-17T12:24:23Z" level=debug msg="http header" Libstorage-Instanceid=[ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2] host="tcp://:7979" route=volumeAttach server=orchid-dutchess-tj time=1489753463821 tls=false txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="http header" Libstorage-Localdevices=[ebs=/dev/xvda::/dev/xvda,/dev/xvdf::/dev/xvdf,/dev/xvdg::/dev/xvdg] host="tcp://:7979" route=volumeAttach server=orchid-dutchess-tj time=1489753463821 tls=false txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="getting storage service" host="tcp://:7979" route=volumeAttach server=orchid-dutchess-tj service=ebs time=1489753463821 tls=false txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="using cached ebs service" cacheKey=0058cb369d1c07f0131ecebb8f5a16ec 
time="2017-03-17T12:24:23Z" level=debug msg="validating schema" body="{\"force\":true,\"nextDeviceName\":\"/dev/xvdh\"}" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumeAttach schema="{\n    \"$schema\": \"http://json-schema.org/draft-04/schema#\",\n    \"$ref\": \"https://github.com/codedellemc/libstorage#/definitions/volumeAttachRequest\"\n}" server=orchid-dutchess-tj service=ebs storageDriver=ebs time=1489753463822 tls=false txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="getting task service" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumeAttach server=orchid-dutchess-tj service=ebs storageDriver=ebs time=1489753463827 tls=false txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=info msg="executing task" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumeAttach server=orchid-dutchess-tj service=ebs storageDriver=ebs task=0 time=1489753463827 tls=false txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="getting task service" host="tcp://:7979" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" route=volumeAttach server=orchid-dutchess-tj service=ebs storageDriver=ebs time=1489753463830 tls=false txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 

..... other calls from other hosts succeeding (that are not tasks, like GET /services etc)

time="2017-03-17T12:25:23Z" level=info msg="172.18.2.223 - - [17/Mar/2017:12:25:23 +0000] \"POST /volumes/ebs/vol-fc35744e?attach HTTP/1.1\" 408 89" 
time="2017-03-17T12:25:23Z" level=info 
time="2017-03-17T12:25:23Z" level=info msg="    -------------------------- HTTP REQUEST (SERVER) --------------------------" 
time="2017-03-17T12:25:23Z" level=info msg="    POST /volumes/ebs/vol-fc35744e?attach HTTP/1.1" 
time="2017-03-17T12:25:23Z" level=info msg="    Host: 172.18.2.138:7979" 
time="2017-03-17T12:25:23Z" level=info msg="    Accept-Encoding: gzip" 
time="2017-03-17T12:25:23Z" level=info msg="    Content-Length: 43" 
time="2017-03-17T12:25:23Z" level=info msg="    Libstorage-Instanceid: ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" 
time="2017-03-17T12:25:23Z" level=info msg="    Libstorage-Localdevices: ebs=/dev/xvda::/dev/xvda,/dev/xvdf::/dev/xvdf,/dev/xvdg::/dev/xvdg" 
time="2017-03-17T12:25:23Z" level=info msg="    Libstorage-Tx: txID=cb4adca0-188d-4e42-6974-0d7adf2f8366, txCR=1489753463" 
time="2017-03-17T12:25:23Z" level=info msg="    User-Agent: Go-http-client/1.1" 
time="2017-03-17T12:25:23Z" level=info msg="    " 
time="2017-03-17T12:25:23Z" level=info msg="    {\"force\":true,\"nextDeviceName\":\"/dev/xvdh\"}" 
time="2017-03-17T12:25:23Z" level=info msg="    -------------------------- HTTP RESPONSE (SERVER) -------------------------" 
time="2017-03-17T12:25:23Z" level=info msg="    Content-Type=application/json" 
time="2017-03-17T12:25:23Z" level=info 
time="2017-03-17T12:25:23Z" level=info msg="    {" 
time="2017-03-17T12:25:23Z" level=info msg="      \"id\": 0," 
time="2017-03-17T12:25:23Z" level=info msg="      \"queueTime\": 1489753463," 
time="2017-03-17T12:25:23Z" level=info msg="      \"startTime\": 1489753463," 
time="2017-03-17T12:25:23Z" level=info msg="      \"state\": \"running\"" 
time="2017-03-17T12:25:23Z" level=info msg="    }" 

... other services attaching volumes timing out, getting id:1, id:2 tasks that never complete. 

Logs from the node with the docker integration driver:

time="2017-03-17T12:24:23Z" level=debug msg="performing precautionary unmount" integrationDriver=linux osDriver=linux service=ebs storageDriver=libstorage time=1489753463594 
time="2017-03-17T12:24:23Z" level=debug msg="waiting on executor lock" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463647 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="invoking executor cli" args=[ebs nextDevice] cmd="/var/lib/libstorage/lsx-linux" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463648 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="signalling executor lock" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463735 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="xli nextdevice success" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463735 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="waiting on executor lock" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463736 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="invoking executor cli" args=[ebs localDevices quick] cmd="/var/lib/libstorage/lsx-linux" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463736 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="signalling executor lock" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463815 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=debug msg="xli localdevices success" instanceID="ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" integrationDriver=linux osDriver=linux server=orchid-dutchess-tj service=ebs storageDriver=libstorage time=1489753463815 txCR=1489753463 txID=cb4adca0-188d-4e42-6974-0d7adf2f8366 
time="2017-03-17T12:24:23Z" level=info 
time="2017-03-17T12:24:23Z" level=info msg="    -------------------------- HTTP REQUEST (CLIENT) -------------------------" 
time="2017-03-17T12:24:23Z" level=info msg="    POST /volumes/ebs/vol-fc35744e?attach HTTP/1.1" 
time="2017-03-17T12:24:23Z" level=info msg="    Host: 172.18.2.138:7979" 
time="2017-03-17T12:24:23Z" level=info msg="    Libstorage-Instanceid: ebs=i-0f67b3095e3f30328,availabilityZone=us-west-2a&region=us-west-2" 
time="2017-03-17T12:24:23Z" level=info msg="    Libstorage-Localdevices: ebs=/dev/xvda::/dev/xvda,/dev/xvdf::/dev/xvdf,/dev/xvdg::/dev/xvdg" 
time="2017-03-17T12:24:23Z" level=info msg="    Libstorage-Tx: txID=cb4adca0-188d-4e42-6974-0d7adf2f8366, txCR=1489753463" 
time="2017-03-17T12:24:23Z" level=info msg="    " 
time="2017-03-17T12:24:23Z" level=info msg="    {\"force\":true,\"nextDeviceName\":\"/dev/xvdh\"}" 
time="2017-03-17T12:25:23Z" level=info 
time="2017-03-17T12:25:23Z" level=info msg="    -------------------------- HTTP RESPONSE (CLIENT) -------------------------" 
time="2017-03-17T12:25:23Z" level=info msg="    HTTP/1.1 408 Request Timeout" 
time="2017-03-17T12:25:23Z" level=info msg="    Content-Length: 89" 
time="2017-03-17T12:25:23Z" level=info msg="    Content-Type: application/json" 
time="2017-03-17T12:25:23Z" level=info msg="    Date: Fri, 17 Mar 2017 12:25:23 GMT" 
time="2017-03-17T12:25:23Z" level=info msg="    Libstorage-Servername: orchid-dutchess-tj" 
time="2017-03-17T12:25:23Z" level=info msg="    " 
time="2017-03-17T12:25:23Z" level=info msg="    {" 
time="2017-03-17T12:25:23Z" level=info msg="      \"id\": 0," 
time="2017-03-17T12:25:23Z" level=info msg="      \"queueTime\": 1489753463," 
time="2017-03-17T12:25:23Z" level=info msg="      \"startTime\": 1489753463," 
time="2017-03-17T12:25:23Z" level=info msg="      \"state\": \"running\"" 
time="2017-03-17T12:25:23Z" level=info msg="    }" 
time="2017-03-17T12:25:23Z" level=error msg="/VolumeDriver.Mount: error mounting volume" error.id=0 error.queueTime=1.489753463e+09 error.startTime=1.489753463e+09 error.state=running time=1489753523828 

State of the volume in AWS (notice that it has been attaching for the past 3.5 hours or so):

nomon-mb:~ nomon$ date -u
Fri Mar 17 15:51:41 UTC 2017
nomon-mb:~ nomon$ aws ec2 describe-volumes --volume-id vol-fc35744e
{
    "Volumes": [
        {
            "AvailabilityZone": "us-west-2a", 
            "Attachments": [
                {
                    "AttachTime": "2017-03-17T12:24:30.000Z", 
                    "InstanceId": "i-0f67b3095e3f30328", 
                    "VolumeId": "vol-fc35744e", 
                    "State": "attaching", 
                    "DeleteOnTermination": false, 
                    "Device": "/dev/xvdh"
                }
            ], 
            "Tags": [
                {
                    "Value": "us-west-2-staging-rexray-challenge-cassandra-a-0", 
                    "Key": "Name"
                }
            ], 
            "Encrypted": false, 
            "VolumeType": "standard", 
            "VolumeId": "vol-fc35744e", 
            "State": "in-use", 
            "SnapshotId": "", 
            "CreateTime": "2016-05-26T10:50:09.866Z", 
            "Size": 5
        }
    ]
}

Server config:

ubuntu@ip-172-18-3-33:~$ nomad fs 7822f907 server/local/rexray.yml
rexray:
  logLevel: debug
  modules:
    default-docker:
      disabled: true
libstorage:
  host: tcp://127.0.0.1:7979
  embedded: true
  logging:
    level: debug
    httpRequests:  true
    httpResponses: true
  integration:
    volume:
      operations:
        remove:
          disable: true
        unmount:
          ignoreUsedCount: true
        mount:
          preempt: true
  client:
    type: controller
  server:
    endpoints:
      public:
        address: tcp://:7979
    services:
      ebs:
        driver: ebs
ebs:
  accessKey:      REDACTED
  secretKey:      REDACTED
  region:         us-west-2

Agent config:

ubuntu@ip-172-18-3-33:~$ nomad fs deaf0f2f agent/rexray.yml
rexray:
  logLevel: debug
  module:
    startTimeout: 30s
libstorage:
  host:     tcp://172.18.2.138:7979
  embedded: false
  service:  ebs
  device:
    attachTimeout: 60s
  integration:
    volume:
      operations:
        remove:
          disable: true
        unmount:
          ignoreUsedCount: true
        mount:
          preempt: true
  logging:
    level: debug
    httpRequests: true
    httpResponses: true

I did not upload all the logs because I am not sure whether they contain anything confidential; if you need the full logs for further analysis, I can sanitize and share the rest.

Nomon changed the title from "AWS EBS failed attachments DOS libstorage server." to "AWS EBS stuck attachments DOS libstorage server." on Mar 17, 2017
codenrhoden (Member) commented:

@Nomon Thanks for the detailed write-up. I will see if I can recreate this by purposefully corrupting the filesystem on an EBS volume. If not, I'll take you up on the offer to mount via IAM profile. :) From your config file, it looks like you are using the default filesystem, which is ext4, so I'll stick with that as well.

codenrhoden (Member) commented:

@Nomon After looking into this a bit, I believe I have identified the issue within the ebs driver itself. A fix for that should be pretty straightforward, and updates will be tracked against this issue.

Separately, I will probably file a second issue in order to track a fix that we can put in at the libStorage framework level to keep misbehaving drivers from triggering this outcome. Thanks again for the writeup.

For what it's worth, I was never able to get one of my EBS volumes to fail to attach. I tried corrupting the ext4 filesystem superblocks with dd, and I could always still attach the device (but not mount the corrupted filesystem w/o repair). I may still reach out to you for help on that in the future.

Nomon commented Apr 14, 2017

Some updates on the issue:
It's not actually a corruption issue; the errors come from an unclean EBS detach without an unmount first. These volumes only fail to attach to the specific instance/device-name combinations.

From what I have gathered, unclean detaches (without unmounting first inside the VM) render the device name (the Xen virtual device name in AWS HVM) unusable for the VM's lifetime (actually a reboot resets the state, so not strictly the VM's lifetime), there is no guest-VM workaround for it, and AWS is "working on one". Even though the guest OS does not see the device, trying to attach any volume to the same /dev/xvdXXX after an unclean detach will leave the volume in the PENDING attachment state forever; this is 100% reproducible. Kubernetes has accumulated several issues and merged workarounds for this AWS EBS behavior over a long period of time.

kubernetes/kubernetes#32630
kubernetes/kubernetes#31891
etc...

The consensus seems to be that the best "workaround" currently is to use the full /dev/xvd{b,c}[a-z] EBS device-name range (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html) for allocating device names, and that the name allocator should not select the first available device on the VM from that list. Instead it should walk the list, so that the first mount goes to /dev/xvdba and the second to /dev/xvdbb even if xvdba is no longer attached to the VM, and once it reaches /dev/xvdcz it should start again from /dev/xvdba, and so forth.

Using the full range of devices, setting preempt to false, and changing the allocator to walk the names instead of probing for the first available one should greatly mitigate the issue. I see that there is already new timeout code in place; if attach timeouts would also blacklist the instance_id:device_name pair, I think the issue would be mitigated about as well as it can be from inside the guest VM. In theory the VM could then continue attaching volumes until all names are blacklisted, which should not happen under normal operation.
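
As a rough illustration of the sequential allocation described above (hypothetical code; deviceAllocator and next are made-up names, not the actual ebs driver or the Kubernetes implementation), the allocator below walks the /dev/xvd[b-c][a-z] range with a cursor so that a recently freed, possibly ghosted, device name is not immediately reused:

package main

import "fmt"

// deviceAllocator hands out EBS device names from /dev/xvd[b-c][a-z] in a
// round-robin order, so a recently freed (and possibly "ghosted") name is
// not immediately reused.
type deviceAllocator struct {
	names  []string
	cursor int
}

func newDeviceAllocator() *deviceAllocator {
	var names []string
	for _, major := range []byte{'b', 'c'} {
		for minor := byte('a'); minor <= 'z'; minor++ {
			names = append(names, fmt.Sprintf("/dev/xvd%c%c", major, minor))
		}
	}
	return &deviceAllocator{names: names}
}

// next returns the next device name not currently in use, starting from the
// cursor position rather than from the beginning of the list.
func (a *deviceAllocator) next(inUse map[string]bool) (string, error) {
	for i := 0; i < len(a.names); i++ {
		name := a.names[(a.cursor+i)%len(a.names)]
		if !inUse[name] {
			a.cursor = (a.cursor + i + 1) % len(a.names)
			return name, nil
		}
	}
	return "", fmt.Errorf("no free device names")
}

func main() {
	alloc := newDeviceAllocator()
	inUse := map[string]bool{}

	d1, _ := alloc.next(inUse) // /dev/xvdba
	inUse[d1] = true
	d2, _ := alloc.next(inUse) // /dev/xvdbb
	fmt.Println(d1, d2)

	// Even if d1 is detached, the next allocation moves on to /dev/xvdbc
	// instead of reusing /dev/xvdba.
	delete(inUse, d1)
	d3, _ := alloc.next(inUse)
	fmt.Println(d3)
}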

Currently we run an embedded server on each node to work around pending attachments blocking all attachments, but this only reduces the failure domain from the whole cluster to a single node, and when scheduling even moderately sized workloads it easily leads to AWS API rate limiting, since all the independent agents use the API. A sequential device-name allocator would allow the failed node to make progress in both embedded and server-client setups instead of trying to attach to the same broken device name over and over.

Nomon commented May 22, 2017

With 0.9.0 the situation is much, much better: the server-client setup no longer gets stuck and the server keeps making progress. Once in a while a node will still end up in a state where some devices are unusable, but now the problem is isolated to that node. I would still like to see the whole /dev/xvd{b,c}[a-z] space used for EBS volumes instead of the 10 devices (/dev/xvd{g,h,i,j,k,l,m,n,o,p}) currently used.

akutz commented May 22, 2017

Hi @Nomon,

The device name logic (and limitation to 10 volumes) comes directly from Amazon itself:

[image: AWS documentation table of recommended device names for EBS volumes]

Both paravirtual and HVM instances, for EBS, restrict their recommendation to /dev/sd[f-p].

Nomon commented May 31, 2017

The recommended range is what it says: a recommendation. It is not large enough to attach the maximum of 40 volumes that AWS promises can be attached to an HVM instance, and more than that is possible on a best-effort basis. For example, Kubernetes uses the whole available range at /dev/xvd[b-c][a-z] in an implementation that minimizes device-node reuse to work around the stuck-attaching device nodes; a retried attachment receives a different device (the least-used device node on the instance):
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1310-L1337
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/device_allocator.go#L80-L92
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/device_allocator.go#L32-L41

miry commented Aug 31, 2017

@akutz I have the same issue. I could not attach more than one volume with Kubernetes.

Can you suggest a solution for how to fix it?
Can I do something at the configuration level?

clintkitson (Member) commented:

@miry Can you say more about why only a single volume can be attached? Are the previous device names already allocated?

miry commented Aug 31, 2017

@clintkitson I am trying to set up REX-Ray with Kubernetes, following this manual: https://rexray.readthedocs.io/en/stable/user-guide/schedulers/#kubernetes.
I created a sample pod with a FlexVolume => it worked like a charm.
Then, when I played a bit with Deployments and StatefulSets, I found that volumes could not be properly attached.

After a bit of debugging, I found this in the logs: Device /dev/xvdg is busy, status code 400. I checked the current device list and there is no such disk.

So I tried the following experiment:

On a new node I set up REX-Ray and the kubelet, and using rexray volume mount volumeid1 I tried to mount several disks. I got the same error after the first successful mount.

I did the same experiment without Kubernetes and it was OK; I could mount more than 4 disks.

It could be related to the fact that I created 3 PVs with FlexVolume and Kubernetes somehow reserved the disks, but that is a crazy idea.

Nomon commented Sep 1, 2017

@clintkitson It is the Xen device ghosting issue I mentioned earlier (Apr 14) in this thread: if a device is detached uncleanly from a Xen virtual device node, that node becomes unusable for the instance's lifetime even though the instance cannot see any device attached to it, and the device allocator will always return the same device. The PR referenced in this issue from villet mostly mitigates the issue by making the allocator use a random order when choosing the next device for attachment, so that after hitting a broken node the retried attachment goes to a new node.
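
For comparison with the sequential allocator sketched earlier, here is a minimal sketch of the random-order selection described above (illustrative only; randomFreeDevice is a made-up helper and the referenced PR's implementation may differ): it picks any currently unused name from the full range, so a retry after a failed attach is unlikely to land on the same ghosted device node.

package main

import (
	"fmt"
	"math/rand"
)

// randomFreeDevice picks an unused EBS device name from /dev/xvd[b-c][a-z]
// at random, so a retried attachment is unlikely to land on the same
// ghosted device node.
func randomFreeDevice(inUse map[string]bool) (string, error) {
	var free []string
	for _, major := range []byte{'b', 'c'} {
		for minor := byte('a'); minor <= 'z'; minor++ {
			name := fmt.Sprintf("/dev/xvd%c%c", major, minor)
			if !inUse[name] {
				free = append(free, name)
			}
		}
	}
	if len(free) == 0 {
		return "", fmt.Errorf("no free device names")
	}
	return free[rand.Intn(len(free))], nil
}

func main() {
	// /dev/xvdba is "ghosted"; a retry picks some other free name at random.
	inUse := map[string]bool{"/dev/xvdba": true}
	name, _ := randomFreeDevice(inUse)
	fmt.Println("next attachment target:", name)
}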

clintkitson (Member) commented:

@Nomon Ok thank you for the clarification.

clintkitson (Member) commented:

@miry Do you have the preempt feature turned on? The issue @Nomon is referring to has to do with devices being uncleanly detached from servers. If you are preempting a volume from host to host, this issue will occur. If preempt is off, a volume should not be uncleanly removed from a host except through operator intervention via a forced detach.
