Some suspected failings when a CN falls over #84

magnayn · 2016-06-17T15:52:58Z

I had a CN fall over, and this seemed to cause docker provisioning to fail:

docker run -ti ubuntu /bin/bash
Error response from daemon: (DockerNoComputeResourcesError) No compute resources available. (256b0080-34a0-11e6-8bb7-0d35b3585d39)

When I tried to update things just to check, that also failed:

Reprovisioning 3a628bd5-5594-4ae9-92df-5557d38a7a96 (hostvolume-LGW) inst to image cfa18754-2e06-11e6-80c0-f71606342a60
sdcadm experimental: error: sapi client error: socket hang up

I destroyed the server in adminui. Docker provisioning now works; however, updating still doesn't:

[root@headnode (Osney) ~]# sdcadm experimental update-docker --servers cns
"docker" VM already has a delegate dataset
Reprovisioning 3a628bd5-5594-4ae9-92df-5557d38a7a96 (hostvolume-LGW) inst to image cfa18754-2e06-11e6-80c0-f71606342a60
sdcadm experimental: error: sapi client error (ReprovisionFailedError): Server 44454c4c-4200-1039-8036-b1c04f345831 not found

Feels like there is some behaviour that assumes servers are always alive/up.

kusor · 2016-06-17T16:01:07Z

Hi @magnayn, which sdcadm version are you using? Hostvolumes are not needed anymore and should be removed by sdcadm experimental update-other when using a recent version of sdcadm. Updating docker should be done with just sdcadm update docker.

magnayn · 2016-06-17T16:06:58Z

[root@headnode (Osney) ~]# sdcadm --version
sdcadm 1.11.1 (release-20160428-20160428T183310Z-g04ea412)

doing a selfupdate to 1.11.2 fixes it. D'oh!

magnayn · 2016-06-17T16:09:08Z

Hm, that said, after upgrading and running sdcadm post-setup docker:

[root@headnode (Osney) ~]# sdcadm update docker

/opt/smartdc/sdcadm/lib/sdcadm.js:720
vm.server_uuid].hostname;
^
TypeError: Cannot read property 'hostname' of undefined
at Object.fillOutVmInsts as func
at Object._onImmediate (/opt/smartdc/sdcadm/node_modules/vasync/lib/vasync.js:213:20)
at processImmediate as _immediateCallback
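
The TypeError above is what an unguarded lookup produces when a VM's server_uuid no longer matches any known server. A minimal sketch of a defensive lookup follows. The names hostnameForVm and serversByUuid are hypothetical and chosen for illustration only; this is not sdcadm's actual code, and the '-' placeholder for a missing server is an assumption.

```javascript
// Hypothetical helper, not sdcadm's actual code: resolve a VM's server
// hostname without assuming the server record still exists.
function hostnameForVm(vm, serversByUuid) {
    var server = serversByUuid[vm.server_uuid];
    // When a CN has been destroyed, the lookup yields undefined, so fall
    // back to a placeholder instead of reading .hostname off undefined.
    return server ? server.hostname : '-';
}

// Illustrative server map holding only the headnode from this thread.
var servers = {
    '44454c4c-3400-1058-8038-b4c04f365831': { hostname: 'headnode' }
};

// A VM on the headnode resolves normally.
console.log(hostnameForVm(
    { server_uuid: '44454c4c-3400-1058-8038-b4c04f365831' }, servers));
// A VM whose server was reported as "not found" gets the placeholder.
console.log(hostnameForVm(
    { server_uuid: '44454c4c-4200-1039-8036-b1c04f345831' }, servers));
```

Whether a placeholder, a warning, or a hard error is the right behaviour depends on the caller; the point is only that the missing-server case is handled explicitly.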

kusor · 2016-06-17T16:13:22Z

I think that's because you removed the server from AdminUI without deleting the instances from SAPI. Would you mind telling me the output of:

sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=hostvolume|json -H 0.uuid) | json -H

kusor · 2016-06-17T16:14:48Z

Also, same thing for docker service, please:

sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=docker|json -H 0.uuid) | json -H

magnayn · 2016-06-17T16:16:02Z

Ah - ok - I wasn't aware I needed to do non-adminui stuff

[root@headnode (Osney) ~]# sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=hostvolume|json -H 0.uuid) | json -H
[
{
"uuid": "2d011ac4-45dd-4935-836c-310762cf2e2a",
"service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
"params": {
"alias": "hostvolume-headnode",
"server_uuid": "44454c4c-3400-1058-8038-b4c04f365831"
},
"type": "vm"
},
{
"uuid": "3a628bd5-5594-4ae9-92df-5557d38a7a96",
"service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
"params": {
"alias": "hostvolume-LGW",
"server_uuid": "44454c4c-4200-1039-8036-b1c04f345831"
},
"type": "vm"
},
{
"uuid": "2873f5ac-6dd1-4750-9e9f-667e3e23d41d",
"service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
"params": {
"alias": "hostvolume-SFO",
"server_uuid": "44454c4c-5700-1043-8033-c2c04f4c5631"
},
"type": "vm"
},
{
"uuid": "9b67fab4-4830-4bde-91e1-f33e6c2d2946",
"service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
"params": {
"alias": "hostvolume-JFK",
"server_uuid": "35383339-3637-435a-3231-323930303747"
},
"type": "vm"
},
{
"uuid": "34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c",
"service_uuid": "a34730e9-ca7c-47ab-b1bc-b7f5fdd42a39",
"params": {
"alias": "hostvolume-LHR",
"server_uuid": "44454c4c-4c00-104d-804e-c6c04f46354a"
},
"type": "vm"
}
]

magnayn · 2016-06-17T16:16:55Z

[root@headnode (Osney) ~]# sdc-sapi /instances?service_uuid=$(sdc-sapi /services?name=docker|json -H 0.uuid) | json -H
[
{
"uuid": "da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8",
"service_uuid": "9b7490de-1e13-4d93-a8c6-01cb23938fb0",
"params": {
"alias": "docker0",
"delegate_dataset": true,
"server_uuid": "44454c4c-3400-1058-8038-b4c04f365831"
},
"type": "vm"
}
]
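
To spot SAPI instances that point at a server that no longer exists, the two listings can be cross-checked. The sketch below is an illustration only: findOrphanedInstances is a hypothetical helper, the instance records are trimmed copies of the hostvolume output above, and the list of known servers (headnode, JFK, LHR) is an assumption based on which servers still appear elsewhere in this thread. In practice that list would come from CNAPI.

```javascript
// Hypothetical helper: return SAPI instances whose params.server_uuid
// is set but does not match any known server UUID.
function findOrphanedInstances(instances, knownServerUuids) {
    var known = new Set(knownServerUuids);
    return instances.filter(function (inst) {
        var serverUuid = inst.params && inst.params.server_uuid;
        return Boolean(serverUuid) && !known.has(serverUuid);
    });
}

// Trimmed copies of the hostvolume instances listed in this thread.
var instances = [
    { uuid: '2d011ac4-45dd-4935-836c-310762cf2e2a',
      params: { alias: 'hostvolume-headnode',
                server_uuid: '44454c4c-3400-1058-8038-b4c04f365831' } },
    { uuid: '3a628bd5-5594-4ae9-92df-5557d38a7a96',
      params: { alias: 'hostvolume-LGW',
                server_uuid: '44454c4c-4200-1039-8036-b1c04f345831' } },
    { uuid: '2873f5ac-6dd1-4750-9e9f-667e3e23d41d',
      params: { alias: 'hostvolume-SFO',
                server_uuid: '44454c4c-5700-1043-8033-c2c04f4c5631' } },
    { uuid: '9b67fab4-4830-4bde-91e1-f33e6c2d2946',
      params: { alias: 'hostvolume-JFK',
                server_uuid: '35383339-3637-435a-3231-323930303747' } },
    { uuid: '34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c',
      params: { alias: 'hostvolume-LHR',
                server_uuid: '44454c4c-4c00-104d-804e-c6c04f46354a' } }
];

// Assumed set of servers that still exist (headnode, JFK, LHR).
var knownServers = [
    '44454c4c-3400-1058-8038-b4c04f365831',
    '35383339-3637-435a-3231-323930303747',
    '44454c4c-4c00-104d-804e-c6c04f46354a'
];

findOrphanedInstances(instances, knownServers).forEach(function (inst) {
    console.log(inst.params.alias, inst.uuid);
});
```

Under those assumptions the sketch flags hostvolume-LGW and hostvolume-SFO, which are the instances a cleanup would need to deal with.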

kusor · 2016-06-17T16:18:14Z

Ok, run: sdcadm experimental update-other in order to get rid of the hostvolume instances - or, at least, to see if anything there is failing.

Then, sdc-cnapi /servers/44454c4c-3400-1058-8038-b4c04f365831 to see what's the status of that server, please

magnayn · 2016-06-17T16:21:03Z

Hmm - there is an error recorded in the update

[root@headnode (Osney) ~]# sdcadm experimental update-other
Update "sdc" SAPI app metadata_schema
Set "docker" service "metadata.SERVICE_DOMAIN"
Adding domain keys to "sdc" SAPI app metadata: {"DOCKER_SERVICE":"docker.Osney.allocatesoftware.com","docker_domain":"docker.Osney.allocatesoftware.com"}
Running VMAPI migrations
Removing deprecated hostvolume instances

sdcadm experimental: error: socket hang up

[root@headnode (Osney) ~]# sdc-cnapi /servers/44454c4c-3400-1058-8038-b4c04f365831
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 12764
Date: Fri, 17 Jun 2016 16:19:27 GMT
Server: Compute Node API
x-request-id: e8afbbb7-7a15-4e7c-8279-77c0603e309a
x-response-time: 8
x-server-name: ae76109a-80a0-4a15-a1c4-4cd7ec84858b
Connection: keep-alive

{
"agents": [
{
"name": "firewaller",
"version": "1.4.0",
"image_uuid": "318eb24f-e44a-4271-99fa-14ba1b397c32"
},
{
"name": "cainstsvc",
"version": "0.0.3vrelease-20160428-20160428T180942Z-gc110a12",
"image_uuid": "7625072f-8701-4cb5-b014-65d57f1e39e6"
},
{
"name": "hagfish-watcher",
"version": "1.1.0-release-20160428-20160428T183307Z-geb1d34a",
"image_uuid": "10609cb7-244c-47fd-9642-35d2ab04199d"
},
{
"name": "cabase",
"version": "1.0.3vrelease-20160428-20160428T180942Z-gc110a12",
"image_uuid": "c910dd54-d6dd-4144-bd78-5b38b70b7ba6"
},
{
"name": "marlin",
"version": "0.0.3",
"image_uuid": "6a476594-3f7c-41ed-8be9-78aa17111642"
},
{
"name": "smartlogin",
"version": "0.3.0-release-20160428-20160428T183259Z-g381e99f",
"image_uuid": "3ef411da-e6cd-46b3-b920-bc2599060af3"
},
{
"name": "config-agent",
"version": "1.5.0",
"image_uuid": "b782f289-9f98-4488-a047-060969410b16"
},
{
"name": "net-agent",
"version": "1.3.0",
"image_uuid": "cd375ca4-5e14-40e5-9264-a2ad68029021",
"uuid": "6e79bed7-b019-4c3e-bab1-ca4e7470aa28"
},
{
"name": "agents_core",
"version": "2.1.0",
"image_uuid": "d4b784f1-3d7c-4ece-a3d7-4ddcf6c4c8b0"
},
{
"name": "cn-agent",
"version": "1.5.2",
"image_uuid": "a3900067-98ff-4288-9dc4-0835f1d8f767",
"uuid": "377e9319-3225-470c-8731-07b982e52f9a"
},
{
"name": "amon-relay",
"version": "1.0.1",
"image_uuid": "30ded3e7-ea50-4637-8902-23e18903eba6"
},
{
"name": "vm-agent",
"version": "1.5.0",
"image_uuid": "fec9f401-5921-4025-a555-8d6f7cff8da1",
"uuid": "a1dc76a3-f87a-4da1-bde4-c960bd966cab"
},
{
"name": "amon-agent",
"version": "1.0.1",
"image_uuid": "27e19fa9-d693-4d28-aaa0-d6057dfec8b1"
}
],
"datacenter": "Osney",
"overprovision_ratio": 1,
"reservation_ratio": 0.15,
"reservoir": false,
"traits": {},
"rack_identifier": "",
"comments": "",
"uuid": "44454c4c-3400-1058-8038-b4c04f365831",
"reserved": false,
"vms": {
"91d346f7-f7c1-4688-bf3c-0c8235ec6ffd": {
"uuid": "91d346f7-f7c1-4688-bf3c-0c8235ec6ffd",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 128,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 100,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"ae2ad258-1a3c-4e81-a4e8-be92a31debe6": {
"uuid": "ae2ad258-1a3c-4e81-a4e8-be92a31debe6",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 512,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 200,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"0940ef38-3a34-46de-bed7-e5ec9cf443a0": {
"uuid": "0940ef38-3a34-46de-bed7-e5ec9cf443a0",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"f8054311-3c83-4253-8db8-86d0792c7a86": {
"uuid": "f8054311-3c83-4253-8db8-86d0792c7a86",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 50,
"max_physical_memory": 2048,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"a27c54e5-d36d-40bb-84a6-9eeed610d904": {
"uuid": "a27c54e5-d36d-40bb-84a6-9eeed610d904",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 8192,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"59bc2cae-25c1-4c30-b20f-27719f5673b1": {
"uuid": "59bc2cae-25c1-4c30-b20f-27719f5673b1",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"03687fd1-bdfa-48d7-8054-b84fbc3ce233": {
"uuid": "03687fd1-bdfa-48d7-8054-b84fbc3ce233",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"9bb0607c-6829-490a-8f02-81555471cb3b": {
"uuid": "9bb0607c-6829-490a-8f02-81555471cb3b",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 8192,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"f354c285-02ee-4e8a-9518-6237c8378138": {
"uuid": "f354c285-02ee-4e8a-9518-6237c8378138",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 8192,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"7d0ddd86-a441-4ffc-a30f-2d05d3866d8d": {
"uuid": "7d0ddd86-a441-4ffc-a30f-2d05d3866d8d",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"ed869a9c-bd30-42dc-b069-b0ae56acad28": {
"uuid": "ed869a9c-bd30-42dc-b069-b0ae56acad28",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"5bc500b4-adf7-462d-9906-3006cd34a699": {
"uuid": "5bc500b4-adf7-462d-9906-3006cd34a699",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"56cac0d5-6f59-40f2-acff-ef861c4bb5a5": {
"uuid": "56cac0d5-6f59-40f2-acff-ef861c4bb5a5",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 256,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 150,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1": {
"uuid": "38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 2048,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"66d1c94a-c32d-4f9b-9fb5-075c705cdd03": {
"uuid": "66d1c94a-c32d-4f9b-9fb5-075c705cdd03",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 500,
"max_physical_memory": 768,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 250,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"ae76109a-80a0-4a15-a1c4-4cd7ec84858b": {
"uuid": "ae76109a-80a0-4a15-a1c4-4cd7ec84858b",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"fa761d28-1596-4b0c-a9a4-3807a72b25df": {
"uuid": "fa761d28-1596-4b0c-a9a4-3807a72b25df",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 128,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 100,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"97d39281-e759-41a9-8a1e-41b0b09698f1": {
"uuid": "97d39281-e759-41a9-8a1e-41b0b09698f1",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"ac4fdf09-1206-4fa8-8d04-0f6ff878744d": {
"uuid": "ac4fdf09-1206-4fa8-8d04-0f6ff878744d",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"85f47220-e710-4523-bf5e-6af458a47610": {
"uuid": "85f47220-e710-4523-bf5e-6af458a47610",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 4096,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"3263c53a-1be1-496b-ad70-95825dfaac57": {
"uuid": "3263c53a-1be1-496b-ad70-95825dfaac57",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"54452da7-4159-48fc-ad89-4a0e58d0ea33": {
"uuid": "54452da7-4159-48fc-ad89-4a0e58d0ea33",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 2048,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"0238ac40-ec99-4c02-83b3-650559000cca": {
"uuid": "0238ac40-ec99-4c02-83b3-650559000cca",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 1024,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 300,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8": {
"uuid": "da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 25,
"max_physical_memory": 4096,
"zone_state": "running",
"state": "running",
"brand": "joyent-minimal",
"cpu_cap": 400,
"last_modified": "2016-06-17T15:14:19.000Z"
},
"10137a24-5785-6aef-c46b-e5027f956f85": {
"uuid": "10137a24-5785-6aef-c46b-e5027f956f85",
"owner_uuid": "00000000-0000-0000-0000-000000000000",
"quota": 10,
"max_physical_memory": 5120,
"zone_state": "installed",
"state": "stopped",
"brand": "kvm",
"last_modified": "2016-06-10T08:24:33.000Z"
},
"2ad788ff-d3d6-4eb7-b789-c7fc66e26a83": {
"uuid": "2ad788ff-d3d6-4eb7-b789-c7fc66e26a83",
"owner_uuid": "930896af-bf8c-48d4-885c-6573a94b1853",
"quota": 10,
"max_physical_memory": 4352,
"zone_state": "installed",
"state": "stopped",
"brand": "kvm",
"cpu_cap": 400,
"last_modified": "2016-06-10T08:23:38.000Z"
},
"6356aa55-5881-e1ed-f2bf-a03ba2623cb0": {
"uuid": "6356aa55-5881-e1ed-f2bf-a03ba2623cb0",
"owner_uuid": "00000000-0000-0000-0000-000000000000",
"quota": 10,
"max_physical_memory": 5120,
"zone_state": "running",
"state": "running",
"brand": "kvm",
"last_modified": "2016-06-17T15:39:55.000Z"
}
},
"boot_platform": "20160505T114610Z",
"boot_params": {
"rabbitmq": "guest:guest:rabbitmq.Osney.allocatesoftware.com:5672"
},
"kernel_flags": {},
"default_console": "serial",
"serial": "ttyb",
"created": "2016-05-06T14:28:40.000Z",
"sysinfo": {
"Live Image": "20160505T114610Z",
"System Type": "SunOS",
"Boot Time": "1466177784",
"Datacenter Name": "Osney",
"SDC Version": "7.0",
"Manufacturer": "Dell Inc.",
"Product": "Precision T1650",
"Serial Number": "44X86X1",
"SKU Number": "",
"HW Version": "01",
"HW Family": "",
"Setup": "true",
"VM Capable": true,
"CPU Type": "Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz",
"CPU Virtualization": "vmx",
"CPU Physical Cores": 1,
"UUID": "44454c4c-3400-1058-8038-b4c04f365831",
"Hostname": "headnode",
"CPU Total Cores": 8,
"MiB of Memory": "32722",
"Zpool": "zones",
"Zpool Disks": "c1t0d0,c1t1d0",
"Zpool Profile": "striped",
"Zpool Creation": 1462544920,
"Zpool Size in GiB": 1798,
"Disks": {
"c1t0d0": {
"Size in GB": 256
},
"c1t1d0": {
"Size in GB": 2000
}
},
"Boot Parameters": {
"console": "vga",
"vga_mode": "115200,8,n,1,-",
"headnode": "true"
},
"SDC Agents": [
{
"name": "firewaller",
"version": "1.4.0"
},
{
"name": "cainstsvc",
"version": "0.0.3vrelease-20160428-20160428T180942Z-gc110a12"
},
{
"name": "hagfish-watcher",
"version": "1.1.0-release-20160428-20160428T183307Z-geb1d34a"
},
{
"name": "cabase",
"version": "1.0.3vrelease-20160428-20160428T180942Z-gc110a12"
},
{
"name": "marlin",
"version": "0.0.3"
},
{
"name": "smartlogin",
"version": "0.3.0-release-20160428-20160428T183259Z-g381e99f"
},
{
"name": "config-agent",
"version": "1.5.0"
},
{
"name": "net-agent",
"version": "1.3.0"
},
{
"name": "agents_core",
"version": "2.1.0"
},
{
"name": "cn-agent",
"version": "1.5.2"
},
{
"name": "amon-relay",
"version": "1.0.1"
},
{
"name": "vm-agent",
"version": "1.5.0"
},
{
"name": "amon-agent",
"version": "1.0.1"
}
],
"Network Interfaces": {
"e1000g0": {
"MAC Address": "90:b1:1c:7a:cd:0e",
"ip4addr": "10.20.4.1",
"Link Status": "up",
"NIC Names": [
"admin",
"external"
]
}
},
"Virtual Network Interfaces": {
"external0": {
"MAC Address": "02:08:20:ee:7e:6f",
"ip4addr": "10.20.2.1",
"Link Status": "up",
"Host Interface": "e1000g0",
"VLAN": "0"
}
},
"Link Aggregations": {}
},
"ram": 32722,
"hostname": "headnode",
"status": "running",
"headnode": true,
"current_platform": "20160505T114610Z",
"setup": true,
"last_boot": "2016-06-17T15:36:24.000Z",
"last_heartbeat": "2016-06-17T16:19:26.066Z",
"memory_available_bytes": 18606661632,
"memory_arc_bytes": 7620202088,
"memory_total_bytes": 34302623744,
"memory_provisionable_bytes": -40628440269,
"disk_cores_quota_bytes": 2899102924800,
"disk_cores_quota_used_bytes": 19053072384,
"disk_installed_images_used_bytes": 291095281664,
"disk_kvm_quota_bytes": 32212254720,
"disk_kvm_quota_used_bytes": 3331371008,
"disk_kvm_zvol_used_bytes": 292282540032,
"disk_kvm_zvol_volsize_bytes": 283451064320,
"disk_pool_alloc_bytes": 503308369920,
"disk_pool_size_bytes": 1992864825344,
"disk_system_used_bytes": -118947184640,
"disk_zone_quota_bytes": 1181116006400,
"disk_zone_quota_used_bytes": 16493289472,
"transitional_status": "",
"score": 0,
"overprovision_ratios": {
"ram": 1,
"disk": 1,
"cpu": 4
},
"unreserved_cpu": 0,
"unreserved_ram": -38754,
"unreserved_disk": 176727
}

kusor · 2016-06-17T16:22:33Z

Can you get the output of sdcadm health please?

magnayn · 2016-06-17T16:23:08Z

hmm

[root@headnode (Osney) ~]# sdcadm health

/opt/smartdc/sdcadm/lib/sdcadm.js:720
vm.server_uuid].hostname;
^
TypeError: Cannot read property 'hostname' of undefined
at Object.fillOutVmInsts as func
at Object._onImmediate (/opt/smartdc/sdcadm/node_modules/vasync/lib/vasync.js:213:20)
at processImmediate as _immediateCallback
[root@headnode (Osney) ~]#

kusor · 2016-06-17T16:25:18Z

Try sdc-role list as an alternative approach, please.

magnayn · 2016-06-17T16:28:29Z

In this list, the server for hostvolume-SFO is the one that was destroyed in adminui:

[root@headnode (Osney) ~]# sdc-role list
ALIAS           SERVER    UUID                                  RAM   STATE    ROLE        ADMIN_IP
adminui0        headnode  54452da7-4159-48fc-ad89-4a0e58d0ea33  2048  running  adminui     10.20.4.25
amon0           headnode  7d0ddd86-a441-4ffc-a30f-2d05d3866d8d  1024  running  amon        10.20.4.19
amonredis0      headnode  59bc2cae-25c1-4c30-b20f-27719f5673b1  1024  running  amonredis   10.20.4.17
assets0         headnode  91d346f7-f7c1-4688-bf3c-0c8235ec6ffd  128   running  assets      10.20.4.2
binder0         headnode  0940ef38-3a34-46de-bed7-e5ec9cf443a0  1024  running  binder      10.20.4.5
ca0             headnode  85f47220-e710-4523-bf5e-6af458a47610  4096  running  ca          10.20.4.24
cloudapi0       headnode  0238ac40-ec99-4c02-83b3-650559000cca  1024  running  cloudapi    10.20.4.32
cnapi0          headnode  ae76109a-80a0-4a15-a1c4-4cd7ec84858b  1024  running  cnapi       10.20.4.16
dhcpd0          headnode  fa761d28-1596-4b0c-a9a4-3807a72b25df  128   running  dhcpd       10.20.4.3
docker0         headnode  da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8  4096  running  docker      10.20.4.33
fwapi0          headnode  97d39281-e759-41a9-8a1e-41b0b09698f1  1024  running  fwapi       10.20.4.20
hostvolume-JFK  JFK       9b67fab4-4830-4bde-91e1-f33e6c2d2946  4096  running  hostvolume  -
hostvolume-LHR  LHR       34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c  4096  running  hostvolume  -
hostvolume-SFO  -         2873f5ac-6dd1-4750-9e9f-667e3e23d41d  4096  running  hostvolume  -
imgapi0         headnode  66d1c94a-c32d-4f9b-9fb5-075c705cdd03  768   running  imgapi      10.20.4.15
mahi0           headnode  3263c53a-1be1-496b-ad70-95825dfaac57  1024  running  mahi        10.20.4.27
manatee0        headnode  f8054311-3c83-4253-8db8-86d0792c7a86  2048  running  manatee     10.20.4.10
moray0          headnode  a27c54e5-d36d-40bb-84a6-9eeed610d904  8192  running  moray       10.20.4.11
napi0           headnode  56cac0d5-6f59-40f2-acff-ef861c4bb5a5  256   running  napi        10.20.4.4
papi0           headnode  5bc500b4-adf7-462d-9906-3006cd34a699  1024  running  papi        10.20.4.23
rabbitmq0       headnode  38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1  2048  running  rabbitmq    10.20.4.14
redis0          headnode  03687fd1-bdfa-48d7-8054-b84fbc3ce233  1024  running  redis       10.20.4.18
sapi0           headnode  ae2ad258-1a3c-4e81-a4e8-be92a31debe6  512   running  sapi        10.20.4.26
sdc0            headnode  ed869a9c-bd30-42dc-b069-b0ae56acad28  1024  running  sdc         10.20.4.22
ufds0           headnode  9bb0607c-6829-490a-8f02-81555471cb3b  8192  running  ufds        10.20.4.12
vmapi0          headnode  ac4fdf09-1206-4fa8-8d04-0f6ff878744d  1024  running  vmapi       10.20.4.21
win             headnode  2ad788ff-d3d6-4eb7-b789-c7fc66e26a83  4096  stopped  -           -
workflow0       headnode  f354c285-02ee-4e8a-9518-6237c8378138  8192  running  workflow    10.20.4.13

kusor · 2016-06-17T16:29:47Z

Would you mind running sdc-vmapi /vms/2873f5ac-6dd1-4750-9e9f-667e3e23d41d -X DELETE and then trying again?

magnayn · 2016-06-17T16:33:29Z

check-health and 'sdcadm update docker' are now happy:

[root@headnode (Osney) ~]# sdcadm update docker
Finding candidate update images for the "docker" service.
Using channel dev
Up-to-date.

experimental update-other still fails, but I don't know if that is significant:

[root@headnode (Osney) ~]# sdcadm experimental update-other
Running VMAPI migrations
Removing deprecated hostvolume instances
sdcadm experimental: error: socket hang up
[root@headnode (Osney) ~]#

FWIW
[root@headnode (Osney) ~]# sdcadm check-health
INSTANCE                              SERVICE          HOSTNAME  ALIAS           HEALTHY
54452da7-4159-48fc-ad89-4a0e58d0ea33  adminui          headnode  adminui0        true
7d0ddd86-a441-4ffc-a30f-2d05d3866d8d  amon             headnode  amon0           true
59bc2cae-25c1-4c30-b20f-27719f5673b1  amonredis        headnode  amonredis0      true
91d346f7-f7c1-4688-bf3c-0c8235ec6ffd  assets           headnode  assets0         true
0940ef38-3a34-46de-bed7-e5ec9cf443a0  binder           headnode  binder0         true
85f47220-e710-4523-bf5e-6af458a47610  ca               headnode  ca0             true
0238ac40-ec99-4c02-83b3-650559000cca  cloudapi         headnode  cloudapi0       true
ae76109a-80a0-4a15-a1c4-4cd7ec84858b  cnapi            headnode  cnapi0          true
fa761d28-1596-4b0c-a9a4-3807a72b25df  dhcpd            headnode  dhcpd0          true
da61b9d5-75b1-4f6f-8cd5-973c2f11d8b8  docker           headnode  docker0         true
97d39281-e759-41a9-8a1e-41b0b09698f1  fwapi            headnode  fwapi0          true
9b67fab4-4830-4bde-91e1-f33e6c2d2946  hostvolume       JFK       hostvolume-JFK  true
34a1a9e8-4bc0-4915-9b19-8ad1a40fba3c  hostvolume       LHR       hostvolume-LHR  true
66d1c94a-c32d-4f9b-9fb5-075c705cdd03  imgapi           headnode  imgapi0         true
3263c53a-1be1-496b-ad70-95825dfaac57  mahi             headnode  mahi0           true
f8054311-3c83-4253-8db8-86d0792c7a86  manatee          headnode  manatee0        true
a27c54e5-d36d-40bb-84a6-9eeed610d904  moray            headnode  moray0          true
56cac0d5-6f59-40f2-acff-ef861c4bb5a5  napi             headnode  napi0           true
5bc500b4-adf7-462d-9906-3006cd34a699  papi             headnode  papi0           true
38bbf9a3-e6d2-4844-8d59-a03b83b5cbc1  rabbitmq         headnode  rabbitmq0       true
03687fd1-bdfa-48d7-8054-b84fbc3ce233  redis            headnode  redis0          true
ae2ad258-1a3c-4e81-a4e8-be92a31debe6  sapi             headnode  sapi0           true
ed869a9c-bd30-42dc-b069-b0ae56acad28  sdc              headnode  sdc0            true
9bb0607c-6829-490a-8f02-81555471cb3b  ufds             headnode  ufds0           true
ac4fdf09-1206-4fa8-8d04-0f6ff878744d  vmapi            headnode  vmapi0          true
f354c285-02ee-4e8a-9518-6237c8378138  workflow         headnode  workflow0       true
44454c4c-3400-1058-8038-b4c04f365831  global           headnode  global          true
                                      amon-agent       JFK       -               true
                                      amon-agent       LHR       -               true
                                      amon-agent       headnode  -               true
                                      amon-relay       JFK       -               true
                                      amon-relay       LHR       -               true
                                      amon-relay       headnode  -               true
                                      cainstsvc        JFK       -               true
                                      cainstsvc        LHR       -               true
                                      cainstsvc        headnode  -               true
d1dae7cf-a701-4cf4-8658-5a76d372bb0d  cn-agent         JFK       -               true
6ec37ab5-cc89-4d14-830e-5a5da7925703  cn-agent         LHR       -               true
377e9319-3225-470c-8731-07b982e52f9a  cn-agent         headnode  -               true
caed3d7b-b768-4f5e-b6f2-86b180a299d2  dockerlogger     JFK       -               true
d2bea47e-237a-40d4-8e58-75b324068a76  dockerlogger     LHR       -               true
7d4480ee-d69d-4690-870d-1a8517a98977  dockerlogger     headnode  -               true
                                      firewaller       JFK       -               true
                                      firewaller       LHR       -               true
                                      firewaller       headnode  -               true
                                      hagfish-watcher  JFK       -               true
                                      hagfish-watcher  LHR       -               true
                                      hagfish-watcher  headnode  -               true
d0d82395-bbde-4c00-9754-d0b6d68b4f8b  net-agent        JFK       -               true
020f1746-f240-409a-9da5-aa369bb245d1  net-agent        LHR       -               true
6e79bed7-b019-4c3e-bab1-ca4e7470aa28  net-agent        headnode  -               true
                                      smartlogin       JFK       -               true
                                      smartlogin       LHR       -               true
                                      smartlogin       headnode  -               true
fb83cb60-6770-4672-b7c6-e89b9868d332  vm-agent         JFK       -               true
904a1b16-fa44-4830-a9e2-b83eeb40bce5  vm-agent         LHR       -               true
a1dc76a3-f87a-4da1-bde4-c960bd966cab  vm-agent         headnode  -               true

kusor · 2016-06-17T16:37:10Z

It looks like something isn't OK with either VMAPI or SAPI there. Could you take a look at those services' logs, inside their VMs, and see if there's any error?

magnayn · 2016-06-20T11:46:58Z

I had to leave for the weekend; on coming back I found:

svc:/manta/application/binder:default (Joyent DNS-ZooKeeper Service)
State: maintenance since Sun Jun 19 13:37:59 2016
Reason: Restarting too quickly.

The log was empty, so I've restarted it.

magnayn closed this as completed Jun 17, 2016

magnayn reopened this Jun 17, 2016
