
"IsAlive" for drivers goes red after a while #279

Closed
gonewest818 opened this issue Jun 30, 2015 · 9 comments


@gonewest818

As the heading says: we are running the controller and drivers as Docker containers on OpenStack VMs. Overnight the entire system is usually idle (we are just ramping up with COSBench, trying to understand the system, learning how to construct synthetic workloads, and so on). By morning the controller shows all its drivers as "red". There is no indication of any error in the logs on either the controller or the drivers. We can recover by restarting the controller, but it seems something is wrong.

@gonewest818

By the way, I scaled up to 32 drivers and now the controller is losing contact much more quickly. Within a few hours all the "IsAlive" indicators are red. I have log_level=DEBUG on the controller and all drivers, but nothing is reported in any of the logs to indicate what's wrong. The drivers are responding to hits from my browser, too.

I am running the drivers in Docker, and they are launched in such a way that port 18088 inside the container is not necessarily seen as 18088 from the outside. But I am adjusting the configs accordingly and like I said, for some period of time after restarting the controller everything runs fine.
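For illustration, the Docker port mapping described above can be made explicit at launch time. This is a hypothetical sketch: the image name and host port 18188 are placeholders, and the `conf/controller.conf` entry follows the format shown in the COSBench user guide.

```shell
# Publish the driver's internal port 18088 as 18188 on the host
# ("my-cosbench-driver" is a placeholder image name):
docker run -d -p 18188:18088 my-cosbench-driver

# The controller must then be pointed at the *published* host port,
# e.g. in conf/controller.conf:
#   [driver1]
#   name = driver1
#   url = http://<host-ip>:18188/driver
```

If the controller is still configured with the container-internal port, its heartbeat connects to the wrong endpoint even though the driver itself is healthy.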


ywang19 commented Jul 9, 2015

Do you have an HTTP proxy set before running COSBench? It's suggested to "unset http_proxy" before running COSBench.

@gonewest818

We're running the COSBench controller and drivers inside Docker containers, on a Mesos cluster, via Marathon. There is definitely some network address and port mapping due to Docker. I'd prefer not to eliminate Docker from our test setup, though; Mesos is how we run our production services.

I was hoping that raising the logging level to DEBUG would tell me more about the failure but there is no logging implemented in that part of the code base.


@gonewest818

Specifically answering your question: no, we do not have http_proxy set before running cosbench.


ywang19 commented Jul 10, 2015

So, could you list the information you expect to see at the DEBUG level?


@gonewest818

Well, for example, information to assist with debugging.

On the controller side:

  • timestamp for each attempt to ping a driver (what hostname/ip address and port, etc)
  • whether the "ping" was successful or not
  • if failure, then what was the error (connection dropped, no route to host, timed out, ...)
  • if retries are attempted, then how many and when

And logging from the driver side:

  • timestamp for each inbound "ping" detected (from what hostname/ip address)
  • and what response was returned
  • in the case of an error, what the error was
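A controller-side heartbeat with the logging requested above could look roughly like the sketch below. This is Python for illustration only (COSBench itself is Java), and `ping_driver`, its retry count, and the logger name are invented for this example, not COSBench APIs.

```python
import logging
import socket
import time

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("heartbeat")

def ping_driver(host, port, timeout=5.0, retries=3):
    """TCP-connect to a driver, logging every attempt and failure.

    Returns (True, None) on success, or (False, last_error) once
    all retries are exhausted.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        # Timestamped record of each attempt, with target host/port.
        log.debug("pinging %s:%d (attempt %d/%d)", host, port, attempt, retries)
        try:
            # A successful connect is the "green" case.
            with socket.create_connection((host, port), timeout=timeout):
                log.debug("ping %s:%d succeeded", host, port)
                return True, None
        except OSError as exc:
            # Connection refused, no route to host, timeout, and so on.
            last_error = exc
            log.warning("ping %s:%d failed: %s", host, port, exc)
            time.sleep(0.1)  # brief pause before retrying
    return False, last_error
```

With logs like these, an overnight "red" transition would at least show when the controller started failing to connect and with which error class.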

@gonewest818

I'm working on this myself. As far as I can see, the controller's "ping" just connects to the driver hostnames and ports one at a time. I'm logging each attempt and, if an exception is thrown, logging the exception.


ywang19 commented Jul 20, 2015

Thanks for your work; feel free to contribute your code.

-yaguang


ywang19 pushed a commit that referenced this issue Aug 6, 2015
Signed-off-by: ywang19 <yaguang.wang@intel.com>

ywang19 commented Aug 7, 2015

See the upcoming 0.4.2.c3 release for the fix.

ywang19 closed this as completed Aug 7, 2015
ywang19 pushed a commit that referenced this issue Jan 27, 2016
Signed-off-by: ywang19 <yaguang.wang@intel.com>