
"IsAlive" for drivers goes red after a while #279

Closed
gonewest818 opened this issue Jun 30, 2015 · 9 comments


@gonewest818

As the heading says: we are running the controller and drivers as Docker containers on OpenStack VMs. Overnight the entire system is usually idle (we are just ramping up with COSBench, trying to understand the system, learning how to construct synthetic workloads, and so on). By morning the controller shows all its drivers as "red". There is no indication of any error in the logs on either the controller or the drivers. We can recover by restarting the controller, but it seems something is wrong.

@gonewest818

By the way, I scaled up to 32 drivers and now the controller is losing contact much more quickly. Within a few hours all the "IsAlive" indicators are red. I have log_level=DEBUG on the controller and all drivers, but nothing is reported in any of the logs to indicate what's wrong. The drivers are responding to hits from my browser, too.

I am running the drivers in Docker, and they are launched in such a way that port 18088 inside the container is not necessarily seen as 18088 from the outside. But I am adjusting the configs accordingly and like I said, for some period of time after restarting the controller everything runs fine.
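For illustration, the Docker port mapping described above can be made explicit at launch time. This is a hypothetical sketch: the image name and host port 18188 are placeholders, and the `conf/controller.conf` entry follows the format shown in the COSBench user guide.

```shell
# Publish the driver's internal port 18088 as 18188 on the host
# ("my-cosbench-driver" is a placeholder image name):
docker run -d -p 18188:18088 my-cosbench-driver

# The controller must then be pointed at the *published* host port,
# e.g. in conf/controller.conf:
#   [driver1]
#   name = driver1
#   url = http://<host-ip>:18188/driver
```

If the controller is still configured with the container-internal port, its heartbeat connects to the wrong endpoint even though the driver itself is healthy.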


ywang19 commented Jul 9, 2015

Do you have an HTTP proxy set before running COSBench? It's suggested to "unset http_proxy" before running COSBench.

@gonewest818

We're running the COSBench controller and drivers inside Docker containers, on a Mesos cluster, via Marathon. There is definitely some network address and port mapping due to Docker. I'd prefer not to eliminate Docker from our test setup, though; Mesos is how we run our production services.

I was hoping that raising the logging level to DEBUG would tell me more about the failure but there is no logging implemented in that part of the code base.


@gonewest818

Specifically answering your question: no, we do not have http_proxy set before running cosbench.


ywang19 commented Jul 10, 2015

So, could you list the information you expect to see at the DEBUG level?


@gonewest818

Well, for example, information to assist with debugging.

On the controller side:

  • timestamp for each attempt to ping a driver (what hostname/ip address and port, etc)
  • whether the "ping" was successful or not
  • if failure, then what was the error (connection dropped, no route to host, timed out, ...)
  • if retries are attempted, then how many and when

And logging from the driver side:

  • timestamp for each inbound "ping" detected (from what hostname/ip address)
  • and what response was returned
  • in the case of an error, what the error was
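A controller-side heartbeat with the logging requested above could look roughly like the sketch below. This is Python for illustration only (COSBench itself is Java), and `ping_driver`, its retry count, and the logger name are invented for this example, not COSBench APIs.

```python
import logging
import socket
import time

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("heartbeat")

def ping_driver(host, port, timeout=5.0, retries=3):
    """TCP-connect to a driver, logging every attempt and failure.

    Returns (True, None) on success, or (False, last_error) once
    all retries are exhausted.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        # Timestamped record of each attempt, with target host/port.
        log.debug("pinging %s:%d (attempt %d/%d)", host, port, attempt, retries)
        try:
            # A successful connect is the "green" case.
            with socket.create_connection((host, port), timeout=timeout):
                log.debug("ping %s:%d succeeded", host, port)
                return True, None
        except OSError as exc:
            # Connection refused, no route to host, timeout, and so on.
            last_error = exc
            log.warning("ping %s:%d failed: %s", host, port, exc)
            time.sleep(0.1)  # brief pause before retrying
    return False, last_error
```

With logs like these, an overnight "red" transition would at least show when the controller started failing to connect and with which error class.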

@gonewest818

I'm working on this myself. As far as I can see, the controller's "ping" just connects to the driver hostnames and ports one at a time. I'm logging each attempt and, if an exception is thrown, logging the exception.


ywang19 commented Jul 20, 2015

Thanks for your work; feel free to contribute your code.

-yaguang


ywang19 pushed a commit that referenced this issue Aug 6, 2015
Signed-off-by: ywang19 <yaguang.wang@intel.com>

ywang19 commented Aug 7, 2015

See the upcoming 0.4.2.c3 release for the fix.

ywang19 closed this as completed Aug 7, 2015
ywang19 pushed a commit that referenced this issue Jan 27, 2016
Signed-off-by: ywang19 <yaguang.wang@intel.com>