New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce JobsObserver interface and place both JobStats and error notification behind that interface #436
Conversation
…tification behind that interface
@@ -100,8 +102,7 @@ class JobManagementResource @Inject()(val jobScheduler: JobScheduler, | |||
} | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deleted job notification will no longer be sent. No notification is sent in deregisterJob. We should still send the notification here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deregisterJob calls jobsObserver.onEvent(JobRemoved(job)), I just need to handle that in NotifyingJobObserver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, could you add that so that this notification is still sent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see it was just added. Thanks!
import com.google.inject.Inject | ||
import org.apache.mesos.chronos.scheduler.jobs._ | ||
import org.joda.time.{DateTimeZone, DateTime} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the naming JobNotificationObserver
class JobNotificationObserver @Inject()(val notificationClients: List[ActorRef] = List(), | ||
val clusterName: Option[String] = None) extends JobsObserver { | ||
private[this] val log = Logger.getLogger(getClass.getName) | ||
val clusterPrefix = clusterName.map(name => s"[$name]").getOrElse("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's evaluate clusterPrefix inside JobRetriesExhausted (only where it is used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also used in JobRemoved case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes sense
message=Some(taskStatus.getMessage), | ||
attempt=Some(attempt), | ||
isFailure=Some(true)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have a more descriptive variable name other than j?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jobOrName?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, another option is jobInfo or jobNameOrJob.
Great work, @azakkerman! I think use of an common interface for JobStats and JobNotifications is much cleaner. Thanks for addressing my comments. |
…bserver does not handle a particular event
LGTM. @brndnmtthws will also be reviewing this since this is a significant PR. |
import org.joda.time.Period | ||
import org.specs2.mock._ | ||
|
||
object MockJobUtils extends Mockito { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test!
This looks great to me. Before we merge, can you please test this in a cluster? I think we can use the test-suite for that purpose. @elingg can you help with that? |
@brndnmtthws, yes, I will test in my own cluster. One thing is that we haven't had a chance to hook up Chronos with Cassandra in the test cluster, and I'm not sure we will get to it today. Do you think we should get that up before merging? @azakkerman, could you also try testing in a cluster as well? @kensipe should be able to help you with set up as well. |
It's pretty easy, just point it at C* (using Mesos-DNS names). You'll have to create the keyspace manually, but we should really have it happen within Chronos ( |
ok @brndnmtthws I will be sure test with Cassandra before merging it. If @azakkerman also has a chance to test earlier than I do, that would be great. |
I am trying to bring up DCOS in AWS with Ken's help to try testing it. On Wed, Apr 29, 2015 at 10:37 AM, Elizabeth Lingg notifications@github.com
|
I have a DCOS cluster running in AWS with Cassandra deployed, but don't quite know how to configure chronos to run against that Cassandra instance. Who is a good resource for this? |
Hello @azakkerman, you will have to run your version on Chronos with the cassandra options enabled and simply point to cassandra-dcos-node.cassandra.dcos.mesos for the cassandra hostname. The PORT on DCOS is 9160. You will also need to create the keyspace as Brenden mentioned above: |
If you would like to discuss over chat since I am WFH today, maybe you could ask @kensipe to add you to our slack channel. We can talk via IM. I will also be in the office tomorrow. |
Here are the Cassandra options for Chronos: "cassandra_consistency”: Consistency to use for Cassandra (default = ANY). “cassandra_contact_points”: Comma separated list of contact points for Cassandra. “cassandra_keyspace”: Keyspace to use for Cassandra (default = metrics). “cassandra_port”: Port for Cassandra (default = 9042). “cassandra_table”: Table to use for Cassandra (default = chronos). “cassandra_ttl”: TTL for records written to Cassandra (default = 31536000). |
I was able to launch chronos with my custom jar, and write to cassandra. I see rows in cassandra corresponding to my test job instances. What sort of tests should I perform? |
cqlsh> select * from metrics.chronos id | ts | attempt | is_failure | job_name | job_owner | job_parents | job_schedule | message | slave_id | task_state |
Nice job! Can you see the job history in the UI? |
Looking good! Can you try a job that takes longer to run to confirm you can see some stats? Here is an example of what the UI should look like: #402 |
I ran a sleep job that exits 1 in the end and verified that the email notification still works. Also deleted a job and got some emails. What specific stats are you looking for? |
Notification emails are also a great test. If you run a job that runs longer than 0 or 1 sec (let's say sleep 5), you should see stats like in the first screen shot in this link: #402 |
After that, we should have tested sufficiently! |
Very possible that it is a UI bug. Anyway your test cases sound great! I think we are good to merge. Any final objections @brndnmtthws? |
Wonderful! |
Introduce JobsObserver interface and place both JobStats and error notification behind that interface
I ran all tests and they seem to pass.