Introduce JobsObserver interface and place both JobStats and error notification behind that interface #436

azakkerman · 2015-04-28T18:38:09Z

I ran all tests and they seem to pass.


elingg · 2015-04-28T19:01:43Z

src/main/scala/org/apache/mesos/chronos/scheduler/api/JobManagementResource.scala

@@ -100,8 +102,7 @@ class JobManagementResource @Inject()(val jobScheduler: JobScheduler,
            }
        }
      }


This deleted job notification will no longer be sent. No notification is sent in deregisterJob. We should still send the notification here.

deregisterJob calls jobsObserver.onEvent(JobRemoved(job)); I just need to handle that in NotifyingJobsObserver.

Sounds good. Could you add that so that this notification is still sent?

Ah, I see it was just added. Thanks!
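The flow discussed in this thread (the scheduler publishes a JobRemoved event from deregisterJob, and a notification observer turns it into a message) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Chronos code: `Job`, `RecordingNotificationObserver` and `Scheduler` are hypothetical stand-ins, and the observer records message strings instead of sending them to notification actors.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-ins for the Chronos job and event types.
final case class Job(name: String)

sealed trait JobEvent
final case class JobRemoved(job: Job) extends JobEvent
final case class JobRetriesExhausted(job: Job, attempt: Int) extends JobEvent

// The observer interface: anything that reacts to job lifecycle events.
trait JobsObserver {
  def onEvent(event: JobEvent): Unit
}

// Hypothetical notification observer; it records messages instead of
// dispatching them, so the behavior is easy to check.
class RecordingNotificationObserver extends JobsObserver {
  val sent: ListBuffer[String] = ListBuffer.empty[String]

  def onEvent(event: JobEvent): Unit = event match {
    case JobRemoved(job) => sent += s"Job '${job.name}' deleted"
    case _               => () // other events are ignored in this sketch
  }
}

// Hypothetical scheduler: deregisterJob does not notify anyone directly;
// it only publishes a JobRemoved event to the observer.
class Scheduler(jobsObserver: JobsObserver) {
  private var jobs = Map.empty[String, Job]

  def registerJob(job: Job): Unit = jobs += job.name -> job

  def deregisterJob(job: Job): Unit = {
    jobs -= job.name
    jobsObserver.onEvent(JobRemoved(job))
  }
}
```

Keeping the notification logic in the observer means the resource layer and the scheduler no longer need to know who is interested in a deleted job.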

elingg · 2015-04-28T20:08:47Z

src/main/scala/org/apache/mesos/chronos/notification/NotifyingJobsObserver.scala

+import com.google.inject.Inject
+import org.apache.mesos.chronos.scheduler.jobs._
+import org.joda.time.{DateTimeZone, DateTime}
+


I would prefer the name JobNotificationObserver.

elingg · 2015-04-28T20:40:13Z

src/main/scala/org/apache/mesos/chronos/notification/JobNotificationObserver.scala

+class JobNotificationObserver @Inject()(val notificationClients: List[ActorRef] = List(),
+                                      val clusterName: Option[String] = None) extends JobsObserver {
+  private[this] val log = Logger.getLogger(getClass.getName)
+  val clusterPrefix = clusterName.map(name => s"[$name]").getOrElse("")


Let's evaluate clusterPrefix inside the JobRetriesExhausted case (the only place it is used).

It is also used in the JobRemoved case.

OK, makes sense.
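Because both the JobRemoved and JobRetriesExhausted cases need it, computing clusterPrefix once per observer instance is reasonable. A minimal sketch, assuming simplified event types; `NotificationFormatter` and the subject-line wording are hypothetical and not taken from Chronos:

```scala
// Hypothetical, simplified stand-ins for the Chronos job and event types.
final case class Job(name: String)

sealed trait JobEvent
final case class JobRemoved(job: Job) extends JobEvent
final case class JobRetriesExhausted(job: Job, attempt: Int) extends JobEvent

// Hypothetical formatter; the real observer builds full notification messages.
class NotificationFormatter(clusterName: Option[String]) {
  // Evaluated once per instance, since both cases below use it.
  val clusterPrefix: String = clusterName.map(name => s"[$name]").getOrElse("")

  // Avoids a leading space when no cluster name is configured.
  private def withPrefix(text: String): String =
    if (clusterPrefix.isEmpty) text else s"$clusterPrefix $text"

  def subject(event: JobEvent): String = event match {
    case JobRemoved(job) =>
      withPrefix(s"Job '${job.name}' was deleted")
    case JobRetriesExhausted(job, attempt) =>
      withPrefix(s"Job '${job.name}' failed after $attempt attempts")
  }
}
```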

elingg · 2015-04-28T21:23:18Z

src/main/scala/org/apache/mesos/chronos/scheduler/jobs/stats/JobStats.scala

-            message=Some(taskStatus.getMessage),
-            attempt=Some(attempt),
-            isFailure=Some(true))
-  }


Can we have a more descriptive variable name than j?

Sure; another option is jobInfo or jobNameOrJob.

elingg · 2015-04-28T21:28:43Z

Great work, @azakkerman! I think using a common interface for JobStats and JobNotifications is much cleaner. Thanks for addressing my comments.
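A later commit in this PR refactors JobsObserver into a PartialFunction and logs at INFO level whenever an observer does not handle an event. One way such a common interface could fan a single event out to several observers (for example a stats observer and a notification observer) is sketched below. The names `Observer` and `composite` and the event types are hypothetical; this is an illustration of the idea, not the code from the PR.

```scala
import java.util.logging.Logger

// Hypothetical, simplified stand-ins for the Chronos job and event types.
final case class Job(name: String)

sealed trait JobEvent
final case class JobStarted(job: Job) extends JobEvent
final case class JobFinished(job: Job) extends JobEvent
final case class JobRemoved(job: Job) extends JobEvent

object JobsObserver {
  // In this sketch an observer is just a partial function over job events:
  // the cases it defines are the events it handles.
  type Observer = PartialFunction[JobEvent, Unit]

  private val log = Logger.getLogger(getClass.getName)

  // Delivers each event to every observer. Observers that are not defined
  // for the event are skipped, and the skip is logged at INFO level.
  def composite(observers: List[Observer]): JobEvent => Unit = { event =>
    observers.foreach { observer =>
      if (observer.isDefinedAt(event)) observer(event)
      else log.info(s"Observer does not handle event $event")
    }
  }
}
```

With this shape, adding another consumer of job events means adding one more partial function to the list rather than another direct call from the scheduler.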

elingg · 2015-04-28T23:20:29Z

LGTM. @brndnmtthws will also be reviewing this since this is a significant PR.

brndnmtthws · 2015-04-29T13:36:29Z

src/test/scala/org/apache/mesos/chronos/scheduler/jobs/MockJobUtils.scala

+import org.joda.time.Period
+import org.specs2.mock._
+
+object MockJobUtils extends Mockito {


Thanks for adding the test!

brndnmtthws · 2015-04-29T13:37:20Z

This looks great to me. Before we merge, can you please test this in a cluster? I think we can use the test-suite for that purpose. @elingg can you help with that?

elingg · 2015-04-29T16:04:57Z

@brndnmtthws, yes, I will test in my own cluster. One thing is that we haven't had a chance to hook up Chronos with Cassandra in the test cluster, and I'm not sure we will get to it today. Do you think we should get that up before merging?

@azakkerman, could you also try testing in a cluster? @kensipe should be able to help you with the setup.

brndnmtthws · 2015-04-29T16:20:10Z

It's pretty easy, just point it at C* (using Mesos-DNS names). You'll have to create the keyspace manually, but we should really have it happen within Chronos (CREATE KEYSPACE IF NOT EXISTS [derp] WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };).
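Having Chronos create the keyspace itself would mostly mean building this statement from configuration and executing it once at startup. A minimal sketch of the statement-building half, assuming the keyspace name and replication factor come from configuration; `KeyspaceBootstrap` is hypothetical, and executing the statement through a Cassandra session is not shown. The name check is a conservative assumption (letters, digits and underscores, starting with a letter) so that arbitrary text is never interpolated into CQL.

```scala
// Hypothetical helper; not part of Chronos.
object KeyspaceBootstrap {
  // Conservative check for an unquoted CQL identifier.
  private val UnquotedIdentifier = "[A-Za-z][A-Za-z0-9_]*"

  def createKeyspaceStatement(keyspace: String, replicationFactor: Int): String = {
    require(keyspace.matches(UnquotedIdentifier), s"invalid keyspace name: $keyspace")
    require(replicationFactor > 0, "replication factor must be positive")
    s"CREATE KEYSPACE IF NOT EXISTS $keyspace WITH REPLICATION = " +
      s"{ 'class' : 'SimpleStrategy', 'replication_factor' : $replicationFactor };"
  }
}
```

`IF NOT EXISTS` makes the statement safe to run on every startup, which is what allows it to live inside Chronos rather than being a manual step.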

elingg · 2015-04-29T17:37:09Z

OK @brndnmtthws, I will be sure to test with Cassandra before merging it. If @azakkerman also has a chance to test earlier than I do, that would be great.

azakkerman · 2015-04-29T17:39:44Z

I am trying to bring up DCOS in AWS with Ken's help to try testing it.


azakkerman · 2015-04-29T19:54:10Z

I have a DCOS cluster running in AWS with Cassandra deployed, but don't quite know how to configure chronos to run against that Cassandra instance. Who is a good resource for this?

elingg · 2015-04-29T20:19:03Z

Hello @azakkerman, you will have to run your version of Chronos with the Cassandra options enabled and simply point it to cassandra-dcos-node.cassandra.dcos.mesos as the Cassandra hostname. The port on DCOS is 9160. You will also need to create the keyspace as Brenden mentioned above:
CREATE KEYSPACE IF NOT EXISTS [derp] WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

elingg · 2015-04-29T20:20:12Z

If you would like to discuss over chat (I am WFH today), maybe you could ask @kensipe to add you to our Slack channel and we can talk via IM. I will also be in the office tomorrow.

elingg · 2015-04-29T20:22:33Z

Here are the Cassandra options for Chronos:

"cassandra_consistency": Consistency level to use for Cassandra (default = ANY).

"cassandra_contact_points": Comma-separated list of contact points for Cassandra.

"cassandra_keyspace": Keyspace to use for Cassandra (default = metrics).

"cassandra_port": Port for Cassandra (default = 9042).

"cassandra_table": Table to use for Cassandra (default = chronos).

"cassandra_ttl": TTL in seconds for records written to Cassandra (default = 31536000).
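These options could be modelled as a single configuration value. The sketch below is hypothetical (`CassandraOptions` is not a Chronos class); it only mirrors the defaults listed above and shows one way to split the comma-separated contact points. The TTL default of 31536000 seconds is exactly 365 days.

```scala
// Hypothetical model of the Cassandra-related options; defaults mirror the list above.
final case class CassandraOptions(
  contactPoints: Seq[String],
  port: Int = 9042,
  keyspace: String = "metrics",
  table: String = "chronos",
  consistency: String = "ANY",
  ttlSeconds: Int = 31536000 // 365 days * 86,400 seconds per day
)

object CassandraOptions {
  // Splits a comma-separated contact point list, tolerating stray spaces
  // and empty entries.
  def parseContactPoints(raw: String): Seq[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq
}
```

For the DCOS setup described earlier, the contact point would be cassandra-dcos-node.cassandra.dcos.mesos, with the port overridden whenever the cluster's Cassandra does not listen on the default.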

azakkerman · 2015-04-29T20:38:42Z

I was able to launch Chronos with my custom jar and write to Cassandra. I see rows in Cassandra corresponding to my test job instances. What sort of tests should I perform?

azakkerman · 2015-04-29T20:39:04Z

cqlsh> select * from metrics.chronos
... ;

elingg · 2015-04-29T20:53:15Z

Nice job! Can you see the job history in the UI?

azakkerman · 2015-04-29T21:01:04Z

elingg · 2015-04-29T21:04:15Z

Looking good! Can you try a job that takes longer to run to confirm you can see some stats? Here is an example of what the UI should look like: #402

azakkerman · 2015-04-29T21:35:26Z

I ran a sleep job that exits 1 in the end and verified that the email notification still works. Also deleted a job and got some emails. What specific stats are you looking for?

elingg · 2015-04-29T21:40:35Z

Notification emails are also a great test. If you run a job that runs longer than 0 or 1 sec (let's say sleep 5), you should see stats like in the first screen shot in this link: #402

elingg · 2015-04-29T21:40:49Z

After that, we should have tested sufficiently!

azakkerman · 2015-04-29T21:46:07Z

Btw, it looks like if a job has a limited repeat count and gets disabled (because the repeat count reaches 0), there is no way to re-enable it by incrementing the repeat count and updating the start time; the job remains disabled. I think this is a bug in the UI.

elingg · 2015-04-29T21:47:31Z

Very possible that it is a UI bug. Anyway your test cases sound great! I think we are good to merge. Any final objections @brndnmtthws?

brndnmtthws · 2015-05-03T05:51:19Z

Wonderful!

Introduce JobsObserver interface and place both JobStats and error notification behind that interface


8d3ad7d


elingg reviewed Apr 28, 2015

Properly handle JobRemoved event to send appropriate notification

e8d6270

elingg reviewed Apr 28, 2015

azakkerman added 2 commits April 28, 2015 13:20

Tweak variable name for consistency

ac67774

Renamed notification observer class

2dc0f72

elingg reviewed Apr 28, 2015

Use conventional if/else for option creation

46c0090

elingg reviewed Apr 28, 2015

azakkerman added 3 commits April 28, 2015 14:31

Rename util class

07bc0ba

Refactor JobsObserver into PartialFunction and log.info any time an observer does not handle a particular event

b586d37

Rename variable for clarity

6dd59cc

brndnmtthws reviewed Apr 29, 2015

azakkerman force-pushed the master branch from 63f3f81 to 6dd59cc Compare April 29, 2015 18:27

azakkerman mentioned this pull request Apr 29, 2015

Cleanup of compiler warnings and complete JobStats refactoring #437

Merged

brndnmtthws added a commit that referenced this pull request May 3, 2015

Merge pull request #436 from azakkerman/master

3ad15f0

Introduce JobsObserver interface and place both JobStats and error notification behind that interface

brndnmtthws merged commit 3ad15f0 into mesos:master May 3, 2015
