
Process and agent state management #64

Closed · JanKoppe opened this issue Feb 27, 2017 · 5 comments

@JanKoppe
Contributor

While looking at #53 and thinking about how this could best be implemented, I've noticed a few issues:

Completely separate processes

With #52 we gained the ability to launch the capture, schedule and ingest processes separately. I think that this is an important feature (the ability to run them as independent services on the system), but it makes it hard to know which processes are actually running.

My proposal for this is to create .pid files for each service (even if we use run_all), which have to be checked before startup: if a .pid file exists, check whether the process with that PID is still alive. If so, exit, else start the process. A rough sketch of that check is below. Obviously we should only ever have one process of each type, otherwise we will place ourselves in a special kind of hell.
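A minimal Python sketch of that startup check, just to make the idea concrete (the pidfile path and function names are made up for illustration, and error handling is kept minimal):

```python
# Hypothetical sketch of the proposed .pid file check.
import os
import sys

PIDFILE = '/var/run/pyca-capture.pid'  # illustrative path

def pid_alive(pid):
    '''Return True if a process with this PID exists.'''
    try:
        os.kill(pid, 0)  # signal 0: existence check only, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user
    return True

def ensure_single_instance():
    '''Exit if another instance is running, otherwise record our own PID.'''
    if os.path.isfile(PIDFILE):
        with open(PIDFILE) as f:
            old_pid = int(f.read().strip())
        if pid_alive(old_pid):
            sys.exit('process already running, exiting')
    with open(PIDFILE, 'w') as f:
        f.write(str(os.getpid()))
```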

Agent state management

This is a tricky one, as it is tied to the constraints of capture agent states in opencast. We are only ever able to define one state.

But what if we start recording while an ingest process is still working, and the ingest process finishes and sets the state to idle although we have not finished recording yet? This is a possible scenario with tightly clocked events and a slow uplink.

My proposal for this is to implement an internal state table for each process (working or idle), which will then be used to set the agent state according to a priority list:

  1. offline
  2. capturing
  3. uploading
  4. shutting_down (not used anywhere yet)
  5. idle

Say every process but the capture process is idle; then we would set the agent state to capturing. Now our ingest process starts working. We do not change the agent state to uploading, because capturing supersedes it. As soon as the capture process is idle again, the ingest process is the highest-priority working process, so the agent state becomes uploading, and so on.

With respect to my comment in #53, the behaviour for the scheduler process needs to be special: if the scheduler process exists, its internal state is idle. If there is no scheduler process, its internal state is working (okay, that is a bad name. Suggestions?) and we are absolutely offline. If any other process took precedence over this, it would give the illusion that the agent is ready to fetch new scheduled events. A sketch of the whole aggregation follows below.
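A minimal sketch of how that aggregation could look, with the scheduler special case folded in (the dict layout and function name are illustrative, not actual pyCA code):

```python
# Sketch of the proposed state aggregation (names are illustrative).
PRIORITY = ['offline', 'capturing', 'uploading', 'shutting_down', 'idle']

def agent_state(states):
    '''Reduce per-process states to the single state Opencast accepts.

    `states` maps process names to 'working' or 'idle', e.g.
    {'schedule': 'idle', 'capture': 'working', 'ingest': 'idle'}.
    A missing scheduler entry means the agent is offline.
    '''
    if 'schedule' not in states:
        return 'offline'  # no scheduler -> do not pretend we can fetch events
    candidates = ['idle']
    if states.get('capture') == 'working':
        candidates.append('capturing')
    if states.get('ingest') == 'working':
        candidates.append('uploading')
    # pick the candidate that comes first in the priority list
    return min(candidates, key=PRIORITY.index)
```

With capture and ingest both working this returns capturing; once capture goes idle it returns uploading, matching the walkthrough above.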

@JanKoppe
Contributor Author

JanKoppe commented Feb 27, 2017

While re-reading this I noticed that I should put this in writing too:

I guess if we could live with saying that it is the user's responsibility to always have exactly one process of each type running, we could skip the .pid file part and go straight to the internal state database.

@lkiesow
Member

lkiesow commented Feb 27, 2017

My suggestion for the processes would anyway be to have them launched separately by systemd. That way the system can ensure that things are restarted, etc. (a sketch of such a unit is below). If you want to use SysV init (or whatever) instead, then let that create the pid files.
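A minimal sketch of what such a unit might look like; the `pyca capture` command and the file path are assumptions for illustration, not the actual CLI:

```ini
# /etc/systemd/system/pyca-capture.service -- hypothetical example
[Unit]
Description=pyCA capture service
After=network.target

[Service]
ExecStart=/usr/bin/pyca capture
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```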

A status table makes sense, though here are a few thoughts:

  • The single CA state is indeed a problem. I will bring it up in the CA session at the conference. I think Opencast should support a set of states for each capture agent instead.
  • Having the scheduler xor the capturer offline should set the state to error, since it might break recordings. I'm not so sure about the ingest part (I'm thinking about the backup mode).
  • Do we want a keep-alive for the state?
  • All that would probably deserve another service keeping track of the CA state ;-D

@JanKoppe
Contributor Author

  • Single CA state: That would be the best thing, yes. With #65 (add agent state and centralized reporting to opencast), that would be very easy to adapt to now.
  • error: I'm not quite sure what you mean by "might break recordings." Can you elaborate?
  • Keep-alive: This would be easy to implement in #65 (add agent state and centralized reporting to opencast), using timestamps and a timeout config. I think that this would indeed be a good idea: when a service fails, it might fail because of outside circumstances which prevent it from restarting, in which case it cannot update its state back to stopped. A small sketch follows after this list.
  • An agent-state service is what I did. :)
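A tiny sketch of the timestamp/timeout idea (the names and the timeout value are made up; the timeout would come from the config mentioned above):

```python
# Illustrative keep-alive check: a service whose last state update is
# older than TIMEOUT is treated as dead, even if it never managed to
# report 'stopped' itself.
import time

TIMEOUT = 60  # seconds without an update before a service counts as dead

last_update = {}  # process name -> unix timestamp of last state report

def report(process, state):
    '''Record a state update and refresh the keep-alive timestamp.'''
    last_update[process] = time.time()
    # ... store `state` in the internal state table ...

def alive(process):
    '''True if the process reported within the last TIMEOUT seconds.'''
    return time.time() - last_update.get(process, 0) < TIMEOUT
```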

@JanKoppe
Contributor Author

A keep-alive would be perfect for #48.

@JanKoppe JanKoppe added this to the 2.0 milestone Mar 6, 2017
@JanKoppe JanKoppe mentioned this issue Mar 6, 2017
@JanKoppe JanKoppe modified the milestones: 2.1, 2.0 Mar 10, 2017
@JanKoppe
Contributor Author

Continuing the keep-alive topic in #76 to keep this tidy.
