
Process and agent state management #64

Closed · JanKoppe opened this issue Feb 27, 2017 · 5 comments

@JanKoppe
Contributor

While looking at #53 and thinking about how this could best be implemented, I've noticed a few issues:

Completely separate processes

With #52 we gained the ability to launch the capture, schedule and ingest processes separately. I think that this is an important feature (the ability to run them as independent services on the system), but it makes it hard to know which processes are actually running.

My proposal for this is to create .pid files for each service (even if we use run_all), which have to be checked before startup: if a .pid file exists, check whether the process with that PID is still alive. If so, exit, else start the process. A rough sketch of that check is below. Obviously we should only ever have one process of each type, otherwise we will place ourselves in a special kind of hell.
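A minimal Python sketch of that startup check, just to make the idea concrete (the pidfile path and function names are made up for illustration, and error handling is kept minimal):

```python
# Hypothetical sketch of the proposed .pid file check.
import os
import sys

PIDFILE = '/var/run/pyca-capture.pid'  # illustrative path

def pid_alive(pid):
    '''Return True if a process with this PID exists.'''
    try:
        os.kill(pid, 0)  # signal 0: existence check only, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user
    return True

def ensure_single_instance():
    '''Exit if another instance is running, otherwise record our own PID.'''
    if os.path.isfile(PIDFILE):
        with open(PIDFILE) as f:
            old_pid = int(f.read().strip())
        if pid_alive(old_pid):
            sys.exit('process already running, exiting')
    with open(PIDFILE, 'w') as f:
        f.write(str(os.getpid()))
```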

Agent state management

This is a tricky one, as it is tied to the constraints of capture agent states in opencast. We are only ever able to define one state.

But what if we start recording while an ingest process is still working, and the ingest process finishes and sets the state to idle although we have not finished recording yet? This is a possible scenario with tightly clocked events and a slow uplink.

My proposal for this is to implement an internal state table for each process (working or idle), which will then be used to set the agent state according to a priority list:

  1. offline
  2. capturing
  3. uploading
  4. shutting_down (not used anywhere yet)
  5. idle

Say every process but the capture process is idle; then we would set the agent state to capturing. Now our ingest process starts working. We do not change the agent state to uploading, because capturing supersedes it. As soon as the capture process is idle again, the ingest process is the highest-priority working process, so the agent state becomes uploading, and so on.

With respect to my comment in #53, the behaviour for the scheduler process needs to be special: if the scheduler process exists, its internal state is idle. If there is no scheduler process, its internal state is working (okay, that is a bad name. Suggestions?) and we are absolutely offline. If any other process took precedence over this, it would give the illusion that the agent is ready to fetch new scheduled events. A sketch of the whole aggregation follows below.
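A minimal sketch of how that aggregation could look, with the scheduler special case folded in (the dict layout and function name are illustrative, not actual pyCA code):

```python
# Sketch of the proposed state aggregation (names are illustrative).
PRIORITY = ['offline', 'capturing', 'uploading', 'shutting_down', 'idle']

def agent_state(states):
    '''Reduce per-process states to the single state Opencast accepts.

    `states` maps process names to 'working' or 'idle', e.g.
    {'schedule': 'idle', 'capture': 'working', 'ingest': 'idle'}.
    A missing scheduler entry means the agent is offline.
    '''
    if 'schedule' not in states:
        return 'offline'  # no scheduler -> do not pretend we can fetch events
    candidates = ['idle']
    if states.get('capture') == 'working':
        candidates.append('capturing')
    if states.get('ingest') == 'working':
        candidates.append('uploading')
    # pick the candidate that comes first in the priority list
    return min(candidates, key=PRIORITY.index)
```

With capture and ingest both working this returns capturing; once capture goes idle it returns uploading, matching the walkthrough above.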

@JanKoppe
Contributor Author

JanKoppe commented Feb 27, 2017

While re-reading this I noticed that I should put this in writing too:

I guess if we could live with saying that it is the user's responsibility to always have exactly one process of each type running, we could skip the .pid file part and go straight to the internal state database.

@lkiesow
Member

lkiesow commented Feb 27, 2017

My suggestion for the processes would anyway be to have them launched separately by systemd. That way the system can ensure that things are restarted, etc. (a sketch of such a unit is below). If you want to use SysV init (or whatever) instead, then let that create the pid files.
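A minimal sketch of what such a unit might look like; the `pyca capture` command and the file path are assumptions for illustration, not the actual CLI:

```ini
# /etc/systemd/system/pyca-capture.service -- hypothetical example
[Unit]
Description=pyCA capture service
After=network.target

[Service]
ExecStart=/usr/bin/pyca capture
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```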

A status table makes sense, though here are a few thoughts:

  • The single CA state is indeed a problem. I will bring it up in the CA session at the conference. I think Opencast should support a set of states for each capture agent instead.
  • Having the scheduler xor the capturer offline should set the state to error, since it might break recordings. I'm not so sure about the ingest part (I'm thinking about the backup mode).
  • Do we want a keep-alive for the state?
  • All that would probably deserve another service keeping track of the CA state ;-D

@JanKoppe
Contributor Author

  • Single CA state: That would be the best thing, yes. With #65 (add agent state and centralized reporting to opencast), that would be very easy to adapt to now.
  • error: I'm not quite sure what you mean by "might break recordings." Can you elaborate?
  • Keep-alive: This would be easy to implement in #65 (add agent state and centralized reporting to opencast), using timestamps and a timeout config. I think that this would indeed be a good idea: when a service fails, it might fail because of outside circumstances which prevent it from restarting, in which case it cannot update its state back to stopped. A small sketch follows after this list.
  • An agent-state service is what I did. :)
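A tiny sketch of the timestamp/timeout idea (the names and the timeout value are made up; the timeout would come from the config mentioned above):

```python
# Illustrative keep-alive check: a service whose last state update is
# older than TIMEOUT is treated as dead, even if it never managed to
# report 'stopped' itself.
import time

TIMEOUT = 60  # seconds without an update before a service counts as dead

last_update = {}  # process name -> unix timestamp of last state report

def report(process, state):
    '''Record a state update and refresh the keep-alive timestamp.'''
    last_update[process] = time.time()
    # ... store `state` in the internal state table ...

def alive(process):
    '''True if the process reported within the last TIMEOUT seconds.'''
    return time.time() - last_update.get(process, 0) < TIMEOUT
```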

@JanKoppe
Contributor Author

A keep-alive would be perfect for #48.

@JanKoppe JanKoppe added this to the 2.0 milestone Mar 6, 2017
@JanKoppe JanKoppe mentioned this issue Mar 6, 2017
@JanKoppe JanKoppe modified the milestones: 2.1, 2.0 Mar 10, 2017
@JanKoppe
Contributor Author

Continuing the keep-alive topic in #76 to keep this tidy.
