Datafeeds that have an end time auto-close their job when they stop. This happens both when they stop because they reach their end time and, usually, when they are stopped by an API call.
However, there is an inconsistency in the "stopped by an API call" case. If a datafeed is stopped by an API call while it is not assigned to a node (for example, because the node it was running on has left the cluster and the datafeed has not yet been reassigned), stopping it simply cancels its persistent task, so the associated job remains open.
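For illustration, a minimal sketch of how the inconsistency can be observed, assuming a local unsecured cluster and the standard ML REST endpoints; the datafeed and job names are hypothetical:

```python
import requests

ES = "http://localhost:9200"  # assumption: local cluster, no auth

# Stop a datafeed that currently has no node assignment
# (e.g. its node has left the cluster and it awaits reassignment).
requests.post(f"{ES}/_ml/datafeeds/my-datafeed/_stop")

# Check the state of the associated job. Because the datafeed was
# unassigned when stopped, the stop only cancelled the persistent
# task, so the job state is still "opened" rather than "closed".
stats = requests.get(f"{ES}/_ml/anomaly_detectors/my-job/_stats").json()
print(stats["jobs"][0]["state"])  # prints "opened", not "closed"
```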
This sort of inconsistency is more evidence that we should move towards a world where the job and datafeed are one single thing.
The problem described in this issue will be a rare occurrence, and it is not particularly hard to recover from manually if it is noticed. The real issue is that manual intervention is required: in situations where nobody is watching the state of the ML jobs, the job could unnecessarily remain open for a very long time, wasting resources.
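Until the behaviour is made consistent, the manual recovery is simply to close the job explicitly once it is noticed to be still open (again, the job name is hypothetical):

```python
import requests

ES = "http://localhost:9200"  # assumption: local cluster, no auth

# Manual recovery: explicitly close the job that the stopped
# datafeed failed to auto-close.
requests.post(f"{ES}/_ml/anomaly_detectors/my-job/_close")
```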