A "*" Constraint for unique should exist #846

Closed
florianleibert opened this issue Dec 2, 2014 · 49 comments

Comments

@florianleibert (Member) commented Dec 2, 2014

This constraint would ensure that a command is run on all slaves in the cluster.

@sabraham commented Dec 4, 2014

I am very interested in this kind of deployment (and more generally, running 1 and only 1 instance on each slave with a given attribute). In this case, I think the interface should disallow scaling the number of instances entirely, as the instance count should always be #(slaves with given attribute).
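
For illustration, a minimal sketch of how this pattern looks with today's constraint operators (Python is used only to print the JSON app definition; the attribute name disk, the value ssd, and the app id are made-up examples). What cannot be expressed today is making instances track the number of matching slaves:

```python
import json

# Sketch only: a Marathon app definition that pins at most one task per
# matching slave. The attribute name "disk" and value "ssd" are hypothetical;
# "instances" still has to be kept in sync with the number of matching slaves
# by hand, which is exactly the gap this issue is about.
app = {
    "id": "/infra/node-agent",
    "cmd": "./run-agent.sh",
    "cpus": 0.1,
    "mem": 64,
    "instances": 30,
    "constraints": [
        ["disk", "LIKE", "ssd"],   # only slaves carrying this attribute value
        ["hostname", "UNIQUE"],    # at most one task per host
    ],
}

print(json.dumps(app, indent=2))
```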

@normanjoyner commented Dec 4, 2014

👍 This is potentially a very useful constraint.

@aseppala commented Dec 16, 2014

+1, this is definitely a common use case. I would use it combined with a UNIQUE hostname constraint to run exactly one instance on each host.

@villind commented Dec 16, 2014

+1

@ConnorDoyle (Contributor) commented Dec 16, 2014

Makes sense, lots of support, let's do it.

@ConnorDoyle (Contributor) commented Dec 16, 2014

The only strange part in the API is the instances key, which will have no bearing on the number of instances we actually launch. Any suggestions for how you'd like to see this handled? One option is to add a onePerHost (mod naming) boolean option to the app definition and ignore the instances field if that is not supplied.
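
To make the proposal concrete, a rough sketch of how an app definition with such a flag might look; onePerHost is only the strawman name suggested above and does not exist in Marathon's API:

```python
import json

# Strawman sketch: "onePerHost" is the hypothetical flag proposed in this
# thread, not an existing Marathon field. The idea is that when it is true,
# "instances" would be ignored and Marathon would keep one task per agent.
app = {
    "id": "/infra/node-agent",
    "cmd": "./run-agent.sh",
    "cpus": 0.1,
    "mem": 64,
    "onePerHost": True,
    "constraints": [["hostname", "UNIQUE"]],
}

print(json.dumps(app, indent=2))
```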

@ConnorDoyle added this to the 0.7.7 milestone Dec 16, 2014
@aseppala commented Dec 16, 2014

How about

"instances": "*",
"constraints": [["hostname", "UNIQUE"]]
@tnn1t1s commented Dec 17, 2014

Exactly. And this functionality becomes critical as the size of the cluster becomes non-deterministic.

@BrianHicks (Contributor) commented Dec 17, 2014

👍 for "instances": "*" - would it just turn the constraint into a selector in general, though?

@defender commented Dec 19, 2014

+1

@andrewortman commented Dec 19, 2014

👍

@sttts (Contributor) commented Dec 19, 2014

+1

@prantaaho commented Dec 29, 2014

👍

@amiorin commented Dec 31, 2014

👍

@pigeonflight commented Jan 1, 2015

+1

@ConnorDoyle modified the milestones: 0.8.0, 0.8.1 Jan 20, 2015
@defender commented Feb 11, 2015

Hi ConnorDoyle,
Do you think #846 will be released in 0.8.1? This issue has already been reassigned several times :)

Thanks.

@drexin (Contributor) commented Feb 11, 2015

We can't ensure that a task is running on all slaves in the cluster, for the following reasons:

  1. we can't guarantee that there are enough resources on every machine
  2. in a multi-framework cluster, we don't know if we will ever receive offers for all the machines
  3. we could never tell whether it has been deployed successfully, because we don't know about all the machines in the cluster

I think this would need direct support from Mesos.

@drexin modified the milestones: 0.8.1, 0.8.2 Feb 24, 2015
@emilevauge commented Apr 9, 2015

+1

@clehene commented Feb 10, 2016

Here's the equivalent from Kubernetes - it's called "Daemon Sets".
Curious whether Kubernetes on Mesos supports this.

As we describe Marathon as "the init or upstart daemon", it would make sense to have this feature ;)

@drexin I understand the complexity; I think the right way to go about this is to figure out what exactly is needed and discuss it with the Mesos devs.

@philipnrmn (Contributor) commented Feb 12, 2016

Some great documentation on using GROUP_BY and another feature request for this in #3220

@scatterbrain commented Mar 23, 2016

+1 really important for many kinds of services (monitoring etc)

@graphex commented Mar 31, 2016

+1 a very useful addition. Yes, this could be done with less complexity to Marathon by provisioning all services on all slaves up front, but that just shifts the complexity and burden of updates and redeploys. If a frequently changing app needs to be on each host, having marathon manage redeployment/restarting/configuration in a consistent manner would be very helpful. Would be worth coordinating any necessary changes to Mesos, and appropriate separation of concerns. Is there a corresponding thread in Mesos for this?

@drewrobb (Contributor) commented Mar 31, 2016

I have been doing this by setting an autoscaling rule that scales an app with a hostname:UNIQUE constraint against a metric we populate for the number of running slaves. (I also have a simple internally written autoscaling service for our Marathon apps.) We also configured each slave to reserve a small amount of resources for a specific role, and that role is only used by apps that need to run on every slave. It has been working pretty nicely in practice.
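
A rough sketch of that kind of external scaling loop, assuming the app already carries a ["hostname", "UNIQUE"] constraint and an acceptedResourceRoles entry for the reserved role; the endpoints, app id, and polling interval below are placeholders, not the poster's actual setup:

```python
import time
import requests

MESOS_MASTER = "http://mesos-master.example.com:5050"  # placeholder address
MARATHON = "http://marathon.example.com:8080"          # placeholder address
APP_ID = "/infra/node-agent"  # app defined with ["hostname", "UNIQUE"]

def count_active_agents():
    # The Mesos master lists registered agents under /slaves.
    slaves = requests.get(f"{MESOS_MASTER}/slaves").json()["slaves"]
    return sum(1 for s in slaves if s.get("active"))

def scale_to_agent_count():
    # Marathon accepts a partial app update to change the instance count;
    # the hostname UNIQUE constraint then caps it at one task per agent.
    target = count_active_agents()
    requests.put(f"{MARATHON}/v2/apps{APP_ID}", json={"instances": target})

if __name__ == "__main__":
    while True:
        scale_to_agent_count()
        time.sleep(60)
```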

@mvanholsteijn commented Apr 8, 2016

Being able to define an application that needs to run on every slave is essential functionality for a scheduler.

In Nomad these are called system jobs; in CoreOS Fleet these are called global units.

So, what will the Marathon equivalent be called and how do we activate it?

It would make our lives really easy. We need it now...

@edurdias commented May 20, 2016

+1

@clehene commented May 20, 2016

Marathon shouldn't guarantee that resources exist.
We can guarantee resources in Mesos through reservations.

The 1-1 framework-role "binding" may become the next issue, though, as we'd need/like to reserve resources per workload (in Marathon) rather than for the entire framework.

@vkhatri commented Jun 22, 2016

+1

@JamieCressey commented Oct 15, 2016

+1 this would be hugely beneficial for monitoring and log extraction etc.

@mimmus commented Oct 26, 2016

We need this now 👍

@fksimon commented Nov 14, 2016

+1

@sybrandy commented Nov 15, 2016

@JamieCressey is correct. We're in the same situation, where we want to make sure telegraf and logspout are running on all of the slaves for metrics and log collection. It would be great if Marathon/Mesos could ensure that we have instances of each on every node in case failed nodes come back online or we add additional nodes to the cluster.

@jdef (Contributor) commented Nov 15, 2016

Running daemons on every node in the cluster in an efficient, robust manner will almost certainly require assistance from Mesos. If you want this feature then I encourage you to vote for support in Mesos because without that we're a bit stranded in Marathon-land.

A workaround meanwhile is to use a hostname UNIQUE constraint with instances set to a number greater than the number of agents you ever expect to have in your cluster.

https://issues.apache.org/jira/browse/MESOS-6595
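
As a concrete sketch of that workaround (the app id and numbers are placeholders): set instances well above the largest agent count you expect and let the UNIQUE constraint cap the real count at one task per agent. The trade-off raised below is that the deployment never fully converges:

```python
import json

# Workaround sketch: over-provision "instances" and rely on the UNIQUE
# constraint so at most one task lands on each agent. The extra instances can
# never be placed, so the deployment stays "in progress" indefinitely.
app = {
    "id": "/infra/node-agent",
    "cmd": "./run-agent.sh",
    "cpus": 0.1,
    "mem": 64,
    "instances": 1000,  # deliberately larger than the cluster will ever be
    "constraints": [["hostname", "UNIQUE"]],
}

print(json.dumps(app, indent=2))
```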

@jdef added the "is blocked by" label Nov 15, 2016
@silvamerica commented Nov 15, 2016

@jdef Does your workaround leave hanging deployments around, since Marathon will never be able to satisfy the number of instances specified?

@jdef (Contributor) commented Nov 15, 2016

Using a greater number of instances is not ideal for interaction w/ the deployment manager. If you're running a fairly static cluster then you could set # instances to match the number of agents you expect to have. If you expect to increase the number of agents in the cluster, then increase instances to match just prior to adding the agent(s). It's a workaround, not a "solution".

@gengmao commented Nov 16, 2016

@jdef probably a dumb question - why can't Marathon deploy a daemon (with instances=* and hostname UNIQUE) to any new agents offered by Mesos? What specific support has to be provided by Mesos?

@hampsterx commented Nov 17, 2016

+1

@dlaidlaw commented Nov 23, 2016

+1 for one instance per agent per role. If an agent has a specific role and the app's acceptedResourceRoles matches, then deploy there. This is like autoscaling that is performed as soon as a new agent instance is detected, and before any other deployments can happen on that new instance.

@janisz (Contributor) commented Nov 23, 2016

👎
What if there are no resources to run this app on every agent? If a service needs to be running on all agents then there are better ways to put it there (upstart/systemd/supervisor + ansible/puppet/chef/salt).
Other orchestrators take advantage of "being the only framework", so they can schedule given tasks before a node is available for others. In Mesos you can't do that. The easiest way to achieve it with Marathon is to follow this comment: #846 (comment)
Finally, mixing integer and string in one JSON field in the API is a mistake.

@dangra commented Nov 23, 2016

@janisz the resources for these services are something you usually plan and reserve ahead of time. I am not interested in "before any other deployments can happen"; just running a service on all agents that meet the criteria is enough for me.

@mimmus commented Nov 24, 2016

On CoreOS (systemd-based), I'm solving this by dropping in a unit file to run and restart a monitoring Docker container, but it would be great to have a way to manage this with Marathon.

@tobilg (Contributor) commented Nov 24, 2016

Questions I see regarding this:

  • What should happen if agent nodes are added to the cluster? Should Marathon automatically be able to schedule the app on these new nodes as well?

This would mean, IMO, that Marathon would need to utilize the /slaves endpoint of the masters (I guess it does so already?) and compare the list of slaves from the current call to the list of slaves from the last call... Not sure about the implications.

  • Concerning mixing string and integer in a property, as @janisz said, I think this wouldn't be a good idea either.

This would effectively mean changing the type of the instances property to string, and then testing it and potentially parsing it as an integer. It could be discussed whether the value -1 could replace *, but maybe there's a better idea.

@meichstedt (Contributor) commented Mar 7, 2017

@mesosphere locked and limited conversation to collaborators Mar 27, 2017