SDK Quota Support #3102
Conversation
* Retrieve role from env.
* Remove irritating 100 line checkstyle limitation.
* Name change getSchedulerNamespace -> getServiceNamespace
* Add support for configurable roles via an environment variable.
* Add non-namespaced role to list of roles to subscribe with.
* Add new Builders that copy existing instances.
* Add missing getters/setters from Builder.
* Print associated role with incoming offers.
* Add support for maintaining current role on service-role update.
* Arguments to assert were swapped, expected in place of actual.
* Compare contents of lists as ordering can change and isn't important here.
* Fix failing tests for scheduler upgrade.
* Fix runtime errors, undo test changes.
* WIP: Role changes do not trigger requirement for new TaskConfig.
* WIP: Disable validation of TaskVolumes for now.
* Prevent scheduler from applying a role change on incomplete previous deployment. Include conditional validation for role changes.
* Revert fixRoleChange introduced earlier.
- Update scheduler to new role.
- Update pods to new role via pod replace, restarting the scheduler between each replaced pod to ensure mixed-mode roles are applicable.
- Add additional pods post scheduler update, ensuring that new pods are launched under the new role.
Configure the role the framework subscribes with via these two environment variables:
- MARATHON_ENFORCE_GROUP_ROLE - Determines if we use the Mesos-supplied role for quota or revert to legacy behaviour.
- MESOS_ALLOCATION_ROLE - Specifies the role the scheduler subscribes to Mesos with, and which role the new footprint will be created under.
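A minimal sketch of how these two variables might drive role selection. The function name and fallback logic here are assumptions for illustration only, not the SDK's actual code:

```python
import os

def resolve_service_role(default_role):
    """Illustrative only: pick the role the scheduler subscribes with.

    MARATHON_ENFORCE_GROUP_ROLE and MESOS_ALLOCATION_ROLE are the variables
    described above; the fallback behaviour is an assumption.
    """
    enforce = os.environ.get("MARATHON_ENFORCE_GROUP_ROLE", "false").lower() == "true"
    if enforce:
        # Use the Mesos-supplied allocation role for quota enforcement.
        return os.environ.get("MESOS_ALLOCATION_ROLE", default_role)
    # Legacy behaviour: keep the service's existing role.
    return default_role

os.environ["MARATHON_ENFORCE_GROUP_ROLE"] = "true"
os.environ["MESOS_ALLOCATION_ROLE"] = "dev"
print(resolve_service_role("hello-world-role"))  # → dev
```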
…oling with upstream branch.
sdk/scheduler/src/main/java/com/mesosphere/sdk/framework/FrameworkConfig.java
sdk/scheduler/src/main/java/com/mesosphere/sdk/scheduler/SchedulerConfig.java
…uration as Marathon now injects this based on the group settings.
…efaultResourceSet.java Co-Authored-By: Tarun Gupta Akirala <takirala@users.noreply.github.com>
…/DefaultConfigValidators.java Co-Authored-By: Tarun Gupta Akirala <takirala@users.noreply.github.com>
…ulerConfig.java Co-Authored-By: Tarun Gupta Akirala <takirala@users.noreply.github.com>
- Fix invalid use of marathon group delete.
- Introduce pytest-dependency.
Had a couple of queries; good to merge after those are answered. 🚢 Had some ideas on code cleanup in test_quota_deployment and the usage of null in SchedulerConfig. I created suggestions.diff.txt, a diff file with some suggestions. Nothing blocking though.
# Add an extra pod to each.
marathon_config["env"]["HELLO_COUNT"] = "3"
marathon_config["env"]["WORLD_COUNT"] = "4"
I haven't had a closer look, but would this affect the minimum number of nodes on the cluster (due to placement constraints)? And if yes, is this within that limit?
Both hello and world pods have the placement constraint "[[\"hostname\", \"UNIQUE\"]]". In our SI we spin up a cluster with 5 agents, so this fits into the normal test configuration.
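The reviewer's concern can be sanity-checked with a quick calculation. Assuming the hostname UNIQUE constraint is evaluated per pod type (each type declares its own constraint, so pods of the same type need distinct agents while different types may co-locate), the minimum agent count is:

```python
def min_agents_for_unique(pod_counts):
    # With "[[\"hostname\", \"UNIQUE\"]]" applied per pod type, each pod of a
    # given type needs its own agent; different types can share agents, so
    # the largest single pod count sets the floor.
    return max(pod_counts.values())

# HELLO_COUNT=3 and WORLD_COUNT=4 from the diff above.
print(min_agents_for_unique({"hello": 3, "world": 4}))  # → 4
```

Four agents are needed, which fits within the 5-agent SI cluster mentioned above.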
current_task_roles = service_roles["task-roles"]

# We must have some role!
assert len(current_task_roles) > 0
We can delete this line.
@@ -98,6 +98,7 @@ PyJWT==1.7.1
pylint==2.3.1
PyNaCl==1.3.0
pytest==4.1.0
pytest-dependency==0.4.0
I think we have to add this to test_requirements.txt as well?
Don't think so; the pytest harness gets all of its deps from frozen_requirements.txt.
sdk/scheduler/src/main/java/com/mesosphere/sdk/config/validate/DefaultConfigValidators.java
if "roles" in service_state:
    # MULTI_ROLE
    current_service_roles["framework-roles"] = service_state["roles"]
So if this is multi-role, what would be the result of service_state["role"]? Is it a non-existent key, or does it default to an empty list?
service_state["role"] = "*"
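Putting the diff and the answer above together, a sketch of the role bookkeeping — the function name and the "framework-role" key are illustrative, not the actual test code:

```python
def extract_service_roles(service_state):
    """Illustrative: collect framework role info from scheduler state."""
    roles = {}
    if "roles" in service_state:
        # MULTI_ROLE: the scheduler subscribed with several roles.
        roles["framework-roles"] = service_state["roles"]
    else:
        roles["framework-roles"] = None
    # Per the answer above, "role" is "*" in the multi-role case,
    # rather than a missing key or an empty list.
    roles["framework-role"] = service_state.get("role")
    return roles

print(extract_service_roles({"roles": ["foo", "hello-world-role"], "role": "*"}))
```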
Co-Authored-By: Tarun Gupta Akirala <takirala@users.noreply.github.com>
This feature requires support for enforceRole on Marathon groups, found in Marathon v1.9.73 and Mesos v1.9.0, available starting with DC/OS 2.0. By default Marathon does not set enforceRole=true on group creation, and existing semantics are maintained.

Deploy a new service in a group with quota enabled
Hello-World is used in the example below, but this is applicable to any SDK-based service.
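Since Marathon does not set enforceRole=true by default, the group must be created with it enabled. A minimal sketch of such a group definition, built as JSON (only the id and enforceRole fields are shown; this is illustrative, not Marathon's full group schema):

```python
import json

# Illustrative Marathon group definition with quota enforcement enabled.
# Marathon defaults to enforceRole=false, so it is set explicitly here.
group = {"id": "/dev", "enforceRole": True}
print(json.dumps(group, indent=2))
```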
To create a service named /dev/hello-world in group dev with quota consumed from role dev, deploy the service into the group; the quota consumption can then be verified under the dev role via the Mesos UI.

Migrate an existing deployed service to use quota support
To upgrade an existing service to a new version of the SDK with quota support, use the following procedure.
We will use Hello-World again, pre-installed in group foo, in the example below, but this is applicable to any SDK-based service. The following options are used:
- role: Specifies the quota-enforced role we're migrating towards, which is foo in this example.
- enable_role_migration: Notifies the scheduler that its pods will be migrated between legacy and quota-enforced roles. The scheduler subscribes with both roles when this flag is set.
After the scheduler is updated with these options, it subscribes with the foo role. The deployed pods will be unaffected and will use their previous roles. Issue pod replace commands to migrate all the pods in the service to the quota-enforced role; the hello-0 pod, for example, will be migrated to consume quota from foo.
5. Create a file with the current service-name and the following options to signal the end of the migration:
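A sketch of what such an options file could contain, assuming that ending the migration means keeping the quota-enforced role and clearing the migration flag. The exact option schema and the service name are assumptions for illustration:

```python
import json

# Illustrative end-of-migration options: keep the quota-enforced role
# and clear the migration flag. Schema and service name are assumed.
options = {
    "service": {
        "name": "/foo/hello-world",
        "role": "foo",
        "enable_role_migration": False,
    }
}
with open("options.json", "w") as f:
    json.dump(options, f, indent=2)
```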
At this point, the scheduler and all the previous running pods have been migrated to the quota enforced role.
Strict Mode Clusters
For strict mode clusters, additional role permissions are required and must be set up before deploying the service.
- enforceRole=true. Example: /dev/hello-world will need permissions to the dev role.
- Example: /foo/hello-world will need permissions to both the foo and foo__hello-world-role roles.

Pod Pre-Reserved Roles
For pods which specify pre-reserved roles (e.g. slave_public), the scheduler will issue a hierarchical role depending on the value of role.
- Example: slave_public and role=slave_public. These permissions are required:
- Example: slave_public and role=dev. These permissions are required:
When performing migration from legacy to enforced group roles via enable_role_migration, both permissions above will be required.

Downgrading to an older non-quota-aware version of the scheduler
This section details the procedures to downgrade from a quota enforced role to a shipped non-quota enforced release.
The process is the same as migrating an existing service to the quota-enforced role.
The key difference is that role should be slave_public to indicate migration towards the legacy roles. The remaining scheduler update and pod replace operations must be issued to move the scheduler and pods into the legacy roles. Once all the pods have been migrated, the service can be downgraded to an earlier release which isn't quota-aware.
JIRA: DCOS-54278