Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand manage_slm privilege #67539

Open
albertzaharovits opened this issue Jan 14, 2021 · 10 comments
Open

Expand manage_slm privilege #67539

albertzaharovits opened this issue Jan 14, 2021 · 10 comments
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Data Management Meta label for data/management team Team:Security Meta label for security team

Comments

@albertzaharovits
Copy link
Contributor

The description from SLM with Security is a bit unclear to me.
The list mentions privileges to create and delete snapshots (cluster:admin/snapshot/*) when working with SLM, but from my reading of the SLM code that is not necessary. The internal client used by the SLM tasks runs under sufficient privileges to create and remove snapshots, irrespective of the users privileges.

I think we should proceed to remove the cluster:admin/snapshot/* mention in the docs.
Maybe someone from @elastic/es-core-features can verify my theory.

@albertzaharovits albertzaharovits added >bug >docs General docs changes :Data Management/ILM+SLM Index and Snapshot lifecycle management :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC needs:triage Requires assignment of a team area label labels Jan 14, 2021
@elasticmachine elasticmachine added Team:Data Management Meta label for data/management team Team:Security Meta label for security team Team:Docs Meta label for docs team labels Jan 14, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@BigPandaToo BigPandaToo removed the needs:triage Requires assignment of a team area label label Jan 15, 2021
@jrodewig
Copy link
Contributor

The internal client used by the SLM tasks runs under sufficient privileges to create and remove snapshots, irrespective of the users privileges.

I tested this in a 8.0 snapshot build, and it looks like the cluster:admin/snapshot/* privilege is still required for the user creating the SLM policy. Otherwise, SLM will fail to take the snapshots. It looks like @dakrone added the cluster:admin/snapshot/* bit with #47545. He may have some additional context.

Here's the reproduction steps for my test:

  1. As the elastic user, create a my_test role with only the manage_slm cluster privilege:
POST _security/role/my_test
{
  "cluster": [ "manage_slm" ],
  "indices": [ ]
}
  1. As the elastic user, create a test user with the my_test role:
POST _security/user/test
{
  "password" : "...",
  "roles" : [ "my_test" ]
}
  1. As the elastic user, register a snapshot repository:
PUT _snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "my_repo_location"
  }
}
  1. As the test user, put a SLM policy that uses the repo. This policy should run every 15 minutes.
PUT _slm/policy/test-snapshot-policy
{
  "schedule": "0 15 * * * ?", 
  "name": "<15m-snap-{now}>", 
  "repository": "my_repository"
}
  1. Wait 15 minutes or use the execute snapshot lifecycle policy API to force SLM to create a snapshot. You can use either the test or elastic user.
POST _slm/policy/test-snapshot-policy/_execute
  1. Check the SLM stats as either the test or elastic user:
GET _slm/stats

The response indicates the snapshot failed.

{
  "retention_runs" : 0,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "0s",
  "retention_deletion_time_millis" : 0,
  "total_snapshots_taken" : 0,
  "total_snapshots_failed" : 1,
  "total_snapshots_deleted" : 0,
  "total_snapshot_deletion_failures" : 0,
  "policy_stats" : [
    {
      "policy" : "test-snapshot-policy",
      "snapshots_taken" : 0,
      "snapshots_failed" : 1,
      "snapshots_deleted" : 0,
      "snapshot_deletion_failures" : 0
    }
  ]
}

You can repeat steps 5 & 6. Each time will result in snapshot failure.

  1. As the elastic user, add cluster:admin/snapshot/* to the my-test role:
POST _security/role/my_test
{
  "cluster": [ "manage_slm",  "cluster:admin/snapshot/*"],
  "indices": [ ]
}
  1. As the test user, re-put the SLM policy. This will ensure the policy runs with the new privileges:
PUT _slm/policy/test-snapshot-policy
{
  "schedule": "0 15 * * * ?", 
  "name": "<15m-snap-{now}>", 
  "repository": "my_repository"
}
  1. Wait 15 minutes or use the execute snapshot lifecycle policy API to force SLM to create a snapshot.
POST _slm/policy/test-snapshot-policy/_execute
  1. Check the SLM stats:
GET _slm/stats

The response now indicates a successful snapshot was taken:

{
  "retention_runs" : 0,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "0s",
  "retention_deletion_time_millis" : 0,
  "total_snapshots_taken" : 1,
  "total_snapshots_failed" : 1,
  "total_snapshots_deleted" : 0,
  "total_snapshot_deletion_failures" : 0,
  "policy_stats" : [
    {
      "policy" : "test-snapshot-policy",
      "snapshots_taken" : 1,
      "snapshots_failed" : 1,
      "snapshots_deleted" : 0,
      "snapshot_deletion_failures" : 0
    }
  ]
}

@jrodewig
Copy link
Contributor

jrodewig commented Jan 29, 2021

It does seem odd that we use the cluster:admin/snapshot/* action(s) as the privilege here. While it is supported, I would think most users would use a "named" privilege instead: https://www.elastic.co/guide/en/elasticsearch/reference/master/security-privileges.html#privileges-list-cluster

As is, this sorta feels like a workaround. Maybe a manage_snapshot or similar privilege is warranted? My assumptions about how often users use actions as privileges could be off though.

@ywangd
Copy link
Member

ywangd commented Jan 31, 2021

I think @jrodewig is right. The privileges are still needed with the current code. The logic in ClientHelper#executeWithHeadersAsync is to execute with origin if no security header is passed in. However, security header does get passed in from SnapshotLifecycleTask #maybeTakeSnapshot. The security header is from the SLM policy metadata which is created the policy is created. With James's example, it represents the authentication info of user test.

So, unless a policy is created with no authentication, the task will always be executed with the privileges of the user who creates the policy. I am not entirely sure why we also set an origin of index_lifecycle here. The only usage that I can think of is to support a cluster migrating from "security-off" to "security-on". If there are existing SLM policies with the open cluster, they can still function once the cluster is secured.

I also agree the recommendation of using raw action names is off. We don't currently have a manage_snapshot privilege, only create_snapshot and monitor_snapshot and none of them include privilege to delete a snapshot. So if the intention is allow full access to snapshots, it does seem we need a new manage_snapshot privilege.

@albertzaharovits
Copy link
Contributor Author

Thank you @jrodewig for the investigation! It was on me to verify, thank you for your time.

Thanks for pitching in @ywangd .

I misread the code.
But, note that deleting snapshots is always performed under the index_lifecycle origin, so no delete privileges are required on the user creating the policy. Also, note that manage_slm as a standalone privilege is effectively useless, because if you edit an SLM policy, you'll update the headers, and if you don't have the "create snapshot privilege" you'll ruin every policy you touch.

@ywangd I think we should add the privileges to create (and delete snapshots) to the manage_slm privilege. WDYT Yang?

@ywangd
Copy link
Member

ywangd commented Feb 2, 2021

I think we should add the privileges to create (and delete snapshots) to the manage_slm privilege

I am not sure what was the original intention of having manage_slm as a separate privilege. Is it to isolate the policy management from the actual snapshot manageent? i.e. people who manages policy should not automatically gets to manually create (or delete) snapshots. If this is the intention and still stands, I wonder whether always executing with the index_lifecycle origin is a viable alternative (as you mentioned, the deletion is already handled this way)?

@albertzaharovits
Copy link
Contributor Author

people who manages policy should not automatically gets to manually create (or delete) snapshots. If this is the intention and still stands, I wonder whether always executing with the index_lifecycle origin is a viable alternative

The intention doesn't make much sense because if you can create an SLM policy but not create snapshots, you create a policy that does a snapshot 1 minute from now, circumventing your restriction. In addition, it's preferable to run the snapshots under the owner user context for auditing purposes (or better attribution in general).

@ywangd
Copy link
Member

ywangd commented Feb 3, 2021

people who manages policy should not automatically gets to manually create (or delete) snapshots. If this is the intention and still stands, I wonder whether always executing with the index_lifecycle origin is a viable alternative

The intention doesn't make much sense because if you can create an SLM policy but not create snapshots, you create a policy that does a snapshot 1 minute from now, circumventing your restriction. In addition, it's preferable to run the snapshots under the owner user context for auditing purposes (or better attribution in general).

Makes sense. I agree manage_slm should be expanded to include create snapshot at least. We could also include delete if we are happy about the behaviour change. I can see at least three benefits:

  1. Make manage_slm privilege more useful
  2. Simplify the doc so it can drop the bit about adding cluster:admin/snapshot/*
  3. cluster:admin/snapshot/* grants more than just create (and delete), e.g. it covers restore and mount searchable snapshot. I don't think they are all related to SLM, so it is better if we can tighten up manage_slm.

@jrodewig jrodewig changed the title Fix DOC for security privileges to create snapshots for SLM users Expand manage_slm privilege Mar 1, 2021
@jrodewig jrodewig removed >bug >docs General docs changes labels Mar 1, 2021
@elasticmachine elasticmachine removed the Team:Docs Meta label for docs team label Mar 1, 2021
@jrodewig
Copy link
Contributor

jrodewig commented Mar 1, 2021

Since the current docs are correct, I changed the title and removed the >docs label. Feel free to re-add once these changes are made. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Data Management Meta label for data/management team Team:Security Meta label for security team
Projects
None yet
Development

No branches or pull requests

5 participants