[ML] Machine Learning datafeeds fail on remote clusters due to _has_privileges check #87832

Closed
aarju opened this issue Jun 20, 2022 · 7 comments · Fixed by #91895
Labels
>bug :ml Machine learning Team:ML Meta label for the ML team

Comments

@aarju

aarju commented Jun 20, 2022

Elasticsearch Version

8.2.3

Installed Plugins

No response

Java Version

bundled

OS Version

Elastic Cloud

Problem Description

On our Cross Cluster Search (CCS) primary cluster we have several Machine Learning jobs and datafeeds that use data on the remote clusters. This worked fine in 7.x, but in 8.2.3 they all appear to be in a closed state because the _has_privileges API check fails for indices on the remote clusters. We are also unable to create new datafeeds using the CCS configuration. It is a known issue (#67798) that _has_privileges does not work on remote CCS cluster indices, so at this time this API check prevents Machine Learning from working with CCS.

Steps to Reproduce

Create multiple clusters and configure them to use Cross Cluster Search. On the primary cluster create a Machine Learning job and datafeed referencing the remote cluster. The Machine Learning job will fail the permissions check when creating a new datafeed.

Logs (if relevant)

No response

@aarju aarju added >bug needs:triage Requires assignment of a team area label labels Jun 20, 2022
@droberts195 droberts195 added :ml Machine learning and removed needs:triage Requires assignment of a team area label labels Jun 20, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jun 20, 2022
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@droberts195 droberts195 changed the title BUG - Machine Learning datafeeds fail on remote clusters due to _has_privileges check [ML] Machine Learning datafeeds fail on remote clusters due to _has_privileges check Jun 20, 2022
@aarju
Author

aarju commented Jun 20, 2022

A workaround to this issue is to have a user with superuser privileges on the primary CCS cluster open the datafeeds. It appears that if you are running as superuser the _has_privileges check is bypassed.

@dolaru
Member

dolaru commented Jun 20, 2022

I can confirm I was able to reproduce the issue in Elastic Cloud, with a couple of 8.2.2 instances. The issue does not reproduce in 7.17.4.

Steps to reproduce

  1. Set up CCS between 2 Elastic Cloud clusters
  2. Add some Kibana sample data to the remote cluster
  3. On both clusters: create a role named allow_read_sample_data with read and view_index_metadata index privileges for the index pattern kibana_sample_data*
  4. On the main/local cluster:
    a. Create a role named kibana_minimal with read and view_index_metadata index privileges for the index pattern .kibana*
    b. Create a user with the roles machine_learning_admin, kibana_minimal and allow_read_sample_data
    c. Create an index pattern for the remote sample data: *:kibana_sample_data*
  5. Log on with the newly created user
  6. Create a new anomaly detection job on the *:kibana_sample_data* index pattern (configuration is not important)
  7. Notice that after you click Create Job and the datafeed attempts to start, it fails with the following error:
{
  "statusCode": 403,
  "error": "Forbidden",
  "message": "[security_exception: [security_exception] Reason: Cannot create datafeed [datafeed-job-on-ccs-index] because user es-87832 lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}]: Cannot create datafeed [datafeed-job-on-ccs-index] because user es-87832 lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "security_exception",
            "reason": "Cannot create datafeed [datafeed-job-on-ccs-index] because user es-87832 lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}"
          }
        ],
        "type": "security_exception",
        "reason": "Cannot create datafeed [datafeed-job-on-ccs-index] because user es-87832 lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}"
      },
      "status": 403
    }
  }
}

@droberts195
Contributor

The fact that it worked in 7.17 but not 8.2 is interesting. We didn't change anything in the ML code between these two versions.

#72715 (comment) contains this interesting snippet:

  • CCS checks are done against the local cluster:
  • no checks on the remote
  • the check on the local cluster can fail although the check is wrong
  • to work around the issue, it's necessary to create privileges on the local cluster that mimic the remote privileges or use ?defer_validation=true
    • example: you want to query myremote:remote_test_index, you need a privilege on the local cluster for remote_test_index (:facepalm:) to pass the check in PUT, or use defer_validation=true, on myremote you need the privileges, too. Otherwise the transform will fail
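
The defer_validation option quoted above is a query parameter on the create transform request. A minimal sketch of building that request path, assuming a hypothetical transform id `ccs-transform` (the helper name is also just for illustration, and nothing is sent to a cluster here):

```python
from urllib.parse import quote, urlencode


def transform_put_path(transform_id, defer_validation=True):
    """Build the path for PUT _transform/<id>, optionally deferring validation."""
    # Percent-encode the id so it is safe to place in a URL path segment.
    path = "/_transform/" + quote(transform_id, safe="")
    if defer_validation:
        # Ask the server to skip up-front validation of the transform config.
        path += "?" + urlencode({"defer_validation": "true"})
    return path


# "ccs-transform" is a made-up id used only for this example.
example_path = transform_put_path("ccs-transform")
print(example_path)
```

As the quoted snippet notes, this only skips the check at PUT time; the user still needs the real privileges on the remote cluster, otherwise the transform will fail when it runs.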

It makes me think this is how it was working for datafeeds in 7.x too. And now that datafeeds are broken in 8.2, transforms probably are too.

@elastic/es-security were there changes in how _has_privileges worked between 7.17 and 8.2 that would explain this?

I am thinking that for the fix in 8.x we should just stop using _has_privileges to try to check in advance whether datafeeds/transforms will work if they reference any remote indices.

@slobodanadamovic
Contributor

slobodanadamovic commented Jun 24, 2022

@droberts195 @dolaru

There were no changes to how _has_privileges works between 7.17 and 8.2. Because of issue #67798, this was never expected to work.

I followed the steps to reproduce from #87832 (comment) and was able to reproduce the same error on a 7.17.4 cluster. Could it be that you were logged in with a different user when testing on the 7.17 cluster?

{
  "statusCode": 403,
  "error": "Forbidden",
  "message": "[security_exception: [security_exception] Reason: Cannot create datafeed [datafeed-test-ml-anomaly] because user testml lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}]: Cannot create datafeed [datafeed-test-ml-anomaly] because user testml lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}",
  "attributes": {
    "body": {
      "error": {
        "root_cause": [
          {
            "type": "security_exception",
            "reason": "Cannot create datafeed [datafeed-test-ml-anomaly] because user testml lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}"
          }
        ],
        "type": "security_exception",
        "reason": "Cannot create datafeed [datafeed-test-ml-anomaly] because user testml lacks permissions on the indices: {\"*:kibana_sample_data*\":{\"indices:data/read/search\":false}}"
      },
      "status": 403
    }
  }
}

@sommerda

sommerda commented Oct 5, 2022

I found a workaround by granting (dummy) read access to the local index remote-cluster:kibana_sample_data* on the cluster that runs the ML models. This allows for creation of the datafeed.

To do so, I created the corresponding role "ml_log_reader_dummy" and assigned it to my (less-privileged) user.

POST /_security/role/ml_log_reader_dummy
{
  "cluster": [],
  "indices": [
    {
      "names": [
        "remote-cluster:kibana_sample_data*"
      ],
      "privileges": [
        "read",
        "read_cross_cluster",
        "view_index_metadata"
      ],
      "field_security": {
        "grant": [
          "*"
        ],
        "except": []
      },
      "allow_restricted_indices": false
    }
  ],
  "applications": [],
  "run_as": [],
  "metadata": {
    "ATTENTION": "The index name in this sample is hacky and will break as soon as these strings are properly parsed."
  },
  "transient_metadata": {
    "enabled": true
  }
}

It looks like _has_privileges is not resolving the remote-cluster: prefix notation. For now, it works.

Of course, this workaround will break as soon as the index/data-view string is parsed differently.
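
To illustrate why the dummy role helps, here is a toy model only, not the actual Elasticsearch implementation. It assumes the check compares the requested expression, cluster prefix included, against locally granted name patterns as plain strings; `toy_local_check` is a made-up name and `fnmatchcase` is just a stand-in for the real pattern matching:

```python
from fnmatch import fnmatchcase


def toy_local_check(requested, granted_names):
    """Toy stand-in: True if any locally granted pattern matches the requested string as-is."""
    # The "remote-cluster:" prefix is deliberately NOT split off here,
    # which models the behaviour described in this thread.
    return any(fnmatchcase(requested, granted) for granted in granted_names)


requested = "remote-cluster:kibana_sample_data*"

# Only the plain local-style grant: the prefixed expression does not match.
without_dummy = toy_local_check(requested, ["kibana_sample_data*"])

# Adding the dummy grant whose name literally contains the prefix: it matches.
with_dummy = toy_local_check(
    requested,
    ["kibana_sample_data*", "remote-cluster:kibana_sample_data*"],
)
print(without_dummy, with_dummy)
```

Under that assumption, the grant only passes because its name happens to be the same string as the datafeed's index expression, which is why it would stop working once the prefix is parsed properly.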

@droberts195
Contributor

Looking into this again, it seems that the same problem affects transforms. It turns out some of our transform tests have had to use the workaround mentioned above too, for example:

"names": ["my_remote_cluster:remote_test_i*", "my_remote_cluster:aliased_test_index"],

We'll change the up-front privilege validation for both datafeeds and transforms so that it ignores configured source indices if they contain colons (indicating cross-cluster patterns).
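
A rough sketch of that idea, not the actual change (which lives in the Elasticsearch Java code): split the configured source indices on whether they contain a colon, and validate privileges up front only for the local ones. The function name is hypothetical.

```python
def split_local_and_remote(indices):
    """Partition index expressions into (local, remote) using a plain colon test."""
    local, remote = [], []
    for expression in indices:
        # A colon is taken to mean "cluster_alias:index_pattern" (cross-cluster).
        if ":" in expression:
            remote.append(expression)
        else:
            local.append(expression)
    return local, remote


local, remote = split_local_and_remote(
    ["kibana_sample_data*", "my_remote_cluster:remote_test_i*", "*:kibana_sample_data*"]
)
# Only `local` would go through the up-front privilege check;
# `remote` would be left to fail (or not) when the search actually runs.
print(local, remote)
```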
