Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] [Search Pipeline] Script request processor support non-primitive types #9200

Open
noCharger opened this issue Aug 9, 2023 · 2 comments
Labels
enhancement Enhancement or improvement to existing feature or request Roadmap:Search and ML Project-wide roadmap label Search:Relevance

Comments

@noCharger
Copy link
Contributor

noCharger commented Aug 9, 2023

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Script request processor feature was launched in version 2.9.0 with the introduction of search pipeline. It supports all parameters in the search request body that are of primitive types (mostly boolean and int). The objective of this document is to explore the possibility of extending support to non-primitive type parameters (NPTs), mainly complex objects with internal builders (refer to the appendix for the complete list).

The objective in implementing support for these NPTs within the existing script context include:

  • Avoiding Direct Exposure of Internal Classes: Ensuring that internal mechanisms are not exposed, which can lead to security risks. In addition, it would reduce compatibility issues when parameters
  • Creating a Consistent Interface: The design must align with the existing painless syntax to provide a coherent and user-friendly experience.
  • Maintaining Simplicity and Codebase Integrity: It is essential to develop a solution that does not overly complicate the implementation or hinder the maintainability of the codebase.

Describe the solution you'd like

A clear and concise description of what you want to happen.

Approach One: Allowing Raw Request Parameters in JSON Format for All NPTs

Example script source:

ctx._source['sort'] = "{
  \"sort\": [
    {
      \"timestamp\": {
        \"order\": \"desc\"
      }
    },
    {
      \"name.keyword\": {
        \"order\": \"asc\"
      }
    }
  ]
}";

// output of ctx._source['sort'] should also be a raw string
String sort = ctx._source['sort'];

Pros:

  • A generic approach that could elegantly cover all NPTs by leveraging existing parsing logic.

Cons:

  • Add more constraints in constructing the script due to various string formatting issues, like handling single/double quotes in script source.

Approach Two: Leverage Map Interface

Example script source:

ctx._source['sort'].put('timestamp', 'desc');
ctx._source['sort'].put('name.keyword', 'asc');

ctx._source['sort'].get('timestamp') // returns string 'desc'

Pros:

  • Very easy to write a script that can be compiled and executed.

Cons:

  • Object put(String key, Object value) is inconsisent to exist builder interface inSearchRequestBuilder
  • Increases complexity of implementations, necessitating validation and evaluation for each individual NPT.

Approach Three: Combining Both Approaches?

This approach would seek to merge the benefits of the previous two methods. It could potentially provide the versatility of allowing raw JSON for building certain NPTs like query while still offering a more user-friendly interface for simple NPTs such as sort and timeout. Specific details would need to be fleshed out to determine the feasibility and the best way to balance the advantages and drawbacks of both paths.

Timeout exposed in approach 2:

// Example search request body:
{
  "query": {
    "match": {
      "field": "value"
    }
  },
  "timeout": "5s"
}

// New script source:
String timeout = ctx._source['timeout'];
ctx._source['timeout'].put('5s');

// Prototype impl in SearchRequestMap.java:
// GET
case "timeout":
    return source.timeout().toString();
// PUT
case "timeout":
    source.timeout(TimeValue.parseTimeValue((String) value, null, TIMEOUT_FIELD.getPreferredName()));    

Query exposed in approach 1:

// New script source:
"script": {
        "lang": "painless",
        "source": """
        ctx._source.put('query', params.q);
        """,
        "params": {
            "q": "{\"match_all\":\"{}\"}"
        }
}


// Prototype impl in SearchRequestMap.java:
// GET
case "query":
    return source.query().toString();
// PUT
case "query":
   // Create a parser, this should be refactored
    XContentParser parser = JsonXContent.jsonXContent.createParser(
        new NamedXContentRegistry(
            List.of(new NamedXContentRegistry.Entry(QueryBuilder.class, new ParseField(MatchAllQueryBuilder.NAME), p -> parseInnerQueryBuilder(p)))
        ),
        DeprecationHandler.THROW_UNSUPPORTED_OPERATION,
        (String) value
    );
   // use current parse function
    source.query(parseInnerQueryBuilder(parser));

See the rest in Additional context

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

  1. Any existing solution can be used as reference?

Yes, the script processor is supported by the ingest pipeline. The implementation, on the other hand, is simple: create a map context with only top-level access to the indexing document in JSON format. It does not apply to our circumstance.

  1. Does the solution backward compatible?

Yes, all approaches are backward compatible because no SearchSourceBuilder internal interface exposed in the script context.

Additional context

Add any other context or screenshots about the feature request here.

Related issues:
#6712
#7757

Example search request with all parameters in request body:

{
  "docvalue_fields": [
    {
      "field": "timestamp",
      "format": "date_time"
    },
    {
      "field": "price",
      "format": "DecimalFormat_pattern"
    }
  ],
  "fields": [
    {
      "field": "location",
      "format": "geojson"
    }
  ],
  "explain": false,
  "from": 0,
  "indices_boost": [
    {
      "index1": 1.5
    },
    {
      "index2": 0.5
    }
  ],
  "min_score": 0.5,
  "query": {
    "bool": {
      "must": {
        "match": {
          "field": "value"
        }
      }
    }
  },
  "seq_no_primary_term": true,
  "size": 10,
  "_source": {
    "excludes": ["field_to_exclude"],
    "includes": ["field_to_include"]
  },
  "stats": ["group1"],
  "terminate_after": 100,
  "timeout": "2s",
  "version": false
}

FetchSourceContext exposed in approach 2:

// New script source:
    ctx._source['_source'].put('includes', ['field1']);
    ctx._source['_source'].put('excludes', ['field2']);

// Prototype impl in SearchRequestMap.java:
// GET
case "_source":
    return source.getSource(<includes>, <excludes>)
    
// PUT
case "_source":
    source.fetchSourceContext(new FetchSourceContext(
        true,
        ((Map<String, Object>) value).get("includes").toString(),
        ((Map<String, Object>) value).get("excludes").toString()
    ));

List indexBoosts

// New script source:
List<Map<String, Float>> indicesBoost = ctx._source['indices_boost'];
ctx._source['indices_boost'].put([["index1": 1.5], ["index2": 0.5]]);

// Prototype impl in SearchRequestMap.java:
// GET
case "indices_boost":
    return source.indicesBoost();
// PUT
case "indices_boost":
    List<Map.Entry<String, Float>> indicesBoostList = new ArrayList<>();
    for (Map<String, Float> map : (List<Map<String, Float>>) value) {
        for (Map.Entry<String, Float> entry : map.entrySet()) {
            indicesBoostList.add(new AbstractMap.SimpleEntry<>(entry.getKey(), entry.getValue()));
        }
    }
    source.indicesBoost(indicesBoostList);
@noCharger noCharger added enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc labels Aug 9, 2023
@msfroh
Copy link
Collaborator

msfroh commented Aug 16, 2023

IMO, approaches 2 and 3 sound pretty good for most properties.

I still feel like query manipulation with Painless is an unsolved problem. Maybe we split query manipulation off into its own issue and try to figure that out.

@msfroh
Copy link
Collaborator

msfroh commented Aug 17, 2023

Incidentally, if this work comes after #6722, the ability to write/read request context variables might be very exciting.

More details: The direction of #6722 has changed based on feedback from @navneet1v, where instead of defining a new type of processor, we're adding the ability for request/response processors to read and write from request-scoped state. Connecting that with script processors is quite powerful if script processors can connect some request/response state with arbitrary variables. The OversamplingRequestProcessor that I used as an example could be implemented entirely as a script.

@macohen macohen added Search:Relevance and removed Search Search query, autocomplete ...etc labels Oct 19, 2023
@andrross andrross added the Roadmap:Search and ML Project-wide roadmap label label May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Roadmap:Search and ML Project-wide roadmap label Search:Relevance
Projects
Status: 👀 In review
Development

No branches or pull requests

4 participants