[RFC] [Search Pipeline] Script request processor support non-primitive types #9200

noCharger · 2023-08-09T15:14:52Z

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Script request processor feature was launched in version 2.9.0 with the introduction of search pipeline. It supports all parameters in the search request body that are of primitive types (mostly boolean and int). The objective of this document is to explore the possibility of extending support to non-primitive type parameters (NPTs), mainly complex objects with internal builders (refer to the appendix for the complete list).

The objective in implementing support for these NPTs within the existing script context include:

Avoiding Direct Exposure of Internal Classes: Ensuring that internal mechanisms are not exposed, which can lead to security risks. In addition, it would reduce compatibility issues when parameters
Creating a Consistent Interface: The design must align with the existing painless syntax to provide a coherent and user-friendly experience.
Maintaining Simplicity and Codebase Integrity: It is essential to develop a solution that does not overly complicate the implementation or hinder the maintainability of the codebase.

Describe the solution you'd like

A clear and concise description of what you want to happen.

Approach One: Allowing Raw Request Parameters in JSON Format for All NPTs

Example script source:

ctx._source['sort'] = "{
  \"sort\": [
    {
      \"timestamp\": {
        \"order\": \"desc\"
      }
    },
    {
      \"name.keyword\": {
        \"order\": \"asc\"
      }
    }
  ]
}";

// output of ctx._source['sort'] should also be a raw string
String sort = ctx._source['sort'];

Pros:

A generic approach that could elegantly cover all NPTs by leveraging existing parsing logic.

Cons:

Add more constraints in constructing the script due to various string formatting issues, like handling single/double quotes in script source.

Approach Two: Leverage Map Interface

Example script source:

ctx._source['sort'].put('timestamp', 'desc');
ctx._source['sort'].put('name.keyword', 'asc');

ctx._source['sort'].get('timestamp') // returns string 'desc'

Pros:

Very easy to write a script that can be compiled and executed.

Cons:

Object put(String key, Object value) is inconsisent to exist builder interface inSearchRequestBuilder
Increases complexity of implementations, necessitating validation and evaluation for each individual NPT.

Approach Three: Combining Both Approaches?

This approach would seek to merge the benefits of the previous two methods. It could potentially provide the versatility of allowing raw JSON for building certain NPTs like query while still offering a more user-friendly interface for simple NPTs such as sort and timeout. Specific details would need to be fleshed out to determine the feasibility and the best way to balance the advantages and drawbacks of both paths.

Timeout exposed in approach 2:

// Example search request body:
{
  "query": {
    "match": {
      "field": "value"
    }
  },
  "timeout": "5s"
}

// New script source:
String timeout = ctx._source['timeout'];
ctx._source['timeout'].put('5s');

// Prototype impl in SearchRequestMap.java:
// GET
case "timeout":
    return source.timeout().toString();
// PUT
case "timeout":
    source.timeout(TimeValue.parseTimeValue((String) value, null, TIMEOUT_FIELD.getPreferredName()));

Query exposed in approach 1:

// New script source:
"script": {
        "lang": "painless",
        "source": """
        ctx._source.put('query', params.q);
        """,
        "params": {
            "q": "{\"match_all\":\"{}\"}"
        }
}


// Prototype impl in SearchRequestMap.java:
// GET
case "query":
    return source.query().toString();
// PUT
case "query":
   // Create a parser, this should be refactored
    XContentParser parser = JsonXContent.jsonXContent.createParser(
        new NamedXContentRegistry(
            List.of(new NamedXContentRegistry.Entry(QueryBuilder.class, new ParseField(MatchAllQueryBuilder.NAME), p -> parseInnerQueryBuilder(p)))
        ),
        DeprecationHandler.THROW_UNSUPPORTED_OPERATION,
        (String) value
    );
   // use current parse function
    source.query(parseInnerQueryBuilder(parser));

See the rest in Additional context

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Any existing solution can be used as reference?

Yes, the script processor is supported by the ingest pipeline. The implementation, on the other hand, is simple: create a map context with only top-level access to the indexing document in JSON format. It does not apply to our circumstance.

Does the solution backward compatible?

Yes, all approaches are backward compatible because no SearchSourceBuilder internal interface exposed in the script context.

Additional context

Add any other context or screenshots about the feature request here.

Related issues:
#6712
#7757

Example search request with all parameters in request body:

{
  "docvalue_fields": [
    {
      "field": "timestamp",
      "format": "date_time"
    },
    {
      "field": "price",
      "format": "DecimalFormat_pattern"
    }
  ],
  "fields": [
    {
      "field": "location",
      "format": "geojson"
    }
  ],
  "explain": false,
  "from": 0,
  "indices_boost": [
    {
      "index1": 1.5
    },
    {
      "index2": 0.5
    }
  ],
  "min_score": 0.5,
  "query": {
    "bool": {
      "must": {
        "match": {
          "field": "value"
        }
      }
    }
  },
  "seq_no_primary_term": true,
  "size": 10,
  "_source": {
    "excludes": ["field_to_exclude"],
    "includes": ["field_to_include"]
  },
  "stats": ["group1"],
  "terminate_after": 100,
  "timeout": "2s",
  "version": false
}

FetchSourceContext exposed in approach 2:

// New script source:
    ctx._source['_source'].put('includes', ['field1']);
    ctx._source['_source'].put('excludes', ['field2']);

// Prototype impl in SearchRequestMap.java:
// GET
case "_source":
    return source.getSource(<includes>, <excludes>)
    
// PUT
case "_source":
    source.fetchSourceContext(new FetchSourceContext(
        true,
        ((Map<String, Object>) value).get("includes").toString(),
        ((Map<String, Object>) value).get("excludes").toString()
    ));

List indexBoosts

// New script source:
List<Map<String, Float>> indicesBoost = ctx._source['indices_boost'];
ctx._source['indices_boost'].put([["index1": 1.5], ["index2": 0.5]]);

// Prototype impl in SearchRequestMap.java:
// GET
case "indices_boost":
    return source.indicesBoost();
// PUT
case "indices_boost":
    List<Map.Entry<String, Float>> indicesBoostList = new ArrayList<>();
    for (Map<String, Float> map : (List<Map<String, Float>>) value) {
        for (Map.Entry<String, Float> entry : map.entrySet()) {
            indicesBoostList.add(new AbstractMap.SimpleEntry<>(entry.getKey(), entry.getValue()));
        }
    }
    source.indicesBoost(indicesBoostList);

The text was updated successfully, but these errors were encountered:

msfroh · 2023-08-16T16:59:42Z

IMO, approaches 2 and 3 sound pretty good for most properties.

I still feel like query manipulation with Painless is an unsolved problem. Maybe we split query manipulation off into its own issue and try to figure that out.

msfroh · 2023-08-17T15:22:36Z

Incidentally, if this work comes after #6722, the ability to write/read request context variables might be very exciting.

More details: The direction of #6722 has changed based on feedback from @navneet1v, where instead of defining a new type of processor, we're adding the ability for request/response processors to read and write from request-scoped state. Connecting that with script processors is quite powerful if script processors can connect some request/response state with arbitrary variables. The OversamplingRequestProcessor that I used as an example could be implemented entirely as a script.

noCharger added enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc labels Aug 9, 2023

github-actions bot added the untriaged label Aug 9, 2023

noCharger removed the untriaged label Aug 9, 2023

msfroh mentioned this issue Sep 8, 2023

[Search pipeline] Support query context modification in script processor #7758

Open

macohen added Search:Relevance and removed Search Search query, autocomplete ...etc labels Oct 19, 2023

andrross added the Roadmap:Search and ML Project-wide roadmap label label May 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] [Search Pipeline] Script request processor support non-primitive types #9200

[RFC] [Search Pipeline] Script request processor support non-primitive types #9200

noCharger commented Aug 9, 2023 •

edited

Loading

msfroh commented Aug 16, 2023

msfroh commented Aug 17, 2023

[RFC] [Search Pipeline] Script request processor support non-primitive types #9200

[RFC] [Search Pipeline] Script request processor support non-primitive types #9200

Comments

noCharger commented Aug 9, 2023 • edited Loading

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

msfroh commented Aug 16, 2023

msfroh commented Aug 17, 2023

noCharger commented Aug 9, 2023 •

edited

Loading