Add enrich processor #48039

Merged
merged 165 commits into from Oct 15, 2019

Commits on Apr 5, 2019

  1. first commit

    martijnvg committed Apr 5, 2019
    35af474

Commits on Apr 9, 2019

  1. 0ae4546
  2. c905559

Commits on Apr 12, 2019

  1. 018b1e2
  2. Added enrich policy definition. (#41003)

    Relates to #32789
    martijnvg committed Apr 12, 2019
    7ea14fd

Commits on Apr 17, 2019

  1. Move the policy class to xpack core module. (#41311)

    This allows the transport client to use this class in enrich APIs.
    
    Relates to #40997
    martijnvg authored and jbaiera committed Apr 17, 2019
    def1024

Commits on Apr 22, 2019

  1. 2e9e480

Commits on Apr 23, 2019

  1. Refactor the enrich store to remove it from guice (#41421)

    There is no need to create an enrich store component for the transport
    layer since the inner components of the store are either present in the
    master node calls or via an already injected ClusterService. This commit
    cleans up the class, adds the forthcoming delete call and tests the new
    code.
    hub-cap committed Apr 23, 2019
    6dc41a8

Commits on Apr 24, 2019

  1. ba32255
  2. 284c508

Commits on Apr 25, 2019

  1. Add enrich policy PUT API (#41383)

    This commit wires up the Rest calls and Transport calls for PUT enrich
    policy, as well as tests and rest spec additions.
    hub-cap committed Apr 25, 2019
    1c28f30

Commits on Apr 26, 2019

  1. Add enrich qa module for rest tests and (#41568)

    move put policy api yaml test to this rest module.
    
    The main benefit is that all tests will then be run when running:
    `./gradlew -p x-pack/plugin/enrich check`
    
    The rest qa module starts a node with default distribution and basic
    license.
    
    This qa module will also be used for adding different rest tests (not yaml),
    for example rest tests needed for #41532
    
    When we work on security integration, we can
    add a security qa module under the qa folder. At some point
    we should also add a multi node qa module.
    martijnvg committed Apr 26, 2019
    9f51137

Commits on Apr 29, 2019

  1. 8c8e3e0

Commits on Apr 30, 2019

  1. Add enrich processor (#41532)

    The enrich processor performs a lookup in a locally allocated
    enrich index shard using a field value from the document being enriched.
    If there is a match then the _source of the enrich document is fetched.
    The document being enriched then gets the decorate values from the
    enrich document based on the configured decorate fields in the pipeline.
    
    Note that the usage of the _source field is temporary until the enrich
    source field that is part of #41521 is merged into the enrich branch.
    Using the _source field involves significant decompression, which is not
    desired for enrich use cases.
    
    The policy specifies which field in the enrich index
    to query and which fields are available for decorating the document
    being enriched.
    
    The enrich processor has the following configuration options:
    * `policy_name` - the name of the policy this processor should use
    * `enrich_key` - the field in the document being enriched that holds the lookup value
    * `ignore_missing` - Whether to allow the key field to be missing
    * `enrich_values` - a list of fields to decorate the document being enriched with.
                        Each entry holds a source field and a target field.
                        The source field indicates what decorate field to use that is available in the policy.
                        The target field controls the field name to use in the document being enriched.
                        The source and target fields can be the same.
    
    Example pipeline config:
    
    ```
    {
       "processors": [
          {
             "enrich": {
                "policy_name": "my_policy",
                "enrich_key": "host_name",
                "enrich_values": [
                   {
                     "source": "globalRank",
                     "target": "global_rank"
                   }
                ]
             }
          }
       ]
    }
    ```
    
    In the above example documents are being enriched with a global rank value.
    For each document that has a match in the enrich index based on its host_name field,
    the document gets a global rank field value, which is fetched from the `globalRank`
    field in the enrich index and saved as `global_rank` in the document being enriched.
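
The lookup-and-decorate step described above can be sketched in a few lines. This is a minimal illustration in Python, not the Elasticsearch Java implementation: the in-memory `enrich_index` dict stands in for the enrich index shard, `enrich_document` is a hypothetical helper name, and the rank value is made-up sample data.

```python
def enrich_document(doc, enrich_index, enrich_key, enrich_values, ignore_missing=False):
    """Copy configured fields from a matching enrich document into doc.

    enrich_index: dict mapping lookup value -> enrich document. It stands in
    for the enrich index shard (an assumption of this sketch).
    enrich_values: list of {"source": ..., "target": ...} entries.
    """
    if enrich_key not in doc:
        if ignore_missing:
            return doc
        raise KeyError(f"field [{enrich_key}] is missing")
    match = enrich_index.get(doc[enrich_key])
    if match is None:
        return doc  # no match: the document is left unchanged
    for entry in enrich_values:
        if entry["source"] in match:
            # Decorate the document under the configured target name.
            doc[entry["target"]] = match[entry["source"]]
    return doc


# Illustrative sample data only.
enrich_index = {"elastic.co": {"globalRank": 25, "tld": "co"}}
values = [{"source": "globalRank", "target": "global_rank"}]
enriched = enrich_document({"host_name": "elastic.co"}, enrich_index, "host_name", values)
```

Only the fields listed in `enrich_values` are copied, so `tld` stays out of the enriched document even though the enrich document contains it.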
    
    This PR is part one of #41521
    martijnvg committed Apr 30, 2019
    3c7f463

Commits on May 1, 2019

  1. Add enrich policy list API (#41553)

    This commit wires up the Rest calls and Transport calls for listing all
    enrich policies, as well as tests and rest spec additions.
    hub-cap committed May 1, 2019
    c999c09

Commits on May 2, 2019

  1. 50f3177
  2. 593a1c1
  3. Add enrich policy DELETE API (#41495)

    This commit wires up the Rest calls and Transport calls for DELETE enrich
    policy, as well as tests and rest spec additions.
    hub-cap committed May 2, 2019
    83617e8
  4. Add enrich policy runner (#41088)

    Adds the foundation of the execution logic to execute an enrich policy. Validates
    the source index existence as well as mappings, creates a new enrich index for
    the policy, reindexes the source index into the new enrich index, and swaps the 
    enrich alias for the policy to the new index.
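
The four steps in this commit message (validate, create, reindex, swap alias) can be sketched against a toy in-memory cluster. This is an illustration in Python, not the actual runner: the `cluster` dict layout, the policy keys `source_index`, `enrich_key` and `enrich_values`, and the `.enrich-<policy>-<timestamp>` naming scheme are all assumptions made for the example.

```python
import time


def run_policy(cluster, policy_name, policy, now_millis=None):
    """Execute an enrich policy against a toy in-memory cluster (hypothetical layout)."""
    source = policy["source_index"]
    # 1. Validate that the source index exists and maps the required fields.
    if source not in cluster["indices"]:
        raise ValueError(f"source index [{source}] does not exist")
    fields = [policy["enrich_key"], *policy["enrich_values"]]
    mapped = cluster["indices"][source]["mappings"]
    missing = [f for f in fields if f not in mapped]
    if missing:
        raise ValueError(f"source index is missing mappings for {missing}")
    # 2. Create a new enrich index with a unique, timestamped name.
    if now_millis is None:
        now_millis = int(time.time() * 1000)
    alias = f".enrich-{policy_name}"
    new_index = f"{alias}-{now_millis}"
    # 3. "Reindex": copy only the fields the policy needs.
    docs = [{f: d[f] for f in fields if f in d}
            for d in cluster["indices"][source]["docs"]]
    cluster["indices"][new_index] = {"mappings": set(fields), "docs": docs}
    # 4. Swap the alias so lookups see the new index from now on.
    cluster["aliases"][alias] = new_index
    return new_index
```

Because the alias is repointed only after the new index is fully populated, readers never see a half-built enrich index; the previous index stays around until something removes it.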
    jbaiera committed May 2, 2019
    a451292

Commits on May 7, 2019

  1. Change policy runner to use helper method on EnrichPolicy instead of (#41839)

    its own helper method to determine alias / policy base name.
    
    This way both the enrich processor and policy runner use the same logic
    to determine the alias to use.
    
    Relates to #32789
    martijnvg committed May 7, 2019
    33fddef
  2. ecffd73
  3. Change the reindex fetch in policy runner from 1000 to 10000 and (#41838)

    Reindex uses scroll searches to read the source data. It is more efficient
    to read more data in one search scroll round than several. I think 10000
    is a good sweet spot.
    
    Relates to #32789
    martijnvg committed May 7, 2019
    0bf0f52

Commits on May 8, 2019

  1. 5a02999

Commits on May 9, 2019

  1. 97d658e
  2. 28c529f
  3. Keep track of the enrich key field in the enrich index. (#42022)

    The policy runner keeps track of the enrich key field in the enrich index's _meta field.
    The ingest processor uses the field name defined in the enrich index's _meta field and
    not the one in the policy. This avoids problems if the policy is changed without
    a new enrich index being created.
    
    This also completely decouples EnrichPolicy from ExactMatchProcessor.
    
    The following scenario results in failure without this change:
    1) Create policy
    2) Execute policy
    3) Create pipeline with enrich processor
    4) Use pipeline
    5) Update enrich key in policy
    6) Use pipeline, which then fails.
    martijnvg committed May 9, 2019
    b65513e

Commits on May 12, 2019

  1. 4ebee27

Commits on May 20, 2019

  1. df3a3f3

Commits on May 21, 2019

  1. 4dde9e0

Commits on May 22, 2019

  1. ceab8ee
  2. Add enrich policy execute API (#41762)

    This commit wires up the Rest calls and Transport calls for execute
    enrich policy, as well as tests and rest spec additions.
    hub-cap committed May 22, 2019
    833c9d1
  3. Add step to forcemerge enrich index after reindex (#41969)

    Adds a step in the policy execution that force-merges a new enrich index after reindex completes.
    jbaiera committed May 22, 2019
    e6662be

Commits on May 24, 2019

  1. f577ca0
  2. 74cb4d5

Commits on May 27, 2019

  1. 719eebe

Commits on May 28, 2019

  1. Stricter update dependency between pipelines and components used by pipelines (#42038)

    Add support for components used by processor factories to get updated
    before processor factories create new processor instances.
    
    Components can register via `IngestService#addIngestClusterStateListener(...)`
    then if the internal representation of ingest pipelines gets updated,
    these components get updated with the current cluster state before
    pipelines are updated.
    
    Registered EnrichProcessorFactory as an ingest cluster state listener, so
    that it always has an up-to-date view of the active enrich policies.
    martijnvg committed May 28, 2019
    8747916
  2. Add enrich policy GET API (#41384)

    This commit wires up the Rest calls and Transport calls for GET enrich
    policy, as well as tests and rest spec additions.
    hub-cap committed May 28, 2019
    55f80ae
  3. a90b571

Commits on May 30, 2019

  1. 1f3e298

Commits on Jun 3, 2019

  1. Enrich validate nested mappings (#42452)

    Ensures that fields retained in an enrich index from a source are not contained 
    under a nested field. It additionally makes sure that key fields exist, and that 
    value fields are checked if they are present. The policy runner test has also 
    been expanded with some faulty mapping test cases.
    jbaiera committed Jun 3, 2019
    8a1173d

Commits on Jun 10, 2019

  1. Limit an enrich policy execution to only one at a time (#42535)

    Add a keyed lock mechanism to the policy executor to ensure that an enrich policy
    can only have one execution happening at a time.
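
A keyed lock of this kind can be sketched as a guarded set of policy names. The Python below is an illustration only: `PolicyLocks` and `execute_policy` are hypothetical names, and the choice to reject a second execution (rather than block until the first finishes) is an assumption, since the commit message does not say which behaviour the executor uses.

```python
import threading


class PolicyLocks:
    """Tracks which policies are currently executing, keyed by policy name."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the set below
        self._running = set()

    def try_acquire(self, policy_name):
        """Return True if the caller may run the policy, False if it is already running."""
        with self._guard:
            if policy_name in self._running:
                return False
            self._running.add(policy_name)
            return True

    def release(self, policy_name):
        with self._guard:
            self._running.discard(policy_name)


def execute_policy(locks, policy_name, run):
    """Run the callable only if no other execution holds the policy's lock."""
    if not locks.try_acquire(policy_name):
        raise RuntimeError(f"policy [{policy_name}] is already executing")
    try:
        return run()
    finally:
        locks.release(policy_name)  # always free the key, even on failure
```

Locking per policy name, rather than with one global lock, lets different policies execute concurrently while still serializing runs of the same policy.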
    jbaiera committed Jun 10, 2019
    1b95434

Commits on Jun 13, 2019

  1. 160a5bc
  2. ce33c44

Commits on Jun 20, 2019

  1. 3d5515d

Commits on Jun 21, 2019

  1. 9aed546

Commits on Jun 24, 2019

  1. Add role for enrich processor (#42677)

    This commit adds the manage_enrich privilege, which grants access to all
    of the enrich processor lifecycle actions. In addition this commit also
    creates a role which grants access to the generated indices.
    
    Relates #41939
    
    Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
    hub-cap and martijnvg committed Jun 24, 2019
    4d03356
  2. 5c3368a

Commits on Jun 25, 2019

  1. ed3bacc
  2. 837999e
  3. Added multi node enrich tests and fixed serialization issues. (#43386)

    The test for now tests the enrich APIs in a multi node environment.
    Picked EsIntegTestCase test over a real qa module in order to avoid
    adding another module that starts a test cluster.
    martijnvg committed Jun 25, 2019
    66b7472
  4. a743be2
  5. List.of(...) everywhere

    martijnvg committed Jun 25, 2019
    9ed7f5a
  6. Make enrich processor use search action through a client (#43311)

    Add client to processor parameters in the ingest service.
    Remove the search provider function from the processor parameters.
    ExactMatchProcessor and Factory converted to use client.
    Remove test cases that are no longer applicable from processor.
    jbaiera committed Jun 25, 2019
    8cbe6f6
  7. unmuted test

    martijnvg committed Jun 25, 2019
    b90bae1

Commits on Jun 27, 2019

  1. 8fc3b06
  2. eac98e9

Commits on Jun 28, 2019

  1. 813bf4c

Commits on Jun 30, 2019

  1. 8d884d7
  2. 79ba94f

Commits on Jul 1, 2019

  1. Make ingest execution non-blocking (#43361)

    Added an additional method to the Processor interface to allow a
    processor implementation to make a non-blocking call.
    
    Also added a semaphore to prevent search thread pools from rejecting
    search requests originating from the match processor. This is a temporary workaround.
    martijnvg committed Jul 1, 2019
    7c42cde
  2. Add restart node enrich tests. (#43579)

    This test verifies that enrich policies still exist after a full
    cluster restart. If EnrichPolicy is not registered as named xcontent
    in EnrichPlugin class then this test fails.
    martijnvg committed Jul 1, 2019
    455cbb6
  3. Validate read priv of enrich source indices (#43595)

    This commit adds permissions validation on the indices provided in the
    enrich policy. These indices should be validated at store time so as not
    to have cryptic error messages in the event the user does not have
    permissions to access said indices.
    hub-cap committed Jul 1, 2019
    b948d31
  4. Fix restart test

    hub-cap committed Jul 1, 2019
    ed399b8

Commits on Jul 3, 2019

  1. Add enrich coordinator proxy action (#43801)

    Introduced a proxy API to handle the search request load that originates
    from the enrich processor. The enrich processor can execute many search
    requests that execute asynchronously in parallel, and that can easily overwhelm
    the search thread pool on nodes. To protect against this, the Coordinator
    queues the search requests and only executes a fixed number of search requests
    in parallel.
    
    In addition, the Coordinator tries to include as many search requests as possible
    (up to a defined maximum) inside a multi search request in order to reduce the number
    of remote api calls to be made from the node that performs ingestion.
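
The queue-and-batch behaviour described here can be sketched as follows. This is a single-threaded Python illustration under stated assumptions, not the actual coordinator: the class shape, the `send_batch` callback, and the parameter names `max_concurrent_requests` and `max_lookups_per_request` are hypothetical, and a real implementation would also need thread safety.

```python
from collections import deque


class Coordinator:
    """Queues searches and sends them in bounded batches (simplified sketch)."""

    def __init__(self, send_batch, max_concurrent_requests=2, max_lookups_per_request=3):
        self._send_batch = send_batch  # callable that receives a list of searches
        self._queue = deque()
        self._in_flight = 0  # number of batches sent but not yet completed
        self.max_concurrent_requests = max_concurrent_requests
        self.max_lookups_per_request = max_lookups_per_request

    def schedule(self, search):
        """Queue a search and send a batch if a slot is free."""
        self._queue.append(search)
        self._drain()

    def on_batch_done(self):
        """Called when a batch finishes; frees a slot and sends queued work."""
        self._in_flight -= 1
        self._drain()

    def _drain(self):
        while self._queue and self._in_flight < self.max_concurrent_requests:
            size = min(len(self._queue), self.max_lookups_per_request)
            batch = [self._queue.popleft() for _ in range(size)]
            self._in_flight += 1
            self._send_batch(batch)
```

While all slots are busy, new searches accumulate in the queue, so the next batch sent tends to be larger; that is how batching reduces the number of remote calls under load.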
    martijnvg committed Jul 3, 2019
    355fe6d

Commits on Jul 4, 2019

  1. ad98f68
  2. 9fcb4bb
  3. 002810a
  4. 3fb32c3

Commits on Jul 5, 2019

  1. 34b6067

Commits on Jul 11, 2019

  1. Ensure enrich policy is immutable (#43604)

    This commit ensures the policy cannot be overwritten. An error is thrown
    if the policy exists. All tests have been updated accordingly.
    hub-cap committed Jul 11, 2019
    b1c9c5c

Commits on Jul 17, 2019

  1. Add Enrich index background task to clean up old indices (#43746)

    This PR adds a background maintenance task that is scheduled on the master node only. 
    An index is deleted if it is not linked to a policy or if the enrich alias is not
    currently pointing at it. Synchronization has been added to make sure that no policy
    executions are running at the time of cleanup, and if any executions do occur, the marking 
    process delays cleanup until next run.
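
The deletion rule can be expressed as a small pure function. The Python sketch below is a simplification under stated assumptions: the argument shapes and the name `indices_to_delete` are hypothetical, and skipping the whole pass while any policy is executing is a coarser rule than the marking process the commit describes.

```python
def indices_to_delete(enrich_indices, policies, aliases, running_policies):
    """Pick enrich indices that are safe to remove.

    enrich_indices: dict of index name -> name of the policy it was built for
    policies: set of policy names that currently exist
    aliases: dict of policy name -> index name its enrich alias points at
    running_policies: policy names with an execution in progress
    """
    if running_policies:
        return []  # delay cleanup until the next scheduled run
    doomed = []
    for index, policy in enrich_indices.items():
        orphaned = policy not in policies        # policy was deleted
        stale = aliases.get(policy) != index     # alias moved to a newer index
        if orphaned or stale:
            doomed.append(index)
    return sorted(doomed)
```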
    jbaiera committed Jul 17, 2019
    c7ba91b

Commits on Jul 23, 2019

  1. Add soft limit for max concurrent policy executions (#43117)

    Adds a global soft limit on the number of concurrently executing enrich policies. 
    Since an enrich policy is run on the generic thread pool, this is meant to limit 
    policy runs separately from the generic thread pool capacity.
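
One way to picture such a limit is a counting semaphore that is consulted before a policy run is handed to the thread pool. The Python sketch below is illustrative only: `ExecutionLimiter` and its methods are hypothetical names, and rejecting (instead of queueing) runs over the limit is an assumption.

```python
import threading


class ExecutionLimiter:
    """Caps the number of policy executions running at the same time."""

    def __init__(self, max_concurrent):
        # BoundedSemaphore raises ValueError if released more often than acquired.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_start(self):
        """Return True and take a slot if one is free, otherwise return False."""
        return self._slots.acquire(blocking=False)

    def finish(self):
        """Give the slot back once the execution has completed."""
        self._slots.release()
```

Keeping this counter separate from the thread pool means the limit applies to policy runs specifically, regardless of how many generic threads happen to be available.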
    jbaiera committed Jul 23, 2019
    49377b0

Commits on Jul 25, 2019

  1. 9875e16

Commits on Jul 29, 2019

  1. Fix build errors (#44933)

    Add EnrichPlugin to test cases that update cluster state
    jbaiera authored and martijnvg committed Jul 29, 2019
    d0a1657
  2. 5d0b71c
  3. 6d8bba6
  4. 814c2bb
  5. removed unused imports

    martijnvg committed Jul 29, 2019
    97d3e30

Commits on Aug 1, 2019

  1. a8c7a29
  2. da3c34c

Commits on Aug 8, 2019

  1. 4c860c0
  2. 6e33080
  3. eea64ff

Commits on Aug 9, 2019

  1. Added a custom api to perform the msearch more efficiently for enrich processor (#43965)

    Currently the msearch api is used to execute buffered search requests;
    however the msearch api doesn't deal with search requests in an intelligent way.
    It basically executes each search separately in a concurrent manner.
    
    This api reuses the msearch request and response classes and executes
    the searches as one request in the node holding the enrich index shard.
    Things like engine.searcher and query shard context are only created once.
    Also there are fewer layers than when executing a regular msearch request. This
    results in an interesting speedup.
    
    Without this change, in a single node cluster, enriching documents
    with a bulk size of 5000 items, the ingest time in each bulk response
    varied from 174ms to 822ms. With this change the ingest time in each
    bulk response varied from 54ms to 109ms.
    
    I think we should add a change like this based on this improvement in ingest time.
    
    However I do wonder if instead of doing this change, we should improve
    the msearch api to execute more efficiently. That would be more complicated
    than this change, because in this change the custom api can only search
    enrich index shards and these are special because they always have a single
    primary shard. If msearch api is to be improved then that should work for
    any search request to any indices. Making the same optimization for
    indices with more than 1 primary shard requires much more work.
    
    The current change is isolated in the enrich plugin and LOC / complexity
    is small. So this is good enough for now.
    martijnvg committed Aug 9, 2019
    ccf30c3
  2. Added HLRC support for enrich put policy API. (#45183)

    This PR also adds HLRC docs.
    
    Relates to #32789
    martijnvg committed Aug 9, 2019
    43b23aa
  3. Add support for a more compact enrich values format (#45033)

    In the case that source and target are the same in `enrich_values` then
    a string array can be specified.
    
    For example instead of this:
    
    ```
    PUT /_ingest/pipeline/my-pipeline
    {
        "processors": [
            {
                "enrich" : {
                    "policy_name": "my-policy",
                    "enrich_values": [
                        {
                            "source": "first_name",
                            "target": "first_name"
                        },
                        {
                            "source": "last_name",
                            "target": "last_name"
                        },
                        {
                            "source": "address",
                            "target": "address"
                        },
                        {
                            "source": "city",
                            "target": "city"
                        },
                        {
                            "source": "state",
                            "target": "state"
                        },
                        {
                            "source": "zip",
                            "target": "zip"
                        }
                    ]
                }
            }
        ]
    }
    ```
    This more compact format can be specified:
    
    ```
    PUT /_ingest/pipeline/my-pipeline
    {
        "processors": [
            {
                "enrich" : {
                    "policy_name": "my-policy",
                    "targets": [
                       "first_name",
                       "last_name",
                       "address",
                       "city",
                       "state",
                       "zip"
                    ]
                }
            }
        ]
    }
    ```
    
    And the `enrich_values` key has been renamed to `set_from`.
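
Internally, both formats can be reduced to the same list of source/target pairs. The Python sketch below shows one way to do that normalization; `normalize_enrich_fields` is a hypothetical helper, and rejecting a config that sets both `targets` and `set_from` is an assumption, not something the commit message states.

```python
def normalize_enrich_fields(processor_config):
    """Return a list of {"source", "target"} mappings for either format."""
    has_targets = "targets" in processor_config
    has_set_from = "set_from" in processor_config
    if has_targets and has_set_from:
        # Assumed behaviour: the two formats are mutually exclusive.
        raise ValueError("specify either [targets] or [set_from], not both")
    if has_targets:
        # Compact format: each name is used as both source and target.
        return [{"source": name, "target": name} for name in processor_config["targets"]]
    return list(processor_config.get("set_from", []))


compact = {"policy_name": "my-policy", "targets": ["first_name", "zip"]}
pairs = normalize_enrich_fields(compact)
```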
    
    Relates to #32789
    martijnvg committed Aug 9, 2019
    75a6a99

Commits on Aug 12, 2019

  1. bfa25b4

Commits on Aug 13, 2019

  1. d775277
  2. 8adb39e
  3. Validate policy name like an index name. (#45452)

    The policy name is used to generate the enrich index name.
    For this reason, a policy name should be validated in the same way
    as index names.
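
For illustration, a check along these lines might look like the Python sketch below. The specific rules (lowercase only, a set of forbidden characters, forbidden leading characters, no `.` or `..`, at most 255 bytes) follow the commonly documented Elasticsearch index naming restrictions and are an assumption here; the commit itself does not list them, and `policy_name_problems` is a hypothetical helper.

```python
# Characters assumed to be forbidden in index names (and so in policy names).
INVALID_CHARS = set('\\/*?"<>| ,#:')


def policy_name_problems(name):
    """Return a list of human-readable problems; an empty list means the name is valid."""
    if not name:
        return ["must not be empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(INVALID_CHARS.intersection(name))
    if bad:
        problems.append(f"must not contain {bad}")
    if name[0] in "_-+":
        problems.append("must not start with '_', '-' or '+'")
    if name in (".", ".."):
        problems.append("must not be '.' or '..'")
    if len(name.encode("utf-8")) > 255:
        problems.append("must not be longer than 255 bytes")
    return problems
```

Returning every problem at once, rather than stopping at the first, gives the caller a complete error message for a rejected policy name.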
    
    Relates to #32789
    martijnvg committed Aug 13, 2019
    4ea8812

Commits on Aug 14, 2019

  1. Improve naming of enrich policy fields. (#45494)

    Renamed `enrich_key` to `match_field` and
    renamed `enrich_values` to `enrich_fields`.
    
    Relates #32789
    martijnvg committed Aug 14, 2019
    2559998
  2. Fail delete policy if pipeline exists (#44438)

    If a pipeline that references the policy exists, we should not allow the
    policy to be deleted. The user will need to remove the processor from
    the pipeline before deleting the policy. This commit adds a check to
    ensure that the policy cannot be deleted if it is referenced by any
    pipeline in the system.
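
The reference check can be sketched as a scan over pipeline definitions before anything is removed. This Python illustration uses plain dicts for the policy store and the pipelines; `pipelines_using_policy` and `delete_policy` are hypothetical helpers, and only top-level `enrich` processors are inspected (nested processors, such as those under `on_failure`, are ignored in this sketch).

```python
def pipelines_using_policy(pipelines, policy_name):
    """Return the sorted ids of pipelines with an enrich processor using the policy."""
    users = []
    for pipeline_id, definition in pipelines.items():
        for processor in definition.get("processors", []):
            enrich = processor.get("enrich")
            if enrich and enrich.get("policy_name") == policy_name:
                users.append(pipeline_id)
                break  # one match is enough for this pipeline
    return sorted(users)


def delete_policy(policies, pipelines, policy_name):
    """Delete the policy unless a pipeline still references it."""
    if policy_name not in policies:
        raise KeyError(f"policy [{policy_name}] not found")
    users = pipelines_using_policy(pipelines, policy_name)
    if users:
        # Fail before touching the store so the policy survives the error.
        raise ValueError(f"policy [{policy_name}] is used by pipelines {users}")
    del policies[policy_name]
```

Raising before the `del` matters: the check and the removal must not be reordered, otherwise the caller gets an error while the policy is removed anyway.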
    hub-cap committed Aug 14, 2019
    9e22fd4
  3. 8b3d358

Commits on Aug 15, 2019

  1. Fix policy removal bug in delete policy (#45573)

    The delete policy had a subtle bug in that it would still delete the
    policy if pipelines were accessing it, after giving the client back an
    error. This commit fixes that and ensures it does not happen by adding
    verification in the test.
    hub-cap authored and martijnvg committed Aug 15, 2019
    ae4bfe9
  2. Prevent delete policy for active executing policy (#45472)

    This commit adds a lock to the delete policy, in the same way that the
    locking is done for policy execution. It also creates a test to exercise
    the delete transport action, and modifies an existing test to provide a
    common set of functions for saving and deleting policies.
    hub-cap committed Aug 15, 2019
    d408b79

Commits on Aug 16, 2019

  1. 5707bc7
  2. ac0e5c9

Commits on Aug 20, 2019

  1. Consolidate enrich list all and get by name APIs (#45705)

    The get and list APIs are a single API in this commit. Whether
    requesting one named policy or all policies, a list of policies is
    returned. The list API code has all been removed and the GET API is
    what remains, which contains much of the list response code.
    hub-cap committed Aug 20, 2019
    a7c5925
  2. Renamed CoordinatorProxyAction to EnrichCoordinatorProxyAction and (#45663)

    fail if query shard context needs current time (certain queries / scripts
    use this, but in the enrich context this is not used).
    martijnvg committed Aug 20, 2019
    b4e3614

Commits on Aug 21, 2019

  1. ae83375
  2. a6917a1

Commits on Aug 22, 2019

  1. Enrich processor configuration changes (#45466)

    Enrich processor configuration changes:
    * Renamed `enrich_key` option to `field` option.
    * Replaced `set_from` and `targets` options with `target_field`.
    
    The `target_field` option behaves differently from how `set_from` and
    `targets` worked. The `target_field` is the field that will contain
    the looked up document.
    
    Relates to #32789
    martijnvg committed Aug 22, 2019
    2879e67

Commits on Aug 23, 2019

  1. b48784f
  2. Change how type is stored in an enrich policy. (#45789)

    A policy type controls how the enrich index is created and
    the query executed against the match field. Currently there
    is a single policy type (`exact_match`). In the near future
    more policy types will be added and different policies may have
    different configuration options.
    
    For this reason type should be a json object instead of a string field:
    
    ```
    {
       "exact_match": {
          ...
       }
    }
    ```
    
    instead of:
    
    ```
    {
      "type": "exact_match",
      ...
    }
    ```
    
    This will make streaming parsing of enrich policies easier, as in the
    new format the parsing code can know ahead of time which configuration
    fields to expect. In the latter format that is not possible if the type
    field does not appear as the first field.
    
    Relates to #32789
    martijnvg committed Aug 23, 2019
    f14874c
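
To illustrate why the object form is easier to parse, here is a minimal Python sketch (not the Elasticsearch parser). The function name `parse_policy` and the `KNOWN_TYPES` set are assumptions for illustration; only the `exact_match` type is named in the commit message.

```python
from typing import Dict, Tuple

# Hypothetical registry; the commit message mentions only `exact_match`.
KNOWN_TYPES = {"exact_match"}


def parse_policy(source: Dict[str, dict]) -> Tuple[str, dict]:
    """Return (policy_type, config) from a policy in the object format."""
    if len(source) != 1:
        raise ValueError("expected exactly one top-level policy type")
    # The type is the single top-level key, so it is known before any
    # of its configuration fields are read.
    policy_type, config = next(iter(source.items()))
    if policy_type not in KNOWN_TYPES:
        raise ValueError(f"unknown policy type: {policy_type}")
    return policy_type, config


policy_type, config = parse_policy({"exact_match": {"match_field": "email"}})
```

With the older string form, a streaming parser could meet `match_field` before `type` and would have to buffer fields until the type showed up.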
  3. Decouple enrich processor factory from enrich policy (#45826)

    This commit changes the enrich processor factory to read the required
    configuration from the current enrich index (from meta mapping field)
    in order to create the processor.
    
    Before this change the required config was read from the enrich policy
    in the cluster state. Enrich policies are going to be stored in an
    index (instead of the cluster state). In a processor factory there isn't
    a way to load something from an index, so with this change we read
    the required config / info from the enrich index (which is derived
    from the enrich policy), which then allows us to move enrich policies
    to an index.
    
    With this change it is required to execute a policy before creating a
    pipeline. Otherwise there is no enrich index and then there is no way
    to validate that a policy exists or retrieve its type and match field.
    
    Relates to #32789
    martijnvg committed Aug 23, 2019
    3e3cd72
  4. Remove enrich indices on delete policy (#45870)

    When a policy is deleted, the enrich indices that are backing the policy
    alias should also be deleted. This commit does that work and cleans up
    the transport action a bit so that the lock release is easier to see, as
    well as to ensure that any action carried out, regardless of exception,
    unlocks the policy.
    hub-cap committed Aug 23, 2019
    1c4ffd3

Commits on Aug 26, 2019

  1. Add HLRC support for delete policy api (#45833)

    This PR also adds HLRC docs.
    
    Relates to #32789
    martijnvg committed Aug 26, 2019
    a1e8194

Commits on Aug 28, 2019

  1. c8436a7
  2. fixed method signature

    martijnvg committed Aug 28, 2019
    a815d8d

Commits on Sep 2, 2019

  1. 63fe69f

Commits on Sep 4, 2019

  1. removed redundant cast

    martijnvg committed Sep 4, 2019
    90994ce
  2. Change exact match processor to match processor. (#46041)

    Besides a rename, this change allows the processor to attach multiple
    enrich docs to the document being ingested.
    
    Also in order to control the maximum number of enrich docs to be
    included in the document being ingested, the `max_matches` setting
    is added to the enrich processor.
    
    Relates #32789
    martijnvg committed Sep 4, 2019
    43ede36
  3. [DOCS] Separate Enrich API Docs (#46286)

    * Add enrich policy common parameter
    
    * Add enrich APIs to REST APIs index
    
    * Add put enrich policy API docs
    
    * Add get enrich policy API docs
    
    * Add delete enrich policy API docs
    
    * Add execute enrich policy API docs
    jrodewig committed Sep 4, 2019
    dace374

Commits on Sep 9, 2019

  1. f97cc7f
  2. [DOCS] Update "Enrich your data" tutorials (#46417)

    * Move enrich docs to separate file
    
    * Rewrite enrich processor tutorial
    jrodewig committed Sep 9, 2019
    a97ed3e

Commits on Sep 10, 2019

  1. Ensure enrich executes on master node only (#46448)

    The previous transport action was a read action, which under the right
    set of circumstances can execute on a coordinating node. This commit
    ensures that cannot happen.
    hub-cap committed Sep 10, 2019
    d5a9527
  2. Allow comma separated ids in get enrich policy API (#46351)

    This commit changes the GET REST api so it will accept an optional comma
    separated list of enrich policy ids. This change also modifies the
    behavior of the GET API in that it will not error if it is passed a bad
    enrich id anymore, but will instead just return an empty list.
    hub-cap committed Sep 10, 2019
    53b19af
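
The lookup behaviour described for the get policy API above (optional comma separated ids, unknown ids ignored rather than rejected) might look roughly like this. It is a sketch over an in-memory dictionary; `get_policies` and the `store` layout are hypothetical and do not mirror the actual REST handler.

```python
from typing import Dict, List, Optional


def get_policies(ids: Optional[str], store: Dict[str, dict]) -> List[dict]:
    """Return stored policies for a comma separated id list, or all if none given."""
    if not ids:
        return [{"name": name, **config} for name, config in store.items()]
    requested = [part.strip() for part in ids.split(",") if part.strip()]
    # Unknown ids are skipped instead of raising, so a bad id yields an empty list.
    return [{"name": name, **store[name]} for name in requested if name in store]


# Hypothetical policy store used only for this example.
store = {"users": {"match_field": "email"}, "hosts": {"match_field": "ip"}}
found = get_policies("users,missing", store)
none_found = get_policies("missing", store)
```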

Commits on Sep 11, 2019

  1. f1ba62a
  2. fixed typo

    martijnvg committed Sep 11, 2019
    4875c48
  3. eeba099
  4. Add enrich stats api (#46462)

    The enrich stats api returns enrich coordinator stats and
    information about currently executing enrich policies.
    
    The coordinator stats include per ingest node:
    * The current number of search requests in the queue.
    * The total number of outstanding remote requests that
      have been executed since node startup. Each remote
      request is likely to include multiple search requests.
      This depends on how many search requests are in the
      queue at the time when the remote request is performed.
    * The number of current outstanding remote requests.
    * The total number of search requests that `enrich`
      processors have executed since node startup.
    
    The current execution policies stats include:
    * The name of the policy that is executing.
    * A full-blown task info object for the task that is executing the policy.
    
    Relates to #32789
    martijnvg committed Sep 11, 2019
    5345274
  5. Add HLRC support for enrich get policy API. (#45970)

    Changed the signature of AbstractResponseTestCase#createServerTestInstance(...)
    to include the randomly selected xcontent type. This is needed for
    creating a server response instance with a query which is represented as BytesReference.
    Maybe this should go into a different change?
    
    This PR also includes HLRC docs for the get policy api.
    
    Relates to #32789
    martijnvg committed Sep 11, 2019
    5d76d2d
  6. 2b95e6a

Commits on Sep 12, 2019

  1. 2fbaf32

Commits on Sep 18, 2019

  1. a18b587

Commits on Sep 20, 2019

  1. Add the cluster version to enrich policies (#45021)

    Adds the Elasticsearch version as a field on the EnrichPolicy object
    jbaiera committed Sep 20, 2019
    e8ffcd7

Commits on Sep 23, 2019

  1. afc16ba
  2. 92f24c9
  3. fixed tests

    martijnvg committed Sep 23, 2019
    1118da0
  4. 2d77751

Commits on Sep 24, 2019

  1. 88a1cd5
  2. Add pipeline to ensure unique Enrich index documents (#46348)

    Adds a pipeline that removes ids and routing from documents before indexing 
    them into enrich indices. Enrich documents may come from multiple indices, 
    and thus have id collisions on them. This pipeline ensures that documents 
    with colliding id fields do not clobber one another during the reindex operation 
    while executing an enrich policy.
    jbaiera committed Sep 24, 2019
    d5cf383

Commits on Sep 26, 2019

  1. Expose enrich stats api to monitoring. (#46708)

    This change also slightly modifies the stats response,
    so that it can be consumed more easily by monitoring and other
    users. (Coordinator stats are now in a list instead of
    a map and have an additional field for the node id.)
    
    Relates to #32789
    martijnvg committed Sep 26, 2019
    31cfd57

Commits on Sep 27, 2019

  1. f676d97
  2. 500585a
  3. put provided argument on the previous line just like in master branch,

    that way this doesn't show in the final pr.
    martijnvg committed Sep 27, 2019
    69cad3d

Commits on Sep 30, 2019

  1. give monitoring more time

    martijnvg committed Sep 30, 2019
    82c8f16
  2. 197c1d5
  3. fixed checkstyle violation

    martijnvg committed Sep 30, 2019
    c3561e8
  4. Add config namespace in get policy api response (#47162)

    Currently the policy config is placed directly in the json object
    of the top-level `policies` array field. For example:
    
    ```
    {
        "policies": [
            {
                "match": {
                    "name" : "my-policy",
                    "indices" : ["users"],
                    "match_field" : "email",
                    "enrich_fields" : [
                        "first_name",
                        "last_name",
                        "city",
                        "zip",
                        "state"
                    ]
                }
            }
        ]
    }
    ```
    
    This change adds a `config` field in each policy json object:
    
    ```
    {
        "policies": [
            {
                "config": {
                    "match": {
                        "name" : "my-policy",
                        "indices" : ["users"],
                        "match_field" : "email",
                        "enrich_fields" : [
                            "first_name",
                            "last_name",
                            "city",
                            "zip",
                            "state"
                        ]
                    }
                }
            }
        ]
    }
    ```
    
    This allows us in the future to add other information about policies
    in the get policy api response.
    
    The UI will consume this API to build an overview of all policies.
    The UI may in the future include additional information about a policy
    and the plan is to include that in the get policy api, so that this
    information can be gathered in a single api call.
    
    An example of the information that is likely to be added is:
    * Last policy execution time
    * The status of a policy (executing, executed, unexecuted)
    * Information about the last failure, if one exists
    martijnvg committed Sep 30, 2019
    a23c7af
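
As an illustration of how a client such as the UI might read the namespaced response, the sketch below walks the `policies` array and pulls settings out of each `config` object. The helper name `summarize_policies` is hypothetical, and the sample response is a shortened copy of the example in the commit message.

```python
from typing import List, Tuple


def summarize_policies(response: dict) -> List[Tuple[str, str, str]]:
    """Return (name, type, match_field) for every policy in a get policy response."""
    summary = []
    for entry in response["policies"]:
        # Policy settings live under "config"; sibling keys stay free
        # for metadata that may be added to the response later.
        for policy_type, settings in entry["config"].items():
            summary.append((settings["name"], policy_type, settings["match_field"]))
    return summary


response = {
    "policies": [
        {"config": {"match": {"name": "my-policy", "indices": ["users"],
                              "match_field": "email",
                              "enrich_fields": ["first_name", "last_name"]}}}
    ]
}
summary = summarize_policies(response)
```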

Commits on Oct 1, 2019

  1. Add enable checks to missing enrich plugin methods (#47187)

    Some of the server side objects that do not need to be created unless
    enrich is enabled were still being created. This commit fixes that.
    hub-cap committed Oct 1, 2019
    5a213ac

Commits on Oct 4, 2019

  1. Add retry to force merge operation in EnrichPolicyRunner (#47178)

    Adds a check when running an Enrich policy to make sure that an Enrich index 
    is force merged down to one segment, and if it was not fully merged, attempts 
    the merge again, up to a configurable number of times.
    jbaiera committed Oct 4, 2019
    5eb9a04
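
The retry behaviour described for the force merge step can be sketched as a bounded loop. This is an illustrative Python sketch, not the `EnrichPolicyRunner` code: the callables `force_merge` and `count_segments` are stand-ins, and raising once the attempts are exhausted is an assumption, since the commit message does not say what happens in that case.

```python
from typing import Callable


def force_merge_with_retry(force_merge: Callable[[], None],
                           count_segments: Callable[[], int],
                           max_attempts: int) -> int:
    """Force merge until one segment remains; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        force_merge()
        if count_segments() <= 1:
            return attempt
    # Assumption: give up with an error after the configured number of attempts.
    raise RuntimeError(f"index not merged to one segment after {max_attempts} attempts")


# Simulated index that loses one segment per merge call.
segments = [3]


def fake_merge() -> None:
    segments[0] = max(1, segments[0] - 1)


attempts_used = force_merge_with_retry(fake_merge, lambda: segments[0], max_attempts=3)
```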
  2. 4d3f681

Commits on Oct 7, 2019

  1. Geo-Match Enrich Processor (#47243)

    This commit introduces a geo-match enrich processor that looks up a specific
    `geo_point` field in the enrich-index for all entries that have a geo_shape match field
    that meets some specific relation criteria with the input field.
    
    For example, the enrich index may contain documents with zipcodes and their respective
    geo_shape. Ingested documents with a geo_point field can then be enriched with the
    zipcode whose shape contains that point.
    
    This commit also refactors some of the MatchProcessor by moving a lot of the shared code to
    AbstractEnrichProcessor.
    
    Closes #42639.
    talevy committed Oct 7, 2019
    c20fa4d

Commits on Oct 8, 2019

  1. Don't remove indices to avoid monitoring from intermittently failing

    to index monitoring docs.
    martijnvg committed Oct 8, 2019
    3eb07f5

Commits on Oct 9, 2019

  1. 957f0fa
  2. 81e6034
  3. 17eef81
  4. Reuse OperationRouting#searchShards(...) to select local enrich shard (#47359)

    The current shard selection logic selects a random shard copy
    instead of selecting the local shard copy and falling back to a
    random shard copy only if no local copy is available. The latter
    is the desired behaviour for enrich.
    
    By reusing `OperationRouting#searchShards(...)` we get the desired
    behaviour and reuse the same logic that the search api is using.
    martijnvg committed Oct 9, 2019
    f7c03ad
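
The desired selection order (local copy first, random copy as a fallback) can be sketched in a few lines of Python. This is only an illustration of the preference; it is not how `OperationRouting#searchShards(...)` is implemented, and the `select_shard_copy` helper and the dictionary shape of a shard copy are assumptions.

```python
import random
from typing import Dict, List


def select_shard_copy(copies: List[Dict[str, str]], local_node_id: str) -> Dict[str, str]:
    """Prefer a shard copy on the local node; otherwise pick any copy at random."""
    local = [copy for copy in copies if copy["node"] == local_node_id]
    if local:
        return local[0]
    return random.choice(copies)


# Hypothetical shard copies of a single enrich index shard.
copies = [{"node": "node-a"}, {"node": "node-b"}, {"node": "node-c"}]
chosen_local = select_shard_copy(copies, "node-b")
chosen_remote = select_shard_copy(copies, "node-z")
```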
  5. Add basic task support for executing enrich policies (#47523)

    Changes the execution logic to create a new task using the execute request, 
    and attaches the new task to the policy runner to be updated. Also, a new 
    response is now returned from the execute api, which contains either the task 
    id of the execution, or the completed status of the run. The fields are mutually 
    exclusive to make it easier to discern what type of response it is.
    jbaiera committed Oct 9, 2019
    9e5e51b

Commits on Oct 10, 2019

  1. match processor should handle values other than string properly (#47419)

    Currently if the document being ingested contains a field value
    other than a string then the processor fails with an error.
    
    This commit changes the match processor to handle number values
    and array values correctly.
    
    If a json array is detected then the `terms` query is used instead
    of the `term` query.
    martijnvg committed Oct 10, 2019
    498789b
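
The choice between `term` and `terms` described above can be sketched as a small query builder. This is a hedged Python illustration that emits query DSL as plain dictionaries; `build_match_query` is a hypothetical helper, not the processor's code.

```python
from typing import Any, Dict


def build_match_query(match_field: str, value: Any) -> Dict[str, Any]:
    """Build a `terms` query for array values and a `term` query otherwise."""
    if isinstance(value, (list, tuple)):
        return {"terms": {match_field: list(value)}}
    # Strings and numbers are passed through unchanged as the term value.
    return {"term": {match_field: {"value": value}}}


single = build_match_query("user_id", 42)
multiple = build_match_query("email", ["a@example.com", "b@example.com"])
```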
  2. Add HLRC support for enrich stats API (#47306)

    This PR also includes HLRC docs for the enrich stats api.
    
    Relates to #32789
    martijnvg committed Oct 10, 2019
    0caca2f
  3. 4de7133
  4. abdbd84

Commits on Oct 14, 2019

  1. e06598b
  2. 0ef3668
  3. Add HLRC support for enrich execute policy API (#47991)

    This PR also includes HLRC docs for the enrich execute policy api.
    
    Relates to #32789
    martijnvg committed Oct 14, 2019
    6ed7d69
  4. Change how max_matches affects target_field option. (#47982)

    Prior to this change the `target_field` would always be a json array
    field in the document being ingested. This was to take into account that
    multiple enrich documents could be inserted into the `target_field`.
    
    However, the default `max_matches` is `1`, meaning that by default
    only a single enrich document would be added to the `target_field` json
    array field.
    
    This commit changes this: if `max_matches` is set to `1` then the single
    document is added as a json object to the `target_field`, and
    if it is configured to a higher value then the enrich documents are
    added as a json array (even if only a single enrich document happens to
    match).
    martijnvg committed Oct 14, 2019
    ddf3bc2
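
The rule above, where the shape of `target_field` depends on the configured `max_matches` and not on how many documents actually matched, can be sketched like this. The helper `set_target_field` is hypothetical, and leaving the document untouched when nothing matches is an assumption made for the example.

```python
from typing import Any, Dict, List


def set_target_field(document: Dict[str, Any], target_field: str,
                     enrich_docs: List[dict], max_matches: int = 1) -> Dict[str, Any]:
    """Attach enrich docs as an object (max_matches == 1) or as an array."""
    matches = enrich_docs[:max_matches]
    if not matches:
        return document  # assumption: no match leaves the document unchanged
    if max_matches == 1:
        document[target_field] = matches[0]
    else:
        # Always an array here, even if only one enrich doc matched.
        document[target_field] = matches
    return document


as_object = set_target_field({"email": "a@example.com"}, "user", [{"city": "Oslo"}])
as_array = set_target_field({"email": "a@example.com"}, "user", [{"city": "Oslo"}],
                            max_matches=3)
```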
  5. Add wait for completion for Enrich policy execution (#47886)

    This PR adds the ability to run the enrich policy execution task in the background, 
    returning a task id instead of waiting for the completed operation.
    jbaiera committed Oct 14, 2019
    b0ccce2
  6. Fix broken test

    jbaiera committed Oct 14, 2019
    382f264

Commits on Oct 15, 2019

  1. 7d68935
  2. remove eclipse conditional

    martijnvg committed Oct 15, 2019
    6b0cfb5
  3. 85ad27e
  4. 1fcadbb
  5. fixed invalid reference

    martijnvg committed Oct 15, 2019
    d941e1b