Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename ingest processor supports overriding target field if exists #12990

Merged
merged 4 commits into from
Apr 23, 2024

Conversation

gaobinlong
Copy link
Contributor

@gaobinlong gaobinlong commented Apr 1, 2024

Description

This PR adds a new parameter override_target to the rename ingest processor, when the parameter is true and target_field already exists in the document, the processor will override the target field rather than throwing an error field [] already exists.

Related Issues

#12942

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Copy link
Contributor

github-actions bot commented Apr 1, 2024

Compatibility status:

Checks if related components are compatible with change 441b47b

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/performance-analyzer.git]

@gaobinlong
Copy link
Contributor Author

@reta, could you help to review this PR, thank you!

Copy link
Contributor

github-actions bot commented Apr 1, 2024

❕ Gradle check result for 0c6457d: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Apr 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.33%. Comparing base (b15cb0c) to head (441b47b).
Report is 156 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12990      +/-   ##
============================================
- Coverage     71.42%   71.33%   -0.09%     
- Complexity    59978    60481     +503     
============================================
  Files          4985     5033      +48     
  Lines        282275   284968    +2693     
  Branches      40946    41296     +350     
============================================
+ Hits         201603   203275    +1672     
- Misses        63999    64870     +871     
- Partials      16673    16823     +150     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

github-actions bot commented Apr 1, 2024

✅ Gradle check result for eb24203: SUCCESS

@reta
Copy link
Collaborator

reta commented Apr 1, 2024

@reta, could you help to review this PR, thank you!

Thanks @gaobinlong , I have concerns regarding this change, aligned with #12942 (comment). It does not look right (to me) and it basically just hides the problem (the field is not expected to be in the document but it is). Semantically, this change would apply to Copy processor that would copy from source to target field (and also may add a target field if does not exist with the document). @msfroh any thoughts?

Copy link
Contributor

@scrawfor99 scrawfor99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gaobinlong,

I actually think the implementation of the change looks good.

I do agree with Peter's comment about this being better named something like move. I don't have a problem with the behavior being similar to the Copy processor but think that the majority of users may be more familiar with the Unix terminology for move, copy, etc.

Unless there is another naming pattern we have already committed to, I would generally be in favor of us being consistent with the behavior of Unix if we can.

This also ties into my opinion on why Copy is not where I would expect this behavior.

https://unix.stackexchange.com/questions/277412/cp-vs-mv-which-operation-is-more-efficient

@gaobinlong
Copy link
Contributor Author

@reta, could you help to review this PR, thank you!

Thanks @gaobinlong , I have concerns regarding this change, aligned with #12942 (comment). It does not look right (to me) and it basically just hides the problem (the field is not expected to be in the document but it is). Semantically, this change would apply to Copy processor that would copy from source to target field (and also may add a target field if does not exist with the document). @msfroh any thoughts?

I think rename just like move, but for now the behavior that always doesn't override the target field is not as expected, so maybe it's more friendly that we provide an option for users to use it as a real move. Copy processor supports overriding the target field but it doesn't remove the source field, this is different from the rename processor which always remove the source field.

@gaobinlong
Copy link
Contributor Author

Hi @gaobinlong,

I actually think the implementation of the change looks good.

I do agree with Peter's comment about this being better named something like move. I don't have a problem with the behavior being similar to the Copy processor but think that the majority of users may be more familiar with the Unix terminology for move, copy, etc.

Unless there is another naming pattern we have already committed to, I would generally be in favor of us being consistent with the behavior of Unix if we can.

This also ties into my opinion on why Copy is not where I would expect this behavior.

https://unix.stackexchange.com/questions/277412/cp-vs-mv-which-operation-is-more-efficient

Thanks, do you mean the Copy processor isn't like the cp in Linux? The idea of Copy processor comes from the set processor which always throws exception if the target field exists, but it supports overriding by specifying override to true, so the derived copy processor also has this behavior.

@reta
Copy link
Collaborator

reta commented Apr 8, 2024

I think rename just like move, but for now the behavior that always doesn't override the target field is not as expected, so maybe it's more friendly that we provide an option for users to use it as a real move. ``

Agree, on the second thought, I think it would make sense to have such an option, thank you @gaobinlong for clarifying.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Copy link
Contributor

github-actions bot commented Apr 9, 2024

❕ Gradle check result for e1d501b: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Copy link
Contributor

✅ Gradle check result for 441b47b: SUCCESS

@gaobinlong
Copy link
Contributor Author

Hi @reta , now all checks have passed yet, could you help to review?

@reta
Copy link
Collaborator

reta commented Apr 12, 2024

LGTM, thanks @gaobinlong , it looks like we have no docs for this ingest processor (Rename Processor) [1], could you please create an issue to document it [2]? Thank you

@scrawfor99 LGTY?

[1] https://opensearch.org/docs/latest/ingest-pipelines/processors/index-processors/
[2] https://github.com/opensearch-project/documentation-website/issues

@gaobinlong
Copy link
Contributor Author

LGTM, thanks @gaobinlong , it looks like we have no docs for this ingest processor (Rename Processor) [1], could you please create an issue to document it [2]? Thank you

@scrawfor99 LGTY?

[1] https://opensearch.org/docs/latest/ingest-pipelines/processors/index-processors/ [2] https://github.com/opensearch-project/documentation-website/issues

Thank you @reta ,I've opened an issue and a related PR for the missing rename ingest processor, after the PR is merged and backported to old versions, I'll open another PR for the change in this PR.

@gaobinlong
Copy link
Contributor Author

Hi @reta , could this PR be merged now?

@reta reta merged commit 41a3055 into opensearch-project:main Apr 23, 2024
31 checks passed
@reta reta added the backport 2.x Backport to 2.x branch label Apr 23, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 23, 2024
…12990)

* Rename ingest processor supports overriding target field if exists

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Modify change log

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit 41a3055)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dblock pushed a commit that referenced this pull request Apr 23, 2024
…12990) (#13348)

* Rename ingest processor supports overriding target field if exists



* Modify change log



---------


(cherry picked from commit 41a3055)

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants