Skip to content

Add support for specifying a pipeline when indexing documents#1455

Merged
ruflin merged 1 commit into
ruflin:masterfrom
stof:pipeline_support
Feb 23, 2018
Merged

Add support for specifying a pipeline when indexing documents#1455
ruflin merged 1 commit into
ruflin:masterfrom
stof:pipeline_support

Conversation

@stof

@stof stof commented Feb 9, 2018

Copy link
Copy Markdown
Contributor

Closes #1453

@stof

stof commented Feb 9, 2018

Copy link
Copy Markdown
Contributor Author

@ruflin I added the setPipeline in the Document class. Should it be there or in the AbstractUpdateAction ? I don't know whether pipelines make sense for Script.

And I would like some guidance about the best way to cover this with tests.

@stof

stof commented Feb 13, 2018

Copy link
Copy Markdown
Contributor Author

@ruflin could you give me the necessary indications so that I can finish this PR ?

@ruflin

ruflin commented Feb 13, 2018

Copy link
Copy Markdown
Owner

@stof Sorry for the late reply, only was partially online the last days.

I think having it in Document as you made it make sense as the pipeline is set for each Document.

The testing here is a bit trickier as you have to create a pipeline first in Elasticsearch. I would probably do that through Elastica by using php-elasticsearch as ingest does not really exist yet in Elastica. Best go for a simple processor like rename inside the pipeline which then should also make it easy to test if the document was actually processed as expected.

Cleaning up the pipeline after the test would be great, but I'm also ok if it sticks around if this gets too complicated.

@stof

stof commented Feb 14, 2018

Copy link
Copy Markdown
Contributor Author

@ruflin actually, for bulk requests, this goes in the bulk params, not in each document. Writing the test made me realize this mistake in my PR (I'm already using the pipeline when indexing documents with the lower-level API, but I was not using the bulk API).

and the update API does not support pipelines yet (this is discussed in elastic/elasticsearch#17895)

So does it still make sense to have it in the Document object ?

@stof

stof commented Feb 14, 2018

Copy link
Copy Markdown
Contributor Author

Tests are written though.

They are currently leaking the rename pipeline, as ensuring that the cleanup is done even in case of test failure would involve wrapping the whole test in a try/finally or doing the cleanup in tearDown while most tests don't create it. But I use always the same pipeline name (overriding the pipeline defined by previous tests), so I won't leak multiple ones.

@stof

stof commented Feb 14, 2018

Copy link
Copy Markdown
Contributor Author

anyway, Type::addDocument does not take options outside the Document, so it still make sense to have pipeline in the Document IMO. But the question is how this integrates with addDocuments cleanly.

@ruflin

ruflin commented Feb 15, 2018

Copy link
Copy Markdown
Owner

@stof Thanks for the details and changes. I will probably only have time to have a detailed look and think about it beginning next week. It's definitively on my todo list ;-)

@ruflin

ruflin commented Feb 18, 2018

Copy link
Copy Markdown
Owner

@stof I'm positively surprised that using a pipeline with the bulk API already works as you showed in your tests. I wonder if that is enough for now or if we need a deeper integration?

For your PR I'm good to merge as is if your are ok with it and would add a Changelog. We can perhaps have a follow up discussion in an issue about the bulk part.

@stof

stof commented Feb 19, 2018

Copy link
Copy Markdown
Contributor Author

@ruflin it already works, because the client exposes the request params in its API, and #1427 makes it much more usable by allowing to pass them along.

@stof

stof commented Feb 19, 2018

Copy link
Copy Markdown
Contributor Author

@ruflin I'm OK with merging this PR as is (and I added the changelog)

Comment thread CHANGELOG.md


## [Unreleased](https://github.com/ruflin/Elastica/compare/6.0.0...master)
## [Unreleased](https://github.com/ruflin/Elastica/compare/6.0.1...master)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@ruflin ruflin merged commit e505ade into ruflin:master Feb 23, 2018
@stof stof deleted the pipeline_support branch February 23, 2018 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants