[RFC] Parallel & Batch Ingestion #12457
Comments
@peternied Yes, it looks like the proposed feature 3 in this RFC has a very similar idea to the streaming API, especially the coordinator part that load balances the ingest load. For feature 3, it just tries to reuse the ... Features 1 and 2 are different from the streaming API, as they focus on parallel and batch ingestion on a single node, which would happen after the streaming API or feature 3. |
@dbwiddis, @joshpalis -- you may be interested in this, as you've been thinking about parallel execution for search pipelines. For ingest pipelines, the use-case is a little bit more "natural", because we already do parallel execution of @chishui, can you confirm where exactly the parallel/batch execution would run? A bulk request is received on one node (that serves as coordinator for the request), then the underlying |
@msfroh, the "parallel/batch execution" would be run on the ingest pipeline side. The DocWriteRequests are first processed by ingest pipeline and its processors on a single ingest node, then the processed documents are fanned out to shards to be indexed. To answer your question, the logic would be run on the coordinator. |
Additional information about parallel ingestion:
Performance:
Lightweight processors - no improvement
We benchmarked some lightweight processors (lowercase + append) with the current solution and the parallelized batch solution. We don't see improvement in either latency or throughput, which aligns with our expectation: these processors are already very fast, so parallelization wouldn't help and could add some overhead.
ML processors - already async
ML processors do the heavy lifting, but they actually run the predict logic in a thread (code), which makes the ingestion of that document asynchronous.
Reasons to have parallel ingestion
Reasons not to have parallel ingestion
|
@reta , could you also help to take a look at this RFC, thanks! |
Thanks @gaobinlong
@msfroh @model-collapse @chishui I think the idea of enhancing ingest processors with batch processing is sound in general but it may have unintended consequences, due to complexity of
Making the ingestion API streaming based (apologies again for bribing for #3000) is a fundamentally different approach to ingestion: we would be able to vary the ingestion based on how fast the documents could be ingested at that moment in time, without introducing the complexity of batch / parallelism management. @nknize I think you might be eager to chime in here :) |
Thanks for the comment. For machine learning inference, making use of a batched inference API will significantly increase GPU utilization and reduce the ingestion time, so batching is very important. You pointed out that "picking the right batch for bulk is very difficult, but adding yet more parallelization / internal batching would make it much harder". Can you elaborate more on that and give your suggestions on how to make ingestion faster? |
@reta thanks for the feedbacks
The proposal only targets the ingest pipeline and its processors; it won't touch the indexing part. Even if documents are processed in batches, these things are still ensured:
Whether the action is index, update, upsert, or script, the documents would be processed by the ingest pipeline in the same way. I don't see how the proposal would cause "changing the ingestion sequence"; please let me know if I miss a piece of the puzzle. |
Due to the aforementioned reasons about "parallel ingestion", we won't have an immediate gain from delivering the feature. We have decided to deprioritize the “parallel ingestion” part of this RFC and mainly focus on "batch ingestion". |
@chishui The parallelization (which is mentioned in this proposal) naturally changes the order in which documents are ingested, does it make sense? I think your last comment is a reflection of that, thank you.
@model-collapse the problem with batching (at least how it is implemented currently in OS and what we've seen so far with |
@reta in ingest flow when documents are processed by ingest pipeline, could one document depend on another? Even for today, text_embedding and sparse_encoding processors have their inference logic run in a thread which makes the document ingestion run in parallel, right? https://github.com/opensearch-project/ml-commons/blob/020207ecd6322fed424d5d54c897be74623db103/plugin/src/main/java/org/opensearch/ml/task/MLPredictTaskRunner.java#L194 |
@chishui yes, in general documents could depend on each other (just think about an example of the documents that are ingested out of any CDC or message broker, where the documents are being constructed as a sequence of changes).
This is purely plugin specific logic |
In my understanding, in terms of pipeline execution, each document in a bulk request runs independently: no ingest processor can access other in-flight documents in the same bulk request, so while executing pipelines, maybe a document cannot depend on another? And subsequently, for indexing (calling the Lucene API to write), we have the write thread_pool and each document is processed in parallel, so the indexing order within a bulk request cannot be guaranteed and the client side needs to ensure the indexing order. @reta, correct me if something is wrong, thank you! |
I think pipeline execution runs before the indexing process. First, we use a single transport thread to execute pipelines for all the documents in a bulk request, and then we use the write thread_pool to process the newly generated documents in parallel, so it seems that the execution order doesn't matter when executing pipelines for the documents. |
Thanks @gaobinlong
The documents could logically depend on each other (I am not referring to any sharing that may happen in an ingest processor). Since we are talking about bulk ingestion, where documents could be indexed / updated / deleted, we certainly don't want the deletes to be "visible" before the documents are indexed.
This part is not clear to me: AFAIK we offload processing of bulk requests (batches) to thread pool, not individual documents. Could you please point out where we parallelize the ingestion of the individual documents in the batch? Thank you |
Yeah, you're correct, but this RFC only focuses on the execution of the ingest pipeline, which only runs on the coordinating node. That is just the pre-processing part, not the indexing part; the indexing operations will not happen before the ingest pipeline has finished executing for all the documents in a bulk request.
After the ingest pipeline has executed for all documents in a bulk request, the coordinating node groups these documents by shard and sends them to the different shards. Each shard processes its documents in parallel, so at least at the shard level, we process the documents in a bulk request in parallel. But I think this RFC will not touch the processing logic in each shard, which processes the create/update/delete operations for the same document in order, so it's not harmful. |
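To illustrate the ordering argument in this comment, here is a minimal Python sketch of grouping bulk operations by shard while keeping their original relative order. The routing function `shard_for` is a toy assumption for illustration only (it is not OpenSearch's routing function), and `group_by_shard` is a hypothetical helper name.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def shard_for(doc_id: str, num_shards: int) -> int:
    # Toy routing, assumed for illustration: a stable hash of the id modulo
    # the shard count. The same id always maps to the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def group_by_shard(operations: List[Dict], num_shards: int) -> Dict[int, List[Dict]]:
    # Appending in input order means operations on the same document keep
    # their relative order inside the shard-level group.
    groups = defaultdict(list)
    for op in operations:
        groups[shard_for(op["id"], num_shards)].append(op)
    return dict(groups)
```

Under these assumptions, an index followed by a delete of the same id ends up in the same group, in that order, regardless of how the groups themselves are scheduled.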
@reta Where do you estimate the circuit breaking will happen? If you mean it will happen inside the batch processor's own process, that could be, because it is impossible to estimate how much memory will be consumed by its code. Therefore, we need to let users configure the batch_size in the bulk_api. |
@model-collapse there are no estimates that one could make upfront; this is purely an operational issue (it basically depends on what is going on at the moment)
As noted in the previous comment, users have difficulties with that: the same batch_size may work now and may not 10m from now (if the cluster is under duress). The issue referred to there has all the details. |
Benchmark Results on Batch ingestion with Neural Search Processors
We implemented the PoC of batch ingestion locally and enabled the capability of sending batch documents to remote ML servers. We used "opensearch-benchmark" to benchmark both batch enabled and disabled situation on different ML servers (SageMaker, Cohere, OpenAI) and here are the benchmark results
Benchmark Results
Environment Setup
SageMaker
Environment Setup
Cohere
Environment Setup
OpenAI
Environment Setup
Results
[1]: The errors are coming from SageMaker 4xx response which was also reported in ml-commons issue opensearch-project/ml-commons#2249 |
Have we benchmarked the performance of this change? how much is the throughput increasing after this change? |
@reta to address your concern, we plan to provide an automation tool to help users run a series of benchmarks against their OS with different batch sizes and recommend the optimal batch size. Here is the feature link: #13009. Could you please take a look and see if your concerns are addressed? We really want to push this forward to benefit users. |
The async http client benchmark results are attached here opensearch-project/ml-commons#1839 |
@reta since we only pursue batch ingestion in this RFC, and to address your concern that users will have difficulty tuning the batch size, we also proposed an automation tool to make it easier for users: opensearch-project/opensearch-benchmark#508. Are there any other things that you believe we should address before moving forward? |
@chishui I honestly don't know to what extent the tool could help; you may need to provide a guide for users that explains how it is supposed to be used. At least it may give some confidence, probably. AFAIK OpenSearch Benchmark does targeted measurements for specific operations (this is what it was designed for), but does not measure different interleaving operational workloads (and shouldn't, I think): e.g. running search while ingesting new documents, etc ... |
@reta IMO, to have this batch ingestion feature, is from 0 to 1, that user can start to use it to accelerate their ingestion process and have fewer chances to get throttled by remote ML server (benefits are shown from the benchmark results above). Maybe it's not easy for them to find the optimal batch size initially, but they have an option and can benefit immediately once they use batch feature. Then, to have a tool to help them find an optimal batch size automatically, is from 1 to 10, that we make this feature easy to use for everyone.
Yes, we definitely need a document on OpenSearch website when we introduce this feature explaining how the feature should be used, how it can benefit, how the tool can help.
That's what I understand as well. |
All documents from this request are handled by one ingest node - Is this a correct statement? For multi nodes cluster, the documents in the _bulk will be distributed to each node for ingestion? |
@Zhangxunmt thanks for the comment. The RFC is only about preprocessing: all documents are handled by a single ingest node, which remains the same as the current behavior. After preprocessing comes the indexing process where, as you said, documents are distributed to different nodes, which also remains the same as we don't touch this part of the logic. |
What is the preprocessing? Does it mean processors in the pipeline? Recently we noticed that in neural search, the text-embedding processor in the ingest pipeline sends remote inference traffic that is proportional to the number of data nodes in the cluster. That means the _bulk request takes N documents, and the N documents are evenly distributed among all nodes for the text-embedding processor to run remote inference for vectorization. So this means all the docs are divided into smaller batches and preprocessed on every node? @chishui |
Yes
The text-embedding processor is actually run on the node which accepts "_bulk" API. When it needs to send out text for inferencing, it could route the requests to "ml" nodes depending on the
Basically, since we won't change the inferencing API, if texts are already dispatched to every node for inferencing, then with batch enabled, batched texts are dispatched to every node. |
Got it. I think it makes sense. In most cases the ml_node setting is false. AOS doesn't have ml nodes so far, so the inferencing would happen on data nodes. Based on the prior discussion here, the pre-processing runs on the node which accepts the _bulk request, but it will call the "Predict" API in ML-Commons, which routes the inference traffic to all data nodes. So essentially it's still the whole cluster handling the batch of documents. So in the case of a single text-embedding processor in an ingest pipeline, does the proposed parallel ingestion still help performance, since the processor itself is already handling docs in parallel? |
@chishui One takeaway from this issue is that we'd better use a bigger cluster (>10 nodes) for performance benchmarking, because the number of nodes is directly proportional to the concurrent TPS we send to the model service. A smaller OS cluster easily reaches its hard limit on concurrent requests and may not represent real customer scenarios. |
@Zhangxunmt I explained the benefits of having parallel ingestion in this comment #12457 (comment). In the scenario you described, it won't help the performance.
I think it's actually the opposite. Even with only one data node in the cluster, inferencing is done in a thread pool and the thread pool size controls the maximum concurrent TPS. And based on our benchmark results, without batching, each document is inferenced in a single thread and can easily run into 4xx from SageMaker. But with batching, each batch is in a single thread and is less likely to run into 4xx from SageMaker. So with a bigger cluster, you would have higher concurrency and would be more likely to get 4xx, and batching can definitely help with this. |
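The effect described in this comment comes down to simple arithmetic: with one remote call per batch instead of one per document, the number of calls to the ML service drops by roughly the batch size. A small Python sketch, where `remote_requests` is a hypothetical helper and the numbers in the usage note are made up for illustration, not taken from the benchmark:

```python
import math


def remote_requests(num_docs: int, batch_size: int) -> int:
    # One remote inference call per batch; the last batch may be partial.
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(num_docs / batch_size)
```

For example, 1000 documents need 1000 calls with a batch size of 1 and 100 calls with a batch size of 10, so the same thread pool puts far fewer requests in front of a rate-limited service.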
I am late to this RFC, but wanted to highlight #13306 (comment) for those who commented here - if you can please take a quick look? I think the API proposed should have been discussed a little more, starting with the inconsistent use of |
* [PoC][issues-12457] Support Batch Ingestion Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Rewrite batch interface and handle error and metrics Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Remove unnecessary change Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Revert some unnecessary test change Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Keep executeBulkRequest main logic untouched Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Add UT Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Add UT & yamlRest test, fix BulkRequest se/deserialization Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Add missing java docs Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Remove Writable from BatchIngestionOption Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Add more UTs Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Fix spotlesscheck Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Rename parameter name to batch_size Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Add more rest yaml tests & update rest spec Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Remove batch_ingestion_option and only use batch_size Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Throw invalid request exception for invalid batch_size Signed-off-by: Liyun Xiu <xiliyun@amazon.com> * Update server/src/main/java/org/opensearch/action/bulk/BulkRequest.java Co-authored-by: Andriy Redko <drreta@gmail.com> Signed-off-by: Liyun Xiu <chishui2@gmail.com> * Remove version constant Signed-off-by: Liyun Xiu <xiliyun@amazon.com> --------- Signed-off-by: Liyun Xiu <xiliyun@amazon.com> Signed-off-by: Liyun Xiu <chishui2@gmail.com> Co-authored-by: Andriy Redko <drreta@gmail.com>
Is your feature request related to a problem? Please describe
Problem Statements
Today, users can utilize the bulk API to ingest multiple documents in a single request. All documents from this request are handled by one ingest node, and on this node, if an ingest pipeline is configured, documents are processed by the pipeline one at a time in sequential order (ref). An ingest pipeline is made up of a collection of processors, and the processor is the computing unit of a pipeline. Most processors are lightweight, such as append, uppercase, and lowercase, so processing multiple documents one after another or in parallel makes no observable difference. But for time-consuming processors such as neural search processors, which by their nature require more time to compute, being able to run them in parallel could save users valuable ingest time. Apart from ingestion time, processors like neural search can benefit from processing documents in batches, since batch APIs reduce the number of requests to remote ML services and help avoid hitting rate limits. (Feature request: opensearch-project/ml-commons#1840, rate limit example from OpenAI: https://platform.openai.com/docs/guides/rate-limits)
Due to the lack of parallel ingestion and batch ingestion capabilities in the ingest flow, we propose the solution below to address them.
Describe the solution you'd like
Proposed Features
1. Batch Ingestion
An ingest pipeline is constructed from a list of processors, and a single document flows through each processor one by one before it can be stored in the index. Currently, both the pipeline and the processor can only handle one document at a time, and even with the bulk API, documents are iterated and handled in sequential order. As shown in figure 1, to ingest doc1, it would first flow through ingest pipeline 1, then through pipeline 2. Then, the next document would go through both pipelines.
To support batch processing of documents, we'll add a batchExecute API to the ingest pipeline and processors, which takes multiple documents as input parameters. We will provide a default implementation in the Processor interface that iteratively calls the existing execute API to process documents one by one, so most processors don't need to change. Only processors that need to batch process documents (e.g. the text embedding processor) need their own implementation; all others, even when they receive documents together, default to processing them one by one.
To batch process documents, users need to use the bulk API. We'll add two optional parameters to the bulk API for users to enable the batch feature and set the batch size. Based on the maximum_batch_size value, documents are split into batches.
Since in the bulk API different documents can be ingested into different indexes, indexes could use the same pipelines but in a different order, e.g. index “movies” uses pipeline P1 as the default pipeline and P2 as the final pipeline, while index “musics” uses P2 as the default pipeline and P1 as the final pipeline. To avoid the extra complexity of handling cross-index batching (topological sorting), we would batch documents at the index level.
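The default-implementation idea can be sketched as follows. This is a minimal Python illustration, not the Java `Processor` interface from the proposal: `batch_execute` stands in for `batchExecute`, and `LowercaseProcessor`, `FakeEmbeddingProcessor`, and `_embed` are hypothetical names, with `_embed` as a placeholder for a remote ML batch call.

```python
from typing import Dict, List


class Processor:
    """Hypothetical stand-in for an ingest processor."""

    def execute(self, doc: Dict) -> Dict:
        raise NotImplementedError

    def batch_execute(self, docs: List[Dict]) -> List[Dict]:
        # Default behaviour: fall back to the one-document-at-a-time path,
        # so existing processors work unchanged.
        return [self.execute(doc) for doc in docs]


class LowercaseProcessor(Processor):
    """Lightweight processor that simply inherits the default batch_execute."""

    def __init__(self, field: str):
        self.field = field

    def execute(self, doc: Dict) -> Dict:
        return {**doc, self.field: doc[self.field].lower()}


class FakeEmbeddingProcessor(Processor):
    """Overrides batch_execute to make a single 'remote' call per batch."""

    def __init__(self) -> None:
        self.remote_calls = 0

    def _embed(self, texts: List[str]) -> List[List[float]]:
        # Placeholder for a remote batch inference API; returns a toy
        # one-dimensional vector (the text length) per input.
        self.remote_calls += 1
        return [[float(len(t))] for t in texts]

    def execute(self, doc: Dict) -> Dict:
        # The single-document path is just a batch of one.
        return self.batch_execute([doc])[0]

    def batch_execute(self, docs: List[Dict]) -> List[Dict]:
        vectors = self._embed([d["text"] for d in docs])
        return [{**d, "embedding": v} for d, v in zip(docs, vectors)]
```

The design choice this illustrates is that batching is opt-in per processor: only processors that gain from seeing several documents at once need to override the batch method.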
2. Parallel Ingestion
Apart from batch ingestion, we also propose parallel ingestion to accompany batch ingestion and boost ingestion performance. When a user enables parallel ingestion, documents from the bulk API are split into batches based on the batch size, and the batches are then processed in parallel by threads managed by a thread pool. Although it limits the maximum concurrency of parallel ingestion, the thread pool protects host resources from being exhausted by batch ingestion threads.
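A minimal Python sketch of this idea, using the standard library's `ThreadPoolExecutor` as the bounded pool. The function names and the `max_workers` default are assumptions for illustration; the proposal does not specify pool sizing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def split_into_batches(docs: List[Dict], batch_size: int) -> List[List[Dict]]:
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


def process_in_parallel(
    docs: List[Dict],
    batch_size: int,
    process_batch: Callable[[List[Dict]], List[Dict]],
    max_workers: int = 4,
) -> List[Dict]:
    batches = split_into_batches(docs, batch_size)
    # max_workers caps concurrency, so a large bulk request cannot spawn
    # an unbounded number of threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in submission order, even though the
        # batches themselves may finish in any order.
        results = pool.map(process_batch, batches)
        return [doc for batch in results for doc in batch]
```

Note that only the order of the returned results is preserved here; the batches still execute concurrently, which is the behavior the ordering discussion in the comments is about.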
Ingest flow logic change
The current logic of the document ingestion flow can be shown with the pseudo code below:
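The pseudo code itself is not reproduced here. As a stand-in, the following is a minimal Python sketch of the sequential flow described in the problem statement, where every document passes through each pipeline and each processor before the next document starts. `ingest_sequential` and the `Pipeline` alias are hypothetical names, and processors are modelled as plain functions.

```python
from typing import Callable, Dict, List

# A pipeline is modelled as a list of per-document functions (an assumption
# made for this sketch).
Pipeline = List[Callable[[Dict], Dict]]


def ingest_sequential(docs: List[Dict], pipelines: List[Pipeline]) -> List[Dict]:
    processed = []
    for doc in docs:                    # one document at a time
        for pipeline in pipelines:      # e.g. default pipeline, then final pipeline
            for processor in pipeline:  # each processor in order
                doc = processor(doc)
        processed.append(doc)
    return processed
```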
We'll change the flow to the logic shown below if the pipeline has enabled the batch option.
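Again as a stand-in for the missing pseudo code, here is a minimal Python sketch of the batched flow under the assumptions stated in the Batch Ingestion section: documents are grouped by target index, each group is split into batches, and every processor receives a whole batch. `ingest_batched`, `pipelines_for_index`, and the `_index` key are hypothetical names for this illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# In this sketch a processor takes a whole batch and returns a whole batch.
BatchProcessor = Callable[[List[Dict]], List[Dict]]


def ingest_batched(
    docs: List[Dict],
    pipelines_for_index: Dict[str, List[List[BatchProcessor]]],
    batch_size: int,
) -> List[Dict]:
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")

    # Group by target index so every batch shares the same pipeline order.
    by_index = defaultdict(list)
    for doc in docs:
        by_index[doc["_index"]].append(doc)

    processed = []
    for index, index_docs in by_index.items():
        for start in range(0, len(index_docs), batch_size):
            batch = index_docs[start:start + batch_size]
            for pipeline in pipelines_for_index[index]:
                for processor in pipeline:
                    batch = processor(batch)
            processed.extend(batch)
    return processed
```

In this sketch the output is ordered by index group rather than by position in the request; a real implementation would presumably need to map results back to their original slots in the bulk request.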
Update to Bulk API
We propose new parameters to the bulk API; all of them are optional.
The batch option parameter accepts none, enable, and parallel. By default, it's none. When set to enable, batch ingestion is enabled and batches are processed in sequential order. When set to parallel, batch ingestion is enabled and batches are processed in parallel.
The batch size parameter applies when the batch option is set to enable or parallel. It's 1 by default.
3. Split and Redistribute Bulk API
Users tend to use the bulk API to ingest many documents, which can sometimes be very time consuming. To achieve lower ingestion time, they have to use multiple clients to make multiple bulk requests with smaller sets of documents so that the requests can be distributed to different ingest nodes. To offload this burden from the user side, we can support the split and redistribute work on the server side and help distribute the ingest load more evenly.
Note: although brought up here, we think it's better to discuss this topic in a separate RFC doc, which will be published later.
Related component
Indexing:Performance
Describe alternatives you've considered
No response
Additional context
No response