Add parallel processing to map
step of map reduce summarisation
#656
Context
Before this change, the map step of map-reduce summarisation, used for large-document summarisation, was not happening in parallel. The code needed to be updated to allow the parallel summarisation of large documents into smaller summaries using LCEL `batch`.

`.batch` only takes a list as input, so I needed to split the question prompt into `map_question_prompt`, which is populated with the question and chat_history before being passed to the `map_operation`, and then `map_document_prompt`, which is populated when running `batch`.
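A minimal sketch of the two-stage prompt split described above. The template strings and helper names here are illustrative, not the actual project code, and a `ThreadPoolExecutor` stands in for the parallel fan-out that LCEL's `.batch` performs internally:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative templates; the real prompts live in the project's prompt config.
map_question_prompt = "Question: {question}\nHistory: {chat_history}\n"
map_document_prompt = "{context}Summarise this chunk:\n{chunk}\n"

def build_map_operation(question: str, chat_history: str):
    # Populate the question prompt once, before batching.
    context = map_question_prompt.format(
        question=question, chat_history=chat_history
    )

    def summarise_chunk(chunk: str) -> str:
        # The document prompt is only populated per chunk, at batch time.
        prompt = map_document_prompt.format(context=context, chunk=chunk)
        # Stand-in for the LLM call that would receive `prompt`.
        return f"summary({chunk})"

    def batch(chunks: list[str]) -> list[str]:
        # Like .batch, this takes a list and fans the calls out in parallel,
        # returning results in input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(summarise_chunk, chunks))

    return batch

map_operation = build_map_operation("What changed?", "none")
summaries = map_operation(["chunk one", "chunk two", "chunk three"])
```

Splitting the prompts this way means the per-question work is done once up front, and only the per-chunk work is repeated inside the parallel batch.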
Changes proposed in this pull request

- Update `build_map_reduce_summary_chain` to allow for `batch` processing
- Make the `Chunk` model iterable
- Rename `max_tokens` in the env var AI Settings to `summarisation_chunk_max_tokens` to better describe what it does; `max_tokens` is already an overloaded term.

Guidance to review
Check you are happy with how I have structured `build_map_reduce_summary_chain`.

Relevant links
Things to check