Add parallel processing to map step of map reduce summarisation #656

Merged 7 commits into main on Jun 26, 2024

Conversation

@andy-symonds (Contributor) commented Jun 25, 2024

Context

Before this change, the map step of map-reduce summarisation, which is used to summarise large documents, did not run in parallel. This change updates the code so that the chunks of a large document are summarised into smaller summaries in parallel using LCEL batch.

batch only takes a list as input, so I needed to split the question prompt in two: map_question_prompt, which is populated with the question and chat_history before being passed to the map_operation, and map_document_prompt, which is populated when running batch.
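To illustrate the two-stage prompt filling described above, here is a minimal sketch using only the Python standard library. It is not the code from this PR and does not use LangChain: a `ThreadPoolExecutor` stands in for LCEL batch, and the template wording, `build_map_prompts`, `map_reduce_summary`, and the `llm` callable are all hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical templates; the real prompt wording in the PR is not shown.
MAP_QUESTION_PROMPT = "Question: {question}\nChat history: {chat_history}\n"
MAP_DOCUMENT_PROMPT = "Document chunk:\n{text}\nSummary:"


def build_map_prompts(question: str, chat_history: str, chunks: List[str]) -> List[str]:
    """Return one complete map prompt per chunk."""
    # Stage 1: fill in the parts shared by every chunk, once.
    question_part = MAP_QUESTION_PROMPT.format(
        question=question, chat_history=chat_history
    )
    # Stage 2: fill in the per-chunk part. The resulting list is the
    # list-shaped input that a batch-style call needs.
    return [question_part + MAP_DOCUMENT_PROMPT.format(text=chunk) for chunk in chunks]


def map_reduce_summary(
    question: str,
    chat_history: str,
    chunks: List[str],
    llm: Callable[[str], str],
    max_concurrency: int = 4,
) -> str:
    """Summarise each chunk in parallel (map), then combine the results (reduce)."""
    prompts = build_map_prompts(question, chat_history, chunks)
    # Map step: a thread pool is used here as a stand-in for LCEL batch.
    # pool.map returns results in the same order as the input prompts.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        partial_summaries = list(pool.map(llm, prompts))
    # Reduce step: a single call over the joined partial summaries.
    reduce_prompt = "Combine these summaries:\n" + "\n".join(partial_summaries)
    return llm(reduce_prompt)
```

The shared question and chat history are filled in once, so only the chunk text varies across the list that the parallel step consumes.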

Changes proposed in this pull request

  • Updated build_map_reduce_summary_chain to allow for batch processing
  • Made Chunk model iterable
  • Renamed max_tokens in env var AI Settings to summarisation_chunk_max_tokens to better describe what it does. max_tokens is already an overloaded term.

Guidance to review

Check you are happy with how I have structured build_map_reduce_summary_chain.

Relevant links

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed
  • I have run integration tests

…fore documents, as LCEL batch operation can only take a list of documents
…erform the map operation of summarising into multiple summaries in parallel using LCEL batch
…o summarisation_chunk_max_tokens for clear naming
…eeded and I added it earlier in this PR work
@jamesrichards4 (Contributor) left a comment

Looks good. It is probably worth making max_concurrency a setting, though, so we can change it if we run into issues.
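
One way the suggestion above could look is a small helper that reads the limit from the environment and falls back to a default. This is only a sketch: the variable name `SUMMARISATION_MAX_CONCURRENCY`, the default of 4, and the helper `get_max_concurrency` are assumptions for illustration, not settings that exist in this project.

```python
import os
from typing import Mapping


def get_max_concurrency(env: Mapping[str, str] = os.environ, default: int = 4) -> int:
    """Read the map-step concurrency limit from the environment.

    SUMMARISATION_MAX_CONCURRENCY is a hypothetical variable name.
    """
    raw = env.get("SUMMARISATION_MAX_CONCURRENCY")
    if raw is None:
        # Setting not provided: use the default.
        return default
    value = int(raw)  # raises ValueError if the setting is not an integer
    if value < 1:
        raise ValueError("SUMMARISATION_MAX_CONCURRENCY must be at least 1")
    return value
```

The returned value could then be passed to whatever runs the map step, for example through the `max_concurrency` key of the config given to LCEL batch.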

@andy-symonds changed the title from "Add parallele processing to map step of map reduce summarisation" to "Add parallel processing to map step of map reduce summarisation" on Jun 26, 2024
@andy-symonds andy-symonds merged commit 1593999 into main Jun 26, 2024
3 checks passed
@andy-symonds andy-symonds deleted the feat/map-reduce-batch branch June 26, 2024 09:06