
Feature/large doc summarisation #617

Merged
merged 14 commits into from
Jun 21, 2024

Conversation

andy-symonds
Contributor

@andy-symonds andy-symonds commented Jun 19, 2024

Context

We want to add the ability to summarise large documents and multiple documents that contain more tokens than can be 'stuffed' into the context window of an LLM.

Changes proposed in this pull request

Added a new build_map_reduce_summary_chain that uses LangChain Expression Language (LCEL) to perform a map step and then a reduce step, which makes large document summarisation possible.
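For reviewers unfamiliar with the pattern, here is a minimal, library-free sketch of map-then-reduce summarisation. It is not the code in this PR and does not use LCEL: `split_into_chunks`, `map_reduce_summary` and `first_words_summariser` are hypothetical names, the whitespace split stands in for a real tokenizer, and the summariser is a stub where an LLM call would go.

```python
from typing import Callable, List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def map_reduce_summary(
    text: str, summarise: Callable[[str], str], chunk_size: int
) -> str:
    """Summarise each chunk (map), then summarise the joined summaries (reduce)."""
    tokens = text.split()  # assumption: whitespace split instead of a real tokenizer
    chunks = split_into_chunks(tokens, chunk_size)
    partial_summaries = [summarise(" ".join(chunk)) for chunk in chunks]  # map step
    return summarise("\n".join(partial_summaries))  # reduce step


def first_words_summariser(text: str, limit: int = 3) -> str:
    """Stub for an LLM call: keeps only the first `limit` words of the input."""
    return " ".join(text.split()[:limit])
```

The point of the map step is that each chunk fits in the context window on its own; the reduce step then only has to fit the much shorter partial summaries.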

Guidance to review

The map_reduce summarisation is now the default for all summarisation, so every input goes through the map step and then the reduce step.

Relevant links

Things to check

  • I have added any new ENV vars in all deployed environments
  • I have tested any code added or changed
  • I have run integration tests

@andy-symonds andy-symonds marked this pull request as ready for review June 20, 2024 17:06
@@ -90,7 +89,7 @@ def make_document_context(input_dict):
documents += chunks

# right now, can only handle a single document so we manually truncate
-max_tokens = 20_000 # parameterise later
+max_tokens = (env.ai.max_tokens,)
Contributor

Should we maybe call this something more specific? env.ai.summary_map_chunk_size?
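Related to the same line: as quoted in the diff, the parentheses plus trailing comma make `max_tokens` a one-element tuple, not an integer. A small sketch of the difference, assuming a hypothetical `AISettings` class in place of the project's real `env.ai` object, with the field name proposed above and the default taken from the removed line:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AISettings:
    """Hypothetical stand-in for the project's env.ai settings object."""

    # Name proposed in review; default copied from the removed hard-coded value.
    summary_map_chunk_size: int = 20_000


def truncate(tokens: List[str], max_tokens: int) -> List[str]:
    """Keep at most max_tokens tokens from the start of the list."""
    return tokens[:max_tokens]


settings = AISettings()
as_tuple = (settings.summary_map_chunk_size,)  # trailing comma: a 1-tuple
as_int = settings.summary_map_chunk_size  # plain int, usable as a slice bound
```

Passing `as_tuple` to `truncate` raises a `TypeError`, because slice bounds must be integers; `as_int` works as intended.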

@jamesrichards4 jamesrichards4 merged commit ae77702 into main Jun 21, 2024
5 checks passed
@andy-symonds andy-symonds deleted the feature/large-doc-summarisation branch June 21, 2024 08:30
4 participants