
Async Support for LLMChainExtractor #3587

Closed
jphme wants to merge 4 commits

Conversation

@jphme jphme commented Apr 26, 2023

Implemented acompress_documents and slightly changed the syntax of compress_documents to make the sync/async functions consistent.

LLMChainExtractor, as implemented in #2915 for use in the ContextualCompressionRetriever, lacked an async method. Since compressing the retrieved documents is a highly parallelizable task, this was a major performance bottleneck in my tests.

This implementation is consistent with the one in the somewhat similar MapReduceDocumentsChain; see https://github.com/hwchase17/langchain/blob/85dae78548ed0c11db06e9154c7eb4236a1ee246/langchain/chains/combine_documents/map_reduce.py#L131.

In my own tests (standalone as well as in a compression pipeline), inputs/outputs are unchanged and the async speedup is significant.
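The speedup comes from issuing the per-document LLM calls concurrently instead of one after another. A minimal, self-contained sketch of the pattern (the `compress_doc` coroutine below is a hypothetical stand-in for the real `llm_chain.apredict_and_parse` call, with `asyncio.sleep` simulating network latency):

``` python
import asyncio
import time


async def compress_doc(doc: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for an LLM call: in the actual extractor this
    # would await the chain's async predict-and-parse method.
    await asyncio.sleep(delay)
    return doc.upper()


async def compress_all(docs: list) -> list:
    # All per-document calls run concurrently, so total wall time is
    # roughly one call's latency rather than len(docs) * latency.
    return await asyncio.gather(*(compress_doc(d) for d in docs))


docs = ["alpha", "beta", "gamma"]
start = time.perf_counter()
results = asyncio.run(compress_all(docs))
elapsed = time.perf_counter() - start
```

With three simulated 0.1 s calls, the concurrent version finishes in roughly 0.1 s of wall time, versus about 0.3 s sequentially.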


jphme commented Apr 26, 2023

fixed the formatting...


jphme commented Apr 29, 2023

Reverted the sync changes and just added the async method without using apply, as suggested.
New PR here: #3780.

@jphme jphme closed this Apr 29, 2023
hwchase17 pushed a commit that referenced this pull request May 2, 2023
@vowelparrot @hwchase17 Here is a new implementation of
`acompress_documents` for `LLMChainExtractor` without changes to the
sync version, as you suggested in #3587 ([Async Support for
LLMChainExtractor](#3587)).

I created a new PR to avoid cluttering the history with reverted commits;
I hope that is the right way.
Happy for any improvements/suggestions.

(PS:
I also tried an alternative implementation with a nested helper function:

``` python
async def acompress_documents_old(
    self, documents: Sequence[Document], query: str
) -> Sequence[Document]:
    """Compress page content of raw documents."""

    async def _compress_concurrently(doc: Document) -> Document:
        _input = self.get_input(query, doc)
        output = await self.llm_chain.apredict_and_parse(**_input)
        return Document(page_content=output, metadata=doc.metadata)

    # Run all compressions concurrently, then drop documents whose
    # compressed content came back empty.
    outputs = await asyncio.gather(
        *[_compress_concurrently(doc) for doc in documents]
    )
    compressed_docs = [doc for doc in outputs if len(doc.page_content) > 0]
    return compressed_docs
```

But in the end I found the committed version more readable and
more "canonical". I hope you agree.)