langchain: adds recursive json splitter #17144

Contributor

@joelsprunger joelsprunger commented Feb 7, 2024

  • Description: This adds a recursive json splitter class to the existing text_splitters as well as unit tests
  • Issue: splitting text from structured data can cause problems. If you split a large nested json object as regular text, you may lose the structure of the json. To mitigate this you can split the nested json into large chunks and overlap them, but this causes unnecessary text processing, and there will still be times when the nested json is so big that chunks get separated from their parent keys.

As an example you wouldn't want the following to be split in half:

{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf',
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}

Any LLM processing the second chunk of text may not have the context of val1 and val16, which reduces accuracy. Embeddings will also lack this context, which makes retrieval less accurate.

Instead you want it to be split into chunks that retain the json structure.

{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf'}}}

and

{'val1':{'val16':{
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}

This recursive json text splitter does this. Values that contain a list can be converted to a dict first by using split(... convert_lists=True); otherwise long lists will not be split and you may end up with chunks larger than the max chunk size.
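To illustrate the idea, here is a minimal standalone sketch, not the code in this PR. The names `list_to_dict`, `set_nested`, and `split_json` below are hypothetical helpers written for this example. Each chunk is rebuilt with the full path of parent keys, and lists are turned into index-keyed dicts so they can be split too. The size check ignores the overhead of the parent keys, so a chunk can end up slightly over the limit.

```python
import json
from typing import Any, Dict, List, Optional


def list_to_dict(data: Any) -> Any:
    """Recursively replace lists with dicts keyed by the element index."""
    if isinstance(data, dict):
        return {k: list_to_dict(v) for k, v in data.items()}
    if isinstance(data, list):
        return {str(i): list_to_dict(v) for i, v in enumerate(data)}
    return data


def set_nested(d: Dict[str, Any], path: List[str], value: Any) -> None:
    """Write value into d at the given key path, creating parent dicts as needed."""
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


def json_size(d: Dict[str, Any]) -> int:
    """Approximate size of a dict as the length of its JSON string."""
    return len(json.dumps(d))


def split_json(
    data: Dict[str, Any],
    max_chunk_size: int = 200,
    path: Optional[List[str]] = None,
    chunks: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Split a nested dict into chunks that each keep their parent keys."""
    path = path or []
    chunks = chunks if chunks is not None else [{}]
    for key, value in data.items():
        new_path = path + [key]
        remaining = max_chunk_size - json_size(chunks[-1])
        if json_size({key: value}) < remaining:
            # The whole value fits in the current chunk.
            set_nested(chunks[-1], new_path, value)
        elif isinstance(value, dict) and value:
            # Too big as a whole: recurse and place its children one by one.
            split_json(value, max_chunk_size, new_path, chunks)
        else:
            # A leaf that does not fit: start a new chunk (unless it is empty).
            if chunks[-1]:
                chunks.append({})
            set_nested(chunks[-1], new_path, value)
    return chunks
```

Calling `split_json(list_to_dict(data), max_chunk_size=300)` on a nested dict returns a list of dicts in which every value still sits under its original parent keys.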

In my testing large json objects could be split into small chunks with
✅ Increased question answering accuracy
✅ The ability to split into smaller chunks meant retrieval queries can use fewer tokens

  • Dependencies: json import added to text_splitter.py, and random added to the unit test
  • Twitter handle: @joelsprunger

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 7, 2024
@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases labels Feb 7, 2024
@joelsprunger joelsprunger reopened this Feb 7, 2024
@hwchase17 hwchase17 self-assigned this Feb 7, 2024
@joelsprunger
Contributor Author

I've added an ipynb notebook and I'm trying to get the documentation built locally.

@joelsprunger
Contributor Author

joelsprunger commented Feb 7, 2024

I think we are good here. I was able to run the unit tests, linting, and see the documentation.

Let me know if I need to do anything else. :-)

Contributor

@hwchase17 hwchase17 left a comment

really really like this - one large-ish comment/refactor to make more consistent with other classes

if min_chunk_size is not None
else max(max_chunk_size - 200, 50)
)
self._chunks = JsonChunks()

Contributor

I don't think this class should be stateful - pretty inconsistent with other classes

Contributor Author

I get that. I think I initially coded it up for my own use. And with the chunks passed to 3 of the functions in the class I was thinking it made more sense to just have those be a class member. I can change that.

Contributor Author

I'll try to have it inherit from TextSplitter, and change split to split_text so that create_documents() can work.

Contributor Author

The issue I am seeing is with create_documents()
The super is expecting List[str]
I wanted to keep the loaded json as a Dict rather than expecting str, because the recursion for preprocessing and splitting is on Dict objects, and anyone with json in python probably has it in a Dict format to start. I think I will stop inheriting from TextSplitter and instead just shadow the functionality with split_text and create_documents. With that said, I'll add split_json that returns List[dict] so that people can get the output in json if they want. Then split_text and create_documents can be wrappers around split_json.
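A rough sketch of that layering, not the refactored code itself: `JsonSplitterSketch` and the `Document` dataclass below are hypothetical stand-ins, and the `split_json` body is a placeholder (one chunk per top-level key) so the example stays short. The point is only that the dict-based method does the work, the other two wrap it, and no chunk state is kept on the instance.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Stand-in document type, hypothetical and for illustration only."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class JsonSplitterSketch:
    """Stateless: nothing is stored on the instance between calls."""

    def split_json(self, json_data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Placeholder strategy: one chunk per top-level key.
        return [{key: value} for key, value in json_data.items()]

    def split_text(self, json_data: Dict[str, Any]) -> List[str]:
        # Thin wrapper: serialize each dict chunk to a JSON string.
        return [json.dumps(chunk) for chunk in self.split_json(json_data)]

    def create_documents(
        self,
        texts: List[Dict[str, Any]],
        metadatas: Optional[List[Dict[str, Any]]] = None,
    ) -> List[Document]:
        # Thin wrapper: one Document per chunk, copying the input's metadata.
        metadatas = metadatas or [{} for _ in texts]
        docs: List[Document] = []
        for data, meta in zip(texts, metadatas):
            for chunk in self.split_text(data):
                docs.append(Document(page_content=chunk, metadata=dict(meta)))
        return docs
```

Taking a Dict in `split_text` is what makes this differ from a List[str]-based create_documents(), which is why the sketch does not inherit from a text splitter base class.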

Contributor Author

@hwchase17 I refactored this and updated the docs and unit tests.

@joelsprunger joelsprunger changed the title langchain: adds recursive json splitter and unit test langchain: adds recursive json splitter Feb 7, 2024
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Feb 8, 2024
@hwchase17
Contributor

thanks @joelsprunger ! will highlight this when it's out in the next release. really like it!

@hwchase17 hwchase17 merged commit 3984f66 into langchain-ai:master Feb 8, 2024
43 checks passed
@joelsprunger
Contributor Author

@hwchase17 great! If you want me to demo some of the benefits with example json using langsmith I can make a screen-cast or something.

@funkymonkeymonk

So I just found this in the documentation and it took me a bit of spelunking to find that this was not yet released because it's already in the documentation. Thanks for making this just in time for me to need it :-)

adamnolte pushed a commit to autoblocksai/autoblocks-examples that referenced this pull request Feb 13, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
|
[@types/node](https://togithub.com/DefinitelyTyped/DefinitelyTyped/tree/master/types/node)
([source](https://togithub.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node))
| [`20.11.16` ->
`20.11.17`](https://renovatebot.com/diffs/npm/@types%2fnode/20.11.16/20.11.17)
|
[![age](https://developer.mend.io/api/mc/badges/age/npm/@types%2fnode/20.11.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/@types%2fnode/20.11.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/@types%2fnode/20.11.16/20.11.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/@types%2fnode/20.11.16/20.11.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [ai](https://sdk.vercel.ai/docs)
([source](https://togithub.com/vercel/ai)) | [`2.2.33` ->
`2.2.35`](https://renovatebot.com/diffs/npm/ai/2.2.33/2.2.35) |
[![age](https://developer.mend.io/api/mc/badges/age/npm/ai/2.2.35?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/ai/2.2.35?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/ai/2.2.33/2.2.35?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/ai/2.2.33/2.2.35?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [langchain](https://togithub.com/langchain-ai/langchain) | `0.1.5` ->
`0.1.6` |
[![age](https://developer.mend.io/api/mc/badges/age/pypi/langchain/0.1.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/langchain/0.1.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/langchain/0.1.5/0.1.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/langchain/0.1.5/0.1.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
|
[langchain](https://togithub.com/langchain-ai/langchainjs/tree/main/langchain/)
([source](https://togithub.com/langchain-ai/langchainjs)) | [`0.1.16` ->
`0.1.17`](https://renovatebot.com/diffs/npm/langchain/0.1.16/0.1.17) |
[![age](https://developer.mend.io/api/mc/badges/age/npm/langchain/0.1.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/langchain/0.1.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/langchain/0.1.16/0.1.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/langchain/0.1.16/0.1.17?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [novel](https://novel.sh)
([source](https://togithub.com/steven-tey/novel)) | [`^0.1.19` ->
`^0.2.0`](https://renovatebot.com/diffs/npm/novel/0.1.22/0.2.0) |
[![age](https://developer.mend.io/api/mc/badges/age/npm/novel/0.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/novel/0.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/novel/0.1.22/0.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/novel/0.1.22/0.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [openai](https://togithub.com/openai/openai-python) | `1.11.1` ->
`1.12.0` |
[![age](https://developer.mend.io/api/mc/badges/age/pypi/openai/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/openai/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/openai/1.11.1/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/openai/1.11.1/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [openai](https://togithub.com/openai/openai-python) | `1.9.0` ->
`1.12.0` |
[![age](https://developer.mend.io/api/mc/badges/age/pypi/openai/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/openai/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/openai/1.9.0/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/openai/1.9.0/1.12.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [openai](https://togithub.com/openai/openai-node) | [`4.26.1` ->
`4.27.1`](https://renovatebot.com/diffs/npm/openai/4.26.1/4.27.1) |
[![age](https://developer.mend.io/api/mc/badges/age/npm/openai/4.27.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/openai/4.27.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/openai/4.26.1/4.27.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/openai/4.26.1/4.27.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [pydantic](https://togithub.com/pydantic/pydantic)
([changelog](https://docs.pydantic.dev/latest/changelog/)) | `2.5.3` ->
`2.6.1` |
[![age](https://developer.mend.io/api/mc/badges/age/pypi/pydantic/2.6.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/pydantic/2.6.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/pydantic/2.5.3/2.6.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/pydantic/2.5.3/2.6.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [python-dotenv](https://togithub.com/theskumar/python-dotenv) |
`1.0.0` -> `1.0.1` |
[![age](https://developer.mend.io/api/mc/badges/age/pypi/python-dotenv/1.0.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/python-dotenv/1.0.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/python-dotenv/1.0.0/1.0.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/python-dotenv/1.0.0/1.0.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
| [tsx](https://togithub.com/privatenumber/tsx) | [`4.7.0` ->
`4.7.1`](https://renovatebot.com/diffs/npm/tsx/4.7.0/4.7.1) |
[![age](https://developer.mend.io/api/mc/badges/age/npm/tsx/4.7.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/tsx/4.7.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/tsx/4.7.0/4.7.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/tsx/4.7.0/4.7.1?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>vercel/ai (ai)</summary>

### [`v2.2.35`](https://togithub.com/vercel/ai/releases/tag/ai%402.2.35)

[Compare
Source](https://togithub.com/vercel/ai/compare/ai@2.2.34...ai@2.2.35)

##### Patch Changes

- [`b717dad`](https://togithub.com/vercel/ai/commit/b717dad): Adding
Inkeep as a stream provider

### [`v2.2.34`](https://togithub.com/vercel/ai/releases/tag/ai%402.2.34)

[Compare
Source](https://togithub.com/vercel/ai/compare/ai@2.2.33...ai@2.2.34)

##### Patch Changes

- [`2c8ffdb`](https://togithub.com/vercel/ai/commit/2c8ffdb):
cohere-stream: support AsyncIterable
- [`ed1e278`](https://togithub.com/vercel/ai/commit/ed1e278): Message
annotations handling for all Message types

</details>

<details>
<summary>langchain-ai/langchain (langchain)</summary>

###
[`v0.1.6`](https://togithub.com/langchain-ai/langchain/releases/tag/v0.1.6)

[Compare
Source](https://togithub.com/langchain-ai/langchain/compare/v0.1.5...v0.1.6)

##### What's Changed

- experimental\[patch]: Release 0.0.50 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16883](https://togithub.com/langchain-ai/langchain/pull/16883)
- infra: bump exp min test reqs by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16884](https://togithub.com/langchain-ai/langchain/pull/16884)
- docs: fix docstring examples by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16889](https://togithub.com/langchain-ai/langchain/pull/16889)
- langchain\[patch]: Add async methods to MultiVectorRetriever by
[@&#8203;cbornet](https://togithub.com/cbornet) in
[https://github.com/langchain-ai/langchain/pull/16878](https://togithub.com/langchain-ai/langchain/pull/16878)
- docs: Indicated Guardrails for Amazon Bedrock preview status by
[@&#8203;harelix](https://togithub.com/harelix) in
[https://github.com/langchain-ai/langchain/pull/16769](https://togithub.com/langchain-ai/langchain/pull/16769)
- Factorize AstraDB components constructors by
[@&#8203;cbornet](https://togithub.com/cbornet) in
[https://github.com/langchain-ai/langchain/pull/16779](https://togithub.com/langchain-ai/langchain/pull/16779)
- support LIKE comparator (full text match) in Qdrant by
[@&#8203;xieqihui](https://togithub.com/xieqihui) in
[https://github.com/langchain-ai/langchain/pull/12769](https://togithub.com/langchain-ai/langchain/pull/12769)
- infra: ci naming by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16890](https://togithub.com/langchain-ai/langchain/pull/16890)
- Docs: Fixed grammatical mistake by
[@&#8203;ShorthillsAI](https://togithub.com/ShorthillsAI) in
[https://github.com/langchain-ai/langchain/pull/16858](https://togithub.com/langchain-ai/langchain/pull/16858)
- Minor update to Nomic cookbook by
[@&#8203;rlancemartin](https://togithub.com/rlancemartin) in
[https://github.com/langchain-ai/langchain/pull/16886](https://togithub.com/langchain-ai/langchain/pull/16886)
- infra: ci naming 2 by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16893](https://togithub.com/langchain-ai/langchain/pull/16893)
- refactor `langchain.prompts.example_selector` by
[@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/15369](https://togithub.com/langchain-ai/langchain/pull/15369)
- doc: fix typo in message_history.ipynb by
[@&#8203;akirawuc](https://togithub.com/akirawuc) in
[https://github.com/langchain-ai/langchain/pull/16877](https://togithub.com/langchain-ai/langchain/pull/16877)
- community: revert SQL Stores by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16912](https://togithub.com/langchain-ai/langchain/pull/16912)
- langchain_openai\[patch]: Invoke callback prior to yielding token by
[@&#8203;eyurtsev](https://togithub.com/eyurtsev) in
[https://github.com/langchain-ai/langchain/pull/16909](https://togithub.com/langchain-ai/langchain/pull/16909)
- docs: fix broken links by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16855](https://togithub.com/langchain-ai/langchain/pull/16855)
- Fix loading of ImagePromptTemplate by
[@&#8203;hinthornw](https://togithub.com/hinthornw) in
[https://github.com/langchain-ai/langchain/pull/16868](https://togithub.com/langchain-ai/langchain/pull/16868)
- core\[patch]: Hide aliases when serializing by
[@&#8203;hinthornw](https://togithub.com/hinthornw) in
[https://github.com/langchain-ai/langchain/pull/16888](https://togithub.com/langchain-ai/langchain/pull/16888)
- core\[patch]: Remove deep copying of run prior to submitting it to
LangChain Tracing by [@&#8203;hinthornw](https://togithub.com/hinthornw)
in
[https://github.com/langchain-ai/langchain/pull/16904](https://togithub.com/langchain-ai/langchain/pull/16904)
- core\[minor]: add validation error handler to `BaseTool` by
[@&#8203;hmasdev](https://togithub.com/hmasdev) in
[https://github.com/langchain-ai/langchain/pull/14007](https://togithub.com/langchain-ai/langchain/pull/14007)
- Updated integration doc for aleph alpha by
[@&#8203;rocky1405](https://togithub.com/rocky1405) in
[https://github.com/langchain-ai/langchain/pull/16844](https://togithub.com/langchain-ai/langchain/pull/16844)
- core\[patch]: fix chat prompt partial messages placeholder var by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16918](https://togithub.com/langchain-ai/langchain/pull/16918)
- core\[patch]: Message content as positional arg by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16921](https://togithub.com/langchain-ai/langchain/pull/16921)
- core\[patch]: doc init positional args by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16854](https://togithub.com/langchain-ai/langchain/pull/16854)
- community\[docs]: add quantization to vllm and update API by
[@&#8203;mspronesti](https://togithub.com/mspronesti) in
[https://github.com/langchain-ai/langchain/pull/16950](https://togithub.com/langchain-ai/langchain/pull/16950)
- docs: BigQuery Vector Search went public review and updated docs by
[@&#8203;ashleyxuu](https://togithub.com/ashleyxuu) in
[https://github.com/langchain-ai/langchain/pull/16896](https://togithub.com/langchain-ai/langchain/pull/16896)
- core\[patch]: Add doc-string to RunnableEach by
[@&#8203;keenborder786](https://togithub.com/keenborder786) in
[https://github.com/langchain-ai/langchain/pull/16892](https://togithub.com/langchain-ai/langchain/pull/16892)
- core\[patch]: handle some optional cases in tools by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16954](https://togithub.com/langchain-ai/langchain/pull/16954)
- docs: partner packages by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16960](https://togithub.com/langchain-ai/langchain/pull/16960)
- infra: install integration deps for test linting by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16963](https://togithub.com/langchain-ai/langchain/pull/16963)
- Update README.md by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16966](https://togithub.com/langchain-ai/langchain/pull/16966)
- langchain_mistralai\[patch]: Invoke callback prior to yielding token
by [@&#8203;ccurme](https://togithub.com/ccurme) in
[https://github.com/langchain-ai/langchain/pull/16986](https://togithub.com/langchain-ai/langchain/pull/16986)
- openai\[patch]: rm tiktoken model warning by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16964](https://togithub.com/langchain-ai/langchain/pull/16964)
- google-genai\[patch]: fix new core typing by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16988](https://togithub.com/langchain-ai/langchain/pull/16988)
- community\[patch]: Correct the calling to collection_name in qdrant by
[@&#8203;killinsun](https://togithub.com/killinsun) in
[https://github.com/langchain-ai/langchain/pull/16920](https://togithub.com/langchain-ai/langchain/pull/16920)
- docs: Update ollama examples with new community libraries by
[@&#8203;picsoung](https://togithub.com/picsoung) in
[https://github.com/langchain-ai/langchain/pull/17007](https://togithub.com/langchain-ai/langchain/pull/17007)
- langchain_core: Fixed bug in dict to message conversion. by
[@&#8203;rmkraus](https://togithub.com/rmkraus) in
[https://github.com/langchain-ai/langchain/pull/17023](https://togithub.com/langchain-ai/langchain/pull/17023)
- Add async methods to BaseChatMessageHistory and BaseMemory by
[@&#8203;cbornet](https://togithub.com/cbornet) in
[https://github.com/langchain-ai/langchain/pull/16728](https://togithub.com/langchain-ai/langchain/pull/16728)
- Nvidia trt model name for stop_stream() by
[@&#8203;mkhludnev](https://togithub.com/mkhludnev) in
[https://github.com/langchain-ai/langchain/pull/16997](https://togithub.com/langchain-ai/langchain/pull/16997)
- core\[patch]: Add langsmith to printed sys information by
[@&#8203;eyurtsev](https://togithub.com/eyurtsev) in
[https://github.com/langchain-ai/langchain/pull/16899](https://togithub.com/langchain-ai/langchain/pull/16899)
- docs: exa contents by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16555](https://togithub.com/langchain-ai/langchain/pull/16555)
- add -p to mkdir in lint steps by
[@&#8203;hwchase17](https://togithub.com/hwchase17) in
[https://github.com/langchain-ai/langchain/pull/17013](https://togithub.com/langchain-ai/langchain/pull/17013)
- template: tool-retrieval-fireworks by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17052](https://togithub.com/langchain-ai/langchain/pull/17052)
- pinecone: init pkg by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/16556](https://togithub.com/langchain-ai/langchain/pull/16556)
- community\[patch]: fix agent_toolkits mypy by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17050](https://togithub.com/langchain-ai/langchain/pull/17050)
- Shield callback methods from cancellation: Fix interrupted runs marked
as pending forever by [@&#8203;nfcampos](https://togithub.com/nfcampos)
in
[https://github.com/langchain-ai/langchain/pull/17010](https://togithub.com/langchain-ai/langchain/pull/17010)
- Fix condition on custom root type in runnable history by
[@&#8203;nfcampos](https://togithub.com/nfcampos) in
[https://github.com/langchain-ai/langchain/pull/17017](https://togithub.com/langchain-ai/langchain/pull/17017)
- partners: \[NVIDIA AI Endpoints] Support User-Agent metadata and minor
fixes. by [@&#8203;VKudlay](https://togithub.com/VKudlay) in
[https://github.com/langchain-ai/langchain/pull/16942](https://togithub.com/langchain-ai/langchain/pull/16942)
- community\[patch]: callbacks mypy fixes by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17058](https://togithub.com/langchain-ai/langchain/pull/17058)
- community\[patch]: chat message history mypy fixes by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17059](https://togithub.com/langchain-ai/langchain/pull/17059)
- community\[patch]: chat model mypy fixes by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17061](https://togithub.com/langchain-ai/langchain/pull/17061)
- Langchain: `json_chat` don't need stop sequenes by
[@&#8203;calvinweb](https://togithub.com/calvinweb) in
[https://github.com/langchain-ai/langchain/pull/16335](https://togithub.com/langchain-ai/langchain/pull/16335)
- langchain: add partial parsing support to JsonOutputToolsParser by
[@&#8203;Mercurrent](https://togithub.com/Mercurrent) in
[https://github.com/langchain-ai/langchain/pull/17035](https://togithub.com/langchain-ai/langchain/pull/17035)
- Community: Allow adding ARNs as model_id to support Amazon Bedrock
custom models by [@&#8203;supreetkt](https://togithub.com/supreetkt) in
[https://github.com/langchain-ai/langchain/pull/16800](https://togithub.com/langchain-ai/langchain/pull/16800)
- Community: Add Progress bar to HuggingFaceEmbeddings by
[@&#8203;tylertitsworth](https://togithub.com/tylertitsworth) in
[https://github.com/langchain-ai/langchain/pull/16758](https://togithub.com/langchain-ai/langchain/pull/16758)
- Langchain Community: Fix the \_call of HuggingFaceHub by
[@&#8203;keenborder786](https://togithub.com/keenborder786) in
[https://github.com/langchain-ai/langchain/pull/16891](https://togithub.com/langchain-ai/langchain/pull/16891)
- Community: MLflow callback update by
[@&#8203;serena-ruan](https://togithub.com/serena-ruan) in
[https://github.com/langchain-ai/langchain/pull/16687](https://togithub.com/langchain-ai/langchain/pull/16687)
- docs: add 2 more tutorials to the list in youtube.mdx by
[@&#8203;strongSoda](https://togithub.com/strongSoda) in
[https://github.com/langchain-ai/langchain/pull/16998](https://togithub.com/langchain-ai/langchain/pull/16998)
- Docs: Fix Copilot name by
[@&#8203;bmuskalla](https://togithub.com/bmuskalla) in
[https://github.com/langchain-ai/langchain/pull/16956](https://togithub.com/langchain-ai/langchain/pull/16956)
- docs:Updating documentation for Konko provider by
[@&#8203;shivanimodi16](https://togithub.com/shivanimodi16) in
[https://github.com/langchain-ai/langchain/pull/16953](https://togithub.com/langchain-ai/langchain/pull/16953)
- fixing a minor grammatical mistake by
[@&#8203;ShorthillsAI](https://togithub.com/ShorthillsAI) in
[https://github.com/langchain-ai/langchain/pull/16931](https://togithub.com/langchain-ai/langchain/pull/16931)
- docs: Fix typo in quickstart.ipynb by
[@&#8203;n0vad3v](https://togithub.com/n0vad3v) in
[https://github.com/langchain-ai/langchain/pull/16859](https://togithub.com/langchain-ai/langchain/pull/16859)
- community:Breebs docs retriever by
[@&#8203;Poissecaille](https://togithub.com/Poissecaille) in
[https://github.com/langchain-ai/langchain/pull/16578](https://togithub.com/langchain-ai/langchain/pull/16578)
- add structured tools by
[@&#8203;hwchase17](https://togithub.com/hwchase17) in
[https://github.com/langchain-ai/langchain/pull/15772](https://togithub.com/langchain-ai/langchain/pull/15772)
- docs: update parse_partial_json source info by
[@&#8203;Mercurrent](https://togithub.com/Mercurrent) in
[https://github.com/langchain-ai/langchain/pull/17036](https://togithub.com/langchain-ai/langchain/pull/17036)
- infra: fix breebs test lint by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17075](https://togithub.com/langchain-ai/langchain/pull/17075)
- docs: add youtube link by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17065](https://togithub.com/langchain-ai/langchain/pull/17065)
- Add prompt metadata + tags by
[@&#8203;hinthornw](https://togithub.com/hinthornw) in
[https://github.com/langchain-ai/langchain/pull/17054](https://togithub.com/langchain-ai/langchain/pull/17054)
- core\[patch]: fix \_sql_record_manager mypy for
[#&#8203;17048](https://togithub.com/langchain-ai/langchain/issues/17048)
by [@&#8203;moorej-oci](https://togithub.com/moorej-oci) in
[https://github.com/langchain-ai/langchain/pull/17073](https://togithub.com/langchain-ai/langchain/pull/17073)
- langchain_experimental: Fixes issue
[#&#8203;17060](https://togithub.com/langchain-ai/langchain/issues/17060)
by [@&#8203;SalamanderXing](https://togithub.com/SalamanderXing) in
[https://github.com/langchain-ai/langchain/pull/17062](https://togithub.com/langchain-ai/langchain/pull/17062)
- community: add integration_tests and coverage to MAKEFILE by
[@&#8203;scottnath](https://togithub.com/scottnath) in
[https://github.com/langchain-ai/langchain/pull/17053](https://togithub.com/langchain-ai/langchain/pull/17053)
- templates: bump by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17074](https://togithub.com/langchain-ai/langchain/pull/17074)
- docs\[patch]: Update streaming documentation by
[@&#8203;eyurtsev](https://togithub.com/eyurtsev) in
[https://github.com/langchain-ai/langchain/pull/17066](https://togithub.com/langchain-ai/langchain/pull/17066)
- core\[patch]: Add astream events config test by
[@&#8203;eyurtsev](https://togithub.com/eyurtsev) in
[https://github.com/langchain-ai/langchain/pull/17055](https://togithub.com/langchain-ai/langchain/pull/17055)
- docs: fix typo in dspy.ipynb by
[@&#8203;eltociear](https://togithub.com/eltociear) in
[https://github.com/langchain-ai/langchain/pull/16996](https://togithub.com/langchain-ai/langchain/pull/16996)
- fixed import in `experimental` by
[@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/17078](https://togithub.com/langchain-ai/langchain/pull/17078)
- community: Fix error in `LlamaCpp` community LLM with Configurable
Fields, 'grammar' custom type not available by
[@&#8203;fpaupier](https://togithub.com/fpaupier) in
[https://github.com/langchain-ai/langchain/pull/16995](https://togithub.com/langchain-ai/langchain/pull/16995)
- docs/docs/integrations/chat/mistralai.ipynb: update for version 0.1+
by [@&#8203;mtmahe](https://togithub.com/mtmahe) in
[https://github.com/langchain-ai/langchain/pull/17011](https://togithub.com/langchain-ai/langchain/pull/17011)
- docs: update StreamlitCallbackHandler example by
[@&#8203;os1ma](https://togithub.com/os1ma) in
[https://github.com/langchain-ai/langchain/pull/16970](https://togithub.com/langchain-ai/langchain/pull/16970)
- docs: Link to Brave Website added by
[@&#8203;Janldeboer](https://togithub.com/Janldeboer) in
[https://github.com/langchain-ai/langchain/pull/16958](https://togithub.com/langchain-ai/langchain/pull/16958)
- community: Added new Utility runnables for NVIDIA Riva. by
[@&#8203;rmkraus](https://togithub.com/rmkraus) in
[https://github.com/langchain-ai/langchain/pull/15966](https://togithub.com/langchain-ai/langchain/pull/15966)
- langchain: `output_parser.py` in conversation_chat is customizable by
[@&#8203;hdnh2006](https://togithub.com/hdnh2006) in
[https://github.com/langchain-ai/langchain/pull/16945](https://togithub.com/langchain-ai/langchain/pull/16945)
- docs: Fix typo in amadeus.ipynb by
[@&#8203;laoazhang](https://togithub.com/laoazhang) in
[https://github.com/langchain-ai/langchain/pull/16916](https://togithub.com/langchain-ai/langchain/pull/16916)
- new feature: add github file loader to load any github file content b…
by [@&#8203;shufanhao](https://togithub.com/shufanhao) in
[https://github.com/langchain-ai/langchain/pull/15305](https://togithub.com/langchain-ai/langchain/pull/15305)
- core\[patch]: Release 0.1.19 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17117](https://togithub.com/langchain-ai/langchain/pull/17117)
- Add SelfQueryRetriever support to PGVector by
[@&#8203;Swalloow](https://togithub.com/Swalloow) in
[https://github.com/langchain-ai/langchain/pull/16991](https://togithub.com/langchain-ai/langchain/pull/16991)
- infra: add pinecone secret by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17120](https://togithub.com/langchain-ai/langchain/pull/17120)
- nvidia-trt: propagate InferenceClientException to the caller. by
[@&#8203;mkhludnev](https://togithub.com/mkhludnev) in
[https://github.com/langchain-ai/langchain/pull/16936](https://togithub.com/langchain-ai/langchain/pull/16936)
- infra: add integration deps to partner lint by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17122](https://togithub.com/langchain-ai/langchain/pull/17122)
- pinecone\[patch]: integration test new namespace by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17121](https://togithub.com/langchain-ai/langchain/pull/17121)
- nvidia-ai-endpoints\[patch]: release 0.0.2 by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17125](https://togithub.com/langchain-ai/langchain/pull/17125)
- infra: update to cache v4 by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17126](https://togithub.com/langchain-ai/langchain/pull/17126)
- community\[patch]: Release 0.0.18 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17129](https://togithub.com/langchain-ai/langchain/pull/17129)
- API References sorted `Partner libs` menu by
[@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/17130](https://togithub.com/langchain-ai/langchain/pull/17130)
- docs: fix typo in ollama notebook by
[@&#8203;arnoschutijzer](https://togithub.com/arnoschutijzer) in
[https://github.com/langchain-ai/langchain/pull/17127](https://togithub.com/langchain-ai/langchain/pull/17127)
- mistralai\[patch]: 16k token batching logic embed by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17136](https://togithub.com/langchain-ai/langchain/pull/17136)
- infra: read min versions by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17135](https://togithub.com/langchain-ai/langchain/pull/17135)
- mistralai\[patch]: release 0.0.4 by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17139](https://togithub.com/langchain-ai/langchain/pull/17139)
- infra: fix release by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17142](https://togithub.com/langchain-ai/langchain/pull/17142)
- docs: format by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17143](https://togithub.com/langchain-ai/langchain/pull/17143)
- infra: poetry run min versions by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17146](https://togithub.com/langchain-ai/langchain/pull/17146)
- infra: poetry run min versions 2 by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17149](https://togithub.com/langchain-ai/langchain/pull/17149)
- infra: release min version debugging by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17150](https://togithub.com/langchain-ai/langchain/pull/17150)
- infra: release min version debugging 2 by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17152](https://togithub.com/langchain-ai/langchain/pull/17152)
- docs: tutorials update by
[@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/17132](https://togithub.com/langchain-ai/langchain/pull/17132)
- docs `integrations/providers` nav fix by
[@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/17148](https://togithub.com/langchain-ai/langchain/pull/17148)
- docs `Integrations/Components` menu reordered by
[@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/17151](https://togithub.com/langchain-ai/langchain/pull/17151)
- Add trace_as_chain_group metadata by
[@&#8203;hinthornw](https://togithub.com/hinthornw) in
[https://github.com/langchain-ai/langchain/pull/17187](https://togithub.com/langchain-ai/langchain/pull/17187)
- allow optional newline in the action responses of JSON Agent parser by
[@&#8203;tomasonjo](https://togithub.com/tomasonjo) in
[https://github.com/langchain-ai/langchain/pull/17186](https://togithub.com/langchain-ai/langchain/pull/17186)
- Feat: support functions call for google-genai by
[@&#8203;chyroc](https://togithub.com/chyroc) in
[https://github.com/langchain-ai/langchain/pull/15146](https://togithub.com/langchain-ai/langchain/pull/15146)
- Use batched tracing in sdk by
[@&#8203;nfcampos](https://togithub.com/nfcampos) in
[https://github.com/langchain-ai/langchain/pull/16305](https://togithub.com/langchain-ai/langchain/pull/16305)
- core\[patch]: Release 0.1.20 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17194](https://togithub.com/langchain-ai/langchain/pull/17194)
- infra: fix core release by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17195](https://togithub.com/langchain-ai/langchain/pull/17195)
- infra: better conditional by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17197](https://togithub.com/langchain-ai/langchain/pull/17197)
- Add neo4j semantic layer with ollama template by
[@&#8203;tomasonjo](https://togithub.com/tomasonjo) in
[https://github.com/langchain-ai/langchain/pull/17192](https://togithub.com/langchain-ai/langchain/pull/17192)
- remove pg_essay.txt by [@&#8203;efriis](https://togithub.com/efriis)
in
[https://github.com/langchain-ai/langchain/pull/17198](https://togithub.com/langchain-ai/langchain/pull/17198)
- langchain: Standardize `output_parser.py` across all agent types for
custom `FORMAT_INSTRUCTIONS` by
[@&#8203;hdnh2006](https://togithub.com/hdnh2006) in
[https://github.com/langchain-ai/langchain/pull/17168](https://togithub.com/langchain-ai/langchain/pull/17168)
- core\[patch], community\[patch]: link extraction continue on failure
by [@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17200](https://togithub.com/langchain-ai/langchain/pull/17200)
- core\[patch]: Release 0.1.21 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17202](https://togithub.com/langchain-ai/langchain/pull/17202)
- cli\[patch]: copyright 2024 default by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17204](https://togithub.com/langchain-ai/langchain/pull/17204)
- community\[patch]: Release 0.0.19 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17207](https://togithub.com/langchain-ai/langchain/pull/17207)
- Fix stream events/log with some kinds of non addable output by
[@&#8203;nfcampos](https://togithub.com/nfcampos) in
[https://github.com/langchain-ai/langchain/pull/17205](https://togithub.com/langchain-ai/langchain/pull/17205)
- google-vertexai\[patch]: serializable citation metadata, release 0.0.4
by [@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17145](https://togithub.com/langchain-ai/langchain/pull/17145)
- google-vertexai\[patch]: function calling integration test by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17209](https://togithub.com/langchain-ai/langchain/pull/17209)
- google-genai\[patch]: match function call interface by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17213](https://togithub.com/langchain-ai/langchain/pull/17213)
- google-genai\[patch]: no error for FunctionMessage by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17215](https://togithub.com/langchain-ai/langchain/pull/17215)
- google-genai\[patch]: release 0.0.7 by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17193](https://togithub.com/langchain-ai/langchain/pull/17193)
- docs: cleanup fleet integration by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17214](https://togithub.com/langchain-ai/langchain/pull/17214)
- templates: add gemini functions agent by
[@&#8203;hwchase17](https://togithub.com/hwchase17) in
[https://github.com/langchain-ai/langchain/pull/17141](https://togithub.com/langchain-ai/langchain/pull/17141)
- langchain\[minor], community\[minor], core\[minor]: Async Cache
support and AsyncRedisCache by
[@&#8203;dzmitry-kankalovich](https://togithub.com/dzmitry-kankalovich)
in
[https://github.com/langchain-ai/langchain/pull/15817](https://togithub.com/langchain-ai/langchain/pull/15817)
- community\[patch]: Fix chat openai unit test by
[@&#8203;LuizFrra](https://togithub.com/LuizFrra) in
[https://github.com/langchain-ai/langchain/pull/17124](https://togithub.com/langchain-ai/langchain/pull/17124)
- docs: titles fix by [@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/17206](https://togithub.com/langchain-ai/langchain/pull/17206)
- community\[patch]: Better error propagation for neo4jgraph by
[@&#8203;tomasonjo](https://togithub.com/tomasonjo) in
[https://github.com/langchain-ai/langchain/pull/17190](https://togithub.com/langchain-ai/langchain/pull/17190)
- community\[minor]: SQLDatabase Add fetch mode `cursor`, query
parameters, query by selectable, expose execution options, and
documentation by [@&#8203;eyurtsev](https://togithub.com/eyurtsev) in
[https://github.com/langchain-ai/langchain/pull/17191](https://togithub.com/langchain-ai/langchain/pull/17191)
- community\[patch]: octoai embeddings bug fix by
[@&#8203;AI-Bassem](https://togithub.com/AI-Bassem) in
[https://github.com/langchain-ai/langchain/pull/17216](https://togithub.com/langchain-ai/langchain/pull/17216)
- docs: add missing link to Quickstart by
[@&#8203;sana-google](https://togithub.com/sana-google) in
[https://github.com/langchain-ai/langchain/pull/17085](https://togithub.com/langchain-ai/langchain/pull/17085)
- docs: use PromptTemplate.from_template by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17218](https://togithub.com/langchain-ai/langchain/pull/17218)
- langchain_google_vertexai : added logic to override
get_num_tokens_from_messages() for ChatVertexAI by
[@&#8203;Adi8885](https://togithub.com/Adi8885) in
[https://github.com/langchain-ai/langchain/pull/16784](https://togithub.com/langchain-ai/langchain/pull/16784)
- google-vertexai\[patch]: integration test fix, release 0.0.5 by
[@&#8203;efriis](https://togithub.com/efriis) in
[https://github.com/langchain-ai/langchain/pull/17258](https://togithub.com/langchain-ai/langchain/pull/17258)
- partners/google-vertexai: fix \_parse_response_candidate issue by
[@&#8203;hsuyuming](https://togithub.com/hsuyuming) in
[https://github.com/langchain-ai/langchain/pull/16647](https://togithub.com/langchain-ai/langchain/pull/16647)
- langchain\[minor], core\[minor]: add openai-json structured output
runnable by [@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16914](https://togithub.com/langchain-ai/langchain/pull/16914)
- Documentation: Fix typo in github.ipynb by
[@&#8203;jorge-campo](https://togithub.com/jorge-campo) in
[https://github.com/langchain-ai/langchain/pull/17259](https://togithub.com/langchain-ai/langchain/pull/17259)
- Implement Unique ID Enforcement in FAISS by
[@&#8203;ByeongUkChoi](https://togithub.com/ByeongUkChoi) in
[https://github.com/langchain-ai/langchain/pull/17244](https://togithub.com/langchain-ai/langchain/pull/17244)
- langchain, community: Fixes in the Ontotext GraphDB Graph and QA Chain
by [@&#8203;nelly-hateva](https://togithub.com/nelly-hateva) in
[https://github.com/langchain-ai/langchain/pull/17239](https://togithub.com/langchain-ai/langchain/pull/17239)
- community: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) by
[@&#8203;cjpark-data](https://togithub.com/cjpark-data) in
[https://github.com/langchain-ai/langchain/pull/17178](https://togithub.com/langchain-ai/langchain/pull/17178)
- community: Support SerDe transform functions in Databricks LLM by
[@&#8203;liangz1](https://togithub.com/liangz1) in
[https://github.com/langchain-ai/langchain/pull/16752](https://togithub.com/langchain-ai/langchain/pull/16752)
- langchain_google-genai\[patch]: Invoke callback prior to yielding
token by [@&#8203;dudesparsh](https://togithub.com/dudesparsh) in
[https://github.com/langchain-ai/langchain/pull/17092](https://togithub.com/langchain-ai/langchain/pull/17092)
- Added LCEL for alibabacloud and anyscale by
[@&#8203;kartheekyakkala](https://togithub.com/kartheekyakkala) in
[https://github.com/langchain-ai/langchain/pull/17252](https://togithub.com/langchain-ai/langchain/pull/17252)
- langchain: Fix create_retriever_tool missing on_retriever_end Document
content by [@&#8203;wangcailin](https://togithub.com/wangcailin) in
[https://github.com/langchain-ai/langchain/pull/16933](https://togithub.com/langchain-ai/langchain/pull/16933)
- added parsing of function call / response by
[@&#8203;lkuligin](https://togithub.com/lkuligin) in
[https://github.com/langchain-ai/langchain/pull/17245](https://togithub.com/langchain-ai/langchain/pull/17245)
- langchain: Update quickstart.mdx - Fix 422 error in example with
LangServe client code by
[@&#8203;schalkje](https://togithub.com/schalkje) in
[https://github.com/langchain-ai/langchain/pull/17163](https://togithub.com/langchain-ai/langchain/pull/17163)
- langchain: adds recursive json splitter by
[@&#8203;joelsprunger](https://togithub.com/joelsprunger) in
[https://github.com/langchain-ai/langchain/pull/17144](https://togithub.com/langchain-ai/langchain/pull/17144)
- community: Add you.com utility, update you retriever integration docs
by [@&#8203;scottnath](https://togithub.com/scottnath) in
[https://github.com/langchain-ai/langchain/pull/17014](https://togithub.com/langchain-ai/langchain/pull/17014)
- community: add runtime kwargs to HuggingFacePipeline by
[@&#8203;ab-10](https://togithub.com/ab-10) in
[https://github.com/langchain-ai/langchain/pull/17005](https://togithub.com/langchain-ai/langchain/pull/17005)
- \[Langchain_core]: Added Docstring for
RunnableConfigurableAlternatives by
[@&#8203;keenborder786](https://togithub.com/keenborder786) in
[https://github.com/langchain-ai/langchain/pull/17263](https://togithub.com/langchain-ai/langchain/pull/17263)
- community: updated openai prices in mapping by
[@&#8203;Sssanek](https://togithub.com/Sssanek) in
[https://github.com/langchain-ai/langchain/pull/17009](https://togithub.com/langchain-ai/langchain/pull/17009)
- docs: `Toolkits` menu by
[@&#8203;leo-gan](https://togithub.com/leo-gan) in
[https://github.com/langchain-ai/langchain/pull/16217](https://togithub.com/langchain-ai/langchain/pull/16217)
- infra: rm boto3, gcaip from pyproject by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17270](https://togithub.com/langchain-ai/langchain/pull/17270)
- langchain\[patch]: expose cohere rerank score, add parent doc param by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/16887](https://togithub.com/langchain-ai/langchain/pull/16887)
- core\[patch]: Release 0.1.22 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17274](https://togithub.com/langchain-ai/langchain/pull/17274)
- langchain\[patch]: Release 0.1.6 by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17133](https://togithub.com/langchain-ai/langchain/pull/17133)
- langchain\[patch]: undo redis cache import by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17275](https://togithub.com/langchain-ai/langchain/pull/17275)
- infra: mv SQLDatabase tests to community by
[@&#8203;baskaryan](https://togithub.com/baskaryan) in
[https://github.com/langchain-ai/langchain/pull/17276](https://togithub.com/langchain-ai/langchain/pull/17276)

##### New Contributors

- [@&#8203;akirawuc](https://togithub.com/akirawuc) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16877](https://togithub.com/langchain-ai/langchain/pull/16877)
- [@&#8203;rocky1405](https://togithub.com/rocky1405) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16844](https://togithub.com/langchain-ai/langchain/pull/16844)
- [@&#8203;picsoung](https://togithub.com/picsoung) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17007](https://togithub.com/langchain-ai/langchain/pull/17007)
- [@&#8203;rmkraus](https://togithub.com/rmkraus) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17023](https://togithub.com/langchain-ai/langchain/pull/17023)
- [@&#8203;mkhludnev](https://togithub.com/mkhludnev) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16997](https://togithub.com/langchain-ai/langchain/pull/16997)
- [@&#8203;calvinweb](https://togithub.com/calvinweb) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16335](https://togithub.com/langchain-ai/langchain/pull/16335)
- [@&#8203;Mercurrent](https://togithub.com/Mercurrent) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17035](https://togithub.com/langchain-ai/langchain/pull/17035)
- [@&#8203;supreetkt](https://togithub.com/supreetkt) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16800](https://togithub.com/langchain-ai/langchain/pull/16800)
- [@&#8203;strongSoda](https://togithub.com/strongSoda) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16998](https://togithub.com/langchain-ai/langchain/pull/16998)
- [@&#8203;bmuskalla](https://togithub.com/bmuskalla) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16956](https://togithub.com/langchain-ai/langchain/pull/16956)
- [@&#8203;n0vad3v](https://togithub.com/n0vad3v) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16859](https://togithub.com/langchain-ai/langchain/pull/16859)
- [@&#8203;Poissecaille](https://togithub.com/Poissecaille) made their
first contribution in
[https://github.com/langchain-ai/langchain/pull/16578](https://togithub.com/langchain-ai/langchain/pull/16578)
- [@&#8203;moorej-oci](https://togithub.com/moorej-oci) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17073](https://togithub.com/langchain-ai/langchain/pull/17073)
- [@&#8203;SalamanderXing](https://togithub.com/SalamanderXing) made
their first contribution in
[https://github.com/langchain-ai/langchain/pull/17062](https://togithub.com/langchain-ai/langchain/pull/17062)
- [@&#8203;scottnath](https://togithub.com/scottnath) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17053](https://togithub.com/langchain-ai/langchain/pull/17053)
- [@&#8203;fpaupier](https://togithub.com/fpaupier) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16995](https://togithub.com/langchain-ai/langchain/pull/16995)
- [@&#8203;mtmahe](https://togithub.com/mtmahe) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17011](https://togithub.com/langchain-ai/langchain/pull/17011)
- [@&#8203;hdnh2006](https://togithub.com/hdnh2006) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16945](https://togithub.com/langchain-ai/langchain/pull/16945)
- [@&#8203;laoazhang](https://togithub.com/laoazhang) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16916](https://togithub.com/langchain-ai/langchain/pull/16916)
- [@&#8203;Swalloow](https://togithub.com/Swalloow) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16991](https://togithub.com/langchain-ai/langchain/pull/16991)
- [@&#8203;arnoschutijzer](https://togithub.com/arnoschutijzer) made
their first contribution in
[https://github.com/langchain-ai/langchain/pull/17127](https://togithub.com/langchain-ai/langchain/pull/17127)
- [@&#8203;dzmitry-kankalovich](https://togithub.com/dzmitry-kankalovich)
made their first contribution in
[https://github.com/langchain-ai/langchain/pull/15817](https://togithub.com/langchain-ai/langchain/pull/15817)
- [@&#8203;LuizFrra](https://togithub.com/LuizFrra) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17124](https://togithub.com/langchain-ai/langchain/pull/17124)
- [@&#8203;sana-google](https://togithub.com/sana-google) made their
first contribution in
[https://github.com/langchain-ai/langchain/pull/17085](https://togithub.com/langchain-ai/langchain/pull/17085)
- [@&#8203;jorge-campo](https://togithub.com/jorge-campo) made their
first contribution in
[https://github.com/langchain-ai/langchain/pull/17259](https://togithub.com/langchain-ai/langchain/pull/17259)
- [@&#8203;ByeongUkChoi](https://togithub.com/ByeongUkChoi) made their
first contribution in
[https://github.com/langchain-ai/langchain/pull/17244](https://togithub.com/langchain-ai/langchain/pull/17244)
- [@&#8203;cjpark-data](https://togithub.com/cjpark-data) made their
first contribution in
[https://github.com/langchain-ai/langchain/pull/17178](https://togithub.com/langchain-ai/langchain/pull/17178)
- [@&#8203;kartheekyakkala](https://togithub.com/kartheekyakkala) made
their first contribution in
[https://github.com/langchain-ai/langchain/pull/17252](https://togithub.com/langchain-ai/langchain/pull/17252)
- [@&#8203;wangcailin](https://togithub.com/wangcailin) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/16933](https://togithub.com/langchain-ai/langchain/pull/16933)
- [@&#8203;schalkje](https://togithub.com/schalkje) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17163](https://togithub.com/langchain-ai/langchain/pull/17163)
- [@&#8203;joelsprunger](https://togithub.com/joelsprunger) made their
first contribution in
[https://github.com/langchain-ai/langchain/pull/17144](https://togithub.com/langchain-ai/langchain/pull/17144)
- [@&#8203;Sssanek](https://togithub.com/Sssanek) made their first
contribution in
[https://github.com/langchain-ai/langchain/pull/17009](https://togithub.com/langchain-ai/langchain/pull/17009)

**Full Changelog**:
https://github.com/langchain-ai/langchain/compare/v0.1.5...v0.1.6

</details>

<details>
<summary>langchain-ai/langchainjs (langchain)</summary>

### [`v0.1.17`](https://togithub.com/langchain-ai/langchainjs/releases/tag/0.1.17)

[Compare
Source](https://togithub.com/langchain-ai/langchainjs/compare/0.1.16...0.1.17)

#### What's Changed

- langchain\[patch]: Release 0.1.16 by
[@&#8203;jacoblee93](https://togithub.com/jacoblee93) in
[https://github.com/langchain-ai/langchainjs/pull/4334](https://togithub.com/langchain-ai/langchainjs/pull/4334)
- Correct waitlist instruction in README by
[@&#8203;eknuth](https://togithub.com/eknuth) in
[https://github.com/langchain-ai/langchainjs/pull/4335](https://togithub.com/langchain-ai/langchainjs/pull/4335)
- docs\[patch]: Fix broken link by
[@&#8203;jacoblee93](https://togithub.com/jacoblee93) in
[https://github.com/langchain-ai/langchainjs/pull/4336](https://togithub.com/langchain-ai/langchainjs/pull/4336)
- langchain\[patch]: Export helper functions from indexing api by
[@&#8203;bracesproul](https://togithub.com/bracesproul) in
[https://github.com/langchain-ai/langchainjs/pull/4344](https://togithub.com/langchain-ai/langchainjs/pull/4344)
- docs\[minor]: Add Human-in-the-loop to tools use case by
[@&#8203;bracesproul](https://togithub.com/bracesproul) in
[https://github.com/langchain-ai/langchainjs/pull/4314](https://togithub.com/langchain-ai/langchainjs/pull/4314)
- langchain\[minor],docs\[minor]: Add `SitemapLoader` by
[@&#8203;bracesproul](https://togithub.com/bracesproul) in
[https://github.com/langchain-ai/langchainjs/pull/4331](https://togithub.com/langchain-ai/langchainjs/pull/4331)
- langchain\[patch]: Rm unwanted build artifacts by
[@&#8203;bracesproul](https://togithub.com/bracesproul) in
[https://github.com/langchain-ai/langchainjs/pull/4345](https://togithub.com/langchain-ai/langchainjs/pull/4345)

#### New Contributors

- [@&#8203;eknuth](https://togithub.com/eknuth) made their first
contribution in
[https://github.com/langchain-ai/langchainjs/pull/4335](https://togithub.com/langchain-ai/langchainjs/pull/4335)

**Full Changelog**:
https://github.com/langchain-ai/langchainjs/compare/0.1.16...0.1.17

</details>

<details>
<summary>steven-tey/novel (novel)</summary>

### [`v0.2.0`](https://togithub.com/steven-tey/novel/releases/tag/0.2.0)

[Compare
Source](https://togithub.com/steven-tey/novel/compare/0.1.22...0.2.0)

The work-in-progress Novel docs are available here: [Docs](https://novel.sh/docs/introduction)

#### What's Changed

- RFC: Headless core components & imperative support by
[@&#8203;andrewdoro](https://togithub.com/andrewdoro) in
[https://github.com/steven-tey/novel/pull/136](https://togithub.com/steven-tey/novel/pull/136)
- feat: add docs app by
[@&#8203;andrewdoro](https://togithub.com/andrewdoro) in
[https://github.com/steven-tey/novel/pull/284](https://togithub.com/steven-tey/novel/pull/284)
- fix: update dark mode class to drag handler component by
[@&#8203;brunocroh](https://togithub.com/brunocroh) in
[https://github.com/steven-tey/novel/pull/286](https://togithub.com/steven-tey/novel/pull/286)
- \[Fix] - Correct License Link in README.md by
[@&#8203;justinjunodev](https://togithub.com/justinjunodev) in
[https://github.com/steven-tey/novel/pull/274](https://togithub.com/steven-tey/novel/pull/274)
- fix: image move when dragged by
[@&#8203;brunocroh](https://togithub.com/brunocroh) in
[https://github.com/steven-tey/novel/pull/287](https://togithub.com/steven-tey/novel/pull/287)

#### New Contributors

- [@&#8203;andrewdoro](https://togithub.com/andrewdoro) made their first
contribution in
[https://github.com/steven-tey/novel/pull/136](https://togithub.com/steven-tey/novel/pull/136)
- [@&#8203;brunocroh](https://togithub.com/brunocroh) made their first
contribution in
[https://github.com/steven-tey/novel/pull/286](https://togithub.com/steven-tey/novel/pull/286)
- [@&#8203;justinjunodev](https://togithub.com/justinjunodev) made their
first contribution in
[https://github.com/steven-tey/novel/pull/274](https://togithub.com/steven-tey/novel/pull/274)

**Full Changelog**:
https://github.com/steven-tey/novel/compare/0.1.22...0.2.0

</details>

<details>
<summary>openai/openai-python (openai)</summary>

### [`v1.12.0`](https://togithub.com/openai/openai-python/blob/HEAD/CHANGELOG.md#1120-2024-02-08)

[Compare
Source](https://togithub.com/openai/openai-python/compare/v1.11.1...v1.12.0)

Full Changelog:
[v1.11.1...v1.12.0](https://togithub.com/openai/openai-python/compare/v1.11.1...v1.12.0)

##### Features

- **api:** add `timestamp_granularities`, add `gpt-3.5-turbo-0125` model
([#&#8203;1125](https://togithub.com/openai/openai-python/issues/1125))
([1ecf8f6](https://togithub.com/openai/openai-python/commit/1ecf8f6b12323ed09fb6a2815c85b9533ee52a50))
- **cli/images:** add support for `--model` arg
([#&#8203;1132](https://togithub.com/openai/openai-python/issues/1132))
([0d53866](https://togithub.com/openai/openai-python/commit/0d5386615cda7cd50d5db90de2119b84dba29519))

##### Bug Fixes

- remove double brackets from `timestamp_granularities` param
([#1140](https://togithub.com/openai/openai-python/issues/1140))
([3db0222](https://togithub.com/openai/openai-python/commit/3db022216a81fa86470b53ec1246669bc7b17897))
- **types:** loosen most List params types to Iterable
([#1129](https://togithub.com/openai/openai-python/issues/1129))
([bdb31a3](https://togithub.com/openai/openai-python/commit/bdb31a3b1db6ede4e02b3c951c4fd23f70260038))

##### Chores

- **internal:** add lint command
([#1128](https://togithub.com/openai/openai-python/issues/1128))
([4c021c0](https://togithub.com/openai/openai-python/commit/4c021c0ab0151c2ec092d860c9b60e22e658cd03))
- **internal:** support serialising iterable types
([#1127](https://togithub.com/openai/openai-python/issues/1127))
([98d4e59](https://togithub.com/openai/openai-python/commit/98d4e59afcf2d65d4e660d91eb9462240ef5cd63))

##### Documentation

- add CONTRIBUTING.md
([#1138](https://togithub.com/openai/openai-python/issues/1138))
([79c8f0e](https://togithub.com/openai/openai-python/commit/79c8f0e8bf5470e2e31e781e8d279331e89ddfbe))

</details>

<details>
<summary>openai/openai-node (openai)</summary>

### [`v4.27.1`](https://togithub.com/openai/openai-node/blob/HEAD/CHANGELOG.md#4271-2024-02-12)

[Compare
Source](https://togithub.com/openai/openai-node/compare/v4.27.0...v4.27.1)

Full Changelog:
[v4.27.0...v4.27.1](https://togithub.com/openai/openai-node/compare/v4.27.0...v4.27.1)

### [`v4.27.0`](https://togithub.com/openai/openai-node/blob/HEAD/CHANGELOG.md#4270-2024-02-08)

[Compare
Source](https://togithub.com/openai/openai-node/compare/v4.26.1...v4.27.0)

Full Changelog:
[v4.26.1...v4.27.0](https://togithub.com/openai/openai-node/compare/v4.26.1...v4.27.0)

##### Features

- **api:** add `timestamp_granularities`, add `gpt-3.5-turbo-0125` model
([#661](https://togithub.com/openai/openai-node/issues/661))
([5016806](https://togithub.com/openai/openai-node/commit/50168066862f66b529bae29f4564741300303246))

##### Chores

- **internal:** fix retry mechanism for ecosystem-test
([#663](https://togithub.com/openai/openai-node/issues/663))
([0eb7ed5](https://togithub.com/openai/openai-node/commit/0eb7ed5ca3f7c7b29c316fc7d725d834cee73989))
- respect `application/vnd.api+json` content-type header
([#664](https://togithub.com/openai/openai-node/issues/664))
([f4fad54](https://togithub.com/openai/openai-node/commit/f4fad549c5c366d8dd8b936b7699639b895e82a1))

</details>

<details>
<summary>pydantic/pydantic (pydantic)</summary>

### [`v2.6.1`](https://togithub.com/pydantic/py

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on Monday" in timezone
America/Chicago, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get
[config help](https://togithub.com/renovatebot/renovate/discussions) if
that's undesired.

---


This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/autoblocksai/autoblocks-examples).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
snsten pushed a commit to snsten/langchain that referenced this pull request Feb 15, 2024
- **Description:** This adds a recursive JSON splitter class to the
existing text_splitters, along with unit tests.
- **Issue:** Splitting structured data as regular text can cause
problems: if you split a large nested JSON object as plain text, you may
lose the structure of the JSON. You can mitigate this by splitting the
nested JSON into large, overlapping chunks, but this causes unnecessary
text processing, and there will still be times when the nested JSON is
so big that chunks get separated from their parent keys.

As an example, you wouldn't want the following to be split in half:
```text
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf',
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
Any LLM processing the second chunk of text may lack the context of
`val1` and `val16`, reducing accuracy. Embeddings will also lack this
context, which makes retrieval less accurate.

Instead, you want it to be split into chunks that retain the JSON
structure:
```python
{'val0': 'DFWeNdWhapbR',
 'val1': {'val10': 'QdJo',
          'val11': 'FWSDVFHClW',
          'val12': 'bkVnXMMlTiQh',
          'val13': 'tdDMKRrOY',
          'val14': 'zybPALvL',
          'val15': 'JMzGMNH',
          'val16': {'val160': 'qLuLKusFw',
                    'val161': 'DGuotLh',
                    'val162': 'KztlcSBropT',
                    'val163': 'YlHHDrN',
                    'val164': 'CtzsxlGBZKf'}}}
```
and
```python
{'val1':{'val16':{
                    'val165': 'bXzhcrWLmBFp',
                    'val166': 'zZAqC',
                    'val167': 'ZtyWno',
                    'val168': 'nQQZRsLnaBhb',
                    'val169': 'gSpMbJwA'},
          'val17': 'JhgiyF',
          'val18': 'aJaqjUSFFrI',
          'val19': 'glqNSvoyxdg'}}
```
This recursive JSON text splitter does this. Values that contain a list
can be converted to a dict first by using `split(... convert_lists=True)`;
otherwise long lists will not be split, and you may end up with chunks
larger than the max chunk size.
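
The idea can be illustrated with a minimal sketch. This is not the code merged in this PR: the names `split_json`, `lists_to_dicts`, and `max_chunk_size` are hypothetical, and chunk size is assumed to be measured as the length of the `json.dumps` output. Each leaf value is written into the current chunk under its full key path, so a new chunk always starts with the parent keys it needs.

```python
import json
from typing import Any, Dict, List


def _set_nested(d: Dict[str, Any], path: List[str], value: Any) -> None:
    """Set d[path[0]][path[1]]... = value, creating parent dicts as needed."""
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value


def _size(obj: Any) -> int:
    """Size of an object, measured as the length of its JSON serialization."""
    return len(json.dumps(obj))


def lists_to_dicts(data: Any) -> Any:
    """Replace every list with a dict keyed by stringified index (hypothetical helper)."""
    if isinstance(data, dict):
        return {k: lists_to_dicts(v) for k, v in data.items()}
    if isinstance(data, list):
        return {str(i): lists_to_dicts(v) for i, v in enumerate(data)}
    return data


def split_json(data: Dict[str, Any], max_chunk_size: int = 200) -> List[Dict[str, Any]]:
    """Split a nested dict into smaller dicts that each keep their parent keys."""
    chunks: List[Dict[str, Any]] = [{}]

    def walk(node: Dict[str, Any], path: List[str]) -> None:
        for key, value in node.items():
            new_path = path + [key]
            if isinstance(value, dict) and value:
                walk(value, new_path)  # descend; only leaves are placed in chunks
                continue
            # Build the leaf with its full key path to estimate its cost.
            probe: Dict[str, Any] = {}
            _set_nested(probe, new_path, value)
            # Conservative check: merged size never exceeds the sum of both sizes.
            if chunks[-1] and _size(chunks[-1]) + _size(probe) > max_chunk_size:
                chunks.append({})
            _set_nested(chunks[-1], new_path, value)

    walk(data, [])
    return [c for c in chunks if c]
```

In this sketch a list is treated as a single leaf, so a long list can still produce an oversized chunk. Passing the data through `lists_to_dicts` first is one way to approximate what the description calls `convert_lists=True`.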

In my testing, large JSON objects could be split into small chunks, with:
- ✅ Increased question answering accuracy
- ✅ Fewer tokens used by retrieval queries, since the chunks are smaller


- **Dependencies:** `json` import added to text_splitter.py, and `random`
added to the unit test
- **Twitter handle:** @joelsprunger

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
@joelsprunger
Contributor Author

@hwchase17 were you planning to highlight this feature in some way?

Labels
Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases lgtm PR looks good. Use to confirm that a PR is ready for merging. size:L This PR changes 100-499 lines, ignoring generated files.
3 participants