@joemcelroy (Member) commented Dec 6, 2023

This PR demonstrates:

  • Example 1: Nested dense vectors with the full document as the parent and multiple passages nested under it.
  • Example 2: Nested dense vectors with a chunked document as the parent and smaller passages nested under each chunk.
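
To make the parent / nested-passage layout concrete, here is a minimal sketch of how one parent document with nested passages could be assembled before indexing. It is not the notebook's code: the field names `text` and `passages`, the helper names, and the naive word-count splitter (standing in for the langchain splitters mentioned in the commits) are all assumptions for illustration. Embeddings are left out; they would be attached to each passage separately.

```python
from typing import Dict, List


def split_into_passages(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive passages of at most max_words words.

    A naive stand-in for a real text splitter (assumption, not the notebook's).
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def build_parent_doc(doc_id: str, text: str, max_words: int = 50) -> Dict:
    """Build one parent document that carries its passages as nested objects."""
    return {
        "_id": doc_id,
        "_source": {
            "text": text,  # the full parent document
            "passages": [  # hypothetical nested field name
                {"text": passage}
                for passage in split_into_passages(text, max_words)
            ],
        },
    }


doc = build_parent_doc("doc-1", "one two three four five", max_words=2)
for passage in doc["_source"]["passages"]:
    print(passage["text"])
```

Keeping the passages inside the parent means a match on a single passage can still return the whole document, which is the behaviour both examples rely on.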

It also fixes a couple of issues:

  • removes an unneeded function in the BM25 example
  • removes the shard settings so the example can run on serverless

@miguelgrinberg (Collaborator) commented Dec 8, 2023

Comments:

  • Not including the commands to upload the model to the ML node means that most people will not be able to run this notebook to completion.
  • In 8.11 you do not need to specify the vector dimensions or the index parameter. Actually, I think with 8.11 you can get away without declaring any properties on the nested field.
  • At some point you had reservations about using dot product for the minilm-l6-v2 model. I think it is fine based on the information here, but I mention it just in case you prefer to switch to cosine.
  • I found the big and small chunking difficult to understand. In reality, when you create the larger chunks you do not preserve a link to the complete document, so these large chunks are effectively parent documents in their own right. So I'm not sure doing a two-level splitting contributes much to the example. Both examples could use complete doc and nested passages, with example 1 showing how to get only the passages, and example 2 showing how to get the parent and the passages. No need to change anything, just thinking out loud here that I think I would have found it easier to follow without the large chunks...
  • We have a mix of helpers.bulk and client.bulk in our notebooks. We should clean this up at some point.
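
On the dot product versus cosine point, a small sketch may help: for unit-length vectors the two measures return the same value, which is why dot product is a safe choice only when the embeddings are normalized. The vectors below are made up for illustration, and whether a given model's output is normalized is an assumption to verify for your own deployment.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    length = norm(a)
    return [x / length for x in a]


# Illustrative vectors, not real model output.
raw_a, raw_b = [3.0, 4.0], [1.0, 0.0]
unit_a, unit_b = normalize(raw_a), normalize(raw_b)

# On raw vectors the two measures differ; on unit vectors they agree.
print(dot(raw_a, raw_b), cosine(raw_a, raw_b))
print(dot(unit_a, unit_b), cosine(unit_a, unit_b))
```

If the vectors are not unit length, the dot product also rewards magnitude, so switching to cosine is the more conservative option.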

@joemcelroy (Member, Author) commented Dec 18, 2023

Hey @miguelgrinberg

I'm going to make the following updates:

  • add the part on how to load the ML model to the notebook so it can be run to completion
  • update the mapping so it no longer declares the nested field's properties
  • remove the second parent / child chunking example. I'm not sure it's helping here, and it keeps the notebook simple IMO.
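
As a rough illustration of the mapping change, the sketch below builds both variants: one that spells out the nested field's properties and one that declares only `type: nested`. The field names and the helper are hypothetical, and 384 is the embedding size of all-MiniLM-L6-v2. Whether the shorter form is enough depends on the Elasticsearch version inferring the vector settings, as suggested in the review for 8.11, so check it against your cluster; in that case the similarity falls back to the server default rather than the `dot_product` set here.

```python
from typing import Any, Dict


def passages_mapping(declare_properties: bool, dims: int = 384) -> Dict[str, Any]:
    """Return index mappings with a nested `passages` field (hypothetical name).

    declare_properties=True spells out the passage fields explicitly;
    False leaves them to be inferred by the server (an assumption to verify).
    """
    nested: Dict[str, Any] = {"type": "nested"}
    if declare_properties:
        nested["properties"] = {
            "text": {"type": "text"},
            "vector": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": "dot_product",
            },
        }
    return {"properties": {"passages": nested}}


explicit = passages_mapping(declare_properties=True)
minimal = passages_mapping(declare_properties=False)
print(minimal)
```

Either dictionary could be passed as the `mappings` argument when creating the index with the Python client.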

@joemcelroy joemcelroy merged commit b06b4d3 into elastic:main Dec 18, 2023
@joemcelroy joemcelroy deleted the parent-child-retriever branch December 18, 2023 11:06
miguelgrinberg pushed a commit to miguelgrinberg/elasticsearch-labs that referenced this pull request Jan 2, 2024
* notebook wip

* remove explicitly declaring the shard settings

* remove unused fn

* notebook for parent / child capability

* updates

* add colab link

* rename to langchain splitters