Managing knowledge base columns #9005

Merged
merged 4 commits into from Apr 12, 2024

Conversation

Contributor

@ea-rus ea-rus commented Mar 28, 2024

Description

Fixes RAG-52

Added parameters to the knowledge base (KB) that specify its input columns:

  • metadata_columns - optional, default: no metadata columns
  • content_columns - optional, default: all columns
  • id_column - optional, default: id

Example:

CREATE KNOWLEDGE BASE my_kb
USING
  metadata_columns = ['date', 'creator'],
  content_columns = ['review'],
  id_column = 'index';

Logic for the different combinations of parameters:
For the id column:

  • if id_column is defined:
    • use it as id
  • elif 'id' column exists:
    • use it
  • else:
    • use hash(content)
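The id fallback chain above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `resolve_row_id` is a hypothetical helper name, rows are assumed to be plain dicts, and SHA-256 is an assumption standing in for `hash(content)` (Python's built-in `hash()` is salted per process for strings, so it would not give stable ids).

```python
import hashlib
from typing import Optional


def resolve_row_id(row: dict, content: str, id_column: Optional[str] = None) -> str:
    """Pick an id for one row (hypothetical helper, not MindsDB's actual code)."""
    if id_column is not None:
        # An explicitly configured id column always wins.
        return str(row[id_column])
    if "id" in row:
        # Otherwise fall back to a column literally named 'id'.
        return str(row["id"])
    # Last resort: derive the id from the content itself.
    # Assumption: a stable SHA-256 digest is used in place of hash(content).
    return hashlib.sha256(content.encode("utf-8")).hexdigest()
```

With `id_column='index'` the value of the `index` column is used even when an `id` column is also present; a row with neither gets a content-derived id, so identical content maps to the same id.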

For content and metadata:

  • if content_columns is defined:
    • if len(content_columns) > 1:
      • make text from row (col: value\n col: value)
    • if metadata_columns is defined:
      • use them as metadata
    • else:
      • use all unused columns as metadata
  • elif metadata_columns is defined:
    • metadata_columns go to metadata
    • use all unused columns as content (make text if columns>1)
  • else:
    • no metadata
    • all unused columns go to content (make text if columns>1)
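The content/metadata rules above can be sketched as two small functions. This is a sketch under stated assumptions, not the implementation merged here: `split_columns` and `build_content` are hypothetical names, rows are plain dicts, an empty list is treated the same as "not defined", the id column is assumed never to count as an "unused" column, and multi-column text is joined with a bare newline.

```python
from typing import List, Optional, Sequence, Tuple


def split_columns(
    all_columns: Sequence[str],
    id_column: Optional[str],
    content_columns: Optional[Sequence[str]] = None,
    metadata_columns: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Return (content, metadata) column lists (hypothetical helper)."""

    def unused(*claimed: Sequence[str]) -> List[str]:
        # Assumption: the id column is never treated as content or metadata.
        taken = {id_column}.union(*claimed)
        return [c for c in all_columns if c not in taken]

    if content_columns:
        content = list(content_columns)
        # Explicit metadata wins; otherwise every unused column becomes metadata.
        metadata = list(metadata_columns) if metadata_columns else unused(content)
    elif metadata_columns:
        metadata = list(metadata_columns)
        content = unused(metadata)
    else:
        metadata = []
        content = unused()
    return content, metadata


def build_content(row: dict, content: Sequence[str]) -> str:
    """Use a single column's value as-is; join several as 'col: value' lines."""
    if len(content) == 1:
        return str(row[content[0]])
    return "\n".join(f"{col}: {row[col]}" for col in content)


row = {"index": 3, "date": "2024-03-28", "creator": "ann", "review": "great"}
content, metadata = split_columns(list(row), "index", content_columns=["review"])
text = build_content(row, content)
```

In this usage example only `content_columns` is given, so `date` and `creator` fall through to metadata and the text is the raw `review` value.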

Compared to #8990: this PR concatenates the columns globally and prepares the input for the vector storage, so we don't have to adapt every vector storage (or embedding model).

Example on video:
https://www.loom.com/share/8da2d8de7e6e45a9b1fd4036ef9d65ad


Type of change

(Please delete options that are not relevant)

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ⚡ New feature (non-breaking change which adds functionality)
  • 📢 Breaking change (fix or feature that would cause existing functionality not to work as expected)
  • 📄 This change requires a documentation update

Verification Process

To ensure the changes are working as expected:

  • Test Location: Specify the URL or path for testing.
  • Verification Steps: Outline the steps or queries needed to validate the change. Include any data, configurations, or actions required to reproduce or see the new functionality.

Additional Media:

  • I have attached a brief loom video or screenshots showcasing the new functionality or change.

Checklist:

  • My code follows the style guidelines (PEP 8) of MindsDB.
  • I have appropriately commented on my code, especially in complex areas.
  • Necessary documentation updates are either made or tracked in issues.
  • Relevant unit and integration tests are updated or added.

Contributor

@dusvyat dusvyat left a comment

Overall, the implementation looks good, but let's also include a few unit tests 🙏

Contributor Author

ea-rus commented Apr 11, 2024

Overall, the implementation looks good, but let's also include a few unit tests 🙏

Added unit tests for several combinations of settings.

Contributor

@dusvyat dusvyat left a comment

lgtm

@ea-rus ea-rus merged commit b90cf80 into staging Apr 12, 2024
13 checks passed
@ea-rus ea-rus mentioned this pull request Apr 12, 2024
11 tasks
@StpMax StpMax mentioned this pull request Apr 15, 2024