Refactor Langchain Handler #8998

tmichaeldb · 2024-03-26T23:46:27Z

Description

Fixes RAG-2.

This PR overhauls langchain_handler:

- Make as many arguments as possible optional (e.g. user_column, assistant_column)
- Remove the mode argument, as it was untested and unused
  - Users can pass in agent_type to customize behavior (see agent types)
- Add test suite for different providers
- Clean up LLM config into util methods
- Clean up overall implementation
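
To illustrate the optional-argument change, here is a minimal sketch of how omitted arguments could fall back to defaults. It is not taken from the diff: `resolve_args`, the `DEFAULTS` mapping, its values, and the decision to raise on a leftover `mode` argument are all assumptions made for this example.

```python
from typing import Any, Dict

# Hypothetical defaults for illustration; the real handler may use other names and values.
DEFAULTS: Dict[str, Any] = {
    "user_column": "question",
    "assistant_column": "answer",
    "agent_type": "conversational-react-description",
}


def resolve_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `args` with defaults filled in for anything omitted."""
    # Assumption of this sketch: a leftover `mode` argument is rejected outright.
    if "mode" in args:
        raise ValueError("`mode` has been removed; pass `agent_type` instead")
    resolved = dict(DEFAULTS)
    # Only values the user actually supplied override the defaults.
    resolved.update({key: value for key, value in args.items() if value is not None})
    return resolved
```

With this shape, a caller who only sets `agent_type` still gets usable column names, and a stale `mode` argument fails loudly instead of being silently ignored.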

Type of change

🐛 Bug fix (non-breaking change which fixes an issue)
⚡ New feature (non-breaking change which adds functionality)
📄 This change requires a documentation update

Verification Process

To ensure the changes are working as expected:

Test Location: ./tests/unit/ml_handlers/test_langchain.py
Verification Steps: Follow steps in README.md to create and query a model using the langchain handler.
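
As a rough idea of what a provider-parametrized unit test can look like, here is a self-contained sketch. It does not reproduce the contents of test_langchain.py: `get_llm_config`, the `DEFAULT_MODELS` mapping, and the placeholder model names are hypothetical stand-ins for the real util methods.

```python
import unittest
from typing import Dict

# Hypothetical provider -> default model mapping; the names are placeholders.
DEFAULT_MODELS: Dict[str, str] = {
    "openai": "openai-default-model",
    "anthropic": "anthropic-default-model",
}


def get_llm_config(provider: str, api_key: str) -> Dict[str, str]:
    """Build a minimal LLM config dict for a supported provider."""
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unsupported provider: {provider}")
    return {
        "provider": provider,
        "model_name": DEFAULT_MODELS[provider],
        "api_key": api_key,
    }


class TestLLMConfig(unittest.TestCase):
    def test_each_provider(self):
        # subTest reports a failure per provider instead of stopping at the first one.
        for provider, model_name in DEFAULT_MODELS.items():
            with self.subTest(provider=provider):
                config = get_llm_config(provider, api_key="test-key")
                self.assertEqual(config["provider"], provider)
                self.assertEqual(config["model_name"], model_name)

    def test_unknown_provider(self):
        with self.assertRaises(ValueError):
            get_llm_config("unknown", api_key="test-key")
```

Saved as a module, the sketch can be run with `python -m unittest`.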

Additional Media:

I have attached a brief loom video or screenshots showcasing the new functionality or change.

Checklist:

My code follows the style guidelines (PEP 8) of MindsDB.
I have appropriately commented on my code, especially in complex areas.
Necessary documentation updates are either made or tracked in issues.
Relevant unit and integration tests are updated or added.

dusvyat

Nice work, LGTM

dusvyat · 2024-03-27T13:26:52Z

@paxcema - any major comments? Or am I fine to merge this one?

paxcema

Slick set of changes, looks great 👍

* safe extract for all tar files
* fix
* tests for:
  - create empty table
  - interval function
* added module and skeleton for GoogleServiceAccountOauth2Utilities
* added the NoCredentialsException
* raised NoCredentialsException if no creds provided
* added method of getting credentials: file and JSON
* added the method for download creds file
* refactored method for getting credentials: URL
* renamed class to GoogleServiceAccountOAuth2Manager
* added error handling
* imported the lclass to the main utils package
* added parsing for JSON credentials
* separated parsing credentials JSON to new function
* updated Vertex handler to use new credentials params
* updated Vertex client to use GoogleServiceAccountOAuth2Manager
* imported GoogleServiceAccountOAuth2Manager to main auth package
* removing redundant package lists and sentence-transformers library (#8971)
* removing redundant package lists and sentence-transformers library
* Add input column to RAG tests and update requirements

  Added 'input_column' to RAG tests in `test_rag.py` to accommodate for changes in the function signatures. Furthermore, the RAG handler requirements have been updated to include 'sentence-transformers', required for HuggingFaceEmbeddings from the langchain-community package.
* Add allow_dangerous_deserialization in FAISS loading and fix typo in unit tests.

  A new allow_dangerous_deserialization parameter was added to the FAISS loading in RAG handler settings, providing additional flexibility for deserialization. Additionally, a typo in the input column name within unit tests was fixed, which was causing the tests to fail.

---------

Co-authored-by: dusvyat <13661123+dusvyat@users.noreply.github.com>

* Fix tarfile safe extract
* sql dep
* Refactor Langchain Handler (#8998)
* Refactor langchain_handler
* Added more tests
* Add back mdb_read tool by default
* Remove prints
* Use updated import path
* Update test_llm_utils
* Add pydantic to reqs
* Remove pydantic from handler reqs not that it is in main reqs
* Remove explicit pydantic versioning to avoid conflict
* added method to get creds and used it in create and predict
* updated credentials parsing logic
* renamed GoogleOAuth2Manager to GoogleUserOAuth2Manager
* renamed google_oauth_utilities to google_user_oauth_utilities
* updated the README with new parameters
* removed unused json import
* Update query format in quickstart-tutorial.mdx
* fix: delete from table
* fix url in huggingface_inference_api.mdx
* test
* Fix hardcoded API base, read it from conn args too
* added a raise_for_status() check when downloading creds
* raised a ValueError if parsing JSON creds fail
* updated the CREATE ML_ENGINE examples in README
* refactored the error message for when parsing fails
* added a check for import errors during model creation
* added comment to explain change
* split req file paths to flag and path
* Add pred_args
* updated hostname of sample dbs
* fixed slack chatbot tutorial
* added default values for creds params in client
* updated the order of params passed to client
* clean up handler install log messages (#8720)
* Docker build improvements (#8999)
* build once and push cache later
* Separate cache job
* add db seed and mssql disconnect (#8742)
* fixe the init for the EventStoreDB handler
* removed unusued imports from the Rockset handler
* removed unusued deps from the Rockset handler
* fixed types in connection args
* refactored req files to be installed from abs path
* removed unnecessary call to set the success of result
* updated Snowflake docs with tip to install git
* fixed grammar in last sentence
* added space between the two lines
* updated email integration docs
* fix: dn.query could not support projection
* added a check for standalone comments
* added a check for inline comments
* make alembic use mindsdb log level (#9028)
* Bump version (#9041)
* Updated LightFM Integration Docs with Tip to Install Linux Dev Pakcages (#9042)
* added a tip for installing the Linux dev packages
* fixed a couple of grammatical errors
* updated ollama docs (#8995)
* updated ollama docs
* updated note
* added tip for docker
* updates
* updates

---------

Co-authored-by: Max Stepanov <stpmax@yandex.ru>
Co-authored-by: andrew <elkin.andr@gmail.com>
Co-authored-by: Minura Punchihewa <minurapunchihewa17@gmail.com>
Co-authored-by: QuantumPlumber <44450703+QuantumPlumber@users.noreply.github.com>
Co-authored-by: dusvyat <13661123+dusvyat@users.noreply.github.com>
Co-authored-by: Ty <124617566+tmichaeldb@users.noreply.github.com>
Co-authored-by: Andrey <andrey@mindsdb.com>
Co-authored-by: Vlad Romanenko <vlad.romanenko@hotmail.com>
Co-authored-by: Arnav K <arnavkaushal09@gmail.com>
Co-authored-by: Zoran Pandovski <zoran.pandovski@gmail.com>
Co-authored-by: Martyna <martyna@mindsdb.com>
Co-authored-by: Minura Punchihewa <49385643+MinuraPunchihewa@users.noreply.github.com>
Co-authored-by: martyna-mindsdb <109554435+martyna-mindsdb@users.noreply.github.com>

tmichaeldb added 7 commits March 26, 2024 15:07

Refactor langchain_handler

603553b

Added more tests

10398b4

Add back mdb_read tool by default

ebe796d

Remove prints

a470736

Use updated import path

65b9d91

Update test_llm_utils

e41e4cc

Add pydantic to reqs

fbdc17d

tmichaeldb requested review from paxcema and dusvyat March 26, 2024 23:57

tmichaeldb added 3 commits March 26, 2024 17:02

Remove pydantic from handler reqs not that it is in main reqs

27b0838

Merge branch 'staging' of https://github.com/mindsdb/mindsdb into langchain_handler_refactor

e56f815

Remove explicit pydantic versioning to avoid conflict

d0fa62f

dusvyat approved these changes Mar 27, 2024

paxcema approved these changes Mar 27, 2024

dusvyat merged commit 3849238 into staging Mar 28, 2024
12 checks passed

hamishfagg mentioned this pull request Apr 7, 2024

Release v24.4.2.0 #9037

Merged
