Support Databricks in SQLDatabase #4702

Merged
merged 14 commits into from May 19, 2023

Contributor

@gengliangwang gengliangwang commented May 15, 2023

This PR adds support for Databricks Runtime and Databricks SQL by using the Databricks SQL Connector for Python.
Because Databricks is a cloud data platform, connecting to it requires a URL of the following form:
databricks://token:{api_token}@{hostname}?http_path={http_path}&catalog={catalog}&schema={schema}

The URL is complicated, and it may take users a while to work out. Since the api_token, hostname, and http_path fields are already known inside a Databricks notebook, I am proposing a new method, from_databricks, to simplify connecting to Databricks.
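
To make the URL format concrete, here is a minimal sketch that assembles it from its five fields. The helper name `build_databricks_uri` and the sample values are hypothetical and not part of this PR; the sketch also assumes the values need no URL escaping.

```python
def build_databricks_uri(
    api_token: str, hostname: str, http_path: str, catalog: str, schema: str
) -> str:
    """Assemble a Databricks connection URL in the format described above.

    Hypothetical helper for illustration only; values are inserted as-is,
    so this assumes they contain no characters that need URL escaping.
    """
    return (
        f"databricks://token:{api_token}@{hostname}"
        f"?http_path={http_path}&catalog={catalog}&schema={schema}"
    )


# Placeholder values, not real credentials.
uri = build_databricks_uri(
    api_token="dapi-XXXX",
    hostname="example.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123",
    catalog="main",
    schema="default",
)
print(uri)
```

A string of this shape is what would be handed to SQLDatabase.from_uri when connecting without the new method.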

In Databricks Notebook

With this change, Databricks users only need to specify the catalog and schema when using langchain.
image
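
The screenshot is not reproduced here. As a rough sketch of what the notebook usage might look like, assuming the method accepts `catalog` and `schema` as keyword arguments (the wrapper function `connect_from_notebook` is hypothetical, and the exact signature should be checked against langchain/sql_database.py):

```python
def connect_from_notebook(catalog: str, schema: str):
    """Connect to Databricks from inside a Databricks notebook.

    Hypothetical wrapper; assumes a langchain version that includes this PR
    and that from_databricks accepts catalog and schema as keyword arguments.
    """
    # Imported lazily so this sketch can be defined without langchain installed.
    from langchain.sql_database import SQLDatabase

    # The remaining connection fields are expected to come from the notebook.
    return SQLDatabase.from_databricks(catalog=catalog, schema=schema)
```

Calling `connect_from_notebook("main", "default")` would only succeed in an environment where langchain and the Databricks SQL Connector are installed and the notebook context is available; the catalog and schema names here are placeholders.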

In Jupyter Notebook

The method can also be used in a local setup:
image

@gengliangwang
Contributor Author

cc @vowelparrot @hwchase17 @mengxr

Contributor

@mengxr mengxr left a comment

some minor comments on the API doc

langchain/sql_database.py Outdated Show resolved Hide resolved
    uri = (
        f"databricks://token:{api_token}@{host}?"
        f"http_path={http_path}&catalog={catalog}&schema={schema}"
    )
    return cls.from_uri(uri, engine_args=None, **kwargs)
Contributor

out of curiosity are engine_args explicitly not allowed in this case?

Contributor Author

Thanks for pointing it out. I just added engine_args.

Contributor Author

I just verified it:
image
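
For readers following the engine_args thread, the change under discussion can be sketched as below. `SketchDatabase` is a hypothetical stand-in that only records its arguments; it is not the real SQLDatabase class, and the parameter list of `from_databricks` here is an assumption based on the URL fields in the snippet above.

```python
from typing import Any, Optional


class SketchDatabase:
    """Hypothetical stand-in for SQLDatabase; records arguments, never connects."""

    def __init__(self, uri: str, engine_args: dict, **kwargs: Any) -> None:
        self.uri = uri
        self.engine_args = engine_args
        self.kwargs = kwargs

    @classmethod
    def from_uri(
        cls, uri: str, engine_args: Optional[dict] = None, **kwargs: Any
    ) -> "SketchDatabase":
        # The real method would create a database engine here.
        return cls(uri, engine_args or {}, **kwargs)

    @classmethod
    def from_databricks(
        cls,
        catalog: str,
        schema: str,
        host: str,
        api_token: str,
        http_path: str,
        engine_args: Optional[dict] = None,
        **kwargs: Any,
    ) -> "SketchDatabase":
        uri = (
            f"databricks://token:{api_token}@{host}?"
            f"http_path={http_path}&catalog={catalog}&schema={schema}"
        )
        # Forward the caller's engine_args instead of hard-coding None.
        return cls.from_uri(uri, engine_args=engine_args, **kwargs)


# Placeholder values, not real credentials.
db = SketchDatabase.from_databricks(
    catalog="main",
    schema="default",
    host="example.cloud.databricks.com",
    api_token="dapi-XXXX",
    http_path="/sql/1.0/warehouses/abc123",
    engine_args={"pool_size": 5},
)
print(db.engine_args)
```

With the forwarding in place, options supplied by the caller reach from_uri unchanged, which is the behavior the reviewer asked about.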

Contributor

@dev2049 dev2049 left a comment

two small comments, otherwise looks good!

@dev2049 dev2049 merged commit bf5a3c6 into langchain-ai:master May 19, 2023
13 checks passed
hwchase17 pushed a commit that referenced this pull request May 21, 2023
# Add documentation for Databricks integration

This is a follow-up of #4702
It documents the details of how to integrate Databricks using langchain.
It also provides examples in a notebook.


## Who can review?
@dev2049 @hwchase17 since you are aware of the context. We will promote
the integration after this doc is ready. Thanks in advance!
@danielchalef danielchalef mentioned this pull request Jun 5, 2023
This was referenced Jun 25, 2023