Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utilizing langchain library for SQL database querying with GIS functionality #5799

Closed
vlasvlasvlas opened this issue Jun 7, 2023 · 20 comments

Comments

@vlasvlasvlas
Copy link

Feature request

Hi!

It's possible to integrate the langchain library with a SQL database, enabling the execution of SQL queries with spatial analysis capabilities?

Specifically, I'd like to leverage GIS functions to perform spatial operations such as distance calculations, point-in-polygon queries, and other spatial analysis tasks.

If the langchain library supports such functionality, it would greatly enhance the capabilities of our project by enabling us to process and analyze spatial data directly within our SQL database.

I appreciate any guidance or information you can provide on this topic.

Motivation

By leveraging GIS functions within the langchain library, we can unlock the power of spatial analysis directly within our SQL database combined with LLM.
This integration creates a multitude of possibilities for executing intricate spatial operations, including proximity analysis, overlay analysis, and spatial clustering via user prompts.

Your contribution

  • I can provide insights into the best practices, optimal approaches, and efficient query design for performing various spatial operations within the SQL database.

  • I can offer troubleshooting assistance and support to address any issues or challenges that may arise during the integration of the langchain library. I can provide guidance on resolving errors, improving performance, and ensuring the accuracy of spatial analysis results.

@pmads1376
Copy link

This is a great concept. I'd love to help out on it.

@mpskex
Copy link
Contributor

mpskex commented Jun 15, 2023

If you are looking for something like H3index or S2geometry, ClickHouse and its variant like MyScale can do that for you. Furthermore MyScale can gives you vector search capability. We are very excited to help you to get there.

@jonasnordlund
Copy link

jonasnordlund commented Jun 20, 2023

I agree, this is a potentially transformative feature for GIS and network information systems. It would be amazing if we could query on spatial data with an understanding of the OGC standard geometry data type and have it generate OGC methods for geometry. It would enable queries of the kind "Which is the closest pump station to the district heating plant DHP-001?" or "How many subscribers are in the affected area Risk Analysis 2023?"

We currently provide tools to store and reuse queries with SQL but this isn't user-friendly and requires a knowledge of SQL. Using natural language, even in their own language and not only English, would be a fantastic usability improvement, and this is honestly the first thing I came for after getting hyped over how easy it was to get LangChain running with Python and SQLAlchemy!

@vlasvlasvlas
Copy link
Author

how can we start this? im in!

@mpskex
Copy link
Contributor

mpskex commented Jun 22, 2023

Perhaps one can start working from langchain.chains.sql_database.base.SQLDatabaseChain. It already has some supports on ClickHouse SQL. Augmenting the datatype and related functions may be a good idea. I think I will create a branch under myscale/LangChain somewhere next week. It will be really nice if you guys has some examples on data or query to play with~

@vlasvlasvlas
Copy link
Author

vlasvlasvlas commented Jun 22, 2023

Perhaps one can start working from langchain.chains.sql_database.base.SQLDatabaseChain. It already has some supports on ClickHouse SQL. Augmenting the datatype and related functions may be a good idea. I think I will create a branch under myscale/LangChain somewhere next week. It will be really nice if you guys has some examples on data or query to play with~

i can help with that, identifying basic geo queries (mainly the ones you can use at something like postgresql + postgis functions) typical queries like buffering, distance, within, intersects, touches, via clickhouse DBMS ~ postGIS Integration? https://clickhouse.com/docs/en/integrations/postgresql
http://postgis.net/workshops/postgis-intro/spatial_relationships.html

@mpskex
Copy link
Contributor

mpskex commented Jun 27, 2023

Perhaps one can start working from langchain.chains.sql_database.base.SQLDatabaseChain. It already has some supports on ClickHouse SQL. Augmenting the datatype and related functions may be a good idea. I think I will create a branch under myscale/LangChain somewhere next week. It will be really nice if you guys has some examples on data or query to play with~

i can help with that, identifying basic geo queries (mainly the ones you can use at something like postgresql + postgis functions) typical queries like buffering, distance, within, intersects, touches, via clickhouse DBMS ~ postGIS Integration? https://clickhouse.com/docs/en/integrations/postgresql http://postgis.net/workshops/postgis-intro/spatial_relationships.html

We are mainly thinking of utilizing some of the geo functions that are natively supported by Clickhouse. We can also look into some general functions in both postGIS and Clickhouse like distance to polygon or points. Those function prototypes / signatures can then be transformed into prompts to help constructing a structured query.

@dosubot
Copy link

dosubot bot commented Sep 26, 2023

Hi, @vlasvlasvlas! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you are requesting guidance on integrating the langchain library with a SQL database to enable spatial analysis capabilities. There has been some discussion in the comments, with pmads1376 and mpskex expressing interest in helping with the integration. mpskex suggests using ClickHouse or MyScale for spatial analysis, while jonasnordlund suggests using natural language queries. mpskex plans to work on the integration in a branch of the MyScale repository, and you have offered to help with identifying basic geo queries and integrating with postGIS.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 26, 2023
@mpskex
Copy link
Contributor

mpskex commented Sep 27, 2023

We have worked on something so far, for example #9850, where you can ask for close location based on your current location:

Note: Navigate to ClickHouse's Geo Index for more information on geo functions.

Imagine you want to use an Hexagonal Hierarchical Spatial (H3) index to find the answer to the following question:

Please find a restaurant that sells delicious Thail-style seafood dishes within 3km of my location.

Note: An H3 index is a grid system for analyzing large spatial data sets, developed by Uber, and partitioning areas of the earth into identifiable grid cells.

How do you set up this search?

As described above, MyScale combines the VirtualColumnName function added to LangChain's self-querying retriever. You can therefore search semantically within a specific range of geographical locations with H3 index columns.

For example, you can create a virtual column name for an existing H3 index as follows:

metadata_field_info = [
    AttributeInfo(
        name=VirtualColumnName(name="geo_dist",
                               column=f"{vectorstore.metadata_column}.h3index",
                               func=lambda col: f"h3Distance({col}, {get_my_position_in_h3()})"),
        description="distance to my current location in meters",
        type="float",
    ),
]

Perhaps this could cover some cases discussed here? @vlasvlasvlas @jonasnordlund

@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 27, 2023
@dosubot
Copy link

dosubot bot commented Sep 27, 2023

@baskaryan Could you please help @vlasvlasvlas with integrating the langchain library with a SQL database for spatial analysis? They have provided some updates and context in their latest comment. Thank you!

@mpskex
Copy link
Contributor

mpskex commented Sep 27, 2023

Another smart way to augment this would be having a special translator on self-querying retrievers with specific locations. You may have a location map (name vs geoindex) in your translator. That will translate your desired location into searchable indices!

@mpskex
Copy link
Contributor

mpskex commented Sep 27, 2023

We have worked on something so far, for example #9850, where you can ask for close location based on your current location:

Note: Navigate to ClickHouse's Geo Index for more information on geo functions.

Imagine you want to use an Hexagonal Hierarchical Spatial (H3) index to find the answer to the following question:

Please find a restaurant that sells delicious Thail-style seafood dishes within 3km of my location.

Note: An H3 index is a grid system for analyzing large spatial data sets, developed by Uber, and partitioning areas of the earth into identifiable grid cells.

How do you set up this search?

As described above, MyScale combines the VirtualColumnName function added to LangChain's self-querying retriever. You can therefore search semantically within a specific range of geographical locations with H3 index columns.

For example, you can create a virtual column name for an existing H3 index as follows:

metadata_field_info = [
    AttributeInfo(
        name=VirtualColumnName(name="geo_dist",
                               column=f"{vectorstore.metadata_column}.h3index",
                               func=lambda col: f"h3Distance({col}, {get_my_position_in_h3()})"),
        description="distance to my current location in meters",
        type="float",
    ),
]

Perhaps this could cover some cases discussed here? @vlasvlasvlas @jonasnordlund

You may also want to try #10177. But you need to try hard on prompting the geoindex functions, and make sure LLM can learn it.

@vlasvlasvlas
Copy link
Author

vlasvlasvlas commented Sep 27, 2023

Hi there! Apologies for the delay. I'd be happy to assist with use cases related to classic PostGIS functions and how to approach their primary uses. Would it be acceptable to share a Google Spreadsheet where I can outline the main use cases? This could serve as a kind of Rosetta Stone approach, for instance:

Column 1: Main use case description.
Column 2: PostGIS approach.
Column 3: ClickHouse approach.
Column 4: Langchain approach.

This way, we can have a clear and organized reference for understanding how these use cases are tackled in different contexts. What do you think?

@mpskex
Copy link
Contributor

mpskex commented Oct 7, 2023

Hi there! Apologies for the delay. I'd be happy to assist with use cases related to classic PostGIS functions and how to approach their primary uses. Would it be acceptable to share a Google Spreadsheet where I can outline the main use cases? This could serve as a kind of Rosetta Stone approach, for instance:

Column 1: Main use case description.

Column 2: PostGIS approach.

Column 3: ClickHouse approach.

Column 4: Langchain approach.

This way, we can have a clear and organized reference for understanding how these use cases are tackled in different contexts. What do you think?

Just created a google sheet for this:

https://docs.google.com/spreadsheets/d/1ixL-e8lvLUOJhoY7fdk5JWDkyc3wMSG3myBR65Y9uPQ/edit?usp=sharing

@vlasvlasvlas
Copy link
Author

Hi im really sorry about the delay, i filled the doc with some examples, i can add more examples during this week.

i also add a new sheet where i put the sources of where i took some of the examples and added documentation.

@vlasvlasvlas
Copy link
Author

let me know if examples are ok or if i must add more use cases

@vlasvlasvlas
Copy link
Author

hi, any news? let me know if i can help, txs!

@mpskex
Copy link
Contributor

mpskex commented Nov 27, 2023

Sorry for the late reply. I have looked into those examples and found some functions not supported by ClickHouse.

But don't worry, we still have a chance to make the SQL Database Chain work in PostgreSQL.

You can follow how we did in prompts in VectorSQLDatabaseChain and ChatData

We are still trying to make the prompt working, and at the same time we encourage anyone who is interested in this issue to try for themselves. There are no general prompts for this and as we have a good set of examples I think it would be pretty quick to test out some simple ideas with LangChain's SQLDatabaseChain!

@vlasvlasvlas
Copy link
Author

hi! yep, i got more involved with agents, and i think we can use postgresql with postgis functions!
if you want i can get involved more with this functionality.

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Feb 27, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 13, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 13, 2024
@vlasvlasvlas
Copy link
Author

any news?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants