feat(graph): add Spanner Graph support to LangChain GraphStore interface#104
averikitsch merged 11 commits into googleapis:main
1) Add SpannerGraphStore implementation of GraphStore; 2) Add an integration test; 3) Add a notebook to demonstrate the usage; 4) Misc updates: requirements, __init__
1) same edge label between different types of nodes
E.g.
Company FOCUS_ON Product
Company FOCUS_ON Technology
2) the same doc contains multiple nodes/edges with the same key fields.
E.g.
Merge
Company(id='Google', properties={'p1':'a1', 'p3':'a3'}),
Company(id='Google', properties={'p2':'a2', 'p3':'a4'})
into
Company(id='Google', properties={'p1':'a1', 'p2':'a2', 'p3':'a4'})
This is because Spanner DML doesn't support updating the same row twice
in the same DML statement. We could instead issue a separate DML statement
for each node/edge, but that can be very slow.
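The merge described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's implementation: `Node` is a hypothetical stand-in dataclass (not the LangChain class) and `merge_nodes` is an illustrative name. Nodes are grouped by `(type, id)`, and later property values overwrite earlier ones, which matches the `p3` example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a graph node; not the LangChain class.
    id: str
    type: str
    properties: Dict[str, str] = field(default_factory=dict)


def merge_nodes(nodes: List[Node]) -> List[Node]:
    """Collapse nodes sharing (type, id) so each row is written only once."""
    merged: Dict[Tuple[str, str], Node] = {}
    for node in nodes:
        key = (node.type, node.id)
        if key in merged:
            # Later values win when property names conflict.
            merged[key].properties.update(node.properties)
        else:
            # Copy the properties so the caller's objects are not mutated.
            merged[key] = Node(node.id, node.type, dict(node.properties))
    return list(merged.values())


nodes = [
    Node("Google", "Company", {"p1": "a1", "p3": "a3"}),
    Node("Google", "Company", {"p2": "a2", "p3": "a4"}),
]
result = merge_nodes(nodes)
print(result[0].properties)
```

Merging up front keeps the write to one batched statement per table, at the cost of one pass over the document's nodes.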
gauravpurohit06
left a comment
Thank you mtyin for raising the PR. The code looks well organized and modular.
I have reviewed the source code file but not the test file. From a functionality point of view it looks good, but I need to do one more iteration.
Other than the inline comments, a few general things:
- You can use `isort` and `black` to format and style the code.
- Null checks are missing in various places; can you please add them?
@@ -0,0 +1,903 @@
{
nit: The styling of this file is very bright (i.e., a painfully white background) and inconsistent with other docs. Please make it consistent with the other docs.
I made some changes, but from what I can see, the other notebooks also use bright white as the background color. LMK if I missed anything.
w = NodeWrapper(node)
if w in s:
    # Combine the properties for nodes with the same id.
    n = next(filter(lambda v: v == w, s))
This operation can be slow for big graphs because it iterates over the complete set, which is unnecessary: the NodeWrapper object is hashable and comparable based on its id.
if w.node.id in s:
    # Combine the properties for nodes with the same id.
    s[w.node.id].node.properties.update(node.properties)
Btw, I used the wrapper as the key to make the code look more consistent between node and edge.
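One way to key the lookup on the wrapper itself, as this thread discusses, is sketched below. It assumes a wrapper that hashes and compares by node id only; `Node`, `NodeWrapper`, and `add_node` here are hypothetical stand-ins written for illustration, not the classes from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    # Hypothetical stand-in for a graph node.
    id: str
    properties: Dict[str, str] = field(default_factory=dict)


class NodeWrapper:
    """Makes a node hashable and comparable by its id only."""

    def __init__(self, node: Node) -> None:
        self.node = node

    def __hash__(self) -> int:
        return hash(self.node.id)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, NodeWrapper) and self.node.id == other.node.id


def add_node(seen: Dict[NodeWrapper, NodeWrapper], node: Node) -> None:
    w = NodeWrapper(node)
    # Dict lookup is a hash probe, so no scan over all stored wrappers.
    existing = seen.get(w)
    if existing is not None:
        # Combine the properties for nodes with the same id.
        existing.node.properties.update(node.properties)
    else:
        seen[w] = w


seen: Dict[NodeWrapper, NodeWrapper] = {}
add_node(seen, Node("Google", {"p1": "a1"}))
add_node(seen, Node("Google", {"p2": "a2"}))
add_node(seen, Node("Spanner", {"p1": "b1"}))
```

Mapping each wrapper to itself keeps the "is it present?" check and the "fetch the stored object" step in a single lookup, and the same pattern would apply to an edge wrapper with a composite key.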
w = EdgeWrapper(edge)
if w in s:
    # Combine the properties for edges with the same id.
    e = next(filter(lambda v: v == w, s))
}


class TypeUtility(object):
Done! Also moved the class to a separate file
"""
if s == "BOOL":
    return param_types.BOOL
if s in ["INT64", "INT32"]:
Why would the schema have INT32?
}


class TypeUtility(object):
This is a very generic class, not limited to GraphStore. Can we create a utility module and move it there? It would be useful for future development in this repo.
from requests.structures import CaseInsensitiveDict


def to_identifier(s: str) -> str:
Can we move this method into GraphDocumentUtility? We are keeping the fixup_identifier method there as well, and the two methods look similar to me.
    return "`" + s + "`"


def to_identifiers(s: List[str]) -> str:
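For readers following the identifier helpers, here is a small sketch of how the pair might fit together. `to_identifier` mirrors the backtick quoting visible in the diff context; the body of `to_identifiers` is not shown in this conversation, so joining the quoted names with `", "` is an assumption made purely for illustration.

```python
from typing import List


def to_identifier(s: str) -> str:
    # Wrap a name in backticks, as in the diff context above.
    return "`" + s + "`"


def to_identifiers(s: List[str]) -> str:
    # Assumption: quote each name and join with ", ".
    # The PR's actual body is not visible in this conversation.
    return ", ".join(to_identifier(name) for name in s)


columns = to_identifiers(["id", "order", "FOCUS_ON"])
print(columns)
```

Quoting every name this way lets reserved words such as `order` be used as column names; note that this sketch does not escape backticks embedded in a name.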
1) Fix style of python notebook; 2) GraphStore: - improve a corner case by avoiding full set iteration; - add some null checking; - move utilities around.
/gcbrun
langchain-core==0.3.9
langchain-community==0.3.1
google-cloud-spanner==3.49.1
langchain-experimental==0.3.2
Since this dependency is only needed for the Colab notebook, please remove it from here.
  "colab_type": "text"
},
"source": [
  "<a href=\"https://colab.research.google.com/gist/mtyin/3f1e7d8e4c6b59edc6e7858c52247465/copy-of-graph_store.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
Please remove the duplicate Colab button.
Resolve this.