Skip to content

feat(graph): add Spanner Graph support to LangChain GraphStore interface#104

Merged
averikitsch merged 11 commits intogoogleapis:mainfrom
mtyin:graph
Nov 25, 2024
Merged

feat(graph): add Spanner Graph support to LangChain GraphStore interface#104
averikitsch merged 11 commits intogoogleapis:mainfrom
mtyin:graph

Conversation

@mtyin
Copy link
Copy Markdown
Contributor

@mtyin mtyin commented Oct 22, 2024

  1. Add SpannerGraphStore implementation of GraphStore;
  2. Add an integration test;
  3. Add a notebook to demonstrate the usage;
  4. Misc updates: requirements, init

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

1) Add SpannerGraphStore implementation of GraphStore;
2) Add an integration test;
3) Add a notebook to demonstrate the usage;
4) Misc updates: requirements, __init__
@product-auto-label product-auto-label Bot added the api: spanner Issues related to the googleapis/langchain-google-spanner-python API. label Oct 22, 2024
1) same edge label between different types of nodes

E.g.

Company FOCUS_ON Product
Company FOCUS_ON Technology

2) the same doc contains multiple nodes/edges with the same key fields.

E.g.

Merge
  Company(id='Google', properties={'p1':'a1', 'p3':'a3'}),
  Company(id='Google', properties={'p2':'a2', 'p3':'a4'})
into
  Company(id='Google', properties={'p1':'a1', 'p2':'a2', 'p3':'a4'})

This is due to Spanner DML doesn't support updating the same row twice
in the same DML. We could also separate each node/edge into a separate
DML which can be very slow.
@gauravpurohit06 gauravpurohit06 marked this pull request as ready for review November 11, 2024 11:13
@gauravpurohit06 gauravpurohit06 requested review from a team November 11, 2024 11:13
Copy link
Copy Markdown
Contributor

@gauravpurohit06 gauravpurohit06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you mtyin for raising the PR. The code looks well organized and modular.

I have reviewed the source code file and not test file. From the functionality point of view, it looks good but I need to do one more iteration.

Other than mentioned comments, few general things:

  1. You can use isort and black to format and style the code.
  2. Null Checks are missing at various places, can you please add it?

Comment thread docs/graph_store.ipynb
@@ -0,0 +1,903 @@
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Styling of this file is super bright (ie. painful white background) and inconsistent with other docs. Please make it consistent with other docs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes but from what I can see, the other notebooks are also using brightwhite as background color. LMK if I missed anything.

w = NodeWrapper(node)
if w in s:
# Combine the properties for nodes with the same id.
n = next(filter(lambda v: v == w, s))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This operation potentially be slow for big graphs, because it's iterating over a complete set which is not required as the NodeWrapper object is hashable & comparable based on their ids.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

w = NodeWrapper(node)
if w in s:
# Combine the properties for nodes with the same id.
n = next(filter(lambda v: v == w, s))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        if w.node.id in s:
            # Combine the properties for nodes with the same id.
            s[w.node.id].node.properties.update(node.properties) 

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I used the wrapper as the key to make the code looks more consistent between node and edge.

w = EdgeWrapper(edge)
if w in s:
# Combine the properties for edges with the same id.
e = next(filter(lambda v: v == w, s))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

}


class TypeUtility(object):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add class docs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Also moved the class to a separate file

"""
if s == "BOOL":
return param_types.BOOL
if s in ["INT64", "INT32"]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would the schema have INT32 ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

}


class TypeUtility(object):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very generic class and not limited just to GraphStore. Can we create a utility module and move it there? It would be useful for future development in this repo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

from requests.structures import CaseInsensitiveDict


def to_identifier(s: str) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this method in GraphDocumentUtility? as we are keeping fixup_identifier method there as well, both methods looks similar to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

return "`" + s + "`"


def to_identifiers(s: List[str]) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

1) Fix style of python notebook;
2) GraphStore:
   - improve a corner case by avoiding full set iteration;
   - add some null checking;
   - move utilities around.
@mtyin
Copy link
Copy Markdown
Contributor Author

mtyin commented Nov 15, 2024

/gcbrun

1 similar comment
@averikitsch
Copy link
Copy Markdown
Collaborator

/gcbrun

@averikitsch
Copy link
Copy Markdown
Collaborator

/gcbrun

@mtyin
Copy link
Copy Markdown
Contributor Author

mtyin commented Nov 20, 2024

/gcbrun

2 similar comments
@mtyin
Copy link
Copy Markdown
Contributor Author

mtyin commented Nov 20, 2024

/gcbrun

@mtyin
Copy link
Copy Markdown
Contributor Author

mtyin commented Nov 20, 2024

/gcbrun

Comment thread requirements.txt Outdated
langchain-core==0.3.9
langchain-community==0.3.1
google-cloud-spanner==3.49.1
langchain-experimental==0.3.2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this dep is only needed for the colab, please remove from here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread docs/graph_store.ipynb Outdated
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/mtyin/3f1e7d8e4c6b59edc6e7858c52247465/copy-of-graph_store.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove duplicate colab button

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Done.

@mtyin
Copy link
Copy Markdown
Contributor Author

mtyin commented Nov 22, 2024

Thank you mtyin for raising the PR. The code looks well organized and modular.

I have reviewed the source code file and not test file. From the functionality point of view, it looks good but I need to do one more iteration.

Other than mentioned comments, few general things:

  1. You can use isort and black to format and style the code.
  2. Null Checks are missing at various places, can you please add it?

Resolve this.

Copy link
Copy Markdown
Contributor

@gauravpurohit06 gauravpurohit06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@averikitsch averikitsch merged commit 98c2f8f into googleapis:main Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: spanner Issues related to the googleapis/langchain-google-spanner-python API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants