# Google Spanner

> [Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL, providing 99.999% availability in one easy solution.

This notebook goes over how to use `Spanner` for GraphRAG with the `SpannerPropertyGraphStore`, `SpannerGraphTextToGQLRetriever`, and `SpannerGraphCustomRetriever` classes.

Learn more about the package on [GitHub](https://github.com/googleapis/llama-index-spanner-python/).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/llama-index-spanner-python/blob/main/docs/graph_retriever.ipynb)

## Before You Begin

To run this notebook, you will need to do the following:

 * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)
 * [Enable the Cloud Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com)
 * [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)
 * [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)

### 🦜🔗 Library Installation
The integration lives in its own `llama-index-google-spanner` package, so we need to install it.

In [None]:
%pip install --upgrade --quiet llama-index-google-spanner llama-index-llms-google-genai llama-index-embeddings-google-genai json-repair pyvis

**Colab only:** Run the following cell to restart the kernel, or use the button to restart it. For Vertex AI Workbench, you can restart the terminal using the button on top.

In [None]:
# Automatically restart the kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)
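After the restart, a quick check can confirm that the packages installed above are importable. This is a minimal sketch using only the standard library; the module list mirrors the install cell and the imports used later in this notebook, and it is an assumption you can edit for your setup.

```python
import importlib.util

# Modules corresponding to the packages installed above (edit if your setup differs).
REQUIRED_MODULES = [
    "llama_index_spanner",
    "llama_index.llms.google_genai",
    "llama_index.embeddings.google_genai",
    "json_repair",
    "pyvis",
]


def find_missing_modules(module_names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in module_names:
        try:
            # For dotted names, find_spec imports the parent packages first, so a
            # missing parent raises ModuleNotFoundError instead of returning None.
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            missing.append(name)
    return missing


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules (re-run the install cell):", ", ".join(missing))
else:
    print("All required modules are importable.")
```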

### 🔐 Authentication
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.

* If you are using Colab to run this notebook, use the cell below and continue.
* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth

auth.authenticate_user()

### ☁ Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.

PROJECT_ID = "my-project-id"  # @param {type:"string"}

# Set the project id
!gcloud config set project {PROJECT_ID}
%env GOOGLE_CLOUD_PROJECT={PROJECT_ID}
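If `gcloud` reports an unknown project, a quick format check can rule out a typo. A minimal sketch: the pattern encodes the Google Cloud project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen), and it reads the `GOOGLE_CLOUD_PROJECT` variable exported by the cell above. It checks the shape of the ID only, not whether the project exists or you can access it.

```python
import os
import re

# Google Cloud project ID rules: 6 to 30 characters; lowercase letters, digits,
# and hyphens only; must start with a letter and must not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id has the shape of a Google Cloud project ID."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None


# The %env magic in the cell above exports the project ID as GOOGLE_CLOUD_PROJECT.
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
if not is_valid_project_id(project_id):
    print(f"{project_id!r} does not look like a valid Google Cloud project ID.")
```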

### 💡 API Enablement
The `llama-index-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project.

In [None]:
# enable Spanner API
!gcloud services enable spanner.googleapis.com

## Basic Usage

### Set Spanner database values
Find your database values on the [Spanner Instances page](https://console.cloud.google.com/spanner?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687).

In [None]:
# @title Set Your Values Here { display-mode: "form" }

INSTANCE = ""  # @param {type: "string"}
DATABASE = ""  # @param {type: "string"}
GRAPH_NAME = ""  # @param {type: "string"}
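The store and retrievers below need all three values, and the form leaves them empty by default. A minimal sketch of a fail-fast check; `require_spanner_values` is a hypothetical helper, not part of the Spanner integration.

```python
def require_spanner_values(**values: str) -> None:
    """Raise ValueError naming every setting that is empty or whitespace-only."""
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise ValueError(f"Please fill in: {', '.join(missing)}")
```

In the notebook, call `require_spanner_values(INSTANCE=INSTANCE, DATABASE=DATABASE, GRAPH_NAME=GRAPH_NAME)` right after the form cell so that an empty field fails immediately instead of surfacing later as a Spanner error.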

### SpannerPropertyGraphStore

To initialize the `SpannerPropertyGraphStore` class, you need to provide three required arguments; the other arguments are optional and only need to be passed if they differ from the defaults:

1.   a Spanner instance ID;
2.   a Spanner database ID that belongs to the above instance;
3.   a Spanner graph name used to create a graph in the above database.

In [3]:
from llama_index_spanner import SpannerPropertyGraphStore

graph_store = SpannerPropertyGraphStore(
    instance_id=INSTANCE,
    database_id=DATABASE,
    graph_name=GRAPH_NAME,
    use_flexible_schema=False
)



#### Add Graph Documents to Spanner Graph

In [32]:
# @title Extract Nodes and Edges from text snippets
from llama_index.core.schema import Document
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.google_genai import GoogleGenAIEmbedding
from llama_index.llms.google_genai import GoogleGenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.core.storage import StorageContext
from typing import Literal

text_snippets = [
    # Text snippet for students graduating from Veritas University, Computer Science Dept 2017
    """
This was the graduation ceremony of 2017. A wave of jubilant graduates poured out of the
grand halls of Veritas University, their laughter echoing across the quad. Among them were
a cohort of exceptional students from the Computer Science department, a group that had
become known for their collaborative spirit and innovative ideas.
Leading the pack was Emily Davis, a coding whiz with a passion for cybersecurity, already
fielding offers from top tech firms. Beside her walked James Rodriguez, a quiet but
brilliant mind fascinated by artificial intelligence, dreaming of building machines that
could understand human emotions.  Trailing slightly behind, deep in conversation, were
Sarah Chen and Michael Patel, both aspiring game developers, eager to bring their creative
visions to life.  And then there was  Aisha Khan, a social justice advocate who planned to
use her coding skills to address inequality through technology.
As they celebrated their achievements, these Veritas University Computer Science graduates
were ready to embark on diverse paths, each carrying the potential to shape the future of
technology in their own unique way.
""",
    # Text snippet for students graduating from Oakhaven University, Computer Science Dept 2016
    """
The year was 2016, and a palpable buzz filled the air as the graduating class of Oakhaven
university from Computer science and Engineering department emerged from the Beckman
Auditorium. Among them was a group of exceptional students, renowned for their
intellectual curiosity and groundbreaking research.
At the forefront was Alice Johnson, a gifted programmer with a fascination for quantum
computing, already collaborating with leading researchers in the field.  Beside her
strode David Kim, a brilliant theorist captivated by the intricacies of cryptography,
eager to contribute to the development of secure communication systems.  Engaged in an
animated discussion were Maria Rodriguez and Robert Lee, both passionate about robotics
and determined to push the boundaries of artificial intelligence.  And then there was
Chloe Brown, a visionary with a deep interest in bioinformatics, driven to unlock the
secrets of the human genome through computational analysis.
As they celebrated their accomplishments, these graduates, armed with their exceptional
skills and unwavering determination, were poised to make significant contributions to the world of computing and beyond.
""",
    # Text snippet mentions the company Emily Davis founded.
    # The snippet doesn't mention that she is an alumna of Veritas University.
    """
Emily Davis, a name synonymous with cybersecurity innovation, turned that passion into a
thriving business.  In the year 2022, Davis founded Ironclad Security, a company that's
rapidly changing the landscape of cybersecurity solutions.
""",
    # Text snippet mentions the company Alice Johnson founded.
    # The snippet doesn't mention that she is an alumna of Oakhaven University.
    """
Alice Johnson had a vision that extended far beyond the classroom. Driven by an insatiable
curiosity about the potential of quantum mechanics, she founded Entangled Solutions, a
company poised to revolutionize industries through the power of quantum technology.
Entangled Solutions distinguishes itself by focusing on practical applications of quantum
computing.
""",
]

# Wrap each text snippet in a Document
documents = [Document(text=t) for t in text_snippets]
llm = GoogleGenAI(
    model="gemini-2.0-flash",
)
embed_model = GoogleGenAIEmbedding(
    model_name="text-embedding-004", embed_batch_size=100
)
storage_context = StorageContext.from_defaults(property_graph_store=graph_store)

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    embed_kg_nodes=True,
    kg_extractors=[
        SchemaLLMPathExtractor(
            possible_entities=Literal["College", "Deparatment", "Person", "Year", "Company"],
            possible_relations=[
                "AlumniOf",
                "StudiedInDepartment",
                "PartOf",
                "GraduatedInYear",
                "Founded",
            ],
            llm=llm,
            max_triplets_per_chunk=500,
            num_workers=4,
            strict=False,
        )
    ],
    llm=llm,
    show_progress=True,
    property_graph_store=graph_store,
    use_async=False,
    storage_context=storage_context,
)



Both GOOGLE_API_KEY and GEMINI_API_KEY are set. Using GOOGLE_API_KEY.
Both GOOGLE_API_KEY and GEMINI_API_KEY are set. Using GOOGLE_API_KEY.
Parsing nodes: 100%|██████████| 4/4 [00:00<00:00, 310.98it/s]
Extracting paths from text with schema: 100%|██████████| 4/4 [00:06<00:00,  1.69s/it]
Generating embeddings: 100%|██████████| 4/4 [00:00<00:00, 20.63it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  5.79it/s]


No schema change required...
Insert nodes of type `text_chunk`...
No schema change required...
Insert nodes of type `PERSON`...
Waiting for DDL operations to complete...
Insert edges of type `YEAR_GRADUATEDINYEAR_PERSON`...
Insert edges of type `COLLEGE_PARTOF_DEPARATMENT`...
Insert edges of type `PERSON_STUDIEDINDEPARTMENT_DEPARATMENT`...
Insert edges of type `PERSON_ALUMNIOF_COLLEGE`...
Insert edges of type `PERSON_FOUNDED_COMPANY`...
Insert edges of type `COMPANY_PARTOF_YEAR`...
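The insert log above suggests that node labels are the upper-cased entity types and edge labels follow a `SOURCE_RELATION_TARGET` pattern (for example, `PERSON_ALUMNIOF_COLLEGE`); the GQL queries generated below use the same labels. Note that the misspelled `Deparatment` entity type in the extraction cell surfaces as `DEPARATMENT` in these labels. The helpers below are a hypothetical sketch of that naming pattern, inferred from the log rather than taken from the library, and can help when writing GQL by hand.

```python
def edge_label(source_type: str, relation: str, target_type: str) -> str:
    """Hypothetical helper: rebuild an edge label such as PERSON_ALUMNIOF_COLLEGE.

    The SOURCE_RELATION_TARGET pattern is inferred from the insert log; it is
    not a documented API of the Spanner integration.
    """
    return "_".join(part.upper() for part in (source_type, relation, target_type))


def match_pattern(source_type: str, relation: str, target_type: str) -> str:
    """Build a GQL path pattern using the inferred node and edge labels."""
    label = edge_label(source_type, relation, target_type)
    return f"(:{source_type.upper()})-[:{label}]->(:{target_type.upper()})"


print(match_pattern("Person", "AlumniOf", "College"))
# (:PERSON)-[:PERSON_ALUMNIOF_COLLEGE]->(:COLLEGE)
```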


### Initialize the Spanner Graph Text to GQL Retriever
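`SpannerGraphTextToGQLRetriever` takes two main parameters: a `SpannerPropertyGraphStore` object and a language model. It translates a natural-language question into a GQL query and runs it against the graph. With `summarize_response=True`, the LLM also summarizes the query results, and with `include_raw_response_as_metadata=True`, the generated query is returned in the node metadata under the `query` key.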

In [13]:
from llama_index_spanner.graph_retriever import SpannerGraphTextToGQLRetriever

retriever_text_to_gql = SpannerGraphTextToGQLRetriever(
    graph_store=graph_store,
    llm=llm,
    include_raw_response_as_metadata=True,
    verbose=True,
    summarize_response=True,
)
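The three retrieval cells below repeat the same steps: take the first result, then read the summarized answer from `node.text` and the generated query from `node.metadata['query']`. A small hypothetical helper can factor that out; it relies only on the attributes those cells already access.

```python
def print_gql_result(nodes_with_scores) -> None:
    """Print the generated GQL query and the summarized answer from retrieve().

    Assumes the first result's node holds the answer text and, because
    include_raw_response_as_metadata=True, the GQL query under metadata["query"].
    """
    if not nodes_with_scores:
        print("No results returned.")
        return
    text_node = nodes_with_scores[0].node
    print("GQL Query: ", text_node.metadata.get("query", "<not available>"))
    print("Summarized Response: ", text_node.text)
```

For example, `print_gql_result(retriever_text_to_gql.retrieve(question))` prints the same two lines as each of the cells below.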

In [48]:
# @title Run Spanner Graph NL2GQL Retriever
question = "Who are the alumni of the college id Veritas University ?"  # @param {type:"string"}
response = retriever_text_to_gql.retrieve(question)
first_node_with_score = response[0]
text_node = first_node_with_score.node
response_str = text_node.text
gql_query = text_node.metadata['query']
print("GQL Query: ", gql_query)
print("Summarized Response: ", response_str)



GQL Query:  MATCH (college:COLLEGE {name: 'Veritas University'})<-[:PERSON_ALUMNIOF_COLLEGE]-(person:PERSON)
RETURN person.name AS person_name;
Summarized Response:  The names of the people who are alumni of Veritas University are: Aisha Khan, Emily Davis, James Rodriguez, Michael Patel, and Sarah Chen.
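To inspect the raw rows behind a summarized answer, you can run the generated query yourself. A minimal sketch, assuming the `google-cloud-spanner` client library is available (likely installed as a dependency of the Spanner integration; this is an assumption). Spanner Graph queries name the target graph with a leading `GRAPH` clause; the query printed above starts at `MATCH`, so the sketch assumes the retriever adds that clause internally and adds it back here. `with_graph_clause` and `run_gql` are hypothetical helper names.

```python
def with_graph_clause(graph_name: str, gql_query: str) -> str:
    """Prefix a generated GQL query with the GRAPH clause naming the graph."""
    # Spanner Graph queries start with the graph name, e.g. "GRAPH MyGraph MATCH ...".
    # A trailing semicolon is stripped as a precaution.
    return f"GRAPH {graph_name}\n{gql_query.strip().rstrip(';')}"


def run_gql(instance_id: str, database_id: str, graph_name: str, gql_query: str):
    """Execute a GQL query with the google-cloud-spanner client and return all rows."""
    from google.cloud import spanner  # imported lazily; requires google-cloud-spanner

    database = spanner.Client().instance(instance_id).database(database_id)
    # A read-only snapshot is enough for a query; consume results inside the context.
    with database.snapshot() as snapshot:
        return list(snapshot.execute_sql(with_graph_clause(graph_name, gql_query)))
```

In the notebook, `run_gql(INSTANCE, DATABASE, GRAPH_NAME, gql_query)` would return the rows that the retriever summarized.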



In [45]:
# @title Run Spanner Graph NL2GQL Retriever 2
question = "List the companies, their founders and the college they attended."  # @param {type:"string"}
response = retriever_text_to_gql.retrieve(question)
first_node_with_score = response[0]
text_node = first_node_with_score.node
response_str = text_node.text
gql_query = text_node.metadata['query']
print("GQL Query: ", gql_query)
print("Summarized Response: ", response_str)

GQL Query:  MATCH (company:COMPANY)<-[:PERSON_FOUNDED_COMPANY]-(person:PERSON)-[:PERSON_ALUMNIOF_COLLEGE]->(college:COLLEGE)
RETURN company.name AS company_name, person.name AS founder_name, college.name AS college_name;
Summarized Response:  The companies 'Entangled Solutions' and 'Ironclad Security' were founded by 'Alice Johnson' and 'Emily Davis' respectively. 'Alice Johnson' is an alumni of 'Oakhaven university' and 'Emily Davis' is an alumni of 'Veritas University'.



In [46]:
# @title Run Spanner Graph NL2GQL Retriever 3
question = "Which companies were founded by alumni of college id Veritas University ? Who were the founders ?"  # @param {type:"string"}
response = retriever_text_to_gql.retrieve(question)
first_node_with_score = response[0]
text_node = first_node_with_score.node
response_str = text_node.text
gql_query = text_node.metadata['query']
print("GQL Query: ", gql_query)
print("Summarized Response: ", response_str)

GQL Query:  MATCH (c:COLLEGE {name: 'Veritas University'})<-[:PERSON_ALUMNIOF_COLLEGE]-(p:PERSON)-[:PERSON_FOUNDED_COMPANY]->(co:COMPANY)
RETURN co.name AS company_name, p.name AS founder_name;
Summarized Response:  The founder Emily Davis, who is an alumni of Veritas University, founded the company Ironclad Security.



### Initialize the Spanner Graph Custom Retriever
`SpannerGraphCustomRetriever` combines a `VectorContextRetriever` with `SpannerGraphTextToGQLRetriever` and then reranks the combined results. In addition to a `SpannerPropertyGraphStore` object and a language model, it takes an embedding model for the vector search.

In [8]:
from llama_index_spanner.graph_retriever import SpannerGraphCustomRetriever
custom_retriever = SpannerGraphCustomRetriever(
    graph_store=graph_store,
    embed_model=embed_model,
    llm=llm,
    include_raw_response_as_metadata=True,
    summarize_response=True,
    verbose=True,
)
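The custom retriever merges vector-search results with text-to-GQL results before reranking them. The library's exact merging and reranking method is not described here; as a toy illustration of the merge step only, the sketch below uses reciprocal rank fusion (RRF), a common rank-merging technique, on hypothetical result IDs. It is not the library's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of result IDs into one list, best first.

    Each item scores sum(1 / (k + rank)) over every list it appears in (ranks
    start at 1), so items that several retrievers rank highly rise to the top.
    Duplicates across lists are merged into a single entry.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item value for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result IDs from a vector retriever and a text-to-GQL retriever.
vector_hits = ["a", "b", "c"]
gql_hits = ["b", "d"]
print(reciprocal_rank_fusion([vector_hits, gql_hits]))  # ['b', 'a', 'd', 'c']
```

Because `b` appears in both lists, it outranks `a`, which only the vector retriever placed first; this mirrors why a hybrid retriever can surface answers that neither source ranks highest on its own.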

In [None]:
# @title Run Spanner Graph Custom Retriever 1
question1 = "Who are the alumni of the college id Veritas University ?"  # @param {type:"string"}
response1 = custom_retriever.custom_retrieve(question1)
print("Received Response: \n", response1)

Received Response: 
 Emily Davis -> ALUMNIOF -> Veritas University

Sarah Chen -> ALUMNIOF -> Veritas University


In [None]:
# @title Run Spanner Graph Custom Retriever 2
question2 = "List the companies, their founders and the college they attended."  # @param {type:"string"}
response2 = custom_retriever.custom_retrieve(question2)
print("Received Response: \n", response2)

Received Response: 
 query: MATCH (company:COMPANY)<-[:PERSON_FOUNDED_COMPANY]-(person:PERSON)-[:PERSON_ALUMNIOF_COLLEGE]->(college:COLLEGE)
RETURN company.name AS company_name, person.name AS founder_name, college.name AS college_name;
response: The companies 'Entangled Solutions' and 'Ironclad Security' were founded by 'Alice Johnson' and 'Emily Davis' respectively. 'Alice Johnson' is an alumni of 'Oakhaven university' and 'Emily Davis' is an alumni of 'Veritas University'.

The companies 'Entangled Solutions' and 'Ironclad Security' were founded by 'Alice Johnson' and 'Emily Davis' respectively. 'Alice Johnson' is an alumni of 'Oakhaven university' and 'Emily Davis' is an alumni of 'Veritas University'.


In [None]:
# @title Run Spanner Graph Custom Retriever 3
question3 = "Which companies were founded by alumni of college id Veritas University ? Who were the founders ?"  # @param {type:"string"}
response3 = custom_retriever.custom_retrieve(question3)
print("Received Response: \n", response3)

Received Response: 
 query: MATCH (college:COLLEGE {name: 'Veritas University'})<-[:PERSON_ALUMNIOF_COLLEGE]-(person:PERSON)-[:PERSON_FOUNDED_COMPANY]->(company:COMPANY)
RETURN company.name AS company_name, person.name AS founder_name;
response: The company name is Ironclad Security, and the founder's name is Emily Davis.

The company name is Ironclad Security, and the founder's name is Emily Davis.

Emily Davis -> FOUNDED -> Ironclad Security


In [None]:
# @title Run Spanner Graph Custom Retriever 4
# "Significant contributions" is vague and judgment-based: vector search retrieves nuanced
# textual descriptions, while the graph retriever fills in the structured parts (founder, company, college).

question4 = "Name alumni who later went on to make significant contributions to tech."  # @param {type:"string"}
response4 = custom_retriever.custom_retrieve(question4)
print("Received Response: \n", response4)

Received Response: 
 query: MATCH (p:PERSON)-[:PERSON_ALUMNIOF_COLLEGE]->(c:COLLEGE), (p)-[:PERSON_FOUNDED_COMPANY]->(co:COMPANY)
RETURN p.name AS person_name, c.name AS college_name, co.name AS company_name;
response: The query returned the following results: Alice Johnson, who attended Oakhaven university, founded Entangled Solutions; Emily Davis, who attended Veritas University, founded Ironclad Security.

The query returned the following results: Alice Johnson, who attended Oakhaven university, founded Entangled Solutions; Emily Davis, who attended Veritas University, founded Ironclad Security.


In [None]:
# @title Run Spanner Graph Custom Retriever 5
# This question combines an entity attribute (gender, inferred from the name) with the company-founder relationship.
question5 = "Who are the female founders among the alumni?"  # @param {type:"string"}
response5 = custom_retriever.custom_retrieve(question5)
print("Received Response: \n", response5)

Received Response: 
 query: MATCH (p:PERSON)-[:PERSON_ALUMNIOF_COLLEGE]->(c:COLLEGE), (p:PERSON)-[:PERSON_FOUNDED_COMPANY]->(co:COMPANY) RETURN p.name AS founder_name

response: The founder names are Alice Johnson and Emily Davis.

The founder names are Alice Johnson and Emily Davis.

Sarah Chen -> GRADUATEDINYEAR -> 2017


In [None]:
# @title Run Spanner Graph Custom Retriever 6
# A semantic bridge: it needs vector search to understand interests and GQL to map entities and their relationships.
question6 = "Find connections between graduation interests and their real-world ventures."  # @param {type:"string"}
response6 = custom_retriever.custom_retrieve(question6)
print("Received Response: \n", response6)

Received Response: 
 query: MATCH (p:PERSON)-[:PERSON_STUDIEDINDEPARTMENT_DEPARATMENT]->(d:DEPARATMENT), (p)-[:PERSON_FOUNDED_COMPANY]->(c:COMPANY)
RETURN p.name AS person_name, d.name AS department_name, c.name AS company_name;
response: Okay, here are the results for the query:
Alice Johnson, who studied in the Computer science and Engineering department, founded Entangled Solutions. Emily Davis, who studied in the Computer Science department, founded Ironclad Security.

Okay, here are the results for the query:
Alice Johnson, who studied in the Computer science and Engineering department, founded Entangled Solutions. Emily Davis, who studied in the Computer Science department, founded Ironclad Security.

Sarah Chen -> GRADUATEDINYEAR -> 2017


#### Clean up the graph

> USE IT WITH CAUTION!

Clean up all the nodes/edges in your graph and remove your graph definition.

In [None]:
graph_store.cleanup()
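Because `cleanup()` deletes every node and edge and drops the graph definition, you may prefer to require an explicit confirmation first. A minimal sketch; `confirm_and_cleanup` is a hypothetical wrapper, and the only library call it makes is the `cleanup()` method used above.

```python
def confirm_and_cleanup(store, graph_name: str, typed_name: str) -> bool:
    """Call store.cleanup() only if typed_name exactly matches a non-empty graph_name."""
    if not graph_name or typed_name != graph_name:
        print(f"Skipped: {typed_name!r} does not match graph {graph_name!r}.")
        return False
    store.cleanup()  # removes all nodes/edges and the graph definition
    return True
```

For example: `confirm_and_cleanup(graph_store, GRAPH_NAME, input("Type the graph name to delete it: "))`.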