# Amazon Neptune Graph Store

## Overview

This notebook covers the Amazon Neptune Database integration in Semantica. Amazon Neptune is a fully managed graph database service that supports both property graphs (via OpenCypher/Gremlin) and RDF graphs (via SPARQL).

### Key Features

- **IAM Authentication**: Secure access using AWS SigV4 signatures via AuthManager
- **OpenCypher Support**: Query using standard OpenCypher syntax
- **Bolt Protocol**: Uses Neo4j Bolt driver for efficient binary communication
- **Native ~id Support**: Leverages Neptune's native element ID handling
- **Full CRUD Operations**: Create, read, update, delete nodes and relationships
- **Automatic Retry**: Built-in retry logic with exponential backoff for transient errors
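Semantica's retry logic is internal, and this notebook does not document its exact policy. The sketch below is a rough illustration of "exponential backoff for transient errors". It defines a hypothetical `retry_with_backoff` helper. The exception types treated as transient, the attempt count, and the delay values are all assumptions, not Semantica's settings.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
) -> T:
    """Call `operation`, retrying transient failures with exponential backoff.

    Hypothetical helper: the wait doubles after each failed attempt
    (base_delay, 2*base_delay, 4*base_delay, ...), is capped at max_delay,
    and is scaled by random jitter. Non-transient errors propagate at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last transient error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")


# Usage: an operation that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_query() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


print(retry_with_backoff(flaky_query, base_delay=0.01), attempts["count"])  # → ok 3
```

The random jitter keeps many clients that failed at the same moment from retrying in lockstep against a briefly overloaded instance.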

### Prerequisites

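- An Amazon Neptune Database cluster (with IAM database authentication enabled if you plan to connect with `iam_auth=True`)
- AWS credentials available through the standard AWS credential chain (environment variables, the `~/.aws/credentials` file, or an IAM role)
- Network access to your Neptune cluster: Neptune runs inside a VPC, so the client must run in (or be connected to) that VPC, and the cluster's security group must allow inbound traffic on the Neptune port (8182 by default)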

---

## Installation

```bash
# Install Semantica with Neptune support
pip install semantica

# Dependencies used by the Neptune backend (installed automatically with Semantica;
# install them explicitly only if they are missing)
pip install boto3 neo4j
```

```python
!pip install semantica
```

## Configuration

Set your Neptune cluster endpoint and AWS credentials. Replace the placeholder values with your actual configuration.

```python
import os

# Neptune cluster configuration - REPLACE WITH YOUR VALUES
os.environ["NEPTUNE_ENDPOINT"] = "your-cluster.us-east-1.neptune.amazonaws.com"
os.environ["NEPTUNE_PORT"] = "8182"
os.environ["AWS_REGION"] = "us-east-1"

# AWS credentials (if using IAM Auth and not relying on IAM role or ~/.aws/credentials)
# os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
# os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
# os.environ["AWS_SESSION_TOKEN"] = "your-session-token"

print(f"Neptune Endpoint: {os.environ.get('NEPTUNE_ENDPOINT')}")
print(f"AWS Region: {os.environ.get('AWS_REGION')}")
```
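It is easy to leave a placeholder value in place. A small check like the hypothetical `load_neptune_config` below can fail fast before any network call is made. Its validation rules (placeholder detection, no URL scheme, port range) are illustrative assumptions. The returned keys simply mirror the `GraphStore(...)` arguments used in Step 1.

```python
import os
from typing import Mapping, Optional


def load_neptune_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read Neptune connection settings and reject obviously wrong values.

    Hypothetical helper; the checks below are illustrative, not Semantica rules.
    """
    env = os.environ if env is None else env
    endpoint = env.get("NEPTUNE_ENDPOINT", "").strip()
    if not endpoint or endpoint.startswith("your-cluster"):
        raise ValueError("NEPTUNE_ENDPOINT is unset or still the placeholder")
    if "://" in endpoint:
        raise ValueError("NEPTUNE_ENDPOINT should be a host name, without a scheme")
    port = int(env.get("NEPTUNE_PORT", "8182"))  # int() raises ValueError on junk
    if not 0 < port < 65536:
        raise ValueError(f"NEPTUNE_PORT out of range: {port}")
    return {
        "endpoint": endpoint,
        "port": port,
        "region": env.get("AWS_REGION", "us-east-1"),
    }


# Usage with an explicit mapping (hypothetical endpoint):
config = load_neptune_config(
    {"NEPTUNE_ENDPOINT": "my-db.cluster-example.us-east-1.neptune.amazonaws.com"}
)
print(config["endpoint"], config["port"], config["region"])
```

Called with no argument, `load_neptune_config()` reads `os.environ`. It therefore raises while `NEPTUNE_ENDPOINT` still holds the placeholder from the cell above.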

## Step 1: Initialize Neptune Store

Initialize a connection to your Amazon Neptune cluster with IAM authentication.

```python
import os
from semantica.graph_store import GraphStore

# Create a Neptune-backed store through the GraphStore factory (recommended)
neptune_store = GraphStore(
    backend="neptune",
    endpoint=os.environ.get("NEPTUNE_ENDPOINT"),
    port=int(os.environ.get("NEPTUNE_PORT", 8182)),
    region=os.environ.get("AWS_REGION", "us-east-1"),
    iam_auth=True,  # sign connections with AWS SigV4
)

# Connect to Neptune
neptune_store.connect()
print("Connected to Amazon Neptune!")
```

### Development/Testing Without IAM Auth

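For development or testing clusters that do not have IAM database authentication enabled (for example, when working from a Neptune notebook or over VPC-only access), you can disable IAM signing. If the cluster does have IAM authentication enabled, unsigned connections are rejected.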

```python
# For dev/test environments without IAM authentication
neptune_store_dev = GraphStore(
    backend="neptune",
    endpoint=os.environ.get("NEPTUNE_ENDPOINT"),
    port=int(os.environ.get("NEPTUNE_PORT", 8182)),
    region=os.environ.get("AWS_REGION", "us-east-1"),
    iam_auth=False,  # Disable IAM signing for dev/test
)
neptune_store_dev.connect()
```

### Authentication Options

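IAM authentication (recommended for production) resolves credentials through the standard AWS credential chain. That chain includes the following sources, in order of precedence:
1. Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, plus `AWS_SESSION_TOKEN` for temporary credentials)
2. The shared credentials file (`~/.aws/credentials`)
3. An IAM role attached to the compute environment (EC2, Lambda, ECS)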
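Semantica's AuthManager does the request signing for you. To make "SigV4 signing" concrete, the sketch below derives a Signature Version 4 `Authorization` header for a `GET /opencypher` request using only the standard library. It is for illustration only. It assumes the `neptune-db` service name, a request with no query string or body, and long-term credentials. It omits the `X-Amz-Security-Token` header that temporary credentials require. In real code, rely on Semantica or botocore's `SigV4Auth` rather than hand-rolled signing.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Dict, Optional


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(
    host: str,
    region: str,
    access_key: str,
    secret_key: str,
    path: str = "/opencypher",
    service: str = "neptune-db",
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Build SigV4 headers for a GET request with no query string and no body."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # 1. Canonical request (headers lower-cased and sorted; empty-payload hash).
    signed_headers = "host;x-amz-date"
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    payload_hash = hashlib.sha256(b"").hexdigest()
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash]
    )

    # 2. String to sign, bound to a date/region/service credential scope.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # 3. Derive the signing key with chained HMACs, then sign the string.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    signature = hmac.new(
        k_signing, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


# Usage with placeholder credentials and a fixed timestamp.
headers = sigv4_headers(
    host="my-cluster.cluster-example.us-east-1.neptune.amazonaws.com:8182",  # hypothetical
    region="us-east-1",
    access_key="AKIDEXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(headers["Authorization"])
```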

## Step 2: Node Operations

### Creating Nodes

Nodes represent entities in your graph. Each node can have:
- **ID**: A unique identifier (custom or auto-generated UUID)
- **Labels**: Categories/types (e.g., `Person`, `Company`)
- **Properties**: Key-value pairs (e.g., `{"name": "Alice", "age": 30}`)

```python
# Create a single node with custom ID (id in properties)
alice = neptune_store.create_node(
    labels=["Person"],
    properties={"id": "alice", "name": "Alice", "age": 30, "role": "Engineer"}
)
print(f"Created node: {alice}")

# Create a node with auto-generated UUID (no id in properties)
bob = neptune_store.create_node(
    labels=["Person"],
    properties={"name": "Bob", "age": 25, "role": "Designer"}
)
print(f"Created node with UUID: {bob['id']}")

# Create a company node with auto-generated ID
acme = neptune_store.create_node(
    labels=["Company"],
    properties={"name": "Acme Corp", "industry": "Technology", "founded": 2010}
)
print(f"Created company: {acme}")
```

### Creating Multiple Nodes (Batch)

```python
# Batch create nodes for better performance
# Include 'id' in properties for custom IDs
nodes_data = [
    {"labels": ["Person"], "properties": {"id": "charlie", "name": "Charlie", "age": 35}},
    {"labels": ["Person"], "properties": {"id": "diana", "name": "Diana", "age": 28}},
    {"labels": ["Location"], "properties": {"name": "San Francisco", "state": "CA"}},
]

created_nodes = neptune_store.create_nodes(nodes_data)
print(f"Created {len(created_nodes)} nodes in batch")
```
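Batching pays off because every request to Neptune has fixed overhead. A common Cypher technique sends the rows as one list parameter and expands them with `UNWIND`. Cypher cannot take labels as parameters, so the sketch below groups nodes by label set before chunking. The hypothetical `build_batch_create` helper only builds query text and parameter batches. Its chunk size is an assumption, and Semantica's `create_nodes` may work differently.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def chunked(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def quote_label(label: str) -> str:
    """Backtick-quote a label; embedded backticks are doubled."""
    return "`" + label.replace("`", "``") + "`"


def build_batch_create(
    nodes: List[dict], chunk_size: int = 500
) -> List[Tuple[str, Dict[str, list]]]:
    """Group node specs by label set; emit one UNWIND statement per chunk."""
    groups: Dict[Tuple[str, ...], List[dict]] = {}
    for node in nodes:
        groups.setdefault(tuple(node["labels"]), []).append(node["properties"])

    statements = []
    for labels, rows in groups.items():
        label_expr = "".join(":" + quote_label(lab) for lab in labels)
        # Every key in `row` becomes an ordinary property; this sketch does
        # not map a custom "id" onto Neptune's ~id.
        query = f"UNWIND $rows AS row CREATE (n{label_expr}) SET n += row RETURN n"
        for batch in chunked(rows, chunk_size):
            statements.append((query, {"rows": batch}))
    return statements


sample = [
    {"labels": ["Person"], "properties": {"name": "Charlie", "age": 35}},
    {"labels": ["Person"], "properties": {"name": "Diana", "age": 28}},
    {"labels": ["Location"], "properties": {"name": "San Francisco", "state": "CA"}},
]
for query, params in build_batch_create(sample):
    print(len(params["rows"]), "row(s):", query)
```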

### Retrieving Nodes

```python
# Get a specific node by ID
alice_node = neptune_store.get_node(node_id="alice")
print(f"Retrieved: {alice_node}")

# Get nodes by label
people = neptune_store.get_nodes(labels=["Person"], limit=10)
print(f"Found {len(people)} Person nodes:")
for person in people:
    print(f"  - {person.get('properties', {}).get('name')}")

# Get nodes by properties
engineers = neptune_store.get_nodes(
    labels=["Person"],
    properties={"role": "Engineer"},
    limit=5
)
print(f"Found {len(engineers)} engineers")
```
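The filter arguments to `get_nodes` map naturally onto a `MATCH ... WHERE` clause. The hypothetical `build_match_query` below shows one way to make that translation. Values travel as query parameters, and labels and property names are backtick-quoted. It is a sketch, not Semantica's actual query generator.

```python
from typing import Any, Dict, List, Optional, Tuple


def quote_ident(name: str) -> str:
    """Backtick-quote a label or property name; embedded backticks are doubled."""
    return "`" + name.replace("`", "``") + "`"


def build_match_query(
    labels: Optional[List[str]] = None,
    properties: Optional[Dict[str, Any]] = None,
    limit: int = 100,
) -> Tuple[str, Dict[str, Any]]:
    """Translate label/property filters into a parameterized MATCH query."""
    label_expr = "".join(":" + quote_ident(lab) for lab in labels or [])
    params: Dict[str, Any] = {}
    conditions = []
    for i, (key, value) in enumerate((properties or {}).items()):
        name = f"p{i}"  # positional parameter names avoid clashes with odd keys
        conditions.append(f"n.{quote_ident(key)} = ${name}")
        params[name] = value
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    # int() guarantees the LIMIT value is a plain integer literal
    return f"MATCH (n{label_expr}){where} RETURN n LIMIT {int(limit)}", params


query, params = build_match_query(["Person"], {"role": "Engineer"}, limit=5)
print(query)   # → MATCH (n:`Person`) WHERE n.`role` = $p0 RETURN n LIMIT 5
print(params)  # → {'p0': 'Engineer'}
```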

### Updating Nodes

```python
# Update node properties in merge mode (merge=True, the default)
updated_alice = neptune_store.update_node(
    node_id="alice",
    properties={"age": 31, "department": "AI Research"},
    merge=True
)
print(f"Updated Alice: {updated_alice}")

# Replace all properties (merge=False)
# WARNING: This removes properties not in the update
replaced = neptune_store.update_node(
    node_id="charlie",
    properties={"name": "Charlie", "age": 36},
    merge=False
)
print(f"Replaced Charlie: {replaced}")
```
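The difference between the two update modes can be shown without a database. The sketch below models them on plain dicts. It assumes that merge mode behaves like Cypher `SET n += $props`, which is the merge form listed under OpenCypher Considerations, and that replace mode behaves like `SET n = $props`. Whether Semantica issues exactly these clauses is an assumption.

```python
from typing import Any, Dict


def apply_update(
    current: Dict[str, Any], update: Dict[str, Any], merge: bool = True
) -> Dict[str, Any]:
    """Model the two update modes on plain dicts (no database involved).

    merge=True  mirrors Cypher `SET n += $props`: listed keys are added or
                overwritten, all other properties are kept.
    merge=False mirrors Cypher `SET n = $props`: the node keeps exactly `update`.
    """
    return {**current, **update} if merge else dict(update)


alice_before = {"name": "Alice", "age": 30, "role": "Engineer"}
print(apply_update(alice_before, {"age": 31, "department": "AI Research"}))
# → {'name': 'Alice', 'age': 31, 'role': 'Engineer', 'department': 'AI Research'}
print(apply_update(alice_before, {"age": 31, "department": "AI Research"}, merge=False))
# → {'age': 31, 'department': 'AI Research'}
```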

### Deleting Nodes

```python
# Delete a node (with detach=True to also delete relationships)
deleted = neptune_store.delete_node(node_id="diana", detach=True)
print(f"Deleted diana: {deleted}")

# Without detach (fails if node has relationships)
# neptune_store.delete_node(node_id="alice", detach=False)
```
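Cypher makes the same distinction between `DELETE n`, which fails while `n` still has relationships, and `DETACH DELETE n`, which removes them first. The in-memory `TinyGraph` below is a hypothetical stand-in that mimics that rule. Returning `False` for a missing node is an assumption about the return value, not something Semantica documents here.

```python
from typing import Set, Tuple


class TinyGraph:
    """In-memory stand-in used only to illustrate DELETE vs DETACH DELETE."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: Set[Tuple[str, str, str]] = set()  # (start, type, end)

    def delete_node(self, node_id: str, detach: bool = False) -> bool:
        if node_id not in self.nodes:
            return False
        attached = {e for e in self.edges if node_id in (e[0], e[2])}
        if attached and not detach:
            # Like Cypher `DELETE n`: refuse while relationships remain.
            raise ValueError(f"{node_id} still has {len(attached)} relationship(s)")
        self.edges -= attached  # like `DETACH DELETE n`: drop edges first
        self.nodes.discard(node_id)
        return True


g = TinyGraph()
g.nodes |= {"alice", "bob"}
g.edges.add(("alice", "KNOWS", "bob"))
try:
    g.delete_node("alice")
except ValueError as err:
    print("DELETE failed:", err)
print(g.delete_node("alice", detach=True), g.edges)
```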

## Step 3: Relationship Operations

### Creating Relationships

Relationships are directed, typed edges between two nodes. Like nodes, they can carry properties, such as the year a person started working at a company.

```python
# Create a relationship between Alice and Acme
works_at = neptune_store.create_relationship(
    start_node_id="alice",
    end_node_id=acme["id"],
    rel_type="WORKS_AT",
    properties={"since": 2020, "position": "Senior Engineer"}
)
print(f"Created relationship: {works_at}")

# Create a KNOWS relationship between people
knows_rel = neptune_store.create_relationship(
    start_node_id="alice",
    end_node_id=bob["id"],
    rel_type="KNOWS",
    properties={"since": 2019}
)
```
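Creating a relationship amounts to matching both endpoints by element ID and then creating a typed edge between them. The hypothetical `build_create_relationship` below sketches that pattern with the `id(n)` lookup Neptune uses for element IDs. Semantica's real query may differ.

```python
from typing import Any, Dict, Optional, Tuple


def quote_ident(name: str) -> str:
    """Backtick-quote a relationship type; embedded backticks are doubled."""
    return "`" + name.replace("`", "``") + "`"


def build_create_relationship(
    start_id: str,
    end_id: str,
    rel_type: str,
    properties: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Match both endpoints by element ID, then create one typed, directed edge."""
    query = (
        "MATCH (a), (b) WHERE id(a) = $start_id AND id(b) = $end_id "
        f"CREATE (a)-[r:{quote_ident(rel_type)}]->(b) "
        "SET r += $props RETURN r"
    )
    params = {"start_id": start_id, "end_id": end_id, "props": dict(properties or {})}
    return query, params


query, params = build_create_relationship("alice", "bob", "KNOWS", {"since": 2019})
print(query)
print(params)
```

If either ID does not match, the `MATCH` produces no rows and nothing is created. An empty result is therefore worth checking for.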

### Retrieving Relationships

```python
# Get all relationships for a node
alice_rels = neptune_store.get_relationships(node_id="alice", direction="both")
print(f"Alice has {len(alice_rels)} relationships")

# Get outgoing relationships only
outgoing = neptune_store.get_relationships(node_id="alice", direction="out")

# Filter by relationship type
works_rels = neptune_store.get_relationships(
    node_id="alice",
    rel_type="WORKS_AT",
    direction="out"
)
print(f"Alice's work relationships: {len(works_rels)}")
```
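The `direction` argument corresponds to the arrow in a Cypher pattern: `->` for outgoing, `<-` for incoming, and no arrow head for both. The hypothetical `build_relationship_query` below sketches that mapping. The exact query Semantica generates is an assumption.

```python
from typing import Optional

# Cypher pattern for each direction, relative to the anchor node n.
_PATTERNS = {
    "out": "(n)-[r{t}]->(m)",
    "in": "(n)<-[r{t}]-(m)",
    "both": "(n)-[r{t}]-(m)",
}


def build_relationship_query(direction: str = "both", rel_type: Optional[str] = None) -> str:
    """Return a MATCH for the relationships of one node, by direction and type."""
    if direction not in _PATTERNS:
        raise ValueError(f"direction must be one of {sorted(_PATTERNS)}, got {direction!r}")
    type_expr = ":`" + rel_type.replace("`", "``") + "`" if rel_type else ""
    pattern = _PATTERNS[direction].format(t=type_expr)
    return f"MATCH {pattern} WHERE id(n) = $node_id RETURN r, m"


print(build_relationship_query("out", "WORKS_AT"))
# → MATCH (n)-[r:`WORKS_AT`]->(m) WHERE id(n) = $node_id RETURN r, m
```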

### Deleting Relationships

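```python
# Delete a specific relationship by ID
# Note: this removes Alice's only WORKS_AT edge, so the WORKS_AT query in
# Step 4 returns no rows unless you re-create the relationship.
if works_at.get("id"):
    deleted = neptune_store.delete_relationship(rel_id=works_at["id"])
    print(f"Deleted relationship: {deleted}")
```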

## Step 4: OpenCypher Queries

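Amazon Neptune supports OpenCypher queries, which Semantica sends over the Bolt protocol. Use `execute_query` to run arbitrary Cypher patterns, and pass values through `parameters` instead of formatting them into the query string.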

```python
# Simple query (aliases give each column a predictable key in the record)
results = neptune_store.execute_query(
    "MATCH (p:Person) RETURN p.name AS name, p.age AS age ORDER BY age"
)
print("People in the graph:")
for record in results.get("records", []):
    print(f"  - {record.get('name')}: {record.get('age')} years old")
```

```python
# Using parameters (safer and more efficient)
results = neptune_store.execute_query(
    "MATCH (p:Person) WHERE p.age > $min_age RETURN p.name, p.age",
    parameters={"min_age": 25}
)
print(f"People over 25: {len(results.get('records', []))}")
```

```python
# Find relationships between nodes
results = neptune_store.execute_query("""
    MATCH (p:Person)-[r:WORKS_AT]->(c:Company)
    RETURN p.name AS employee, c.name AS company, r.since AS start_year
""")
for record in results.get("records", []):
    print(f"{record['employee']} works at {record['company']} since {record['start_year']}")
```

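```python
# Count and aggregate
results = neptune_store.execute_query("""
    MATCH (p:Person)
    RETURN count(p) AS total, avg(p.age) AS avg_age, max(p.age) AS max_age
""")
records = results.get("records") or [{}]  # guard against an empty result list
stats = records[0]
avg_age = stats.get("avg_age")  # None when there are no Person nodes
avg_text = f"{avg_age:.1f}" if avg_age is not None else "N/A"
print(f"Total: {stats.get('total')}, Avg Age: {avg_text}, Max Age: {stats.get('max_age')}")
```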

## Step 5: Graph Analytics

### Get Neighbors

Traverse the graph to find connected nodes.

```python
# Get immediate neighbors (depth=1)
neighbors = neptune_store.get_neighbors(
    node_id="alice",
    direction="both",
    depth=1
)
print(f"Alice's direct neighbors: {len(neighbors)}")

# Get neighbors up to 2 hops away
extended = neptune_store.get_neighbors(
    node_id="alice",
    direction="out",
    depth=2
)
print(f"Nodes within 2 hops: {len(extended)}")
```
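Depth-limited neighbor lookup is a breadth-first search that stops expanding once it reaches the hop limit. The sketch below runs that search over a toy in-memory adjacency map; the `bob -> acme` edge is made up for the example. It assumes `depth` means "1 to depth hops, excluding the start node". That reading is an assumption about Semantica's semantics. In Cypher, the same idea is usually written as a variable-length pattern such as `(n)-[*1..2]->(m)`.

```python
from collections import deque
from typing import Dict, List, Set


def neighbors_within(adjacency: Dict[str, List[str]], start: str, depth: int) -> Set[str]:
    """Breadth-first search: nodes reachable from `start` in 1..depth hops."""
    seen = {start}
    found: Set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist >= depth:
            continue  # hop limit reached: do not expand further
        for nxt in adjacency.get(node, []):
            if nxt not in seen:  # visit each node once, even in cycles
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, dist + 1))
    return found


# Outgoing edges only, matching direction="out"; for direction="both",
# merge the outgoing and incoming adjacency lists first.
out_edges = {"alice": ["bob"], "bob": ["acme"], "acme": []}
print(sorted(neighbors_within(out_edges, "alice", 1)))  # → ['bob']
print(sorted(neighbors_within(out_edges, "alice", 2)))  # → ['acme', 'bob']
```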

### Shortest Path

Find the shortest path between two nodes.

```python
# Find shortest path
# Charlie is never connected to another node in this notebook, so expect
# "No path found" unless you add a relationship involving Charlie first.
path = neptune_store.shortest_path(
    start_node_id="alice",
    end_node_id="charlie",
    max_depth=5
)

if path:
    print("Path found!")
    print(f"  Length: {path.get('length')}")
    print(f"  Nodes: {len(path.get('nodes', []))}")
    print(f"  Relationships: {len(path.get('relationships', []))}")
else:
    print("No path found between nodes")
```
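Neptune-Specific Considerations notes that Neptune Database's OpenCypher has no `shortestPath()` function. One workaround built on variable-length paths enumerates paths up to `max_depth` and keeps the shortest. The hypothetical builder below shows that query shape. Whether Semantica's `shortest_path` uses this exact form, and whether every clause is accepted by your Neptune engine version, are assumptions.

```python
def build_shortest_path_query(max_depth: int = 5, directed: bool = False) -> str:
    """Variable-length path query that keeps one shortest path (illustrative)."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    arrow = "->" if directed else "-"
    return (
        f"MATCH p = (a)-[*1..{int(max_depth)}]{arrow}(b) "
        "WHERE id(a) = $start_id AND id(b) = $end_id "
        "RETURN p, length(p) AS hops ORDER BY hops LIMIT 1"
    )


print(build_shortest_path_query(max_depth=5))
```

This form enumerates every matching path up to `max_depth` before sorting, so its cost grows quickly on dense graphs. Keep `max_depth` small, or move large traversals to Neptune Analytics.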

## Step 6: Graph Statistics

Get comprehensive statistics about your graph.

```python
# Get graph statistics
stats = neptune_store.get_stats()

print("Graph Statistics:")
print(f"  Total nodes: {stats.get('node_count', 'N/A')}")
print(f"  Total relationships: {stats.get('relationship_count', 'N/A')}")

print("\nNode labels:")
for label, count in stats.get('label_counts', {}).items():
    print(f"  - {label}: {count}")

print("\nRelationship types:")
for rel_type, count in stats.get('relationship_type_counts', {}).items():
    print(f"  - {rel_type}: {count}")
```
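Per-label counts like `label_counts` can be derived from grouped rows. The sketch below folds rows of the shape a query such as `MATCH (n) RETURN labels(n) AS labels, count(*) AS total` might return. The query and the row shape are assumptions; this notebook does not say which queries `get_stats` actually runs. One consequence is worth knowing: a node with several labels is counted under each of them, so the label totals can add up to more than `node_count`.

```python
from collections import Counter
from typing import Dict, Iterable, List


def label_counts(rows: Iterable[Dict]) -> Dict[str, int]:
    """Fold (labels, total) rows into per-label totals.

    A node with several labels is counted once under each of its labels.
    """
    counts: Counter = Counter()
    for row in rows:
        for label in row["labels"]:
            counts[label] += row["total"]
    return dict(counts)


sample_rows: List[Dict] = [
    {"labels": ["Person"], "total": 3},
    {"labels": ["Person", "Employee"], "total": 1},
    {"labels": ["Company"], "total": 1},
]
print(label_counts(sample_rows))  # → {'Person': 4, 'Employee': 1, 'Company': 1}
```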

## Step 7: Connection Management

Always close connections when done to free resources.

```python
# Check connection status
status = neptune_store.get_status()
print(f"Connection status: {status}")

# Close the connection
neptune_store.close()
print("Connection closed")
```
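A context manager guarantees that `close()` runs even when a query raises. The hypothetical `open_store` helper below relies only on the `connect()` and `close()` methods used in this notebook. Whether `GraphStore` already supports `with` directly is not documented here, so it wraps the store instead. A small stand-in class keeps the example runnable without a cluster.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def open_store(store: Any) -> Iterator[Any]:
    """Connect on entry and always close on exit, even if the body raises."""
    store.connect()
    try:
        yield store
    finally:
        store.close()


class _DemoStore:
    """Stand-in exposing the same connect()/close() methods as the store above."""

    def connect(self) -> None:
        print("connect")

    def close(self) -> None:
        print("close")


with open_store(_DemoStore()) as store:
    print("querying")

# With a real store this might look like:
# with open_store(GraphStore(backend="neptune", ...)) as store:
#     store.execute_query("MATCH (n) RETURN count(n) AS total")
```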

## Neptune-Specific Considerations

### Native Element IDs

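Every Neptune node and relationship has a native identifier, `~id`. To choose a node's ID yourself, include an `id` key in `properties`; Semantica uses it as the element ID. If you omit `id`, a UUID is generated: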

```python
# Create a node with custom ID (include 'id' in properties)
node = neptune_store.create_node(
    labels=["Person"],
    properties={"id": "my-custom-id", "name": "Test"}
)

# Create a node with auto-generated UUID (omit 'id' from properties)
node = neptune_store.create_node(
    labels=["Person"],
    properties={"name": "Test"}
)

# The ID is used in id() function calls internally:
# MATCH (n) WHERE id(n) = 'my-custom-id' RETURN n
```
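The sketch below shows how a `create_node` call of this shape could be turned into a Cypher `CREATE` that sets `~id`. The custom `id` is taken out of the properties and a UUID is generated when it is missing. The hypothetical `build_create_node` helper is not Semantica's code. It also assumes Neptune accepts a query parameter as the `~id` value.

```python
import uuid
from typing import Any, Dict, List, Tuple


def build_create_node(
    labels: List[str], properties: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Turn a create_node-style call into a Cypher CREATE that sets `~id`."""
    props = dict(properties)  # copy so the caller's dict is not mutated
    node_id = props.pop("id", None)
    if node_id is None:
        node_id = str(uuid.uuid4())  # no custom ID given: generate a UUID
    label_expr = "".join(":`" + lab.replace("`", "``") + "`" for lab in labels)
    query = f"CREATE (n{label_expr} {{`~id`: $node_id}}) SET n += $props RETURN n"
    return query, {"node_id": str(node_id), "props": props}


query, params = build_create_node(["Person"], {"id": "my-custom-id", "name": "Test"})
print(query)   # → CREATE (n:`Person` {`~id`: $node_id}) SET n += $props RETURN n
print(params)  # → {'node_id': 'my-custom-id', 'props': {'name': 'Test'}}
```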

### OpenCypher Considerations

Amazon Neptune Database's OpenCypher implementation has some differences from Neo4j:

1. **No `shortestPath()` function**: Use variable-length path patterns or `allShortestPaths()`
2. **Labels syntax**: Use `labels(n)` function to retrieve node labels
3. **Property updates**: Use `SET n += {props}` for merge behavior

For the complete OpenCypher specification supported by Amazon Neptune Database, see the [AWS documentation](https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-opencypher.html).

### Amazon Neptune Analytics

For analytical (OLAP) workloads such as graph algorithms, aggregations, and large-scale traversals, consider [Amazon Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html). Neptune Analytics complements Neptune Database by providing optimized performance for analytical queries while Neptune Database is optimized for transactional (OLTP) workloads.

### Performance Tips

1. **Use batch operations** for creating multiple nodes/relationships
2. **Use parameters** in queries to enable query caching
3. **Limit result sets** with `LIMIT` clause

## Summary

This notebook covered the Amazon Neptune Graph Store integration:

- **IAM Authentication**: Secure AWS SigV4 signing
- **CRUD Operations**: Full node and relationship management
- **OpenCypher Queries**: Standard graph query language
- **Graph Analytics**: Neighbors and shortest path algorithms
- **Statistics & Monitoring**: Graph metrics and status

### Key Takeaways

- Neptune uses native `~id` for element identification
- IAM authentication is recommended for production
- Bolt protocol provides efficient binary query interface
- Semantica abstracts Neptune-specific syntax differences

### Next Steps

- [Graph Store (Neo4j/FalkorDB)](09_Graph_Store.ipynb) - Compare with other backends
- [Building Knowledge Graphs](07_Building_Knowledge_Graphs.ipynb) - Build production KGs
- [Graph Analytics](10_Graph_Analytics.ipynb) - Advanced analytics algorithms