An intelligent data processing system that creates a knowledge graph of data relationships using vector embeddings and AI analysis.
SmartMemory automatically processes emails to:
- Generate vector embeddings using Google Gemini AI
- Store emails in Supabase PostgreSQL with pgvector
- Find semantically similar emails using vector similarity search
- Analyze relationships between emails using an LLM
- Build knowledge graphs in Neo4j showing email connections
Email Input
↓
Gemini Embedding (1536-dim vectors)
↓
Supabase Storage (PostgreSQL + pgvector)
↓
Vector Similarity Search
↓
LLM Relationship Analysis
↓
Neo4j Knowledge Graph
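
The flow above can be sketched as one function that wires the stages together in order. This is only an illustrative outline: the stage callables (`embed`, `store`, `find_similar`, `analyze`, `link`) and the `run_pipeline` name are hypothetical placeholders supplied by the caller, not the functions defined in `email_processor.py`.

```python
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def run_pipeline(
    title: str,
    body: str,
    embed: Callable[[str], Vector],
    store: Callable[[str, str, Vector], int],
    find_similar: Callable[[Vector], list],
    analyze: Callable[[int, list], Optional[dict]],
    link: Callable[[int, int, str], None],
) -> Optional[dict]:
    """Run the stages in diagram order; every callable is a hypothetical stand-in."""
    vector = embed(f"{title}\n\n{body}")       # embedding stage
    email_id = store(title, body, vector)       # storage stage, returns the new row ID
    candidates = find_similar(vector)           # vector similarity search
    if not candidates:
        return None                             # nothing to compare against
    relation = analyze(email_id, candidates)    # LLM relationship analysis
    if relation and relation.get("relationship_found"):
        # Graph stage: add an edge from the new email to the related one.
        link(email_id, relation["target_email_id"], relation["relationship_type"])
    return relation
```

Passing the stages in as arguments keeps the sketch free of network calls, so each stage can be swapped for a stub when testing the control flow.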
| Component | Technology | Purpose |
|---|---|---|
| AI/ML | Google Gemini API | Vector embeddings & relationship analysis |
| Database | Supabase PostgreSQL | Email storage with pgvector extension |
| Graph DB | Neo4j AuraDB | Knowledge graph for email relationships |
| Backend | Python 3.11+ | Main processing logic |
| Environment | python-dotenv | Configuration management |
SmartMemory/
├── .env                          # Environment variables (API keys, DB credentials)
├── .gitignore                    # Git ignore rules
├── config.py                     # Configuration management
├── email_processor.py            # Main email processing pipeline
├── gemini_service.py             # Gemini AI embedding & analysis
├── supabase_client.py            # Supabase database operations
├── neo4j_client.py               # Neo4j graph database operations
├── main.py                       # Application entry point
├── test_connections.py           # Connection testing utility
├── setup_supabase_function.sql   # SQL function for vector similarity
└── README.md                     # This file
- Python 3.11+
- Google Gemini API key
- Supabase account with PostgreSQL + pgvector
- Neo4j AuraDB instance
git clone https://github.com/santhosh-005/Smart_Memory.git
cd Smart_Memory
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # macOS/Linux
# Install dependencies
pip install google-generativeai supabase neo4j python-dotenv

Create a `.env` file with your credentials:
# Supabase Configuration
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key
# Neo4j Configuration
NEO4J_URI=your_neo4j_uri
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password

- Create `emails` table:
CREATE TABLE IF NOT EXISTS emails (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now(),
embedding vector(1536)
);

- Run the SQL from `setup_supabase_function.sql` in your Supabase SQL Editor
- Neo4j: no additional setup required; the system creates nodes and relationships automatically
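
Before running anything, it can help to confirm that every setting from the `.env` template is actually present. The sketch below is one possible pre-flight check, not part of the project: `REQUIRED_VARS` and `missing_vars` are hypothetical names, and the variable list is copied from the template above. It assumes `load_dotenv()` from python-dotenv has already been called so the values are in `os.environ`.

```python
import os
from typing import Mapping

# Variable names taken from the .env template; the tuple itself is an assumption.
REQUIRED_VARS = (
    "NEXT_PUBLIC_SUPABASE_URL",
    "NEXT_PUBLIC_SUPABASE_ANON_KEY",
    "GEMINI_API_KEY",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank, in template order."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_vars()` at startup and stopping with a clear message when the list is non-empty tends to be easier to debug than a connection error raised later by one of the clients.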
# Test connections
python test_connections.py
# Process sample emails
python main.py

- Email Input: Title and body text
- Content Preparation: Combine title + body for comprehensive context
- Generate Embedding: Use Gemini with `RETRIEVAL_DOCUMENT` task type
- Store in Supabase: Save email with 1536-dimension vector
- Create Neo4j Node: Add email node to knowledge graph
- Find Similar Emails: Use pgvector similarity search via RPC function
- LLM Analysis: Analyze relationships using Gemini Pro
- Update Graph: Create relationship edges in Neo4j
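
To make the similarity step concrete, here is a pure-Python sketch of ranking stored vectors by cosine similarity, then applying a minimum score and a result cap. In the project this work happens inside PostgreSQL through pgvector and the RPC function, so the code below is only an illustration of the idea. The `top_matches` helper is hypothetical, and its defaults are borrowed from the `SIMILARITY_THRESHOLD` and `MATCH_COUNT` settings in `config.py`.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(
    query: Sequence[float],
    stored: Mapping[int, Sequence[float]],
    threshold: float = 0.7,
    match_count: int = 5,
) -> list[tuple[int, float]]:
    """Return (email_id, score) pairs scoring at least `threshold`, best first."""
    scored = [(email_id, cosine_similarity(query, vec)) for email_id, vec in stored.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:match_count]
```

A linear scan like this is fine for a handful of vectors; the point of pgvector is to do the same ranking in the database, where an index can avoid comparing against every row.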
- `REPLY_TO`: Direct email replies
- `FOLLOW_UP_ON`: Follow-up discussions
- `REFERENCES_SAME_TOPIC`: Related topic discussions
- `RELATED_TO`: General relationships
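
Because an ordinary Cypher parameter such as `$type` cannot stand in for a relationship type, the type name usually ends up formatted into the query text. A minimal sketch of guarding that step with a whitelist follows. The `RelationshipType` enum and `build_relationship_query` function are hypothetical, and the `Email` label and `id` property are assumptions about the graph schema rather than something taken from `neo4j_client.py`.

```python
from enum import Enum


class RelationshipType(str, Enum):
    """The four relationship types listed above."""
    REPLY_TO = "REPLY_TO"
    FOLLOW_UP_ON = "FOLLOW_UP_ON"
    REFERENCES_SAME_TOPIC = "REFERENCES_SAME_TOPIC"
    RELATED_TO = "RELATED_TO"


def build_relationship_query(relationship_type: str) -> str:
    """Return a Cypher MERGE statement for a validated relationship type."""
    # Enum lookup raises ValueError for anything outside the whitelist,
    # so untrusted LLM output never reaches the query text.
    rel = RelationshipType(relationship_type)
    # Assumed schema: nodes labelled Email with an integer id property.
    return (
        "MATCH (a:Email {id: $source_id}), (b:Email {id: $target_id}) "
        f"MERGE (a)-[:{rel.value}]->(b)"
    )
```

The node IDs stay as real query parameters (`$source_id`, `$target_id`), so only the whitelisted type name is ever interpolated.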
from email_processor import process_email
# Process an email
process_email(
title="Project Update - Q4 Status",
body="Hi team, here's the latest update on our Q4 project..."
)

Output:
--- Processing new email: 'Project Update - Q4 Status' ---
Storing email in Supabase...
Stored successfully. New email ID: 123
Created initial node in Neo4j for email 123
Searching for similar emails...
Found 2 similar emails.
Querying LLM for relationship analysis...
LLM Response: {'relationship_found': True, 'target_email_id': 45, 'relationship_type': 'FOLLOW_UP_ON'}
Relationship found! Creating edge in Neo4j: (123)-[FOLLOW_UP_ON]->(45)
Graph updated.
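
The `LLM Response` line in the output shows the three fields the pipeline relies on. LLM output is not guaranteed to be well formed, so one option is to validate it before touching the graph. The sketch below assumes the model replies with JSON text; `parse_relationship` and `ALLOWED_TYPES` are hypothetical names, and how the project actually parses the response is not shown here.

```python
import json
from typing import Optional

ALLOWED_TYPES = {"REPLY_TO", "FOLLOW_UP_ON", "REFERENCES_SAME_TOPIC", "RELATED_TO"}


def parse_relationship(raw: str, candidate_ids: set[int]) -> Optional[tuple[int, str]]:
    """Return (target_email_id, relationship_type), or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("relationship_found") is not True:
        return None
    target = data.get("target_email_id")
    rel_type = data.get("relationship_type")
    # Reject IDs the similarity search never returned and types outside the known set.
    if not isinstance(target, int) or target not in candidate_ids:
        return None
    if not isinstance(rel_type, str) or rel_type not in ALLOWED_TYPES:
        return None
    return target, rel_type
```

Checking the target against the candidate IDs guards against the model naming an email that was never part of the comparison.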
Key settings in config.py:
EMBEDDING_DIMENSIONS = 1536 # Gemini embedding size
SIMILARITY_THRESHOLD = 0.7 # Minimum similarity for relationships
MATCH_COUNT = 5 # Max similar emails to analyze

# Test all connections
python test_connections.py
# Process sample emails
python main.py

- Embedding Generation: ~1-2 seconds per email
- Vector Search: Milliseconds with pgvector indexing
- LLM Analysis: ~2-3 seconds for relationship detection
- Graph Updates: Near-instantaneous
- Gemini API quota or rate-limit errors: wait for the quota to reset or upgrade to a paid plan; check usage at https://ai.dev/usage?tab=rate-limit
- Supabase connection errors: verify the URL and API key in `.env`; check that the pgvector extension is enabled
- Neo4j connection errors: verify the URI and credentials in `.env`; check that the AuraDB instance is running
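
For transient failures such as a brief rate limit, retrying with exponential backoff is a common complement to the checks above. The helper below is a generic sketch and not something the project ships: `call_with_backoff` is a hypothetical name, and it retries on any exception for simplicity, whereas real code would likely narrow this to the client library's rate-limit error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, waiting base_delay * 2**attempt seconds between failed attempts."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error unchanged
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be >= 0")
```

The `sleep` parameter exists so the delay can be replaced with a stub in tests. Backoff will not help with an exhausted daily quota, which still requires waiting for the reset or changing plans.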
- Real-time Processing: WebSocket integration for live email processing
- Advanced Relationships: Thread detection and conversation mapping
- Search Interface: Query interface for finding related emails
- Analytics Dashboard: Visualization of email networks
- Batch Processing: Bulk email import functionality
This project is licensed under the MIT License.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section above
- Review API documentation for Gemini, Supabase, and Neo4j
Built with ❤️ for intelligent email management