diff --git a/docs/CLAUDE.md b/CLAUDE.md similarity index 100% rename from docs/CLAUDE.md rename to CLAUDE.md diff --git a/docs/agent-examples.md b/docs/agent-examples.md new file mode 100644 index 0000000..d9ccd5a --- /dev/null +++ b/docs/agent-examples.md @@ -0,0 +1,318 @@ +# Agent Examples + +This section provides comprehensive working examples that demonstrate real-world usage patterns of the Redis Agent Memory Server. Each example showcases different aspects of memory management, from basic conversation storage to advanced memory editing workflows. + +## 🧳 Travel Agent + +**File**: [`examples/travel_agent.py`](https://github.com/redis/agent-memory-server/blob/main/examples/travel_agent.py) + +A comprehensive travel assistant that demonstrates the most complete integration patterns. + +### Key Features + +- **Automatic Tool Discovery**: Uses `MemoryAPIClient.get_all_memory_tool_schemas()` to automatically discover and integrate all available memory tools +- **Unified Tool Resolution**: Leverages `client.resolve_tool_call()` to handle all memory tool calls uniformly across different LLM providers +- **Working Memory Management**: Session-based conversation state and structured memory storage +- **Long-term Memory**: Persistent memory storage and semantic search capabilities +- **Optional Web Search**: Cached web search using Tavily API with Redis caching + +### Available Tools + +The travel agent automatically discovers and uses all memory tools: + +1. **search_memory** - Search through previous conversations and stored information +2. **get_working_memory** - Check current session state, stored memories, and data +3. **add_memory_to_working_memory** - Store important information as structured memories +4. **update_working_memory_data** - Store/update session-specific data like trip plans +5. **web_search** (optional) - Search the internet for current travel information + +### Usage Examples + +```bash +# Basic interactive usage +cd examples +python travel_agent.py + +# Automated demo showing capabilities +python travel_agent.py --demo + +# With custom configuration +python travel_agent.py --session-id my_trip --user-id john_doe --memory-server-url http://localhost:8001 +``` + +### Environment Setup + +```bash +# Required +export OPENAI_API_KEY="your-openai-key" + +# Optional (for web search) +export TAVILY_API_KEY="your-tavily-key" +export REDIS_URL="redis://localhost:6379" +``` + +### Key Implementation Patterns + +```python +# Tool auto-discovery +memory_tools = MemoryAPIClient.get_all_memory_tool_schemas() + +# Unified tool resolution for any provider +result = await client.resolve_tool_call( + tool_call=provider_tool_call, + session_id=session_id +) + +if result["success"]: + print(result["formatted_response"]) +``` + +## 🧠 Memory Prompt Agent + +**File**: [`examples/memory_prompt_agent.py`](https://github.com/redis/agent-memory-server/blob/main/examples/memory_prompt_agent.py) + +Demonstrates the simplified memory prompt feature for context-aware conversations without manual tool management. + +### Core Concept + +Uses `client.memory_prompt()` to automatically retrieve relevant memories and enrich prompts with contextual information. + +### How It Works + +1. **Store Messages**: All conversation messages stored in working memory +2. **Memory Prompt**: `memory_prompt()` retrieves relevant context automatically +3. **Enriched Context**: Memory context combined with system prompt +4. 
**LLM Generation**: Enhanced context sent to LLM for personalized responses + +### Usage Examples + +```bash +cd examples +python memory_prompt_agent.py + +# With custom session +python memory_prompt_agent.py --session-id my_session --user-id jane_doe +``` + +### Key Implementation Pattern + +```python +# Automatic memory retrieval and context enrichment +context = await client.memory_prompt( + query=user_message, + session_id=session_id, + long_term_search={ + "text": user_message, + "limit": 5, + "user_id": user_id + } +) + +# Enhanced prompt with memory context +response = await openai_client.chat.completions.create( + model="gpt-4o", + messages=context.messages +) +``` + +## ✏️ Memory Editing Agent + +**File**: [`examples/memory_editing_agent.py`](https://github.com/redis/agent-memory-server/blob/main/examples/memory_editing_agent.py) + +Demonstrates comprehensive memory editing capabilities through natural conversation patterns. + +### Core Features + +- **Memory Editing Workflow**: Complete lifecycle of creating, searching, editing, and deleting memories +- **All Memory Tools**: Uses all available memory management tools including editing capabilities +- **Realistic Scenarios**: Common patterns like corrections, updates, and information cleanup +- **Interactive Demo**: Both automated demo and interactive modes + +### Memory Operations Demonstrated + +1. **search_memory** - Find existing memories using natural language +2. **get_long_term_memory** - Retrieve specific memories by ID +3. **add_memory_to_working_memory** - Store new information +4. **edit_long_term_memory** - Update existing memories +5. **delete_long_term_memories** - Remove outdated information +6. **get_working_memory** - Check current session context + +### Common Editing Scenarios + +```python +# Correction scenario +"Actually, I work at Microsoft, not Google" +# → Search for job memory, edit company name + +# Update scenario +"I got promoted to Senior Engineer" +# → Find job memory, update title and add promotion date + +# Preference change +"I prefer tea over coffee now" +# → Search beverage preferences, update from coffee to tea + +# Information cleanup +"Delete that old job information" +# → Search and remove outdated employment data +``` + +### Usage Examples + +```bash +cd examples + +# Interactive mode (explore memory editing) +python memory_editing_agent.py + +# Automated demo (see complete workflow) +python memory_editing_agent.py --demo + +# Custom configuration +python memory_editing_agent.py --session-id alice_session --user-id alice +``` + +### Demo Conversation Flow + +The automated demo shows a realistic conversation: + +1. **Initial Information**: User shares profile (name, job, preferences) +2. **Corrections**: User corrects information (job company change) +3. **Updates**: User provides updates (promotion, new title) +4. **Multiple Changes**: User updates location and preferences +5. **Information Retrieval**: User asks what agent remembers +6. **Ongoing Updates**: Continued information updates +7. **Memory Management**: Specific memory operations (show/delete) + +## 🏫 AI Tutor + +**File**: [`examples/ai_tutor.py`](https://github.com/redis/agent-memory-server/blob/main/examples/ai_tutor.py) + +A functional tutoring system that demonstrates episodic memory for learning tracking and semantic memory for concept management. 
+
+### Core Features
+
+- **Quiz Management**: Runs interactive quizzes and stores results
+- **Learning Tracking**: Stores quiz results as episodic memories with timestamps
+- **Concept Tracking**: Tracks weak concepts as semantic memories
+- **Progress Analysis**: Provides summaries and personalized practice suggestions
+
+### Memory Patterns Used
+
+```python
+# Episodic: Per-question results with event dates
+{
+    "text": "User answered 'photosynthesis' question incorrectly",
+    "memory_type": "episodic",
+    "event_date": "2024-01-15T10:30:00Z",
+    "topics": ["quiz", "biology", "photosynthesis"]
+}
+
+# Semantic: Weak concepts for targeted practice
+{
+    "text": "User struggles with photosynthesis concepts",
+    "memory_type": "semantic",
+    "topics": ["weak_concept", "biology", "photosynthesis"]
+}
+```
+
+### Usage Examples
+
+```bash
+cd examples
+
+# Interactive tutoring session
+python ai_tutor.py
+
+# Demo with sample quiz flow
+python ai_tutor.py --demo
+
+# Custom student session
+python ai_tutor.py --user-id student123 --session-id bio_course
+```
+
+### Key Commands
+
+- **Practice**: Start a quiz on specific topics
+- **Summary**: Get learning progress summary
+- **Practice-next**: Get personalized practice recommendations based on weak areas
+
+## Getting Started with Examples
+
+### 1. Prerequisites
+
+```bash
+# Install dependencies
+cd /path/to/agent-memory-server
+uv sync --all-extras
+
+# Start memory server
+uv run agent-memory server
+
+# Set required API keys
+export OPENAI_API_KEY="your-openai-key"
+```
+
+### 2. Run Examples
+
+```bash
+cd examples
+
+# Start with the travel agent (most comprehensive)
+python travel_agent.py --demo
+
+# Try memory editing workflows
+python memory_editing_agent.py --demo
+
+# Explore simplified memory prompts
+python memory_prompt_agent.py
+
+# Experience learning tracking
+python ai_tutor.py --demo
+```
+
+### 3. Customize and Extend
+
+Each example is designed to be:
+
+- **Self-contained**: Runs independently with minimal setup
+- **Configurable**: Supports custom sessions, users, and server URLs
+- **Educational**: Well-commented code showing best practices
+- **Production-ready**: Robust error handling and logging
+
+### 4. Implementation Patterns
+
+Key patterns demonstrated across examples:
+
+```python
+# Memory client setup
+client = MemoryAPIClient(
+    base_url="http://localhost:8000",
+    default_namespace=namespace,
+    user_id=user_id
+)
+
+# Tool integration
+tools = MemoryAPIClient.get_all_memory_tool_schemas()
+response = await openai_client.chat.completions.create(
+    model="gpt-4o",
+    messages=messages,
+    tools=tools
+)
+
+# Tool resolution (tool_calls may be None when the model makes no calls)
+for tool_call in response.choices[0].message.tool_calls or []:
+    result = await client.resolve_tool_call(
+        tool_call=tool_call,
+        session_id=session_id
+    )
+```
+
+## Next Steps
+
+- **Start with Travel Agent**: Most comprehensive example showing all features
+- **Explore Memory Editing**: Learn advanced memory management patterns
+- **Study Code Patterns**: Each example demonstrates different architectural approaches
+- **Build Your Own**: Use examples as templates for your specific use case (a minimal end-to-end sketch follows in the appendix below)
+
+All examples include detailed inline documentation and can serve as starting points for building production memory-enhanced AI applications.
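+
+## 🧩 Appendix: End-to-End Agent Loop
+
+The sketch below ties the client setup, tool integration, and tool resolution patterns together into one complete agent turn. It is illustrative rather than canonical: the `MemoryAPIClient` constructor and the `resolve_tool_call()` result shape mirror the snippets on this page, so verify both against your installed `agent-memory-client` version.
+
+```python
+import asyncio
+
+import openai
+from agent_memory_client import MemoryAPIClient
+
+
+async def agent_turn(user_message: str, session_id: str) -> str:
+    memory_client = MemoryAPIClient(base_url="http://localhost:8000")
+    openai_client = openai.AsyncOpenAI()
+
+    messages = [{"role": "user", "content": user_message}]
+    tools = MemoryAPIClient.get_all_memory_tool_schemas()
+
+    # First pass: let the model decide whether to call memory tools
+    response = await openai_client.chat.completions.create(
+        model="gpt-4o", messages=messages, tools=tools
+    )
+    choice = response.choices[0].message
+
+    if choice.tool_calls:
+        messages.append(choice)
+        for tool_call in choice.tool_calls:
+            # Result shape ("success", "formatted_response", "error")
+            # follows the examples earlier on this page
+            result = await memory_client.resolve_tool_call(
+                tool_call=tool_call, session_id=session_id
+            )
+            content = (
+                result["formatted_response"]
+                if result["success"]
+                else f"Error: {result['error']}"
+            )
+            messages.append(
+                {"role": "tool", "tool_call_id": tool_call.id, "content": content}
+            )
+
+        # Second pass: answer using the tool results
+        response = await openai_client.chat.completions.create(
+            model="gpt-4o", messages=messages
+        )
+        choice = response.choices[0].message
+
+    return choice.content or ""
+
+
+if __name__ == "__main__":
+    answer = asyncio.run(agent_turn("What do you remember about me?", "demo-session"))
+    print(answer)
+```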
diff --git a/docs/configuration.md b/docs/configuration.md index 99a0af5..c69601b 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1,44 +1,227 @@ # Configuration -You can configure the MCP and REST servers and task worker using environment -variables. See the file `config.py` for all the available settings. +The Redis Agent Memory Server can be configured via environment variables or YAML configuration files. All settings have sensible defaults for development, but you'll want to customize them for production. -The names of the settings map directly to an environment variable, so for -example, you can set the `openai_api_key` setting with the `OPENAI_API_KEY` -environment variable. +## Configuration Methods -## Running the Background Task Worker +### Environment Variables +Setting names map directly to environment variables in UPPERCASE: +```bash +export REDIS_URL=redis://localhost:6379 +export OPENAI_API_KEY=your-key-here +export GENERATION_MODEL=gpt-4o +``` -The Redis Memory Server uses Docket for background task management. You can run a worker instance like this: +### YAML Configuration File +Set `REDIS_MEMORY_CONFIG` to point to a YAML file: +```bash +export REDIS_MEMORY_CONFIG=config.yaml +``` + +Example `config.yaml`: +```yaml +redis_url: redis://localhost:6379 +generation_model: gpt-4o +embedding_model: text-embedding-3-small +enable_topic_extraction: true +log_level: INFO +``` + +**Note**: Environment variables override YAML file settings. +## Core Settings + +### Redis Connection ```bash -uv run agent-memory task-worker +REDIS_URL=redis://localhost:6379 # Redis connection string ``` -You can customize the concurrency and redelivery timeout: +### AI Model Configuration +```bash +# Generation models for LLM tasks +GENERATION_MODEL=gpt-4o # Primary model (default: gpt-4o) +SLOW_MODEL=gpt-4o # Complex tasks (default: gpt-4o) +FAST_MODEL=gpt-4o-mini # Quick tasks (default: gpt-4o-mini) + +# Embedding model for vector search +EMBEDDING_MODEL=text-embedding-3-small # OpenAI embeddings (default) + +# API Keys +OPENAI_API_KEY=your-openai-key +ANTHROPIC_API_KEY=your-anthropic-key + +# Optional: Custom API endpoints +OPENAI_API_BASE=https://api.openai.com/v1 +ANTHROPIC_API_BASE=https://api.anthropic.com +``` +### Server Ports ```bash -uv run agent-memory task-worker --concurrency 5 --redelivery-timeout 60 +PORT=8000 # REST API port (default: 8000) +MCP_PORT=9000 # MCP server port (default: 9000) +``` + +## Memory System Configuration + +### Long-Term Memory +```bash +LONG_TERM_MEMORY=true # Enable persistent memory (default: true) +ENABLE_DISCRETE_MEMORY_EXTRACTION=true # Extract structured memories from conversations (default: true) +INDEX_ALL_MESSAGES_IN_LONG_TERM_MEMORY=false # Index every message (default: false) +``` + +### Vector Store Configuration +```bash +# Vector store factory (advanced) +VECTORSTORE_FACTORY=agent_memory_server.vectorstore_factory.create_redis_vectorstore + +# RedisVL Settings (used by default Redis factory) +REDISVL_INDEX_NAME=memory_records # Index name (default: memory_records) +REDISVL_DISTANCE_METRIC=COSINE # Distance metric (default: COSINE) +REDISVL_VECTOR_DIMENSIONS=1536 # Vector dimensions (default: 1536) +REDISVL_INDEX_PREFIX=memory_idx # Index prefix (default: memory_idx) +REDISVL_INDEXING_ALGORITHM=HNSW # Indexing algorithm (default: HNSW) ``` -## Memory Compaction +### Working Memory +```bash +SUMMARIZATION_THRESHOLD=0.7 # Fraction of context window that triggers summarization (default: 0.7) +``` -The memory compaction functionality 
optimizes storage by merging duplicate and semantically similar memories. This improves retrieval quality and reduces storage costs. +## AI Features Configuration -### Running Compaction +### Topic Modeling +```bash +ENABLE_TOPIC_EXTRACTION=true # Extract topics from memories (default: true) +TOPIC_MODEL_SOURCE=LLM # Options: LLM, BERTopic (default: LLM) +TOPIC_MODEL=gpt-4o-mini # Model for topic extraction (default: gpt-4o-mini) +TOP_K_TOPICS=3 # Maximum topics per memory (default: 3) +``` -Memory compaction is available as a task function in `agent_memory_server.long_term_memory.compact_long_term_memories`. You can trigger it manually -by running the `agent-memory schedule-task` command: +### Entity Recognition +```bash +ENABLE_NER=true # Extract entities from text (default: true) +NER_MODEL=dbmdz/bert-large-cased-finetuned-conll03-english # NER model (default) +``` +### Query Optimization ```bash -uv run agent-memory schedule-task "agent_memory_server.long_term_memory.compact_long_term_memories" +MIN_OPTIMIZED_QUERY_LENGTH=2 # Minimum query length to optimize (default: 2) + +# Custom query optimization prompt template +QUERY_OPTIMIZATION_PROMPT_TEMPLATE="Transform this query for semantic search..." ``` -### Key Features +## Memory Lifecycle -- **Hash-based Deduplication**: Identifies and merges exact duplicate memories using content hashing -- **Semantic Deduplication**: Finds and merges memories with similar meaning using vector search -- **LLM-powered Merging**: Uses language models to intelligently combine memories +### Forgetting Configuration +```bash +FORGETTING_ENABLED=false # Enable automatic forgetting (default: false) +FORGETTING_EVERY_MINUTES=60 # Run forgetting every N minutes (default: 60) +FORGETTING_MAX_AGE_DAYS=30 # Delete memories older than N days +FORGETTING_MAX_INACTIVE_DAYS=7 # Delete memories inactive for N days +FORGETTING_BUDGET_KEEP_TOP_N=1000 # Keep only top N most recent memories +``` + +## Background Tasks + +### Docket Configuration +```bash +USE_DOCKET=true # Enable background task processing (default: true) +DOCKET_NAME=memory-server # Docket instance name (default: memory-server) +``` + +## Application Settings + +### Logging +```bash +LOG_LEVEL=INFO # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO) +``` + +### MCP Defaults +```bash +DEFAULT_MCP_USER_ID=default-user # Default user ID for MCP requests +DEFAULT_MCP_NAMESPACE=default # Default namespace for MCP requests +``` + +## Running the Background Task Worker + +The Redis Memory Server uses Docket for background task management. 
You can run a worker instance like this: + +```bash +uv run agent-memory task-worker +``` + +You can customize the concurrency and redelivery timeout: + +```bash +uv run agent-memory task-worker --concurrency 5 --redelivery-timeout 60 +``` + +## Supported Models + +### Generation Models (OpenAI) +- `gpt-4o` - Latest GPT-4 Optimized (recommended) +- `gpt-4o-mini` - Faster, smaller GPT-4 (good for fast_model) +- `gpt-4` - Previous GPT-4 version +- `gpt-3.5-turbo` - Older, faster model +- `o1` - OpenAI o1 reasoning model +- `o1-mini` - Smaller o1 model +- `o3-mini` - OpenAI o3 model + +### Generation Models (Anthropic) +- `claude-3-7-sonnet-latest` - Latest Claude 3.7 Sonnet (recommended) +- `claude-3-5-sonnet-latest` - Claude 3.5 Sonnet +- `claude-3-5-haiku-latest` - Fast Claude 3.5 Haiku +- `claude-3-opus-latest` - Most capable Claude model +- Version-specific models also supported (e.g., `claude-3-5-sonnet-20241022`) + +### Embedding Models (OpenAI only) +- `text-embedding-3-small` - 1536 dimensions (recommended) +- `text-embedding-3-large` - 3072 dimensions (higher accuracy) +- `text-embedding-ada-002` - Legacy model (1536 dimensions) + +## Configuration Examples + +### Development Setup +```yaml +# config-dev.yaml +redis_url: redis://localhost:6379 +generation_model: gpt-4o-mini # Faster for development +embedding_model: text-embedding-3-small +log_level: DEBUG +disable_auth: true +enable_topic_extraction: false # Skip AI features for faster startup +enable_ner: false +``` + +### Production Setup +```yaml +# config-prod.yaml +redis_url: redis://prod-redis:6379 +generation_model: gpt-4o +embedding_model: text-embedding-3-large +log_level: INFO +auth_mode: oauth2 +oauth2_issuer_url: https://your-auth.com +oauth2_audience: https://your-api.com +enable_topic_extraction: true +enable_ner: true +forgetting_enabled: true +forgetting_max_age_days: 90 +``` + +### High-Performance Setup +```yaml +# config-performance.yaml +redis_url: redis://redis-cluster:6379 +fast_model: gpt-4o-mini +slow_model: gpt-4o +redisvl_indexing_algorithm: HNSW +redisvl_vector_dimensions: 1536 +use_docket: true +summarization_threshold: 0.8 # Less frequent summarization +``` ## Running Migrations diff --git a/docs/index.md b/docs/index.md index bb436d5..3aed132 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,37 +6,37 @@ Transform your AI agents from goldfish 🐠 into elephants 🐘 with Redis-power
-- :rocket:{ .lg .middle } **Quick Start** +- 🚀 **Quick Start** --- Get up and running in 5 minutes with our step-by-step guide - [:octicons-arrow-right-24: Quick Start Guide](quick-start.md) + [Quick Start Guide →](quick-start.md) -- :brain:{ .lg .middle } **Use Cases** +- 🧠 **Use Cases** --- See real-world examples across industries and applications - [:octicons-arrow-right-24: Explore Use Cases](use-cases.md) + [Explore Use Cases →](use-cases.md) -- :material-sdk:{ .lg .middle } **Python SDK** +- 🐍 **Python SDK** --- Easy integration with tool abstractions for OpenAI and Anthropic - [:octicons-arrow-right-24: SDK Documentation](python-sdk.md) + [SDK Documentation →](python-sdk.md) -- :sparkles:{ .lg .middle } **New Features** +- ✨ **New Features** --- Advanced features in v0.10.0: query optimization, memory editing, and more - [:octicons-arrow-right-24: Advanced Features](query-optimization.md) + [Advanced Features →](query-optimization.md)
@@ -134,7 +134,7 @@ Ready to give your AI agents perfect memory? Start with our quick tutorial to understand the basics and see immediate results. -[Quick Start Guide :material-rocket-launch:](quick-start.md){ .md-button .md-button--primary } +[🚀 Quick Start Guide](quick-start.md){ .md-button .md-button--primary }
@@ -142,7 +142,7 @@ Start with our quick tutorial to understand the basics and see immediate results Jump into the API documentation and start building with REST or MCP interfaces. -[API Documentation :material-api:](api.md){ .md-button } +[📚 API Documentation](api.md){ .md-button }
@@ -153,46 +153,46 @@ Jump into the API documentation and start building with REST or MCP interfaces.
-- :brain:{ .lg .middle } **Query Optimization** +- 🧠 **Query Optimization** --- AI-powered query refinement with configurable models for better search accuracy - [:octicons-arrow-right-24: Learn More](query-optimization.md) + [Learn More →](query-optimization.md) -- :link:{ .lg .middle } **Contextual Grounding** +- 🔗 **Contextual Grounding** --- Resolve pronouns and references in extracted memories for clearer context - [:octicons-arrow-right-24: Learn More](contextual-grounding.md) + [Learn More →](contextual-grounding.md) -- :pencil2:{ .lg .middle } **Memory Editing** +- ✏️ **Memory Editing** --- Update and correct existing memories through REST API and MCP tools - [:octicons-arrow-right-24: Learn More](memory-editing.md) + [Learn More →](memory-editing.md) -- :clock1:{ .lg .middle } **Recency Boost** +- 🕐 **Recency Boost** --- Time-aware memory ranking that surfaces relevant recent information - [:octicons-arrow-right-24: Learn More](recency-boost.md) + [Learn More →](recency-boost.md)
## Community & Support -- **:material-github: Source Code**: [GitHub Repository](https://github.com/redis/agent-memory-server) -- **:material-docker: Docker Images**: [Docker Hub](https://hub.docker.com/r/andrewbrookins510/agent-memory-server) -- **:material-bug: Issues**: [Report Issues](https://github.com/redis/agent-memory-server/issues) -- **:material-book-open: Examples**: [Complete Examples](https://github.com/redis/agent-memory-server/tree/main/examples) +- **💻 Source Code**: [GitHub Repository](https://github.com/redis/agent-memory-server) +- **🐳 Docker Images**: [Docker Hub](https://hub.docker.com/r/andrewbrookins510/agent-memory-server) +- **🐛 Issues**: [Report Issues](https://github.com/redis/agent-memory-server/issues) +- **📖 Examples**: [Complete Examples](https://github.com/redis/agent-memory-server/tree/main/examples) --- diff --git a/docs/memory-integration-patterns.md b/docs/memory-integration-patterns.md index f2793f2..0bbf53b 100644 --- a/docs/memory-integration-patterns.md +++ b/docs/memory-integration-patterns.md @@ -27,7 +27,7 @@ memory_client = MemoryAPIClient(base_url="http://localhost:8000") openai_client = openai.AsyncOpenAI() # Get memory tools for the LLM -memory_tools = memory_client.get_openai_tool_schemas() +memory_tools = MemoryAPIClient.get_all_memory_tool_schemas() # Give LLM access to memory tools response = await openai_client.chat.completions.create( @@ -70,7 +70,7 @@ class LLMMemoryAgent: }) # Get memory tools - tools = self.memory_client.get_openai_tool_schemas() + tools = MemoryAPIClient.get_all_memory_tool_schemas() # Generate response with memory tools response = await self.openai_client.chat.completions.create( @@ -690,7 +690,7 @@ class SmartChatAgent: async def chat(self, user_message: str, user_id: str, session_id: str) -> str: # Get memory tools - tools = self.memory_client.get_openai_tool_schemas() + tools = MemoryAPIClient.get_all_memory_tool_schemas() # LLM-driven: Let LLM use memory tools response = await self.openai_client.chat.completions.create( diff --git a/docs/memory-lifecycle.md b/docs/memory-lifecycle.md index f08a0bb..3905834 100644 --- a/docs/memory-lifecycle.md +++ b/docs/memory-lifecycle.md @@ -13,6 +13,88 @@ Memory lifecycle in the system follows these stages: 5. **Forgetting** - Memories are deleted based on configurable policies 6. **Compaction** - Background processes optimize storage and indexes +## Memory Creation Patterns + +The memory server is designed for **LLM-driven memory management**, where AI agents make intelligent decisions about what to remember and when. There are three primary patterns for creating long-term memories: + +### 1. Automatic Background Extraction +The server continuously analyzes conversation messages using an LLM to automatically extract important facts: + +```python +# Conversations are analyzed in the background +working_memory = WorkingMemory( + session_id="user_session", + messages=[ + {"role": "user", "content": "My name is Sarah, I'm a data scientist at Google"}, + {"role": "assistant", "content": "Nice to meet you Sarah! How long have you been at Google?"}, + {"role": "user", "content": "About 2 years now. 
I work primarily with machine learning models"} + ] +) + +# Server automatically extracts and creates: +# - "User's name is Sarah, works as data scientist at Google for 2 years" +# - "Sarah specializes in machine learning models" +``` + +**Benefits**: +- Zero extra API calls required +- No LLM token usage from your application +- Continuous learning from natural conversations +- Handles implicit information extraction + +### 2. LLM-Optimized Batch Storage +Your LLM pre-identifies important information and batches it with working memory updates: + +```python +# Your LLM analyzes conversation and identifies memories +working_memory = WorkingMemory( + session_id="user_session", + messages=conversation_messages, + memories=[ + MemoryRecord( + text="User Sarah prefers Python over R for data analysis", + memory_type="semantic", + topics=["preferences", "programming", "data_science"], + entities=["Sarah", "Python", "R", "data analysis"] + ) + ] +) + +# Single API call stores both conversation and memories +await client.set_working_memory("user_session", working_memory) +``` + +**Benefits**: +- Performance optimization - no separate API calls +- LLM has full conversation context for better memory decisions +- Structured metadata (topics, entities) for better search +- Immediate availability for search + +### 3. Direct Long-Term Memory API +For real-time memory creation or when working without sessions: + +```python +# LLM can use create_long_term_memory tool directly +await client.create_long_term_memories([ + { + "text": "User completed advanced Python certification course", + "memory_type": "episodic", + "event_date": "2024-01-15T10:00:00Z", + "topics": ["education", "certification", "python"], + "entities": ["Python certification"], + "user_id": "sarah_123" + } +]) +``` + +**Benefits**: +- Immediate storage without working memory +- Perfect for event-driven memory creation +- Fine-grained control over memory attributes +- Cross-session memory creation + +> **🎯 Recommended Pattern**: Use method #2 (LLM-optimized batch storage) for most applications as it provides the best balance of performance, control, and automatic background processing. + ## Memory Forgetting ### Forgetting Policies @@ -275,7 +357,7 @@ async def cleanup_working_memory(client: MemoryAPIClient): ### Background Compaction -The system automatically runs compaction tasks to: +The system automatically runs compaction tasks every 10 minutes to: - Merge similar memories - Update embeddings for improved accuracy diff --git a/docs/memory-types.md b/docs/memory-types.md index 6ddd59e..706a3ba 100644 --- a/docs/memory-types.md +++ b/docs/memory-types.md @@ -56,16 +56,17 @@ Working memory contains: 2. **Temporary Structured Data** ```python - # Store temporary facts during conversation + # Store temporary facts during conversation (using data field) working_memory = WorkingMemory( session_id="chat_123", - memories=[ - MemoryRecord( - text="User is planning a trip to Paris next month", - id="temp_trip_info", - memory_type="episodic" - ) - ] + data={ + "temp_trip_info": { + "destination": "Paris", + "travel_month": "next month", + "planning_stage": "initial" + }, + "conversation_context": "travel planning" + } ) ``` @@ -82,6 +83,29 @@ Working memory contains: ) ``` +4. 
**Promoting Memories to Long-Term Storage** + ```python + # Memories in working memory are automatically promoted to long-term storage + working_memory = WorkingMemory( + session_id="chat_123", + memories=[ + MemoryRecord( + text="User is planning a trip to Paris next month", + id="trip_planning_paris", + memory_type="episodic", + topics=["travel", "planning"], + entities=["Paris"] + ) + ] + ) + # This memory will become permanent in long-term storage + ``` + +> **🔑 Key Distinction**: +> - Use `data` field for **temporary** facts that stay only in the session +> - Use `memories` field for **permanent** facts that should be promoted to long-term storage +> - Anything in the `memories` field will automatically become persistent and searchable across all future sessions + ### API Endpoints ```http @@ -104,6 +128,61 @@ When structured memories in working memory are stored, they are automatically pr 3. Memories are indexed in long-term storage with vector embeddings 4. Working memory is updated with `persisted_at` timestamps +### Three Ways to Create Long-Term Memories + +Long-term memories are typically created by LLMs (either yours or the memory server's) based on conversations. There are three pathways: + +#### 1. 🤖 **Automatic Extraction from Conversations** +The server automatically extracts memories from conversation messages using an LLM in the background: + +```python +# Server analyzes messages and creates memories automatically +working_memory = WorkingMemory( + session_id="chat_123", + messages=[ + {"role": "user", "content": "I love Italian food, especially carbonara"}, + {"role": "assistant", "content": "Great! I'll remember your preference for Italian cuisine."} + ] + # Server will extract: "User enjoys Italian food, particularly carbonara pasta" +) +``` + +#### 2. ⚡ **LLM-Identified Memories via Working Memory** (Performance Optimization) +Your LLM can pre-identify memories and add them to working memory for batch storage: + +```python +# LLM identifies important facts and adds to memories field +working_memory = WorkingMemory( + session_id="chat_123", + memories=[ + MemoryRecord( + text="User prefers morning meetings and dislikes calls after 4 PM", + memory_type="semantic", + topics=["preferences", "scheduling"], + entities=["morning meetings", "4 PM"] + ) + ] + # Automatically promoted to long-term storage when saving working memory +) +``` + +#### 3. 🎯 **Direct Long-Term Memory Creation** +Create memories directly via API or LLM tool calls: + +```python +# Direct API call or LLM using create_long_term_memory tool +await client.create_long_term_memories([ + { + "text": "User works as a software engineer at TechCorp", + "memory_type": "semantic", + "topics": ["career", "work"], + "entities": ["software engineer", "TechCorp"] + } +]) +``` + +> **💡 LLM-Driven Design**: The system is designed for LLMs to make memory decisions. Your LLM can use memory tools to search existing memories, decide what's important to remember, and choose the most efficient storage method. + ## Long-Term Memory Long-term memory is **persistent**, **cross-session** storage designed for knowledge that should be retained and searchable across all interactions. It's the "knowledge base" where important facts, preferences, and experiences are stored. 
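+
+To make the cross-session behavior concrete, here is a brief sketch: a fact stored once is later retrieved from a different session with a natural-language query. The `search_long_term_memory()` method name, its keyword arguments, and the `results.memories` shape are assumptions based on the tools described above; confirm them against your installed `agent-memory-client` version.
+
+```python
+import asyncio
+
+from agent_memory_client import MemoryAPIClient
+
+
+async def demo() -> None:
+    client = MemoryAPIClient(base_url="http://localhost:8000")
+
+    # Store a fact once, from any session (pathway #3 above)
+    await client.create_long_term_memories([
+        {
+            "text": "User works as a software engineer at TechCorp",
+            "memory_type": "semantic",
+            "topics": ["career", "work"],
+            "user_id": "user_123",
+        }
+    ])
+
+    # Later, in a completely different session, search semantically.
+    # NOTE: method name and result shape are assumptions; check your client.
+    results = await client.search_long_term_memory(
+        text="Where does the user work?",
+        user_id="user_123",
+        limit=3,
+    )
+    for memory in results.memories:
+        print(memory.text)
+
+
+asyncio.run(demo())
+```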
diff --git a/docs/python-sdk.md b/docs/python-sdk.md
index 057cc33..da2c3f5 100644
--- a/docs/python-sdk.md
+++ b/docs/python-sdk.md
@@ -4,6 +4,8 @@ The Python SDK (`agent-memory-client`) provides the easiest way to integrate mem
 
 ## Installation
 
+**Requirements**: Python 3.10 or higher
+
 ```bash
 pip install agent-memory-client
 ```
@@ -89,7 +91,7 @@ memory_client = MemoryAPIClient(base_url="http://localhost:8000")
 openai_client = openai.AsyncClient()
 
 # Get tool schemas for OpenAI
-memory_tools = memory_client.get_openai_tool_schemas()
+memory_tools = MemoryAPIClient.get_all_memory_tool_schemas()
 
 async def chat_with_memory(message: str, session_id: str):
     # Make request with memory tools
@@ -103,16 +105,32 @@ async def chat_with_memory(message: str, session_id: str):
     # Process tool calls automatically
     if response.choices[0].message.tool_calls:
         # Resolve all tool calls
-        tool_results = await memory_client.resolve_openai_tool_calls(
-            tool_calls=response.choices[0].message.tool_calls,
-            session_id=session_id
-        )
+        results = []
+        for tool_call in response.choices[0].message.tool_calls:
+            result = await memory_client.resolve_tool_call(
+                tool_call=tool_call,
+                session_id=session_id
+            )
+            if result["success"]:
+                results.append({
+                    "role": "tool",
+                    "tool_call_id": tool_call.id,
+                    "name": tool_call.function.name,
+                    "content": result["formatted_response"]
+                })
+            else:
+                results.append({
+                    "role": "tool",
+                    "tool_call_id": tool_call.id,
+                    "name": tool_call.function.name,
+                    "content": f"Error: {result['error']}"
+                })
 
         # Continue conversation with results
         messages = [
             {"role": "user", "content": message},
             response.choices[0].message,
-            *tool_results
+            *results
         ]
 
         final_response = await openai_client.chat.completions.create(
@@ -138,7 +156,7 @@ memory_client = MemoryAPIClient(base_url="http://localhost:8000")
 anthropic_client = anthropic.AsyncClient()
 
 # Get tool schemas for Anthropic
-memory_tools = memory_client.get_anthropic_tool_schemas()
+memory_tools = MemoryAPIClient.get_all_memory_tool_schemas_anthropic()
 
 async def chat_with_memory(message: str, session_id: str):
     response = await anthropic_client.messages.create(
@@ -150,16 +168,36 @@ async def chat_with_memory(message: str, session_id: str):
 
     # Process tool calls
     if response.stop_reason == "tool_use":
-        tool_results = await memory_client.resolve_anthropic_tool_calls(
-            tool_calls=response.content,
-            session_id=session_id
-        )
+        results = []
+        for content_block in response.content:
+            if content_block.type == "tool_use":
+                result = await memory_client.resolve_tool_call(
+                    tool_call={
+                        "type": "tool_use",
+                        "id": content_block.id,
+                        "name": content_block.name,
+                        "input": content_block.input
+                    },
+                    session_id=session_id
+                )
+                if result["success"]:
+                    results.append({
+                        "type": "tool_result",
+                        "tool_use_id": content_block.id,
+                        "content": result["formatted_response"]
+                    })
+                else:
+                    results.append({
+                        "type": "tool_result",
+                        "tool_use_id": content_block.id,
+                        "content": f"Error: {result['error']}"
+                    })
 
         # Continue conversation
         messages = [
             {"role": "user", "content": message},
-            {"role": "assistant", "content": response.content},
-            {"role": "user", "content": tool_results}
+            {"role": "assistant", "content": response.content},
+            {"role": "user", "content": results}  # tool_result blocks go in a user message
         ]
 
         final_response = await anthropic_client.messages.create(
diff --git a/docs/quick-start.md b/docs/quick-start.md
index e933145..18fa84e 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -12,7 +12,7 @@ By the end of this guide, you'll:
 
 ## Prerequisites
 
-- Python 3.8 or higher
+- Python 3.12 (for the memory server)
 - Docker (for Redis)
 - 5 minutes
 
@@ -32,7 +32,7 @@ git clone https://github.com/redis/redis-memory-server.git
 cd redis-memory-server
 
 # Install server dependencies
-uv sync --all-extras
+uv sync
 ```
 
 ## Step 2: Start Redis
@@ -201,7 +201,7 @@ For more advanced use cases, use automatic tool integration with OpenAI:
 
 ```python
 # Get OpenAI tool schemas
-memory_tools = memory_client.get_openai_tool_schemas()
+memory_tools = MemoryAPIClient.get_all_memory_tool_schemas()
 
 # Chat with automatic memory tools
 response = await openai_client.chat.completions.create(
@@ -213,11 +213,15 @@ response = await openai_client.chat.completions.create(
 
 # Let the AI decide when to store memories
 if response.choices[0].message.tool_calls:
-    tool_results = await memory_client.resolve_openai_tool_calls(
-        tool_calls=response.choices[0].message.tool_calls,
-        session_id="my-session"
-    )
-    print("AI automatically stored your allergy information!")
+    for tool_call in response.choices[0].message.tool_calls:
+        result = await memory_client.resolve_tool_call(
+            tool_call=tool_call,
+            session_id="my-session"
+        )
+        if result["success"]:
+            print("AI automatically stored your allergy information!")
+        else:
+            print(f"Error: {result['error']}")
 ```
 
 ## Alternative: REST API Usage
@@ -271,20 +275,12 @@ curl -X POST "http://localhost:8000/v1/memory/prompt" \
 
 ## Using MCP Interface (Optional)
 
-If you want to use the MCP interface with Claude Desktop or other MCP clients:
-
-### Start MCP Server
-
-```bash
-# Start MCP server in stdio mode (for Claude Desktop)
-uv run agent-memory mcp --mode stdio
-
-# Or start in SSE mode (for web clients)
-uv run agent-memory mcp --mode sse --port 9000
-```
+If you want to use the MCP interface with Claude Desktop:
 
 ### Configure Claude Desktop
 
+**Note**: You don't need to manually start the MCP server. Claude Desktop will automatically start and manage the server process when needed.
+
 Add to your Claude Desktop config:
 
 ```json
@@ -308,6 +304,17 @@ Add to your Claude Desktop config:
 
 Now Claude can use memory tools directly in conversations!
 
+### Alternative: SSE Mode (Advanced)
+
+For web-based MCP clients, you can use SSE mode, but this requires manually starting the server:
+
+```bash
+# Only needed for SSE mode
+uv run agent-memory mcp --mode sse --port 9000
+```
+
+**Recommendation**: Use stdio mode with Claude Desktop as it's much simpler to set up.
+
 ## Understanding Memory Types
 
 You've just worked with both types of memory:
@@ -360,8 +367,8 @@ Now that you have the basics working, explore these advanced features:
 - Or disable AI features temporarily
 
 **"Module 'redisvl' not found"**
-- Install with extras: `uv sync --all-extras`
-- Or install manually: `uv add redisvl`
+- Run: `uv sync` (redisvl is a required dependency, not optional)
+- If still failing, try: `uv add "redisvl>=0.6.0"` (quote the spec so the shell doesn't treat `>` as a redirect)
 
 **"Background tasks not processing"**
 - Make sure the task worker is running: `uv run agent-memory task-worker`
diff --git a/docs/recency-boost.md b/docs/recency-boost.md
index a9fa008..9473f48 100644
--- a/docs/recency-boost.md
+++ b/docs/recency-boost.md
@@ -7,6 +7,7 @@ Recency boost is an intelligent memory ranking system that combines semantic sim
 Traditional semantic search relies solely on vector similarity, which may return old or rarely-used memories that are semantically similar but not contextually relevant. Recency boost addresses this by incorporating temporal factors to provide more useful, context-aware search results.
**Key Benefits:** + - **Time-aware search**: Recent memories are weighted higher in results - **Access pattern learning**: Frequently accessed memories get priority - **Freshness boost**: Newly created memories are more likely to surface diff --git a/docs/vector-store-advanced.md b/docs/vector-store-advanced.md index bec07f4..586683f 100644 --- a/docs/vector-store-advanced.md +++ b/docs/vector-store-advanced.md @@ -800,5 +800,3 @@ class ZeroDowntimeMigrator: print("✅ Cutover completed successfully") return final_check ``` - -This documentation covers advanced architectural patterns for vector store configuration, focusing on flexible factory patterns, custom implementations, and data migration strategies that work across different backends. diff --git a/mkdocs.yml b/mkdocs.yml index d01dd5b..2a09727 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -78,20 +78,21 @@ nav: - Integration: - Python SDK: python-sdk.md - Memory Integration Patterns: memory-integration-patterns.md + - Agent Examples: agent-examples.md - Core Concepts: - Memory Types: memory-types.md + - Memory Editing: memory-editing.md + - Memory Lifecycle: memory-lifecycle.md + - Vector Store Backends: vector-store-backends.md - Authentication: authentication.md - Configuration: configuration.md - - Advanced Features: + - Advanced Topics: - Query Optimization: query-optimization.md - - Contextual Grounding: contextual-grounding.md - - Memory Editing: memory-editing.md - - Memory Lifecycle: memory-lifecycle.md - Recency Boost: recency-boost.md - - Vector Store Backends: vector-store-backends.md - Advanced Vector Store Config: vector-store-advanced.md + - Contextual Grounding: contextual-grounding.md - API Interfaces: - REST API: api.md @@ -100,7 +101,6 @@ nav: - Development: - Development Guide: development.md - - Claude Code Guide: CLAUDE.md plugins: - search: