# Configuration Basics

## Overview

This notebook shows how to configure Semantica using `ConfigManager`, environment variables, and configuration files. Semantica workflows depend on these settings (API keys, model names, processing parameters), so it is worth setting them up first.

### Learning Objectives

- Understand how to use `ConfigManager` for configuration management
- Learn to set and use environment variables
- Create and load configuration files (YAML/JSON)
- Configure common settings for API keys, models, and processing
- Follow best practices for configuration management

---

## Configuration Methods

Semantica supports three main configuration methods:

1. **ConfigManager** - Programmatic configuration management
2. **Environment Variables** - For sensitive data like API keys
3. **Config Files** - YAML or JSON files for structured configuration

Each method is demonstrated in the code cells below.
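
When the three methods are combined, some precedence order is needed. The best-practices section of this notebook states that environment variables override config file values, which in turn override defaults. The following is a minimal sketch of that layering in plain Python. `deep_merge` and the three sample dictionaries are hypothetical names used for illustration only; they are not part of Semantica's API, and `ConfigManager` may implement merging differently.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` replace those in `base`.

    Nested dicts are merged key by key instead of being replaced wholesale.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative layers; the values are made up for this sketch.
defaults = {"llm_provider": {"provider": "openai", "model": "gpt-4", "temperature": 0.7}}
from_file = {"llm_provider": {"temperature": 0.2}, "processing": {"batch_size": 32}}
from_env = {"processing": {"batch_size": 8}}

# Lowest priority first: defaults, then file values, then environment values.
effective = deep_merge(deep_merge(defaults, from_file), from_env)
print(effective)
```

Merging into a copy leaves `defaults` unchanged, so the same defaults can be reused when building the configuration for several environments.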

---

## Step 1: ConfigManager Basics

`ConfigManager` is the primary way to manage configuration in Semantica. It provides a unified interface for loading and accessing configuration values.
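
Configuration values in this notebook are addressed by dotted paths such as `'llm_provider.provider'`. To make that idea concrete, here is a minimal sketch of a dotted-path lookup over a nested dictionary. The helper `get_by_path` is hypothetical and written only for illustration; it is not Semantica code, and `ConfigManager` may resolve paths differently.

```python
from typing import Any


def get_by_path(config: dict, path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dot-separated path.

    Returns `default` as soon as a segment is missing or a non-dict is reached.
    """
    current: Any = config
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return default
        current = current[segment]
    return current


settings = {"llm_provider": {"provider": "openai", "model": "gpt-4"}}

print(get_by_path(settings, "llm_provider.provider"))            # openai
print(get_by_path(settings, "embedding.model", default="unset"))  # unset
```

Returning a default instead of raising keeps call sites short, at the cost of hiding typos in a path, so required settings are better checked explicitly.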


In [None]:
from semantica.core import ConfigManager

config_manager = ConfigManager()

print("ConfigManager initialized successfully!")
print(f"ConfigManager instance: {config_manager}")

try:
    # Look up a value by its dotted path; `default` is the fallback for a missing key.
    provider = config_manager.get('llm_provider.provider', default='openai')
    print("\nAccessing configuration values:")
    print("  Use config_manager.get('path.to.config', default='default_value')")
    print(f"  Example: llm_provider.provider = {provider}")
except Exception as e:
    print(f"Error accessing config: {e}")

try:
    config = config_manager.config
    print(f"\n✓ Config object created: {config is not None}")
except Exception as e:
    print(f"Error creating config object: {e}")


## Step 2: Environment Variables

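Environment variables are the recommended way to supply sensitive information such as API keys. Because they live outside your source files, they are not committed to version control by accident. They are not encrypted, though, so avoid printing or logging their full values.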
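
A common pattern is to fail early when a required variable is missing and to mask secrets before they reach any output. The sketch below uses only the standard library. `require_env` and `mask_secret` are hypothetical helpers, and the key shown is a made-up placeholder, not a real credential.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of `name`, raising a clear error if it is unset or blank."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def mask_secret(value: str, visible: int = 4) -> str:
    """Show at most `visible` leading characters so output never holds the full secret."""
    if len(value) <= visible:
        return "***"
    return value[:visible] + "..."


# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {"SEMANTICA_API_KEY": "sk-example-not-a-real-key"}
key = require_env("SEMANTICA_API_KEY", fake_env)
print(mask_secret(key))  # sk-e...
```

Passing the environment as an argument makes the helper easy to test; in normal use, `require_env("SEMANTICA_API_KEY")` reads from `os.environ`.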


In [None]:
import os

print("Environment Variables (SEMANTICA_*):")
semantica_env_vars = {k: v for k, v in os.environ.items() if k.startswith('SEMANTICA_')}
if semantica_env_vars:
    for key, value in semantica_env_vars.items():
        masked_value = value[:4] + "..." if len(value) > 4 else "***"
        print(f"  {key} = {masked_value}")
else:
    print("  No SEMANTICA_* environment variables found")
    print("  To set: os.environ['SEMANTICA_API_KEY'] = 'your_key'")

api_key = os.getenv("SEMANTICA_API_KEY")
model_name = os.getenv("SEMANTICA_MODEL_NAME", "default-model")

print("\nRetrieved values:")
print(f"  API Key set: {api_key is not None}")
print(f"  Model name: {model_name}")

print("\nNote: Environment variables with SEMANTICA_ prefix")
print("  are automatically loaded by ConfigManager")


## Step 3: Configuration Files

Configuration files (YAML or JSON) suit non-sensitive settings such as model names, batch sizes, and processing parameters. They keep related settings grouped in one structured file.
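
To illustrate what loading either format involves, here is a minimal sketch that picks a parser from the file suffix. `load_config_file` is a hypothetical helper, not Semantica's loader. It assumes PyYAML is installed when a YAML file is read, and it imports that package lazily so JSON-only use needs nothing beyond the standard library.

```python
import json
import tempfile
from pathlib import Path


def load_config_file(path) -> dict:
    """Parse a JSON or YAML config file into a dict, based on the file suffix."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in (".json", ".yaml", ".yml"):
        raise ValueError(f"Unsupported config format: {suffix!r}")

    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        data = json.loads(text)
    else:
        import yaml  # PyYAML, imported only when a YAML file is requested

        data = yaml.safe_load(text)

    if data is None:  # an empty YAML file parses to None
        return {}
    if not isinstance(data, dict):
        raise ValueError("Top-level config must be a mapping")
    return data


# Round-trip a small JSON file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_config.json"
    sample.write_text(json.dumps({"processing": {"batch_size": 32}}), encoding="utf-8")
    loaded = load_config_file(sample)

print(loaded["processing"]["batch_size"])  # 32
```

`yaml.safe_load` is used instead of `yaml.load` because it only constructs plain data types, which is the safer choice for files that may be edited by hand.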


In [None]:
import yaml
import json
from pathlib import Path

sample_config_yaml = """
# Semantica Configuration File
# Placeholders only: keep real keys in environment variables, not in this file.
api_keys:
  openai: your_openai_key_here
  anthropic: your_anthropic_key_here

llm_provider:
  provider: openai
  model: gpt-4
  temperature: 0.7

embedding:
  provider: openai
  model: text-embedding-3-large
  dimensions: 3072

knowledge_graph:
  backend: networkx
  temporal: true

processing:
  batch_size: 32
  max_workers: 4

logging:
  level: INFO
  file: semantica.log
"""

config_yaml_path = Path("sample_config.yaml")
config_yaml_path.write_text(sample_config_yaml)

print("Sample config.yaml created:")
print(f"  Path: {config_yaml_path}")
print("\nConfig file contents:")
print(sample_config_yaml)

try:
    config_from_file = config_manager.load_from_file(str(config_yaml_path))
    print("\n✓ Configuration loaded from YAML file!")
    print(f"  Config object: {config_from_file is not None}")
except Exception as e:
    print(f"\n✗ Error loading config file: {e}")

# The same kind of settings as a Python dict, written out as JSON (key values are placeholders).
sample_config_json = {
    "api_keys": {
        "openai": "your_openai_key_here",
        "anthropic": "your_anthropic_key_here"
    },
    "llm_provider": {
        "provider": "openai",
        "model": "gpt-4",
        "temperature": 0.7
    },
    "embedding": {
        "provider": "openai",
        "model": "text-embedding-3-large",
        "dimensions": 3072
    }
}

config_json_path = Path("sample_config.json")
with open(config_json_path, 'w') as f:
    json.dump(sample_config_json, f, indent=2)

print(f"\n✓ Sample config.json created: {config_json_path}")
print("\nNote: ConfigManager can load from both YAML and JSON files")
print("  config_manager.load_from_file('config.yaml')")
print("  config_manager.load_from_file('config.json')")


## Step 4: Common Settings

This section covers the most commonly used configuration settings, including API keys, model parameters, embedding settings, and processing options.
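
Settings such as batch size and worker count are easier to trust when they are typed and checked at construction time. The sketch below shows one way to do that with a standard-library dataclass. `ProcessingSettings` is a hypothetical class written for illustration and is unrelated to Semantica's own `Config` class; the default values mirror the sample config used in this notebook.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProcessingSettings:
    batch_size: int = 32
    max_workers: int = 4

    def __post_init__(self) -> None:
        # Reject zero, negatives, and non-integers (bool is excluded explicitly).
        for name in ("batch_size", "max_workers"):
            value = getattr(self, name)
            if not isinstance(value, int) or isinstance(value, bool) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")

    @classmethod
    def from_dict(cls, raw: dict) -> "ProcessingSettings":
        """Build settings from a dict, rejecting keys that are not known fields."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"Unknown processing settings: {sorted(unknown)}")
        return cls(**raw)


processing = ProcessingSettings.from_dict({"batch_size": 16})
print(processing)  # ProcessingSettings(batch_size=16, max_workers=4)
```

Rejecting unknown keys catches misspelled setting names, which would otherwise be ignored silently and leave the default in effect.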


In [None]:
from semantica.core import Config

print("Common Configuration Settings:")
print("\n1. API Keys:")
print("   - OpenAI API key")
print("   - Anthropic API key")
print("   - Cohere API key")
print("   - Other provider keys")

print("\n2. Model Names and Parameters:")
print("   - LLM provider (openai, anthropic, etc.)")
print("   - Model name (gpt-4, claude-3, etc.)")
print("   - Temperature, max_tokens, etc.")

print("\n3. Embedding Settings:")
print("   - Embedding provider")
print("   - Embedding model")
print("   - Embedding dimensions")

print("\n4. Graph Database Connections:")
print("   - Backend (networkx, neo4j, arangodb)")
print("   - Connection strings")
print("   - Temporal graph settings")

print("\n5. Logging Levels:")
print("   - DEBUG, INFO, WARNING, ERROR")
print("   - Log file paths")

print("\n6. Cache Settings:")
print("   - Enable/disable caching")
print("   - Cache directory")

try:
    custom_config_dict = {
        "llm_provider": {
            "provider": "openai",
            "model": "gpt-4",
            "temperature": 0.7
        },
        "embedding": {
            "provider": "openai",
            "model": "text-embedding-3-large",
            "dimensions": 3072
        },
        "processing": {
            "batch_size": 32,
            "max_workers": 4
        }
    }
    
    custom_config = Config(config_dict=custom_config_dict)
    print("\n✓ Custom Config object created with settings:")
    print(f"  LLM Provider: {custom_config.llm_provider.get('provider', 'N/A')}")
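    # Note: this reads the `embedding_model` attribute, while the dict above uses the key
    # "embedding"; if 'N/A' is printed, check which name your Semantica version expects.
    print(f"  Embedding Provider: {custom_config.embedding_model.get('provider', 'N/A')}")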
    print(f"  Batch Size: {custom_config.processing.get('batch_size', 'N/A')}")
    
except Exception as e:
    print(f"\n✗ Error creating custom config: {e}")

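# Remove the sample files written earlier in this notebook.
try:
    if config_yaml_path.exists():
        config_yaml_path.unlink()
    if config_json_path.exists():
        config_json_path.unlink()
    print("\n✓ Sample config files cleaned up")
except OSError as e:
    print(f"\n✗ Could not remove sample config files: {e}")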


## Step 5: Best Practices

Follow these best practices to ensure secure, maintainable, and effective configuration management.
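
One of the practices listed in the cell below is keeping a separate config file per environment (`dev_config.yaml`, `prod_config.yaml`, `test_config.yaml`). A minimal sketch of selecting the right file by environment name follows. The `config_path_for` helper, the environment labels, and the `configs` directory are assumptions made for illustration; only the file names come from this notebook.

```python
from pathlib import Path

# File names follow the examples in this notebook; the environment labels are assumptions.
CONFIG_FILES = {
    "development": "dev_config.yaml",
    "production": "prod_config.yaml",
    "testing": "test_config.yaml",
}


def config_path_for(env_name: str, config_dir: str = ".") -> Path:
    """Map an environment name to its config file path, rejecting unknown names."""
    key = env_name.strip().lower()
    if key not in CONFIG_FILES:
        valid = ", ".join(sorted(CONFIG_FILES))
        raise ValueError(f"Unknown environment {env_name!r}; expected one of: {valid}")
    return Path(config_dir) / CONFIG_FILES[key]


path = config_path_for("Production", "configs")
print(path.name)  # prod_config.yaml
```

The resulting path could then be passed to `config_manager.load_from_file(str(path))`. How the environment name is chosen (for example, from a hypothetical `SEMANTICA_ENV` variable) is left to the project.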


In [None]:
print("Configuration Best Practices:")
print("\n1. Use Environment Variables for Sensitive Data:")
print("   - Never commit API keys to version control")
print("   - Use environment variables or secret management")
print("   - Example: export SEMANTICA_API_KEY=your_key")

print("\n2. Use Config Files for Non-Sensitive Settings:")
print("   - Store model names, batch sizes, etc. in config files")
print("   - Use YAML for readability or JSON for compatibility")
print("   - Keep config files in version control (without secrets)")

print("\n3. Configuration Hierarchy:")
print("   - Environment variables override config file values")
print("   - Config file values override defaults")
print("   - Use defaults as fallback")

print("\n4. Validate Configuration:")
print("   - Check required settings are present")
print("   - Validate API keys are set before use")
print("   - Use ConfigManager validation features")

print("\n5. Separate Configurations by Environment:")
print("   - Development: dev_config.yaml")
print("   - Production: prod_config.yaml")
print("   - Testing: test_config.yaml")

print("\n6. Document Configuration Options:")
print("   - Document all available settings")
print("   - Provide examples and defaults")
print("   - Explain the impact of each setting")

print("\n" + "="*60)
print("Example: Checking if required configuration is set")
print("="*60)

# Read each setting without a fallback so an unset variable shows up as missing.
required_settings = [
    ("API Key", os.getenv("SEMANTICA_API_KEY")),
    ("Model Name", os.getenv("SEMANTICA_MODEL_NAME")),
]

print("\nRequired Settings Status:")
for setting_name, value in required_settings:
    status = "✓ Set" if value else "✗ Not Set"
    print(f"  {setting_name}: {status}")

print("\nRecommendation:")
print("  Set up your configuration before running Semantica workflows")
print("  Use ConfigManager to load and validate your settings")
