# pycarta Data Management

This notebook demonstrates data management capabilities including FormsDB, Tablify, and Graph operations.

## Prerequisites

- Valid Carta authentication (see `01_authentication.ipynb`)
- Project access for FormsDB operations
- Understanding of JSON Schema (helpful for FormsDB)

## Setup

In [2]:
import pycarta as pc
import pandas as pd
import json
from datetime import datetime

# Log in interactively; skip this call if you are already authenticated
pc.login(interactive=True)

print("Data management setup complete")

## FormsDB - Schema-Aware Data Management

### Basic FormsDB Operations

In [None]:
from pycarta.formsdb import FormsDb

# Initialize FormsDB (requires an authenticated session and project access)
formsdb = None
try:
    agent = pc.get_agent()
    formsdb = FormsDb(credentials=agent, project_id="my-project-id")
    print("FormsDB initialized")
except Exception as e:
    print(f"FormsDB initialization error: {e}")
    print("Note: Requires valid authentication and project access")

### Working with Folders

In [None]:
# Folder management examples
def demo_folder_operations():
    """Demonstrate FormsDB folder operations."""
    print("""
    FORMSDB FOLDER OPERATIONS:
    
    # Create hierarchical folder structure
    project_folder = formsdb.folder.create("research-project")
    data_folder = formsdb.folder.create("research-project/data")
    results_folder = formsdb.folder.create("research-project/results")
    surveys_folder = formsdb.folder.create("research-project/surveys")
    
    # Get existing folders
    folder = formsdb.folder.get("research-project/data")
    print(f"Folder: {folder.name}, Path: {folder.path}")
    
    # List folder contents
    contents = formsdb.folder.list_contents("research-project")
    for item in contents:
        print(f"  - {item.name} ({item.type})")
    
    # Delete folders (use with caution)
    # formsdb.folder.delete("research-project/temp")
    """)

demo_folder_operations()
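The folder paths above are slash-separated, and each example creates a parent folder before its children. Assuming FormsDB expects parents to exist first (not confirmed here), a small hypothetical helper can expand a list of leaf paths into a deduplicated, parent-first creation order:

```python
def folder_chain(path: str) -> list[str]:
    """Expand 'a/b/c' into ['a', 'a/b', 'a/b/c'] (hypothetical helper)."""
    parts = [p for p in path.strip("/").split("/") if p]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def folder_creation_order(paths: list[str]) -> list[str]:
    """Parent-first, deduplicated list of every folder needed for `paths`."""
    ordered: dict[str, None] = {}  # dicts preserve insertion order
    for path in paths:
        for prefix in folder_chain(path):
            ordered.setdefault(prefix, None)
    return list(ordered)


order = folder_creation_order([
    "research-project/data",
    "research-project/results",
    "research-project/surveys",
])
print(order)
# With an initialized client, each path could then be passed to
# formsdb.folder.create(path) in this order.
```

Using a dict as an ordered set keeps the first occurrence of each prefix, so shared parents such as `research-project` appear once and before any child.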

### Schema Management

In [None]:
# Schema examples
def create_survey_schema():
    """Create a comprehensive survey schema."""
    schema = {
        "type": "object",
        "title": "User Survey",
        "description": "A comprehensive user survey form",
        "properties": {
            "participant_id": {
                "type": "string",
                "description": "Unique participant identifier"
            },
            "personal_info": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "minLength": 1},
                    "age": {"type": "integer", "minimum": 0, "maximum": 150},
                    "email": {"type": "string", "format": "email"},
                    "gender": {"type": "string", "enum": ["male", "female", "other", "prefer_not_to_say"]}
                },
                "required": ["name", "age"]
            },
            "survey_responses": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "question_id": {"type": "string"},
                        "question_text": {"type": "string"},
                        "response": {"type": "string"},
                        "response_type": {"type": "string", "enum": ["text", "rating", "multiple_choice"]}
                    },
                    "required": ["question_id", "response"]
                }
            },
            "metadata": {
                "type": "object",
                "properties": {
                    "timestamp": {"type": "string", "format": "date-time"},
                    "device_type": {"type": "string"},
                    "location": {"type": "string"},
                    "duration_minutes": {"type": "number", "minimum": 0}
                }
            }
        },
        "required": ["participant_id", "personal_info", "survey_responses"]
    }
    return schema

def create_experiment_schema():
    """Create a scientific experiment data schema."""
    schema = {
        "type": "object",
        "title": "Scientific Experiment Data",
        "properties": {
            "experiment_id": {"type": "string"},
            "researcher": {"type": "string"},
            "experiment_date": {"type": "string", "format": "date"},
            "conditions": {
                "type": "object",
                "properties": {
                    "temperature": {"type": "number"},
                    "humidity": {"type": "number", "minimum": 0, "maximum": 100},
                    "pressure": {"type": "number"},
                    "treatment": {"type": "string"}
                }
            },
            "measurements": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "time_point": {"type": "number"},
                        "value": {"type": "number"},
                        "unit": {"type": "string"},
                        "notes": {"type": "string"}
                    },
                    "required": ["time_point", "value", "unit"]
                }
            },
            "results": {
                "type": "object",
                "properties": {
                    "conclusion": {"type": "string"},
                    "statistical_significance": {"type": "boolean"},
                    "p_value": {"type": "number", "minimum": 0, "maximum": 1}
                }
            }
        },
        "required": ["experiment_id", "researcher", "measurements"]
    }
    return schema

# Create schema examples
survey_schema = create_survey_schema()
experiment_schema = create_experiment_schema()

print("Schema examples created:")
print(f"Survey schema has {len(survey_schema['properties'])} main properties")
print(f"Experiment schema has {len(experiment_schema['properties'])} main properties")
print("\nTo create schemas in FormsDB:")
print("user_schema = formsdb.schema.create('user-survey', survey_schema)")
print("exp_schema = formsdb.schema.create('experiment-data', experiment_schema)")
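Before uploading, it can help to confirm locally that a schema is itself valid JSON Schema. The sketch below uses the third-party `jsonschema` package and assumes Draft 7 (the schemas above do not declare a `$schema`). It validates the schema against the Draft 7 metaschema and collects every violation rather than stopping at the first:

```python
from jsonschema import Draft7Validator


def schema_problems(schema: dict) -> list[str]:
    """Return every Draft 7 metaschema violation in `schema` (empty if valid)."""
    meta_validator = Draft7Validator(Draft7Validator.META_SCHEMA)
    return [error.message for error in meta_validator.iter_errors(schema)]


good = {
    "type": "object",
    "properties": {"age": {"type": "integer", "minimum": 0}},
    "required": ["age"],
}
bad = {
    "type": "object",
    # "integr" is not a JSON Schema type, and "minimum" must be a number
    "properties": {"age": {"type": "integr", "minimum": "0"}},
}

print(schema_problems(good))  # []
for message in schema_problems(bad):
    print(message)
```

`Draft7Validator.check_schema(schema)` performs the same check but raises only a single `SchemaError`; iterating the errors is more convenient when fixing a large schema such as the survey schema above.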

### Working with Data

In [None]:
# Sample data for testing schemas
def create_sample_survey_data():
    """Create sample survey data matching our schema."""
    return {
        "participant_id": "PART_001",
        "personal_info": {
            "name": "Alice Johnson",
            "age": 29,
            "email": "alice.johnson@example.com",
            "gender": "female"
        },
        "survey_responses": [
            {
                "question_id": "Q001",
                "question_text": "How satisfied are you with our service?",
                "response": "Very satisfied",
                "response_type": "multiple_choice"
            },
            {
                "question_id": "Q002",
                "question_text": "Rate your experience from 1-10",
                "response": "9",
                "response_type": "rating"
            },
            {
                "question_id": "Q003",
                "question_text": "Any additional comments?",
                "response": "Great service, very helpful staff!",
                "response_type": "text"
            }
        ],
        "metadata": {
            "timestamp": "2024-01-15T14:30:00Z",
            "device_type": "desktop",
            "location": "home",
            "duration_minutes": 5.5
        }
    }

def create_sample_experiment_data():
    """Create sample experiment data."""
    return {
        "experiment_id": "EXP_2024_001",
        "researcher": "Dr. Bob Smith",
        "experiment_date": "2024-01-15",
        "conditions": {
            "temperature": 23.5,
            "humidity": 65.2,
            "pressure": 1013.25,
            "treatment": "Control Group"
        },
        "measurements": [
            {"time_point": 0, "value": 100.0, "unit": "mg/L", "notes": "Baseline measurement"},
            {"time_point": 1, "value": 98.5, "unit": "mg/L"},
            {"time_point": 2, "value": 97.2, "unit": "mg/L"},
            {"time_point": 4, "value": 95.8, "unit": "mg/L"},
            {"time_point": 8, "value": 92.1, "unit": "mg/L", "notes": "Significant decrease observed"}
        ],
        "results": {
            "conclusion": "Treatment shows significant effect",
            "statistical_significance": True,
            "p_value": 0.023
        }
    }

# Create sample data
sample_survey = create_sample_survey_data()
sample_experiment = create_sample_experiment_data()

print("Sample data created:")
print(f"Survey participant: {sample_survey['personal_info']['name']}")
print(f"Experiment ID: {sample_experiment['experiment_id']}")
print(f"Measurements: {len(sample_experiment['measurements'])} data points")

print("\nTo store data in FormsDB:")
print("survey_data = formsdb.data.create(folder, user_schema, sample_survey)")
print("exp_data = formsdb.data.create(folder, exp_schema, sample_experiment)")
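The sample records can also be checked against their schemas locally before they are stored. This sketch again assumes the `jsonschema` package and Draft 7, and uses a trimmed-down copy of the survey schema. Note that `format` keywords such as `email` are only enforced when a `FormatChecker` is supplied:

```python
from jsonschema import Draft7Validator, FormatChecker

survey_schema = {
    "type": "object",
    "properties": {
        "participant_id": {"type": "string"},
        "personal_info": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "minLength": 1},
                "age": {"type": "integer", "minimum": 0, "maximum": 150},
                "email": {"type": "string", "format": "email"},
            },
            "required": ["name", "age"],
        },
        "survey_responses": {"type": "array"},
    },
    "required": ["participant_id", "personal_info", "survey_responses"],
}

validator = Draft7Validator(survey_schema, format_checker=FormatChecker())


def validation_errors(instance: dict) -> list[tuple[str, str]]:
    """Return (JSON path, message) pairs for every violation, sorted by path."""
    errors = [
        ("/".join(str(part) for part in err.absolute_path), err.message)
        for err in validator.iter_errors(instance)
    ]
    return sorted(errors)


valid = {
    "participant_id": "PART_001",
    "personal_info": {"name": "Alice Johnson", "age": 29, "email": "alice.johnson@example.com"},
    "survey_responses": [],
}
invalid = {
    "participant_id": "PART_002",
    "personal_info": {"age": -5, "email": "not-an-email"},  # no name, bad age and email
    "survey_responses": [],
}

print(validation_errors(valid))  # []
for path, message in validation_errors(invalid):
    print(f"{path}: {message}")
```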

### Complete FormsDB Example

In [None]:
def complete_formsdb_workflow():
    """Demonstrate a complete FormsDB workflow."""
    print("""
    COMPLETE FORMSDB WORKFLOW:
    
    # 1. Initialize FormsDB
    pc.login()
    agent = pc.get_agent()
    formsdb = FormsDb(credentials=agent, project_id="research-project")
    
    # 2. Create folder structure
    project_folder = formsdb.folder.create("clinical-trial")
    patients_folder = formsdb.folder.create("clinical-trial/patients")
    results_folder = formsdb.folder.create("clinical-trial/results")
    
    # 3. Define and create schemas
    patient_schema = formsdb.schema.create("patient-data", {
        "type": "object",
        "properties": {
            "patient_id": {"type": "string"},
            "demographics": {
                "type": "object",
                "properties": {
                    "age": {"type": "integer", "minimum": 0},
                    "gender": {"type": "string"},
                    "weight_kg": {"type": "number", "minimum": 0}
                }
            },
            "medical_history": {
                "type": "array",
                "items": {"type": "string"}
            },
            "trial_data": {
                "type": "object",
                "properties": {
                    "enrollment_date": {"type": "string", "format": "date"},
                    "treatment_group": {"type": "string"},
                    "primary_endpoint": {"type": "number"}
                }
            }
        },
        "required": ["patient_id", "demographics", "trial_data"]
    })
    
    # 4. Store patient data
    patient_data = {
        "patient_id": "PT001",
        "demographics": {
            "age": 45,
            "gender": "female", 
            "weight_kg": 65.5
        },
        "medical_history": ["hypertension", "diabetes_type2"],
        "trial_data": {
            "enrollment_date": "2024-01-15",
            "treatment_group": "experimental",
            "primary_endpoint": 87.3
        }
    }
    
    stored_data = formsdb.data.create(
        patients_folder, 
        patient_schema, 
        patient_data
    )
    
    # 5. Query and retrieve data
    # Get all patient data
    all_patients = formsdb.data.list(patients_folder)
    
    # Get specific patient
    patient = formsdb.data.get(stored_data.id)
    
    # Update patient data; deepcopy so the nested trial_data dict of the
    # original record is not modified as well
    import copy
    updated_data = copy.deepcopy(patient_data)
    updated_data["trial_data"]["primary_endpoint"] = 89.1
    
    formsdb.data.update(stored_data.id, updated_data)
    
    # 6. Export data for analysis (re-list so the export includes the update)
    all_patients = formsdb.data.list(patients_folder)
    export_data = [record.data for record in all_patients]
    
    # Convert to DataFrame for analysis
    import pandas as pd
    df = pd.json_normalize(export_data)
    
    print("Clinical trial data management complete")
    print(f"Stored {len(all_patients)} patient records")
    """)

complete_formsdb_workflow()
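The update step in the workflow copies a nested record before changing it. The plain-Python sketch below shows why `dict.copy()` is not enough there. It makes a shallow copy, so nested dicts such as `trial_data` stay shared with the original, whereas `copy.deepcopy` duplicates them:

```python
import copy

patient_data = {
    "patient_id": "PT001",
    "trial_data": {"treatment_group": "experimental", "primary_endpoint": 87.3},
}

# Shallow copy: the top-level dict is new, but trial_data is the same object
shallow = patient_data.copy()
shallow["trial_data"]["primary_endpoint"] = 89.1
print(patient_data["trial_data"]["primary_endpoint"])  # 89.1, original modified

# Reset, then use a deep copy so nested dicts are duplicated too
patient_data["trial_data"]["primary_endpoint"] = 87.3
deep = copy.deepcopy(patient_data)
deep["trial_data"]["primary_endpoint"] = 89.1
print(patient_data["trial_data"]["primary_endpoint"])  # 87.3, original untouched
```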

## Tablify - JSON to DataFrame Conversion

Convert JSON form data to pandas DataFrames. When a JSON schema is supplied, tablify uses it to order the columns:

In [None]:
from pycarta.tablify import tablify

# Sample JSON data for tablify
json_forms_data = [
    {
        "id": 1,
        "name": "Alice Johnson",
        "age": 28,
        "email": "alice@example.com",
        "skills": ["Python", "SQL", "Machine Learning"],
        "experience": {
            "years": 5,
            "level": "senior",
            "previous_companies": ["TechCorp", "DataLabs"]
        },
        "metadata": {
            "submission_date": "2024-01-15",
            "form_version": "2.1"
        }
    },
    {
        "id": 2,
        "name": "Bob Smith", 
        "age": 32,
        "email": "bob@example.com",
        "skills": ["R", "Statistics", "Data Visualization"],
        "experience": {
            "years": 7,
            "level": "lead",
            "previous_companies": ["StatsCorp", "AnalyticsPro", "DataViz Inc"]
        },
        "metadata": {
            "submission_date": "2024-01-16",
            "form_version": "2.1"
        }
    },
    {
        "id": 3,
        "name": "Charlie Brown",
        "age": 25,
        "email": "charlie@example.com",
        "skills": ["JavaScript", "React", "Node.js"],
        "experience": {
            "years": 3,
            "level": "mid",
            "previous_companies": ["WebDev Co"]
        },
        "metadata": {
            "submission_date": "2024-01-17",
            "form_version": "2.0"
        }
    }
]

print(f"Sample data prepared: {len(json_forms_data)} records")
print("Each record has nested structures (experience, metadata)")

In [None]:
# Convert JSON to DataFrame without schema
try:
    df_basic = tablify(json_forms_data)
    print("Basic tablify conversion:")
    print(f"DataFrame shape: {df_basic.shape}")
    print(f"Columns: {list(df_basic.columns)}")
    print("\nFirst few rows:")
    print(df_basic.head())
except Exception as e:
    print(f"Tablify error: {e}")
    print("Note: check that pycarta.tablify imported correctly; some inputs may need a schema")
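For comparison, the pandas fallback used in the next cell, `pd.json_normalize`, flattens nested objects into dotted (or custom `sep`) column names but leaves lists such as `skills` as single list-valued cells. One way to get one row per list element is `DataFrame.explode`; this is plain pandas, not a tablify feature:

```python
import pandas as pd

records = [
    {"id": 1, "name": "Alice Johnson", "skills": ["Python", "SQL"],
     "experience": {"years": 5, "level": "senior"}},
    {"id": 2, "name": "Bob Smith", "skills": ["R"],
     "experience": {"years": 7, "level": "lead"}},
]

# Nested dicts become experience_years / experience_level; skills stays a list column
flat = pd.json_normalize(records, sep="_")
print(list(flat.columns))

# One row per skill; the other columns are repeated for each element
long = flat.explode("skills", ignore_index=True)
print(long[["id", "skills"]])
```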

In [None]:
# Schema-aware conversion
tablify_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "experience": {
            "type": "object",
            "properties": {
                "years": {"type": "integer"},
                "level": {"type": "string"},
                "previous_companies": {"type": "array"}
            }
        },
        "metadata": {"type": "object"}
    }
}

try:
    # Schema-aware conversion with intelligent column ordering
    df_schema = tablify(json_forms_data, schema=tablify_schema)
    print("\nSchema-aware tablify conversion:")
    print(f"DataFrame shape: {df_schema.shape}")
    print(f"Columns: {list(df_schema.columns)}")
    print("\nColumn ordering based on schema priorities")
    print(df_schema.head())
except Exception as e:
    print(f"Schema tablify error: {e}")
    print("Using pandas json_normalize as fallback:")
    df_fallback = pd.json_normalize(json_forms_data)
    print(f"Fallback DataFrame shape: {df_fallback.shape}")
    print(df_fallback.head())
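How tablify derives its column order is not shown here. As a rough illustration only, a hypothetical helper can walk a schema's `properties` in declaration order, descend into nested objects that declare their own `properties`, and reorder a `json_normalize` result so schema-listed columns come first and any others follow in their original order:

```python
import pandas as pd


def schema_column_order(schema: dict, prefix: str = "", sep: str = ".") -> list[str]:
    """Flattened column names in the order properties are declared (hypothetical helper)."""
    columns = []
    for name, sub in schema.get("properties", {}).items():
        key = f"{prefix}{sep}{name}" if prefix else name
        if sub.get("type") == "object" and "properties" in sub:
            columns.extend(schema_column_order(sub, key, sep))  # recurse into nested object
        else:
            columns.append(key)
    return columns


def order_by_schema(df: pd.DataFrame, schema: dict, sep: str = ".") -> pd.DataFrame:
    """Put schema-declared columns first; keep unlisted columns afterwards in their original order."""
    preferred = [c for c in schema_column_order(schema, sep=sep) if c in df.columns]
    remaining = [c for c in df.columns if c not in preferred]
    return df[preferred + remaining]


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "experience": {
            "type": "object",
            "properties": {"years": {"type": "integer"}, "level": {"type": "string"}},
        },
        "metadata": {"type": "object"},  # no properties listed, so its columns go last
    },
}
records = [{"metadata": {"form_version": "2.1"},
            "experience": {"level": "senior", "years": 5},
            "name": "Alice", "id": 1}]

ordered = order_by_schema(pd.json_normalize(records), schema)
print(list(ordered.columns))
```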

## Graph Operations

NetworkX-based graph operations with visitor patterns:

In [None]:
try:
    from pycarta.graph import Graph
    from pycarta.graph.vertex import Vertex
    from pycarta.graph.visitor import Visitor
    
    print("Graph modules imported successfully")
except ImportError as e:
    print(f"Graph import error: {e}")
    print("Using NetworkX directly as demonstration:")
    import networkx as nx

In [None]:
# Graph creation and operations example
def demonstrate_graph_operations():
    """Demonstrate graph operations with pycarta.graph."""
    try:
        # Create a graph
        graph = Graph()
        
        # Add vertices with data
        v1 = Vertex("user_1", {"name": "Alice", "role": "researcher", "department": "data_science"})
        v2 = Vertex("user_2", {"name": "Bob", "role": "analyst", "department": "data_science"})
        v3 = Vertex("user_3", {"name": "Charlie", "role": "engineer", "department": "engineering"})
        v4 = Vertex("project_1", {"name": "ML Pipeline", "type": "project", "status": "active"})
        v5 = Vertex("project_2", {"name": "Data Warehouse", "type": "project", "status": "completed"})
        
        # Add vertices to graph
        for vertex in [v1, v2, v3, v4, v5]:
            graph.add_vertex(vertex)
        
        # Add edges (relationships)
        graph.add_edge(v1, v4)  # Alice works on ML Pipeline
        graph.add_edge(v2, v4)  # Bob works on ML Pipeline
        graph.add_edge(v3, v5)  # Charlie worked on Data Warehouse
        graph.add_edge(v1, v2)  # Alice collaborates with Bob
        
        print("Graph created with vertices and edges")
        return graph
        
    except Exception as e:
        print(f"Graph creation error: {e}")
        return None

# Visitor pattern example
def demonstrate_visitor_pattern():
    """Demonstrate visitor pattern for graph traversal."""
    try:
        class DataCollectorVisitor(Visitor):
            def __init__(self):
                self.collected_data = []
            
            def visit(self, vertex):
                """Visit a vertex and collect its data."""
                self.collected_data.append({
                    'id': vertex.id,
                    'data': vertex.data
                })
                print(f"Visiting {vertex.id}: {vertex.data.get('name', 'Unknown')}")
        
        class RoleFilterVisitor(Visitor):
            def __init__(self, target_role):
                self.target_role = target_role
                self.matching_vertices = []
            
            def visit(self, vertex):
                """Visit vertex and collect if role matches."""
                if vertex.data.get('role') == self.target_role:
                    self.matching_vertices.append(vertex)
                    print(f"Found {self.target_role}: {vertex.data.get('name')}")
        
        print("Visitor classes defined for graph traversal")
        return DataCollectorVisitor, RoleFilterVisitor
        
    except Exception as e:
        print(f"Visitor pattern error: {e}")
        return None, None

# Run graph demonstrations
graph = demonstrate_graph_operations()
DataCollectorVisitor, RoleFilterVisitor = demonstrate_visitor_pattern()

if graph is not None and DataCollectorVisitor is not None:
    print("\nDemonstrating visitor pattern:")
    
    # Use data collector visitor
    collector = DataCollectorVisitor()
    graph.accept(collector)
    print(f"Collected data from {len(collector.collected_data)} vertices")
    
    # Use role filter visitor
    role_filter = RoleFilterVisitor('researcher')
    graph.accept(role_filter)
    print(f"Found {len(role_filter.matching_vertices)} researchers")
else:
    print("Graph demonstration requires pycarta.graph module")

## NetworkX Alternative Example

Using NetworkX directly for graph operations:

In [None]:
import networkx as nx

# Create NetworkX graph as alternative
def create_networkx_example():
    """Create a NetworkX graph example."""
    G = nx.DiGraph()  # Directed graph
    
    # Add nodes with attributes
    G.add_node("alice", role="researcher", department="data_science")
    G.add_node("bob", role="analyst", department="data_science")
    G.add_node("charlie", role="engineer", department="engineering")
    G.add_node("ml_pipeline", type="project", status="active")
    G.add_node("data_warehouse", type="project", status="completed")
    
    # Add edges with weights
    G.add_edge("alice", "ml_pipeline", relationship="leads", weight=1.0)
    G.add_edge("bob", "ml_pipeline", relationship="contributes", weight=0.8)
    G.add_edge("charlie", "data_warehouse", relationship="developed", weight=1.0)
    G.add_edge("alice", "bob", relationship="collaborates", weight=0.9)
    
    return G

# Analyze the graph
def analyze_networkx_graph(G):
    """Analyze NetworkX graph structure."""
    print("Graph Analysis:")
    print(f"  Nodes: {G.number_of_nodes()}")
    print(f"  Edges: {G.number_of_edges()}")
    print(f"  Is DAG: {nx.is_directed_acyclic_graph(G)}")
    
    # Find nodes by attribute
    researchers = [n for n, d in G.nodes(data=True) if d.get('role') == 'researcher']
    projects = [n for n, d in G.nodes(data=True) if d.get('type') == 'project']
    
    print(f"  Researchers: {researchers}")
    print(f"  Projects: {projects}")
    
    # Calculate centrality measures
    degree_centrality = nx.degree_centrality(G)
    betweenness_centrality = nx.betweenness_centrality(G)
    
    print("\nCentrality Analysis:")
    for node in G.nodes():
        print(f"  {node}: degree={degree_centrality[node]:.2f}, betweenness={betweenness_centrality[node]:.2f}")
    
    return degree_centrality, betweenness_centrality

# Create and analyze NetworkX graph
nx_graph = create_networkx_example()
degree_cent, between_cent = analyze_networkx_graph(nx_graph)

print("\nNetworkX graph created and analyzed")
print("This demonstrates similar functionality to pycarta.graph")
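The centrality values printed above can be checked by hand. NetworkX's `degree_centrality` divides each node's degree (in-degree plus out-degree for a `DiGraph`) by $n-1$, and the normalized directed `betweenness_centrality` divides by $(n-1)(n-2)$, where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ the number passing through $v$. With $n = 5$ nodes in this example:

$$
\begin{aligned}
C_D(v) &= \frac{\deg^{-}(v) + \deg^{+}(v)}{n-1},
\qquad
C_B(v) = \frac{1}{(n-1)(n-2)} \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \\
C_D(\text{alice}) &= C_D(\text{bob}) = C_D(\text{ml\_pipeline}) = \tfrac{2}{4} = 0.50,
\qquad
C_D(\text{charlie}) = C_D(\text{data\_warehouse}) = \tfrac{1}{4} = 0.25
\end{aligned}
$$

Every betweenness value is 0.00: the only two-hop route, alice → bob → ml_pipeline, is longer than the direct edge alice → ml_pipeline, so no node lies inside a shortest path.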

## Integrated Data Management Workflow

Combining FormsDB, Tablify, and Graph operations:

In [None]:
def integrated_data_workflow():
    """Demonstrate integrated data management workflow."""
    print("""
    INTEGRATED DATA MANAGEMENT WORKFLOW:
    
    # 1. Collect data using FormsDB
    pc.login()
    formsdb = FormsDb(credentials=pc.get_agent(), project_id="research")
    
    # Create schemas for different data types
    collaboration_schema = formsdb.schema.create("collaboration-data", {
        "type": "object",
        "properties": {
            "user_id": {"type": "string"},
            "project_id": {"type": "string"},
            "role": {"type": "string"},
            "start_date": {"type": "string", "format": "date"},
            "contribution_score": {"type": "number"}
        }
    })
    
    # Store collaboration data
    folder = formsdb.folder.create("research/collaborations")
    
    collaborations = [
        {"user_id": "alice", "project_id": "ml_pipeline", "role": "lead", 
         "start_date": "2024-01-01", "contribution_score": 0.9},
        {"user_id": "bob", "project_id": "ml_pipeline", "role": "contributor", 
         "start_date": "2024-01-15", "contribution_score": 0.7},
        {"user_id": "charlie", "project_id": "data_warehouse", "role": "developer", 
         "start_date": "2023-06-01", "contribution_score": 0.85}
    ]
    
    for collab in collaborations:
        formsdb.data.create(folder, collaboration_schema, collab)
    
    # 2. Retrieve and convert to DataFrame using Tablify
    stored_data = formsdb.data.list(folder)
    collaboration_json = [item.data for item in stored_data]
    
    # Convert to DataFrame for analysis
    df = tablify(collaboration_json, schema=collaboration_schema)
    
    print(f"Converted {len(collaboration_json)} collaboration records to DataFrame")
    print(df.head())
    
    # 3. Build graph from DataFrame data
    G = nx.DiGraph()
    
    # Add nodes and edges from collaboration data
    for _, row in df.iterrows():
        user_id = row['user_id']
        project_id = row['project_id']
        
        # Add nodes if not exist
        if not G.has_node(user_id):
            G.add_node(user_id, type='user')
        if not G.has_node(project_id):
            G.add_node(project_id, type='project')
        
        # Add edge with collaboration data
        G.add_edge(user_id, project_id, 
                  role=row['role'], 
                  contribution=row['contribution_score'],
                  start_date=row['start_date'])
    
    # 4. Analyze collaboration network
    print(f"\nCollaboration Network Analysis:")
    print(f"  Users: {[n for n, d in G.nodes(data=True) if d['type'] == 'user']}")
    print(f"  Projects: {[n for n, d in G.nodes(data=True) if d['type'] == 'project']}")
    
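    # Find the most connected user (projects are excluded, since a shared
    # project can have a higher degree than any single user)
    centrality = nx.degree_centrality(G)
    user_centrality = {n: c for n, c in centrality.items() if G.nodes[n]['type'] == 'user'}
    most_connected = max(user_centrality.items(), key=lambda x: x[1])
    print(f"  Most connected user: {most_connected[0]} (centrality: {most_connected[1]:.2f})")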
    
    # 5. Export results back to FormsDB
    analysis_schema = formsdb.schema.create("network-analysis", {
        "type": "object",
        "properties": {
            "analysis_date": {"type": "string", "format": "date-time"},
            "network_metrics": {"type": "object"},
            "most_connected_user": {"type": "string"},
            "total_collaborations": {"type": "integer"}
        }
    })
    
    analysis_result = {
        "analysis_date": datetime.now().isoformat(),
        "network_metrics": {
            "nodes": G.number_of_nodes(),
            "edges": G.number_of_edges(),
            "density": nx.density(G)
        },
        "most_connected_user": most_connected[0],
        "total_collaborations": len(collaboration_json)
    }
    
    results_folder = formsdb.folder.create("research/analysis-results")
    formsdb.data.create(results_folder, analysis_schema, analysis_result)
    
    print(f"\nWorkflow complete:")
    print(f"  Data collected in FormsDB")
    print(f"  Converted to DataFrame with Tablify")
    print(f"  Analyzed with Graph operations")
    print(f"  Results stored back in FormsDB")
    """)

integrated_data_workflow()
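Step 3 of the workflow builds the graph row by row with `iterrows`. NetworkX can also build it directly from the DataFrame with `nx.from_pandas_edgelist`. The sketch below uses the same three collaboration records as local data (no FormsDB calls) and then tags node types with `nx.set_node_attributes`:

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame([
    {"user_id": "alice", "project_id": "ml_pipeline", "role": "lead",
     "start_date": "2024-01-01", "contribution_score": 0.9},
    {"user_id": "bob", "project_id": "ml_pipeline", "role": "contributor",
     "start_date": "2024-01-15", "contribution_score": 0.7},
    {"user_id": "charlie", "project_id": "data_warehouse", "role": "developer",
     "start_date": "2023-06-01", "contribution_score": 0.85},
])

# One directed edge per row, user -> project, carrying the listed columns as attributes
G = nx.from_pandas_edgelist(
    df,
    source="user_id",
    target="project_id",
    edge_attr=["role", "start_date", "contribution_score"],
    create_using=nx.DiGraph,
)

# Tag node types so users and projects can be told apart later
nx.set_node_attributes(G, {u: "user" for u in df["user_id"]}, "type")
nx.set_node_attributes(G, {p: "project" for p in df["project_id"]}, "type")

projects = [n for n, t in G.nodes(data="type") if t == "project"]
busiest_project = max(projects, key=lambda n: G.in_degree(n))
print(f"Density: {nx.density(G):.2f}")  # 3 edges / (5 * 4) possible = 0.15
print(f"Project with most contributors: {busiest_project}")
```

Passing `edge_attr=True` instead of a list copies every non-endpoint column onto the edges.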

## Best Practices for Data Management

In [None]:
print("""
DATA MANAGEMENT BEST PRACTICES:

1. FormsDB:
   - Design comprehensive JSON schemas upfront
   - Use hierarchical folder structures for organization
   - Include metadata (timestamps, versions, sources)
   - Implement data validation at schema level
   - Regular backups and versioning

2. Tablify:
   - Provide schemas for consistent column ordering
   - Handle nested data structures appropriately
   - Consider data types when converting to DataFrame
   - Use for analysis, visualization, and ML pipelines

3. Graph Operations:
   - Model relationships explicitly in your domain
   - Use meaningful node and edge attributes
   - Implement visitor patterns for complex traversals
   - Consider performance for large graphs
   - Leverage NetworkX algorithms for analysis

4. Integration:
   - Design data flow from collection to analysis
   - Use consistent identifiers across systems
   - Implement error handling and data validation
   - Document data lineage and transformations
   - Monitor data quality throughout pipeline

5. Security:
   - Implement proper access controls
   - Encrypt sensitive data at rest and in transit
   - Audit data access and modifications
   - Regular security reviews and updates

6. Performance:
   - Optimize schema design for query patterns
   - Use appropriate indexing strategies
   - Consider data partitioning for large datasets
   - Monitor and optimize resource usage
""")

## Next Steps

After setting up data management:
1. Integrate with Seven Bridges for computational workflows (see `06_seven_bridges.ipynb`)
2. Create services to expose data APIs (see `03_services.ipynb`)
3. Use MQTT for real-time data streaming (see `04_mqtt.ipynb`)
4. Implement data analysis and machine learning pipelines
5. Set up monitoring and alerting for data quality