# FireProx Aggregations Guide

This notebook demonstrates Firestore aggregation queries in FireProx, enabling efficient analytics without fetching all documents.

## What are Aggregations?

Aggregations allow you to calculate statistics across documents without retrieving all data. This provides:

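- **Performance**: Statistics are computed server-side, so only the results cross the network instead of every matching document
- **Cost Efficiency**: Firestore bills aggregation queries per batch of up to 1,000 index entries matched, not one read per document
- **Bandwidth Savings**: Minimal data transfer (just the aggregated values)
- **Scalability**: Client memory and transfer stay constant as collections grow, because no documents are downloaded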
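To make the cost point concrete, here is a rough read-count estimate. It assumes Firestore's published pricing rule for aggregation queries (one billed document read per batch of up to 1,000 index entries matched, with a minimum of one read) and ignores other charges; check current Firestore pricing before relying on the numbers. The function names are illustrative only.

```python
import math


def aggregation_reads(index_entries_matched: int) -> int:
    """Billed reads for one aggregation query.

    Assumption: 1 read per batch of up to 1,000 index entries, minimum 1 read.
    """
    return max(1, math.ceil(index_entries_matched / 1000))


def fetch_reads(documents_returned: int) -> int:
    """Billed reads when fetching documents: one read per document returned."""
    return documents_returned


for n in (10, 1_000, 250_000):
    print(f"{n:>7} docs: aggregate={aggregation_reads(n)} reads, fetch={fetch_reads(n)} reads")
```

For the 10-document demo collection the difference is small (1 read versus 10), but it grows linearly with collection size.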

## Available Aggregations

- **count()**: Count matching documents
- **sum(field)**: Sum numeric values across documents
- **avg(field)**: Calculate average of numeric values
- **aggregate()**: Execute multiple aggregations in one query
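As a mental model, the three aggregations compute the following over the documents a query matches. This is a plain-Python sketch, not FireProx code. It assumes Firestore's documented behavior that `sum` and `avg` skip values that are missing or non-numeric, and that an average over no numeric values is null (`None` here); `count` counts every matching document regardless of fields.

```python
from numbers import Real
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def _numeric_values(docs: Sequence[dict], field: str) -> List[Number]:
    """Values of `field` that are real numbers; bool is excluded because
    Firestore booleans are not a numeric type."""
    values = []
    for doc in docs:
        value = doc.get(field)
        if isinstance(value, Real) and not isinstance(value, bool):
            values.append(value)
    return values


def count_docs(docs: Sequence[dict]) -> int:
    return len(docs)


def sum_field(docs: Sequence[dict], field: str) -> Number:
    return sum(_numeric_values(docs, field))


def avg_field(docs: Sequence[dict], field: str) -> Optional[float]:
    values = _numeric_values(docs, field)
    return sum(values) / len(values) if values else None


docs = [{"salary": 100}, {"salary": 200.5}, {"salary": "n/a"}, {"name": "no salary"}]
print(count_docs(docs), sum_field(docs, "salary"), avg_field(docs, "salary"))
# 4 300.5 150.25
```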

## Key Features

- **Simple convenience methods** return raw values
- **aggregate() method** returns a dictionary with multiple results
- **Works with query modifiers** (`.where()`, `.order_by()`, `.limit()`)
- **Type-safe helpers** (Count, Sum, Avg) for complex queries

The demo is split into two sections:
1. Synchronous API examples
2. Asynchronous API examples

## Setup

Import modules and aggregation helpers.

In [1]:
from fire_prox import AsyncFireProx, Avg, Count, FireProx, Sum
from fire_prox.testing import async_demo_client, demo_client

---

# Part 1: Synchronous Aggregations

Examples using the synchronous FireProx API.

### Initialize Client and Create Sample Data

We'll create a collection of employees to demonstrate various aggregation operations.

In [2]:
# Create sync client and collection
client = demo_client()
db = FireProx(client)
employees = db.collection('aggregation_demo_employees')

# Create sample employees
sample_employees = [
    {'name': 'Alice Johnson', 'department': 'Engineering', 'salary': 120000, 'age': 28, 'years': 5, 'active': True},
    {'name': 'Bob Smith', 'department': 'Engineering', 'salary': 115000, 'age': 35, 'years': 8, 'active': True},
    {'name': 'Carol White', 'department': 'Data', 'salary': 130000, 'age': 42, 'years': 12, 'active': True},
    {'name': 'David Lee', 'department': 'Engineering', 'salary': 110000, 'age': 31, 'years': 6, 'active': True},
    {'name': 'Emma Davis', 'department': 'Design', 'salary': 95000, 'age': 26, 'years': 3, 'active': True},
    {'name': 'Frank Miller', 'department': 'Sales', 'salary': 85000, 'age': 29, 'years': 4, 'active': True},
    {'name': 'Grace Chen', 'department': 'Sales', 'salary': 90000, 'age': 33, 'years': 7, 'active': True},
    {'name': 'Henry Wilson', 'department': 'Marketing', 'salary': 88000, 'age': 27, 'years': 2, 'active': False},
    {'name': 'Iris Taylor', 'department': 'Data', 'salary': 125000, 'age': 38, 'years': 10, 'active': True},
    {'name': 'Jack Brown', 'department': 'Engineering', 'salary': 105000, 'age': 30, 'years': 5, 'active': False},
]

for emp_data in sample_employees:
    emp = employees.new()
    for key, value in emp_data.items():
        setattr(emp, key, value)
    emp.save()

print(f"Created {len(sample_employees)} employees")
print(f"Departments: {len(set(e['department'] for e in sample_employees))} unique")
print(f"Total salary budget: ${sum(e['salary'] for e in sample_employees):,}")

Created 10 employees
Departments: 5 unique
Total salary budget: $1,063,000


## Feature 1: Count Documents

Count documents without fetching them.

In [3]:
# Count all employees
total_count = employees.count()
print(f"📊 Total employees: {total_count}")

# Count active employees
active_count = employees.where('active', '==', True).count()
print(f"✅ Active employees: {active_count}")

# Count by department
eng_count = employees.where('department', '==', 'Engineering').count()
print(f"👨‍💻 Engineering team: {eng_count}")

print("\n💡 count() returns an integer directly")

📊 Total employees: 10
✅ Active employees: 8
👨‍💻 Engineering team: 4

💡 count() returns an integer directly
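Because the sample data is still in memory, the server-side counts can be cross-checked locally. This sketch copies only the two fields it needs from `sample_employees`; it is a sanity check, not how FireProx computes `count()`.

```python
# (department, active) pairs copied from sample_employees, in the same order
people = [
    ("Engineering", True), ("Engineering", True), ("Data", True),
    ("Engineering", True), ("Design", True), ("Sales", True),
    ("Sales", True), ("Marketing", False), ("Data", True),
    ("Engineering", False),
]

total = len(people)
active = sum(1 for _, is_active in people if is_active)
engineering = sum(1 for dept, _ in people if dept == "Engineering")
print(total, active, engineering)  # 10 8 4, matching the cell output
```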


## Feature 2: Sum Numeric Fields

Sum numeric values across documents.

In [4]:
# Sum all salaries
total_payroll = employees.sum('salary')
print(f"💰 Total payroll: ${total_payroll:,}")

# Sum for specific department
eng_payroll = employees.where('department', '==', 'Engineering').sum('salary')
print(f"👨‍💻 Engineering payroll: ${eng_payroll:,}")

# Sum years of experience
total_experience = employees.sum('years')
print(f"📅 Total years of experience: {total_experience} years")

print("\n💡 sum() returns the numeric total (int or float)")

💰 Total payroll: $1,063,000
👨‍💻 Engineering payroll: $450,000
📅 Total years of experience: 62 years

💡 sum() returns the numeric total (int or float)
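Because `sum()` can return either an int or a float, a format spec such as `:,` prints differently depending on the stored values: a float total gains a trailing `.0`. A small plain-Python illustration (the helper name `format_money` is hypothetical):

```python
def format_money(amount) -> str:
    # ",.2f" gives the same two-decimal output whether amount is int or float.
    return f"${amount:,.2f}"


int_total = sum([120000, 115000, 110000, 105000])      # all ints -> int
float_total = sum([120000, 115000.0, 110000, 105000])  # one float -> float

print(f"${int_total:,}")    # $450,000
print(f"${float_total:,}")  # $450,000.0
print(format_money(int_total), format_money(float_total))  # $450,000.00 $450,000.00
```

If a numeric field might ever be written as a float, formatting with an explicit precision keeps reports stable.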


## Feature 3: Average Numeric Fields

Calculate averages across documents.

In [5]:
# Average salary
avg_salary = employees.avg('salary')
print(f"💵 Average salary: ${avg_salary:,.2f}")

# Average age by department
eng_avg_age = employees.where('department', '==', 'Engineering').avg('age')
print(f"👥 Average age in Engineering: {eng_avg_age:.1f} years")

# Average years of experience
avg_experience = employees.avg('years')
print(f"📊 Average experience: {avg_experience:.1f} years")

print("\n💡 avg() returns a float value")

💵 Average salary: $106,300.00
👥 Average age in Engineering: 31.0 years
📊 Average experience: 6.2 years

💡 avg() returns a float value
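An average is the sum divided by the number of values summed, so `avg()` can be cross-checked against the other two aggregations whenever every counted document has a numeric value in the field (true for this sample data). Python's `/` always yields a float, consistent with the note above. The values are copied from `sample_employees`.

```python
salaries = [120000, 115000, 130000, 110000, 95000, 85000, 90000, 88000, 125000, 105000]
eng_ages = [28, 35, 31, 30]  # Alice, Bob, David, Jack
years = [5, 8, 12, 6, 3, 4, 7, 2, 10, 5]

avg_salary = sum(salaries) / len(salaries)
eng_avg_age = sum(eng_ages) / len(eng_ages)
avg_years = sum(years) / len(years)

print(f"{avg_salary:,.2f} {eng_avg_age:.1f} {avg_years:.1f}")  # 106,300.00 31.0 6.2
```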


## Feature 4: Multiple Aggregations

Execute multiple aggregations in a single query using `aggregate()`.

In [6]:
# Get multiple stats in one query
stats = employees.aggregate(
    total_employees=Count(),
    total_salary=Sum('salary'),
    avg_salary=Avg('salary'),
    avg_age=Avg('age')
)

print("📈 Company Statistics:")
print(f"  Total employees: {stats['total_employees']}")
print(f"  Total payroll: ${stats['total_salary']:,}")
print(f"  Average salary: ${stats['avg_salary']:,.2f}")
print(f"  Average age: {stats['avg_age']:.1f} years")

print("\n💡 aggregate() returns a dictionary with named results")

📈 Company Statistics:
  Total employees: 10
  Total payroll: $1,063,000
  Average salary: $106,300.00
  Average age: 31.9 years

💡 aggregate() returns a dictionary with named results
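Conceptually, `aggregate()` takes keyword arguments whose names become the keys of the result dictionary. The sketch below imitates that shape in plain Python with hypothetical stand-in classes (`CountSpec`, `SumSpec`, `AvgSpec`); it is not FireProx's implementation and runs over a local list of dicts rather than against Firestore.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class CountSpec:
    pass


@dataclass(frozen=True)
class SumSpec:
    field: str


@dataclass(frozen=True)
class AvgSpec:
    field: str


Spec = Union[CountSpec, SumSpec, AvgSpec]


def aggregate(docs: List[dict], **specs: Spec) -> Dict[str, Optional[float]]:
    """Evaluate each named spec over docs and return {name: value}."""
    results = {}
    for name, spec in specs.items():
        if isinstance(spec, CountSpec):
            results[name] = len(docs)
        else:
            values = [d[spec.field] for d in docs if spec.field in d]
            if isinstance(spec, SumSpec):
                results[name] = sum(values)
            else:  # AvgSpec: None when there is nothing to average
                results[name] = sum(values) / len(values) if values else None
    return results


docs = [{"salary": 120000, "age": 28}, {"salary": 95000, "age": 26}]
print(aggregate(docs, n=CountSpec(), total=SumSpec("salary"), avg_age=AvgSpec("age")))
# {'n': 2, 'total': 215000, 'avg_age': 27.0}
```

Keyword names double as result keys, which is why the notebook can read `stats['total_salary']` directly.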


## Feature 5: Aggregations with Filters

Combine aggregations with query filters for targeted analytics.

In [7]:
# Engineering department analytics
eng_stats = (employees
             .where('department', '==', 'Engineering')
             .aggregate(
                 count=Count(),
                 total_payroll=Sum('salary'),
                 avg_salary=Avg('salary'),
                 avg_experience=Avg('years')
             ))

print("👨‍💻 Engineering Department:")
print(f"  Team size: {eng_stats['count']}")
print(f"  Total payroll: ${eng_stats['total_payroll']:,}")
print(f"  Average salary: ${eng_stats['avg_salary']:,.2f}")
print(f"  Average experience: {eng_stats['avg_experience']:.1f} years")

# High earners (>$100k)
high_earner_stats = (employees
                     .where('salary', '>', 100000)
                     .aggregate(
                         count=Count(),
                         avg_salary=Avg('salary')
                     ))

print("\n💎 High Earners (>$100k):")
print(f"  Count: {high_earner_stats['count']}")
print(f"  Average salary: ${high_earner_stats['avg_salary']:,.2f}")

👨‍💻 Engineering Department:
  Team size: 4
  Total payroll: $450,000
  Average salary: $112,500.00
  Average experience: 6.0 years

💎 High Earners (>$100k):
  Count: 6
  Average salary: $117,500.00
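The `where(field, op, value)` filters narrow the document set before any aggregation runs. To check the two results above without a database, here is an in-memory equivalent built on Python's `operator` module. The `OPS` table covers only plain comparison operators, and the `where` helper is hypothetical, not FireProx's method.

```python
import operator

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def where(docs, field, op, value):
    """Keep docs whose field compares true against value; docs missing the field are excluded."""
    compare = OPS[op]
    return [d for d in docs if field in d and compare(d[field], value)]


# (department, salary) copied from sample_employees
staff = [
    ("Engineering", 120000), ("Engineering", 115000), ("Data", 130000),
    ("Engineering", 110000), ("Design", 95000), ("Sales", 85000),
    ("Sales", 90000), ("Marketing", 88000), ("Data", 125000),
    ("Engineering", 105000),
]
docs = [{"department": d, "salary": s} for d, s in staff]

eng = where(docs, "department", "==", "Engineering")
high = where(docs, "salary", ">", 100000)
print(len(eng), sum(d["salary"] for d in eng))                # 4 450000
print(len(high), sum(d["salary"] for d in high) / len(high))  # 6 117500.0
```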


## Feature 6: Multiple Filters with Aggregations

Combine multiple filters for precise analytics.

In [8]:
# Active engineering employees with 5+ years experience
senior_eng_stats = (employees
                    .where('department', '==', 'Engineering')
                    .where('active', '==', True)
                    .where('years', '>=', 5)
                    .aggregate(
                        count=Count(),
                        avg_salary=Avg('salary')
                    ))

print("🎯 Senior Active Engineers (5+ years):")
print(f"  Count: {senior_eng_stats['count']}")
print(f"  Average salary: ${senior_eng_stats['avg_salary']:,.2f}")

print("\n💡 All where() filters apply before aggregation")

🎯 Senior Active Engineers (5+ years):
  Count: 3
  Average salary: $115,000.00

💡 All where() filters apply before aggregation
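Chaining `.where()` calls combines the conditions with a logical AND: a document must satisfy every filter to be aggregated. The plain-Python sketch below models that conjunction over the relevant sample fields (the `matches_all` helper is hypothetical). On a production Firestore database, a query that combines equality filters with a range filter on another field may also need a composite index.

```python
# (department, active, years, salary) copied from sample_employees
rows = [
    ("Engineering", True, 5, 120000), ("Engineering", True, 8, 115000),
    ("Data", True, 12, 130000), ("Engineering", True, 6, 110000),
    ("Design", True, 3, 95000), ("Sales", True, 4, 85000),
    ("Sales", True, 7, 90000), ("Marketing", False, 2, 88000),
    ("Data", True, 10, 125000), ("Engineering", False, 5, 105000),
]

# One predicate per chained .where(); a row is kept only if all of them pass.
predicates = [
    lambda r: r[0] == "Engineering",  # .where('department', '==', 'Engineering')
    lambda r: r[1] is True,           # .where('active', '==', True)
    lambda r: r[2] >= 5,              # .where('years', '>=', 5)
]


def matches_all(row):
    return all(p(row) for p in predicates)


senior = [r for r in rows if matches_all(r)]
print(len(senior), sum(r[3] for r in senior) / len(senior))  # 3 115000.0
```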


## Feature 7: Department Comparison

Compare statistics across departments. Firestore aggregations have no GROUP BY, so the cell runs one aggregation query per department.

In [9]:
# Departments to compare (hard-coded to match the sample data)
departments = ['Engineering', 'Data', 'Sales', 'Design', 'Marketing']

print("📊 Department Comparison:\n")
print(f"{'Department':<15} {'Count':<8} {'Avg Salary':<15} {'Total Payroll'}")
print("-" * 60)

for dept in departments:
    stats = (employees
             .where('department', '==', dept)
             .aggregate(
                 count=Count(),
                 avg_salary=Avg('salary'),
                 total=Sum('salary')
             ))

    if stats['count'] > 0:  # Only show departments with employees
        print(f"{dept:<15} {stats['count']:<8} ${stats['avg_salary']:>11,.0f}   ${stats['total']:>12,}")

print("\n💡 Efficient analytics without fetching documents")

📊 Department Comparison:

Department      Count    Avg Salary      Total Payroll
------------------------------------------------------------
Engineering     4        $    112,500   $     450,000
Data            2        $    127,500   $     255,000
Sales           2        $     87,500   $     175,000
Design          1        $     95,000   $      95,000
Marketing       1        $     88,000   $      88,000

💡 Efficient analytics without fetching documents
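When the documents are already in memory, or to confirm the table above, a single pass with `collections.defaultdict` produces the same per-department figures that the cell obtains with five separate aggregation queries. This is a client-side sketch over values copied from `sample_employees`; it does not query Firestore.

```python
from collections import defaultdict

# (department, salary) copied from sample_employees
staff = [
    ("Engineering", 120000), ("Engineering", 115000), ("Data", 130000),
    ("Engineering", 110000), ("Design", 95000), ("Sales", 85000),
    ("Sales", 90000), ("Marketing", 88000), ("Data", 125000),
    ("Engineering", 105000),
]

totals = defaultdict(int)
counts = defaultdict(int)
for dept, salary in staff:
    totals[dept] += salary
    counts[dept] += 1

# dept -> (count, average salary, total payroll)
table = {dept: (counts[dept], totals[dept] / counts[dept], totals[dept]) for dept in totals}
for dept, (n, avg, total) in table.items():
    print(f"{dept:<15} {n:<8} ${avg:>11,.0f}   ${total:>12,}")
```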


## Feature 8: Real-World Analytics Dashboard

Build a comprehensive analytics dashboard with multiple queries.

In [10]:
# Overall company metrics
company_stats = employees.aggregate(
    total_employees=Count(),
    total_payroll=Sum('salary'),
    avg_salary=Avg('salary'),
    avg_age=Avg('age'),
    avg_experience=Avg('years')
)

# Active vs inactive
active_count = employees.where('active', '==', True).count()
inactive_count = employees.where('active', '==', False).count()

# Salary ranges
under_100k = employees.where('salary', '<', 100000).count()
over_100k = employees.where('salary', '>=', 100000).count()

print("="*60)
print("           COMPANY ANALYTICS DASHBOARD")
print("="*60)
print()
print("📊 WORKFORCE OVERVIEW")
print(f"  Total Employees: {company_stats['total_employees']}")
print(f"  Active: {active_count} | Inactive: {inactive_count}")
print()
print("💰 COMPENSATION")
print(f"  Total Annual Payroll: ${company_stats['total_payroll']:,}")
print(f"  Average Salary: ${company_stats['avg_salary']:,.2f}")
print(f"  Under $100k: {under_100k} | Over $100k: {over_100k}")
print()
print("👥 DEMOGRAPHICS")
print(f"  Average Age: {company_stats['avg_age']:.1f} years")
print(f"  Average Experience: {company_stats['avg_experience']:.1f} years")
print()
print("="*60)
print("\n✅ Dashboard built with efficient aggregation queries!")

           COMPANY ANALYTICS DASHBOARD

📊 WORKFORCE OVERVIEW
  Total Employees: 10
  Active: 8 | Inactive: 2

💰 COMPENSATION
  Total Annual Payroll: $1,063,000
  Average Salary: $106,300.00
  Under $100k: 4 | Over $100k: 6

👥 DEMOGRAPHICS
  Average Age: 31.9 years
  Average Experience: 6.2 years


✅ Dashboard built with efficient aggregation queries!


## Feature 9: Performance Comparison

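Compare a server-side aggregation with fetching every document and averaging client-side. The timings below come from a single run against the demo client on 10 documents, so treat them as illustrative; the gap depends on network latency and grows with the number of documents fetched.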

In [12]:
import time

# Method 1: Server-side aggregation (only the average is transferred)
start = time.perf_counter()
avg_salary_server = employees.avg('salary')
server_time = time.perf_counter() - start

# Method 2: Client-side calculation (every document is fetched)
start = time.perf_counter()
all_employees = employees.get_all()
salaries = [emp.salary for emp in all_employees]
avg_salary_client = sum(salaries) / len(salaries)
client_time = time.perf_counter() - start

print("⚡ Performance Comparison:\n")
print(f"Server-side aggregation: {server_time*1000:.2f}ms")
print(f"Client-side calculation: {client_time*1000:.2f}ms")
print(f"Speedup: {client_time/server_time:.1f}x faster")
print()
print(f"Results match: {abs(avg_salary_server - avg_salary_client) < 0.01}")
print()
print("💡 Benefits of server-side aggregations:")
print("  • Faster execution (server-side processing)")
print("  • Less bandwidth (no document transfer)")
print("  • Lower costs (smaller reads)")
print("  • Scales better with large datasets")

⚡ Performance Comparison:

Server-side aggregation: 4.50ms
Client-side calculation: 10.85ms
Speedup: 2.4x faster

Results match: True

💡 Benefits of server-side aggregations:
  • Faster execution (server-side processing)
  • Less bandwidth (no document transfer)
  • Lower costs (smaller reads)
  • Scales better with large datasets
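A single measurement on a 10-document collection is dominated by noise, so a slightly more robust pattern is to time each approach several times with `time.perf_counter()` and compare medians. The helper below is a generic sketch (its name is hypothetical); pass it any zero-argument callable, for example `lambda: employees.avg('salary')`.

```python
import statistics
import time
from typing import Callable


def median_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the median duration in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000)
    return statistics.median(durations)


# Stand-in workload so the sketch runs without Firestore.
print(f"{median_ms(lambda: sum(range(100_000)), repeats=7):.2f}ms")
```

The median is less sensitive than the mean to a single slow first call (for example, connection setup).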


---

# Part 2: Asynchronous Aggregations

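Examples using the asynchronous AsyncFireProx API with async/await. The cells use top-level `await`, which Jupyter supports; in a plain script, wrap them in an `async def` function and run it with `asyncio.run()`.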

### Initialize Async Client and Create Sample Data

In [13]:
# Create async client and collection
async_client = async_demo_client()
async_db = AsyncFireProx(async_client)
async_employees = async_db.collection('aggregation_demo_employees_async')

# Create sample data
for emp_data in sample_employees:
    emp = async_employees.new()
    for key, value in emp_data.items():
        setattr(emp, key, value)
    await emp.save()

print(f"Created {len(sample_employees)} employees for async demo")

Created 10 employees for async demo


## Feature 1: Async Count

In [14]:
# Async count operations
total = await async_employees.count()
active = await async_employees.where('active', '==', True).count()
engineers = await async_employees.where('department', '==', 'Engineering').count()

print(f"📊 Total: {total}")
print(f"✅ Active: {active}")
print(f"👨‍💻 Engineers: {engineers}")
print("\n💡 Async count() works identically to sync API")

📊 Total: 10
✅ Active: 8
👨‍💻 Engineers: 4

💡 Async count() works identically to sync API


## Feature 2: Async Sum and Average

In [15]:
# Async sum
total_payroll = await async_employees.sum('salary')
print(f"💰 Total payroll: ${total_payroll:,}")

# Async average
avg_salary = await async_employees.avg('salary')
print(f"💵 Average salary: ${avg_salary:,.2f}")

# Async with filter
eng_avg = await async_employees.where('department', '==', 'Engineering').avg('salary')
print(f"👨‍💻 Engineering avg: ${eng_avg:,.2f}")

💰 Total payroll: $1,063,000
💵 Average salary: $106,300.00
👨‍💻 Engineering avg: $112,500.00


## Feature 3: Async Multiple Aggregations

In [16]:
# Multiple async aggregations
stats = await async_employees.aggregate(
    total_employees=Count(),
    total_salary=Sum('salary'),
    avg_salary=Avg('salary'),
    avg_age=Avg('age')
)

print("📈 Async Company Statistics:")
print(f"  Employees: {stats['total_employees']}")
print(f"  Total payroll: ${stats['total_salary']:,}")
print(f"  Avg salary: ${stats['avg_salary']:,.2f}")
print(f"  Avg age: {stats['avg_age']:.1f}")

📈 Async Company Statistics:
  Employees: 10
  Total payroll: $1,063,000
  Avg salary: $106,300.00
  Avg age: 31.9


## Feature 4: Async Department Analytics

In [17]:
# Analyze each department (each query is awaited before the next starts)
departments = ['Engineering', 'Data', 'Sales']

print("📊 Async Department Analysis:\n")

for dept in departments:
    stats = await (async_employees
                   .where('department', '==', dept)
                   .aggregate(
                       count=Count(),
                       avg_salary=Avg('salary'),
                       total_payroll=Sum('salary')
                   ))

    if stats['count'] > 0:
        print(f"{dept}:")
        print(f"  Team size: {stats['count']}")
        print(f"  Avg salary: ${stats['avg_salary']:,.0f}")
        print(f"  Total: ${stats['total_payroll']:,}")
        print()

print("✅ All async operations complete")

📊 Async Department Analysis:

Engineering:
  Team size: 4
  Avg salary: $112,500
  Total: $450,000

Data:
  Team size: 2
  Avg salary: $127,500
  Total: $255,000

Sales:
  Team size: 2
  Avg salary: $87,500
  Total: $175,000

✅ All async operations complete
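The loop above awaits each department's query before starting the next. Because the queries are independent, they could instead be issued together with `asyncio.gather()` and zipped back to department names. The sketch substitutes a hypothetical `fake_department_stats` coroutine, with `asyncio.sleep` standing in for network latency, so it runs without Firestore; in the notebook you would gather the `.aggregate(...)` awaitables and replace `asyncio.run(main())` with `await main()`, since Jupyter already runs an event loop.

```python
import asyncio
from typing import Dict

# Hypothetical stand-in data: department -> salaries (from sample_employees)
SALARIES = {
    "Engineering": [120000, 115000, 110000, 105000],
    "Data": [130000, 125000],
    "Sales": [85000, 90000],
}


async def fake_department_stats(dept: str) -> Dict[str, float]:
    """Pretend aggregation query: simulated latency, then count/avg/total."""
    await asyncio.sleep(0.05)  # stands in for a network round trip
    values = SALARIES[dept]
    return {"count": len(values), "avg_salary": sum(values) / len(values),
            "total_payroll": sum(values)}


async def main() -> Dict[str, Dict[str, float]]:
    departments = list(SALARIES)
    # gather() returns results in argument order, so zip() pairs each with its department.
    results = await asyncio.gather(*(fake_department_stats(d) for d in departments))
    return dict(zip(departments, results))


by_dept = asyncio.run(main())
print(by_dept["Engineering"])
```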


## Feature 5: Concurrent Async Aggregations

Run independent aggregation queries concurrently so their network waits overlap.

In [18]:
import asyncio

# Run multiple aggregations concurrently
results = await asyncio.gather(
    async_employees.count(),
    async_employees.sum('salary'),
    async_employees.avg('age'),
    async_employees.where('department', '==', 'Engineering').count(),
    async_employees.where('active', '==', True).avg('salary')
)

total, payroll, avg_age, eng_count, active_avg_salary = results

print("⚡ Concurrent Aggregation Results:")
print(f"  Total employees: {total}")
print(f"  Total payroll: ${payroll:,}")
print(f"  Average age: {avg_age:.1f}")
print(f"  Engineers: {eng_count}")
print(f"  Active avg salary: ${active_avg_salary:,.2f}")
print("\n💡 asyncio.gather() runs the queries concurrently, so their network waits overlap")

⚡ Concurrent Aggregation Results:
  Total employees: 10
  Total payroll: $1,063,000
  Average age: 31.9
  Engineers: 4
  Active avg salary: $108,750.00

💡 asyncio.gather() runs the queries concurrently, so their network waits overlap
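`asyncio.gather()` runs the awaitables concurrently on a single event loop rather than in parallel threads; the speedup comes from overlapping the time each query spends waiting on the network. The simulation below makes that visible with `asyncio.sleep` standing in for query latency (the 0.1 s delay and function names are arbitrary): five sequential awaits take about five delays, while gathering them takes about one. In a notebook, await the coroutines directly instead of calling `asyncio.run()`.

```python
import asyncio
import time

DELAY = 0.1  # simulated per-query latency in seconds (arbitrary)


async def fake_query(value: int) -> int:
    await asyncio.sleep(DELAY)  # stands in for a network round trip
    return value


async def sequential() -> list:
    # Each await finishes before the next query starts.
    return [await fake_query(i) for i in range(5)]


async def concurrent() -> list:
    # All five queries wait at the same time.
    return list(await asyncio.gather(*(fake_query(i) for i in range(5))))


def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential)
con_result, con_time = timed(concurrent)
print(f"sequential: {seq_time:.2f}s, gather: {con_time:.2f}s")
```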


---

## Summary

This demo showcased all aggregation features:

### ✅ Available Aggregations

1. **count()** - Count matching documents (returns int)
2. **sum(field)** - Sum numeric field (returns int/float)
3. **avg(field)** - Average numeric field (returns float)
4. **aggregate()** - Multiple aggregations (returns dict)

### ✅ Usage Patterns

**Simple aggregations:**
```python
count = collection.count()
total = collection.sum('price')
average = collection.avg('rating')
```

**With filters:**
```python
count = collection.where('active', '==', True).count()
total = collection.where('dept', '==', 'Sales').sum('revenue')
```

**Multiple aggregations:**
```python
from fire_prox import Avg, Count, Sum

stats = collection.aggregate(
    count=Count(),
    total=Sum('amount'),
    average=Avg('rating')
)
# Returns a dict such as: {'count': 42, 'total': 15000, 'average': 4.5}
```

**Async usage:**
```python
count = await collection.count()
stats = await collection.aggregate(count=Count(), avg=Avg('age'))
```

### 💡 When to Use Aggregations

**Use aggregations when:**
- Calculating statistics (counts, sums, averages)
- Building dashboards and reports
- Analyzing large datasets
- You don't need individual document data
- Performance and cost are important

**Use regular queries when:**
- You need to access document fields
- Modifying documents
- Complex client-side calculations
- Statistics Firestore cannot compute server-side (e.g. min, max, median)
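Firestore's aggregation queries cover count, sum, and average only, so statistics such as minimum, maximum, or median have to be computed client-side from fetched values (for min and max alone, a query ordered on the field and limited to one document also works). A standard-library sketch over the sample salaries; in FireProx you might collect the values with `[emp.salary for emp in employees.get_all()]`, as in the performance cell.

```python
import statistics

# Salaries copied from sample_employees
salaries = [120000, 115000, 130000, 110000, 95000, 85000, 90000, 88000, 125000, 105000]

summary = {
    "min": min(salaries),
    "max": max(salaries),
    "median": statistics.median(salaries),  # mean of the two middle values for even counts
}
print(summary)  # {'min': 85000, 'max': 130000, 'median': 107500.0}
```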

### 🚀 Performance Benefits

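- **Speed**: No document transfer; in this demo's single 10-document run the aggregation was about 2.4x faster, and the gap typically widens as more documents would otherwise be fetched
- **Bandwidth**: Minimal data transfer (only results)
- **Cost**: Billed per batch of up to 1,000 index entries matched rather than one read per document
- **Scalability**: Client-side work stays constant as collections grow; server-side latency still scales with the number of index entries scanned, and very large scans can time out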

### 📚 Technical Details

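- **Empty results**: `count()` and `sum()` return 0; Firestore itself returns null for an average over no values, so guard `avg()` results before formatting them (as Feature 7 does with `count > 0`)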
- **Type safety**: Sum/Avg require field names
- **Filter support**: All where() clauses apply
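- **Chainable**: Works with order_by() and limit(); with limit(n), only the first n matching documents are aggregated
- **Async support**: Every aggregation method is awaitable, so independent queries can be combined with asyncio.gather()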
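The `limit()` point has a practical consequence: as we understand Firestore's behavior, an aggregation over a limited query sees at most the first `n` results in the query's order, not the whole collection. That caps cost and is handy for "at least N?" checks. The plain-Python sketch below models it with list slicing; `limited` is a hypothetical helper and does not call FireProx.

```python
# Salaries from sample_employees, ordered highest first
# (models a query ordered by salary, descending).
salaries_ordered = sorted(
    [120000, 115000, 130000, 110000, 95000, 85000, 90000, 88000, 125000, 105000],
    reverse=True,
)


def limited(values, n):
    """Model .limit(n): keep only the first n results of the ordered query."""
    return values[:n]


top3 = limited(salaries_ordered, 3)
print(len(top3), sum(top3), sum(top3) / len(top3))  # 3 375000 125000.0

# "At least 5 matches?" check that never looks past 5 results.
print(len(limited(salaries_ordered, 5)) >= 5)  # True
```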

### 📚 Learn More

- **Tests**: `tests/test_integration_aggregations.py` (sync)
- **Tests**: `tests/test_integration_aggregations_async.py` (async)
- **Source**: `src/fire_prox/fire_query.py` and `async_fire_query.py`
- **Helpers**: `src/fire_prox/aggregation.py`