# Chapter 04: Reading Delta Tables

Advanced techniques for reading and processing Delta tables using DuckDB with Pandas, Polars, and Arrow.

## 📦 Setup and Installation

In [None]:
%pip install duckdb deltalake pandas polars pyarrow -q

import duckdb
import pandas as pd
import pyarrow as pa

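con = duckdb.connect(':memory:')

# delta_scan() is provided by DuckDB's delta extension; install and load it explicitly
con.execute("INSTALL delta")
con.execute("LOAD delta")
print(f"✓ DuckDB {duckdb.__version__}")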

## 🐼 Reading into a Pandas DataFrame

Converting Delta tables to Pandas for interactive analysis.

In [None]:
# Read the Delta table into Pandas
df = con.execute("""
    SELECT *
    FROM delta_scan('./delta_tables/sales')
    WHERE order_date >= '2024-01-01'
    LIMIT 1000
""").df()

print(f"✓ {len(df)} records loaded")
print("\nFirst records:")
print(df.head())

print("\nDescriptive statistics:")
print(df[['amount', 'quantity']].describe())

## 🚀 Reading into Polars (High Performance)

Converting to Polars for high-performance analysis.

In [None]:
import polars as pl

# Read into Polars
df_polars = con.execute("""
    SELECT *
    FROM delta_scan('./delta_tables/sales')
    LIMIT 10000
""").pl()

# Aggregate analysis with Polars
result = (
    df_polars
    .group_by('customer_id')
    .agg([
        pl.col('amount').sum().alias('total_spent'),
        pl.col('order_id').count().alias('order_count'),
        pl.col('amount').mean().alias('avg_order')
    ])
    .sort('total_spent', descending=True)
    .head(10)
)

print("Top 10 Customers by Total Spend:")
print(result)

## 📊 Sales Dashboard

A complete example that builds an analytical dashboard from sales metrics.

In [None]:
from datetime import datetime, timedelta

delta_path = './delta_tables/sales'
days_back = 90

# Key metrics
metrics = con.execute(f"""
    SELECT
        COUNT(DISTINCT customer_id) as unique_customers,
        COUNT(*) as total_orders,
        SUM(amount) as total_revenue,
        AVG(amount) as avg_order_value,
        MAX(order_date) as last_order_date
    FROM delta_scan('{delta_path}')
    WHERE order_date >= CURRENT_DATE - INTERVAL '{days_back} days'
""").df()

print(f"=== DASHBOARD (last {days_back} days) ===")
print(f"Unique Customers: {metrics['unique_customers'][0]:,}")
print(f"Total Orders: {metrics['total_orders'][0]:,}")
print(f"Total Revenue: ${metrics['total_revenue'][0]:,.2f}")
print(f"Average Order Value: ${metrics['avg_order_value'][0]:,.2f}")

# Top 10 customers
top_customers = con.execute(f"""
    SELECT
        customer_id,
        COUNT(*) as orders,
        SUM(amount) as lifetime_value
    FROM delta_scan('{delta_path}')
    WHERE order_date >= CURRENT_DATE - INTERVAL '{days_back} days'
    GROUP BY customer_id
    ORDER BY lifetime_value DESC
    LIMIT 10
""").df()

print("\nTop 10 Customers:")
print(top_customers)

