# Gold Layer - E-commerce Executive BI

> **POC:** E-commerce Executive BI  
> **Objective:** Build the Gold layer for consumption in Power BI

This notebook documents the construction of the **Gold layer** used in the **Executive Business Intelligence** POC, built on top of the previously curated data.

This stage consolidates operational and financial information into a single dataset at a grain suited to BI tools (e.g. Power BI), so that analyses of e-commerce performance are fast and consistent.

---

## Context

The data used in this notebook comes directly from the curation layer
(Silver), built in the notebook [`01_curadoria_sql.ipynb`](notebooks/01_curadoria_sql.ipynb).

All standardization, validation, and intermediate aggregation logic was already applied in that step. This notebook does **no raw-data processing**; it only performs analytical consolidation aimed at consumption.

---

## Gold Layer Objective

Create a dataset with:
- A grain of **1 row per order (`order_id`)**
- Consolidated financial and operational metrics
- Customer, location, and category information
- Analytical flags that simplify filters and KPIs in the BI tool

The final result is the view `gold_orders_enriched`, which serves as the direct source for the executive dashboard in Power BI.

## Connection and Initial Check

This section imports the required libraries, connects to the DuckDB database, and runs an initial check of the tables and views available in the analytical environment.

In [1]:
import sys; sys.path.insert(0, "..")
from src.paths import PROCESSED_DATA
import duckdb

In [2]:
con = duckdb.connect(database=str(PROCESSED_DATA), read_only=False)
con.execute("SELECT 1").fetchone()

(1,)

In [3]:
# list tables and views
con.execute("""
SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_type, table_schema, table_name;
""").df()

Unnamed: 0,table_schema,table_name,table_type
0,main,category_translation,BASE TABLE
1,main,customers,BASE TABLE
2,main,geolocation,BASE TABLE
3,main,order_items,BASE TABLE
4,main,order_payments,BASE TABLE
5,main,order_reviews,BASE TABLE
6,main,orders,BASE TABLE
7,main,products,BASE TABLE
8,main,sellers,BASE TABLE
9,main,curated_category_translation,VIEW
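
The Gold view built later in this notebook reads from several curated relations (for example `curated_orders` and `curated_items_by_order`). A small pre-flight check can make a missing dependency fail early with a clear message. The sketch below is only an illustration: `missing_relations` is a hypothetical helper, and the list of available names is hard-coded toy input rather than read from the database.

```python
# Relations referenced by the gold_orders_enriched view definition.
REQUIRED_RELATIONS = [
    "curated_orders",
    "curated_items_by_order",
    "curated_payments_by_order",
    "curated_customers",
    "curated_geolocation_by_zip",
    "curated_order_items",
    "curated_products",
]


def missing_relations(available, required=REQUIRED_RELATIONS):
    """Hypothetical helper: return the required names absent from `available`."""
    available_set = {name.lower() for name in available}
    return [name for name in required if name.lower() not in available_set]


# Toy input standing in for the `table_name` column of the listing above.
example_available = ["orders", "customers", "curated_orders", "curated_customers"]
example_missing = missing_relations(example_available)
print(example_missing)
```

In the notebook, `available` could be taken from the `table_name` column of the DataFrame returned by the listing query.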


## Building the Gold Layer: `gold_orders_enriched`

This step creates the view `gold_orders_enriched`, which consolidates orders, aggregated items, payments, and customer attributes into a single analytical dataset.

The chosen grain is **1 row per order**, which simplifies:
- Modeling in Power BI
- Building executive KPIs
- Temporal and geographic analyses
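
One reason the view joins pre-aggregated inputs such as `curated_items_by_order` is that joining item-level rows directly would repeat each order once per item and break the order grain. The sketch below illustrates that effect on made-up data. It uses Python's built-in `sqlite3` instead of DuckDB so it runs without the project database, and the toy tables and the `con_demo` name are assumptions of the example.

```python
import sqlite3

# In-memory toy database; not the project's DuckDB file.
con_demo = sqlite3.connect(":memory:")
con_demo.executescript("""
CREATE TABLE orders (order_id TEXT PRIMARY KEY);
CREATE TABLE order_items (order_id TEXT, price REAL);
INSERT INTO orders VALUES ('o1'), ('o2');
INSERT INTO order_items VALUES ('o1', 10.0), ('o1', 5.0), ('o2', 7.0);
""")

# Joining item-level rows directly repeats an order once per item.
fanout_rows = con_demo.execute("""
SELECT COUNT(*)
FROM orders o
LEFT JOIN order_items i ON o.order_id = i.order_id
""").fetchone()[0]

# Aggregating items to one row per order first preserves the order grain.
grain_rows = con_demo.execute("""
SELECT o.order_id, COALESCE(a.total_price, 0) AS total_price
FROM orders o
LEFT JOIN (
  SELECT order_id, SUM(price) AS total_price
  FROM order_items
  GROUP BY order_id
) a ON o.order_id = a.order_id
ORDER BY o.order_id
""").fetchall()

con_demo.close()
print(fanout_rows, grain_rows)
```

With two orders and three items, the direct join yields three rows, while the pre-aggregated join yields one row per order.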


In [4]:
con.execute("""
CREATE OR REPLACE VIEW gold_orders_enriched AS
WITH order_category AS (
  SELECT
    x.order_id,
    arg_max(pr.product_category_en, x.cnt) AS main_category_en
  FROM (
    SELECT
      order_id,
      product_id,
      COUNT(*) AS cnt
    FROM curated_order_items
    GROUP BY order_id, product_id
  ) x
  LEFT JOIN curated_products pr
    ON x.product_id = pr.product_id
  GROUP BY x.order_id
)
SELECT
  -- keys
  o.order_id,
  o.customer_id,

  -- status
  o.order_status,

  -- BI flags (for filtering and clean KPIs)
  CASE WHEN o.order_status = 'delivered' THEN 1 ELSE 0 END AS is_completed_order,
  CASE
    WHEN COALESCE(i.total_gmv, 0) > 0 AND COALESCE(p.total_payment_value, 0) > 0 THEN 1
    ELSE 0
  END AS has_financials,

  -- timestamps
  o.order_purchase_ts,
  o.order_approved_ts,
  o.order_delivered_carrier_ts,
  o.order_delivered_customer_ts,
  o.order_estimated_delivery_ts,

  -- dates
  o.purchase_date,
  o.delivered_date,
  o.estimated_delivery_date,

  -- calendar
  EXTRACT(year  FROM o.purchase_date) AS purchase_year,
  EXTRACT(month FROM o.purchase_date) AS purchase_month,

  -- logistics
  o.days_to_deliver,
  o.delay_days,
  o.delay_days_pos,
  o.early_days_pos,
  o.is_delivered,
  o.is_delayed,

  -- items
  COALESCE(i.items_count, 0) AS items_count,
  COALESCE(i.unique_products, 0) AS unique_products,
  COALESCE(i.unique_sellers, 0) AS unique_sellers,
  COALESCE(i.total_price, 0) AS total_price,
  COALESCE(i.total_freight, 0) AS total_freight,
  COALESCE(i.total_gmv, 0) AS total_gmv,

  -- payments
  COALESCE(p.total_payment_value, 0) AS total_payment_value,
  p.max_installments,
  COALESCE(p.num_payments, 0) AS num_payments,
  p.payment_type_main,

  -- customer
  c.customer_state,
  c.customer_city,
  c.customer_zip_code_prefix,

  -- geo
  g.latitude  AS customer_latitude,
  g.longitude AS customer_longitude,

  -- category
  COALESCE(oc.main_category_en, 'unknown') AS main_category_en

FROM curated_orders o
LEFT JOIN curated_items_by_order i
  ON o.order_id = i.order_id
LEFT JOIN curated_payments_by_order p
  ON o.order_id = p.order_id
LEFT JOIN curated_customers c
  ON o.customer_id = c.customer_id
LEFT JOIN curated_geolocation_by_zip g
  ON c.customer_zip_code_prefix = g.zip_code_prefix
LEFT JOIN order_category oc
  ON o.order_id = oc.order_id

-- Containment: drop inconsistent records for executive BI consumption
WHERE
  o.order_status = 'delivered'
  AND COALESCE(i.total_gmv, 0) > 0
  AND COALESCE(p.total_payment_value, 0) > 0
  AND p.payment_type_main IS NOT NULL
  AND p.payment_type_main <> 'not_defined';
""")

<_duckdb.DuckDBPyConnection at 0x7f4a76e21930>
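
The `order_category` CTE picks, for each order, the category of the product that appears most often among its items. The pure-Python sketch below mirrors that selection on made-up data. The `main_category` function, its tie rule, and its treatment of missing categories are assumptions of the sketch and say nothing about how DuckDB's `arg_max` resolves ties or NULLs.

```python
from collections import Counter


def main_category(item_product_ids, product_category):
    """Category of the product that appears most often in one order."""
    if not item_product_ids:
        # No items for the order: same fallback label as the view's COALESCE.
        return "unknown"
    counts = Counter(item_product_ids)
    # max() returns the first maximal key in insertion order; this tie rule
    # belongs to the sketch only.
    top_product = max(counts, key=counts.get)
    return product_category.get(top_product) or "unknown"


# Toy product-to-category mapping (hypothetical values).
toy_categories = {"p1": "toys", "p2": "electronics"}

example_main = main_category(["p1", "p2", "p2"], toy_categories)
example_empty = main_category([], toy_categories)
print(example_main, example_empty)
```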

### Analytical Flags

Two flags were added to the Gold layer to simplify consumption in the BI
tool and to reduce noise in executive analyses:

- **`is_completed_order`**  
  Marks orders that were actually delivered (`order_status = 'delivered'`), so
  KPIs can focus on completed operations only.

- **`has_financials`**  
  Marks orders with consistent financial values (GMV and payment both greater
  than zero), keeping cancelled or incomplete orders out of revenue
  analyses.

Because the view's `WHERE` clause already keeps only delivered orders with positive GMV and payment, both flags equal 1 for every row of `gold_orders_enriched`.
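
The flag rules above can be sketched in plain Python to make the NULL handling explicit. `gold_flags` is a hypothetical helper written for illustration, and the status values and amounts in the calls are made up.

```python
def gold_flags(order_status, total_gmv, total_payment_value):
    """Compute the two Gold-layer flags for one order (illustrative only)."""
    # `or 0` mirrors COALESCE(value, 0): a missing amount counts as zero.
    gmv = total_gmv or 0
    paid = total_payment_value or 0
    return {
        "is_completed_order": 1 if order_status == "delivered" else 0,
        "has_financials": 1 if gmv > 0 and paid > 0 else 0,
    }


delivered_ok = gold_flags("delivered", 120.5, 120.5)
canceled_empty = gold_flags("canceled", None, 0)
delivered_unpaid = gold_flags("delivered", 50.0, None)
print(delivered_ok, canceled_empty, delivered_unpaid)
```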


## Gold Layer Validations

After the view is created, basic validations are run to confirm:

- Grain integrity (1 row per order)
- Consistency of the analytical flags
- Sanity of the financial and temporal metrics

In [5]:
# Grain integrity
con.execute("""
SELECT
  COUNT(*) AS rows,
  COUNT(DISTINCT order_id) AS distinct_orders,
  COUNT(*) - COUNT(DISTINCT order_id) AS dup_orders
FROM gold_orders_enriched;
""").df()

Unnamed: 0,rows,distinct_orders,dup_orders
0,96477,96477,0


In [6]:
# Consistency of the analytical flags
con.execute("""
SELECT
  SUM(is_completed_order) AS completed_orders,
  SUM(has_financials) AS orders_with_financials,
  COUNT(*) AS total_orders
FROM gold_orders_enriched;
""").df()

Unnamed: 0,completed_orders,orders_with_financials,total_orders
0,96477.0,96477.0,96477


In [7]:
# zeros vs flags
con.execute("""
SELECT
  SUM(CASE WHEN total_gmv = 0 THEN 1 ELSE 0 END) AS zero_gmv_orders,
  SUM(CASE WHEN total_payment_value = 0 THEN 1 ELSE 0 END) AS zero_payment_orders,
  SUM(CASE WHEN has_financials = 0 THEN 1 ELSE 0 END) AS no_financials_flagged
FROM gold_orders_enriched;
""").df()

Unnamed: 0,zero_gmv_orders,zero_payment_orders,no_financials_flagged
0,0.0,0.0,0.0
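
The three checks above are read by eye. A possible variant turns them into assertions so that a regression stops the run. The sketch below is an assumption-laden illustration: `validate_gold` is a hypothetical helper, and it is exercised against a toy in-memory `sqlite3` table named like the view rather than against the DuckDB database. The helper only relies on `execute(...).fetchone()`, which this notebook already uses on its DuckDB connection.

```python
import sqlite3


def validate_gold(con_check, relation="gold_orders_enriched"):
    """Raise AssertionError if the order grain or the flags look wrong."""
    n_rows, n_orders, n_completed, n_financial, n_zero = con_check.execute(f"""
        SELECT
          COUNT(*),
          COUNT(DISTINCT order_id),
          COALESCE(SUM(is_completed_order), 0),
          COALESCE(SUM(has_financials), 0),
          COALESCE(SUM(CASE WHEN total_gmv = 0 OR total_payment_value = 0
                            THEN 1 ELSE 0 END), 0)
        FROM {relation}
    """).fetchone()
    assert n_rows == n_orders, "duplicate order_id values"
    assert n_completed == n_rows, "non-delivered orders present"
    assert n_financial == n_rows, "orders without financials present"
    assert n_zero == 0, "zero GMV or zero payment present"
    return n_rows


# Toy table with the columns the checks need (made-up rows).
toy = sqlite3.connect(":memory:")
toy.execute("""
CREATE TABLE gold_orders_enriched (
  order_id TEXT,
  is_completed_order INTEGER,
  has_financials INTEGER,
  total_gmv REAL,
  total_payment_value REAL
)
""")
toy.executemany(
    "INSERT INTO gold_orders_enriched VALUES (?, ?, ?, ?, ?)",
    [("o1", 1, 1, 15.0, 15.0), ("o2", 1, 1, 7.0, 7.0)],
)
validated_rows = validate_gold(toy)
toy.close()
print(validated_rows)
```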


## Final State of the Analytical Environment

The listing below shows the tables and views available after the Gold layer was created.

In [8]:
con.execute("""
SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_type, table_schema, table_name;
""").df()

Unnamed: 0,table_schema,table_name,table_type
0,main,category_translation,BASE TABLE
1,main,customers,BASE TABLE
2,main,geolocation,BASE TABLE
3,main,order_items,BASE TABLE
4,main,order_payments,BASE TABLE
5,main,order_reviews,BASE TABLE
6,main,orders,BASE TABLE
7,main,products,BASE TABLE
8,main,sellers,BASE TABLE
9,main,curated_category_translation,VIEW


In [9]:
con.close()

## Conclusion

The Gold layer was created and validated successfully and is ready for consumption in Business Intelligence tools.

The next step is to model and visualize the data in Power BI, using the view `gold_orders_enriched` as the main source.