# Bronze Layer:
---

## Orders Table (`pedidos`):

### Objective:

Understand how the data is standardized and how it behaves, in order to create the `pedidos_bronze` view from the RAW layer.

### Source:

`workspace.default.pedidos`

### Target:

`workspace.olist_bronze.pedidos_bronze`

In [0]:
%sql
-- Ensure the target schema exists:

CREATE SCHEMA IF NOT EXISTS workspace.olist_bronze;

In [0]:
%sql
-- Check that the table exists and preview its contents:
SELECT * FROM workspace.default.pedidos;

In [0]:
%sql
-- Count the rows in the table:
SELECT COUNT(*) AS LINHAS
FROM workspace.default.pedidos;

In [0]:
%sql
-- Check the column data types:
DESC workspace.default.pedidos;

The data types are correct.

In [0]:
%sql
-- Count the missing values in each column:
SELECT
      SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) AS order_id_null,
      SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS customer_id_null,
      SUM(CASE WHEN order_status IS NULL THEN 1 ELSE 0 END) AS order_status_null,
      SUM(CASE WHEN order_purchase_timestamp IS NULL THEN 1 ELSE 0 END) AS order_purchase_timestamp_null,
      SUM(CASE WHEN order_approved_at IS NULL THEN 1 ELSE 0 END) AS order_approved_at_null,
      SUM(CASE WHEN order_delivered_carrier_date IS NULL THEN 1 ELSE 0 END) AS order_delivered_carrier_date_null,
      SUM(CASE WHEN order_delivered_customer_date IS NULL THEN 1 ELSE 0 END) AS order_delivered_customer_date_null,
      SUM(CASE WHEN order_estimated_delivery_date IS NULL THEN 1 ELSE 0 END) AS order_estimated_delivery_date_null
FROM workspace.default.pedidos;

The payment approval date (`order_approved_at`), carrier handoff date (`order_delivered_carrier_date`), and customer delivery date (`order_delivered_customer_date`) columns contain null values, so the causes of these gaps need to be investigated.

In [0]:
%sql
-- Inspect the rows with at least one missing date, alongside their order status:
SELECT
      order_status,
      order_approved_at,
      order_delivered_carrier_date,
      order_delivered_customer_date
FROM workspace.default.pedidos
WHERE order_approved_at IS NULL 
   OR order_delivered_carrier_date IS NULL 
   OR order_delivered_customer_date IS NULL;

In [0]:
%sql
-- Count the non-null date values per order status
-- (COUNT(column) ignores NULLs, so it counts only filled values):
SELECT
      order_status,
      COUNT(order_approved_at) AS order_approved_at_filled,
      COUNT(order_delivered_carrier_date) AS order_delivered_carrier_date_filled,
      COUNT(order_delivered_customer_date) AS order_delivered_customer_date_filled
FROM workspace.default.pedidos
GROUP BY order_status;

The missing values in the date columns are directly tied to the order status. For example, if an order has not yet been shipped, the `order_delivered_carrier_date` and `order_delivered_customer_date` columns remain empty. The same logic applies to the canceled, invoiced, unavailable, and shipped statuses.
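
One way to probe this relationship is to look at a status that should have every date filled and count how many rows still lack one. The query below is a minimal sketch to run in a `%sql` cell. It assumes that `order_status` contains the value `'delivered'`, which this notebook does not confirm, and the output aliases are illustrative names.

```sql
-- Sketch: for orders marked as delivered, count rows still missing a date.
-- ASSUMPTION: 'delivered' is one of the values stored in order_status.
SELECT
      COUNT(*) AS delivered_rows,
      COUNT_IF(order_approved_at IS NULL) AS missing_approved_at,
      COUNT_IF(order_delivered_carrier_date IS NULL) AS missing_carrier_date,
      COUNT_IF(order_delivered_customer_date IS NULL) AS missing_customer_date
FROM workspace.default.pedidos
WHERE order_status = 'delivered';
```

Non-zero counts here would point to rows where the null is not explained by the status alone and may deserve a closer look in a later layer.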

In [0]:
%sql
-- Count the invalid values in each column (unexpected characters or unparseable timestamps):
SELECT 
      SUM(CASE WHEN order_id IS NOT NULL AND NOT order_id RLIKE '^[A-Za-z0-9]+$' THEN 1 ELSE 0 END) AS order_id_invalid,

      SUM(CASE WHEN customer_id IS NOT NULL AND NOT customer_id RLIKE '^[A-Za-z0-9]+$' THEN 1 ELSE 0 END) AS customer_id_invalid,

      SUM(CASE WHEN order_status IS NOT NULL AND NOT order_status RLIKE '^[a-z]+$' THEN 1 ELSE 0 END) AS order_status_invalid,

      SUM(CASE WHEN order_purchase_timestamp IS NOT NULL AND to_timestamp(order_purchase_timestamp, 'yyyy-MM-dd HH:mm:ss') IS NULL THEN 1 ELSE 0 END) AS order_purchase_timestamp_invalid,

      SUM(CASE WHEN order_approved_at IS NOT NULL AND to_timestamp(order_approved_at, 'yyyy-MM-dd HH:mm:ss') IS NULL THEN 1 ELSE 0 END) AS order_approved_at_invalid,

      SUM(CASE WHEN order_delivered_carrier_date IS NOT NULL AND to_timestamp(order_delivered_carrier_date, 'yyyy-MM-dd HH:mm:ss') IS NULL THEN 1 ELSE 0 END) AS order_delivered_carrier_date_invalid,

      SUM(CASE WHEN order_delivered_customer_date IS NOT NULL AND to_timestamp(order_delivered_customer_date, 'yyyy-MM-dd HH:mm:ss') IS NULL THEN 1 ELSE 0 END) AS order_delivered_customer_date_invalid,

      SUM(CASE WHEN order_estimated_delivery_date IS NOT NULL AND to_timestamp(order_estimated_delivery_date, 'yyyy-MM-dd HH:mm:ss') IS NULL THEN 1 ELSE 0 END) AS order_estimated_delivery_date_invalid

FROM workspace.default.pedidos;

No invalid values were found in any column.
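
The timestamp checks above rely on `to_timestamp` returning `NULL` for malformed input. When ANSI mode is enabled, `to_timestamp` raises an error instead, so the check would fail rather than count the bad rows. The sketch below shows the same check for a single column using `try_to_timestamp`, which returns `NULL` on malformed input regardless of ANSI mode. It assumes the runtime provides `try_to_timestamp` and that the column is stored as a string in the `yyyy-MM-dd HH:mm:ss` format.

```sql
-- Sketch: ANSI-safe variant of the timestamp validity check, for one column.
-- ASSUMPTION: try_to_timestamp is available in the runtime in use,
-- and order_purchase_timestamp holds strings such as '2018-01-31 13:45:00'.
SELECT
      COUNT_IF(
            order_purchase_timestamp IS NOT NULL
            AND try_to_timestamp(order_purchase_timestamp, 'yyyy-MM-dd HH:mm:ss') IS NULL
      ) AS order_purchase_timestamp_invalid
FROM workspace.default.pedidos;
```

The same pattern can be repeated for the other date columns if this variant is needed.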

In [0]:
%sql
-- Create the bronze view on top of the RAW table, keeping all columns unchanged:
CREATE OR REPLACE VIEW workspace.olist_bronze.pedidos_bronze AS
SELECT *
FROM workspace.default.pedidos;
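
After the view is created, a quick sanity check is to confirm that it returns the same number of rows as the source table. The query below is a minimal sketch to run in a `%sql` cell. The aliases `source_rows` and `bronze_rows` are illustrative names.

```sql
-- Sketch: compare the row counts of the RAW table and the bronze view.
-- The two numbers are expected to match, since the view selects every row.
SELECT
      (SELECT COUNT(*) FROM workspace.default.pedidos) AS source_rows,
      (SELECT COUNT(*) FROM workspace.olist_bronze.pedidos_bronze) AS bronze_rows;
```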