# 🌐 NIC ETL - REST API

## 📋 What this notebook does

This notebook **documents the NIC ETL REST API**, following established REST conventions:

- 🗺️ **Site map** with all available endpoints
- 📚 Simplified **OpenAPI documentation**
- 🔗 **Links to resources** and related notebooks
- 📊 **Response schemas** for each endpoint

## 🎯 Documentation standard

The documentation follows the **OpenAPI 3.0** (Swagger) standard and includes:
- Basic API information
- Endpoints organized by category
- Clear descriptions of each operation
- Example responses

## 🚀 Available Endpoints

### 🔆 Entry and Navigation
- `GET /` - API entry point with navigation links
- `GET /health` - Basic API status

### 📚 Documentation
- `GET /api/v1` - Full OpenAPI documentation of the API

### ▶️ ETL Pipeline
- `POST /api/v1/pipelines/gitlab-qdrant/run` - Start the full ETL pipeline in the background
- `GET /api/v1/pipelines/gitlab-qdrant/run` - Return the pipeline status, or start the pipeline when called with `?action=run_pipeline`

### 📊 Monitoring
- `GET /api/v1/pipelines/gitlab-qdrant/runs/last` - Report of the last run
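
The endpoints listed above can be addressed from a client with a small URL helper. The following is a minimal sketch, assuming the development server address `http://localhost:8000` that appears in the OpenAPI document of this notebook. `endpoint_url` and `fetch_json` are hypothetical helper names, not part of the NIC ETL code base.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Assumption: development server address taken from the OpenAPI "servers" entry.
BASE_URL = "http://localhost:8000"


def endpoint_url(path, base_url=BASE_URL, **query):
    """Build the absolute URL for an API path, with optional query parameters."""
    url = urljoin(base_url, path)
    if query:
        url += "?" + urlencode(query)
    return url


def fetch_json(path, **query):
    """Send a GET request and decode a JSON body (requires a running server)."""
    with urlopen(endpoint_url(path, **query), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(endpoint_url("/health"))
print(endpoint_url("/api/v1/pipelines/gitlab-qdrant/run", action="run_pipeline"))
```

Only `endpoint_url` runs without a server. `fetch_json` is meant for endpoints that return JSON, such as `/health`.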

---

## 🔆 API: Home Page

`GET /`
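
The home page cell reads an HTML file and falls back to a second encoding when UTF-8 decoding fails. The same idea can be written as a reusable function. This is only a sketch: `read_text_with_fallback` is a hypothetical helper, and the encoding list is an assumption based on the two encodings the cell tries.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return the file's text, trying each encoding in order."""
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

Because latin-1 maps every byte to a character, the last-resort branch is only reached when latin-1 is left out of `encodings`.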

In [None]:
# GET /
from pathlib import Path

# Read the HTML file, falling back to a more permissive encoding if needed
html_file = Path("pages/home.html")
if html_file.exists():
    try:
        # Try UTF-8 first
        html_content = html_file.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this fallback cannot fail
        html_content = html_file.read_text(encoding="latin-1")
else:
    html_content = "<h1>Error: Home page not found</h1>"

# Set up the response for HTML
try:
    RESPONSE
except NameError:
    RESPONSE = {"status": 200, "headers": {}}  # mock for local testing

# Use the lowercase "headers" key, matching the structure created above
RESPONSE["headers"] = {"Content-Type": "text/html; charset=utf-8"}

# The printed output is the response body
print(html_content)

## 🩺 API: Status

`GET /health`
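
A client can use the health response as a simple liveness check. The sketch below assumes the response body has the shape printed by the cell in this section (`status` plus a `see` list). `is_healthy` is a hypothetical helper name.

```python
import json


def is_healthy(body):
    """Return True when a /health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


sample = '{"status": "ok", "see": ["/api/v1"]}'
print(is_healthy(sample))
```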

In [None]:
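# GET /health
import json

# Make sure RESPONSE exists before setting the header
try:
    RESPONSE
except NameError:
    RESPONSE = {"status": 200, "headers": {}}  # mock for local testing

# Set the content type whether or not RESPONSE already existed
RESPONSE.setdefault("headers", {})["Content-Type"] = "application/json; charset=utf-8"

response = {
    "status": "ok",
    "see": ["/api/v1"]
}
print(json.dumps(response, indent=2, ensure_ascii=False))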

## 🗺️ API: OpenAPI Map

`GET /api/v1`
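
The document returned by this endpoint keeps every operation under `paths`, keyed first by path and then by HTTP method. A short sketch of how a client might turn that structure into a flat endpoint list follows. `list_operations` and the sample document are illustrative assumptions, trimmed to the shape used in this notebook.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Return (METHOD, path, summary) tuples found under spec["paths"], sorted by path."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items may also hold non-operation keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Trimmed-down sample in the same shape as an OpenAPI 3.0 document.
sample_spec = {
    "openapi": "3.0.0",
    "paths": {
        "/health": {"get": {"summary": "General health check"}},
        "/": {"get": {"summary": "Entry point"}},
    },
}

for method, path, summary in list_operations(sample_spec):
    print(f"{method:6} {path:10} {summary}")
```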

In [None]:
# GET /api/v1
import json
from datetime import datetime

# NIC ETL API Documentation - OpenAPI 3.0 Style
api_documentation = {
    "openapi": "3.0.0",
    "info": {
        "title": "NIC ETL Pipeline API",
        "description": "REST API for running and monitoring the NIC (Núcleo de Inteligência e Conhecimento) ETL pipeline",
        "version": "1.0.0",
        "contact": {
            "name": "NIC ETL Team",
            "url": "http://nic.processa.info"
        }
    },
    "servers": [
        {
            "url": "http://localhost:8000",
            "description": "Development server"
        }
    ],
    "paths": {
        "/": {
            "get": {
                "summary": "NIC REST API - Entry point",
                "description": "Basic status and API navigation links",
                "tags": ["Health"],
                "responses": {
                    "200": {
                        "description": "Status OK with navigation links",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "status": {"type": "string", "example": "ok"},
                                        "see": {
                                            "type": "array",
                                            "items": {"type": "string"},
                                            "example": ["/health", "/api/v1"]
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "/health": {
            "get": {
                "summary": "General health check",
                "description": "Checks the basic status of the API",
                "tags": ["Health"],
                "responses": {
                    "200": {
                        "description": "Status OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "status": {"type": "string", "example": "ok"},
                                        "see": {
                                            "type": "array",
                                            "items": {"type": "string"},
                                            "example": ["/api/v1"]
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "/api/v1": {
            "get": {
                "summary": "API documentation",
                "description": "Returns this API documentation in OpenAPI format",
                "tags": ["Documentation"],
                "responses": {
                    "200": {
                        "description": "API documentation",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/api/v1/pipelines/gitlab-qdrant/run": {
            "get": {
                "summary": "Run the ETL pipeline",
                "description": "Runs the full GitLab-to-QDrant pipeline through the etl.ipynb notebook",
                "tags": ["Pipeline"],
                "responses": {
                    "200": {
                        "description": "Pipeline ran successfully",
                        "content": {
                            "text/plain": {
                                "schema": {
                                    "type": "string",
                                    "example": "✅ etl.ipynb executado com sucesso"
                                }
                            }
                        }
                    },
                    "500": {
                        "description": "Error while running the pipeline",
                        "content": {
                            "text/plain": {
                                "schema": {
                                    "type": "string",
                                    "example": "❌ Erro ao executar etl.ipynb"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/api/v1/pipelines/gitlab-qdrant/runs/last": {
            "get": {
                "summary": "Last run report",
                "description": "Returns the full report of the last ETL pipeline run, read from the file pipeline-data/report.json",
                "tags": ["Pipeline", "Monitoring"],
                "responses": {
                    "200": {
                        "description": "Run report",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "pipeline_info": {"type": "object"},
                                        "context": {"type": "object"},
                                        "stages": {"type": "array"},
                                        "summary": {"type": "object"},
                                        "api_metadata": {"type": "object"}
                                    }
                                }
                            }
                        }
                    },
                    "404": {
                        "description": "The pipeline has not been run yet",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "pipeline_status": {"type": "string", "example": "NOT_EXECUTED"},
                                        "message": {"type": "string", "example": "Pipeline has not been executed yet"}
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "PipelineStage": {
                "type": "object",
                "properties": {
                    "stage": {"type": "integer"},
                    "name": {"type": "string"},
                    "status": {"type": "string", "enum": ["SUCCESS", "FAILED", "RUNNING"]},
                    "duration_seconds": {"type": "number"}
                }
            },
            "ApiMetadata": {
                "type": "object",
                "properties": {
                    "endpoint": {"type": "string"},
                    "served_at": {"type": "string", "format": "date-time"},
                    "report_exists": {"type": "boolean"}
                }
            }
        }
    },
    "tags": [
        {
            "name": "Documentation",
            "description": "API documentation and metadata"
        },
        {
            "name": "Health",
            "description": "System health checks"
        },
        {
            "name": "Pipeline",
            "description": "ETL pipeline operations"
        },
        {
            "name": "Monitoring",
            "description": "Monitoring and reports"
        }
    ]
}

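# Add response metadata
api_documentation["_metadata"] = {
    "endpoint": "/api/v1",
    # astimezone() attaches the local UTC offset; appending "Z" to a naive
    # local time would mislabel it as UTC
    "served_at": datetime.now().astimezone().isoformat(),
    "description": "Automatically generated documentation of the NIC ETL API",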
    "available_endpoints": [
        "GET /",
        "GET /health", 
        "GET /api/v1",
        "GET /api/v1/pipelines/gitlab-qdrant/run",
        "GET /api/v1/pipelines/gitlab-qdrant/runs/last"
    ],
    "notebook_cells": {
        "root": "cell-2",
        "health": "cell-4", 
        "documentation": "cell-6",
        "pipeline_run": "cell-8",
        "pipeline_status": "cell-10"
    }
}

print(json.dumps(api_documentation, indent=2, ensure_ascii=False))

## 🚀 Pipeline GitLab-QDrant: Run

`POST /api/v1/pipelines/gitlab-qdrant/run`
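
The cell for this endpoint relies on two operations of `PipelineRunner`: `is_running()` and `start_background()`. The implementation of `PipelineRunner` is not shown in this notebook, so the following is only a sketch of the general pattern (one background job at a time, guarded by a lock). `BackgroundRunner`, its `wait` method, and the `"started"` status value are assumptions made for illustration.

```python
import threading


class BackgroundRunner:
    """Hypothetical stand-in for PipelineRunner: at most one job at a time."""

    def __init__(self, job):
        self._job = job  # callable that performs the actual work
        self._lock = threading.Lock()
        self._thread = None

    def is_running(self):
        with self._lock:
            return self._thread is not None and self._thread.is_alive()

    def start_background(self):
        # Check and start under one lock so two callers cannot both start a job.
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return {"status": "job_running"}
            self._thread = threading.Thread(target=self._job, daemon=True)
            self._thread.start()
            return {"status": "started"}

    def wait(self, timeout=None):
        """Block until the current job finishes (useful in tests)."""
        thread = self._thread
        if thread is not None:
            thread.join(timeout)


# Demo: the job blocks until released, so the second start is rejected.
release = threading.Event()
runner = BackgroundRunner(release.wait)
first = runner.start_background()
second = runner.start_background()
release.set()
runner.wait(timeout=5)
print(first, second, runner.is_running())
```

An in-process lock like this one only protects a single Python process. If requests can be served by more than one process, a lock file or a similar shared mechanism would be needed instead.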

In [None]:
# POST /api/v1/pipelines/gitlab-qdrant/run
import json
import sys
import os

# Add the src directory to the import path
sys.path.append(os.path.join(os.getcwd(), 'src'))

from pipeline_runner import PipelineRunner
from request_parser import parse_request

# Capture the global REQUEST provided by Jupyter Kernel Gateway
try:
    request_obj = REQUEST  # Global variable set by Jupyter Kernel Gateway
except NameError:
    request_obj = None  # Mock for local testing

# Parse the request
parsed = parse_request(request_obj)

# Initialize the runner
runner = PipelineRunner()

# Report a job that is already running; otherwise start a new one
if runner.is_running():
    result = {"status": "job_running"}
else:
    result = runner.start_background()

print(json.dumps(result))

## 🚀 Pipeline GitLab-QDrant: Run (via GET)

`GET /api/v1/pipelines/gitlab-qdrant/run`
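
The GET variant decides what to do from the `action` query parameter, which the cell reads through `get_query_param`. The project's `request_parser` module is not shown here, so the sketch below is a hypothetical stand-in. It assumes that Jupyter Kernel Gateway supplies `REQUEST` as a JSON string whose `args` object maps each query parameter name to a list of values.

```python
import json


def get_query_param(request, name, default=None):
    """Return the first value of a query parameter, or `default` if absent."""
    if not request:
        return default  # no request available, e.g. when testing locally
    data = json.loads(request) if isinstance(request, str) else request
    values = data.get("args", {}).get(name)
    if not values:
        return default
    # Each parameter maps to a list because it may be repeated in the URL.
    return values[0] if isinstance(values, list) else values


print(get_query_param('{"args": {"action": ["run_pipeline"]}}', "action"))
print(get_query_param(None, "action"))
```

With this behavior, a request without `?action=...` yields `None`, which the cell treats as a plain status query.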

In [None]:
# GET /api/v1/pipelines/gitlab-qdrant/run
import json
import sys
import os

# Add the src directory to the import path
sys.path.append(os.path.join(os.getcwd(), 'src'))

from pipeline_runner import PipelineRunner
from request_parser import get_query_param

# Make sure REQUEST exists
try:
    REQUEST
except NameError:
    REQUEST = None  # mock for local testing

# Get the action parameter
action = get_query_param(REQUEST, "action")

# Initialize the runner
runner = PipelineRunner()

# Determine the current status
current_status = "job_running" if runner.is_running() else "idle"

# Behavior depends on the action parameter
if action is None:
    # No action: just return the status
    result = {"status": current_status}

elif action == "run_pipeline":
    # action=run_pipeline: start the pipeline unless a job is already running
    if runner.is_running():
        result = {"status": "job_running"}
    else:
        result = runner.start_background()

else:
    # Invalid action: return the status plus an error message
    result = {
        "status": current_status,
        "invalid_usage": f"Action not supported: {action}",
        "known_actions": ["run_pipeline"]
    }

print(json.dumps(result))

## 📊 Pipeline GitLab-QDrant: Status

`GET /api/v1/pipelines/gitlab-qdrant`
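
The pipeline description served by the cell in this section lists each stage with an `id` and the `dependencies` it needs. A consumer can derive a valid execution order from those two fields. The following is a minimal sketch using a depth-first topological sort. `execution_order` and the sample stages are illustrative and not part of the pipeline code.

```python
def execution_order(stages):
    """Return stage ids in an order that respects each stage's dependencies."""
    by_id = {stage["id"]: stage for stage in stages}
    ordered, visiting = [], set()

    def visit(stage_id):
        if stage_id in ordered:
            return
        if stage_id in visiting:
            raise ValueError(f"Dependency cycle at stage {stage_id}")
        if stage_id not in by_id:
            raise ValueError(f"Unknown stage id: {stage_id}")
        visiting.add(stage_id)
        # Place every dependency before the stage itself
        for dep in by_id[stage_id]["dependencies"]:
            visit(dep)
        visiting.discard(stage_id)
        ordered.append(stage_id)

    for stage in stages:
        visit(stage["id"])
    return ordered


# Stages deliberately listed out of order.
sample_stages = [
    {"id": "03", "dependencies": ["02"]},
    {"id": "01", "dependencies": []},
    {"id": "02", "dependencies": ["01"]},
]
print(execution_order(sample_stages))
```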

In [None]:
# GET /api/v1/pipelines/gitlab-qdrant
import json
from datetime import datetime

# Information about the GitLab-QDrant pipeline
pipeline_info = {
    "pipeline": {
        "name": "gitlab-qdrant",
        "title": "GitLab to QDrant ETL Pipeline",
        "description": "Full ETL pipeline that processes documents from GitLab and stores embeddings in QDrant for semantic search",
        "version": "1.0.0",
        "type": "etl",
        "source": "gitlab",
        "destination": "qdrant"
    },
    "stages": [
        {
            "id": "01",
            "name": "fundacao-preparacao",
            "title": "🏗️ Foundation and Preparation",
            "description": "Environment setup, credential validation, and directory preparation",
            "notebook": "etl-1-fundacao-preparacao.ipynb",
            "dependencies": [],
            "outputs": ["Environment validation", "Directory structure"]
        },
        {
            "id": "02",
            "name": "coleta-gitlab",
            "title": "📥 GitLab Collection",
            "description": "Download of documents from the specified GitLab repository",
            "notebook": "etl-2-coleta-gitlab.ipynb",
            "dependencies": ["01"],
            "outputs": ["Downloaded documents", "File metadata"]
        },
        {
            "id": "03",
            "name": "processamento-docling",
            "title": "⚙️ Docling Processing",
            "description": "Extraction of document content using OCR and text processing",
            "notebook": "etl-3-processamento-docling.ipynb",
            "dependencies": ["02"],
            "outputs": ["Extracted text", "Document structure"]
        },
        {
            "id": "04",
            "name": "segmentacao-chunks",
            "title": "🔪 Chunk Segmentation",
            "description": "Intelligent splitting of the text into segments for processing",
            "notebook": "etl-4-segmentacao-chunks.ipynb",
            "dependencies": ["03"],
            "outputs": ["Text chunks", "Segmentation metadata"]
        },
        {
            "id": "05",
            "name": "geracao-embeddings",
            "title": "🧠 Embedding Generation",
            "description": "Creation of semantic vectors using the BAAI/bge-m3 model",
            "notebook": "etl-5-geracao-embeddings.ipynb",
            "dependencies": ["04"],
            "outputs": ["Embedding vectors", "Mapping indexes"]
        },
        {
            "id": "06",
            "name": "armazenamento-qdrant",
            "title": "💾 QDrant Storage",
            "description": "Insertion of the embeddings into the QDrant vector database for semantic search",
            "notebook": "etl-6-armazenamento-qdrant.ipynb",
            "dependencies": ["05"],
            "outputs": ["QDrant collection", "Search indexes"]
        }
    ],
    "configuration": {
        "source": {
            "type": "gitlab",
            "repository": "nic/documentacao/base-de-conhecimento",
            "target_folder": "30-Aprovados",
            "supported_formats": [".pdf", ".docx", ".md", ".txt"]
        },
        "processing": {
            "embedding_model": "BAAI/bge-m3",
            "chunk_size": 1000,
            "chunk_overlap": 200,
            "batch_size": 32
        },
        "destination": {
            "type": "qdrant",
            "collection": "nic_documents",
            "vector_size": 1024,
            "distance_metric": "COSINE"
        }
    },
    "features": [
        "🔒 Automatic lock to prevent simultaneous runs",
        "🌐 Background execution independent of the HTTP connection",
        "📊 Real-time status monitoring",
        "🔄 Automatic failure recovery",
        "📝 Detailed logs for each stage",
        "⚡ Optimized batch processing",
        "🧠 AI for high-quality embeddings",
        "🔍 Advanced semantic search"
    ],
    "endpoints": {
        "run": "/api/v1/pipelines/gitlab-qdrant/run",
        "status": "/api/v1/pipelines/gitlab-qdrant/runs/last",
        "info": "/api/v1/pipelines/gitlab-qdrant"
    },
    "execution": {
        "estimated_duration": "1-3 hours",
        "execution_mode": "background",
        "supported_methods": ["GET", "POST"],
        "trigger_parameter": "action=run_pipeline"
    },
    "metadata": {
        "created_at": "2024-01-01T00:00:00Z",
        # astimezone() attaches the local UTC offset; appending "Z" to a naive
        # local time would mislabel it as UTC
        "updated_at": datetime.now().astimezone().isoformat(),
        "maintainer": "NIC Lab Team",
        "documentation": "/docs",
        "source_code": "notebooks/"
    }
}

print(json.dumps(pipeline_info, indent=2, ensure_ascii=False))