<a href="https://colab.research.google.com/github/fabiopauli/Qwen3.5-colab/blob/main/Server_Qwen27B_llamacpp_256k_context_L4_20gb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 Robust LLM API Server: Qwen3.5-27B + FastAPI Task Queue + Cloudflare

This notebook turns Google Colab into a **robust, asynchronous AI inference server**, suited to integrations with external applications (backends, automations, bots) that need to handle multiple requests without running into *timeouts*.

## 🧠 The Model
We use **Qwen3.5-27B** (the GGUF build quantized by Unsloth). It is a highly capable model, served through `llama.cpp` with GPU (CUDA) acceleration.

> ⚠️ **Hardware requirement:** To run this 27-billion-parameter model properly, switch the Colab runtime to an **L4 GPU** (preferred) or an **A100**.

---

## 🏗️ The Architecture: Queueing and Polling

Unlike a traditional API (where the client sends a request, the connection stays open, and the client blocks until the response arrives), this server implements an **asynchronous task queue** pattern.

**Why is this necessary?**
Text generation with an LLM can take minutes, depending on the prompt size, while traditional HTTP requests commonly time out after 30 to 60 seconds. In addition, if 5 users requested text at the same time, the Colab instance could run out of memory (OOM).

**How it works now:**
1. **Submission (Queueing):** The client sends the *prompt*. The API stores the request in a queue, generates a **unique ID**, and responds immediately: *"Request received, it is in the queue"*.
2. **Processing (Worker):** In the *background*, a *worker* takes one task at a time from the queue and sends it to the model, protecting the GPU from overload.
3. **Lookup (Polling):** The client uses the **unique ID** to query the API periodically: *"Is the task done yet?"*. Once the status changes to `finished`, the full response is returned.
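
The three steps above can be sketched in a single process with `asyncio`. This is a minimal illustration, not the server code used later in the notebook: `fake_model`, `submit`, `poll`, and `demo` are hypothetical names, and the model call is replaced by a short sleep.

```python
import asyncio
import uuid

# task_id -> task record; stands in for the server's in-memory task table
tasks = {}


async def fake_model(prompt: str) -> str:
    """Hypothetical placeholder for the real LLM call."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def worker(queue: asyncio.Queue) -> None:
    """Step 2: take one task at a time from the queue, in arrival order."""
    while True:
        task_id, prompt = await queue.get()
        tasks[task_id]["status"] = "processing"
        try:
            tasks[task_id]["result"] = await fake_model(prompt)
            tasks[task_id]["status"] = "finished"
        except Exception as exc:
            tasks[task_id]["status"] = "failed"
            tasks[task_id]["error"] = str(exc)
        finally:
            queue.task_done()


def submit(queue: asyncio.Queue, prompt: str) -> str:
    """Step 1: enqueue the prompt and return a task ID immediately."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "queued", "result": None, "error": None}
    queue.put_nowait((task_id, prompt))
    return task_id


async def poll(task_id: str, interval: float = 0.005) -> dict:
    """Step 3: look the task up until it reaches a terminal state."""
    while tasks[task_id]["status"] not in ("finished", "failed"):
        await asyncio.sleep(interval)
    return tasks[task_id]


async def demo() -> list:
    queue = asyncio.Queue(maxsize=100)
    worker_task = asyncio.create_task(worker(queue))
    ids = [submit(queue, p) for p in ("hello", "world")]
    results = [await poll(i) for i in ids]
    worker_task.cancel()
    return [r["result"] for r in results]
```

Calling `asyncio.run(demo())` submits two prompts, lets the single worker process them one after the other, and returns their results in submission order.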

---

## 🔌 API Documentation

After you run all the infrastructure cells, a **Cloudflare** tunnel is created with a public URL (e.g. `https://your-tunnel.trycloudflare.com/v1`).

### 1. Create a Task (POST)
**Endpoint:** `/v1/chat/completions`

```json
// Request Body
{
  "model": "unsloth/Qwen3.5-27B-GGUF",
  "messages": [{"role": "user", "content": "Seu prompt aqui"}]
}

// Response (HTTP 202 Accepted)
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "queued"
}
```

### 2. Check Status and Result (GET)
**Endpoint:** `/v1/tasks/{id}`

```json
// Response while waiting (HTTP 200)
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "processing", // or "queued"
  "result": null,
  "error": null
}

// Response when finished (HTTP 200)
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "finished",
  "result": {
      "id": "chatcmpl-...",
      "choices": [
          {"message": {"content": "Response generated by the AI..."}}
      ]
  },
  "error": null
}
```

---

## 🛠️ Step-by-Step Usage
- **Cell 1:** Installs the dependencies, builds llama.cpp with CUDA support, and downloads the Qwen3.5-27B model (takes about 5 to 8 minutes).
- **Cell 2:** Starts the base llama.cpp server in the background.
- **Cell 3:** Installs the Python dependencies (FastAPI, Uvicorn, etc.).
- **Cell 4:** Starts the asynchronous queue server (FastAPI) and creates the public Cloudflare tunnel.
- **Cell 5:** A practical Python script that demonstrates asynchronous requests: it submits several tasks at once and polls to collect the results.

> ⚠️ **Warning:** URLs generated by Cloudflare Try are ephemeral and last only while the Colab session is active.

The cell below takes about 8 minutes to complete.

In [1]:
# Cell 1: Build llama.cpp with CUDA support (llama-cli and llama-server targets)
!apt-get update -qq && apt-get install -qq -y pciutils build-essential cmake curl libcurl4-openssl-dev > /dev/null 2>&1

!git clone --depth 1 https://github.com/ggml-org/llama.cpp 2>/dev/null || echo "already cloned"

!cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON > /dev/null 2>&1

!cmake --build llama.cpp/build --config Release -j$(nproc) --clean-first --target llama-cli llama-server 2>&1 | tail -5

!cp llama.cpp/build/bin/llama-* llama.cpp/


W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
[ 98%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server.cpp.o
[ 98%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server-http.cpp.o
[ 98%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server-models.cpp.o
[100%] Linking CXX executable ../../bin/llama-server
[100%] Built target llama-server
/bin/bash: line 1: huggingface-cli: command not found


In [24]:
!curl -LsSf https://hf.co/cli/install.sh | bash

[INFO] Installing Hugging Face CLI...
[INFO] OS: linux
[INFO] Force reinstall: false
[INFO] Install dir: /root/.hf-cli
[INFO] Bin dir: /root/.local/bin
[INFO] Skip PATH update: false
[INFO] Using Python: Python 3.12.12
[INFO] Creating directories...
[INFO] Creating virtual environment...
[INFO] Virtual environment already exists; reusing (pass --force to recreate)
[INFO] Installing/upgrading Hugging Face CLI (latest)...
[INFO] Installation output suppressed; set HF_CLI_VERBOSE_PIP=1 for full logs
[INFO] Using uv for faster installation
[INFO] Linking hf CLI into /root/.local/bin...
[INFO] hf available at /root/.local/bin/hf (symlink to venv)
[INFO] Run without touching PATH: env PATH="/root/.local/bin:$PATH" hf --help
[INFO] /root/.local/bin is not in your PATH
[SUCCESS] Added /root/.local/b

In [23]:
!uv venv /root/.hf-cli/venv

Using CPython 3.12.12 interpreter at: /usr/bin/python3
Creating virtual environment at: /root/.hf-cli/venv
? A virtual environment already exists at `/root/.hf-cli/venv`. Do you want to replace it? [y/n] › yes

hint: Use the `--clear` flag or set `UV_VENV_CLEAR=1` to skip this prompt^C


In [25]:
!hf download unsloth/Qwen3.5-27B-GGUF \
    --local-dir unsloth/Qwen3.5-27B-GGUF \
    --include "*UD-Q4_K_XL*"

Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Downloading (incomplete total...): 100% 16.7G/16.7G [00:55<00:00, 374MB/s]
Fetching 1 files: 100% 1/1 [00:55<00:00, 55.60s/it]
Download complete: 100% 16.7G/16.7G [00:55<00:00, 374MB/s]                /content/unsloth/Qwen3.5-27B-GGUF
Download complete: 100% 16.7G/16.7G [00:55<00:00, 301MB/s]


The cell below starts the llama.cpp server in the background.

In [2]:
# Cell 2: Run llama-server in the background
import os
import time
import subprocess

# Kill any existing server to free up the port
os.system("pkill -f llama-server")
time.sleep(2)

os.environ["LLAMA_CACHE"] = "unsloth/Qwen3.5-27B-GGUF"

# Start the server using nohup so it runs in the background
server_cmd = """
nohup ./llama.cpp/llama-server \
    -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL \
    --host 127.0.0.1 \
    --port 8081 \
    --ctx-size 32768 \
    -ngl 99 \
    --temp 0.7 \
    --top-p 0.8 \
    --top-k 20 \
    --min-p 0.00 \
    --chat-template-kwargs '{"enable_thinking": false}' \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 > llama_server.log 2>&1 &
"""

print("Starting llama-server on port 8081...")
os.system(server_cmd)

# Wait for the server to spin up and load the model into VRAM
import requests

print("Waiting for model to load into VRAM (this takes 30-60 seconds)...")
for i in range(600):  # poll every 2 s, so up to 20 minutes in total
    try:
        res = requests.get("http://127.0.0.1:8081/health", timeout=5)
        if res.status_code == 200:
            print("\n✅ llama-server is ready and listening on port 8081!")
            break
    except requests.exceptions.RequestException:
        # Server is not accepting connections yet; keep waiting
        pass
    time.sleep(2)
    print(".", end="", flush=True)
else:
    print("\n⚠️ Server might not have started correctly. Check llama_server.log:")
    os.system("tail -n 20 llama_server.log")

Starting llama-server on port 8081...
Waiting for model to load into VRAM (this takes 30-60 seconds)...
...............................................................................................................................................................................................
✅ llama-server is ready and listening on port 8081!
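
The wait loop in Cell 2 is an instance of a generic "retry until ready" pattern. A minimal sketch of that pattern follows, using only the standard library; `wait_until_ready` and `llama_health_check` are hypothetical helper names, and the defaults (600 attempts, 2 seconds apart, port 8081) simply mirror the values used in the cell.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(check: Callable[[], bool], attempts: int = 600, delay: float = 2.0) -> bool:
    """Call `check` until it returns True or the attempts run out."""
    for _ in range(attempts):
        try:
            if check():
                return True
        except OSError:
            # Covers refused connections, timeouts, and urllib errors
            # raised while the server is still starting.
            pass
        time.sleep(delay)
    return False


def llama_health_check(url: str = "http://127.0.0.1:8081/health") -> bool:
    """Return True once the health endpoint answers HTTP 200."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status == 200
```

In the notebook this could be used as `wait_until_ready(llama_health_check)`, with a `False` result meaning that `llama_server.log` is worth inspecting.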


Next, we create a second server that exposes the API endpoints, also running in the background.

In [12]:
# Cell 3: Install dependencies for FastAPI wrapper
!pip install -q fastapi uvicorn pyngrok httpx pydantic nest-asyncio slowapi

Cell 4: robust server with rate limiting

In [17]:
# Cell 4: Robust Background FastAPI (Queue + Rate Limits + Safe Tokenizer)
import os
import time
import re
import sys

# 1. Prepare the FastAPI code
fastapi_code = """
import uvicorn
import asyncio
import uuid
import time
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
import httpx
from typing import Dict, Any, List

# --- Imports for Rate Limiting ---
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded

# --- ROBUST TOKENIZER SETUP ---
# We wrap this in try/except so the server NEVER crashes due to tiktoken
encoding = None
try:
    import tiktoken
    encoding = tiktoken.get_encoding("cl100k_base")
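    print("✅ Tiktoken loaded successfully.")
except Exception as e:
    print(f"⚠️ Tiktoken failed to load ({e}). Using fallback estimation.")
    encoding = None

# Requests above this (estimated) token count are rejected before queueing.
# Note: Cell 2 starts llama-server with --ctx-size 32768, so keep the two values in sync.
MAX_TOKEN_LIMIT = 64000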

# --- Helper to get Real IP ---
def get_real_ip(request: Request) -> str:
    cf_ip = request.headers.get("cf-connecting-ip")
    return cf_ip if cf_ip else "127.0.0.1"

# --- Config Limiter ---
limiter = Limiter(key_func=get_real_ip)
app = FastAPI(title="Robust Queued FastAPI")
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

LLAMA_SERVER_URL = "http://127.0.0.1:8081"
tasks_db: Dict[str, Dict[str, Any]] = {}
request_queue = asyncio.Queue(maxsize=100)

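# --- Token Counting Function (Safe Version) ---
def count_tokens_safe(messages: List[Dict[str, Any]]) -> int:
    text_buffer = ""
    for m in messages:
        content = m.get("content", "")
        # Only plain-string content is counted; None or structured content is skipped
        if isinstance(content, str):
            text_buffer += content

    if encoding:
        # disallowed_special=() stops special-token text in a prompt from raising ValueError
        return len(encoding.encode(text_buffer, disallowed_special=()))
    else:
        # Fallback: ~4 characters per token
        return len(text_buffer) // 4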

# --- Worker ---
async def process_queue():
    async with httpx.AsyncClient(timeout=600.0) as client:
        while True:
            task_id, payload = await request_queue.get()
            if task_id not in tasks_db:
                request_queue.task_done()
                continue

            tasks_db[task_id]["status"] = "processing"
            try:
                payload["stream"] = False
                response = await client.post(
                    f"{LLAMA_SERVER_URL}/v1/chat/completions",
                    json=payload
                )
                if response.status_code != 200:
                    tasks_db[task_id]["status"] = "failed"
                    tasks_db[task_id]["error"] = f"Upstream: {response.text}"
                else:
                    tasks_db[task_id]["status"] = "finished"
                    tasks_db[task_id]["result"] = response.json()
            except Exception as e:
                tasks_db[task_id]["status"] = "failed"
                tasks_db[task_id]["error"] = str(e)
            finally:
                request_queue.task_done()

@app.on_event("startup")
async def startup_event():
    # Keep a reference so the background worker task is not garbage-collected
    app.state.worker_task = asyncio.create_task(process_queue())

@app.post("/v1/chat/completions")
@limiter.limit("10/minute")
async def queue_chat_completion(request: Request):
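    try:
        payload = await request.json()
    except Exception:
        raise HTTPException(status_code=400, detail="Invalid JSON")

    # Reject bodies that are valid JSON but not an object (e.g. a list or a string)
    if not isinstance(payload, dict):
        raise HTTPException(status_code=400, detail="Request body must be a JSON object")

    messages = payload.get("messages", [])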

    # 1. Count Tokens
    token_count = count_tokens_safe(messages)

    # 2. Reject if too big
    if token_count > MAX_TOKEN_LIMIT:
        raise HTTPException(status_code=400, detail=f"Context too long: {token_count} > {MAX_TOKEN_LIMIT}")

    # 3. Queue
    task_id = str(uuid.uuid4())
    tasks_db[task_id] = {
        "id": task_id, "status": "queued",
        "created_at": time.time(), "tokens": token_count
    }

    try:
        request_queue.put_nowait((task_id, payload))
    except asyncio.QueueFull:
        del tasks_db[task_id]
        raise HTTPException(status_code=503, detail="Queue full")

    return JSONResponse(content={"id": task_id, "status": "queued"}, status_code=202)

@app.get("/v1/tasks/{task_id}")
@limiter.limit("60/minute")
async def get_task_status(request: Request, task_id: str):
    if task_id not in tasks_db:
        raise HTTPException(status_code=404, detail="Task not found")
    return tasks_db[task_id]

# Endpoint for CRON deletion
@app.delete("/v1/tasks/cleanup")
async def cleanup(request: Request, older_than: int = 3600):
    now = time.time()
    to_del = [k for k, v in tasks_db.items() if now - v.get("created_at", 0) > older_than]
    for k in to_del:
        del tasks_db[k]
    return {"deleted": len(to_del)}
"""

with open("fastapi_server.py", "w") as f:
    f.write(fastapi_code)

# 2. Cleanup old processes
os.system("pkill -f uvicorn")
os.system("pkill -f cloudflared")
time.sleep(2)

# 3. Start FastAPI and LOG OUTPUT
print("🚀 Starting FastAPI...")
# We redirect stderr to stdout to catch python errors
os.system("nohup python -m uvicorn fastapi_server:app --host 0.0.0.0 --port 8000 > fastapi.log 2>&1 &")

# 4. Wait, then check whether the server crashed
time.sleep(5)

# Read the log to see whether the server is actually running
with open("fastapi.log", "r") as f:
    log_content = f.read()

if "Application startup complete" not in log_content:
    print("\n❌ CRITICAL ERROR: FastAPI failed to start!")
    print("--- LOG START ---")
    print(log_content)
    print("--- LOG END ---")
    raise RuntimeError("Fix the errors above before starting Cloudflare.")
else:
    print("✅ FastAPI started successfully.")

# 5. Start Cloudflare Tunnel
print("🔗 Starting Cloudflare Tunnel...")
if not os.path.exists("cloudflared"):
    os.system("wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared")
    os.system("chmod +x cloudflared")

os.system("nohup ./cloudflared tunnel --url http://127.0.0.1:8000 > cloudflare.log 2>&1 &")
time.sleep(8)

# 6. Get URL
with open("cloudflare.log", "r") as f:
    logs = f.read()
    match = re.search(r"(https://[a-zA-Z0-9-]+\.trycloudflare\.com)", logs)
    if match:
        public_url = match.group(1)
        base_url = f"{public_url}/v1"
        with open("api_url.txt", "w") as url_file:
            url_file.write(base_url)
        print(f"\n✅ URL saved: {base_url}")
    else:
        print("⚠️ Could not find Cloudflare URL.")

🚀 Starting FastAPI...
✅ FastAPI started successfully.
🔗 Starting Cloudflare Tunnel...

✅ URL saved: https://trek-ken-doing-attended.trycloudflare.com/v1
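
The token guard in Cell 4 falls back to a character-based estimate when `tiktoken` is unavailable. The sketch below reproduces only that fallback path so the arithmetic can be checked offline; `estimate_tokens` and `is_within_limit` are hypothetical names, and the 4-characters-per-token ratio is the heuristic from the cell, not a property of the Qwen tokenizer.

```python
from typing import Any, Dict, List

MAX_TOKEN_LIMIT = 64000  # same value as in Cell 4


def estimate_tokens(messages: List[Dict[str, Any]]) -> int:
    """Fallback estimate: roughly 4 characters per token."""
    text = "".join(m.get("content", "") for m in messages if isinstance(m.get("content", ""), str))
    return len(text) // 4


def is_within_limit(messages: List[Dict[str, Any]], limit: int = MAX_TOKEN_LIMIT) -> bool:
    """True when the estimated size fits under the limit."""
    return estimate_tokens(messages) <= limit


# The oversized payload used by the test suite: "test " repeated 70,000 times
# is 350,000 characters, which the fallback estimates as 87,500 tokens.
huge = [{"role": "user", "content": "test " * 70000}]
small = [{"role": "user", "content": "What is 2+2? Answer in one word."}]
```

Under this estimate the oversized payload is rejected (87,500 > 64,000) while the short prompt passes, which is the behavior the context-limit test expects from the server.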


API test

In [26]:
# Cell 6: Test Suite (Queue, Token Limits, and Rate Limiting)
import requests
import time
import sys

# 1. Load the API URL
try:
    with open("api_url.txt", "r") as f:
        BASE_URL = f.read().strip()
    print(f"🎯 Targeting API at: {BASE_URL}")
except FileNotFoundError:
    print("❌ Error: 'api_url.txt' not found. Run the server cell first.")
    sys.exit(1)

def poll_task(task_id):
    """Poll a task until it finishes, fails, or the attempts run out."""
    print(f"   ⏳ Polling task {task_id}...", end="", flush=True)
    for _ in range(30):  # 30 attempts, 2 s apart (5 s after a 429)
        try:
            res = requests.get(f"{BASE_URL}/tasks/{task_id}", timeout=30)
            if res.status_code == 429:
                print(" (Rate limited on polling) ", end="")
                time.sleep(5)
                continue

            data = res.json()
            status = data.get("status")
            if status == "finished":
                print(f"\n   ✅ Completed! Response: {data['result']['choices'][0]['message']['content'][:50]}...")
                return True
            elif status == "failed":
                print(f"\n   ❌ Task Failed: {data.get('error')}")
                return False
            else:
                print(".", end="", flush=True)
                time.sleep(2)
        except Exception as e:
            print(f" Error: {e}")
            return False
    print("\n   ⚠️ Polling timed out.")
    return False

# --- TEST 1: The "Happy Path" ---
print("\n" + "="*50)
print("TEST 1: Standard Request (Queue System)")
payload_normal = {
    "model": "unsloth/Qwen3.5-27B-GGUF",
    "messages": [{"role": "user", "content": "What is 2+2? Answer in one word."}]
}
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload_normal)

if resp.status_code == 202:
    data = resp.json()
    t_id = data['id']
    print(f"✅ Request Queued. ID: {t_id}")
    poll_task(t_id)
else:
    print(f"❌ Failed: {resp.status_code} - {resp.text}")


# --- TEST 2: The "Context Limit" (Rejection > 64k) ---
print("\n" + "="*50)
print("TEST 2: Context Window Limit (64k Tokens)")
# Create a massive string: 'test ' repeated 70,000 times (350,000 characters).
# That is roughly 70,000 tokens with tiktoken, or 87,500 with the 4-characters-per-token fallback.
huge_content = "test " * 70000
payload_huge = {
    "model": "unsloth/Qwen3.5-27B-GGUF",
    "messages": [{"role": "user", "content": huge_content}]
}
print("   📤 Sending payload with ~70,000 tokens...")
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload_huge)

if resp.status_code == 400:
    print("✅ Success! Server rejected the request.")
    print(f"   Response: {resp.json()['detail']}")
elif resp.status_code == 202:
    print("❌ Fail: Server accepted the huge request (Limit didn't work).")
else:
    print(f"⚠️ Unexpected status: {resp.status_code}")


# --- TEST 3: Rate Limiting (Spamming) ---
print("\n" + "="*50)
print("TEST 3: Rate Limiting (Max 10/min)")
print("   🚀 Spamming requests to trigger 429...")

limit_hit = False
for i in range(1, 15):
    payload = {"messages": [{"role": "user", "content": "hi"}]}
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload)

    if resp.status_code == 429:
        print(f"\n✅ Rate Limit Triggered on request #{i}!")
        print(f"   Server said: {resp.text}")
        limit_hit = True
        break
    else:
        print(f"   Request {i}: {resp.status_code}", end="\r")
        time.sleep(0.1)

if not limit_hit:
    print("\n❌ Failed to trigger rate limit. (Did you wait a minute since the last test?)")

🎯 Targeting API at: https://trek-ken-doing-attended.trycloudflare.com/v1

TEST 1: Standard Request (Queue System)
❌ Failed: 530 - <!DOCTYPE html>
[HTML of the "Cloudflare Tunnel error | trek-ken-doing-attended.trycloudflare.com | Cloudflare" page, truncated]
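
Test 3 relies on the server's limit of 10 requests per minute per IP. The idea behind such a limit can be sketched with a sliding window of timestamps per client. This is an illustration only and is not how `slowapi` is implemented internally; `SlidingWindowLimiter` and its parameters are hypothetical names.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within the last `window_seconds`."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_ip -> timestamps of accepted requests

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have left the window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the server would answer HTTP 429 here
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(ip)` with the client IP (Cell 4 takes it from the `cf-connecting-ip` header) and return 429 when the result is `False`.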

Example of how to consume the API

In [19]:
# Cell 7: Consumer Client - Testing the Queue with Legal Prompts
import requests
import time
import json
import sys

# --- CONFIGURATION ---
try:
    with open("api_url.txt", "r") as f:
        API_BASE_URL = f.read().strip()
    print(f"🔗 Connected to: {API_BASE_URL}")
except FileNotFoundError:
    print("❌ Error: URL file not found. Run the server cell first.")
    sys.exit(1)

# The prompts to test
PROMPTS = [
    "Escreva um texto sobre direito previdenciário",
    "Escreva sobre o Supremo Tribunal Federal no Brasil"
]

# Store task IDs here
active_tasks = {}

# --- STEP 1: SEND REQUESTS (Queueing) ---
print("\n" + "="*60)
print("📤 STEP 1: SENDING REQUESTS TO QUEUE")
print("="*60)

for i, prompt in enumerate(PROMPTS):
    print(f"Sending prompt {i+1}: '{prompt}'...")

    payload = {
        "model": "unsloth/Qwen3.5-27B-GGUF",
        "messages": [
            {"role": "system", "content": "Você é um assistente jurídico especialista em direito brasileiro. Responda de forma técnica e completa."},
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.6,
        "max_tokens": 2048 # Limit output length
    }

    try:
        response = requests.post(f"{API_BASE_URL}/chat/completions", json=payload)

        if response.status_code == 202:
            data = response.json()
            task_id = data["id"]
            active_tasks[task_id] = {"prompt": prompt, "status": "queued"}
            print(f"   ✅ Queued! Task ID: {task_id}")
        elif response.status_code == 429:
            print("   ❌ Rate Limited (Wait a minute and try again)")
        else:
            print(f"   ❌ Error {response.status_code}: {response.text}")

    except Exception as e:
        print(f"   ❌ Connection Error: {e}")

# --- STEP 2: POLLING LOOP (Waiting for results) ---
print("\n" + "="*60)
print("⏳ STEP 2: WAITING FOR GENERATION (POLLING)")
print("="*60)

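completed_results = {}
deadline = time.time() + 600  # stop polling after 10 minutes so a dead server cannot hang the cell

while len(active_tasks) > 0 and time.time() < deadline: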
    # Iterate over a copy of keys so we can remove finished tasks safely
    current_ids = list(active_tasks.keys())

    for task_id in current_ids:
        try:
            # Check status
            res = requests.get(f"{API_BASE_URL}/tasks/{task_id}")

            if res.status_code == 200:
                data = res.json()
                status = data.get("status")

                # Update status for display
                if active_tasks[task_id]["status"] != status:
                    print(f"Task {task_id[:8]}... status changed to: {status.upper()}")
                    active_tasks[task_id]["status"] = status

                if status == "finished":
                    # Save result and remove from active list
                    content = data["result"]["choices"][0]["message"]["content"]
                    completed_results[task_id] = {
                        "prompt": active_tasks[task_id]["prompt"],
                        "content": content
                    }
                    del active_tasks[task_id]
                    print(f"🎉 Task {task_id[:8]}... FINISHED!")

                elif status == "failed":
                    error_msg = data.get("error")
                    completed_results[task_id] = {
                        "prompt": active_tasks[task_id]["prompt"],
                        "content": f"ERROR: {error_msg}"
                    }
                    del active_tasks[task_id]
                    print(f"💀 Task {task_id[:8]}... FAILED!")

            elif res.status_code == 429:
                print("   (Polling too fast, slowing down...)")
                time.sleep(2)

        except Exception as e:
            print(f"Network error polling {task_id}: {e}")

    if len(active_tasks) > 0:
        # Wait 5 seconds before checking again to be nice to the server
        time.sleep(5)

# --- STEP 3: DISPLAY RESULTS ---
print("\n" + "="*60)
print("📜 STEP 3: FINAL RESULTS")
print("="*60)

for tid, data in completed_results.items():
    print(f"\n📢 PROMPT: {data['prompt']}")
    print("-" * 60)
    print(data['content'])
    print("=" * 60)

🔗 Connected to: https://trek-ken-doing-attended.trycloudflare.com/v1

📤 STEP 1: SENDING REQUESTS TO QUEUE
Sending prompt 1: 'Escreva um texto sobre direito previdenciário'...
   ✅ Queued! Task ID: 2fb3ee40-6901-456d-ad28-a4742fb8e443
Sending prompt 2: 'Escreva sobre o Supremo Tribunal Federal no Brasil'...
   ✅ Queued! Task ID: b6f041f9-f522-42b1-8706-558e0ee72042

⏳ STEP 2: WAITING FOR GENERATION (POLLING)
Task 2fb3ee40... status changed to: PROCESSING
Task 2fb3ee40... status changed to: FINISHED
🎉 Task 2fb3ee40... FINISHED!
Task b6f041f9... status changed to: PROCESSING
Task b6f041f9... status changed to: FINISHED
🎉 Task b6f041f9... FINISHED!

📜 STEP 3: FINAL RESULTS

📢 PROMPT: Escreva um texto sobre direito previdenciário
------------------------------------------------------------
# O Direito Previdenciário Brasileiro: Fundamentos, Estrutura e Contemporaneidades

## 1. Introdução e Natureza Jurídica

O Direito Previdenciário, no ordenamento jurídico 
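
The polling step in the client above can also be written as a small reusable helper with an overall deadline. The sketch below is one possible shape, not part of the notebook: `poll_until_done` is a hypothetical name, and `get_status` is an injected callable so the logic can be exercised without a running server.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    get_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Query a task until it is finished or failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        data = get_status(task_id)
        if data.get("status") in ("finished", "failed"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {data.get('status')!r} after {timeout} s")
        sleep(interval)
```

With `requests`, `get_status` could be `lambda tid: requests.get(f"{API_BASE_URL}/tasks/{tid}", timeout=10).json()`, where `API_BASE_URL` is the value read from `api_url.txt`.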

## API Documentation

Here are the endpoints available on your robust server. You can use them with any HTTP client (Postman, curl, Python requests, etc.).

### Base URL
The URL is generated dynamically (e.g., `https://random-name.trycloudflare.com/v1`).

### 1. Submit a Chat Task (Queue)
Submits a request to the queue and returns immediately with a task ID.

- **Method:** POST
- **Endpoint:** `/chat/completions`
- **Rate limit:** 10 requests per minute per IP.

**Body (JSON):**

```json
{
  "model": "unsloth/Qwen3.5-27B-GGUF",
  "messages": [
    {"role": "system", "content": "Optional system prompt"},
    {"role": "user", "content": "Your prompt here"}
  ],
  "temperature": 0.7
}
```

**Response (202 Accepted):**

```json
{
  "id": "uuid-string-here",
  "status": "queued"
}
```

**Errors:** 400 (context too long, above 64k tokens), 503 (queue full), 429 (rate limit exceeded).

### 2. Check Task Status (Poll)
Checks whether your generation is finished.

- **Method:** GET
- **Endpoint:** `/tasks/{task_id}`
- **Rate limit:** 60 requests per minute per IP.

**Response (JSON):**

If queued or processing:

```json
{ "id": "...", "status": "queued" } // or "processing"
```

If finished:

```json
{
  "id": "...",
  "status": "finished",
  "result": { ... OpenAI Standard Response ... }
}
```

If failed:

```json
{ "id": "...", "status": "failed", "error": "Error details" }
```

### 3. Manual Cleanup (Cron)
Deletes old tasks from memory.

- **Method:** DELETE
- **Endpoint:** `/tasks/cleanup?older_than={seconds}`
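
The cleanup endpoint selects tasks purely by age. A minimal standalone sketch of that selection, mirroring the logic in Cell 4, is shown below; `expired_task_ids` and `cleanup_tasks` are hypothetical names, and the `now` parameter exists only to make the behavior easy to check.

```python
import time
from typing import Any, Dict, List, Optional


def expired_task_ids(
    tasks: Dict[str, Dict[str, Any]],
    older_than: int = 3600,
    now: Optional[float] = None,
) -> List[str]:
    """Return IDs of tasks created more than `older_than` seconds ago."""
    now = time.time() if now is None else now
    # A task without "created_at" is treated as created at time 0, so it always expires.
    return [tid for tid, task in tasks.items() if now - task.get("created_at", 0) > older_than]


def cleanup_tasks(
    tasks: Dict[str, Dict[str, Any]],
    older_than: int = 3600,
    now: Optional[float] = None,
) -> int:
    """Delete expired tasks in place and return how many were removed."""
    stale = expired_task_ids(tasks, older_than, now)
    for tid in stale:
        del tasks[tid]
    return len(stale)
```

Because only the creation time is checked, a task that is still queued can be removed as well; in Cell 4 the worker skips any queued ID that is no longer in the task table.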

Cell 4 (alternative): simple FastAPI server (no rate limiting). Do not run this cell if you already ran the previous one.

In [16]:
# Cell 4: Background FastAPI + Cloudflare Tunnel
import os
import time
import re

# 1. Write the FastAPI app to a file
fastapi_code = """
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse, JSONResponse
from fastapi.middleware.cors import CORSMiddleware
import httpx

app = FastAPI(title="Custom FastAPI Wrapper for llama.cpp")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

LLAMA_SERVER_URL = "http://127.0.0.1:8081"

@app.get("/v1/models")
async def get_models():
    async with httpx.AsyncClient() as client:
        response = await client.get(f"{LLAMA_SERVER_URL}/v1/models")
        return response.json()

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    payload = await request.json()
    is_stream = payload.get("stream", False)

    if is_stream:
        async def generate():
            async with httpx.AsyncClient(timeout=300.0) as client:
                async with client.stream("POST", f"{LLAMA_SERVER_URL}/v1/chat/completions", json=payload) as response:
                    async for chunk in response.aiter_bytes():
                        yield chunk

        return StreamingResponse(generate(), media_type="text/event-stream")
    else:
        async with httpx.AsyncClient(timeout=300.0) as client:
            response = await client.post(f"{LLAMA_SERVER_URL}/v1/chat/completions", json=payload)
            return JSONResponse(content=response.json(), status_code=response.status_code)
"""

with open("fastapi_server.py", "w") as f:
    f.write(fastapi_code)

# 2. Kill existing processes (if you run this cell multiple times)
os.system("pkill -f uvicorn")
os.system("pkill -f cloudflared")
time.sleep(1)

# 3. Download Cloudflare if needed
if not os.path.exists("cloudflared"):
    os.system("wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared")
    os.system("chmod +x cloudflared")

# 4. Start FastAPI in the background via Uvicorn
print("Starting FastAPI server in the background...")
os.system("nohup python -m uvicorn fastapi_server:app --host 0.0.0.0 --port 8000 > fastapi.log 2>&1 &")

# 5. Start Cloudflare Tunnel in the background
print("Starting Cloudflare Tunnel...")
os.system("nohup ./cloudflared tunnel --url http://127.0.0.1:8000 > cloudflare.log 2>&1 &")

# Wait a few seconds for Cloudflare to assign a URL
print("Waiting for URL...")
time.sleep(8)

# 6. Read the log to extract the URL
with open("cloudflare.log", "r") as f:
    logs = f.read()
    match = re.search(r"(https://[a-zA-Z0-9-]+\.trycloudflare\.com)", logs)

    if match:
        public_url = match.group(1)
        base_url = f"{public_url}/v1"

        # Save the URL to a file
        with open("api_url.txt", "w") as url_file:
            url_file.write(base_url)

        print(f"\n✅ URL saved to api_url.txt")
        print(f"👉 {base_url}\n")
    else:
        print("⚠️ Could not find Cloudflare URL.")

Starting FastAPI server in the background...
Starting Cloudflare Tunnel...
Waiting for URL...

✅ URL saved to api_url.txt
👉 https://examinations-titled-worker-counting.trycloudflare.com/v1



In [20]:
# Separate cell: Check if Cloudflared is running (run this in a new cell while server is active)
!ps aux | grep cloudflared  # Lists running Cloudflared processes

# If you need to kill it (optional)
#!kill $(pgrep cloudflared)

root       21601  0.1  0.0 1262148 38460 ?       Sl   11:56   0:00 ./cloudflared tunnel --url http://127.0.0.1:8000
root       24226  0.0  0.0   7376  3476 ?        S    12:06   0:00 /bin/bash -c ps aux | grep cloudflared  # Lists running Cloudflared processes
root       24228  0.0  0.0   6484  2304 ?        S    12:06   0:00 grep cloudflared


Below is an example of using the API. It can be run from any computer: just set API_BASE_URL to the server URL printed by the cell above.

In [18]:
# Cell 5: Test your API with the official OpenAI Python package
from openai import OpenAI

# Read the base URL automatically from the file
with open("api_url.txt", "r") as f:
    API_BASE_URL = f.read().strip()

print(f"Connecting to: {API_BASE_URL}\n")

client = OpenAI(
    base_url=API_BASE_URL,
    api_key="sk-no-key-required"
)


# --- 1. GET MODELS ---
print("Fetching models...")
models = client.models.list()
print(f"Available models: {[m.id for m in models.data]}\n")
print("-" * 50)


# --- 2. STREAMING COMPLETION ---
print("Sending chat request (Streaming)...\n")
stream_response = client.chat.completions.create(
    model="unsloth/Qwen3.5-27B-GGUF",
    messages=[
        {"role": "system", "content": "You are a helpful and concise AI assistant."},
        {"role": "user", "content": "Explique o que é um llamacpp server e o que é um Cloudflared tunnel"}
    ],
    stream=True # <--- Set to True
)

# Print the streaming response as it arrives
for chunk in stream_response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print("\n\n" + "-" * 50)


# --- 3. NON-STREAMING COMPLETION ---
print("Sending chat request (Non-Streaming)...\n")
standard_response = client.chat.completions.create(
    model="unsloth/Qwen3.5-27B-GGUF",
    messages=[
        {"role": "system", "content": "You are a helpful and concise AI assistant."},
        {"role": "user", "content": "O que é auxílio-doença no direito brasileiro? Não use markdown na resposta"}
    ],
    stream=False # <--- Set to False
)

# Print the final complete message
print(standard_response.choices[0].message.content)
print("\n" + "-" * 50)

Connecting to: https://trek-ken-doing-attended.trycloudflare.com/v1

Fetching models...


NotFoundError: Error code: 404 - {'detail': 'Not Found'}

Below is an asynchronous API that organizes incoming requests into a queue (queueing and polling).
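Before the full server, the queue-and-worker pattern can be sketched on its own with plain `asyncio` and no HTTP layer: submissions return an ID immediately, and a single worker drains the queue one task at a time. In this sketch `fake_model` is a hypothetical stand-in for the call to the model server, and `submit`, `worker`, and `demo` are illustrative names rather than functions from the notebook.

```python
import asyncio
import uuid
from typing import Any, Dict


async def fake_model(prompt: str) -> str:
    """Stand-in for the model call (assumption: the real call goes over HTTP)."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def worker(queue: asyncio.Queue, tasks: Dict[str, Dict[str, Any]]) -> None:
    """Process one queued task at a time so the model never sees concurrent requests."""
    while True:
        task_id, prompt = await queue.get()
        tasks[task_id]["status"] = "processing"
        try:
            tasks[task_id]["result"] = await fake_model(prompt)
            tasks[task_id]["status"] = "finished"
        except Exception as exc:
            tasks[task_id]["status"] = "failed"
            tasks[task_id]["error"] = str(exc)
        finally:
            queue.task_done()


async def submit(queue: asyncio.Queue, tasks: Dict[str, Dict[str, Any]], prompt: str) -> str:
    """Register the task as 'queued' and return its ID without waiting for the result."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "queued", "result": None, "error": None}
    await queue.put((task_id, prompt))
    return task_id


async def demo() -> Dict[str, Dict[str, Any]]:
    queue: asyncio.Queue = asyncio.Queue()
    tasks: Dict[str, Dict[str, Any]] = {}
    runner = asyncio.create_task(worker(queue, tasks))  # keep a reference to the task
    ids = [await submit(queue, tasks, p) for p in ("first", "second")]
    await queue.join()  # wait until the worker has marked every task done
    runner.cancel()
    try:
        await runner
    except asyncio.CancelledError:
        pass
    return {task_id: tasks[task_id] for task_id in ids}


if __name__ == "__main__":
    for task_id, task in asyncio.run(demo()).items():
        print(task_id[:8], task["status"], task["result"])
```

The server cell that follows applies the same idea, with an HTTP endpoint in place of `submit` and a request to the llama.cpp server in place of `fake_model`.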

In [None]:
# Cell 4: Background FastAPI (Queue System) + Cloudflare Tunnel
import os
import time
import re

# 1. Code for the new queue-based FastAPI server
fastapi_code = """
import uvicorn
import asyncio
import uuid
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
import httpx
from typing import Dict, Any

app = FastAPI(title="Queued FastAPI Wrapper for llama.cpp")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

LLAMA_SERVER_URL = "http://127.0.0.1:8081"

# In-memory "database" that stores requests and responses
tasks_db: Dict[str, Dict[str, Any]] = {}

# Asynchronous queue
request_queue = asyncio.Queue()

# Worker that processes the queue in the background
async def process_queue():
    async with httpx.AsyncClient(timeout=600.0) as client:
        while True:
            # Take the next item from the queue (waits while the queue is empty)
            task_id, payload = await request_queue.get()

            # Update the status
            tasks_db[task_id]["status"] = "processing"

            try:
                # Force stream=False because only the final result is stored
                payload["stream"] = False

                response = await client.post(
                    f"{LLAMA_SERVER_URL}/v1/chat/completions",
                    json=payload
                )
                response.raise_for_status()

                # Save the result
                tasks_db[task_id]["status"] = "finished"
                tasks_db[task_id]["result"] = response.json()

            except Exception as e:
                tasks_db[task_id]["status"] = "failed"
                tasks_db[task_id]["error"] = str(e)
            finally:
                request_queue.task_done()

@app.on_event("startup")
async def startup_event():
    # Start the background worker when the server starts; keep a reference
    # so the task is not garbage-collected while it runs
    app.state.queue_worker = asyncio.create_task(process_queue())

@app.get("/v1/models")
async def get_models():
    async with httpx.AsyncClient() as client:
        response = await client.get(f"{LLAMA_SERVER_URL}/v1/models")
        return response.json()

# Endpoint to CREATE the request
@app.post("/v1/chat/completions")
async def queue_chat_completion(request: Request):
    payload = await request.json()

    # Generate a unique ID for this request
    task_id = str(uuid.uuid4())

    # Save it in the in-memory "database" with its initial status
    tasks_db[task_id] = {
        "id": task_id,
        "status": "queued",
        "result": None,
        "error": None
    }

    # Add it to the queue
    await request_queue.put((task_id, payload))

    # Return immediately to the client
    return JSONResponse(content={"id": task_id, "status": "queued"}, status_code=202)

# New endpoint to CHECK the status of the request
@app.get("/v1/tasks/{task_id}")
async def get_task_status(task_id: str):
    if task_id not in tasks_db:
        raise HTTPException(status_code=404, detail="Task not found")

    return tasks_db[task_id]
"""

with open("fastapi_server.py", "w") as f:
    f.write(fastapi_code)

# 2. Kill existing processes
os.system("pkill -f uvicorn")
os.system("pkill -f cloudflared")
time.sleep(1)

# 3. Download cloudflared if needed
if not os.path.exists("cloudflared"):
    os.system("wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared")
    os.system("chmod +x cloudflared")

# 4. Start FastAPI
print("Starting Queued FastAPI server in the background...")
os.system("nohup python -m uvicorn fastapi_server:app --host 0.0.0.0 --port 8000 > fastapi.log 2>&1 &")

# 5. Start Cloudflare Tunnel
print("Starting Cloudflare Tunnel...")
os.system("nohup ./cloudflared tunnel --url http://127.0.0.1:8000 > cloudflare.log 2>&1 &")

print("Waiting for URL...")
time.sleep(8)

# 6. Read URL
with open("cloudflare.log", "r") as f:
    logs = f.read()
    match = re.search(r"(https://[a-zA-Z0-9-]+\.trycloudflare\.com)", logs)

    if match:
        public_url = match.group(1)
        base_url = f"{public_url}/v1"

        with open("api_url.txt", "w") as url_file:
            url_file.write(base_url)

        print(f"\n✅ URL saved to api_url.txt")
        print(f"👉 {base_url}\n")
    else:
        print("⚠️ Could not find Cloudflare URL.")

Starting Queued FastAPI server in the background...
Starting Cloudflare Tunnel...
Waiting for URL...

✅ URL saved to api_url.txt
👉 https://weblog-actors-webshots-sig.trycloudflare.com/v1



Test with a single task

In [None]:
# Cell 5: Test the Async Queue API
import requests
import time

# Read the URL
with open("api_url.txt", "r") as f:
    API_BASE_URL = f.read().strip()

print(f"Connecting to: {API_BASE_URL}\n")

# 1. Send the request to the queue
print("1. Enviando requisição para a fila...")
payload = {
    "model": "unsloth/Qwen3.5-27B-GGUF",
    "messages": [
        {"role": "system", "content": "Você é um assistente prestativo."},
        {"role": "user", "content": "Me conte uma história curta sobre um robô que aprendeu a programar em Python."}
    ],
    "temperature": 0.7
}

# Use plain requests instead of the OpenAI library: this endpoint returns a task ID, not a completion
response = requests.post(f"{API_BASE_URL}/chat/completions", json=payload)
data = response.json()

print("Resposta imediata do servidor:")
print(data)

task_id = data.get("id")

print("\n" + "-"*50 + "\n")

if task_id:
    # 2. Check the status of the request (polling)
    print(f"2. Consultando o status da Tarefa ID: {task_id}")

    while True:
        status_response = requests.get(f"{API_BASE_URL}/tasks/{task_id}")
        task_data = status_response.json()

        status = task_data.get("status")
        print(f"Status atual: {status}")

        if status == "finished":
            print("\n✅ Tarefa concluída! Aqui está a resposta final:\n")
            # Extract the reply from the OpenAI-format result stored by the server
            mensagem_final = task_data["result"]["choices"][0]["message"]["content"]
            print(mensagem_final)
            break

        elif status == "failed":
            print(f"\n❌ Falha na tarefa: {task_data.get('error')}")
            break

        # Wait 15 seconds before asking again
        time.sleep(15)
else:
    print("Falha ao obter o ID da tarefa.")

Connecting to: https://weblog-actors-webshots-sig.trycloudflare.com/v1

1. Enviando requisição para a fila...
Resposta imediata do servidor:
{'id': '43b4bbaf-d4e1-4efa-9b6b-97fdce9cd99b', 'status': 'queued'}

--------------------------------------------------

2. Consultando o status da Tarefa ID: 43b4bbaf-d4e1-4efa-9b6b-97fdce9cd99b
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: processing
Status atual: finished

✅ Tarefa concluída! Aqui está a resposta final:

Era uma vez um robô chamado **Pyro**, fabricado em uma oficina antiga para realizar apenas tarefas repetitivas: organizar parafusos e limpar o chão. Pyro funcionava com um código binário rígido, sem capacidade de adaptação ou criatividade.

Um dia, enquanto limrava a mesa de um jovem estudante de progr
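
The polling loop in the test cell runs until the task reports `finished` or `failed`, so it never exits if the server stops responding with one of those states. A possible refinement is a small helper with a deadline. The sketch below is written against a generic `fetch` callable so it does not depend on any HTTP library; `poll_until_done` and its parameters are hypothetical names, not part of the notebook.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> Dict[str, Any]:
    """Call fetch() until the task status is 'finished' or 'failed'.

    Raises TimeoutError when the next wait would go past the deadline.
    (Hypothetical helper: name and defaults are assumptions.)
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch()
        if task.get("status") in ("finished", "failed"):
            return task
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"Task still {task.get('status')!r} after {timeout} seconds"
            )
        time.sleep(interval)
```

In the test cell this could be called as `poll_until_done(lambda: requests.get(f"{API_BASE_URL}/tasks/{task_id}").json(), interval=15)`, after which the caller inspects the returned `status` field.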

Here is a test with several simultaneous tasks

In [None]:
# Cell 5: Test the Async Queue API with Multiple Tasks
import requests
import time

# Read the URL
with open("api_url.txt", "r") as f:
    API_BASE_URL = f.read().strip()

print(f"Connecting to: {API_BASE_URL}\n")

# Our two prompts
prompts = [
    "Explique o que é queueing and polling no contexto de APIs. Seja conciso.",
    "Explique o conceito de trabalhos assíncronos em APIs. Seja conciso."
]

task_ids = []

# 1. Send both requests to the queue
print("1. ENVIANDO TAREFAS PARA A FILA...\n")
for i, prompt in enumerate(prompts, 1):
    payload = {
        "model": "unsloth/Qwen3.5-27B-GGUF",
        "messages": [
            {"role": "system", "content": "Você é um especialista em engenharia de software e APIs. Responda em português."},
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.7
    }

    response = requests.post(f"{API_BASE_URL}/chat/completions", json=payload)
    data = response.json()

    task_id = data.get("id")
    if task_id:
        print(f"✅ Tarefa {i} enviada! ID recebido: {task_id}")
        task_ids.append(task_id)
    else:
        print(f"❌ Erro ao enviar Tarefa {i}: {data}")

print("\n" + "="*50 + "\n")

# 2. Check the status of the requests (multi-task polling)
print("2. INICIANDO O POLLING (CONSULTA DE STATUS)...\n")

# Build a list of pending tasks
pending_tasks = task_ids.copy()
resultados = {}

# The loop continues while there are pending tasks in the list
while pending_tasks:
    # Iterate over a .copy() so items can be removed safely from the original list
    for task_id in pending_tasks.copy():
        status_response = requests.get(f"{API_BASE_URL}/tasks/{task_id}")
        task_data = status_response.json()

        status = task_data.get("status")
        hora_atual = time.strftime('%H:%M:%S')

        # Print a shortened ID to keep the console output readable
        short_id = task_id[:8]
        print(f"[{hora_atual}] Tarefa {short_id}... | Status atual: {status}")

        if status == "finished":
            print(f"\n🎉 Tarefa {short_id} concluída com sucesso!\n")
            # Save the final result in the dictionary
            resultados[task_id] = task_data["result"]["choices"][0]["message"]["content"]
            # Remove it from the pending list so it is not polled again
            pending_tasks.remove(task_id)

        elif status == "failed":
            print(f"\n❌ Tarefa {short_id} falhou: {task_data.get('error')}\n")
            resultados[task_id] = "ERRO NA GERAÇÃO"
            pending_tasks.remove(task_id)

    if pending_tasks:
        print("-" * 30)
        print("Aguardando 5 segundos antes da próxima consulta...\n")
        time.sleep(5)

# 3. Display the final results
print("\n" + "="*50)
print("🏆 TODAS AS TAREFAS FORAM FINALIZADAS!")
print("="*50 + "\n")

for i, task_id in enumerate(task_ids, 1):
    print(f"--- RESULTADO DA TAREFA {i} ---")
    print(f"PROMPT: {prompts[i-1]}")
    print(f"RESPOSTA:\n{resultados.get(task_id)}\n")
    print("-" * 50 + "\n")

Connecting to: https://weblog-actors-webshots-sig.trycloudflare.com/v1

1. ENVIANDO TAREFAS PARA A FILA...

✅ Tarefa 1 enviada! ID recebido: 90b03f1c-b31b-4a1b-8cf3-6fe0d8e0773a
✅ Tarefa 2 enviada! ID recebido: ca47413d-9f3e-4635-8ac1-c91f6e62c1d4


2. INICIANDO O POLLING (CONSULTA DE STATUS)...

[20:28:14] Tarefa 90b03f1c... | Status atual: processing
[20:28:14] Tarefa ca47413d... | Status atual: queued
------------------------------
Aguardando 5 segundos antes da próxima consulta...

[20:28:19] Tarefa 90b03f1c... | Status atual: processing
[20:28:19] Tarefa ca47413d... | Status atual: queued
------------------------------
Aguardando 5 segundos antes da próxima consulta...

[20:28:24] Tarefa 90b03f1c... | Status atual: processing
[20:28:25] Tarefa ca47413d... | Status atual: queued
------------------------------
Aguardando 5 segundos antes da próxima consulta...

[20:28:30] Tarefa 90b03f1c... | Status atual: processing
[20:28:30] Tarefa ca47413d... | Status atual: queued
------