# 🚀 AION GPU Worker - Kaggle

This notebook turns your Kaggle notebook into a **free GPU worker** for AION.

## ✨ What this worker does:
- **LLM inference**: answers queries using custom models
- **LoRA training**: fine-tunes models on curated data
- **Embeddings**: generates embeddings locally
- **Zero cost**: runs on Kaggle's free GPUs

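## 📊 Available GPU:
- Tesla P100 (16 GB VRAM) or T4 (16 GB, about 15 GB usable)
- **About 30 GPU hours per week** (the quota is set by Kaggle and can vary from week to week)
- Maximum session length: 9 hours
- A better fit than Colab for long-running jobs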
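
As a rough planning aid, the figures above translate into a number of full-length sessions per week. The sketch below assumes the 30-hour quota and 9-hour session cap quoted in this notebook; `sessions_per_week` is a hypothetical helper, not part of AION or Kaggle.

```python
def sessions_per_week(quota_hours: int = 30, max_session_hours: int = 9) -> tuple[int, int]:
    """Return (full sessions, leftover hours) for a weekly GPU quota."""
    return divmod(quota_hours, max_session_hours)


full_sessions, leftover_hours = sessions_per_week()
print(f"{full_sessions} full sessions, {leftover_hours} h left over")
```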

## ⚙️ Setup:
1. Enable the GPU: Settings > Accelerator > GPU P100
2. Replace `AION_API_URL` with the URL of your AION instance
3. Click Run All

---

In [None]:
# ============================================================================
# CONFIGURATION - EDIT HERE!
# ============================================================================

AION_API_URL = "https://seu-repl.replit.app"  # ⚠️ REPLACE with your URL!
WORKER_NAME = "Kaggle-GPU-1"  # Unique name for this worker
GOOGLE_ACCOUNT_EMAIL = "sua-conta@gmail.com"  # Your account (used for tracking)

# ============================================================================
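
Because a forgotten placeholder only shows up later as a failed registration request, it can help to validate the configuration up front. The following is a minimal sketch: `validate_api_url` is a hypothetical helper, and treating `seu-repl.replit.app` as the unedited placeholder is an assumption based on the default value above.

```python
from urllib.parse import urlparse

PLACEHOLDER_HOST = "seu-repl.replit.app"  # default value shipped in this notebook


def validate_api_url(url: str) -> str:
    """Return the URL without a trailing slash, or raise ValueError if unusable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    if parsed.netloc == PLACEHOLDER_HOST:
        raise ValueError("AION_API_URL still holds the placeholder value")
    return url.rstrip("/")


# Hypothetical URL, used only to demonstrate the helper
print(validate_api_url("https://my-aion.example.com/"))
```

In the notebook, the call would be `AION_API_URL = validate_api_url(AION_API_URL)`, placed right after the configuration cell.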

In [None]:
# 📦 Install dependencies
!pip install -q transformers accelerate bitsandbytes peft torch requests GPUtil

print("✅ Dependencies installed!")

In [None]:
# 🔍 Detect GPU
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / (1024**3)

    print(f"✅ GPU detected: {gpu_name}")
    print(f"   Total VRAM: {gpu_mem:.1f} GB")

    GPU_MODEL = gpu_name
    VRAM_GB = int(gpu_mem)  # truncated to a whole number of GB
else:
    print("❌ ERROR: GPU not detected!")
    print("⚠️  Go to: Settings > Accelerator > GPU")
    raise RuntimeError("GPU not available")
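
The detected GPU also determines which half-precision format is usable: bfloat16 is natively supported from compute capability 8.0 (Ampere) onward, while the P100 (6.0) and T4 (7.5) are limited to float16. A minimal sketch, where `pick_compute_dtype` is a hypothetical helper; in this notebook the capability tuple could come from `torch.cuda.get_device_capability(0)`.

```python
def pick_compute_dtype(capability: tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to a half-precision dtype name."""
    major, _minor = capability
    # Native bfloat16 support starts with compute capability 8.0 (Ampere).
    return "bfloat16" if major >= 8 else "float16"


for name, capability in [("Tesla P100", (6, 0)), ("Tesla T4", (7, 5))]:
    print(f"{name}: {pick_compute_dtype(capability)}")
```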

In [None]:
# 🤖 Load the LLM
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Lightweight model to start with. This repository is gated on Hugging Face:
# accept the license on the model page and authenticate (for example with
# huggingface_hub.login()) before running this cell.
MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"

print(f"📥 Loading model: {MODEL_NAME}...")

# 4-bit configuration to save VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# trust_remote_code is not needed: Llama is supported natively by transformers.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
)

print("✅ Model loaded!")
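
To see why 4-bit loading helps on a 16 GB card, the weight footprint can be estimated as parameter count times bits per parameter. The sketch below is a back-of-the-envelope estimate only: the 1.2 billion parameter figure is an assumed approximation for a 1B-class model, and activations, the KV cache, and quantization overhead are ignored. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / (1024 ** 3)


approx_params = 1.2e9  # assumed parameter count for a 1B-class model
for label, bits in [("fp16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label}: ~{weight_memory_gib(approx_params, bits):.2f} GiB")
```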

In [None]:
# 🔗 Register the worker with AION
import requests

def register_worker():
    url = f"{AION_API_URL}/api/gpu/workers/register"

    payload = {
        "tenantId": 1,
        "name": WORKER_NAME,
        "workerType": "kaggle",
        "gpuModel": GPU_MODEL,
        "vramGB": VRAM_GB,
        # Training and embeddings are advertised here, but the worker loop
        # in this notebook only handles inference jobs.
        "capabilities": {
            "inference": True,
            "training": True,
            "embeddings": True,
            "maxBatchSize": 8,
            "supportedModels": [MODEL_NAME]
        },
        "metadata": {
            "accountEmail": GOOGLE_ACCOUNT_EMAIL,
            "region": "kaggle-us",
            "quotaHoursPerWeek": 30,  # Weekly Kaggle quota assumed by this notebook
            "usedHoursThisWeek": 0
        }
    }

    # A timeout keeps the cell from hanging if the server is unreachable.
    response = requests.post(url, json=payload, timeout=30)

    if response.status_code == 200:
        data = response.json()
        worker_id = data["worker"]["id"]
        api_key = data["worker"]["apiKey"]

        print("✅ Worker registered!")
        print(f"   Worker ID: {worker_id}")
        print(f"   API Key: {api_key[:4]}...")  # show only a short prefix of the secret

        return api_key
    else:
        print(f"❌ Error: {response.status_code}")
        print(response.text)
        raise RuntimeError("Registration failed")

WORKER_API_KEY = register_worker()
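
Registration is a single request, so a sleeping or restarting server makes the whole notebook fail at this cell. One option is to wrap the call in a retry with exponential backoff. This is a minimal sketch, not part of the AION API: `with_retries` and the `flaky` stand-in are hypothetical names used only for illustration.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


# Stand-in for register_worker(): fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("server waking up")
    return "ok"


# sleep is replaced with a no-op so the demo runs instantly
print(with_retries(flaky, sleep=lambda seconds: None))
```

In this notebook the call would become `WORKER_API_KEY = with_retries(register_worker)`.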

In [None]:
# 💓 Heartbeat
import time
from threading import Thread

def send_heartbeat():
    while True:
        try:
            url = f"{AION_API_URL}/api/gpu/workers/heartbeat"
            payload = {"apiKey": WORKER_API_KEY, "status": "online"}
            response = requests.post(url, json=payload, timeout=10)
            response.raise_for_status()  # report HTTP errors as failed heartbeats
        except Exception as e:
            print(f"⚠️  Heartbeat failed: {e}")
        time.sleep(30)

# Daemon thread: it stops automatically when the notebook kernel exits.
heartbeat_thread = Thread(target=send_heartbeat, daemon=True)
heartbeat_thread.start()
print("✅ Heartbeat active!")
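
The heartbeat thread above runs until the kernel exits. If the worker should be able to go offline cleanly, a `threading.Event` can serve both as the stop flag and as an interruptible sleep. The following is a sketch under that assumption: `start_heartbeat` is a hypothetical helper, and the demo uses a stand-in sender rather than the real heartbeat endpoint.

```python
import threading
import time


def start_heartbeat(send, interval_seconds=30.0):
    """Run send() every interval_seconds in a daemon thread; return (thread, stop_event)."""
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            try:
                send()
            except Exception as exc:
                print(f"Heartbeat failed: {exc}")
            # wait() returns early as soon as stop_event is set
            stop_event.wait(interval_seconds)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop_event


# Demo with a stand-in sender; a real one would POST to the heartbeat endpoint.
beats = []
thread, stop_event = start_heartbeat(lambda: beats.append("beat"), interval_seconds=0.01)
time.sleep(0.1)
stop_event.set()
thread.join(timeout=1)
print(f"sent {len(beats)} heartbeats, thread alive: {thread.is_alive()}")
```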

In [None]:
# 🎯 Worker Loop
def generate_text(prompt, max_tokens=512, temperature=0.7):
    """Generate a completion; return (decoded text, number of new tokens)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Sampling requires temperature > 0; fall back to greedy decoding otherwise.
    if temperature > 0:
        sampling = {"do_sample": True, "temperature": temperature}
    else:
        sampling = {"do_sample": False}
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        pad_token_id=tokenizer.eos_token_id,
        **sampling,
    )
    # The output sequence starts with the prompt, so subtract its length
    # to count only the generated tokens.
    new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return text, new_tokens

def process_job(job):
    job_id = job["id"]
    job_type = job["jobType"]
    payload = job["payload"]
    headers = {"Authorization": f"Bearer {WORKER_API_KEY}"}

    print(f"🎯 Processing job {job_id}...")

    try:
        if job_type != "inference":
            # Only inference is implemented; report other job types as failed
            # instead of silently dropping them.
            raise ValueError(f"Unsupported job type: {job_type}")

        text, new_tokens = generate_text(
            payload.get("prompt", ""),
            payload.get("maxTokens", 512),
            payload.get("temperature", 0.7)
        )

        result = {
            "text": text,
            "tokensGenerated": new_tokens
        }

        url = f"{AION_API_URL}/api/gpu/jobs/{job_id}/complete"
        requests.post(url, json={"result": result}, headers=headers, timeout=30)

        print(f"✅ Job {job_id} completed!")
    except Exception as e:
        print(f"❌ Error: {e}")
        url = f"{AION_API_URL}/api/gpu/jobs/{job_id}/fail"
        requests.post(url, json={"error": str(e)}, headers=headers, timeout=30)

def worker_loop():
    print("\n🚀 Worker started! Waiting for jobs...\n")

    while True:
        try:
            url = f"{AION_API_URL}/api/gpu/jobs/next"
            headers = {"Authorization": f"Bearer {WORKER_API_KEY}"}
            response = requests.get(url, headers=headers, timeout=10)

            if response.status_code == 200:
                job = response.json().get("job")
                if job:
                    process_job(job)
                else:
                    time.sleep(5)
            else:
                # Back off on HTTP errors instead of polling in a tight loop.
                print(f"⚠️  Unexpected status: {response.status_code}")
                time.sleep(10)
        except KeyboardInterrupt:
            print("\n🛑 Worker stopped.")
            break
        except Exception as e:
            print(f"⚠️  Error: {e}")
            time.sleep(10)

worker_loop()
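
The worker advertises training and embeddings at registration, yet the loop only knows how to run inference. One way to make room for more job types is a handler table keyed by `jobType`. This is a sketch under assumptions: `JOB_HANDLERS`, `dispatch`, and `handle_inference` are hypothetical names, and the handler below merely echoes the prompt instead of calling the model.

```python
def handle_inference(payload):
    # Stand-in for the real generation call
    return {"text": f"echo: {payload.get('prompt', '')}"}


# New job types (for example "embeddings") would be added to this table.
JOB_HANDLERS = {"inference": handle_inference}


def dispatch(job):
    """Route a job to its handler, or raise ValueError for unknown job types."""
    handler = JOB_HANDLERS.get(job["jobType"])
    if handler is None:
        raise ValueError(f"Unsupported job type: {job['jobType']}")
    return handler(job.get("payload", {}))


print(dispatch({"id": 1, "jobType": "inference", "payload": {"prompt": "hi"}}))
```

Raising on unknown types lets the caller report the job as failed rather than leaving it pending.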

---

## 🎉 Worker Running!

### ✅ Status:
- Worker **online** in the AION Dashboard
- Heartbeat active (every 30 s)
- Waiting for jobs
- **About 30 GPU hours per week** (subject to Kaggle's quota)

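### 📊 Advantages of Kaggle:
- ✅ Predictable weekly quota (about 30 h/week)
- ✅ Predictable sessions (up to 9 h; Colab's free tier advertises up to 12 h but can disconnect sooner)
- ✅ Background mode when the notebook is run with Save Version > Save & Run All (you can close the tab)
- ✅ P100 (16 GB VRAM) is better suited to training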

### 💡 Tip:
With **5 Kaggle accounts** at 30 h each, you get **150 h/week of free GPU**! 🚀

---