# 📚 RAG Knowledge Base - Google Colab

This notebook installs and starts the RAG Knowledge Base and exposes it publicly through a Cloudflare Tunnel.

## Prerequisites
- **OpenAI API key** stored as a Colab secret (name: `OPENAI_API_KEY`)

### Setting up the secret
1. Click the 🔑 icon in the left sidebar
2. Choose "Add new secret"
3. Name: `OPENAI_API_KEY`, value: your API key
4. Enable "Notebook access"

## 1️⃣ Clone the repository

In [None]:
!git clone https://github.com/janschachtschabel/simple-document-rag.git
%cd simple-document-rag

## 2️⃣ Install dependencies

In [None]:
!pip install -q -r requirements.txt
!pip install -q cloudflared

## 3️⃣ Load the API key and configure the models

In [None]:
import os
from google.colab import userdata

# Load the API key from Colab Secrets
try:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
    print("✅ OpenAI API key loaded")
except Exception:
    print("❌ Secret 'OPENAI_API_KEY' not found or not accessible!")
    print("   🔑 icon on the left → Add new secret → OPENAI_API_KEY")
    raise  # re-raise with the original traceback

# Models (adjust here if desired)
OPENAI_MODEL = "gpt-4.1-mini"
EMBEDDING_MODEL = "text-embedding-ada-002"

os.environ["OPENAI_MODEL"] = OPENAI_MODEL
os.environ["EMBEDDING_MODEL"] = EMBEDDING_MODEL
os.environ["CHROMA_PERSIST_DIRECTORY"] = "./chroma_db"
os.environ["CHUNK_SIZE"] = "1000"
os.environ["CHUNK_OVERLAP"] = "200"
os.environ["TOP_K_RETRIEVAL"] = "5"

print(f"✅ LLM: {OPENAI_MODEL}")
print(f"✅ Embedding: {EMBEDDING_MODEL}")
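
The `CHUNK_SIZE` and `CHUNK_OVERLAP` values above control how documents are split before embedding. The notebook does not say whether the app counts characters or tokens, so the following is only a minimal sketch that assumes character-based sliding windows; `chunk_text` is a hypothetical helper, not a function from the repository.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into windows of `chunk_size` characters that share `overlap` characters.

    Assumption: character-based windows; the app itself may split differently.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 2500
print([len(c) for c in chunk_text(sample)])  # → [1000, 1000, 900]
```

With a size of 1000 and an overlap of 200, each chunk starts 800 characters after the previous one, so neighbouring chunks repeat 200 characters of context.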

## 4️⃣ Start the FastAPI server

In [None]:
import subprocess
import time
import requests

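# Start the API in the background. Output goes to a log file because an
# unread pipe can fill up and stall the server.
api_log = open("api.log", "w")
api_process = subprocess.Popen(
    ["python", "main.py"],
    stdout=api_log,
    stderr=subprocess.STDOUT
)

print("⏳ Starting API...")
time.sleep(15)

try:
    r = requests.get("http://localhost:8000/health", timeout=5)
    r.raise_for_status()  # treat HTTP error codes as "not healthy"
    print("✅ API is running at http://localhost:8000")
except requests.RequestException:
    print("❌ API not reachable (see api.log)")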
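
A fixed 15-second sleep can be too short on a slow runtime and wastes time on a fast one. As a hedged alternative, the sketch below polls the health endpoint until it answers. `wait_for_health` is a hypothetical helper written for this illustration, and it uses only the standard library so it runs without extra packages.

```python
import time
import urllib.request


def wait_for_health(url, timeout=60.0, interval=1.0):
    """Poll `url` until it returns a 2xx status or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: the server is not ready yet.
            pass
        time.sleep(interval)
    return False
```

In this notebook it could replace the sleep with something like `wait_for_health("http://localhost:8000/health")`, which returns `True` as soon as the server responds and `False` if it never does.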

## 5️⃣ Start Streamlit + Cloudflare Tunnel

After running this cell, a **public URL** is printed.

In [None]:
import subprocess
import threading
import re
import time

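# Start Streamlit headless on port 8501. Output goes to a log file because
# an unread pipe can fill up and block the process.
streamlit_log = open("streamlit.log", "w")
streamlit_process = subprocess.Popen(
    ["streamlit", "run", "app.py", "--server.port", "8501", "--server.headless", "true"],
    stdout=streamlit_log,
    stderr=subprocess.STDOUT
)
print("⏳ Starting Streamlit...")
time.sleep(5)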

# Open a tunnel to the Streamlit port. The public URL is read from the
# tunnel's stderr output.
tunnel_process = subprocess.Popen(
    ["cloudflared", "tunnel", "--url", "http://localhost:8501"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.PIPE,
    text=True
)

def show_url():
    """Print the first tunnel URL, then keep draining stderr so the pipe never fills."""
    found = False
    for line in tunnel_process.stderr:
        match = None if found else re.search(r'https://[\w-]+\.trycloudflare\.com', line)
        if match:
            found = True
            print("\n" + "="*50)
            print("🎉 APP IS ONLINE!")
            print("="*50)
            print(f"🔗 {match.group(0)}")
            print("="*50)

threading.Thread(target=show_url, daemon=True).start()
print("⏳ Waiting for tunnel URL...")
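
The regular expression in this cell can be checked in isolation. The sketch below wraps it in a hypothetical helper, `extract_tunnel_url`. The sample log line is made up for illustration, and real cloudflared output may be formatted differently.

```python
import re
from typing import Optional

# Same pattern as in the cell above: a trycloudflare.com subdomain made of
# word characters and hyphens.
TUNNEL_URL = re.compile(r"https://[\w-]+\.trycloudflare\.com")


def extract_tunnel_url(line: str) -> Optional[str]:
    """Return the first trycloudflare.com URL in `line`, or None if there is none."""
    match = TUNNEL_URL.search(line)
    return match.group(0) if match else None


# Made-up log lines, not actual cloudflared output.
print(extract_tunnel_url("INF |  https://example-words-here.trycloudflare.com  |"))
# → https://example-words-here.trycloudflare.com
print(extract_tunnel_url("INF Starting tunnel"))
# → None
```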

## 6️⃣ Check status

In [None]:
import requests

print("📊 Status")
print("-" * 30)

try:
    r = requests.get("http://localhost:8000/health", timeout=5)
    r.raise_for_status()
    print(f"✅ API: OK ({r.json().get('statistics', {}).get('total_documents', 0)} documents)")
except (requests.RequestException, ValueError):
    # ValueError covers a health response that is not valid JSON
    print("❌ API: Offline")

try:
    requests.get("http://localhost:8501", timeout=5).raise_for_status()
    print("✅ Streamlit: OK")
except requests.RequestException:
    print("❌ Streamlit: Offline")

# The globals() lookup avoids a NameError if the tunnel cell has not been run
if 'tunnel_process' in globals() and tunnel_process.poll() is None:
    print("✅ Tunnel: Active")
else:
    print("❌ Tunnel: Inactive")

## 🛑 Stop processes

In [None]:
# Terminate each background process that was started and is still running.
for label, name in [("API", "api_process"), ("Streamlit", "streamlit_process"), ("Tunnel", "tunnel_process")]:
    proc = globals().get(name)
    if proc is not None and proc.poll() is None:
        proc.terminate()
        print(f"✅ {label} stopped")

---
## 📝 Notes

- **Runtime**: up to 12 hours (free tier)
- **Documents**: lost when the session ends
- **Tunnel URL**: changes on every restart
- **Confluence**: configure in the app under 🔷 Confluence
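
Because indexed documents disappear with the session, it can help to archive the vector store before the runtime shuts down. The sketch below assumes that all index state lives in the `CHROMA_PERSIST_DIRECTORY` set earlier (`./chroma_db`) and that the API is not writing to it during the copy. `archive_store` and the archive name `chroma_backup` are hypothetical.

```python
import shutil
from pathlib import Path


def archive_store(persist_dir="./chroma_db", archive_base="chroma_backup"):
    """Zip the persistence directory and return the path of the created archive."""
    source = Path(persist_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} does not exist")
    # make_archive appends ".zip" to archive_base and returns the full file name
    return shutil.make_archive(archive_base, "zip", root_dir=source)
```

In Colab, the resulting file could then be downloaded with `google.colab.files.download(path)` or copied to a mounted Google Drive. Restoring it in a new session would mean unpacking the archive into `./chroma_db` before starting the API.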