# APEX-ULTRA™ AGI COSMOS: Llama 3 4-bit (any4/AWQ) Colab Deployment

This notebook sets up the full AGI system with an open-source, 4-bit quantized Llama 3 model served by vLLM. All agents and modules are configured to use the local LLM endpoint.

---

## 1. Clone Your Repo and Mount Google Drive (for persistent checkpoints)

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/YOUR_USERNAME/APEX-ULTRA.git
%cd /content/APEX-ULTRA


## 2. Download a 4-bit Llama 3 Model (AWQ, any4, or NF4)
Replace the Hugging Face repo ID (`repo_id`) in the cell below with your preferred quantized model.


In [ ]:
!pip install huggingface_hub
from huggingface_hub import snapshot_download
model_path = '/content/llama-3-8b-awq'
snapshot_download(repo_id='TheBloke/Llama-3-8B-AWQ', local_dir=model_path, local_dir_use_symlinks=False)
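
Before starting the server, it can help to confirm that the downloaded folder really holds a quantized checkpoint. The sketch below reads `config.json` from the model directory and reports the quantization method. It assumes the checkpoint records its scheme under a `quantization_config` entry with a `quant_method` field, which many Hugging Face quantized checkpoints do but not all; `detect_quant_method` is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path


def detect_quant_method(model_dir):
    """Return the quantization method named in config.json, or None if absent."""
    config_file = Path(model_dir) / "config.json"
    if not config_file.is_file():
        raise FileNotFoundError(f"no config.json in {model_dir}")
    config = json.loads(config_file.read_text())
    # Assumption: the scheme is stored under "quantization_config".
    quant = config.get("quantization_config") or {}
    return quant.get("quant_method")
```

Calling `detect_quant_method(model_path)` after the download returns `None` when no such entry exists, which is a hint to double-check the repo you pulled.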


## 3. Install vLLM, pyngrok, and All Dependencies

In [ ]:
!pip install vllm pyngrok python-dotenv fastapi uvicorn aiohttp nest_asyncio pandas numpy requests

## 4. Set Up .env for Llama 3 4-bit vLLM
This cell writes a .env file that points the agents at the local vLLM server and stores checkpoints on Google Drive.


In [ ]:
with open('.env', 'w') as f:
    f.write('''
LLAMA_SERVER_ENABLED=true
LLAMA_MODEL_PATH=/content/llama-3-8b-awq
LLAMA_SERVER_CMD=python -m vllm.entrypoints.openai.api_server --model /content/llama-3-8b-awq --host 0.0.0.0 --port 8000
LLAMA_API_BASE=http://localhost:8000/v1
GPT25PRO_API_KEY=
GPT25PRO_ENDPOINT=http://localhost:8000/v1/completions
COLAB=true
CHECKPOINT_PATH=/content/drive/MyDrive/agi_checkpoint.json
''')
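
A quick sanity check on the written file can catch a missing or empty setting before the orchestrator starts. The sketch below is a minimal stand-in for a real loader such as python-dotenv: it parses plain `KEY=VALUE` lines and lists required keys that are absent or empty. The choice of which keys count as required is an assumption based on the file above, and both helper names are hypothetical.

```python
# Assumption: these are the keys the system needs; adjust to match the repo.
REQUIRED_KEYS = (
    "LLAMA_SERVER_ENABLED",
    "LLAMA_MODEL_PATH",
    "LLAMA_SERVER_CMD",
    "LLAMA_API_BASE",
    "CHECKPOINT_PATH",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]
```

In the notebook, `missing_keys(parse_env(open('.env').read()))` should return an empty list if every required setting is present.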

## 5. Start the vLLM Llama 3 4-bit Server (in background)

In [ ]:
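import subprocess
import time

# Launch the vLLM OpenAI-compatible server as a background process.
llama_proc = subprocess.Popen([
    'python', '-m', 'vllm.entrypoints.openai.api_server',
    '--model', '/content/llama-3-8b-awq',
    '--host', '0.0.0.0', '--port', '8000',
])
time.sleep(10)  # Initial pause only; loading the model weights can take longer
if llama_proc.poll() is not None:
    raise RuntimeError(f'vLLM exited early with code {llama_proc.returncode}')
print('vLLM Llama 3 4-bit server launched.')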
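
A fixed pause may be too short while vLLM loads the model weights, so it is safer to poll the server until it answers. The sketch below uses only the standard library and polls a URL until it returns HTTP 200 or a timeout expires. `wait_for_server` is a hypothetical helper, and the 300-second default is an assumption rather than a measured figure.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=300.0, interval=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting requests yet
        time.sleep(interval)
    return False
```

With the server from this step, `wait_for_server('http://localhost:8000/v1/models')` returns `True` once the OpenAI-compatible API is reachable and `False` if the timeout is hit first.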

## 6. Expose the API to the Internet (ngrok)
This gives you a public endpoint for testing.


In [ ]:
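from pyngrok import ngrok
# ngrok usually requires an authtoken; uncomment and fill in your own.
# ngrok.set_auth_token('YOUR_NGROK_AUTHTOKEN')
tunnel = ngrok.connect(8000, 'http')
public_url = tunnel.public_url  # plain URL string rather than the tunnel object
print('Public vLLM endpoint:', public_url)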
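
To smoke-test the endpoint, locally or through the public URL, you can send one OpenAI-style completion request. The sketch below builds and sends that request with the standard library. It assumes the server exposes `/v1/completions` and registers the model under the value passed to `--model`; both helper names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(api_base, model, prompt, max_tokens=64):
    """Build a POST request for an OpenAI-style /completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(api_base, model, prompt, max_tokens=64):
    """Send the request and return the text of the first completion."""
    req = build_completion_request(api_base, model, prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

For example, `complete('http://localhost:8000/v1', '/content/llama-3-8b-awq', 'Hello')` queries the local server; swapping in the ngrok URL plus `/v1` exercises the public route.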

## 7. Start the AGI Orchestrator (APEX-ULTRA™)
This will launch the main system. All agents will use the Llama 3 4-bit endpoint.


In [ ]:
!python3 main.py

---
**You now have the APEX-ULTRA™ system running against a local, open-source 4-bit Llama 3 model served by vLLM.**

- All agents use the Llama 3 4-bit endpoint for reasoning.
- Checkpoints are saved to your Google Drive.
- The API is exposed for easy integration and testing.

**For advanced usage, see the repo README and .env.example.**
