# Study Query LLM - Google Colab Setup

This notebook sets up and runs the Study Query LLM application in Google Colab.

## Features
- Run LLM inferences across multiple providers (Azure OpenAI, OpenAI, Hyperbolic)
- Automatic logging to a SQLite database
- Analytics dashboard with provider comparison
- No local installation required!

## Setup Instructions

1. **Get the source code** - Clone from GitHub or upload the project folder (Step 1)
2. **Set your API keys** - Use Colab Secrets (recommended) or set them in code (Step 3)
3. **Run all cells** - Install dependencies and start the app
4. **Use the app** - Step 5 displays a link that opens the dashboard

**Important:** Step 1 clones the official repository by default. If you work from your own fork, update the GitHub URL in Step 1, or upload the project folder to Colab instead.

Note: The app runs inside this Colab session and stops when the session ends.


## Step 1: Get the Source Code

**Choose one method below to get the source code:**


In [None]:
# OPTION 1: Clone from GitHub (Recommended)
# This uses the official repository. If you have your own fork, update the URL below.

import os
from pathlib import Path

# Clone the repository
repo_url = "https://github.com/spencermcbridemoore/study-query-llm.git"
repo_name = "study-query-llm"

print(f"Cloning repository: {repo_url}")

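# Clone into /content via IPython's shell escape (works in Colab).
# Skip the clone if a previous run of this cell already created the directory.
project_path = Path("/content") / repo_name
if not project_path.exists():
    get_ipython().system(f'git clone {repo_url} "{project_path}"')

# Verify clone was successful
if project_path.exists():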
    print("✅ Repository cloned successfully!")
    # Change to the project directory
    os.chdir(project_path)
    print(f"✅ Changed to directory: {project_path}")
else:
    print(f"❌ Directory not found: {project_path}")
    print("Available directories:")
    get_ipython().system('ls -la /content/')
    print("\nIf you don't have a GitHub repo, use OPTION 2 below:")
    print("1. Upload the study-query-llm folder to Colab")
    print("2. Uncomment OPTION 2 code below")
    raise FileNotFoundError(f"Project directory not found: {project_path}")

# Verify setup.py exists
if not Path("setup.py").exists():
    print("❌ setup.py not found!")
    print("Current directory contents:")
    get_ipython().system('ls -la')
    raise FileNotFoundError("setup.py not found. Make sure you cloned the correct repository.")

# Install the package and dependencies
print("\nInstalling package...")
%pip install -q -e .

print("\nInstalling dependencies...")
%pip install -q panel python-dotenv openai tenacity sqlalchemy pandas

print("\n✅ Source code and dependencies installed!")

# OPTION 2: If you uploaded the project folder to Colab, uncomment below:
# project_path = Path("/content/study-query-llm")  # Adjust path if different
# if project_path.exists():
#     %cd {project_path}
#     %pip install -q -e .
#     %pip install -q panel python-dotenv openai tenacity sqlalchemy pandas
#     print("✅ Installed from uploaded files!")
# else:
#     print(f"❌ Directory not found: {project_path}")
#     print("Please upload the study-query-llm folder to Colab first.")


## Step 2: Verify Installation

Check that the package is installed correctly.
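
Before running the full check, a lighter probe can tell whether Python can locate the package at all. The sketch below uses the standard library's `importlib.util.find_spec`, which looks a module up without executing it. The helper name `is_importable` is introduced here for illustration and is not part of the project.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report on the package that Step 1 installs.
print("study_query_llm importable:", is_importable("study_query_llm"))
```

If this prints `False`, re-run Step 1 before continuing.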


In [None]:
# Verify the package can be imported
import sys
from pathlib import Path

print("Checking installation...")
print(f"Current directory: {Path.cwd()}")
print(f"Python path includes: {[p for p in sys.path if 'study-query' in p or 'content' in p]}")

project_root = Path.cwd()
src_path = project_root / 'src'

# An editable install (pip install -e .) may not be visible to an already-running
# kernel until it restarts; adding src/ to sys.path makes the package importable now
if src_path.exists() and str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))
    print(f"[INFO] Added {src_path} to Python path")

try:
    import study_query_llm
    from study_query_llm.config import config
    print("\n[OK] Main package imported successfully!")
    print(f"   Package location: {study_query_llm.__file__}")
    print(f"   Package version: {getattr(study_query_llm, '__version__', 'unknown')}")
    
    # Test that submodules can be imported
    print("\nTesting submodule imports...")
    try:
        from study_query_llm.db.connection import DatabaseConnection
        from study_query_llm.db.models import InferenceRun
        from study_query_llm.providers import BaseLLMProvider
        from study_query_llm.services.inference_service import InferenceService
        print("[OK] All submodules imported successfully!")
    except ImportError as submod_error:
        print(f"[WARN] Some submodules failed to import: {submod_error}")
        print("   This might cause issues later. The package may need to be reinstalled.")
        print("   Try running: !pip install -e . --force-reinstall")
        
except ImportError as e:
    print(f"\n[ERROR] Error importing package: {e}")
    print("\nDiagnostics:")
    print(f"   Current directory: {Path.cwd()}")
    print(f"   Directory exists: {Path.cwd().exists()}")
    print(f"   setup.py exists: {Path('setup.py').exists()}")
    print(f"   src/study_query_llm exists: {Path('src/study_query_llm').exists()}")
    
    if src_path.exists():
        print("\n   Source code found! Trying to add to path manually...")
        if str(src_path) not in sys.path:
            sys.path.insert(0, str(src_path))
            print(f"   Added {src_path} to Python path")
        try:
            import study_query_llm
            print("   [OK] Package now imports after manual path addition!")
        except ImportError as e2:
            print(f"   [ERROR] Still can't import: {e2}")
    
    print("\nTroubleshooting:")
    print("1. Make sure you updated the GitHub URL in Step 1")
    print("2. Verify the repository was cloned successfully")
    print("3. Check that you're in the study-query-llm directory")
    print("4. Try running: !pip install -e .")
    print("5. Check if src/study_query_llm directory exists")


## Step 3: Configure API Keys

**Recommended:** Use Colab Secrets (left sidebar → 🔑 Secrets)

1. Click the 🔑 icon in the left sidebar
2. Click "+ Add secret"
3. Add secrets with these exact names:
   - `AZURE_OPENAI_API_KEY`
   - `AZURE_OPENAI_ENDPOINT`
   - `AZURE_OPENAI_DEPLOYMENT` (falls back to `gpt-4o` if missing)
   - `AZURE_OPENAI_API_VERSION` (falls back to `2024-02-15-preview` if missing)
   - (Optional) `OPENAI_API_KEY`, `OPENAI_MODEL`
   - (Optional) `HYPERBOLIC_API_KEY`, `HYPERBOLIC_ENDPOINT`

**Alternative:** Set API keys directly in the code cell below (less secure)
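
Reading each secret follows the same pattern: look the name up, and fall back to a default when it is missing. A minimal sketch of that pattern is below. `load_setting` and its `getter` parameter are hypothetical names introduced here; in Colab you would pass `userdata.get` from `google.colab` as the getter.

```python
import os
from typing import Callable, Optional


def load_setting(name: str, getter: Callable[[str], str],
                 default: Optional[str] = None) -> Optional[str]:
    """Copy one secret into os.environ, using `default` if the lookup fails."""
    try:
        value = getter(name)
    except Exception:
        # Colab raises an error for a missing or inaccessible secret
        value = None
    value = value or default
    if value is not None:
        os.environ[name] = value
    return value


try:
    from google.colab import userdata  # only available inside Colab
    load_setting("AZURE_OPENAI_DEPLOYMENT", userdata.get, default="gpt-4o")
except ImportError:
    print("Not in Colab: pass any callable that maps a name to a value.")
```

Passing the getter as an argument keeps the helper usable outside Colab, for example with `os.environ.get`.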


In [None]:
import os

# Try to load from Colab Secrets (recommended method)
try:
    from google.colab import userdata
    
    # Load secrets from Colab Secrets (left sidebar)
    # userdata.get() only takes one argument (the key) and raises SecretNotFoundError if missing
    secrets_loaded = False
    try:
        os.environ["AZURE_OPENAI_API_KEY"] = userdata.get('AZURE_OPENAI_API_KEY')
        os.environ["AZURE_OPENAI_ENDPOINT"] = userdata.get('AZURE_OPENAI_ENDPOINT')
        
        # Optional secrets with defaults if not found
        try:
            os.environ["AZURE_OPENAI_DEPLOYMENT"] = userdata.get('AZURE_OPENAI_DEPLOYMENT')
        except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
            os.environ["AZURE_OPENAI_DEPLOYMENT"] = 'gpt-4o'  # Default

        try:
            os.environ["AZURE_OPENAI_API_VERSION"] = userdata.get('AZURE_OPENAI_API_VERSION')
        except Exception:
            os.environ["AZURE_OPENAI_API_VERSION"] = '2024-02-15-preview'  # Default

        # Optional providers
        try:
            os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
        except Exception:
            pass  # OpenAI not configured

        try:
            os.environ["OPENAI_MODEL"] = userdata.get('OPENAI_MODEL')
        except Exception:
            os.environ["OPENAI_MODEL"] = 'gpt-4'  # Default if OpenAI is configured elsewhere

        try:
            os.environ["HYPERBOLIC_API_KEY"] = userdata.get('HYPERBOLIC_API_KEY')
        except Exception:
            pass  # Hyperbolic not configured

        try:
            os.environ["HYPERBOLIC_ENDPOINT"] = userdata.get('HYPERBOLIC_ENDPOINT')
        except Exception:
            os.environ["HYPERBOLIC_ENDPOINT"] = 'https://api.hyperbolic.xyz'  # Default if Hyperbolic is configured elsewhere
        
        secrets_loaded = True
        print("✅ Configuration loaded from Colab Secrets!")
        
    except Exception as e:
        print(f"⚠️  Could not load required secrets from Colab Secrets: {e}")
        print("   Falling back to environment variables...")
        secrets_loaded = False
        
except ImportError:
    # Not running in Colab, use environment variables
    secrets_loaded = False
    print("ℹ️  Not running in Colab - using environment variables")

# Fallback: Set API keys directly (if not using Colab Secrets)
if not secrets_loaded:
    # Azure OpenAI
    os.environ.setdefault("AZURE_OPENAI_API_KEY", "your-azure-api-key-here")
    os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "https://your-resource.openai.azure.com/")
    os.environ.setdefault("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")
    os.environ.setdefault("AZURE_OPENAI_API_VERSION", "2024-02-15-preview")
    
    # OpenAI (optional)
    # os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key-here")
    # os.environ.setdefault("OPENAI_MODEL", "gpt-4")
    
    # Hyperbolic (optional)
    # os.environ.setdefault("HYPERBOLIC_API_KEY", "your-hyperbolic-api-key-here")
    # os.environ.setdefault("HYPERBOLIC_ENDPOINT", "https://api.hyperbolic.xyz")
    
    if os.environ["AZURE_OPENAI_API_KEY"] == "your-azure-api-key-here":
        print("⚠️  Using default placeholder values!")
        print("   Please set your API keys in Colab Secrets or update the code above.")

# Database (SQLite - will be created automatically)
os.environ["DATABASE_URL"] = "sqlite:///study_query_llm.db"

print("\n✅ Configuration complete!")


## Step 4: Initialize Database


In [None]:
# Initialize the database
import sys
from pathlib import Path

# First, verify the package can be imported
try:
    import study_query_llm
    print(f"✅ Main package found at: {study_query_llm.__file__}")
except ImportError as e:
    print(f"❌ Cannot import study_query_llm: {e}")
    print("\nTrying to fix installation...")
    
    # Try to reinstall
    get_ipython().system('pip install -e . --force-reinstall --no-deps')
    get_ipython().system('pip install panel python-dotenv openai tenacity sqlalchemy pandas')
    
    # Try importing again
    try:
        import study_query_llm
        print("✅ Package imported after reinstall!")
    except ImportError as e2:
        print(f"❌ Still cannot import: {e2}")
        print("\nDiagnostics:")
        print(f"   Current directory: {Path.cwd()}")
        print(f"   Python path: {sys.path[:5]}")  # First 5 entries
        if not Path('src/study_query_llm').exists():
            raise
        # Last resort: make src/ importable directly, then retry the import once
        print("   src/study_query_llm exists - trying manual path addition...")
        src_path = Path.cwd() / 'src'
        if str(src_path) not in sys.path:
            sys.path.insert(0, str(src_path))
            print(f"   Added {src_path} to path")
        import study_query_llm  # raises ImportError again if this also fails
        print("✅ Package imported after manual path addition!")

def _verify_package_structure():
    pkg_path = Path(study_query_llm.__file__).parent
    db_path = pkg_path / "db"
    connection_file = db_path / "connection.py"
    print(f"   Package path: {pkg_path}")
    print(f"   DB path exists: {db_path.exists()}")
    if connection_file.exists():
        print(f"   ✅ Database package found at: {connection_file}")
        return connection_file

    print("   ❌ study_query_llm.db.connection is missing.")
    print("   Fix steps:")
    print("     1. Run `!rm -rf /content/study-query-llm` to clear cached files.")
    print("     2. Re-run Step 1 (git clone or upload the project) so the latest repo is available.")
    print("     3. Then re-run installation and this cell.")
    raise FileNotFoundError("study_query_llm.db package is missing from the workspace.")


def _load_database_dependencies():
    try:
        from study_query_llm.config import config
        from study_query_llm.db.connection import DatabaseConnection
        print("✅ Database modules imported successfully!")
        return config, DatabaseConnection
    except ImportError as db_error:
        print(f"❌ Cannot import database modules: {db_error}")
        _verify_package_structure()
        print("   Tip: ensure you pushed your latest changes before cloning in Colab.")
        raise


config, DatabaseConnection = _load_database_dependencies()

# Initialize the database
try:
    db = DatabaseConnection(config.database.connection_string)
    db.init_db()
    print("✅ Database initialized!")
except Exception as e:
    print(f"❌ Database initialization failed: {e}")
    raise


## Step 5: Start the Application


In [None]:
# Import and create the app
from panel_app.app import serve_app
from IPython.display import Markdown, display
from urllib.parse import urlparse

PORT = 5006
ADDRESS = "0.0.0.0"
colab_proxy_url = None
proxy_origin = None

# Try to create a Colab proxy so the browser can reach the Panel server
try:
    from google.colab import output  # type: ignore

    colab_proxy_url = output.eval_js(f"google.colab.kernel.proxyPort({PORT})")
    if colab_proxy_url:
        parsed = urlparse(colab_proxy_url)
        proxy_origin = parsed.netloc or None
except Exception as proxy_error:
    print(f"ℹ️  Could not configure Colab proxy automatically: {proxy_error}")

serve_kwargs = dict(
    address=ADDRESS,
    port=PORT,
    route=None,
    open_browser=False,
)

# Allow the Colab proxy host to open the WebSocket connection
if proxy_origin:
    serve_kwargs["allow_websocket_origin"] = [proxy_origin]

# Stop any existing server
if 'dashboard_server' in globals():
    try:
        dashboard_server.stop()
    except Exception:
        pass

# Start the server (in Colab, the browser reaches it through the proxy URL)
dashboard_server, dashboard_url = serve_app(**serve_kwargs)
public_url = colab_proxy_url or dashboard_url

if colab_proxy_url:
    message = (
        "## ✅ Application Started in Colab!\n\n"
        f"**[Open the dashboard using the Colab proxy link]({public_url})**\n\n"
        "Or copy this URL (works while the Colab session is running):\n"
        f"`{public_url}`\n\n"
        "> ℹ️ Use this proxy link instead of `http://127.0.0.1:5006`."
    )
else:
    message = (
        "## ✅ Application Started!\n\n"
        f"**[Open the dashboard]({public_url})**\n\n"
        f"Or copy this URL: `{public_url}`"
    )

display(Markdown(message))
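
The cell above lets the proxy host open a WebSocket connection by extracting the host part of the proxy URL. The sketch below isolates that step with `urllib.parse.urlparse`. The function name `websocket_origin` and the example URL are made up for illustration; a real Colab proxy URL will look different.

```python
from typing import Optional
from urllib.parse import urlparse


def websocket_origin(proxy_url: Optional[str]) -> Optional[str]:
    """Return the host[:port] part of a URL, or None if there is none."""
    if not proxy_url:
        return None
    return urlparse(proxy_url).netloc or None


# Hypothetical proxy URL, for illustration only
print(websocket_origin("https://5006-example.colab.example.com/"))
```

Only the host (and port, if present) is kept; the scheme and path are dropped because the origin check compares hosts.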


## Alternative: Display in Notebook

If the above doesn't work, try displaying the app directly in the notebook:


In [None]:
# Alternative: Display app in notebook cell
# Note: no earlier cell defines `app`; create or import the Panel app object first,
# then uncomment the line below to display it inline
# app


## Troubleshooting

### If the app doesn't start:
1. Check that all API keys are set correctly
2. Verify your Azure deployment name matches what's in Azure Portal
3. Make sure all cells above have run successfully
4. Ensure the project source code is accessible (uploaded or cloned)
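
The first check can be automated. The sketch below scans the Azure variables from Step 3 and flags any that are unset or still hold one of the placeholder values from the fallback code, all of which contain `your-`. The function name `find_config_problems` is an illustrative choice, not part of the project.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from Step 3 of this notebook
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
]


def find_config_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif "your-" in value:
            # Step 3 placeholders look like "your-azure-api-key-here"
            problems.append(f"{name} still holds a placeholder value")
    return problems


for problem in find_config_problems():
    print("⚠️ ", problem)
```

This only checks that values are present, not that the key or deployment name is valid on the Azure side.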

### To stop the app:
- Run `dashboard_server.stop()` in a new cell (this is the server object created in Step 5)
- Or interrupt the kernel (Runtime → Interrupt execution)
- Or restart the runtime (Runtime → Restart runtime)

### Database location:
- The SQLite database is the file `study_query_llm.db` (see `DATABASE_URL` in Step 3), created in the working directory of the Colab session
- It is deleted when the session ends
- To persist data, download the database file or use a cloud database
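
For the download route, it helps to snapshot the database first so the copy is consistent even while the app is writing to it. The sketch below uses `sqlite3.Connection.backup` from the standard library. The function name `snapshot_sqlite` and the snapshot filename are illustrative choices, and the source path assumes the `study_query_llm.db` file configured in Step 3.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(db_path: str, snapshot_path: str) -> Path:
    """Copy a SQLite database to snapshot_path using SQLite's online backup API."""
    if not Path(db_path).exists():
        raise FileNotFoundError(f"No database at {db_path}")
    source = sqlite3.connect(db_path)
    target = sqlite3.connect(snapshot_path)
    try:
        source.backup(target)
    finally:
        source.close()
        target.close()
    return Path(snapshot_path)


if Path("study_query_llm.db").exists():
    copy = snapshot_sqlite("study_query_llm.db", "study_query_llm_snapshot.db")
    print(f"Snapshot written to {copy}")
    # In Colab, the copy can then be downloaded with:
    #   from google.colab import files; files.download(str(copy))
```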

## Next Steps

1. Go to the **Inference** tab in the app
2. Select your provider (Azure, OpenAI, etc.)
3. For Azure: Click "Load Deployments" and select a deployment
4. Enter a prompt and run inference
5. Check the **Analytics** tab to see your results!
