# Study Query LLM - Google Colab Setup

This notebook sets up and runs the Study Query LLM application in Google Colab.

## Features
- Run LLM inferences across multiple providers (Azure OpenAI, OpenAI, Hyperbolic)
- Automatic logging to SQLite database
- Analytics dashboard with provider comparison
- No local installation required!

## Setup Instructions

1. **Get the source code** - Clone from GitHub or upload the project folder (Step 1)
2. **Set your API keys** - Use Colab Secrets (recommended) or set in code (Step 3)
3. **Run all cells** - Install dependencies and start the app
4. **Use the app** - Step 5 displays the dashboard URL as a clickable link

**Important:** Step 1 clones the official repository by default. If you work from your own fork, update the GitHub URL in Step 1, or upload the project folder to Colab instead.

Note: The app runs inside this Colab session, so it stops when the session ends (for example, when you close the notebook).


## Step 1: Get the Source Code

**Choose one method below to get the source code:**


In [None]:
# OPTION 1: Clone from GitHub (Recommended)
# This uses the official repository. If you have your own fork, update the URL below.

import os
from pathlib import Path

# Clone the repository
repo_url = "https://github.com/spencermcbridemoore/study-query-llm.git"
repo_name = "study-query-llm"

print(f"Cloning repository: {repo_url}")

# Run git clone through the IPython shell (equivalent to `!git clone ...` in a Colab cell)
get_ipython().system('git clone ' + repo_url)

# Verify clone was successful
project_path = Path("/content") / repo_name
if project_path.exists():
    print("✅ Repository cloned successfully!")
    # Change to the project directory
    os.chdir(project_path)
    print(f"✅ Changed to directory: {project_path}")
else:
    print(f"❌ Directory not found: {project_path}")
    print("Available directories:")
    get_ipython().system('ls -la /content/')
    print("\nIf you don't have a GitHub repo, use OPTION 2 below:")
    print("1. Upload the study-query-llm folder to Colab")
    print("2. Uncomment OPTION 2 code below")
    raise FileNotFoundError(f"Project directory not found: {project_path}")

# Verify setup.py exists
if not Path("setup.py").exists():
    print("❌ setup.py not found!")
    print("Current directory contents:")
    get_ipython().system('ls -la')
    raise FileNotFoundError("setup.py not found. Make sure you cloned the correct repository.")

# Install the package and dependencies
print("\nInstalling package...")
%pip install -q -e .

print("\nInstalling dependencies...")
%pip install -q panel python-dotenv openai tenacity sqlalchemy pandas

print("\n✅ Source code and dependencies installed!")

# OPTION 2: If you uploaded the project folder to Colab, uncomment below:
# project_path = Path("/content/study-query-llm")  # Adjust path if different
# if project_path.exists():
#     %cd {project_path}
#     %pip install -q -e .
#     %pip install -q panel python-dotenv openai tenacity sqlalchemy pandas
#     print("✅ Installed from uploaded files!")
# else:
#     print(f"❌ Directory not found: {project_path}")
#     print("Please upload the study-query-llm folder to Colab first.")
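
Re-running the clone cell calls `git clone` a second time, and git refuses to clone into a destination directory that already exists and is not empty. A minimal sketch of a re-runnable variant follows. The names `ensure_repo` and `runner` are hypothetical and introduced only for illustration; they are not part of the project.

```python
import subprocess
from pathlib import Path


def ensure_repo(repo_url, dest, runner=subprocess.run):
    """Clone repo_url into dest unless dest already holds a git checkout.

    `runner` is injectable (hypothetical parameter) so the function can be
    exercised without network access. Returns "cloned" or "exists".
    """
    dest = Path(dest)
    if (dest / ".git").exists():
        # A previous run already cloned the repository; leave it untouched.
        return "exists"
    # check=True raises CalledProcessError if git exits with a non-zero status.
    runner(["git", "clone", repo_url, str(dest)], check=True)
    return "cloned"
```

In the notebook this would be called as `ensure_repo(repo_url, Path("/content") / repo_name)` before changing into the project directory.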


## Step 2: Verify Installation

Check that the package is installed correctly.


In [None]:
# Verify the package can be imported
import sys
from pathlib import Path

print("Checking installation...")
print(f"Current directory: {Path.cwd()}")
print(f"Python path includes: {[p for p in sys.path if 'study-query' in p or 'content' in p]}")

try:
    import study_query_llm
    from study_query_llm.config import config
    print("\n✅ Main package imported successfully!")
    print(f"   Package location: {study_query_llm.__file__}")
    print(f"   Package version: {getattr(study_query_llm, '__version__', 'unknown')}")
    
    # Test that submodules can be imported
    print("\nTesting submodule imports...")
    try:
        from study_query_llm.db.connection import DatabaseConnection
        from study_query_llm.db.models import InferenceRun
        from study_query_llm.providers import BaseLLMProvider
        from study_query_llm.services.inference_service import InferenceService
        print("✅ All submodules imported successfully!")
    except ImportError as submod_error:
        print(f"⚠️  Warning: Some submodules failed to import: {submod_error}")
        print("   This might cause issues later. The package may need to be reinstalled.")
        print("   Try running: !pip install -e . --force-reinstall")
        
except ImportError as e:
    print(f"\n❌ Error importing package: {e}")
    print("\nDiagnostics:")
    print(f"   Current directory: {Path.cwd()}")
    print(f"   Directory exists: {Path.cwd().exists()}")
    print(f"   setup.py exists: {Path('setup.py').exists()}")
    print(f"   src/study_query_llm exists: {Path('src/study_query_llm').exists()}")
    
    if Path('src/study_query_llm').exists():
        print("\n   Source code found! Trying to add to path manually...")
        src_path = Path.cwd() / 'src'
        if str(src_path) not in sys.path:
            sys.path.insert(0, str(src_path))
            print(f"   Added {src_path} to Python path")
            try:
                import study_query_llm
                print("   ✅ Package now imports after manual path addition!")
            except ImportError as e2:
                print(f"   ❌ Still can't import: {e2}")
    
    print("\nTroubleshooting:")
    print("1. If you use a fork, make sure the GitHub URL in Step 1 points to it")
    print("2. Verify the repository was cloned successfully")
    print("3. Check that you're in the study-query-llm directory")
    print("4. Try running: !pip install -e .")
    print("5. Check if src/study_query_llm directory exists")
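
The verification cell imports each module and reports the first failure. As a complementary check, the sketch below asks the import system which modules it can locate at all, so every missing name is reported in one pass. `missing_modules` is a hypothetical helper, not part of the project. Note that `importlib.util.find_spec` imports parent packages when given a dotted name.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module has no usable spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

For this notebook, a plausible call is `missing_modules(["study_query_llm", "study_query_llm.db.connection", "panel", "sqlalchemy"])`; an empty list means every module was found.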


## Step 3: Configure API Keys

**Recommended:** Use Colab Secrets (left sidebar → 🔑 Secrets)

1. Click the 🔑 icon in the left sidebar
2. Click "+ Add secret"
3. Add secrets with these exact names and turn on **Notebook access** for each:
   - `AZURE_OPENAI_API_KEY`
   - `AZURE_OPENAI_ENDPOINT`
   - `AZURE_OPENAI_DEPLOYMENT`
   - `AZURE_OPENAI_API_VERSION`
   - (Optional) `OPENAI_API_KEY`, `OPENAI_MODEL`
   - (Optional) `HYPERBOLIC_API_KEY`, `HYPERBOLIC_ENDPOINT`

**Alternative:** Set API keys directly in the code cell below (less secure)


In [None]:
import os

# Try to load from Colab Secrets (recommended method)
try:
    from google.colab import userdata
    
    # Load secrets from Colab Secrets (left sidebar)
    # userdata.get() only takes one argument (the key) and raises SecretNotFoundError if missing
    secrets_loaded = False
    try:
        os.environ["AZURE_OPENAI_API_KEY"] = userdata.get('AZURE_OPENAI_API_KEY')
        os.environ["AZURE_OPENAI_ENDPOINT"] = userdata.get('AZURE_OPENAI_ENDPOINT')
        
        # Optional secrets with defaults if not found
        try:
            os.environ["AZURE_OPENAI_DEPLOYMENT"] = userdata.get('AZURE_OPENAI_DEPLOYMENT')
        except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
            os.environ["AZURE_OPENAI_DEPLOYMENT"] = 'gpt-4o'  # Default
        
        try:
            os.environ["AZURE_OPENAI_API_VERSION"] = userdata.get('AZURE_OPENAI_API_VERSION')
        except Exception:
            os.environ["AZURE_OPENAI_API_VERSION"] = '2024-02-15-preview'  # Default
        
        # Optional providers
        try:
            os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
        except Exception:
            pass  # OpenAI not configured
        
        try:
            os.environ["OPENAI_MODEL"] = userdata.get('OPENAI_MODEL')
        except Exception:
            os.environ["OPENAI_MODEL"] = 'gpt-4'  # Default if OpenAI is configured elsewhere
        
        try:
            os.environ["HYPERBOLIC_API_KEY"] = userdata.get('HYPERBOLIC_API_KEY')
        except Exception:
            pass  # Hyperbolic not configured
        
        try:
            os.environ["HYPERBOLIC_ENDPOINT"] = userdata.get('HYPERBOLIC_ENDPOINT')
        except Exception:
            os.environ["HYPERBOLIC_ENDPOINT"] = 'https://api.hyperbolic.xyz'  # Default if Hyperbolic is configured elsewhere
        
        secrets_loaded = True
        print("✅ Configuration loaded from Colab Secrets!")
        
    except Exception as e:
        print(f"⚠️  Could not load required secrets from Colab Secrets: {e}")
        print("   Falling back to environment variables...")
        secrets_loaded = False
        
except ImportError:
    # Not running in Colab, use environment variables
    secrets_loaded = False
    print("ℹ️  Not running in Colab - using environment variables")

# Fallback: Set API keys directly (if not using Colab Secrets)
if not secrets_loaded:
    # Azure OpenAI
    os.environ.setdefault("AZURE_OPENAI_API_KEY", "your-azure-api-key-here")
    os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "https://your-resource.openai.azure.com/")
    os.environ.setdefault("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")
    os.environ.setdefault("AZURE_OPENAI_API_VERSION", "2024-02-15-preview")
    
    # OpenAI (optional)
    # os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key-here")
    # os.environ.setdefault("OPENAI_MODEL", "gpt-4")
    
    # Hyperbolic (optional)
    # os.environ.setdefault("HYPERBOLIC_API_KEY", "your-hyperbolic-api-key-here")
    # os.environ.setdefault("HYPERBOLIC_ENDPOINT", "https://api.hyperbolic.xyz")
    
    if os.environ["AZURE_OPENAI_API_KEY"] == "your-azure-api-key-here":
        print("⚠️  Using default placeholder values!")
        print("   Please set your API keys in Colab Secrets or update the code above.")

# Database (SQLite - will be created automatically)
os.environ["DATABASE_URL"] = "sqlite:///study_query_llm.db"

print("\n✅ Configuration complete!")
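
The configuration cell repeats one pattern for every optional value: try the secret store, and fall back to a default if the lookup raises. A minimal sketch of that pattern as a single helper is shown here. `load_setting` and its `getter` parameter are hypothetical names; in Colab the getter would be `userdata.get`.

```python
import os


def load_setting(name, getter, default=None, env=os.environ):
    """Copy one setting into `env`, falling back to `default` when lookup fails.

    `getter` is any callable that returns the value for `name` or raises.
    Returns True if a value was stored, False if nothing was available.
    """
    try:
        value = getter(name)
    except Exception:
        # Covers a missing secret or a secret without notebook access.
        value = default
    if value is None:
        return False
    env[name] = value
    return True
```

With this helper, an optional value with a default becomes `load_setting("AZURE_OPENAI_DEPLOYMENT", userdata.get, default="gpt-4o")`, and an optional provider key without a default is skipped when the secret is absent.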


## Step 4: Initialize Database


In [None]:
# Initialize the database
import os
import sys
import importlib.util
import types
from importlib import import_module
from pathlib import Path

# First, verify the package can be imported
try:
    import study_query_llm
    print(f"✅ Main package found at: {study_query_llm.__file__}")
except ImportError as e:
    print(f"❌ Cannot import study_query_llm: {e}")
    print("\nTrying to fix installation...")
    
    # Try to reinstall
    get_ipython().system('pip install -e . --force-reinstall --no-deps')
    get_ipython().system('pip install panel python-dotenv openai tenacity sqlalchemy pandas')
    
    # Try importing again
    try:
        import study_query_llm
        print("✅ Package imported after reinstall!")
    except ImportError as e2:
        print(f"❌ Still cannot import: {e2}")
        print("\nDiagnostics:")
        print(f"   Current directory: {Path.cwd()}")
        print(f"   Python path: {sys.path[:5]}")  # First 5 entries
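        if Path('src/study_query_llm').exists():
            print("   src/study_query_llm exists - trying manual path addition...")
            src_path = Path.cwd() / 'src'
            if str(src_path) not in sys.path:
                sys.path.insert(0, str(src_path))
                print(f"   Added {src_path} to path")
            # Retry once now that the source directory is on sys.path;
            # an ImportError here propagates and stops the cell.
            import study_query_llm
            print("✅ Package imported after manual path addition!")
        else:
            raise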

def _collect_candidate_db_files(package_module=None):
    """Build a deduplicated list of possible db/connection.py locations."""
    candidates = []
    seen = set()

    def add_candidate(path: Path) -> None:
        if not path:
            return
        key = str(path)
        if key in seen:
            return
        seen.add(key)
        candidates.append(path)

    # Paths declared by the package itself
    for pkg_path in getattr(package_module, "__path__", []):
        add_candidate(Path(pkg_path) / "db" / "connection.py")

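    # __file__ can be missing or None (no module given, or a namespace package)
    pkg_file_attr = getattr(package_module, "__file__", None)
    if pkg_file_attr:
        pkg_file = Path(pkg_file_attr).resolve()
        if pkg_file.exists():
            add_candidate(pkg_file.parent / "db" / "connection.py")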

    # Environment overrides let users point directly at a repo checkout
    env_root = os.environ.get("STUDY_QUERY_LLM_PATH")
    if env_root:
        env_root_path = Path(env_root)
        add_candidate(env_root_path / "src" / "study_query_llm" / "db" / "connection.py")
        add_candidate(env_root_path / "study_query_llm" / "db" / "connection.py")

    # Use current working directory and its parents
    cwd = Path.cwd()
    add_candidate(cwd / "src" / "study_query_llm" / "db" / "connection.py")
    add_candidate(cwd / "study_query_llm" / "db" / "connection.py")
    if cwd.name == "study-query-llm":
        add_candidate(cwd / "src" / "study_query_llm" / "db" / "connection.py")
    if cwd.parent.exists():
        parent = cwd.parent
        add_candidate(parent / "study-query-llm" / "src" / "study_query_llm" / "db" / "connection.py")

    # Scan sys.path entries (handles %pip install -e . and manual path tweaks)
    for entry in sys.path:
        try:
            entry_path = Path(entry)
        except TypeError:
            continue
        if not entry_path.exists():
            continue
        if entry_path.is_file():
            entry_path = entry_path.parent
        add_candidate(entry_path / "study_query_llm" / "db" / "connection.py")
        add_candidate(entry_path / "src" / "study_query_llm" / "db" / "connection.py")

    # Colab-specific common checkout locations
    colab_repo = Path("/content/study-query-llm")
    if colab_repo.exists():
        add_candidate(colab_repo / "src" / "study_query_llm" / "db" / "connection.py")

    return candidates


def _import_database_connection(package_module=None):
    """
    Attempt to load DatabaseConnection from the packaged module first,
    then fall back to known file paths if the package metadata is stale.
    """
    search_paths = [
        "study_query_llm.db.connection",
        "db.connection",
    ]
    last_error = None
    for module_path in search_paths:
        try:
            module = import_module(module_path)
            database_cls = getattr(module, "DatabaseConnection", None)
            if database_cls is None:
                module_file = getattr(module, "__file__", module_path)
                print(f"   ⚠️ {module_path} ({module_file}) loaded but DatabaseConnection not found")
                continue
            module_file = getattr(module, "__file__", module_path)
            print(f"   ✅ Using DatabaseConnection from {module_path} ({module_file})")
            return database_cls
        except Exception as err:
            print(f"   ⚠️ Could not import {module_path}: {err}")
            last_error = err

    # Fall back to loading the module directly from the filesystem
    candidate_files = _collect_candidate_db_files(package_module)
    if not candidate_files:
        print("   ⚠️ No candidate db/connection.py paths discovered")

    for file_path in candidate_files:
        if not file_path.exists():
            try:
                readable_path = file_path.relative_to(Path.cwd())
            except ValueError:
                readable_path = file_path
            print(f"   ⚠️ Candidate not found on disk: {readable_path}")
            continue
        try:
            module_name = f"_study_query_llm_db_connection_{file_path.stem}_{file_path.stat().st_mtime_ns}"
            spec = importlib.util.spec_from_file_location(
                module_name,
                file_path,
            )
            if spec is None or spec.loader is None:
                print(f"   ⚠️ Could not create import spec for {file_path}")
                continue
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            database_cls = getattr(module, "DatabaseConnection", None)
            if database_cls is None:
                print(f"   ⚠️ {file_path} loaded but DatabaseConnection not found")
                continue

            # Ensure future imports can discover the module
            if package_module is not None:
                package_name = getattr(package_module, "__name__", "study_query_llm")
                db_package_name = f"{package_name}.db"
                connection_module_name = f"{db_package_name}.connection"

                db_package = sys.modules.get(db_package_name)
                if db_package is None:
                    db_package = types.ModuleType(db_package_name)
                    db_package.__path__ = [str(file_path.parent)]
                    sys.modules[db_package_name] = db_package
                sys.modules[connection_module_name] = module
                setattr(db_package, "connection", module)
                setattr(package_module, "db", db_package)

            try:
                readable_path = file_path.relative_to(Path.cwd())
            except ValueError:
                readable_path = file_path
            print(f"   ✅ Using DatabaseConnection from local file {readable_path}")
            return database_cls
        except Exception as err:
            print(f"   ⚠️ Failed to load DatabaseConnection from {file_path}: {err}")
            last_error = err

    raise ImportError("DatabaseConnection is unavailable in study_query_llm.db, db, or local file paths") from last_error


# Resolve database imports (package-first, fallback to root-level db/)
try:
    from study_query_llm.config import config
except ImportError as config_error:
    print(f"❌ Cannot import configuration module: {config_error}")
    raise

try:
    DatabaseConnection = _import_database_connection(study_query_llm)
    print("✅ Database modules imported successfully!")
except ImportError as e:
    print(f"❌ Cannot import database modules: {e}")
    print("\nChecking package structure...\n")
    
    # Check if db module exists
    import study_query_llm
    pkg_path = Path(study_query_llm.__file__).parent
    db_path = pkg_path / 'db'
    print(f"   Package path: {pkg_path}")
    print(f"   DB path exists: {db_path.exists()}")
    if db_path.exists():
        print(f"   DB path contents: {list(db_path.iterdir())}")
    
    # Try to see what's available
    print(f"\n   Available in study_query_llm: {dir(study_query_llm)}")
    try:
        import study_query_llm.db
        print(f"   study_query_llm.db imported: {study_query_llm.db.__file__}")
        print(f"   Available in db: {dir(study_query_llm.db)}")
    except Exception as e2:
        print(f"   Cannot import db submodule: {e2}")
    
    raise

# Initialize the database
try:
    db = DatabaseConnection(config.database.connection_string)
    db.init_db()
    print("✅ Database initialized!")
except Exception as e:
    print(f"❌ Database initialization failed: {e}")
    raise
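
After initialization it can be useful to confirm that the SQLite file exists and contains tables. The sketch below uses only the standard-library `sqlite3` module. It assumes the relative `DATABASE_URL` from Step 3 resolves to `study_query_llm.db` in the current working directory; `list_tables` is a hypothetical helper, and the table names depend on the project's models.

```python
import sqlite3
from pathlib import Path


def list_tables(db_file):
    """Return the sorted table names stored in a SQLite database file."""
    db_file = Path(db_file)
    if not db_file.exists():
        raise FileNotFoundError(f"No database file at {db_file}")
    conn = sqlite3.connect(str(db_file))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("study_query_llm.db")` after the cell above should return a non-empty list if the schema was created.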


## Step 5: Start the Application


In [None]:
# Import and create the app
from panel_app.app import serve_app
from IPython.display import Markdown, display

# Stop any existing server
if 'dashboard_server' in globals():
    try:
        dashboard_server.stop()
    except Exception:
        pass

# Start the server (Colab will create a public URL)
dashboard_server, dashboard_url = serve_app(
    address='0.0.0.0',  # Listen on all interfaces for Colab
    port=5006,
    route=None,
    open_browser=False,
)

# Display the URL
display(Markdown(f"## ✅ Application Started!\n\n**[Open the dashboard]({dashboard_url})**\n\nOr copy this URL: `{dashboard_url}`"))
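
If the link does not load, a quick first check is whether anything is accepting connections on port 5006 inside the runtime. The sketch below is a plain TCP probe using the standard library; `is_port_open` is a hypothetical helper, and it only confirms that a listener exists, not that the dashboard itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Because the server listens on all interfaces, `is_port_open("127.0.0.1", 5006)` run from another cell is a reasonable probe.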


## Alternative: Display in Notebook

If the above doesn't work, try displaying the app directly in the notebook:


In [None]:
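# Alternative: Display app in notebook cell
# This assumes a Panel object named `app` exists in the session; the cells above
# only define `dashboard_server` and `dashboard_url`, so create `app` first.
# Uncomment the line below to display the app inline
# app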


## Troubleshooting

### If the app doesn't start:
1. Check that all API keys are set correctly
2. Verify your Azure deployment name matches what's in Azure Portal
3. Make sure all cells above have run successfully
4. Ensure the project source code is accessible (uploaded or cloned)
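
The first item in this checklist can be automated. The sketch below reports required Azure variables that are unset or still hold the placeholder strings from the Step 3 fallback branch. `find_config_problems`, `REQUIRED`, and `PLACEHOLDER_MARKERS` are hypothetical names, and the check cannot tell whether a key is actually valid.

```python
import os

REQUIRED = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
]
# Placeholder fragments used by the fallback branch in Step 3.
PLACEHOLDER_MARKERS = ("your-azure-api-key-here", "your-resource.openai.azure.com")


def find_config_problems(env=os.environ):
    """Return a list of human-readable problems with the required settings."""
    problems = []
    for name in REQUIRED:
        value = env.get(name, "")
        if not value.strip():
            problems.append(f"{name} is not set")
        elif any(marker in value for marker in PLACEHOLDER_MARKERS):
            problems.append(f"{name} still holds a placeholder value")
    return problems
```

An empty list from `find_config_problems()` means all four variables are present and none match the placeholders.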

### To stop the app:
- Interrupt the kernel (Runtime → Interrupt execution)
- Or restart the runtime (Runtime → Restart runtime)

### Database location:
- The SQLite database is created in the Colab session
- It will be deleted when the session ends
- To persist data, download the database file or use a cloud database
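
One way to keep the data is to copy the database file somewhere that outlives the session before the runtime shuts down. The sketch below makes a timestamped copy with the standard library. `backup_database` is a hypothetical helper, and copying while the app is writing may capture an inconsistent file, so it is safest to run it when no inference is in progress.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(db_file, backup_dir):
    """Copy db_file into backup_dir under a timestamped name and return the new path."""
    db_file = Path(db_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{db_file.stem}-{stamp}{db_file.suffix}"
    shutil.copy2(db_file, target)
    return target
```

In Colab, `backup_dir` could be a folder on a mounted Google Drive, or the returned path can be passed to `google.colab.files.download` to save the file locally.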

## Next Steps

1. Go to the **Inference** tab in the app
2. Select your provider (Azure, OpenAI, etc.)
3. For Azure: Click "Load Deployments" and select a deployment
4. Enter a prompt and run inference
5. Check the **Analytics** tab to see your results!
