# 🚀 Invoice Extraction Demo App - Google Colab

Run the Streamlit demo app in Google Colab!

**Instructions:**
1. Run all cells in order
2. Click the public URL that appears (localtunnel or ngrok)
3. Use the app in your browser!

**Note:** You'll need to upload invoice files through the web interface.

## Step 1: Setup Repository

In [None]:
# Clone repository if not already cloned
import os
if not os.path.exists('/content/orbit_challenge'):
    # Clone into /content so the location matches the path checked above
    %cd /content
    !git clone https://github.com/marvin-schumann/orbit_challenge.git
    %cd orbit_challenge
    !git checkout claude/capabilities-overview-01BzAZxMUjPBveeHos3gVvok
else:
    %cd /content/orbit_challenge
    !git pull

print("✅ Repository ready!")

## Step 2: Install Dependencies

In [None]:
# Install system dependencies (stdout silenced per command so the status message below still prints)
!apt-get update -qq > /dev/null
!apt-get install -y -qq poppler-utils > /dev/null

# Install Python packages
!pip install -q -r requirements.txt
!pip install -q pyngrok  # For tunneling

print("✅ All dependencies installed!")

## Step 3: Configure ngrok (Optional - for a stable URL)

**Option A: Use ngrok (recommended for a stable URL)**
1. Sign up at https://ngrok.com (free)
2. Get your auth token from the dashboard
3. Paste it below

**Option B: Skip this cell to use localtunnel (no signup required)**

In [None]:
# OPTIONAL: Set your ngrok auth token here
NGROK_AUTH_TOKEN = ""  # Get from https://dashboard.ngrok.com/get-started/your-authtoken

if NGROK_AUTH_TOKEN:
    from pyngrok import ngrok, conf
    conf.get_default().auth_token = NGROK_AUTH_TOKEN
    print("✅ Ngrok configured!")
else:
    print("⚠️  Ngrok not configured - will use localtunnel instead")

## Step 4: Run the App!

**This cell will:**
1. Start the Streamlit app
2. Create a public tunnel
3. Show you the public URL

**Click the URL to access the app!**

⚠️ **Important:** Keep this cell running while using the app!

In [None]:
import subprocess
import threading
import time
from IPython.display import display, HTML

# Use ngrok only if Step 3 ran and set a non-empty token (that cell may have been skipped)
USE_NGROK = bool(globals().get('NGROK_AUTH_TOKEN'))

def run_streamlit():
    """Run Streamlit in background"""
    subprocess.run([
        "streamlit", "run", "app.py",
        "--server.port", "8501",
        "--server.headless", "true",
        "--server.enableCORS", "false",
        "--server.enableXsrfProtection", "false"
    ])

# Start Streamlit in background
streamlit_thread = threading.Thread(target=run_streamlit, daemon=True)
streamlit_thread.start()

print("🚀 Starting Streamlit app...")
time.sleep(5)  # Wait for Streamlit to start

if USE_NGROK:
    # Use ngrok; in pyngrok 5+ connect() returns a tunnel object, so read its public_url string
    from pyngrok import ngrok
    public_url = ngrok.connect(8501, bind_tls=True).public_url
    print("\n" + "="*70)
    print("✅ APP IS RUNNING!")
    print("="*70)
    print(f"\n🌐 Public URL (ngrok): {public_url}")
    print("\n👆 Click the URL above to access the app!")
    print("\n⚠️  Keep this cell running while using the app")
    print("="*70)

    # Display clickable link
    display(HTML(f'<h2><a href="{public_url}" target="_blank">🚀 Open Invoice Extraction App</a></h2>'))
    
else:
    # Use localtunnel as fallback
    print("\n📦 Installing localtunnel...")
    !npm install -g localtunnel 2>/dev/null

    print("\n🌐 Starting localtunnel...")
    print("\n" + "="*70)
    print("✅ APP IS RUNNING!")
    print("="*70)
    print("\nStarting tunnel... (wait 5-10 seconds)\n")

    # Run localtunnel in background
    !lt --port 8501 &

    time.sleep(10)

    print("\n👆 Look for the URL above (format: https://******.loca.lt)")
    print("\n⚠️  Important:")
    print("   1. Click the URL")
    print("   2. Click 'Continue' on the localtunnel page")
    print("   3. Use the app!")
    print("\n⚠️  Keep this cell running while using the app")
    print("="*70)

# Keep running
print("\n⏳ App is running... Press stop button to shutdown.\n")

# Keep the cell alive
try:
    streamlit_thread.join()
except KeyboardInterrupt:
    print("\n🛑 Shutting down...")

## Troubleshooting

### "Can't connect to the URL"

**Solution:**
1. Make sure the Step 4 cell is still running (it shows a spinning circle)
2. Wait 10-15 seconds after starting
3. Click the tunnel URL again
4. For localtunnel, click the "Continue" button
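
If the URL fails right after startup, Streamlit may simply not be listening yet. As a rough sketch (not part of the notebook), the fixed wait could be replaced by polling the port until it accepts connections. The helper name `wait_for_port` and its defaults are hypothetical; port 8501 is the one used in Step 4.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: name and defaults are illustrative, not from the app.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused); wait briefly and retry
            time.sleep(interval)
    return False
```

Calling `wait_for_port(8501)` before opening the tunnel would return `True` as soon as Streamlit accepts connections, or `False` if nothing is listening after 30 seconds.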

### "App is slow"

**Solution:**
- First extraction is slow (model loading)
- Subsequent extractions are faster
- Consider using "Claude API Only" mode for speed

### "GPU out of memory"

**Solution:**
1. Runtime → Restart runtime
2. Re-run all cells
3. Or use "Claude API Only" mode (no GPU needed)
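
To see how much GPU memory is in use before deciding to restart, one option is to query `nvidia-smi`, which GPU runtimes typically provide. This is a minimal sketch under that assumption; the helper names `parse_gpu_memory` and `gpu_memory_mib` are hypothetical and not part of the app.

```python
import shutil
import subprocess


def parse_gpu_memory(line):
    """Parse one 'used, total' line (values in MiB) into a tuple of ints."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def gpu_memory_mib():
    """Return (used, total) MiB for the first GPU, or None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None  # No NVIDIA tooling on this runtime (e.g. CPU-only)
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    try:
        return parse_gpu_memory(result.stdout.strip().splitlines()[0])
    except ValueError:
        return None  # Unexpected output such as "[N/A]"
```

On a CPU-only runtime `gpu_memory_mib()` returns `None`, which is a hint to pick the "Claude API Only" mode.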

### "Want a stable URL"

**Solution:**
1. Sign up at https://ngrok.com (free)
2. Get your auth token from the dashboard
3. Paste it into the Step 3 cell above
4. Re-run Step 3, then Step 4
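
If you would rather not paste the token into a notebook you might share, a possible variation is to fall back to an environment variable. The sketch below is an assumption, not the notebook's behavior: `resolve_ngrok_token` is a hypothetical helper, and reading a variable named `NGROK_AUTH_TOKEN` is a convention chosen here to match the Step 3 variable name.

```python
import os


def resolve_ngrok_token(explicit="", env=None):
    """Prefer an explicitly pasted token; otherwise read NGROK_AUTH_TOKEN from env.

    Hypothetical helper: the environment variable name is an assumption that
    mirrors the notebook variable, not something pyngrok requires.
    """
    env = os.environ if env is None else env
    return explicit.strip() or env.get("NGROK_AUTH_TOKEN", "").strip()
```

In Step 3 this could be used as `NGROK_AUTH_TOKEN = resolve_ngrok_token(NGROK_AUTH_TOKEN)`; an empty result still selects the localtunnel fallback.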

## Alternative: Use Notebooks Only

If Streamlit doesn't work, you can still use the notebooks:
- `exercise_v05_hybrid.ipynb` - Full hybrid pipeline
- Results display in Colab itself (no web app needed)

## How to Use the App

Once you open the public URL:

### 1. Configure Settings (Sidebar)
- ✅ Use Qwen2-VL (requires GPU)
- ✅ Use Claude API (add your API key)

### 2. Upload Invoices
- Click "Browse files" or drag & drop
- Upload PDF, PNG, or JPG files

### 3. Extract Data
- Click "🚀 Extract Data"
- Watch progress in real-time

### 4. View Results
- Switch to "Results" tab
- See metrics, table, and details
- Download CSV

### 5. Learn More
- Switch to "How It Works" tab
- Read technical documentation