# Demo: SPN Read + User Write Authentication Flow

This notebook demonstrates the authentication flow:
1. **Service Principal (SPN)** authenticates for Scanner API reads
2. **User** authenticates via device code for lakehouse writes

## Step 1: Import and Configure

In [None]:
# Import the scanner module
%run ./fabric_scanner_cloud_connections.py

In [None]:
# AUTHENTICATION CONFIGURATION

# 1. Use SPN for Scanner API (reads)
AUTH_MODE = "spn"

# 2. Use User for Lakehouse uploads (writes)
import os
os.environ["UPLOAD_USE_USER_AUTH"] = "true"

print("✅ Configuration set:")
print(f"   Scanner API auth: Service Principal (AUTH_MODE={AUTH_MODE})")
print("   Lakehouse write auth: User (UPLOAD_USE_USER_AUTH=true)")

## Step 2: Initialize SPN Authentication

This authenticates the Service Principal for reading from Scanner API.

In [None]:
# Initialize SPN authentication for Scanner API
initialize_authentication()

print("\n✅ Service Principal authenticated successfully")
print("   Ready to read from Scanner API")

## Step 3: Run Full Scan

**What happens:**
1. ✅ SPN reads workspace list from Scanner API (no prompt)
2. ✅ SPN reads workspace details via Scanner API (no prompt)
3. ⏸️ **USER LOGIN PROMPT** appears when saving to lakehouse
4. ✅ User authenticates, results saved with user credentials

**The login prompt will look like:**
```
🔐 User authentication required for lakehouse uploads...
   Opening browser for login (or follow device code instructions)...

To sign in, use a web browser to open the page https://microsoft.com/devicelogin 
and enter the code ABC123XYZ to authenticate.
```

In [None]:
# Run full tenant scan
# The SPN reads the data; the user authentication prompt appears on the first write
run_cloud_connection_scan(
    enable_full_scan=True,
    include_personal=True,
    table_name="tenant_cloud_connections"
)

# Expected flow:
# 1. "Using Service Principal authentication..." (SPN auth for reads)
# 2. Scanner API calls proceed (reads workspaces)
# 3. "🔐 User authentication required..." (first write to lakehouse)
# 4. Device code displayed - go to https://microsoft.com/devicelogin
# 5. Enter code, sign in with YOUR user account
# 6. "✅ User authentication successful!"
# 7. Results saved to lakehouse with your user identity

## Understanding the Authentication Flow

### Phase 1: Scanner API Reads (SPN)
- `AUTH_MODE = "spn"` → Uses Service Principal
- Calls: `get_access_token_spn()`
- No user interaction needed
- Token cached for 1 hour
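
The caching behaviour listed above can be pictured as a small expiry-aware cache. This is a minimal sketch, not the scanner module's actual implementation: `TokenCache`, its `fetch` callback, and the 5-minute refresh margin are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hypothetical expiry-aware token cache (illustration only)."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        margin_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch            # returns (token, lifetime_in_seconds)
        self._margin = margin_seconds  # refresh this long before expiry
        self._clock = clock            # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

With a one-hour lifetime and the default margin, repeated `get()` calls reuse the same token for about 55 minutes and then trigger one new fetch.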

### Phase 2: Lakehouse Writes (User)
- `UPLOAD_USE_USER_AUTH = true` → Uses User Auth
- Calls: `get_upload_token()`
- **FIRST WRITE ONLY:** Device code prompt
- Subsequent writes: Uses cached token
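
The choice between user and SPN credentials for writes can be sketched as a single decision function. The function name `resolve_upload_auth` and the set of accepted truthy strings are assumptions (this notebook only ever sets the variable to `"true"`). The fallback to SPN when `msal` is missing follows the warning printed in this notebook's test cell. Treating an unset variable as SPN is also an assumption.

```python
import os
from typing import Mapping, Optional

# Assumed set of truthy spellings; the notebook itself only uses "true".
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_upload_auth(
    msal_available: bool, env: Optional[Mapping[str, str]] = None
) -> str:
    """Return "user" or "spn" for lakehouse writes (illustrative only)."""
    env = os.environ if env is None else env
    wants_user = env.get("UPLOAD_USE_USER_AUTH", "").strip().lower() in _TRUTHY
    if wants_user and msal_available:
        return "user"
    # Either user auth was not requested, or msal is not installed.
    return "spn"
```

Passing `env` explicitly keeps the function easy to check without touching the real process environment.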

### Device Code Flow
```
# This happens automatically in get_upload_token()
# You'll see output like:

🔐 User authentication required for lakehouse uploads...
   Opening browser for login (or follow device code instructions)...

To sign in, use a web browser to open the page:
    https://microsoft.com/devicelogin
and enter the code: ABC123XYZ to authenticate.

# After you sign in:
✅ User authentication successful!
```
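
The internals of `get_upload_token()` are not shown in this notebook. Assuming it uses MSAL's device code flow (the notebook does check for `msal`), the core of such a function might look like the sketch below. `acquire_user_token` is a hypothetical name, and `app` is expected to behave like `msal.PublicClientApplication`, whose `initiate_device_flow` and `acquire_token_by_device_flow` methods are the only calls used.

```python
from typing import Any, Dict, List


def acquire_user_token(app: Any, scopes: List[str]) -> str:
    """Run a device code flow on an MSAL-style public client application."""
    flow: Dict[str, Any] = app.initiate_device_flow(scopes=scopes)
    if "user_code" not in flow:
        # MSAL returns an error dict instead of raising when the flow cannot start.
        raise RuntimeError(
            f"Could not start device flow: {flow.get('error_description', flow)}"
        )

    # The message contains the sign-in URL and the one-time code.
    print(flow["message"])

    # Blocks until the user completes sign-in or the code expires.
    result = app.acquire_token_by_device_flow(flow)
    if "access_token" not in result:
        raise RuntimeError(
            f"Sign-in failed: {result.get('error_description', result)}"
        )
    return result["access_token"]
```

With `msal` installed, `app` could be created as `msal.PublicClientApplication(client_id, authority=f"https://login.microsoftonline.com/{tenant_id}")`. The scopes to request depend on the target API and are not specified in this notebook.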

## Test: Verify Both Auth Methods

In [None]:
# Test 1: Verify SPN token (Scanner API)
print("Testing SPN authentication:")
if ACCESS_TOKEN:
    print(f"   ✅ SPN Token exists (length: {len(ACCESS_TOKEN)})")
    print(f"   Token starts with: {ACCESS_TOKEN[:20]}...")
else:
    print("   ❌ No SPN token found")

# Test 2: Check User auth setting
print("\nChecking User auth configuration:")
print(f"   UPLOAD_USE_USER_AUTH = {os.getenv('UPLOAD_USE_USER_AUTH', 'not set')}")
print(f"   MSAL library available: {MSAL_AVAILABLE}")

if not MSAL_AVAILABLE:
    print("\n   ⚠️  WARNING: msal not installed!")
    print("   Install with: pip install msal")
    print("   Without msal, user auth will fall back to SPN")

## Quick Scan Example (Small Scale)

Test with incremental scan to see the auth flow faster:

In [None]:
# Small incremental scan to test auth flow
run_cloud_connection_scan(
    enable_incremental_scan=True,
    incremental_hours_back=6,  # Only last 6 hours
    enable_hash_optimization=True,
    table_name="tenant_cloud_connections"
)

# This will:
# 1. Use SPN to check modified workspaces (fast)
# 2. Prompt for user login when saving (if not already authenticated)
# 3. Save with your user credentials

## Verify Results

In [None]:
# Check if data was saved
if RUNNING_IN_FABRIC and SPARK_AVAILABLE:
    try:
        df = spark.sql("SELECT COUNT(*) as row_count FROM tenant_cloud_connections")
        display(df)
        print("\n✅ Data successfully saved to lakehouse table")
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print("⚠️  Table not yet created (run a scan first)")
else:
    print("Not in Fabric - check ./scanner_output/curated/ for files")

## Key Takeaways

### ✅ Benefits
- **Automated**: SPN allows scheduled/unattended scans
- **Accountable**: User login tracks WHO saved the data
- **Secure**: Separate credentials for read vs write
- **MFA Compatible**: User login respects your org's MFA policies

### 🔐 Security
- SPN credentials: Store in Key Vault or environment variables
- User auth: Standard Microsoft login (supports MFA)
- Tokens cached: Login only once per session
- Audit trail: All writes tracked to your user identity

### ⚙️ Configuration Summary
```python
import os

# For Scanner API reads (SPN)
AUTH_MODE = "spn"
TENANT_ID = os.getenv("FABRIC_SP_TENANT_ID")
CLIENT_ID = os.getenv("FABRIC_SP_CLIENT_ID")
CLIENT_SECRET = os.getenv("FABRIC_SP_CLIENT_SECRET")

# For Lakehouse writes (User)
os.environ["UPLOAD_USE_USER_AUTH"] = "true"
```
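
Before starting a scan, it can help to confirm that the three SPN settings are actually present. This is a minimal sketch: the variable names come from the summary above, while `missing_spn_settings` is a hypothetical helper that is not part of the scanner module.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_SPN_VARS = (
    "FABRIC_SP_TENANT_ID",
    "FABRIC_SP_CLIENT_ID",
    "FABRIC_SP_CLIENT_SECRET",
)


def missing_spn_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required SPN variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SPN_VARS if not env.get(name, "").strip()]


missing = missing_spn_settings()
if missing:
    print("Missing SPN settings: " + ", ".join(missing))
else:
    print("All SPN settings present")
```

Only variable names are reported, so the check never echoes the secret values themselves.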