# File Access Authentication Demo

This notebook walks through a runnable demonstration of why returning a raw S3 URL from a GraphQL API is unsafe, and how to fix it by placing an authenticated download service in front of S3. Run the cells in order; together they build a mock S3 bucket, an auth provider, a naïve and a secure GraphQL API, a file service, and simulated front-end flows.

## 1. Mock S3 Bucket

We start with a tiny in-memory stand-in for S3. It can store files, toggle between public and private access, and generate URLs that only work when the bucket is public (mimicking an `s3:GetObject` policy that allows anonymous reads).

In [None]:
from __future__ import annotations

import re
import secrets
import time
from typing import Dict, Optional

# Access denials in the mock services raise Python's built-in PermissionError
# (no custom exception class, so the built-in name is not shadowed).

class MockS3:
    """Minimal S3-like storage with optional public reads."""

    def __init__(self, bucket_name: str, public: bool = False):
        self.bucket_name = bucket_name
        self._objects: Dict[str, str] = {}
        self.public = public

    def upload_file(self, key: str, content: str) -> None:
        self._objects[key] = content

    def set_public(self, is_public: bool) -> None:
        self.public = is_public

    def generate_public_url(self, key: str) -> str:
        if key not in self._objects:
            raise FileNotFoundError(f"{key} not found in bucket {self.bucket_name}")
        if not self.public:
            raise PermissionError("Bucket policy blocks anonymous reads; no public URL available.")
        return f"https://mock-s3/{self.bucket_name}/{key}"

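    def download_via_public_url(self, url: str) -> str:
        if not self.public:
            raise PermissionError("Anonymous read blocked: bucket is private.")
        prefix = f"https://mock-s3/{self.bucket_name}/"
        if not url.startswith(prefix):
            raise ValueError(f"URL does not point at bucket {self.bucket_name}: {url}")
        # Everything after the bucket prefix is the object key (keys may contain '/').
        return self.get_object(url[len(prefix):])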

    def get_object(self, key: str) -> str:
        if key not in self._objects:
            raise FileNotFoundError(f"{key} not found in bucket {self.bucket_name}")
        return self._objects[key]

mock_s3 = MockS3(bucket_name="team-reports", public=False)
mock_s3.upload_file("quarterly-report.txt", "CONFIDENTIAL: Revenue breakdown...")
print("Mock S3 ready with 1 private file.")
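For reference, the real-world misconfiguration that `public=True` stands in for is a bucket policy statement granting `s3:GetObject` to every principal. The sketch below builds such a policy for the demo's `team-reports` bucket and adds a deliberately simplified check. `public_read_policy` and `allows_anonymous_get` are hypothetical helpers; the check ignores `Condition` blocks, `NotPrincipal`/`NotAction`, explicit `Deny` statements overriding an `Allow`, and S3 Block Public Access settings, so treat it as an illustration rather than a policy analyzer.

```python
import json
from typing import Any, Dict, List


def public_read_policy(bucket_name: str) -> Dict[str, Any]:
    """Bucket policy that lets anyone (Principal "*") read every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def _as_list(value: Any) -> List[Any]:
    # Policy fields may hold a single value or a list of values.
    return value if isinstance(value, list) else [value]


def allows_anonymous_get(policy: Dict[str, Any]) -> bool:
    """Simplified: does any Allow statement grant s3:GetObject to everyone?"""
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        anonymous = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS", []))
        )
        actions = _as_list(stmt.get("Action", []))
        if anonymous and any(a in ("s3:GetObject", "s3:*", "*") for a in actions):
            return True
    return False


policy = public_read_policy("team-reports")
print(json.dumps(policy, indent=2))
print("Anonymous reads allowed:", allows_anonymous_get(policy))
print("Anonymous reads allowed (empty policy):",
      allows_anonymous_get({"Version": "2012-10-17", "Statement": []}))
```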


## 2. Authentication Provider

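S3 has no notion of our application's users (its access control is built on AWS IAM identities and bucket policies), so the API cluster has to own user authentication. The provider issues bearer tokens to users and tracks which scopes each token holds. Tokens expire after 120 seconds (`_ttl_seconds`), which keeps the window in which a leaked token is useful short.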

In [None]:
class AuthProvider:
    """Toy auth provider that issues and validates scoped bearer tokens."""

    def __init__(self):
        self._users = {"alice": "hunter2", "bob": "p@ssw0rd"}
        self._active_tokens: Dict[str, Dict[str, object]] = {}
        self._default_scopes = {"download:request", "download:use"}
        self._ttl_seconds = 120

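    def authenticate(self, username: str, password: str) -> str:
        expected = self._users.get(username)
        # compare_digest runs in constant time, so response timing does not reveal
        # how much of a guessed password matched.
        if expected is None or not secrets.compare_digest(expected.encode(), password.encode()):
            raise PermissionError("Invalid credentials.")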
        token = secrets.token_urlsafe(16)
        self._active_tokens[token] = {
            "user": username,
            "scopes": set(self._default_scopes),
            "expires_at": time.time() + self._ttl_seconds,
        }
        return token

    def validate(self, token: str, required_scope: Optional[str] = None) -> Dict[str, object]:
        claims = self._active_tokens.get(token)
        if not claims:
            raise PermissionError("Unknown or revoked token.")
        if claims["expires_at"] < time.time():
            raise PermissionError("Token expired.")
        if required_scope and required_scope not in claims["scopes"]:
            raise PermissionError(f"Missing required scope: {required_scope}")
        return claims

auth = AuthProvider()
user_token = auth.authenticate("alice", "hunter2")
print("Issued token for Alice:", user_token)
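`AuthProvider.validate` reads the wall clock directly, so watching a token expire would mean waiting two minutes. A minimal sketch, assuming the same claims shape the provider stores (`user`, `scopes`, `expires_at`), factors the same checks into a pure function that takes `now` as a parameter; `validate_claims` is a hypothetical name, not part of the notebook.

```python
from typing import Dict, Optional


def validate_claims(claims: Optional[Dict[str, object]], now: float,
                    required_scope: Optional[str] = None) -> Dict[str, object]:
    """Same checks as AuthProvider.validate, but the current time is a parameter."""
    if not claims:
        raise PermissionError("Unknown or revoked token.")
    if claims["expires_at"] < now:
        raise PermissionError("Token expired.")
    if required_scope and required_scope not in claims["scopes"]:
        raise PermissionError(f"Missing required scope: {required_scope}")
    return claims


# Hypothetical claims: issued at t=1000 with a 120 s TTL, holding only one scope.
claims = {"user": "alice", "scopes": {"download:request"}, "expires_at": 1000 + 120}

print(validate_claims(claims, now=1060, required_scope="download:request")["user"])
for now, scope in [(1200, "download:request"), (1060, "download:use")]:
    try:
        validate_claims(claims, now=now, required_scope=scope)
    except PermissionError as exc:
        print(f"t={now}, scope={scope}: {exc}")
```

Passing `now` explicitly makes the expiry and scope rules easy to test deterministically; the notebook's provider could call such a function with `time.time()`.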


## 3. Naïve GraphQL API (Insecure)

Next, we implement a GraphQL-like endpoint that performs no authorization: it answers any query it can parse. Its resolver simply hands back the object's direct S3 URL, which a client can only use if the bucket allows anonymous public reads.

In [None]:
class NaiveGraphQLAPI:
    """GraphQL endpoint that exposes raw S3 URLs without authorization."""

    _query_pattern = re.compile(r'fileDownloadUrl\s*\(\s*fileId:\s*"(?P<file_id>[^"]+)"\s*\)')

    def __init__(self, s3: MockS3):
        self.s3 = s3

    def execute(self, query: str) -> Dict[str, object]:
        match = self._query_pattern.search(query)
        if not match:
            return {"errors": ["Unsupported query"]}
        file_id = match.group("file_id")
        try:
            url = self.s3.generate_public_url(file_id)
        except (FileNotFoundError, PermissionError) as exc:
            return {"errors": [str(exc)]}
        return {"data": {"fileDownloadUrl": url}}

naive_api = NaiveGraphQLAPI(mock_s3)
query = 'query { fileDownloadUrl(fileId: "quarterly-report.txt") }'
print(naive_api.execute(query))
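The `_query_pattern` regex stands in for a real GraphQL parser: it extracts the `fileId` string literal from the query text and nothing more. A quick check of what it accepts, with the pattern copied verbatim from `NaiveGraphQLAPI`. Real GraphQL treats whitespace between tokens as insignificant and supports variables such as `$id`, and this pattern matches neither form.

```python
import re

# Copied verbatim from NaiveGraphQLAPI._query_pattern.
pattern = re.compile(r'fileDownloadUrl\s*\(\s*fileId:\s*"(?P<file_id>[^"]+)"\s*\)')

queries = {
    "compact": 'query { fileDownloadUrl(fileId: "quarterly-report.txt") }',
    "extra whitespace": 'query {\n  fileDownloadUrl ( fileId:   "a.txt" )\n}',
    "space before colon": 'query { fileDownloadUrl(fileId : "a.txt") }',
    "variable": 'query Q($id: String!) { fileDownloadUrl(fileId: $id) }',
}
for label, text in queries.items():
    match = pattern.search(text)
    print(f"{label:20} -> {match.group('file_id') if match else None}")
```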


### Client Flow Against the Naïve API

The front-end receives a URL and tries to download the file directly from S3. This only succeeds when the bucket policy allows anonymous reads, which is the exact misconfiguration we are trying to avoid.

In [None]:
def naive_frontend_flow(api: NaiveGraphQLAPI, s3: MockS3, file_id: str) -> None:
    query = f'query {{ fileDownloadUrl(fileId: "{file_id}") }}'
    response = api.execute(query)
    print("GraphQL response:", response)
    data = response.get("data")
    if not data:
        print("Front-end cannot download the file; no usable URL.")
        return
    try:
        file_contents = s3.download_via_public_url(data["fileDownloadUrl"])
    except Exception as exc:
        print("Download failed:", exc)
        return
    print("File contents delivered to browser:", file_contents)

print("--- Bucket forced public (works but unsafe) ---")
mock_s3.set_public(True)
naive_frontend_flow(naive_api, mock_s3, "quarterly-report.txt")

print("--- Bucket made private (secure policy) ---")
mock_s3.set_public(False)
naive_frontend_flow(naive_api, mock_s3, "quarterly-report.txt")
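Both front-end flows splice `file_id` into the query text with an f-string, so a file name containing a double quote produces a malformed query. GraphQL's usual remedy is to declare a variable and send its value separately in the JSON request body. This sketch assumes a schema in which `fileDownloadUrl` takes a `fileId: String!` argument, and `build_graphql_request` is a hypothetical helper; the notebook's regex-based `execute` methods do not understand this form, so it shows what a real GraphQL server would receive rather than something to pass to `naive_api`.

```python
import json
from typing import Any, Dict

# The operation text is a constant; the file ID travels separately as a variable.
FILE_DOWNLOAD_QUERY = """
query FileDownloadUrl($fileId: String!) {
  fileDownloadUrl(fileId: $fileId)
}
""".strip()


def build_graphql_request(file_id: str) -> str:
    """JSON body for a GraphQL-over-HTTP POST with the file ID as a variable."""
    body: Dict[str, Any] = {
        "query": FILE_DOWNLOAD_QUERY,
        "operationName": "FileDownloadUrl",
        "variables": {"fileId": file_id},
    }
    return json.dumps(body)


tricky_id = 'report".txt'
# Interpolation breaks the string literal inside the query:
print(f'query {{ fileDownloadUrl(fileId: "{tricky_id}") }}')
# With a variable, JSON escapes the quote and the query text never changes:
print(build_graphql_request(tricky_id))
```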


## 4. File Service in Front of S3 (Secure Path)

To keep the bucket private, we put a web service that we operate in front of it: the service verifies the caller and then fetches the object from S3 server-side. The GraphQL layer can then return a URL that points to **our** service instead of S3.

In [None]:
class FileService:
    """Authorizes downloads and streams files from S3 on behalf of the caller."""

    def __init__(self, s3: MockS3, auth_provider: AuthProvider):
        self.s3 = s3
        self.auth_provider = auth_provider
        self._pending_tokens: Dict[str, Dict[str, object]] = {}
        self._ttl_seconds = 60

    def prepare_download(self, file_id: str, user_token: str) -> str:
        claims = self.auth_provider.validate(user_token, required_scope="download:request")
        # get_object raises FileNotFoundError for unknown keys, so the service
        # does not need to reach into MockS3's private _objects dict.
        self.s3.get_object(file_id)
        download_token = secrets.token_urlsafe(20)
        self._pending_tokens[download_token] = {
            "file_id": file_id,
            "user": claims["user"],
            "expires_at": time.time() + self._ttl_seconds,
        }
        return download_token

    def download(self, download_token: str, user_token: str) -> str:
        claims = self.auth_provider.validate(user_token, required_scope="download:use")
        meta = self._pending_tokens.get(download_token)
        if not meta:
            raise PermissionError("Invalid or already used download token.")
        if meta["expires_at"] < time.time():
            raise PermissionError("Download token expired.")
        if meta["user"] != claims["user"]:
            raise PermissionError("Token does not belong to this user.")
        del self._pending_tokens[download_token]
        return self.s3.get_object(meta["file_id"])

class SecureGraphQLAPI:
    """GraphQL facade that returns links to the internal download service."""

    _query_pattern = NaiveGraphQLAPI._query_pattern

    def __init__(self, auth_provider: AuthProvider, file_service: FileService):
        self.auth_provider = auth_provider
        self.file_service = file_service

    def execute(self, query: str, user_token: str) -> Dict[str, object]:
        match = self._query_pattern.search(query)
        if not match:
            return {"errors": ["Unsupported query"]}
        file_id = match.group("file_id")
        try:
            download_token = self.file_service.prepare_download(file_id, user_token)
        except (FileNotFoundError, PermissionError) as exc:
            return {"errors": [str(exc)]}
        return {"data": {"fileDownloadUrl": f"https://files.internal/download?token={download_token}"}}

file_service = FileService(mock_s3, auth)
secure_api = SecureGraphQLAPI(auth, file_service)
print("Secure GraphQL API ready.")


### Secure Client Flow

The front-end now keeps its user token, asks GraphQL for a download link, and then calls the internal file service with both the user token and the short-lived download token from GraphQL. S3 never sees an anonymous request.

In [None]:
from urllib.parse import urlparse, parse_qs

def secure_frontend_flow(api: SecureGraphQLAPI, file_service: FileService, auth_provider: AuthProvider,
                         file_id: str, username: str, password: str) -> None:
    user_token = auth_provider.authenticate(username, password)
    query = f'query {{ fileDownloadUrl(fileId: "{file_id}") }}'
    graphql_response = api.execute(query, user_token=user_token)
    print("GraphQL response:", graphql_response)
    data = graphql_response.get("data")
    if not data:
        print("GraphQL denied access; download will not proceed.")
        return
    download_url = data["fileDownloadUrl"]
    qs = parse_qs(urlparse(download_url).query)
    download_token = qs.get("token", [None])[0]
    if not download_token:
        print("Malformed download URL.")
        return
    try:
        file_contents = file_service.download(download_token, user_token=user_token)
    except Exception as exc:
        print("Download denied:", exc)
        return
    print("File contents delivered to browser:", file_contents)

secure_frontend_flow(secure_api, file_service, auth, "quarterly-report.txt", "alice", "hunter2")

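# Calling secure_frontend_flow again would mint a fresh download token, so the
# reuse case is exercised directly against the file service instead.
print("Reusing one download token twice (the second attempt should fail):")
reuse_user_token = auth.authenticate("alice", "hunter2")
link = secure_api.execute('query { fileDownloadUrl(fileId: "quarterly-report.txt") }', user_token=reuse_user_token)
reuse_token = parse_qs(urlparse(link["data"]["fileDownloadUrl"]).query)["token"][0]
print("First use:", file_service.download(reuse_token, user_token=reuse_user_token))
try:
    file_service.download(reuse_token, user_token=reuse_user_token)
except PermissionError as exc:
    print("Second use denied:", exc)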
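The contract between the GraphQL layer and the front-end is just a URL whose `token` query parameter carries the download token. A small sketch of both sides: `urllib.parse.urlencode` builds the link (the notebook's f-string is fine for `secrets.token_urlsafe` output but would not escape other characters) and `parse_qs` reads it back, refusing links that do not point at the download endpoint. `build_download_url` and `extract_download_token` are hypothetical helpers, and `https://files.internal/download` is the demo's placeholder host.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

DOWNLOAD_ENDPOINT = "https://files.internal/download"  # placeholder host from the demo


def build_download_url(download_token: str) -> str:
    # urlencode percent-escapes anything that is not safe inside a query string.
    return f"{DOWNLOAD_ENDPOINT}?{urlencode({'token': download_token})}"


def extract_download_token(url: str) -> Optional[str]:
    parsed = urlparse(url)
    if f"{parsed.scheme}://{parsed.netloc}{parsed.path}" != DOWNLOAD_ENDPOINT:
        return None  # not a link to our download service
    values = parse_qs(parsed.query).get("token")
    return values[0] if values else None


url = build_download_url("abc+def/ghi=")
print(url)  # https://files.internal/download?token=abc%2Bdef%2Fghi%3D
print(extract_download_token(url))  # abc+def/ghi=
print(extract_download_token("https://mock-s3/team-reports/quarterly-report.txt"))  # None
```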


## 5. What the Front-End Would Do

Here is a bare-bones HTML + JavaScript sketch that mirrors the secure flow. It never sees a raw S3 URL; it only handles GraphQL responses and calls the download service with an authenticated request.

In [None]:
html_example = """<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Secure Download Demo</title>
    <script>
      async function fetchReport() {
        const userToken = document.getElementById('token').value;
        const graphqlResponse = await fetch('/graphql', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${userToken}` },
          body: JSON.stringify({ query: 'query { fileDownloadUrl(fileId: "quarterly-report.txt") }' }),
        }).then(r => r.json());

        const downloadUrl = graphqlResponse?.data?.fileDownloadUrl;
        if (!downloadUrl) {
          alert('GraphQL denied access to the file');
          return;
        }

        const downloadResponse = await fetch(downloadUrl, {
          headers: { 'Authorization': `Bearer ${userToken}` }
        });

        if (!downloadResponse.ok) {
          alert('Download failed: ' + (await downloadResponse.text()));
          return;
        }

        const blob = await downloadResponse.blob();
        const url = URL.createObjectURL(blob);
        const anchor = document.createElement('a');
        anchor.href = url;
        anchor.download = 'quarterly-report.txt';
        anchor.click();
      }
    </script>
  </head>
  <body>
    <h1>Secure Download Demo</h1>
    <label>Auth token <input id="token" placeholder="paste bearer token" /></label>
    <button onclick="fetchReport()">Download report</button>
  </body>
</html>
"""

print(html_example)
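On the server side, the endpoint the page calls has to turn an HTTP request into the `FileService.download(download_token, user_token)` call and map failures to status codes. A framework-free sketch, assuming the download token arrives in the `token` query parameter and the user token in an `Authorization: Bearer <token>` header, as in the page above. `handle_download`, `parse_bearer`, `fake_download` and the status mapping (401 without a bearer token, 400 without a download token, 403 for `PermissionError`, 404 for `FileNotFoundError`) are illustrative choices, not part of the notebook.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Same shape as FileService.download: (download_token, user_token) -> file contents.
DownloadFn = Callable[[str, str], str]


def parse_bearer(header_value: Optional[str]) -> Optional[str]:
    """Return the token from 'Bearer <token>', or None if absent or malformed."""
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def handle_download(url: str, headers: Dict[str, str], download: DownloadFn) -> Tuple[int, str]:
    """Map one HTTP GET to (status, body) around a FileService-style download call."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_token = parse_bearer(lowered.get("authorization"))
    if user_token is None:
        return 401, "Missing bearer token."
    download_token = parse_qs(urlparse(url).query).get("token", [None])[0]
    if download_token is None:
        return 400, "Missing download token."
    try:
        return 200, download(download_token, user_token)
    except PermissionError as exc:
        return 403, str(exc)
    except FileNotFoundError:
        return 404, "File not found."


def fake_download(download_token: str, user_token: str) -> str:
    # Stand-in for file_service.download so the sketch runs on its own.
    if (download_token, user_token) != ("dl-123", "user-abc"):
        raise PermissionError("Invalid or already used download token.")
    return "CONFIDENTIAL: Revenue breakdown..."


url = "https://files.internal/download?token=dl-123"
print(handle_download(url, {"Authorization": "Bearer user-abc"}, fake_download))
print(handle_download(url, {}, fake_download))
print(handle_download(url, {"authorization": "Bearer someone-else"}, fake_download))
```

In the notebook, `file_service.download` could be passed in place of `fake_download`, since its bound method has the same `(download_token, user_token)` shape.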


## 6. Key Takeaways

- Returning a raw S3 URL forces the bucket to allow public reads, which violates least privilege.
- Keep S3 private and expose files through an authenticated service that shares the same identity tokens as the GraphQL layer.
- GraphQL should only ever return URLs that point to infrastructure you control; those URLs can carry short-lived, single-use download tokens so that a leaked link cannot be replayed.
- The front-end never needs S3 credentials: its single bearer token works for both GraphQL and the download service.