Merged
75 changes: 74 additions & 1 deletion backend/core/auth.py
@@ -6,6 +6,12 @@
import httpx
from modules.config.config_manager import config_manager

import jwt
Copilot AI Nov 20, 2025

Missing dependency PyJWT in requirements.txt. The jwt module is imported but not listed as a dependency. Add PyJWT to requirements.txt to ensure the application can decode and verify JWT tokens.

Copilot uses AI. Check for mistakes.
import requests
import base64
import json
import time

Check notice (Code scanning / CodeQL): Unused import. Import of 'time' is not used.
Copilot AI Nov 20, 2025

Unused import. The time module is imported but never used in the function. Remove the import to clean up the code.

Suggested change
import time


logger = logging.getLogger(__name__)


@@ -60,8 +66,75 @@
return group_id in user_groups


def get_user_from_aws_alb_jwt(encoded_jwt, expected_alb_arn, aws_region):
    """
    Validates the AWS ALB JWT and parses the email address from the payload.
    Args:
        encoded_jwt (str): The JWT from the x-amzn-oidc-data header.
        expected_alb_arn (str): The ARN of your Application Load Balancer.
        aws_region (str): The AWS region where your ALB is located (e.g., 'us-east-1').
    Returns:
        str: The user's email address, or None if validation fails.
    """
    if not encoded_jwt:
        return None
    try:
        # Step 1: Decode the JWT header to get the key ID (kid) and signer
        jwt_headers_encoded = encoded_jwt.split('.')[0]
        # JWTs use base64url encoding, not standard base64
        # Add padding if missing, as Python's b64decode expects it
        jwt_headers_decoded = base64.b64decode(jwt_headers_encoded + '===').decode("utf-8")
Comment on lines +87 to +88
Copilot AI Nov 20, 2025

Incorrect base64 decoding for base64url-encoded JWTs. JWT headers use base64url (URL-safe) encoding, not standard base64. Use base64.urlsafe_b64decode() instead of base64.b64decode(), following the existing pattern in backend/core/capabilities.py lines 26-28. With its default settings, base64.b64decode() discards the - and _ characters, so a header containing them can decode to corrupted data or raise an error.

Suggested change
# Add padding if missing, as Python's b64decode expects it
jwt_headers_decoded = base64.b64decode(jwt_headers_encoded + '===').decode("utf-8")
# Add padding if missing, as Python's urlsafe_b64decode expects it
padded = jwt_headers_encoded + '=' * (-len(jwt_headers_encoded) % 4)
jwt_headers_decoded = base64.urlsafe_b64decode(padded).decode("utf-8")
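
The padding rule in the suggestion above can be checked in isolation. The following is a minimal, standard-library sketch, not the PR's code: `decode_jwt_header` and the sample header values are hypothetical names introduced only for illustration.

```python
import base64
import json


def decode_jwt_header(encoded_jwt: str) -> dict:
    """Decode the (unverified) header segment of a JWT using base64url."""
    header_segment = encoded_jwt.split(".")[0]
    # base64url usually omits trailing '='; add back only what is needed
    # to reach a multiple of four characters.
    padded = header_segment + "=" * (-len(header_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))


# Hypothetical header, built here only to exercise the helper.
sample_header = {"alg": "ES256", "kid": "example-kid", "signer": "example-signer"}
segment = base64.urlsafe_b64encode(json.dumps(sample_header).encode("utf-8"))
token = segment.rstrip(b"=").decode("ascii") + ".payload.signature"

decoded = decode_jwt_header(token)
```

The decoded header is unverified at this point; it is only suitable for locating the key ID and signer before the signature check.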

        decoded_json_headers = json.loads(jwt_headers_decoded)
        kid = decoded_json_headers['kid']
        received_alb_arn = decoded_json_headers.get('signer')
Comment on lines +84 to +91
Copilot AI Nov 20, 2025

Potential security issue: Manual JWT header parsing is error-prone and unnecessary. Instead of manually decoding the JWT header to extract kid and signer, use PyJWT's built-in jwt.get_unverified_header(encoded_jwt) which safely handles base64url decoding. This reduces the risk of decoding errors and is more maintainable.


        # Step 2: Validate the signer matches the expected ALB ARN
        if received_alb_arn != expected_alb_arn:
            print(f"Error: Invalid signer ARN. Expected {expected_alb_arn}, got {received_alb_arn}")
Copilot AI Nov 20, 2025

Use logger.error() instead of print() for error messages. The codebase uses structured logging throughout (see lines 48, 51 in the existing code), and error messages should be logged using the logger for consistency and proper log aggregation.
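
As a rough illustration of this suggestion, the signer check could log through the module logger instead of printing. This is a sketch under assumptions: `signer_matches` is a hypothetical helper, not a function in the PR.

```python
import logging

logger = logging.getLogger(__name__)


def signer_matches(received_alb_arn, expected_alb_arn) -> bool:
    """Return True when the JWT signer equals the configured ALB ARN."""
    if received_alb_arn != expected_alb_arn:
        # %-style arguments defer string formatting until a handler
        # actually emits the record.
        logger.error(
            "Invalid signer ARN. Expected %s, got %s",
            expected_alb_arn,
            received_alb_arn,
        )
        return False
    return True
```

Routing the message through `logger` lets the application's logging configuration decide where it goes, which `print()` cannot do.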

            return None

        # Step 3: Get the public key from the regional endpoint
        url = f'https://public-keys.auth.elb.{aws_region}.amazonaws.com/{kid}'
        req = requests.get(url)
Copilot AI Nov 20, 2025

Missing timeout on an external HTTP request. The requests.get() call should include a timeout so that authentication cannot hang indefinitely if AWS's public key endpoint is slow or unresponsive. Add a timeout parameter, e.g., requests.get(url, timeout=5.0).

Suggested change
req = requests.get(url)
req = requests.get(url, timeout=5.0)

Copilot AI Nov 20, 2025

Use httpx instead of requests for consistency. The codebase uses httpx for HTTP requests (see line 6 and usage in is_user_in_group function). Replace requests.get() with an httpx client to maintain consistency and support async patterns if needed in the future.

Copilot AI Nov 20, 2025

Missing error handling for HTTP request failures. The requests.get() call at line 100 should check the response status code before using the text. If the public key fetch fails (e.g., 404, 500), pub_key will contain an error message instead of a valid key, causing JWT validation to fail silently. Add: req.raise_for_status() after line 100 to ensure proper error handling.

Suggested change
req = requests.get(url)
req = requests.get(url)
req.raise_for_status()
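
The three review comments on this request (timeout, httpx, status check) could be combined roughly as follows. This is only a sketch: `fetch_alb_public_key` and its `http_get` parameter are hypothetical, and the 5-second timeout is the value suggested in the review, not a measured requirement.

```python
from typing import Callable, Optional


def fetch_alb_public_key(
    kid: str, aws_region: str, http_get: Optional[Callable] = None
) -> str:
    """Fetch the public key for `kid`, failing fast on slow or error responses."""
    if http_get is None:
        # Imported lazily so the sketch can be exercised with a fake client.
        import httpx

        http_get = httpx.get
    url = f"https://public-keys.auth.elb.{aws_region}.amazonaws.com/{kid}"
    response = http_get(url, timeout=5.0)
    # Raise on 4xx/5xx instead of treating an error body as a key.
    response.raise_for_status()
    return response.text
```

Accepting the HTTP callable as a parameter keeps the function testable without network access; the caller would still need to catch the client's exception types.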

        pub_key = req.text
Comment on lines +98 to +101
Copilot AI Nov 20, 2025

Performance concern: Public key is fetched from AWS on every authentication request. Consider implementing a caching mechanism for the public keys (keyed by kid) with an appropriate TTL (e.g., 1 hour). AWS ALB rotates keys infrequently, and caching would significantly reduce latency and external API calls. The cache should handle key rotation by falling back to a fresh fetch if validation fails.
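
One possible shape for such a cache is sketched below using only the standard library. `PublicKeyCache`, its `fetch` callable, and the injectable `clock` are hypothetical names; the one-hour default mirrors the TTL mentioned in the comment and is an assumption, not an AWS-documented rotation interval.

```python
import time
from typing import Callable, Dict, Tuple


class PublicKeyCache:
    """Cache public keys by key ID (kid) with a time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, kid: str) -> str:
        """Return the cached key, fetching it when missing or expired."""
        now = self._clock()
        entry = self._entries.get(kid)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        key = self._fetch(kid)
        self._entries[kid] = (now, key)
        return key

    def invalidate(self, kid: str) -> None:
        """Drop a cached key, e.g. after signature verification fails."""
        self._entries.pop(kid, None)
```

A caller could invoke `invalidate(kid)` and retry once when verification fails, which covers the rotation fallback the comment describes. The sketch is not thread-safe; a lock would be needed if it were shared across threads.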


        # Step 4: Validate the signature and claims using PyJWT
        # The decode method handles signature verification and standard claims (like expiration)
        # The ALB uses ES256 algorithm
        payload = jwt.decode(
            encoded_jwt,
            pub_key,
            algorithms=['ES256'],
            # Optional: Add audience or issuer validation if needed, though ALB handles most standard claims validation
            options={"verify_aud": False, "verify_iss": False}
        )

        # Step 5: Extract the email address from the payload
        email_address = payload.get('email')
        if email_address:
            return email_address
        else:
            print("Error: 'email' claim not found in JWT payload.")
Copilot AI Nov 20, 2025

Use logger.error() instead of print() for error messages. This error logging should be consistent with the existing patterns in this module.

            return None

    except jwt.ExpiredSignatureError:
        print("Error: Token has expired.")
Copilot AI Nov 20, 2025

Use logger.error() instead of print() for error messages. All error logging in this function should use the logger instance defined at line 15.

        return None
    except jwt.InvalidTokenError as e:
        print(f"Error: Invalid token - {e}")
Copilot AI Nov 20, 2025

Use logger.error() instead of print() for error messages. Consistent with the rest of the module's error handling.

        return None
    except requests.exceptions.RequestException as e:
        print(f"Error fetching public key: {e}")
Copilot AI Nov 20, 2025

Use logger.error() instead of print() for error messages. This maintains consistency with the error logging patterns used throughout the codebase.

        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
Copilot AI Nov 20, 2025

Use logger.error() instead of print() for error messages. All error messages should be logged consistently using the logger.

        return None


def get_user_from_header(x_email_header: Optional[str]) -> Optional[str]:
    """Extract user email from authentication header value."""
    if not x_email_header:
        return None
    return x_email_header.strip()
    return x_email_header.strip()
24 changes: 18 additions & 6 deletions backend/core/middleware.py
@@ -8,6 +8,7 @@
from starlette.responses import Response

from core.auth import get_user_from_header
from core.auth import get_user_from_aws_alb_jwt
from core.capabilities import verify_file_token
from infrastructure.app_factory import app_factory

@@ -22,6 +23,9 @@ def __init__(
app,
debug_mode: bool = False,
auth_header_name: str = "X-User-Email",
auth_header_type: str = "email-string",
auth_aws_expected_alb_arn: str = "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/your-alb-name/...",
Copilot AI Nov 20, 2025

The default ARN value in middleware matches the config default, which appears to be a placeholder. Consider using an empty string or None as default in the middleware signature to make it clearer that this value should come from configuration, not hardcoded defaults.

auth_aws_region: str = "us-east-1",
proxy_secret_enabled: bool = False,
proxy_secret_header: str = "X-Proxy-Secret",
proxy_secret: str = None,
@@ -30,6 +34,9 @@ def __init__(
super().__init__(app)
self.debug_mode = debug_mode
self.auth_header_name = auth_header_name
self.auth_header_type = auth_header_type
self.auth_aws_expected_alb_arn = auth_aws_expected_alb_arn
self.auth_aws_region = auth_aws_region
self.proxy_secret_enabled = proxy_secret_enabled
self.proxy_secret_header = proxy_secret_header
self.proxy_secret = proxy_secret
@@ -83,17 +90,22 @@ async def dispatch(self, request: Request, call_next) -> Response:
        user_email = None
        if self.debug_mode:
            # In debug mode, honor auth header if provided, otherwise use config test user
            x_email_header = request.headers.get(self.auth_header_name)
            if x_email_header:
                user_email = get_user_from_header(x_email_header)
            x_auth_header = request.headers.get(self.auth_header_name)
            if x_auth_header:
                user_email = get_user_from_header(x_auth_header)
            else:
                # Get test user from config
                config_manager = app_factory.get_config_manager()
                user_email = config_manager.app_settings.test_user
                # logger.info(f"Debug mode: using user {user_email}")
        else:
Comment on lines +95 to 101
Copilot AI Nov 20, 2025

Inconsistent authentication logic in debug mode. In debug mode (lines 93-95), the code uses get_user_from_header() regardless of the auth_header_type setting. This means AWS ALB JWT authentication cannot be tested in debug mode. Consider applying the same authentication type logic in debug mode to allow proper testing, or document that AWS ALB JWT authentication only works in production mode.

Suggested change
user_email = get_user_from_header(x_auth_header)
else:
# Get test user from config
config_manager = app_factory.get_config_manager()
user_email = config_manager.app_settings.test_user
# logger.info(f"Debug mode: using user {user_email}")
else:
if self.auth_header_type == "aws-alb-jwt":
user_email = get_user_from_aws_alb_jwt(x_auth_header, self.auth_aws_expected_alb_arn, self.auth_aws_region)
else:
user_email = get_user_from_header(x_auth_header)
else:
# Get test user from config
config_manager = app_factory.get_config_manager()
user_email = config_manager.app_settings.test_user
# logger.info(f"Debug mode: using user {user_email}")
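
One way to avoid repeating the type check in both the debug and production branches is to route the header through a single helper. The sketch below is an assumption-laden illustration: `resolve_user_email` and the stand-in extractors are hypothetical and only mimic the roles of `get_user_from_header` and `get_user_from_aws_alb_jwt`.

```python
from typing import Callable, Dict, Optional

Extractor = Callable[[Optional[str]], Optional[str]]


def resolve_user_email(
    header_value: Optional[str],
    header_type: str,
    extractors: Dict[str, Extractor],
    default_type: str = "email-string",
) -> Optional[str]:
    """Pick the extractor for the configured header type and apply it."""
    # Unknown types fall back to the plain email extractor, matching the
    # `else` branch in the middleware.
    extractor = extractors.get(header_type, extractors[default_type])
    return extractor(header_value)


# Stand-in extractors for illustration only; the real ones live in core.auth.
extractors = {
    "email-string": lambda value: value.strip() if value else None,
    "aws-alb-jwt": lambda value: "user-from-jwt@example.com" if value else None,
}
```

Both branches of `dispatch` could then call the same helper, so debug mode exercises the same header-type logic as production.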

            x_email_header = request.headers.get(self.auth_header_name)
            user_email = get_user_from_header(x_email_header)
            x_auth_header = request.headers.get(self.auth_header_name)

            # Extract the user's email, depending on the datatype of auth header
            if self.auth_header_type == "aws-alb-jwt":  # Amazon Application Load Balancer
                user_email = get_user_from_aws_alb_jwt(x_auth_header, self.auth_aws_expected_alb_arn, self.auth_aws_region)
            else:
                user_email = get_user_from_header(x_auth_header)

if not user_email:
# Distinguish between API endpoints (return 401) and browser endpoints (redirect)
@@ -108,4 +120,4 @@ async def dispatch(self, request: Request, call_next) -> Response:
request.state.user_email = user_email

response = await call_next(request)
return response
return response
3 changes: 3 additions & 0 deletions backend/main.py
@@ -127,6 +127,9 @@ async def lifespan(app: FastAPI):
AuthMiddleware,
debug_mode=config.app_settings.debug_mode,
auth_header_name=config.app_settings.auth_user_header,
auth_header_type=config.app_settings.auth_user_header_type,
auth_aws_expected_alb_arn=config.app_settings.auth_aws_expected_alb_arn,
auth_aws_region=config.app_settings.auth_aws_region,
proxy_secret_enabled=config.app_settings.feature_proxy_secret_enabled,
proxy_secret_header=config.app_settings.proxy_secret_header,
proxy_secret=config.app_settings.proxy_secret,
23 changes: 22 additions & 1 deletion backend/modules/config/config_manager.py
@@ -212,6 +212,27 @@ def agent_mode_available(self) -> bool:
description="HTTP header name to extract authenticated username from reverse proxy",
validation_alias="AUTH_USER_HEADER"
)

    # Authentication header configuration
    auth_user_header_type: str = Field(
        default="email-string",
        description="The datatype stored in AUTH_USER_HEADER",
        validation_alias="AUTH_USER_HEADER_TYPE"
    )

    # Authentication AWS expected ALB ARN
    auth_aws_expected_alb_arn: str = Field(
        default="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/your-alb-name/...",
        description="The expected AWS ALB ARN",
        validation_alias="AUTH_AWS_EXPECTED_ALB_ARN"
    )

Comment on lines +225 to +229
Copilot AI Nov 20, 2025

The default ARN value appears to be a placeholder and may cause confusion in production. Consider using an empty string as the default or adding validation that warns/errors if the default placeholder value is still in use when auth_header_type is set to aws-alb-jwt. This prevents accidental misconfigurations in production.

Suggested change
default="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/your-alb-name/...",
description="The expected AWS ALB ARN",
validation_alias="AUTH_AWS_EXPECTED_ALB_ARN"
)
default="",
description="The expected AWS ALB ARN",
validation_alias="AUTH_AWS_EXPECTED_ALB_ARN"
)
def model_post_init(self, __context):
# Validate that if auth_user_header_type is aws-alb-jwt, ARN must be set and not placeholder
if self.auth_user_header_type == "aws-alb-jwt":
placeholder = "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/your-alb-name/..."
if not self.auth_aws_expected_alb_arn or self.auth_aws_expected_alb_arn == placeholder:
raise ValueError(
"auth_aws_expected_alb_arn must be set to a valid AWS ALB ARN when auth_user_header_type is 'aws-alb-jwt'. "
"Current value is empty or a placeholder."
)
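
The validation idea can be tried without the project's settings class. The sketch below uses a standard-library dataclass as a stand-in for the pydantic model, so `AuthSettings` and `PLACEHOLDER_ARN` are hypothetical names; only the field names and the placeholder value come from the diff.

```python
from dataclasses import dataclass

# Placeholder value copied from the PR's default; it is not a real ARN.
PLACEHOLDER_ARN = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "loadbalancer/app/your-alb-name/..."
)


@dataclass
class AuthSettings:
    auth_user_header_type: str = "email-string"
    auth_aws_expected_alb_arn: str = ""

    def __post_init__(self) -> None:
        # The ARN only matters when ALB JWT authentication is selected.
        if self.auth_user_header_type != "aws-alb-jwt":
            return
        arn = self.auth_aws_expected_alb_arn
        if not arn or arn == PLACEHOLDER_ARN:
            raise ValueError(
                "auth_aws_expected_alb_arn must be set to a valid AWS ALB ARN "
                "when auth_user_header_type is 'aws-alb-jwt'."
            )
```

Failing at startup makes a missing ARN visible immediately, rather than as rejected logins once traffic arrives.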

    # Authentication AWS region
    auth_aws_region: str = Field(
        default="us-east-1",
        description="The AWS region",
        validation_alias="AUTH_AWS_REGION"
    )

# Proxy secret authentication configuration
feature_proxy_secret_enabled: bool = Field(
@@ -659,4 +680,4 @@ def get_llm_config() -> LLMConfig:

def get_mcp_config() -> MCPConfig:
"""Get MCP configuration."""
return config_manager.mcp_config
return config_manager.mcp_config
11 changes: 11 additions & 0 deletions docs/02_admin_guide.md
@@ -349,6 +349,17 @@ The intended flow for user authentication in a production environment is as foll

The backend application reads this header to identify the user. The header name is configurable via the `AUTH_USER_HEADER` environment variable (default: `X-User-Email`). This allows flexibility for different reverse proxy setups that may use different header names (e.g., `X-Authenticated-User`, `X-Remote-User`). This model is secure only if the backend is not directly exposed to the internet, ensuring that all requests are processed by the proxy first.

If using AWS Application Load Balancer (ALB) as the Auth Service, the following authentication configuration should be used:

```
AUTH_USER_HEADER=x-amzn-oidc-data
AUTH_USER_HEADER_TYPE=aws-alb-jwt
AUTH_AWS_EXPECTED_ALB_ARN=arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/your-alb-name/...
AUTH_AWS_REGION=us-east-1
```

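With this configuration, the backend validates the JWT passed in the `x-amzn-oidc-data` header (checking its signer against `AUTH_AWS_EXPECTED_ALB_ARN` and verifying its signature with the ALB public key for `AUTH_AWS_REGION`) and then extracts the user's email address from the validated payload.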

### Development Behavior

In a local development environment (when `DEBUG_MODE=true` in the `.env` file), the system falls back to using a default `test@test.com` user if the configured authentication header is not present.