Copilot AI commented Nov 25, 2025

Implements a generic middleware that restricts access based on an email domain whitelist, controlled by a configuration file and the FEATURE_DOMAIN_WHITELIST_ENABLED flag. DOE/NNSA/national lab domains are provided as the default example.

Implementation

  • Middleware: backend/core/domain_whitelist_middleware.py

    • Generic domain whitelist validation (not DOE-specific)
    • Loads domains from JSON configuration file
    • Supports subdomains (e.g., user@mail.sandia.gov)
    • Bypasses /api/health and auth redirect endpoints
    • Returns JSON 403 for API paths, redirect 302 for UI paths
  • Configuration Manager: backend/core/domain_whitelist.py

    • Loads domains from config/defaults/domain-whitelist.json or config/overrides/domain-whitelist.json
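    • Optimized subdomain matching: walks the parent levels of the email's domain, so the cost scales with the number of labels in that domain rather than with the size of the whitelist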
    • Dual control: config file enabled field + environment variable
  • Configuration Files:

    • config/defaults/domain-whitelist.json - Default config with DOE labs as example (22 domains)
    • config/overrides/domain-whitelist.json - User customization (takes precedence)
    • config/defaults/domain-whitelist-example.json - Template for custom configs
    • config/defaults/DOMAIN_WHITELIST_README.md - Complete documentation
  • Feature flag: FEATURE_DOMAIN_WHITELIST_ENABLED in AppSettings

    • Default: false
    • Added to .env.example
    • Backward compatible: FEATURE_DOE_LAB_CHECK_ENABLED still works
  • Integration: Added to middleware stack in main.py

    • Positioned before AuthMiddleware in registration order (runs after in request flow)
    • Only registered when feature flag is enabled
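
The dual control described above can be read as a logical AND of the two switches. The following is a minimal sketch, assuming the environment variable accepts the usual truthy strings and that the legacy FEATURE_DOE_LAB_CHECK_ENABLED name is consulted only when the new name is unset. The helper names `env_flag` and `whitelist_active` are hypothetical and are not taken from the PR.

```python
TRUTHY = {"1", "true", "yes", "on"}  # assumed set of accepted truthy strings


def env_flag(environ, name, legacy_name=None):
    """Read a boolean flag, falling back to a legacy name if the new one is unset."""
    raw = environ.get(name)
    if raw is None and legacy_name is not None:
        raw = environ.get(legacy_name)
    return raw is not None and raw.strip().lower() in TRUTHY


def whitelist_active(environ, config):
    """The check is active only when the env flag and the config file both enable it."""
    flag = env_flag(
        environ,
        "FEATURE_DOMAIN_WHITELIST_ENABLED",
        legacy_name="FEATURE_DOE_LAB_CHECK_ENABLED",
    )
    return flag and bool(config.get("enabled", False))
```

Passing `os.environ` as `environ` and the parsed JSON config as `config` would reproduce the two-step enabling procedure from the Usage section.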

Usage

Enable with default DOE configuration

# Set in config/defaults/domain-whitelist.json
"enabled": true

# Set environment variable
FEATURE_DOMAIN_WHITELIST_ENABLED=true

Custom domain configuration

Create config/overrides/domain-whitelist.json:

{
  "version": "1.0",
  "enabled": true,
  "domains": [
    {"domain": "mycompany.com", "description": "My Company"},
    {"domain": "partner.org", "description": "Partner Organization"}
  ],
  "subdomain_matching": true
}

Configuration Format

{
  "version": "1.0",
  "description": "Domain whitelist configuration",
  "enabled": true,
  "domains": [
    {
      "domain": "example.com",
      "description": "Example Corporation",
      "category": "Enterprise"
    }
  ],
  "subdomain_matching": true
}
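
To make the format concrete, here is a hedged sketch of how such a document could be reduced to the values the checker needs. It is not the PR's `DomainWhitelistManager`; the function name `parse_whitelist_config` and the defaults chosen for missing fields are assumptions. Bare-string entries are accepted because the reviewed loader code handles both dict and string entries.

```python
import json


def parse_whitelist_config(text):
    """Parse a domain-whitelist JSON document into (enabled, domains, subdomain_matching)."""
    data = json.loads(text)
    domains = set()
    for entry in data.get("domains", []):
        # Entries may be objects with a "domain" key or bare strings.
        raw = entry.get("domain", "") if isinstance(entry, dict) else entry
        if isinstance(raw, str):
            domain = raw.strip().lower()
            if domain:  # skip empty entries so "" never lands in the set
                domains.add(domain)
    return (
        bool(data.get("enabled", False)),            # assumed default: disabled
        domains,
        bool(data.get("subdomain_matching", True)),  # assumed default: enabled
    )
```

Normalizing to lowercase at load time means lookups only need to lowercase the incoming email domain.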

Test Coverage

8 tests covering:

  • Configuration loading and validation
  • Domain matching (exact and subdomain)
  • Middleware behavior (allow/deny/bypass)
  • Health endpoint bypassing
  • Disabled state handling
  • Custom config paths

Performance

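  • Direct domain lookup: O(1) set membership
  • Subdomain matching: one set lookup per parent level of the email's domain, so O(k) for a domain with k labels, independent of the number of whitelisted domains
  • Example: mail.dept.sandia.gov checks 2 parent levels (dept.sandia.gov, then sandia.gov) instead of iterating 22 domains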
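
The parent-level walk can be sketched as follows. This is an illustration under assumptions, not the code from backend/core/domain_whitelist.py: the standalone function signature and the choice to stop before the bare top-level label are mine.

```python
def is_domain_allowed(email, allowed, subdomain_matching=True):
    """Return True if the email's domain, or one of its parent domains, is in `allowed`."""
    if "@" not in email:
        return False
    domain = email.split("@", 1)[1].strip().lower()
    if not domain:
        return False
    if domain in allowed:  # exact match: a single set lookup
        return True
    if not subdomain_matching:
        return False
    labels = domain.split(".")
    # Try each parent in turn: a.b.c.d -> b.c.d -> c.d (stop before the bare TLD).
    for i in range(1, len(labels) - 1):
        if ".".join(labels[i:]) in allowed:
            return True
    return False
```

Because the comparison is always against whole parent domains, a look-alike such as evil-sandia.gov does not match sandia.gov.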

Documentation

Complete documentation available in config/defaults/DOMAIN_WHITELIST_README.md:

  • Configuration format and fields
  • Multiple example configurations (government, corporate, educational)
  • Enabling instructions
  • Troubleshooting guide
  • Logging details
Original prompt

Add this DOE check behind a feature flag. Make the check more compact. Make code improvements.

from fastapi import FastAPI, Request, HTTPException

app = FastAPI()

def doe_lab_email_middleware(app: FastAPI):
    DOE_LAB_DOMAINS = {
        # HQ / NNSA / DOE-wide
        "doe.gov","nnsa.doe.gov","hq.doe.gov",

        # National labs (not exhaustive but broad coverage)
        "anl.gov","bnl.gov","fnal.gov","inl.gov","lbl.gov","lanl.gov","llnl.gov",
        "ornl.gov","pnnl.gov","sandia.gov","srnl.doe.gov","ameslab.gov",
        "jlab.org","princeton.edu","slac.stanford.edu","pppl.gov","nrel.gov",
        "netl.doe.gov","stanford.edu",  # SLAC/Stanford mix
    }

    @app.middleware("http")
    async def _check_doe_email(request: Request, call_next):
        # Expect email in header; adjust name if needed
        email = request.headers.get("x-user-email")
        if not email or "@" not in email:
            raise HTTPException(status_code=401, detail="User email required")

        domain = email.split("@", 1)[1].lower()
        if not any(domain == d or domain.endswith("." + d) for d in DOE_LAB_DOMAINS):
            raise HTTPException(status_code=403, detail="Access restricted to DOE / NNSA / DOE labs")

        return await call_next(request)

    return app

# use it
doe_lab_email_middleware(app)



Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
Copilot AI changed the title [WIP] Add DOE email check behind a feature flag Add DOE lab email domain restriction middleware with feature flag Nov 25, 2025
Copilot AI requested a review from garland3 November 25, 2025 23:24
Copilot finished work on behalf of garland3 November 25, 2025 23:24
Copy link
Collaborator

@copilot Make it more generic, with the DOE as an example, so there is an ability to whitelist some domains. Put the config file in the config folder, similar to the existing config files.

Copilot AI and others added 2 commits November 26, 2025 01:41
Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Nov 26, 2025

@copilot Make it more generic, with the DOE as an example, so there is an ability to whitelist some domains. Put the config file in the config folder, similar to the existing config files.

Made the middleware generic with config file support. Changes in commits 2092ede and 97bca0f:

Generic Implementation:

  • Renamed DOELabMiddleware → DomainWhitelistMiddleware
  • Created DomainWhitelistManager to load domains from config
  • Feature flag: FEATURE_DOMAIN_WHITELIST_ENABLED (backward compatible with old name)

Config Files:

  • config/defaults/domain-whitelist.json - Default config with DOE as example (22 domains)
  • config/overrides/domain-whitelist.json - For customization (takes precedence)
  • config/defaults/domain-whitelist-example.json - Template

Usage:

  1. Create config/overrides/domain-whitelist.json with your domains
  2. Set "enabled": true in config file
  3. Set FEATURE_DOMAIN_WHITELIST_ENABLED=true env var

See config/defaults/DOMAIN_WHITELIST_README.md for complete documentation and examples.

Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
Copilot AI changed the title Add DOE lab email domain restriction middleware with feature flag Add generic email domain whitelist middleware with configuration file Nov 26, 2025
Copilot finished work on behalf of garland3 November 26, 2025 01:48
manager = DomainWhitelistManager(config_path=temp_config)

assert manager.is_enabled() is True
assert "sandia.gov" in manager.get_domains()

Check failure: Code scanning / CodeQL

Incomplete URL substring sanitization (High, test)

The string sandia.gov may be at an arbitrary position in the sanitized URL.

Copilot Autofix: Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or, if the problem persists, contact support.


assert manager.is_enabled() is True
assert "sandia.gov" in manager.get_domains()
assert "doe.gov" in manager.get_domains()

Check failure: Code scanning / CodeQL

Incomplete URL substring sanitization (High, test)

The string doe.gov may be at an arbitrary position in the sanitized URL.

Copilot Autofix: Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or, if the problem persists, contact support.

assert manager.is_enabled() is True
assert "sandia.gov" in manager.get_domains()
assert "doe.gov" in manager.get_domains()
assert "example.org" in manager.get_domains()

Check failure: Code scanning / CodeQL

Incomplete URL substring sanitization (High, test)

The string example.org may be at an arbitrary position in the sanitized URL.

Copilot Autofix: Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or, if the problem persists, contact support.

@github-actions
Copy link

🔒 Security Scan Results

Security Scan Summary

Scan Results

Python SAST (Bandit)

⚠️ Security issues found in Python code

Recommendations

  • Review all SARIF files uploaded to GitHub Security tab
  • Address high and critical severity vulnerabilities immediately
  • Run npm audit fix and pip-audit locally to fix dependencies
  • Consider implementing additional security controls

- Remove unused Request and TestClient imports from test_domain_whitelist.py
- Remove unused List import from domain_whitelist.py
- Addresses CodeQL static analysis warnings
- All tests passing

@garland3
Copy link
Collaborator

image

Confirmed by testing that this works with a test config file and the feature turned on.

The CodeQL checks are false positives.

@ktpedre This works.
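
On the false-positive assessment above: the flagged assertions test membership in a collection of domains, whereas the CodeQL rule targets substring checks against a URL string. A small illustration, assuming `get_domains()` returns a set of strings (the PR description states that direct lookup is set membership); the attacker host name is made up for the example.

```python
domains = {"sandia.gov", "doe.gov", "example.org"}  # stand-in for manager.get_domains()

# Membership in a set compares whole elements, so look-alike hosts do not match.
in_set_exact = "sandia.gov" in domains
in_set_lookalike = "sandia.gov.attacker.example" in domains

# The pattern the rule is designed to catch: substring search inside a URL string.
url = "https://sandia.gov.attacker.example/login"
substring_hit = "sandia.gov" in url
```

The static analyzer sees the same `"literal" in x` shape in both cases and cannot always tell that `x` is a set rather than a string.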

@garland3 garland3 marked this pull request as ready for review November 26, 2025 02:46
Copilot AI review requested due to automatic review settings November 26, 2025 02:46
@garland3 garland3 merged commit 194a23b into main Nov 26, 2025
10 of 11 checks passed
@garland3 garland3 deleted the copilot/add-doe-check-feature-flag branch November 26, 2025 02:46
Copilot finished reviewing on behalf of garland3 November 26, 2025 02:49
Copilot AI left a comment

Pull request overview

This PR implements a generic email domain whitelist middleware to restrict application access based on user email domains. The feature is controlled by a configuration file (domain-whitelist.json) and a feature flag (FEATURE_DOMAIN_WHITELIST_ENABLED), making it flexible for different deployment scenarios (government labs, corporate, educational institutions). The implementation includes optimized subdomain matching (one set lookup per parent level of the email's domain instead of a scan over all whitelisted domains), comprehensive test coverage, and detailed documentation.

Key changes:

  • Generic, configurable domain whitelist middleware (not DOE-specific despite example config)
  • Layered configuration system with override support
  • Backward compatibility via FEATURE_DOE_LAB_CHECK_ENABLED alias
  • Proper middleware ordering to leverage request.state.user_email from AuthMiddleware

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

File | Description
--- | ---
backend/core/domain_whitelist_middleware.py | New middleware enforcing email domain restrictions with endpoint-specific responses (JSON 403 for API, redirect for UI)
backend/core/domain_whitelist.py | Configuration manager with optimized subdomain matching and layered config file search
backend/modules/config/config_manager.py | Added feature_domain_whitelist_enabled field with backward-compatible alias
backend/main.py | Registered middleware conditionally based on feature flag
config/defaults/domain-whitelist.json | Default configuration with 22 DOE/NNSA/national lab domains as example
backend/tests/test_domain_whitelist.py | Test suite covering configuration loading, domain matching, and middleware behavior
docs/admin/domain-whitelist.md | Comprehensive documentation with examples and troubleshooting guide
.env.example | Added FEATURE_DOMAIN_WHITELIST_ENABLED flag documentation

Comment on lines +99 to +106
def test_invalid_email(self, temp_config):
    """Test handling of invalid email addresses."""
    manager = DomainWhitelistManager(config_path=temp_config)

    assert manager.is_domain_allowed("notanemail") is False
    assert manager.is_domain_allowed("") is False
    assert manager.is_domain_allowed("no-at-sign.com") is False

Copilot AI Nov 26, 2025

Missing test coverage: There's no test case for when a user has an email with a subdomain that should be rejected. For example, testing user@mail.gmail.com when only sandia.gov is whitelisted, to ensure subdomain matching only works for whitelisted parent domains, not for non-whitelisted domains.

Consider adding a test case:

def test_subdomain_of_non_whitelisted_domain(self, temp_config):
    """Test that subdomains of non-whitelisted domains are rejected."""
    manager = DomainWhitelistManager(config_path=temp_config)
    
    # gmail.com is not whitelisted, so mail.gmail.com should also be rejected
    assert manager.is_domain_allowed("user@mail.gmail.com") is False

Copilot uses AI. Check for mistakes.
Comment on lines +163 to +189
def test_middleware_with_disallowed_domain(self, temp_config, create_middleware):
    """Test that disallowed domains are blocked."""
    from starlette.requests import Request
    from starlette.responses import Response

    middleware = create_middleware(temp_config)

    async def call_next(request):
        return Response("OK", status_code=200)

    async def test_request():
        scope = {
            "type": "http",
            "method": "GET",
            "path": "/api/test",
            "query_string": b"",
            "headers": [],
            "state": {},
        }
        request = Request(scope)
        request.state.user_email = "test@gmail.com"

        response = await middleware.dispatch(request, call_next)
        assert response.status_code == 403

    import asyncio
    asyncio.run(test_request())

Copilot AI Nov 26, 2025

Missing test coverage: The middleware's _unauthorized_response method returns different responses based on whether the path starts with /api/ (JSON 403) or not (redirect 302), but there's no test verifying the redirect behavior for non-API paths.

Consider adding a test case:

def test_middleware_redirect_for_ui_path(self, temp_config, create_middleware):
    """Test that non-API paths get redirected instead of 403."""
    from starlette.requests import Request
    from starlette.responses import Response, RedirectResponse
    
    middleware = create_middleware(temp_config)
    
    async def call_next(request):
        return Response("OK", status_code=200)
    
    async def test_request():
        scope = {
            "type": "http",
            "method": "GET",
            "path": "/dashboard",  # Non-API path
            "query_string": b"",
            "headers": [],
            "state": {},
        }
        request = Request(scope)
        request.state.user_email = "test@gmail.com"
        
        response = await middleware.dispatch(request, call_next)
        assert isinstance(response, RedirectResponse)
        assert response.status_code == 302
    
    import asyncio
    asyncio.run(test_request())
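
For context on what such a test exercises, the path-dependent behavior described in the comment above can be modeled without Starlette. This is a simplified sketch, not the PR's `_unauthorized_response`: the `decide` function, the tuple return shape, and the `/auth` redirect target are all hypothetical, and the auth redirect endpoints that the real middleware also bypasses are omitted because their paths are not shown in this conversation.

```python
BYPASS_PATHS = ("/api/health",)  # the PR also bypasses auth redirect endpoints (paths not shown)
LOGIN_PATH = "/auth"  # hypothetical redirect target, not taken from the PR


def decide(path, domain_allowed):
    """Return (status_code, location); location is set only for redirects."""
    if path in BYPASS_PATHS or domain_allowed:
        return (200, None)  # stands in for passing the request to the next handler
    if path.startswith("/api/"):
        return (403, None)  # API clients get a JSON 403
    return (302, LOGIN_PATH)  # browser paths get redirected
```

A table of `(path, domain_allowed)` inputs against expected tuples would cover the allow, deny, bypass, and redirect branches in one parametrized test.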

Comment on lines +77 to +79
    domains.add(domain_entry.get('domain', '').lower())
elif isinstance(domain_entry, str):
    domains.add(domain_entry.lower())

Copilot AI Nov 26, 2025

Potential bug: Empty domain strings can be added to the whitelist set. If a domain entry has an empty domain field (line 77), it will add an empty string to the domains set. This could cause unexpected behavior where emails like user@ (malformed) might incorrectly match against an empty domain.

Add validation to skip empty domains:

for domain_entry in config_data.get('domains', []):
    if isinstance(domain_entry, dict):
        domain = domain_entry.get('domain', '').lower().strip()
        if domain:  # Only add non-empty domains
            domains.add(domain)
    elif isinstance(domain_entry, str):
        domain = domain_entry.lower().strip()
        if domain:  # Only add non-empty domains
            domains.add(domain)
Suggested change
-     domains.add(domain_entry.get('domain', '').lower())
- elif isinstance(domain_entry, str):
-     domains.add(domain_entry.lower())
+     domain = domain_entry.get('domain', '').lower().strip()
+     if domain:
+         domains.add(domain)
+ elif isinstance(domain_entry, str):
+     domain = domain_entry.lower().strip()
+     if domain:
+         domains.add(domain)

Comment on lines +102 to +106
"domain": "pppl.gov",
"description": "Princeton Plasma Physics Laboratory",
"category": "National Laboratory"
},
{

Copilot AI Nov 26, 2025

Duplicate domain entry: "pppl.gov" appears twice in the domains list (lines 92-95 and 102-105). This will not cause runtime errors since the domains are stored in a set, but creates unnecessary redundancy in the configuration file and could be confusing for users editing it.

Remove one of the duplicate entries.

Suggested change
- "domain": "pppl.gov",
- "description": "Princeton Plasma Physics Laboratory",
- "category": "National Laboratory"
- },
- {

"""
app.add_middleware(SecurityHeadersMiddleware)
app.add_middleware(RateLimitMiddleware)
# Domain whitelist check (if enabled) - add before Auth so it runs after

Copilot AI Nov 26, 2025

[nitpick] The comment "add before Auth so it runs after" is potentially confusing. In FastAPI/Starlette middleware, when you call app.add_middleware(), middlewares are registered in reverse order of execution. The middleware added last runs first in the request processing chain.

In this code:

  1. SecurityHeadersMiddleware is added first → runs last (modifies response headers)
  2. RateLimitMiddleware is added second → runs third
  3. DomainWhitelistMiddleware is added third → runs second
  4. AuthMiddleware is added last → runs first

So DomainWhitelistMiddleware actually runs after AuthMiddleware in the request flow, which is correct (it needs request.state.user_email set by AuthMiddleware). However, the comment says "add before Auth so it runs after" which might be confusing since we're adding it before the AuthMiddleware registration.

Consider rewording to: "Add DomainWhitelistMiddleware before AuthMiddleware registration (executes after AuthMiddleware in request flow)" for clarity.

Suggested change
- # Domain whitelist check (if enabled) - add before Auth so it runs after
+ # Domain whitelist check (if enabled) - add before AuthMiddleware registration (executes after AuthMiddleware in request flow)
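
The ordering described in this comment can be checked with a plain-Python model of the wrapping. This is not Starlette's implementation; it only assumes what the comment states, namely that each added middleware wraps everything added before it. The `build_stack` and `_wrap` helpers are hypothetical.

```python
def _wrap(name, inner):
    """Return a handler that records its name, then delegates to the inner handler."""
    def handler(trace):
        trace.append(name)  # record the order in which the request passes through
        return inner(trace)
    return handler


def build_stack(added, endpoint):
    """Wrap `endpoint` so that the middleware added last ends up outermost."""
    handler = endpoint
    for name in added:  # the first one added wraps the endpoint most closely
        handler = _wrap(name, handler)
    return handler


added = [
    "SecurityHeadersMiddleware",
    "RateLimitMiddleware",
    "DomainWhitelistMiddleware",
    "AuthMiddleware",
]
trace = []
build_stack(added, lambda t: t)(trace)
```

After the call, `trace` lists the middleware in request order, with AuthMiddleware first and DomainWhitelistMiddleware second, which is why the whitelist check can rely on request.state.user_email.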

# Check if domain is allowed
if not self.whitelist_manager.is_domain_allowed(email):
    domain = email.split("@", 1)[1].lower()
    logger.warning(f"Domain whitelist check failed: unauthorized domain {domain}")

Copilot AI Nov 26, 2025

[nitpick] Potential security information disclosure: The middleware logs the rejected domain at WARNING level when a domain is not whitelisted. This could expose information about which domains are attempting to access the system, which might be sensitive in some security contexts.

Consider either:

  1. Lowering this to DEBUG level to reduce exposure in production logs
  2. Using a more generic log message without the specific domain
  3. Ensuring logs are properly secured if this information is intentionally captured for security monitoring

Example:

logger.debug(f"Domain whitelist check failed: unauthorized domain {domain}")
# or
logger.warning("Domain whitelist check failed: unauthorized domain")
Suggested change
- logger.warning(f"Domain whitelist check failed: unauthorized domain {domain}")
+ logger.warning("Domain whitelist check failed: unauthorized domain")
