-
Notifications
You must be signed in to change notification settings - Fork 5
Add generic email domain whitelist middleware with configuration file #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
|
@copilot make it more generic with the doe as an example. so an ability to white list some domains. set the config file in the config folder similar to the existing config files. |
Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
Made the middleware generic with config file support. Changes in commits 2092ede and 97bca0f: Generic Implementation:
Config Files:
Usage:
See |
Co-authored-by: garland3 <1162675+garland3@users.noreply.github.com>
| manager = DomainWhitelistManager(config_path=temp_config) | ||
|
|
||
| assert manager.is_enabled() is True | ||
| assert "sandia.gov" in manager.get_domains() |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High test
sandia.gov
Copilot Autofix
AI 1 day ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
|
|
||
| assert manager.is_enabled() is True | ||
| assert "sandia.gov" in manager.get_domains() | ||
| assert "doe.gov" in manager.get_domains() |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High test
doe.gov
Copilot Autofix
AI 1 day ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| assert manager.is_enabled() is True | ||
| assert "sandia.gov" in manager.get_domains() | ||
| assert "doe.gov" in manager.get_domains() | ||
| assert "example.org" in manager.get_domains() |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High test
example.org
Copilot Autofix
AI 1 day ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
🔒 Security Scan ResultsSecurity Scan SummaryScan ResultsPython SAST (Bandit)Recommendations
|
1 similar comment
🔒 Security Scan ResultsSecurity Scan SummaryScan ResultsPython SAST (Bandit)Recommendations
|
- Remove unused Request and TestClient imports from test_domain_whitelist.py - Remove unused List import from domain_whitelist.py - Addresses CodeQL static analysis warnings - All tests passing
🔒 Security Scan ResultsSecurity Scan SummaryScan ResultsPython SAST (Bandit)Recommendations
|
1 similar comment
🔒 Security Scan ResultsSecurity Scan SummaryScan ResultsPython SAST (Bandit)Recommendations
|
confirmed testing that this works with a test config file and the feature turned on. the code ql checks are false positive. @ktpedre This works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a generic email domain whitelist middleware to restrict application access based on user email domains. The feature is controlled by a configuration file (domain-whitelist.json) and a feature flag (FEATURE_DOMAIN_WHITELIST_ENABLED), making it flexible for different deployment scenarios (government labs, corporate, educational institutions). The implementation includes optimized subdomain matching (O(log n) instead of O(n)), comprehensive test coverage, and detailed documentation.
Key changes:
- Generic, configurable domain whitelist middleware (not DOE-specific despite example config)
- Layered configuration system with override support
- Backward compatibility via
FEATURE_DOE_LAB_CHECK_ENABLEDalias - Proper middleware ordering to leverage
request.state.user_emailfrom AuthMiddleware
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
backend/core/domain_whitelist_middleware.py |
New middleware enforcing email domain restrictions with endpoint-specific responses (JSON 403 for API, redirect for UI) |
backend/core/domain_whitelist.py |
Configuration manager with optimized subdomain matching and layered config file search |
backend/modules/config/config_manager.py |
Added feature_domain_whitelist_enabled field with backward-compatible alias |
backend/main.py |
Registered middleware conditionally based on feature flag |
config/defaults/domain-whitelist.json |
Default configuration with 22 DOE/NNSA/national lab domains as example |
backend/tests/test_domain_whitelist.py |
Test suite covering configuration loading, domain matching, and middleware behavior |
docs/admin/domain-whitelist.md |
Comprehensive documentation with examples and troubleshooting guide |
.env.example |
Added FEATURE_DOMAIN_WHITELIST_ENABLED flag documentation |
| def test_invalid_email(self, temp_config): | ||
| """Test handling of invalid email addresses.""" | ||
| manager = DomainWhitelistManager(config_path=temp_config) | ||
|
|
||
| assert manager.is_domain_allowed("notanemail") is False | ||
| assert manager.is_domain_allowed("") is False | ||
| assert manager.is_domain_allowed("no-at-sign.com") is False | ||
|
|
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage: There's no test case for when a user has an email with a subdomain that should be rejected. For example, testing user@mail.gmail.com when only sandia.gov is whitelisted, to ensure subdomain matching only works for whitelisted parent domains, not for non-whitelisted domains.
Consider adding a test case:
def test_subdomain_of_non_whitelisted_domain(self, temp_config):
"""Test that subdomains of non-whitelisted domains are rejected."""
manager = DomainWhitelistManager(config_path=temp_config)
# gmail.com is not whitelisted, so mail.gmail.com should also be rejected
assert manager.is_domain_allowed("user@mail.gmail.com") is False| def test_middleware_with_disallowed_domain(self, temp_config, create_middleware): | ||
| """Test that disallowed domains are blocked.""" | ||
| from starlette.requests import Request | ||
| from starlette.responses import Response | ||
|
|
||
| middleware = create_middleware(temp_config) | ||
|
|
||
| async def call_next(request): | ||
| return Response("OK", status_code=200) | ||
|
|
||
| async def test_request(): | ||
| scope = { | ||
| "type": "http", | ||
| "method": "GET", | ||
| "path": "/api/test", | ||
| "query_string": b"", | ||
| "headers": [], | ||
| "state": {}, | ||
| } | ||
| request = Request(scope) | ||
| request.state.user_email = "test@gmail.com" | ||
|
|
||
| response = await middleware.dispatch(request, call_next) | ||
| assert response.status_code == 403 | ||
|
|
||
| import asyncio | ||
| asyncio.run(test_request()) |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test coverage: The middleware's _unauthorized_response method returns different responses based on whether the path starts with /api/ (JSON 403) or not (redirect 302), but there's no test verifying the redirect behavior for non-API paths.
Consider adding a test case:
def test_middleware_redirect_for_ui_path(self, temp_config, create_middleware):
"""Test that non-API paths get redirected instead of 403."""
from starlette.requests import Request
from starlette.responses import Response, RedirectResponse
middleware = create_middleware(temp_config)
async def call_next(request):
return Response("OK", status_code=200)
async def test_request():
scope = {
"type": "http",
"method": "GET",
"path": "/dashboard", # Non-API path
"query_string": b"",
"headers": [],
"state": {},
}
request = Request(scope)
request.state.user_email = "test@gmail.com"
response = await middleware.dispatch(request, call_next)
assert isinstance(response, RedirectResponse)
assert response.status_code == 302
import asyncio
asyncio.run(test_request())| domains.add(domain_entry.get('domain', '').lower()) | ||
| elif isinstance(domain_entry, str): | ||
| domains.add(domain_entry.lower()) |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bug: Empty domain strings can be added to the whitelist set. If a domain entry has an empty domain field (line 77), it will add an empty string to the domains set. This could cause unexpected behavior where emails like user@ (malformed) might incorrectly match against an empty domain.
Add validation to skip empty domains:
for domain_entry in config_data.get('domains', []):
if isinstance(domain_entry, dict):
domain = domain_entry.get('domain', '').lower().strip()
if domain: # Only add non-empty domains
domains.add(domain)
elif isinstance(domain_entry, str):
domain = domain_entry.lower().strip()
if domain: # Only add non-empty domains
domains.add(domain)| domains.add(domain_entry.get('domain', '').lower()) | |
| elif isinstance(domain_entry, str): | |
| domains.add(domain_entry.lower()) | |
| domain = domain_entry.get('domain', '').lower().strip() | |
| if domain: | |
| domains.add(domain) | |
| elif isinstance(domain_entry, str): | |
| domain = domain_entry.lower().strip() | |
| if domain: | |
| domains.add(domain) |
| "domain": "pppl.gov", | ||
| "description": "Princeton Plasma Physics Laboratory", | ||
| "category": "National Laboratory" | ||
| }, | ||
| { |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate domain entry: "pppl.gov" appears twice in the domains list (lines 92-95 and 102-105). This will not cause runtime errors since the domains are stored in a set, but creates unnecessary redundancy in the configuration file and could be confusing for users editing it.
Remove one of the duplicate entries.
| "domain": "pppl.gov", | |
| "description": "Princeton Plasma Physics Laboratory", | |
| "category": "National Laboratory" | |
| }, | |
| { |
| """ | ||
| app.add_middleware(SecurityHeadersMiddleware) | ||
| app.add_middleware(RateLimitMiddleware) | ||
| # Domain whitelist check (if enabled) - add before Auth so it runs after |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The comment "add before Auth so it runs after" is potentially confusing. In FastAPI/Starlette middleware, when you call app.add_middleware(), middlewares are registered in reverse order of execution. The middleware added last runs first in the request processing chain.
In this code:
- SecurityHeadersMiddleware is added first → runs last (modifies response headers)
- RateLimitMiddleware is added second → runs third
- DomainWhitelistMiddleware is added third → runs second
- AuthMiddleware is added last → runs first
So DomainWhitelistMiddleware actually runs after AuthMiddleware in the request flow, which is correct (it needs request.state.user_email set by AuthMiddleware). However, the comment says "add before Auth so it runs after" which might be confusing since we're adding it before the AuthMiddleware registration.
Consider rewording to: "Add DomainWhitelistMiddleware before AuthMiddleware registration (executes after AuthMiddleware in request flow)" for clarity.
| # Domain whitelist check (if enabled) - add before Auth so it runs after | |
| # Domain whitelist check (if enabled) - add before AuthMiddleware registration (executes after AuthMiddleware in request flow) |
| # Check if domain is allowed | ||
| if not self.whitelist_manager.is_domain_allowed(email): | ||
| domain = email.split("@", 1)[1].lower() | ||
| logger.warning(f"Domain whitelist check failed: unauthorized domain {domain}") |
Copilot
AI
Nov 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Potential security information disclosure: The middleware logs the rejected domain at WARNING level when a domain is not whitelisted. This could expose information about which domains are attempting to access the system, which might be sensitive in some security contexts.
Consider either:
- Lowering this to DEBUG level to reduce exposure in production logs
- Using a more generic log message without the specific domain
- Ensuring logs are properly secured if this information is intentionally captured for security monitoring
Example:
logger.debug(f"Domain whitelist check failed: unauthorized domain {domain}")
# or
logger.warning("Domain whitelist check failed: unauthorized domain")| logger.warning(f"Domain whitelist check failed: unauthorized domain {domain}") | |
| logger.warning("Domain whitelist check failed: unauthorized domain") |

Implements a generic middleware to restrict access based on email domain whitelist, controlled by configuration file and
FEATURE_DOMAIN_WHITELIST_ENABLEDflag. DOE/NNSA/national lab domains provided as default example.Implementation
Middleware:
backend/core/domain_whitelist_middleware.pyuser@mail.sandia.gov)/api/healthand auth redirect endpointsConfiguration Manager:
backend/core/domain_whitelist.pyconfig/defaults/domain-whitelist.jsonorconfig/overrides/domain-whitelist.jsonenabledfield + environment variableConfiguration Files:
config/defaults/domain-whitelist.json- Default config with DOE labs as example (22 domains)config/overrides/domain-whitelist.json- User customization (takes precedence)config/defaults/domain-whitelist-example.json- Template for custom configsconfig/defaults/DOMAIN_WHITELIST_README.md- Complete documentationFeature flag:
FEATURE_DOMAIN_WHITELIST_ENABLEDinAppSettingsfalse.env.exampleFEATURE_DOE_LAB_CHECK_ENABLEDstill worksIntegration: Added to middleware stack in
main.pyAuthMiddlewarein registration order (runs after in request flow)Usage
Enable with default DOE configuration
Custom domain configuration
Create
config/overrides/domain-whitelist.json:{ "version": "1.0", "enabled": true, "domains": [ {"domain": "mycompany.com", "description": "My Company"}, {"domain": "partner.org", "description": "Partner Organization"} ], "subdomain_matching": true }Configuration Format
{ "version": "1.0", "description": "Domain whitelist configuration", "enabled": true, "domains": [ { "domain": "example.com", "description": "Example Corporation", "category": "Enterprise" } ], "subdomain_matching": true }Test Coverage
8 tests covering:
Performance
mail.dept.sandia.govchecks 2 parent levels instead of iterating 22 domainsDocumentation
Complete documentation available in
config/defaults/DOMAIN_WHITELIST_README.md:Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.