Batch User Model Loading Queries. 32% speedup #2361
base: master
Pull Request Test Coverage Report for Build 21566888187 (Coveralls)
'web_authn_credential', f.web_authn_credential
)) from ` + Factor{}.TableName() + ` f where f.user_id = u.id), '[]') as factors_json
from ` + User{}.TableName() + ` u
where ` + query
🟠 Severity: HIGH
SQL Injection Risk via String Concatenation: The query parameter is concatenated directly into the SQL string before being passed to RawQuery. While current callers use hardcoded strings, this pattern is dangerous because it breaks the parameterization contract. The old code used tx.Eager().Q().Where(query, args...) which properly handled parameterization through the ORM. Future developers might pass dynamically constructed queries, creating SQL injection vectors.
💡 Fix Suggestion
Suggestion: This SQL injection risk requires an architectural change to the findUser function. Consider one of these approaches:
- Use parameterized WHERE clause builder: Instead of accepting a raw query string, accept structured parameters (e.g., field names and operators) and build the WHERE clause programmatically with proper escaping.
- Implement query validation/whitelisting: Add a whitelist of allowed query patterns at the start of findUser to ensure only safe, hardcoded queries are accepted. Reject any query that doesn't match the whitelist.
- Revert to ORM-based approach: Consider reverting to the original tx.Eager().Q().Where(query, args...) pattern which properly handles parameterization through the ORM, accepting the performance trade-off for better security.
- Create specific finder methods: Instead of a generic findUser helper, create specific methods (findUserByEmail, findUserByPhone, etc.) that construct their own safe SQL queries, eliminating the need to pass query strings as parameters.
The current implementation breaks the parameterization contract by concatenating a caller-supplied string directly into SQL. Current callers pass hardcoded strings, but the pattern is dangerous because future developers might misuse it.
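To make the whitelisting option concrete, here is a minimal sketch in Go. It is not the PR's code: the names allowedUserFilters, errUnsafeFilter and buildFindUserSQL, the literal table name users, and the listed filter fragments are all hypothetical. The idea is only that the WHERE fragment must match a known constant, while values continue to travel as bind arguments.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedUserFilters is a hypothetical allow-list of constant WHERE fragments.
// Caller-supplied values are never part of these strings; they are passed
// separately as bind arguments for the "?" placeholders.
var allowedUserFilters = map[string]bool{
	"u.id = ?":                       true,
	"u.instance_id = ? and u.id = ?": true,
}

// errUnsafeFilter is returned for any fragment that is not on the allow-list.
var errUnsafeFilter = errors.New("user filter is not on the allow-list")

// buildFindUserSQL concatenates the filter into the statement only after
// confirming it is one of the known constant fragments.
// The table name "users" is an assumption made for this sketch.
func buildFindUserSQL(filter string) (string, error) {
	if !allowedUserFilters[filter] {
		return "", errUnsafeFilter
	}
	return "select u.* from users u where " + filter + " limit 1", nil
}

func main() {
	sql, err := buildFindUserSQL("u.id = ?")
	fmt.Println(sql, err)

	// A fragment with an inlined value is rejected before it reaches the database.
	_, err = buildFindUserSQL("u.id = '1' or 1=1")
	fmt.Println(err)
}
```

A check like this keeps the function's interface unchanged, which may matter if the signature should not change in this PR. Specific finder methods would remove the string parameter entirely.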
Yeah, I don't love passing raw SQL around, but that seems to be the pattern. Any suggestions would be great. This could easily be a pair of UUIDs instead, but I don't want to change the function's interface in this PR.
'id', f.id,
'user_id', f.user_id,
'created_at', f.created_at,
'updated_at', f.updated_at,
'status', f.status,
'friendly_name', f.friendly_name,
'factor_type', f.factor_type,
'secret', f.secret,
🔴 Severity: CRITICAL
MFA Secret Exposure - The TOTP secret field is included in JSON aggregation, bypassing the json:"-" protection in the Factor struct. This exposes the shared secret used to generate TOTP codes. The original Eager loading respected this tag and excluded secrets. Remove this line and web_authn_credential (line 654) to prevent exposure in API responses, logs, or caches.
💡 Fix Suggestion
Suggestion: Remove the 'secret' field (line 651) and 'web_authn_credential' field (line 654) from the JSON aggregation query. These fields are marked with json:"-" tags in the Factor struct to prevent exposure in API responses. Including them in the SQL JSON aggregation bypasses this protection and exposes sensitive MFA secrets that could be logged, cached, or transmitted in API responses. The corrected query should only include non-sensitive Factor fields that are safe to serialize.
⚠️ Experimental Feature: This code suggestion is automatically generated. Please review carefully.
Original:
'id', f.id,
'user_id', f.user_id,
'created_at', f.created_at,
'updated_at', f.updated_at,
'status', f.status,
'friendly_name', f.friendly_name,
'factor_type', f.factor_type,
'secret', f.secret,
Suggested:
'id', f.id,
'user_id', f.user_id,
'created_at', f.created_at,
'updated_at', f.updated_at,
'status', f.status,
'friendly_name', f.friendly_name,
'factor_type', f.factor_type,
'phone', f.phone,
'last_challenged_at', f.last_challenged_at
)) from ` + Factor{}.TableName() + ` f where f.user_id = u.id), '[]') as factors_json
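For context on the json:"-" tag this comment refers to, the sketch below shows how Go's encoding/json treats such a field. The factor type here is a simplified, hypothetical stand-in, not the project's Factor struct, and how the PR actually decodes factors_json is not shown on this page. The tag affects only encoding/json. The text returned by the SQL query contains whatever columns the query selects, regardless of struct tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// factor is a simplified, hypothetical stand-in for the Factor model.
// Only the tag behaviour matters here, not the field set.
type factor struct {
	ID     string `json:"id"`
	Secret string `json:"-"` // skipped by encoding/json in both directions
}

// encodeFactor marshals a factor; the Secret field never appears in the output.
func encodeFactor(f factor) string {
	b, err := json.Marshal(f)
	if err != nil {
		return ""
	}
	return string(b)
}

// decodeFactor unmarshals JSON into a factor; a "secret" key in the input
// is ignored because the field is tagged json:"-".
func decodeFactor(raw string) (factor, error) {
	var f factor
	err := json.Unmarshal([]byte(raw), &f)
	return f, err
}

func main() {
	fmt.Println(encodeFactor(factor{ID: "f1", Secret: "totp-seed"}))

	f, err := decodeFactor(`{"id":"f1","secret":"totp-seed"}`)
	fmt.Printf("%q %v\n", f.Secret, err)
}
```

Under these assumptions, a struct tagged this way would neither emit the secret nor pick it up from aggregated JSON, so whether the column needs to be in the aggregation depends on how the result is decoded and where the raw row text might be logged or cached.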
This was already being loaded when we used *. It's not a change in behavior. I'm not clear on the implications of removing it, but feel free.
a517e59 to 1e078e4
…gregation

Reduces FindUserWithRefreshToken from 3 separate queries (user + identities + factors) to a single query using json_agg subqueries. This optimization impacts the /token endpoint (~45% of total traffic) and /user endpoint by eliminating 2 database round-trips per call. Since both of those endpoints call FindUserWithRefreshToken twice, it removes 4 database round-trips per request.

Performance impact:
- Query execution: 399µs → 209µs (47.6% faster)
- Memory allocations: 18.4KB → 7.1KB (61% reduction)
- Allocation count: 299 → 117 allocs (61% reduction)
- /token throughput: +20.3% (55.18 vs 45.87 req/s) in local testing
- /token latency: -16.8% (181ms vs 218ms)

Replaces Pop ORM .Eager() pattern with explicit SQL column enumeration and coalesce(json_agg()) for related entities. No changes to User struct or API.
1e078e4 to b407d87
What kind of change does this PR introduce?
Batches User model queries, reducing them from 3 to 1.
No worries if you don't want to merge this, I was just playing around to see how hard it'd be to make the endpoints less chatty in case we want to separate Auth from the DB instance at some point.
Context
A bunch of frequently hit endpoints call findUser, which eagerly loads the user along with its relations (identities and factors), issuing 3 queries.
Authenticated endpoints run those 3 queries twice: once in non-transactional pre-flight checks, and again transactionally, for a total of 6 round trips.
This PR combines the 3 queries into 1 by returning JSON for identities and factors. Here are the results:
Before (Eager loading - 3 queries)
After (JSON aggregation - 1 query)
I did not include the benchmarking script for the users model in this PR. If that's a best practice I can add it.
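For readers unfamiliar with the JSON aggregation approach described under Context, the sketch below shows roughly what the application side of such a query looks like: one row carrying the user's scalar columns plus JSON text columns, which are then decoded into slices. It uses no database. The userRow, user, identity and factorRow types, the decodeUser helper, and the provider field are hypothetical and simplified, not the PR's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// identity and factorRow are simplified, hypothetical relation types.
type identity struct {
	ID       string `json:"id"`
	Provider string `json:"provider"`
}

type factorRow struct {
	ID         string `json:"id"`
	FactorType string `json:"factor_type"`
}

// userRow mimics a single row of a combined query: scalar user columns plus
// JSON text produced by coalesce(json_agg(...), '[]') subqueries.
type userRow struct {
	ID             string
	IdentitiesJSON string
	FactorsJSON    string
}

// user is the assembled model with its relations attached.
type user struct {
	ID         string
	Identities []identity
	Factors    []factorRow
}

// decodeUser turns one combined row into a user with its relations.
func decodeUser(row userRow) (user, error) {
	u := user{ID: row.ID}
	if err := json.Unmarshal([]byte(row.IdentitiesJSON), &u.Identities); err != nil {
		return user{}, fmt.Errorf("decoding identities: %w", err)
	}
	if err := json.Unmarshal([]byte(row.FactorsJSON), &u.Factors); err != nil {
		return user{}, fmt.Errorf("decoding factors: %w", err)
	}
	return u, nil
}

func main() {
	// A user with one identity and no factors: the coalesce fallback means
	// the factors column is "[]" rather than NULL.
	row := userRow{
		ID:             "u1",
		IdentitiesJSON: `[{"id":"i1","provider":"email"}]`,
		FactorsJSON:    `[]`,
	}
	u, err := decodeUser(row)
	fmt.Println(len(u.Identities), len(u.Factors), err)
}
```

The coalesce to '[]' is what lets the decoding step treat "no related rows" the same as any other array, without a separate NULL check.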
Performance Difference
Query Plan
The biggest danger of updating the user loading query is if the query plan drops off index at high scale. To test that, I created 1M users, 1M identities, and 250k MFA factors and checked the query plan with explain analyze.
The original pattern of 3 queries had
and the updated query was
Which shows that all joins and conditions are on-index at scale.
Total time spent in the database is reduced by 5%, but the real benefit is reducing the number of round trips and contention for the connection pool.
IMPORTANT
I have never contributed to Auth before so please double check everything