fix/feat: use joins instead of n+1/opening multiple connections #1297
internal/storage/sql/common/flag.go
Outdated
// append flag to output results if we haven't seen it yet
if _, ok := uniqueFlags[f.Key]; !ok {
    flags = append(flags, f)
I believe you could actually do all of this without the map, provided we ensure the returned rows are ordered by flag. This might already be the natural order of the join, but we could guarantee it in the ORDER BY clause, e.g. ORDER BY f.created_at, f.key DESC.
Once we have that invariant, we can simply do the following steps:
- Compare the newly scanned flag key to the last seen flag key in the results so far.
- If the flag keys differ, append the newly scanned flag to the results.
- Append the currently scanned variant to the last flag in the results so far.
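The map-free steps could look roughly like the sketch below. It is only an illustration: `Flag`, `Variant`, `joinedRow` and `groupOrdered` are hypothetical stand-ins rather than the types in internal/storage/sql/common/flag.go, and an empty `VariantKey` is assumed to represent a flag with no variants (a NULL from a LEFT JOIN).

```go
package main

import "fmt"

// Variant and Flag are simplified, hypothetical stand-ins for the storage types.
type Variant struct{ Key string }

type Flag struct {
	Key      string
	Variants []Variant
}

// joinedRow is one already-scanned row of a flags/variants join.
// An empty VariantKey is assumed to mean "this flag has no variants".
type joinedRow struct {
	FlagKey    string
	VariantKey string
}

// groupOrdered folds joined rows into flags without a map. It relies on the
// invariant that all rows belonging to one flag are adjacent, which the
// ORDER BY clause is meant to guarantee.
func groupOrdered(rows []joinedRow) []*Flag {
	var flags []*Flag
	for _, r := range rows {
		// Compare with the last flag seen; append a new flag when the key changes.
		if len(flags) == 0 || flags[len(flags)-1].Key != r.FlagKey {
			flags = append(flags, &Flag{Key: r.FlagKey})
		}
		// Attach the scanned variant (if any) to the last flag in the results.
		if r.VariantKey != "" {
			last := flags[len(flags)-1]
			last.Variants = append(last.Variants, Variant{Key: r.VariantKey})
		}
	}
	return flags
}

func main() {
	rows := []joinedRow{
		{"flag-a", "v1"},
		{"flag-a", "v2"},
		{"flag-b", ""},
		{"flag-c", "v3"},
	}
	for _, f := range groupOrdered(rows) {
		fmt.Println(f.Key, len(f.Variants))
	}
}
```

The trade-off is that correctness now depends on row ordering: if rows for the same flag are ever interleaved, the flag would appear twice in the results.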
There is another approach to consider. We're doing the classic Rails usually lets you pick between JOIN ( The An ORM would likely just hide the implementation detail of this, which might be favourable, to be fair.
Codecov Report
@@ Coverage Diff @@
## main #1297 +/- ##
==========================================
+ Coverage 80.09% 80.37% +0.28%
==========================================
Files 43 43
Lines 3300 3307 +7
==========================================
+ Hits 2643 2658 +15
+ Misses 527 518 -9
- Partials 130 131 +1
7e34f9b to d8e27e2
Update: we chose to go with
I continued the approach to the ListSegments and ListRules storage methods.
We are also now able to compare benchmarks added in #1307 (now on main) to those from this PR to see how using JOINs instead of multiple queries affects performance/memory allocations. Here are the results:
Main Branch (no JOINs)
This Branch (JOINs)
If I'm reading it correctly, it looks like this approach results in anywhere from a 2-10x speed improvement AND uses anywhere from half to a tenth of the memory allocations!!
It also fixes the SQLite contention issue, as we are now setting the
Seems like a win/win..win.
Note: we also opted to continue using the map approach. We could potentially lessen the number of allocations by going with the approach laid out by @GeorgeMac here: #1297 (comment) in the future. Now that we have these benchmarks in place, it should be easy to see if we can eke out any more performance here. We will also create an internal issue to look into SQLx or alternative approaches in the future.
This is awesome. There is one suggested quality of life change to config.
Co-authored-by: George <me@georgemac.com>
I moved the log warning to
Overview
This is kind of a doozy so bear with me. 🐻
This came about while I was looking at our StackHawk scans in reference to FLI-177.
Basically, StackHawk issues a bunch of concurrent queries to probe our app for security concerns, SQL injections, etc.
When I was running this locally (using the SQLite DB for Flipt), I kept running into these 500 errors:
ERROR finished unary call with code Internal {"server": "grpc", "grpc.start_time": "2023-01-27T15:40:34-05:00", "system": "grpc", "span.kind": "server", "grpc.service": "flipt.Flipt", "grpc.method": "GetRule", "peer.address": "127.0.0.1:63802", "error": "rpc error: code = Internal desc = database is locked", "grpc.code": "Internal", "grpc.time_ms": 5191.929}
The "database is locked" error is the relevant bit.
A trip to the DB
After looking into this further and doing some googling, I realized that this is mainly a problem with SQLite and opening multiple connections simultaneously. SQLite is really just a file, so it makes sense that only 1 connection should hold the lock on the file while doing its querying.
This also led me to mattn/go-sqlite3#274, where they basically say the same thing, and recommend:
1. Calling rows.Close() to return the connection back to the pool
2. Calling db.SetMaxOpenConns(1) to enforce this constraint of 1 connection
You can see the deadlock/timeout happening now in the unit tests since I followed this advice to set max open connections to 1.
Going through our storage code
This led me to look at our storage (SQL) layer to ensure we were doing (1), which we are. But then I realized there are several cases where we issue multiple queries at once ourselves, mainly when we populate a one-to-many relationship such as flag HAS MANY variants, like here and here (GetFlag and ListFlags respectively).
Here and in several other places, we issue 1 query to get the parent (flag), then call this method (variants) to issue a query to get all the children (variants) for that flag.
The key part is that we don't actually close the flag Row before issuing the second query to get all the variants, so we are basically creating a deadlock in the case of SQLite.
While we could work around this for GetFlag by first getting the flag row, then closing the row, then issuing the query to get all variants, this approach won't work as easily for ListFlags, as we are required to loop through each row to hydrate the flags.
Proposal(s)
1. Joins The Old Fashioned Way
The correct way to do this kind of thing in SQL is to just use JOINs and get all the flags and their variants at once. This has the benefit of not locking the db in our case and also results in fewer queries overall.
This is what I have prototyped in this PR, starting with ListFlags because it is the most 'difficult'.
Unfortunately, Go does not make it easy to work with JOINs using the stdlib or even using masterminds/squirrel, which is the SQL builder library we are using here. This is why I had to add the map in this PR to check whether we have 'seen' a flag before adding it to the results.
Proposal 1 is basically to continue this pattern throughout the entire storage layer for all parent->child relationships, so that we can do things the 'right way' and stop creating unnecessary queries and potential performance degradations/errors in highly concurrent scenarios.
2. Introduce a Data Model layer + use SQLx or SQLC
This option is basically introducing data models that map to our database instead of passing the proto models all the way down + introducing a library to help with this parent <-> child relationship, removing the need to do this mapping ourselves.
SQLx
SQLx supports parent-child relationships in their StructScan helper; it would just require us to define each of these data structs and add the appropriate db: struct tags to each field.
The benefits of this approach seem to me to be:
- StructScan instead of scanning the row results into the data types ourselves
SQLC
SQLC is similar to the approach that SQLx would result in, however, the main difference is that it generates the data models and the query/mutation methods themselves! Ex: https://docs.sqlc.dev/en/latest/howto/select.html
The benefits of this approach seem to me to be:
The downside of SQLC however is that it does not have the same database compatibility that the Go SQL package/SQLx has.
Currently, it only supports:
While these are the only DBs we need now, it could potentially limit us if we wanted to support another SQL db down the road (although I'm not sure if we'll ever want to actually do this).
Tl;dr
- JOINs instead of multiple queries to populate parent-child data