-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hashed settings to cache keys #59
Conversation
pkg/sql/datasource/datasource.go
Outdated
@@ -46,13 +46,13 @@ func (s *AWSDatasource) createDB(id int64, args sqlds.Options, settings models.S | |||
return db, nil | |||
} | |||
|
|||
func (d *AWSDatasource) storeAPI(id int64, args sqlds.Options, dsAPI api.AWSAPI) { | |||
key := connectionKey(id, args) | |||
func (d *AWSDatasource) storeAPI(id int64, args sqlds.Options, settings models.Settings, dsAPI api.AWSAPI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need the changes in this package. If the settings
change, the session will change so this will be a new connection. But I may be wrong, it's worth to double check it, you can do this changing the access key and trying to reload databases, for example. The second connection, using the wrong key, should fail (and not reuse the previous session connection).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @andresmgot I'm not sure I follow what you mean when you say you don't think I need the changes in this package? Do you mean you don't think the changes are necessary in datasource.go but they are necessary in sessions.go ?
I believe we need the changes here in addition to session, because when we first create an api instance we load it with a particular session and store it. While we no longer use that session directly, the stored api instance holds onto the out of date session. When I remove my changes from datasource.go I see the bug reappear.
Let me know if I'm misunderstanding tho!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean you don't think the changes are necessary in datasource.go but they are necessary in sessions.go ?
yes, that was what I meant.
the stored api instance holds onto the out of date session.
That is what I don't fully understand why 🤔 What I thought should be happening, whenever you click "Save" in the data source configuration is:
- Call
sqlds
NewDatasource
function - That calls
Connect
in the data source code. - That calls
GetDB
from this package - That calls
createAPI
which finally calls theNew
method of the data source API package. Which creates a AWS session and stores the result, overwritting whatever API instance was there already.
If that is not happening, I may have gotten that flow wrong but it feels weird that this package cares about AWS settings, since that should have been solved in a lower level. Can you maybe debug what's happening here? This code is a bit complicated but it's important to understand it well since it can cause many errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm a really good question, that I'm not totally sure I know the answer just yet but I did try looking yesterday. I'm going to keep digging today if I can since it feels like a worthy thing to understand.
Here's what I've found so far, just in case it's sort of helpful to read my debugging process
- I can tell you that without this code the error still persists. But to your point maybe that means we aren't overwriting the original value properly or something, and there's a better way to solve this problem.
- I can tell you that we do call Connect/GetDb/createAPI in an effort to overwrite the api with the right settings, and that we do this before calling getAPI, so it's not an issue of a race condition I don't think.
- I can tell you that we use this sync map store method: https://cs.opensource.google/go/go/+/refs/tags/go1.18.3:src/sync/map.go;l=137 to store the new api
- I find this line in particular very interesting in that source code:
// tryStore stores a value if the entry has not been expunged.
//
// If the entry is expunged, tryStore returns false and leaves the entry
// unchanged.
I don't really know what makes something expunged yet, but I do sort of wonder if its possible that we are calling store but not actually storing anything and leaving the original entry somehow? This feels like a wild theory but I suppose it's possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"expunge" is a new word for me but it seems that only applies when you remove stuff from the map (which we don't) but in the debugger you should be able to see if the entry changes in memory. It's more likely I did something wrong there or it's not calling what we think it's calling.
If you don't spot the issue feel free to shoot me a meeting for tomorrow and we can take a look at this together, I am curious what's going on 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks andrés, I think I'm ok, I'm just writing out thoughts for myself at this point sorry lol, but if you have opinions or corrections feel free to chime in.
Learned a few more things today, logging it so I don't forget for tomorrow.
- definitely was on a totally wrong track with that expunge stuff lol 😅
- our cacheKey was hashing a settings map that had hidden fields, so we weren't hashing secret key for example. I have fixed that in a local pr.
- but interesting this still doesn't work without the hashed connectionkey (when it totally should), this is because when we store the api when we update the datasource, we store it with a connectionKey that follows this format:
id-map-of-args-that-could-affect-the-api-like-region
and when we update the datasource and call Connect we do not send along a region as a query arg (which is fine and fair imo), and so when we create the api we store that with a connectionKey of
id-emptymap[]
and then when we query for catalogs right after we try to find the api we just saved but we make this call with a region we search with the default region so we use a connection key like this:
id-map-with-region
which now contains out of date information
So I think the thing we need to do is either update Connect (in athena and other datasources) to somehow get the default region and use that to pass along to getDB, or maybe alter create/storeAPI (in aws-sdk) to ensure that if we're trying to store with an empty region we use the default.
Going to give that a try tomorrow. Sorry I'm very slow at debugging this but it feels worth struggling through hahaha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, that makes more sense! It works when you include the settings in the key because getApi was failing to get one instance from the cache and created a new instance. The default
thing has given me soo many headaches since the beginning 😓
So I think the thing we need to do is either update Connect (in athena and other datasources) to somehow get the default region and use that to pass along to getDB, or maybe alter create/storeAPI (in aws-sdk) to ensure that if we're trying to store with an empty region we use the default.
mmh, it seems that there are two issues here. In the Connect
method we are using args = nil
so, as you say, the map is empty but for all the resources, we send as "args" elements that may not be relevant to the connection. For example, for retrieving Columns we are using args = {"region": "default", "catalog": "default", "database": "default", "table": "default"}
, which are all considered "connection args" since we don't make a distinction in the backend. This is probably causing that we have many copies of the same connection.
Since for Athena, the only argument that really affects the connection is the region, I would modify the Connect method to fill the args
with {"region": "default"}
if it's empty and in the rest of methods (catalogs, databases, tables...) strip the args to only setting the region.
In the case of Redshift, we are not even allowing to change the region so I would just use nil
(or empty) when getting the API instances in all the methods.
BTW, note that we are literally using default
as the value (and that will be resolved afterwards), you don't need to worry about what's actually the default value. It's done here.
In any case, those changes are for the data sources code, not something that you need to change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andresmgot good to know that this is probably not an issue with say redshift and is just isolated to athena. I won't put the fix in this repo then! I did try your method of passing region: default
in Connect and got back this error from our sdk. We could try to support it here, but as you say it sounds like this is for now isolated to athena?
So I kept the fix in Connect, but manually grabbed the default region from the config (which hasn't been set yet) and involved unmarshalling twice which is odd, but I suppose ok? You can see it here grafana/athena-datasource#168
Let me know if this seems like a good path forward to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replied in your PR 👍
that this is probably not an issue with say redshift
Sorry no, this is probably an issue in Redshift too. My only point was that in Redshift we don't allow to change the region from the QueryEditor so we don't need to use different args
to keep track of different API connections. Right now, it is wrongly using other parameters (like the db name) to store different copies of the same (or the wrong) thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh ok, I think that does makes sense, thanks for explaining! Let me know if you think this seems good to go now!
Yes, this should be fine. For example, see this discussion.
So we are caching two different things here. The first one is caching the AWS API session, the second is caching the connectioln to the resource, for example, the connection to a Redshift database. We should not open new connections for every request since it's less performant and we can reach simultaneous connection limits.
We are actually expiring session clients. See: grafana-aws-sdk/pkg/awsds/sessions.go Line 172 in df7d5c7
Since we are expiring the session, the resource connection should change as well. It's true though that if a session is never used again, there will be some garbage left there in memory but I do not think it's something that we need to worry about. |
Tested it manually in Athena (after using aws-sdk-react 0.0.37) and it is working as expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for your patience in this PR :)
pkg/awsds/sessions.go
Outdated
@@ -185,7 +188,7 @@ func (sc *SessionCache) GetSession(c SessionConfig) (*session.Session, error) { | |||
|
|||
var regionCfg *aws.Config | |||
if c.Settings.Region == defaultRegion { | |||
backend.Logger.Warn("Region is set to \"default\", which is unsupported") | |||
backend.Logger.Warn("Region is set to \"default\", which is unsupported, please use sqlModels.DefaultKey") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a different thing (and not specific to sql
) so I would leave the message as it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yeah fair point
relates to grafana/athena-datasource#144
The issue:
Proposed solution:
Open questions/things I'd love feedback on: