Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: adds configurable query parameters to tile endpoints #795

Closed

Conversation

dechristopher
Copy link
Contributor

Hi all. I recently stumbled upon a use-case for having query parameters present in URLs. Being able to configure Tegola's queries on the fly with optional URL query parameters for filtering down data can simplify configuration for large datasets that would otherwise have many layers defined manually.

Defining a query parameter is as simple as this:

[[maps.params]]
name = "param"
token = "!PARAM!"
default = "7"

This will configure a parameter named param with the SQL replacement token !PARAM!. The default key is optional and allows you to specify a default value for the parameter if none is provided. Without a default specified, the replacer will default to substituting an empty string.

The endpoint then becomes queryable at:

http://server:8080/maps/example/{z}/{x}/{y}.pbf?param=123

Currently, to prevent SQL injection, parameter values are limited to values matching the following regular expression. This is to limit control characters from being injected via query parameters:

[a-zA-Z0-9,._-]+

This obviously needs more work and I am open to feedback regarding security. I've also thought of limiting the scope of query parameters to integers only since most of the use cases I've found for this feature involve numeric ID matching in queries.

Any and all feedback is welcome. I hope others can find value in this contribution.

@ARolek
Copy link
Member

ARolek commented Aug 4, 2021

@dechristopher awesome! This feature has been requested before so it's great you tackled it! I should be able to get a review in the next week.

cc @gdey @ear7h

@dechristopher
Copy link
Contributor Author

@dechristopher awesome! This feature has been requested before so it's great you tackled it! I should be able to get a review in the next week.

cc @gdey @ear7h

Any chance to review yet?

@gdey
Copy link
Member

gdey commented Aug 18, 2021

I'll look at this Friday. @dechristopher do you know why the test is failing? Have not dug into the logs. But a quick scan is complaining about bbox. I know you are dynamically changing that in the SQL.

@dechristopher
Copy link
Contributor Author

I'll look at this Friday. @dechristopher do you know why the test is failing? Have not dug into the logs. But a quick scan is complaining about bbox. I know you are dynamically changing that in the SQL.

Unsure why the tests are failing on the runner. They run cleanly on my machine. I'll investigate and see if I can come up with a fix.

@ear7h
Copy link
Contributor

ear7h commented Aug 19, 2021

This is a really cool feature! Thanks for getting it started!

WRT to input validation I think the best approach is to use the "placeholder" mechanism offered by pgx (it's described briefly in pgx.Query). I think the biggest difficulty with this is getting the numbering right, and keeping track of it. I've come up with a function that could do that, but I'm not sure how it would fit with that part of the code base:

https://play.golang.org/p/Dk6SWSf2f_N

This is also missing a few things like case insensitivity and special handling of the builtin tokens. The later could be implemented by writing to the queryArgs map after writing the user provided args (so any colliding user provided args get overwritten).

Copy link
Member

@ARolek ARolek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition, so thank you for taking the time to submit this PR as well as clean up some parts of the codebase! I added a few comments inline, most are fairly minor. The main issue I would like to discuss is the use of context for passing around the params. I think it might be better to update the feature provider to support a new parameter for filters rather than use context.

// replaceParams substitutes configured query parameter tokens for their values
// within the provided SQL string
func replaceParams(ctx context.Context, sql string) (string, error) {
if ctx.Value("params") == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make the value "params" a const since you're referencing it multiple times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the commit, I'm not sure if passing the params around via the context is the best approach. I know this allows us to avoid updating the provider interfaces, but I'm generally not a fan of stashing values into context.

cc: @gdey @ear7h

}

var params map[string]string
params = ctx.Value("params").(map[string]string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an !ok check when doing an assertion.


for token, val := range params {
if val != "" && !paramValRe.Match([]byte(val)) {
return "", fmt.Errorf("Param value for token %s contains illegal characters: %s ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, you're changing the first letter of the error text to lowercase in other parts of the codebase, but using uppercase here. 😜

// check for query parameters and populate param map with their values
if req.Atlas.HasParams(req.mapName) {
params = make(map[string]string)
err = r.ParseForm()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you went with r.ParseForm() instead of using r.URL.Query()?

@ARolek
Copy link
Member

ARolek commented Sep 7, 2021

Another consideration with this PR is going to be the tile cache. If the query params can produce different features based on the query strings, we're going to need to revisit the tile cache. Alternatively, we can say if filter params are used, then no caching is allowed right now.

@kszafran
Copy link

Thanks for this PR! We need something similar and I thought I'll have to contribute, but I'm glad someone has already started the work! Our use case is a bit different, since we use tegola like a library, and set up our own routing etc. And the parameters we need to pass will not be coming from query parameters but from a JWT token.

Few things I'd suggest:

  1. Consider an alternative/additional method of passing these parameters. In my case I could cram them into a Request.URL with some middleware, but perhaps there's a better way (e.g. not adding "params" to the context if they are already there or supporting some additional value, like "external_params").
  2. The "SQL injection" protection is pretty weak and very limiting on what values can be passed. The only correct way to defend again SQLi is to use parameters, like $1, $2 etc. and pass user's input as query arguments and not inside the SQL string. To make this easier and more flexible, we could use ? as placeholders and only replace them with Postgres-style numbered placeholders$1 before executing the query. That how e.g. sqrl works. This way the order of parameters in the configurable layer SQL will not matter.
  3. Tokens like !BBOX! are not individual values, but whole expressions. I think we should treat parameters the same way. For example:
[[maps.params]]
name = "param"
token = "!PARAM!"
query = "some_field @> ARRAY[?]"
default = "7"

This way more complicated expressions could be shared by multiple layers. And we could make query be ? by default, so that if you don't specify query it will work like the currently proposed version.

@ARolek Let me know if I can help somehow. I'd like to see this PR improved and merged.

@dechristopher
Copy link
Contributor Author

@kszafran this is all great feedback and I'm happy to see others interested in getting this feature wrapped up and merged. Admittedly, I've not had enough time to sit down and address all of the review with how busy my work schedule has been. If you'd like to see this through with your improvements, then by all means.

Also, I wholeheartedly agree with your third point. In my testing, I was required to modify the queries a bit to support the params and with this, I think it'd be easier to generalize and pass tests. Something that needs to be taken into account is the batch of queries that Tegola performs at startup to determine the geometry type from the table in each configured layer. With manually specified types this isn't a problem, but with the introspection queries I've seen failures if the queries aren't built in a way that they can be valid without having an included parameter passed in. This is why I've added the default field. There's definitely a better way to handle this via treating params as expressions.

@ARolek ARolek changed the base branch from v0.14.x to v0.15.x November 5, 2021 17:34
@ARolek
Copy link
Member

ARolek commented Nov 5, 2021

@dechristopher and @kszafran apologies for the slow response. I missed the comments you two dropped in here. I'm interested in bringing this feature to life as well, but I posted some concerns with the current implementation. It might be easier to work on the design in real-time via slack. We can keep the discussion here as well, but I don't think we should be using context.Context to pass around the filters. I think we should enhance the Tilter interface:

// Tiler is a Layers that allows one to encode features in that layer
type Tiler interface {
	Layerer

	// TileFeature will stream decoded features to the callback function fn
	// if fn returns ErrCanceled, the TileFeatures method should stop processing
	TileFeatures(ctx context.Context, layer string, t Tile, fn func(f *Feature) error) error
}

We can add another parameter to TileFeatures for filters. I'm not sure what the design for Filters should be yet, but I think this thread already has some ideas brewing. Ideally we can figure out a design that allows us to filter not only SQL providers but other future providers too (i.e. GeoJSON). We don't need to implement the filters now, but I would like to consider them.

@ARolek ARolek changed the base branch from v0.15.x to master May 18, 2022 17:39
@bemyak
Copy link
Contributor

bemyak commented Jun 2, 2022

Hi!

I'm starting working to push this feature forward. The main issue currently is the SQL injection vulnerability. We definitely need to use prepared statements, i.e., replace all user-defined parameters with $1, $2 and so on, and pass the values as arguments.

Unfortunately, this means that we need to have type information available for parameters.

So far the idea is to have parameters configured like this:

[[maps.params]]
name = "param2"
token = "!PARAM2!"
# only very simply types are suppored: `int`, `string`, `float`, `bool`
type = "int"
# `?` will be replaced with the actual value
# if `sql` is not specified, it is considered to be `?`
sql = "AND ANSWER = ?"
# if the parameter value is missing from the HTTP query, this value will be used
# `default_value` is an optional parameter. If it is not defined, the parameter is considered as required and cliet will get 400 error
default_value = "42"
# instead of `default_value`, `default_sql` can be defined, e.g. to omit param query entirely:
#default_sql = ""
# defining both `default_value` and `default_sql` is forbidden

I will be really glad to hear any feedback or thoughts on this.

@kszafran
Copy link

kszafran commented Jun 2, 2022

I'm working with @bemyak on preparing the spec for this. Another approach that I would propose is this:

[[maps.params]]
name = "param2"
token = "!PARAM2!"
type = "int"
# if "sql" is not specified, it is `?` by default, so just the param value and not a whole expression
sql = "AND answer = ?"
# if the parameter is not specified by the caller, the "default" value will be used instead
# if "default" is not defined, the "!PARAM2!" token will be replaced with an empty string instead of "sql"
default = "42"
# "required" could be used in cases where we don't want to specify a "default" value, but want to make the parameter mandatory
required = "true"

The main difference compared to @bemyak proposal is how we behave when there's no default and a caller does not specify the parameter. There are a few different cases that we would like to handle:

  1. parameter is required
  2. parameter is optional and we use a default value if it's not specified
  3. parameter is optional and we skip the whole condition if it's not specified

Those can be achieved with both designs. I just think that having only default plus replacing the token with an empty string would be simpler than having both default_value and default_sql`. Here's how you could achieve the three use cases:

1:

sql = "SELECT ST_AsMVTGeom(geometry, !BBOX!) ... AND answer = !ANSWER!"
...
[[maps.params]]
name = "answer"
token = "!ANSWER!"
type = "int"
required = "true"

2:

sql = "SELECT ST_AsMVTGeom(geometry, !BBOX!) ... AND answer = !ANSWER!"
...
[[maps.params]]
name = "answer"
token = "!ANSWER!"
type = "int"
default = "42"

3:

sql = "SELECT ST_AsMVTGeom(geometry, !BBOX!) ... !ANSWER!"
...
[[maps.params]]
name = "answer"
token = "!ANSWER!"
type = "int"
sql = "AND answer = ?"

@ARolek
Copy link
Member

ARolek commented Jun 17, 2022

@bemyak and @kszafran apologies for the slow response and thanks for the effort on this issue.

Additional context to this discussion:

Additional considerations:

  • When dynamic tile filtering is implemented, we need to think about how the tile cache is supposed to work. Currently the tile cache is either enabled or disabled. This is not configured on a per map basis. It seems to me that enabling tile filtering will require the tile cache to be off. If this is the case, I think we need to investigate if we want to support cache configs on a per map basis so we can allow for some maps to leverage the cache but others to have dynamic filtering.

  • SQL injections are a very real problem with this feature. I'm trying to figure out where the responsibility should lie for sanitizing the input. Tegola could be responsible for this as you're proposing. Alternatively we could use Postgres functions and pass on the responsibility to the database function. This might be more prone to error though as it's pushing the security responsibility to the implementer, but it would simplify the implementation in tegola. We might just be able to get away with a "query string whitelist" in the config that would then look for query strings and then turn them into tokens. Here's an example of calling a database function from tegola:

SELECT * FROM my_tile_func(!ZOOM!, !BBOX!, !FILTER_1!, !FILTER_2!)

I have used this pattern in tegola to push complex query logic to the database and it's quite nice.

Let's chat in real time:

This feature is one I would like to get right. I see the value, and I'm happy to support the effort. I do think it deserves some strong vetting though as the feature can be prone to abuse. Maybe we should try to chat through some of this in real time. You can find us in a couple of Slack channels:

@kszafran
Copy link

@ARolek Thanks for the comments! @bemyak is actively working on a PR. It's almost ready and we should soon be able to create a PR to the upstream. If you want to take a look already, check out SharperShape#6.

How about this - once we're ready, we'll create a PR to the upstream, and then we could have a call to discuss it (e.g. on Google Meet)? Slack is fine, too. Our implementation solves SQL injection by using query arguments instead of putting parameter values into the query string itself. For now, caching is disabled for parameterized maps, but I imagine that could be improved in the future.

@ARolek
Copy link
Member

ARolek commented Jun 17, 2022

How about this - once we're ready, we'll create a PR to the upstream, and then we could have a call to discuss it (e.g. on Google Meet)?

Love it. Once we get there we can see who can all join the discussion. Thanks for working on this contribution.

@gdey
Copy link
Member

gdey commented Aug 30, 2022

#867 supersedes this PR.

@gdey gdey closed this Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants