Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complex filters #37

Closed
andyb-esdm opened this issue Jan 29, 2019 · 40 comments
Closed

Complex filters #37

andyb-esdm opened this issue Jan 29, 2019 · 40 comments

Comments

@andyb-esdm
Copy link
Collaborator

These are filters that may not be filtering directly on the WMS attributes but via joins to related tables in the database e.g. show habitats that have these species associated with them.
Further specification and design needed here. Current suggested approach is to leverage GeoServer sql views, which have parameters that can be substituted at runtime. However, these are essentially different layers so use of a proxy to make it 'appear' that the unfiltered layer and associated filtered layers are all the same layer.

Need to agree that 'pre-cooked' queries are most viable. We are reluctant to expose users to 'raw' CQL for example.

Investigation and specification will need to happen during Sprint 2 but implementation will be for Sprint 3.

@JamesPe
Copy link
Collaborator

JamesPe commented Jan 31, 2019

Some preliminary thoughts / investigations ahead of call to discuss. (Note links are onto our internal dev copy of GeoServer so won't be generally visible)

Using SQL filters in Geoserver

Nice overview of the process here
[https://docs.geoserver.org/stable/en/user/data/database/sqlview.html ]

You can have pretty much any SQL statement in there you want and can add parameters that are substituted in at runtime (having been validated using regex to protect against SQL injection etc)

Default “everything” query that already exists
http://192.168.54.4:8080/geoserver/emodnet/wms?service=WMS&version=1.1.0&request=GetMap&layers=emodnet:eusm2016_200&styles=&bbox=-4047295.25,2907255.5,4693708.0,1.272164E7&width=684&height=768&srs=EPSG:3857&format=image/jpeg
Same layer but exposed via custom view called “Test”
Test layer is a view defined as:
SELECT * FROM public.eusm2016_200 WHERE hab_type='%code%'

And can be called:
http://192.168.54.4:8080/geoserver/emodnet/wms?service=WMS&version=1.1.0&request=GetMap&layers=emodnet:Test&styles=&bbox=-3485692.38922186,3206003.73135211,3491129.57545939,1.1482529467166E7&width=647&height=768&srs=EPSG:3857&format=image/jpeg&viewparams=code:A5.15

Advantages:
• Allows JNCC to lever inbuilt geoserver functionality
• Would build a specific parameterised view for a given query
• Can control what users can query on to protect the server performance
• Should be quite performant if views are written well – especially when joining

So development would be:
• Have a table that listed these configured geoserver views – that also listed the parameters to collect
• Have a form that presented these to the user
• Fires the WMS query off to geoserver with the parameters on the URL
• I think for this “advanced” querying we should avoid offering options to users of drop down codes etc to pick. That could rapidly get complex – that exists for basic filtering.
• We also need to consider what is meant by permalinks – does it include active filter? Actually this makes it easy to allow that I think.

@JamesPe
Copy link
Collaborator

JamesPe commented Feb 13, 2019

Above approach was agreed with JNCC

@JamesPe
Copy link
Collaborator

JamesPe commented Feb 27, 2019

A little more detail on the above now we are implementing.

An example SQL filter layer could be:
SELECT * FROM public.eusm2016_200 WHERE hab_type IN (%code%)

This would require input in the form of:
&viewparams=code:'A5.15','A5.37'

However we have realised that often the parameters supplied could potentially be lookup driven (as with the simple filters). e.g. the above example is simply picking multiple codes as in the simple filter. In other words just because the filter query is complex doesn't mean that the parameters the user supplies are complex.

So what we propose is to allow a complex filter to be optionally setup to use dropdowns (in exactly the same way as simple filters) when appropriate. However we don't want to restrict to this so it will also be possible to set a complex filter up for free text - so for example you could Enter A5% as the parameter to filter all codes starting with A5 or a value (e.g. 500 for only areas > 500 for example).

The free text would do a very limited bit of string preparation (e.g. it might swap a , for , as it needs escaping in a list for geoserver.

We assume you are OK with this as it's basically an extension on what we originally proposed allowing you to use drop down lists where appropriate on complex filters as a bonus.

But if @HelenwoodsJNCC @JordanPinder could confirm you are happy with that that would be great

@HelenwoodsJNCC
Copy link

@JamesPe All sounds sensible to me.

@JordanPinder
Copy link

@JamesPe Fine with me as well. Would this filtering also provide multiple filtering conditions (e.g. and / or etc.)?

@JamesPe
Copy link
Collaborator

JamesPe commented Feb 27, 2019

@JordanPinder - Basically yes - the caveat being depending on what you specified in the SQL for the geoserver layer.

The above example allows ORs if multiple codes are provided, but I presume you really mean different attributes.

You could setup a Geoserver layer that said:
SELECT * FROM public.eusm2016_200 WHERE (hab_type IN (%code%) OR substrate=%substrate%) AND Area>%area%

Then the user would be prompted for:
%code% - e.g. 'A5.15','A5.37'
%substrate% - e.g. sand
%area% - e.g. 500

Above example obviously made up - but hope it shows the principle. Could of course be based on SQL that joined tables as well.

@JordanPinder
Copy link

@JamesPe yup that's fine, just double checking for my own reference.

@andyb-esdm
Copy link
Collaborator Author

@SimonAnnetts I think our layer config might not be right. Each filter in a layer's filters has the isComplex attribute. But you can't mix and match simple and complex filters on one layer. I think we need to indicate the filter type at the layer level.
For now though I am just checking that all filters have isComplex = true before using the alternative filter parameter syntax.

@andyb-esdm
Copy link
Collaborator Author

All implemented as far as I can take it at the moment. Will need to test with some real filters (simple and complex) before UAT.

@SimonAnnetts
Copy link
Collaborator

SimonAnnetts commented Mar 6, 2019

@andyb-esdm
I have set up the map layer called 'Test' in the test web-api data . It has two filters currently, one is a lookup of EunisHabitats , the other a free text input. Both use the attribute 'code' which will be substituted into the view's SQL query:

SELECT * FROM public.eusm2016_200 WHERE hab_type IN (%code%)

The multi select filter appears to work fine in that the query contains data like
viewparams=code:'A5.15'.'A5.16'
which is correctly placed into the IN (%code%)

The text filter seems not to be enclosing the text with '' so is not working. I guess you need to check if the value is purely numeric (and send as is) or contains text (and should be enclosed in '')

The view also is acting strangely in that it always defaults to showing two hab_types (the original default value was set to 'A5.15','A5.37'). Even if I change the default value in the view to 'xxx', it still show those original two habitats when an empty filter is sent. This is a geoserver problem I think and possibly the view needs to be deleted and re-created when changing the defaults for %code%. I will have a play with this.

@SimonAnnetts
Copy link
Collaborator

SimonAnnetts commented Mar 6, 2019

For testing purposes, I created a new layer called 'Test1' on our geoserver. It's a SQL view with the SQL:

SELECT * FROM public.eusm2016_200
WHERE hab_type IN (%code%) or to_tsvector('english', substrate) @@ to_tsquery('english', '%substrate%')

I added the layer 'Test1' to the web-api config with the name 'Test Layer with Complex Filters(2) (Test)' and defined two filters. One is a Lookup of EunisHabitats and returns the attribute 'code' in a geoserver query. The other is a Text filter that returns the attribute 'substrate'.
For performance reasons I had to add an index to the Postgres table for the full text naturalised search of the substrate column:

CREATE INDEX eusm2016_200_substrate_ftidx ON public.eusm2016_200 USING GIN (to_tsvector('english', substrate));

It's now possible to search for substrates by typing in things like 'sand', 'mud', 'rock', 'bed' etc. as well as using the multi select for habitat selection.

Helper documentation:
https://www.postgresql.org/docs/9.6/textsearch-tables.html

@HelenwoodsJNCC
Copy link

@andyb-esdm @SimonAnnetts this is more general filtering feedback from the UAT:

  • When the filter box is open users would like to be able to interact with the page without having to click to exit the filtering window first. Can we make it so if they click somewhere else the filter window closes?
  • Users would also like a clearer message that the filtering has done something and be alerted if there search has returned zero results. When you click apply could the main filtering window close and a message appear saying filtering successful/unsuccessful 0 results returned?

@andyb-esdm
Copy link
Collaborator Author

First one is fine - it's altering the behaviour of the popup. I deliberately enforced the close button for this one, though, because I could imagine someone carefully selecting a lot of filter values and accidentally clicking off the dialog before applying the filter! But I will implement as you describe.

I don't know how to detect if filtering has done anything with a WMS since you are just getting image tiles back. Any ideas from anyone?

@JamesPe
Copy link
Collaborator

JamesPe commented Mar 19, 2019

I don't think the second one is possible as you say you only get back an image. The only way round it is to have a matching WFS service setup and to query that to get a count. That all becomes a lot of fiddle - and makes setting up / maintaining the map layers is much harder.

Its also of questionable value as the filtered object could be on some part of the map not currently visible

@HelenwoodsJNCC
Copy link

OK thanks, we didn't know if it was possible. I think it just means we will have to think carefully at our end about how we implement our filters to try and avoid running into the zero results returned.

@GMDuncan
Copy link

A possible alternative solution could be to use Geoserver's countMatched legend option (For example ) but it would mean refreshing the legend every time someone moves the map, and feeding through the entire map BBOX rather than the tiles, and it slows down the legend rendering significantly, and I'm not sure if anyone would actually find and use the functionality hiding away in the legend anyway, so I'm not sure whether or not it would be worthwhile attempting to implement...

@JamesPe
Copy link
Collaborator

JamesPe commented Mar 19, 2019

I have done similar in the past - but that will does potentially significantly slow down the whole mapping considerably in my experience (depending on the layers used)

@GMDuncan
Copy link

Aye, seems to be a significant slow down indeed.

@JamesPe
Copy link
Collaborator

JamesPe commented Mar 19, 2019

Yes - just about OK with small simple layers - but as soon as the layer gets complex (large) delay becomes unacceptable IMO

@GMDuncan
Copy link

GMDuncan commented Apr 15, 2019

Hi, I've just noticed that when mixing complex and non-complex filters, the filter logic for creating the WMS request seems to get confused.

With both a complex query and non-complex filter set on a layer, the CQL_FILTER parameter is set in the querystring, but the viewparams snipped is not generated. For example with the Annex I habitat maps layer turned on, and filtered to a GUI (complex) and habitat (non-complex) the wms calls just have the CQL in them
https://staging.ows.emodnet-seabedhabitats.eu/emodnet_view/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&FORMAT=image%2Fpng8&TRANSPARENT=true&LAYERS=annexiMaps_all&TILED=TRUE&CQL_FILTER=annexi%20IN%20(%271170%27)&WIDTH=256&HEIGHT=256&CRS=EPSG%3A3857&STYLES=&BBOX=2504688.542848654%2C8766409.899970295%2C3130860.678560818%2C9392582.035682458

Whereas they should preferably include both, resulting in something like this:
https://staging.ows.emodnet-seabedhabitats.eu/emodnet_view/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&FORMAT=image%2Fpng8&TRANSPARENT=true&LAYERS=annexiMaps_all&TILED=TRUE&CQL_FILTER=annexi%20IN%20(%271170%27)&WIDTH=256&HEIGHT=256&CRS=EPSG%3A3857&STYLES=&BBOX=2504688.542848654%2C8766409.899970295%2C3130860.678560818%2C9392582.035682458&viewParams=gui:GB200053
(image should be blank as the map in the previous image is not GB200053)

@HelenwoodsJNCC
Copy link

Hey, Im having a bit of difficultly getting the Complex filtering with lookups to work via SQL view. Im trying to set up the MPA boundary layers so they can be filtered by feature. Each site can have multiple features and rather than have duplicated polygons these are stored as comma delimited lists (annoying I know!).

I have written the following SQL view:
SELECT * FROM sac_mc_full WHERE site_code IN (SELECT site_code FROM sac_mc_full
CROSS JOIN regexp_split_to_table(species,',') WHERE regexp_split_to_table IN (%spCode%'))

This query works ok when I substitute a list of spCodes in but not when i use the actual filter on the SAC (filtering test) layer on the mapper.

@JamesPe
Copy link
Collaborator

JamesPe commented Apr 16, 2019

I think this works now @HelenwoodsJNCC - you didn't need the last quotes round the %spCode% as the app provides these.

So I changed your query to :
SELECT *
FROM sac_mc_full
WHERE site_code IN (SELECT site_code
FROM sac_mc_full
CROSS JOIN regexp_split_to_table(species,',')
WHERE regexp_split_to_table IN (%spCode%))

Also changed default to 'xxx' - so nothing shows by default.

Having said all that there does seem to be a slight geoserver issue in getting a change to go live - had to fight it a bit - but then it suddenly started working. may just be a slight delay for some reason

@JamesPe
Copy link
Collaborator

JamesPe commented Apr 16, 2019

Hmm @HelenwoodsJNCC - the sql you have above looks correct actually - but that wasn't what was in geoserver - you had: SELECT *
FROM sac_mc_full
WHERE site_code IN (SELECT site_code
FROM sac_mc_full
CROSS JOIN regexp_split_to_table(species,',')
WHERE regexp_split_to_table IN ('%spCode%'))

@HelenwoodsJNCC
Copy link

@JamesPe Graeme and I were playing around too. The change of the validation expression to ^[\w\d,']+$ seems to have done the trick. Does including the ,' increase any risk of injection issues?

Ideally I would like the layer to show everything by default, but we aren't sure of the best way to go about specifying this in the default parameter. We can do a bit of a bodge to make it work, but do you have any neat tricks to make this work?

@JamesPe
Copy link
Collaborator

JamesPe commented Apr 16, 2019

Think we are fighting each other @HelenwoodsJNCC !
Ah - afraid I may have overwritten your workaround (which was selecting all the codes from teh lookup?) Not sure there is anything neater for an IN query.

As for ' in RegEx - well it is needed for the query to work. You aren't allowing ; which is the dangerous one combined with a '. So long as your geoserver account is readonly etc there shouldn't really be any risk without another vulnerability.
The only other thing I could think of is to have a horrendous regex which listed all the possible codes that are accepted.

I'll leave alone now so we don't fight

@HelenwoodsJNCC
Copy link

Interesting issue flagged with me today - filters don't work in Internet Explorer version 11. The window opens and the explanatory text is there, but the windows containing the checknboxes aren't expanded.
image

@GMDuncan
Copy link

GMDuncan commented Jun 14, 2019

Could be related to this?
angular/angular-cli#4128

(console error showing in IE with "Object doesn't support property or method 'includes'")

@andyb-esdm
Copy link
Collaborator Author

Thanks for the heads up @GMDuncan . That was it. I've swapped includes for indexOf, which is supported by IE11.

image

@andyb-esdm
Copy link
Collaborator Author

This is on the esdm test server at the moment.
Probably worth deploying this after we've finished this month's new features?

@HelenwoodsJNCC
Copy link

HelenwoodsJNCC commented Jun 19, 2019

@andyb-esdm @SimonAnnetts any chance we could do a re-deploy in the early part of next week (W/C 24th)? The MPA mapper is due to go live around 3rd July so would be useful to check everything new is working as expected.

Could you also add the following into the feature-info-styles.css?

a.blue:link, a.blue:visited{
color: blue;
}

a.boldblue:link, a.boldblue:visited{
color: blue;
font-weight:bolder;
font-size:13px;
}

@andyb-esdm
Copy link
Collaborator Author

@HelenwoodsJNCC In that case I need to finish those new features! I'll do them next week and let you know how I get on.
You can add anything you want to the feature-info-styles.css but I guess you want me to add these so that they are used for all mappers?

@HelenwoodsJNCC
Copy link

@andyb-esdm thanks. The css is likely useful for all mappers - mainly for making any links clear.

@HelenwoodsJNCC
Copy link

@andyb-esdm in IE 11 permalinks don't seem to work properly, as in no active layers load - is your IE fix likely to sort this issue also?

@andyb-esdm
Copy link
Collaborator Author

@HelenwoodsJNCC
Copy link

@andyb-esdm looks like it's working fine. Thanks

@andyb-esdm
Copy link
Collaborator Author

@HelenwoodsJNCC No problem. It's likely the code change did make a difference because it's the bit that loads the filter lookup tables from the API, which it will do whether filters are set in the permalink or not.

@HelenwoodsJNCC
Copy link

@andyb-esdm any idea when you will be able to do the deployment?

@andyb-esdm
Copy link
Collaborator Author

andyb-esdm commented Jun 25, 2019

@HelenwoodsJNCC I can deploy at any time. However, I'm sorry, but I haven't got the mapper to use the proxy service that Simon wrote yet. I'm working on the goto lat/lon component now. Do you want me to deploy today when I've done the goto lat/lon or wait until the mapper is using the proxy for external layers (which will likely be tomorrow or thursday)?

@HelenwoodsJNCC
Copy link

@andyb-esdm I wasn't expecting the proxy service as part of this deployment, so i'm happy for you to deploy after you have finished working on the lat/long component. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants