
Optimise tagLocations #9946

Open
daemon1024 opened this issue Jul 25, 2021 · 16 comments · Fixed by #10028
Assignees: daemon1024
Labels: enhancement, optimization

Comments

@daemon1024
Member

daemon1024 commented Jul 25, 2021

When we first open the map endpoint, the first and default markers that load on the map are tag locations.

But when we start interacting with the map, it starts sending loads of requests to the API.

pl-tagloc.mp4

This eventually causes throttling.

We could debounce queries on these events. ( cc @jywarren )

Potential sources for these event triggers:

map.on('load viewreset resize zoomend moveend', contentMap);

function contentLayerParser(map, markers_hash, map_tagname) {
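
A minimal debounce sketch (just an illustration; the timer-based wrapper and the 250 ms delay are assumptions and would need tuning):

```js
// Sketch: collapse rapid pan/zoom events into a single call to contentMap.
// The 250 ms delay is an arbitrary assumption.
function debounce(fn, delay) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), delay);
  };
}

map.on('load viewreset resize zoomend moveend', debounce(contentMap, 250));
```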

Ref #9698

daemon1024 added the enhancement and optimization labels Jul 25, 2021
@daemon1024 daemon1024 self-assigned this Jul 25, 2021
daemon1024 added a commit that referenced this issue Jul 25, 2021
@daemon1024
Member Author

Well, it may help somewhat, but I don't think debouncing is helping as much as I thought it would.

On a wider map, the endpoint itself is slow.

[screenshot]

But unlike our findings in #7556,
[screenshot]
I don't see any particular SQL bottleneck; in fact, I don't see anything useful in Skylight.

On switching to allocation insights,

[screenshot]

I see a lot of new events that are hogging a lot of time; I will check it out.

Link to the Skylight info for these screenshots: https://oss.skylight.io/app/applications/GZDPChmcfm1Q/1627225620/5m/endpoints/Srch::Search%20%5BGET%5D%20srch%2Ftaglocations?mode=memory

@jywarren
Member

One clue to figure out is: when you change the limit, does it vastly affect the speed, as @17sushmita noted in the call today? That might make sense, but if it doesn't help much, the question is whether the limit actually reduces query time, or whether we are just adding a limit near the end of a chain of queries and not getting much benefit from it.

If a low limit does help a lot, we could think about hitting the endpoint with a low limit, but either randomizing it or maybe better yet paginating it by recency or some other sorting function? Then as time goes on we hit the same endpoint but with page=1, page=2 that kind of thing, so we slowly build up with shorter queries. That could look a bit weird but would be like debouncing - spacing out queries giving the DB time to recover memory? Testing would tell if this actually helps overall.

I would like to see @17sushmita and you both try out this idea of truncating longitude/latitude queries to see if this improves our load time. The window of data we fetch would be a bit different than the viewed region. You can learn a little about this kind of truncation for a different purpose (privacy) here: https://github.com/publiclab/leaflet-blurred-location but it also results in better database optimization, with some caveats. We could discuss this approach in a new issue if you both prefer?
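
To illustrate the truncation idea (only a sketch; the query parameter names follow the taglocations endpoint, everything else is assumed), rounding the viewport outward to whole degrees means slightly different views produce the same, cacheable query string:

```js
// Sketch: truncate the current viewport to integer degrees so nearby views
// share one query string (and therefore one cache entry).
function truncatedQuery(map) {
  const b = map.getBounds();
  const params = {
    nwlat: Math.ceil(b.getNorth()),
    selat: Math.floor(b.getSouth()),
    nwlng: Math.floor(b.getWest()),
    selng: Math.ceil(b.getEast())
  };
  return '/api/srch/taglocations?' + new URLSearchParams(params).toString();
}
```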

@daemon1024
Member Author

daemon1024 commented Aug 2, 2021

One clue to figure out is: when you change the limit, does it vastly affect the speed? That might make sense, but if it doesn't help much, the question is whether the limit actually reduces query time, or whether we are just adding a limit near the end of a chain of queries and not getting much benefit from it.

Limits are helping for larger bounding boxes; it's somewhere between 2x and 4x slower when I remove the limit.

One thing that intrigues me:
https://unstable.publiclab.org/api/srch/taglocations?nwlat=52&selat=-52&nwlng=-150&selng=150

I tried this query, which took around 47s to load, but since there's no limit this should be close to the maximum number of taglocations we can get?

[screenshot]

and we have a fully loaded map now, which is more loaded than I have ever seen :P

I am not sure how to put it, but there's a big time tradeoff initially, and this is the most content we have ever seen. What if we cache this specific content, send it out initially, and later update the cache in the background? That should eliminate the time tradeoff.

@jywarren
Member

jywarren commented Aug 3, 2021

Comparison: https://publiclab.org/unearthing-pvd

@daemon1024
Member Author

daemon1024 commented Aug 3, 2021

Provided we cache things, this much data won't be easy to navigate and might also cause some client-side slowness (like you noted for the link you referenced, which needed WebGL? I am not sure if I understood that correctly). To mitigate that, we can try map clustering; a sketch is below.
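
For example, a minimal sketch with the Leaflet.markercluster plugin (assuming the cached data is an array of points with lat/lon fields, which is an assumption about the JSON shape):

```js
// Sketch: group nearby markers into clusters so a fully loaded map stays
// navigable. Requires the Leaflet.markercluster plugin.
const clusterGroup = L.markerClusterGroup();

tagLocations.forEach(function (point) {
  // point.lat / point.lon field names are assumed for illustration
  clusterGroup.addLayer(L.marker([point.lat, point.lon]));
});

map.addLayer(clusterGroup);
```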

@jywarren
Member

jywarren commented Aug 3, 2021

Handling 15k responses wasn't too bad with the GL-based layer. I honestly don't think we'll get that high given the optimizations possible, but you can see how it's implemented in the code here - not too bad!

https://github.com/publiclab/leaflet-environmental-layers/blob/7782b57a8ae0a08134d32b7c29a09f0e5923f518/src/unearthing.js#L35-L61

@daemon1024 daemon1024 mentioned this issue Aug 5, 2021
@daemon1024
Member Author

Here's what the current unstable looks like; it loads the markers in under 20ms with 0b5080e.

simplescreenrecorder-2021-08-06_02.20.57.mp4

What I did is store the JSON data from the response of https://publiclab.org/api/srch/taglocations?nwlat=52&selat=-52&nwlng=-150&selng=150&limit=1000 under static/tagLocStore.json and reference the raw GitHub content in our map. (We can update the bounding box to cover the maximum extent of the map, and maybe increase our limit if needed.)

But we can't really fetch the data once and forget about it, so I made a GitHub scheduled action in #9997 which will get the data from our endpoint every hour and commit any changes directly onto our main branch. We can choose to keep it on a separate branch if it's not suitable to commit onto the main branch, since we can fetch the raw JSON content either way.
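
On the client side the idea is roughly this (a sketch; the raw URL and the field names are assumptions):

```js
// Sketch: load the precached JSON committed by the scheduled action and add
// markers without hitting the Rails endpoint. URL and field names are assumed.
fetch('https://raw.githubusercontent.com/publiclab/plots2/main/static/tagLocStore.json')
  .then(function (response) { return response.json(); })
  .then(function (points) {
    points.forEach(function (p) {
      L.marker([p.lat, p.lon]).addTo(map);
    });
  });
```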

What do you think about this?

@jywarren
Member

jywarren commented Aug 6, 2021

This looks pretty cool! Could we achieve something similar with action caching? https://guides.rubyonrails.org/caching_with_rails.html

I know it's precaching though so I guess we could have a script that prefetches the cache daily...

Also, would there be scaling issues as the number of markers increases?

@jywarren
Member

jywarren commented Aug 6, 2021

And if we should fetch for different regions as it becomes too many markers, couldn't we precache by lat/lon boxes?

@daemon1024
Member Author

daemon1024 commented Aug 6, 2021

And if we should fetch for different regions as it becomes too many markers, couldn't we precache by lat/lon boxes?

That's where the microservice fits in, I think 👀👀
At a more granular level, it would store individual elements as objects that make up clusters as required. Our current Ruby endpoint needs to deal with a lot of things, but since we will be precaching the processed part, these elements won't take much time to load.
As for the scraping/precaching part, the GitHub Actions workflow should provide the proof of concept for that! Instead of storing it in the repository, we can send a POST request to our microservice to store it.

@daemon1024
Member Author

Could we achieve something similar with action caching? https://guides.rubyonrails.org/caching_with_rails.html

I tried to do things on the Ruby side, but it didn't go well.

I know it's precaching though so I guess we could have a script that prefetches the cache daily...

Like updating the file every day instead of every hour, as I'm doing right now?

Also, would there be scaling issues as the number of markers increases?

I don't think so 🤔 We would maybe need to update the limit number, though. Currently we have 1152 records, I believe. I am not sure if there's a provision to remove the limit on our endpoint right now.

@daemon1024
Member Author

daemon1024 commented Aug 6, 2021

@jywarren I am kind of unclear about what my next steps should be here.

@daemon1024
Member Author

Map cluster demo based on the above precached JSON data:

simplescreenrecorder-2021-08-10_21.31.37.mp4

@daemon1024
Member Author

daemon1024 commented Aug 12, 2021

I believe you raised a query about us limiting the points to 1000. I realised we have around 1152 points right now in total, so maybe we can set the limit to 1500 for the API call we make in #9997? Otherwise we can make another endpoint which just returns everything without any limit.

@jywarren
Member

Hi @daemon1024 - we have a few different issues here. In the near term, displaying the most recent 1500 and caching them seems reasonable.

But over the long term, we might expect this to grow to 10k, 20k, right? And we assume we wouldn't want to display just the most recent or most popular. So we need:

a) a way to prioritize which to show in the overall view, which for now is complete but will become a partial view as the dataset grows (so, by recency? popularity? a switch between the two, with some minimal UI to toggle, like on a tag page -- https://publiclab.org/tag/water-quality), and

b) a way to make an efficient sub-query of a zoomed-in view. That is, as you zoom in, we are not fetching data for the whole globe, so we can show more of a "complete" view as you zoom in.

That'll mean a query that is geographically confined, but also one which is cache-able. We can cache by geographic region, but you want the cache key to be very standard, even if people don't zoom in exactly the same way. So we could cache by latitude/longitude range, for example -- cache by integer grid squares, say. Then we could limit to some number (500? 1000?) for each lat/lon grid square, and cache those. But we'd have to ensure Leaflet knows it can fetch by those - like:

  • /api/nearby______/?bbox=71,-144,72,-143

for example. It would have to fetch all of the grid squares that intersect with the current view, and each one can be individually cached. There may be a standard way to do this. Or we may need to write code to find all grid squares that intersect the current view box and manually fetch them all. This code could be modeled on the Leaflet Blurred Location grid square code, because actually it already does this dividing in order to draw grid squares (although they're not always integer lat/lon - they vary based on zoom).
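
A rough sketch of that client-side lookup (the endpoint path and bbox parameter are illustrative, and addMarkers is a hypothetical helper):

```js
// Sketch: split the current view into integer lat/lon grid squares, then fetch
// each square as its own bbox query so every square can be cached on its own.
function gridSquaresInView(map) {
  const b = map.getBounds();
  const squares = [];
  for (let lat = Math.floor(b.getSouth()); lat < Math.ceil(b.getNorth()); lat++) {
    for (let lng = Math.floor(b.getWest()); lng < Math.ceil(b.getEast()); lng++) {
      squares.push([lat, lng, lat + 1, lng + 1]); // south, west, north, east
    }
  }
  return squares;
}

gridSquaresInView(map).forEach(function (bbox) {
  // The endpoint path and bbox parameter are illustrative, not the current API.
  fetch('/api/nearby/?bbox=' + bbox.join(','))
    .then(function (r) { return r.json(); })
    .then(addMarkers); // addMarkers is a hypothetical helper
});
```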

Finally, we'd have to think about the schedule for caching these, as we wouldn't want to send requests for 100 grid squares and expire their caches all at once, as that would hit the database hard as well, just as we wanted to avoid. So what are some strategies for a rolling expiration of these tiles?

Anyhow, I know this is a lot. You may want to read up a bit on how geographic systems manage this. There might be something in Leaflet plugins: https://leafletjs.com/plugins.html or there may be an assumption that people will use pre-made geo servers like, I dunno, ESRI, or GeoServer or MapServer. There may be a lighter-weight one that we could consider as a microservice. Or we could just come up with a good rolling Rails cache expiration strategy. It may be that Leaflet has some plugins that help it fetch marker data by grid square. Or we may need to write that part ourselves.

Finally, do we "trash" loaded data as you continue to browse around? What if you build up like 50,000 points by moving the map and continuing to load more data as you zoom and pan? Do we need to "expire" the markers you loaded before, so we don't overload the interface? This could be thought about later, as we don't yet have that many, but it might eventually become a problem.

This is a lot! But I hope it helps you structure the research and planning around this. I'm happy to answer some questions!

@jywarren jywarren reopened this Aug 17, 2021
daemon1024 added a commit to daemon1024/plots2 that referenced this issue Aug 18, 2021
jywarren pushed a commit that referenced this issue Aug 21, 2021
* debounce changes in map content

Ref #9946

* increase limit of taglocations

Since we have an optimised API now, we can increase the limit to increase content

* add query limit as a search parameter

* minor refactor
reginaalyssa pushed a commit to reginaalyssa/plots2 that referenced this issue Oct 16, 2021
billymoroney1 pushed a commit to billymoroney1/plots2 that referenced this issue Dec 28, 2021
@jywarren
Member

jywarren commented Jan 4, 2022

I just wanted to say that in #10123 we are doing quite well now. I think it's possible we can call this "solved" but if we want to really push performance, we could keep it open and/or try other strategies. But maps are super useful and plenty dense now!
