Optimise tagLocations
#9946
Comments
Well, it may help somewhat, but I don't think debounce is helping as much as I thought it would. On a wider map, the endpoint itself is slow. But unlike our findings in #7556, on switching to allocation insights I see a lot of new events which are hogging a lot of time; I'll check it out. Link to the Skylight info for the screenshots: https://oss.skylight.io/app/applications/GZDPChmcfm1Q/1627225620/5m/endpoints/Srch::Search%20%5BGET%5D%20srch%2Ftaglocations?mode=memory
I would like to see @17sushmita and you both try out this idea of truncating longitude/latitude queries to see if it improves our load time. The window of data we fetch would then be a bit different than the viewed region. You can learn a little about this kind of truncation, used for a different purpose (privacy), here: https://github.com/publiclab/leaflet-blurred-location. It also results in better database optimization, with some caveats. We could discuss this approach in a new issue if you both prefer?
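A minimal sketch of the truncation idea, assuming a plain `{nwlat, selat, nwlng, selng}` bounds object (the helper name and shape are hypothetical, not from plots2): rounding the box outward to a fixed precision makes nearby views produce identical query parameters, which also makes them cacheable.

```javascript
// Hypothetical helper: truncate a bounding box outward to `precision`
// decimal places, so slightly different views share one query/cache key.
function truncateBounds(bounds, precision = 1) {
  const factor = Math.pow(10, precision);
  return {
    nwlat: Math.ceil(bounds.nwlat * factor) / factor,  // north edge: round up
    selat: Math.floor(bounds.selat * factor) / factor, // south edge: round down
    nwlng: Math.floor(bounds.nwlng * factor) / factor, // west edge: round down
    selng: Math.ceil(bounds.selng * factor) / factor,  // east edge: round up
  };
}
```

Because the box is always rounded outward, the fetched window is slightly larger than the viewed region, so the extra markers just fall outside the visible map.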
Limits are helping for larger bounding boxes; it's somewhere between 2x and 4x slower when I remove the limit. One thing that intrigues me: I tried this query, which took around 47s to load, but since there's no limit this should be somewhere near the maximum number of taglocations we can get? And we have a fully loaded map now, more loaded than I have ever seen :P I'm not sure how to put it, but there's a big time tradeoff initially, and this is the most content we have ever seen. What if we cache this specific content, send it out initially, and later update the cache in the background? That should eliminate the time tradeoff.
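The "cache it, serve it immediately, refresh in the background" idea could be sketched like this (a hypothetical in-memory stale-while-revalidate wrapper; all names here are made up, not part of plots2):

```javascript
// Hypothetical in-memory cache: serve whatever we have right away,
// and refresh stale entries in the background so the next viewer
// gets updated data without ever paying the slow-query cost.
const cache = new Map();

async function cachedFetch(key, fetcher, maxAgeMs) {
  const entry = cache.get(key);
  if (entry) {
    if (Date.now() - entry.fetchedAt > maxAgeMs) {
      // Stale: kick off a background refresh, but don't wait for it.
      fetcher()
        .then((data) => cache.set(key, { data, fetchedAt: Date.now() }))
        .catch(() => {}); // keep serving stale data if the refresh fails
    }
    return entry.data;
  }
  // Only the very first request for this key pays the full cost.
  const data = await fetcher();
  cache.set(key, { data, fetchedAt: Date.now() });
  return data;
}
```

Only the first request per key hits the slow endpoint; everyone after that is served from memory while refreshes happen off the critical path.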
Comparison: https://publiclab.org/unearthing-pvd
Provided we cache things, this much data won't be easy to navigate, and it might also cause some client-side slowness (like you noted for the link you referred to, which needed WebGL? I'm not sure if I understood that correctly). To mitigate that, we can try map clustering. An example for that:
Handling 15k responses wasn't too bad with the GL-based layer. I honestly don't think we'll get that high given the optimizations possible, but you can see how it's implemented in the code here - not too bad!
Here's what the current unstable looks like, which loads the markers in under 20ms with 0b5080e:

simplescreenrecorder-2021-08-06_02.20.57.mp4

What I did: I am storing the JSON data based on the response from https://publiclab.org/api/srch/taglocations?nwlat=52&selat=-52&nwlng=-150&selng=150&limit=1000. But we can't really store the data once and forget it, so I made a GitHub scheduled action in #9997 which will fetch the data from our endpoint every hour and commit any changes directly onto our main branch. We can choose to keep it on a separate branch if it's not suitable to commit onto the main branch, since we can fetch the raw JSON content anyway. What do you think about this?
This looks pretty cool! Could we achieve something similar with action caching? https://guides.rubyonrails.org/caching_with_rails.html I know it's precaching, though, so I guess we could have a script that prefetches the cache daily... Also, would there be scaling issues as the number of markers increases?
And if we need to fetch different regions once there are too many markers, couldn't we precache by lat/lon boxes?
That's where the microservice fits in, I think 👀👀
I tried to do things on the Ruby side, but it didn't go well.
Like updating the file every day instead of hour that I did right now?
I don't think so 🤔 We would maybe need to update the limit number, though. Currently we have 1152 records, I believe. I am not sure if there's a provision to remove the limit on our endpoint right now.
@jywarren I am kind of unclear on what my next steps should be here.
Map cluster demo based on the above precached JSON data:

simplescreenrecorder-2021-08-10_21.31.37.mp4
I believe you raised a query about us limiting the points to 1000. I realised we have around 1152 points right now in total, so maybe we can set the limit to 1500 for the API call we make in #9997? Otherwise, we can make another endpoint which just returns things without any limit.
Hi @daemon1024 - we have a few different issues here. In the near term, displaying the most recent 1500 and caching them seems reasonable. But over the long term, we might expect this to grow to 10k, 20k, right? And we assume we wouldn't want to display just the most recent or most popular. So we need:

a) a way to prioritize which to show in the overall view, which for now is complete but, as the dataset grows, will become a partial view (so, by recency? popularity? a switch between the two, with some minimal UI to toggle, like on a tag page: https://publiclab.org/tag/water-quality), and

b) a way to make an efficient sub-query of a zoomed-in view. That is, as you zoom in, we are not fetching data for the whole globe, so we can show more of a "complete" view as you zoom in.

That'll mean a query that is geographically confined, but also one which is cacheable. We can cache by geographic region, but you want the cache key to be very standard, even if people don't zoom in exactly the same way. So we can cache by latitude/longitude range; for example, cache by integer grid squares. Then we could limit to some number (500? 1000?) for each lat/lon grid square, and cache those. But we'd have to ensure Leaflet knows it can fetch by those - like:
for example. It would have to fetch all of the grid squares that intersect with the current view, and each one can be individually cached. There may be a standard way to do this. Or we may need to write code to find all grid squares that intersect the current view box and manually fetch them all. This code could be modeled on the Leaflet Blurred Location grid square code, because it already does this dividing in order to draw grid squares (although they're not always integer lat/lon - they vary based on zoom).

Finally, we'd have to think about the schedule for caching these, as we wouldn't want to send requests for 100 grid squares and expire their caches all at once; that would hit the database hard, just as we wanted to avoid. So what are some strategies for a rolling expiration of these tiles?

Anyhow, I know this is a lot. You may want to read up a bit on how geographic systems manage this. There might be something in Leaflet plugins: https://leafletjs.com/plugins.html - or there may be an assumption that people will use pre-made geo servers like, I dunno, ESRI, or GeoServer or MapServer. There may be a lighter-weight one that we could consider as a microservice. Or we could just come up with a good rolling Rails cache expiration strategy.

It may be that Leaflet has some plugins that help it fetch marker data by grid square. Or we may need to write that part ourselves. Finally, do we "trash" loaded data as you continue to browse around? What if you build up something like 50,000 points by moving the map and continuing to load more data as you zoom and pan? Do we need to "expire" the markers you loaded before, so we don't overload the interface? This could be thought about later, as we don't yet have that many, but it might eventually become a problem.

This is a lot! But I hope it helps you structure the research and planning around this. I'm happy to answer some questions!
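The grid-square enumeration described here could be sketched like this, assuming the same `nwlat/selat/nwlng/selng` parameters the taglocations endpoint uses (the function name is hypothetical):

```javascript
// Hypothetical helper: list the integer lat/lon grid squares that
// intersect a view box, so each square can be fetched and cached
// independently under a stable key like "41,-88".
function gridSquaresFor(nwlat, selat, nwlng, selng) {
  const squares = [];
  // Walk south-to-north and west-to-east across the box, one degree at a time.
  for (let lat = Math.floor(selat); lat <= Math.floor(nwlat); lat++) {
    for (let lng = Math.floor(nwlng); lng <= Math.floor(selng); lng++) {
      squares.push(`${lat},${lng}`);
    }
  }
  return squares;
}
```

Each key could then map to one cached endpoint request; for the rolling-expiration question, one common trick is to add a small random jitter to each square's TTL so they don't all expire (and hit the database) at the same moment.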
* debounce changes in map content (Ref #9946)
* increase limit of taglocations - since we have an optimised API now, we can increase the limit to increase content
* add query limit as a search parameter
* minor refactor
I just wanted to say that in #10123 we are doing quite well now. I think it's possible we can call this "solved" but if we want to really push performance, we could keep it open and/or try other strategies. But maps are super useful and plenty dense now! |
When we first open up the map endpoint, the first and default markers that load on the map are tag locations.
But when we start interacting with the map, it starts sending loads of requests to the API.
pl-tagloc.mp4
This eventually causes throttling.
We could debounce queries on these events. ( cc @jywarren )
Potential source for these event triggers:

* plots2/app/assets/javascripts/leafletHelper.js (line 154 in f3ec8ff)
* plots2/app/assets/javascripts/leafletHelper.js (line 54 in f3ec8ff)
Ref #9698
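The debounce suggested above could look something like this (a generic sketch; `fetchTagLocations` is a hypothetical stand-in for the real handler in leafletHelper.js):

```javascript
// Generic debounce: collapse a burst of calls into a single call that
// fires `waitMs` after the last one, so panning/zooming the map sends
// one API request instead of dozens.
function debounce(fn, waitMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), waitMs);
  };
}

// Hypothetical Leaflet usage: only refetch once the map settles.
// map.on('moveend', debounce(fetchTagLocations, 300));
```

Leaflet's `moveend` fires once per pan/zoom gesture, but intermediate events like `move` and `zoom` fire continuously, so wrapping the fetch handler this way should cut the request volume sharply.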