
UK National Charge Point Registry import is timing out #161

Closed
zymurgic opened this issue Jun 21, 2020 · 3 comments

@zymurgic
Contributor

It seems that the import for the UK National Chargepoint Registry is timing out.

What needs doing to make imports of large datasets less intensive?

Or is this just a Cloudflare caching issue, with the Cloudflare request timing out?

@webprofusion-chrisc
Member

I'm guessing you have a script to invoke that, because I took the button away ages ago :) All large imports are subject to timeouts using the admin web page method, and yes, you will hit the Cloudflare timeout first, but the import will keep going. There is a plan B.

When an import runs, we get all the latest data from the import source, transform it into our own POI object model, then compare it against our current data set (either whole world or country specific, usually country specific) to see if we already have items we previously imported that we could now update, or if any of the new items are approximate duplicates (close distance, same network, etc.) of something we already have (we discard these). If items have no changes we discard those as well. The fetch of all the data to compare against and the comparison/deduplication stage are extremely brute-force and expensive.
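Roughly, the duplicate check is something like the sketch below (illustrative only; the class and property names are placeholders, not the real OCM model). The expense comes from it being an O(imported × existing) scan with a distance calculation per pair:

```csharp
using System;
using System.Collections.Generic;

// Placeholder POI shape for illustration, not the actual OCM object model.
public class PoiCandidate
{
    public string DataProviderRef { get; set; } // ID assigned by the source data provider
    public int? OperatorId { get; set; }        // charging network/operator
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class ImportDeduplication
{
    // Brute-force pass: for each imported item, look for an existing POI that is
    // either the same provider reference (update candidate) or a near-duplicate
    // (same operator within ~100 m), which would be discarded.
    public static bool IsLikelyDuplicate(PoiCandidate imported, IEnumerable<PoiCandidate> existing)
    {
        foreach (var poi in existing)
        {
            if (poi.DataProviderRef == imported.DataProviderRef) return true;

            bool sameOperator = imported.OperatorId != null && poi.OperatorId == imported.OperatorId;
            if (sameOperator && DistanceMetres(poi, imported) < 100) return true;
        }
        return false;
    }

    // Haversine great-circle distance in metres.
    private static double DistanceMetres(PoiCandidate a, PoiCandidate b)
    {
        const double R = 6371000; // Earth radius in metres
        double dLat = (b.Latitude - a.Latitude) * Math.PI / 180;
        double dLon = (b.Longitude - a.Longitude) * Math.PI / 180;
        double lat1 = a.Latitude * Math.PI / 180;
        double lat2 = b.Latitude * Math.PI / 180;
        double h = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                   Math.Cos(lat1) * Math.Cos(lat2) * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        return 2 * R * Math.Asin(Math.Sqrt(h));
    }
}
```

Anything that avoids the full pairwise scan (e.g. pre-bucketing existing POIs by country or a coarse geographic grid before comparing) would cut the cost considerably.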

Once we have our list of new/updated POIs, we write them to the SQL database as added/updated items and refresh the POI cache (MongoDB).

A while ago I was working on an offline method to perform the import/deduplication on a different machine, then post the changes to the API as a batch. This works OK, but it's not fully automated. The plan was to implement a .NET worker service to run on Linux (because it's cheaper to run Linux VMs). I've run out of time/energy for that currently, which is a shame because it was nearly there; it's just a matter of pulling it all together. We currently also use a .NET worker to wrap our API as a Linux systemd service; this hosts our 2 API mirrors, which constantly sync from the master API to local MongoDB instances and are load balanced through a Cloudflare worker. Our master API/website still runs on Windows/MongoDB/SQL Server.
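For illustration, the planned Linux worker would be along these lines (a sketch only, assuming the Microsoft.Extensions.Hosting.Systemd package; the class name and scheduling are placeholders, not the actual implementation):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Long-running background task: periodically run the offline import/dedup
// and post the resulting batch of changes to the API.
public class ImportWorker : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // 1. fetch provider data, 2. dedupe against current POIs,
            // 3. post the prepared batch to the API (details omitted here)
            await Task.Delay(TimeSpan.FromHours(24), stoppingToken);
        }
    }
}

public class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .UseSystemd() // integrates with systemd lifetime/logging when run as a unit
            .ConfigureServices(services => services.AddHostedService<ImportWorker>())
            .Build()
            .Run();
}
```

The same generic-host pattern is what lets the existing API mirrors run as systemd services, so the import worker would just be another unit on a cheap Linux VM.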

So to make large imports less intensive we need to optimise the comparison and de-duplication, and we need to fully offload the pre-import process so it can run on a different server. When we started in 2011 it ran as a little GUI/console app on the server itself, but as it got more complex it was useful to see the numbers of duplicates etc. and run at times of low load, so it moved to being part of the admin website.

Currently I've been running the imports manually, using the little GUI to prepare the batch JSON file and then uploading it, so the objective of the final process is to completely automate that. There also needs to be an API call to inform the database of the Date Last Imported for each provider, as currently the manual process doesn't update that.
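As a rough idea of what that automation step could look like (the endpoint paths and parameters below are placeholders for illustration, not the real OCM API):

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class BatchUploader
{
    // Hypothetical sketch: upload the prepared batch JSON, then record the
    // Date Last Imported for the provider (the missing step in the manual flow).
    public static async Task UploadBatchAsync(string batchJson, int providerId, string apiKey)
    {
        using var client = new HttpClient { BaseAddress = new Uri("https://api.example.org/") };

        var content = new StringContent(batchJson, Encoding.UTF8, "application/json");
        var response = await client.PostAsync($"import/batch?key={apiKey}", content);
        response.EnsureSuccessStatusCode();

        // placeholder endpoint for updating the provider's Date Last Imported
        await client.PostAsync($"import/provider/{providerId}/lastimported?key={apiKey}", null);
    }
}
```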

@zymurgic
Contributor Author

zymurgic commented Jun 23, 2020 via email

@webprofusion-chrisc
Member

Well, any help is much appreciated! As background, I work full time developing https://certifytheweb.com, which is an app for Let's Encrypt/ACME certificates on Windows, and it's taking up all my time currently. I'm also not a regular explorer of new charging stations (I use 2 regularly), which makes my level of interest in public charging rather low at times. Meanwhile, the API is churning out 3 million queries a month/1TB of data, so while we don't have very many active contributors we do appear to have plenty of consumers.

The advantage of the OCM stuff having moved to .NET Core (which will soon be called .NET 5) is that it's now running on very current technology rather than something that was heading towards legacy. Plus, most of it can now run on Docker/Linux (especially the API side), which makes it cheaper to scale, and again it means dealing with very current technology from a skill-sharpening point of view. The overall OCM software is a bit complex at the high level, but if you break it down into small chunks it's OK, and there are parts of it that are mostly unused or rarely updated.
