Use in-memory store for station lookups. #111
Another advantage of SQLite is that we can run queries on other fields too, like stations near another station, or stations within x km of a given set of coordinates. This might make the code a little more future-proof.

First tests with SQLite and memcached. Tests are run as 50 station queries from a Brussels-South liveboard. They are run a single time, to simulate a station request (test 1), and 100 times, for a server average across multiple clients (test 2). Tests are run inside a VirtualBox instance. For the memcached queries, a third test was executed, to see how it scaled to 500 × 50 queries. The database is loaded and stored in memory (static) on every test run.
The VirtualBox instance and differences in processing power might skew these results, but the relative differences should be about the same.
Conclusion

This was tested with a basic prototype implementation of SQLite3, but it seems the data is still too small to get a real performance difference here, due to the loading overhead. However, memcached proved we can get a 55% speed improvement by using an in-memory cache.

Edit: MySQL and APC

Since the overhead of SQLite is likely caused by loading the file, and not so much by the query itself, I decided to give it a run on SQL (MariaDB). Results are below, using the same testing method. To prevent any outside impact on the results, the normal tests were run again and used as a reference. (This prevents processor load from other applications from distorting the test results.)
As we can see, SQL removes the speed limit imposed by the SQLite loading. It also comes relatively close to the normal approach combined with memcached. A possible disadvantage is that this requires an SQL server. When combined with APC, we see a 5 to 10% speed improvement over memcached.

The final results are significant. When using SQL and APC combined, a benchmark starting from an empty cache is handled 40% faster than now. When running a benchmark multiple times, the average single benchmark is handled up to 77% faster than a request at this moment.

How can a single benchmark starting from an empty cache still improve performance?

The caching mechanism (APC or memcached) stores the stations one by one, based on their query (a sketch follows below).

Conclusion

APC should be easy to install on the server, and is extremely easy to implement as well. Therefore, we should add it without hesitation. |
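As an illustration, per-query caching with APC could look roughly like this. This is a minimal sketch, assuming the APCu extension (`apcu_*` functions; older PHP versions use `apc_*`); `getStationsUncached()` is a hypothetical stand-in for the current slow lookup:

```php
<?php
// Minimal sketch of per-query caching: each station query is stored under its
// own key, so the cache fills up one lookup at a time.
// Assumes the APCu extension; getStationsUncached() is a hypothetical
// stand-in for the existing slow path that loops over the stations file.
function getStationsCached(string $name): array
{
    $key = 'stations:' . strtolower($name);

    $success = false;
    $stations = apcu_fetch($key, $success);
    if ($success) {
        return $stations; // cache hit: no file loading, no looping
    }

    // Cache miss: do the slow lookup once, then keep the result for an hour.
    $stations = getStationsUncached($name);
    apcu_store($key, $stations, 3600);
    return $stations;
}
```

This also explains the empty-cache result: within a single 50-query benchmark, every repeated station name after the first is already a cache hit.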
The final implementation reduces a liveboard benchmark to a value between 0.75 and 5 msec. This is a reduction of 95 to 99.17%. Therefore, there's no need for SQL here. However, the benchmarks above give some idea of the response times of different technologies, and might be useful for other (iRail) projects too. Final results:
|
Great work @Bertware! For some reason I assumed that PHP would optimize reading files from disk, yet your tests clearly indicate this is not the case. I have left some comments in pull request #112:
|
I agree on abstracting using PSR-6 and on running while it's not available. Currently it already works without APC available; I only overlooked the clear-cache statement in the tests (all code is already wrapped in if statements that check whether APC is available).

For the iRail servers, I would not use memcached. The additional improvements from APC are way too big to ignore. Note that the current code achieves way better performance than mentioned in the tests above (I'd say APC easily halves the time required by memcached). Using memcached might lower the priority of using APC ("we're already using caching, so APC is not top priority"). Therefore, I would do it right the first time, and just install APC.

As I discussed in #88, we do need to move to PHP 7 too. The install of APC could be combined with the upgrade to PHP 7. PHP 7 is the way to go anyhow, and shouldn't break too much code (if it breaks anything at all). That 2000x speedup should also allow serving more requests from the same hardware, which in turn should lower costs. (I know that stations is only part of iRail, but any improvement is good, and I might look into improving the iRail API as well.) |
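For reference, the PSR-6 abstraction mentioned above could look roughly like this. A minimal sketch, assuming a PSR-6 pool implementation is available; `getStationFromIDUncached()` is a hypothetical stand-in for the current lookup:

```php
<?php
// Minimal PSR-6 sketch: any pool (APC, memcached, ...) can be plugged in, and
// with no pool configured the code still runs, just via the slow path.
// getStationFromIDUncached() is a hypothetical stand-in for the current lookup.
use Psr\Cache\CacheItemPoolInterface;

function getStationFromIDCached(string $id, ?CacheItemPoolInterface $pool = null)
{
    if ($pool === null) {
        return getStationFromIDUncached($id);
    }

    // md5 keeps the key free of characters PSR-6 reserves, such as ':' and
    // '/' in URI-style station ids.
    $item = $pool->getItem('station.' . md5($id));
    if (!$item->isHit()) {
        $item->set(getStationFromIDUncached($id));
        $item->expiresAfter(3600); // one hour; station data rarely changes
        $pool->save($item);
    }
    return $item->get();
}
```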
Use APC as in-memory store for extreme performance gains. #111
I believe this issue can be closed! |
The problem: Too long execution time for station lookups
At this moment, both `getStations` and `getStationFromID` loop over the entire stations list. This consumes valuable time and resources. `getStations` also does quite some text replacing to match different spellings of the same station.

Given that a single liveboard can easily contain 25 entries, every reduction of the execution time by 10 ms would result in a possible reduction of up to 250 ms. This means even small gains are worth it. As a possible side effect, the PHP code might run more efficiently, offloading the server a little.
Option 1: SQLite
The entire matching could be moved to an SQLite database. Here, we would have a table containing the standard name and the alternative name. A station could have multiple entries here, one for every special spelling of that station. If a station doesn't have any special/different spellings, it's not in the table.
1. `getStations($name)` would perform a query, checking this table to see whether we're handling a special spelling. If so, the standard name would be returned; if not, we continue working with the name parameter as given.
2. We now need to look up the station with its data. For this purpose, we could load the JSON-LD into SQLite too. We could set the id as primary key and index the station name. Searching would be done on the name parameter, using `LIKE '$name%'`, and loading the entire matching rows (see the sketch after this list). We parse the rows and return. This is expected to be a performance boost compared to looping over the entire file.
3. Searching by id would happen on the same table mentioned in step 2, only we'd now search on id. Again, we load the entire row, parse and return.
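A rough sketch of what these steps could look like with PHP's PDO SQLite driver. The table and column names (`alternatives`, `standard_name`, `stations`) are assumptions for illustration, not existing code:

```php
<?php
// Rough sketch of the three lookup steps above, using PDO's SQLite driver.
// Table and column names are hypothetical. Prepared statements keep the
// user-supplied $name out of the SQL string.
$db = new PDO('sqlite:stations.sqlite3');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$name = 'brussel zuid'; // example input

// Step 1: resolve a special spelling to the standard name, if present.
$stmt = $db->prepare('SELECT standard_name FROM alternatives WHERE alternative = :name');
$stmt->execute([':name' => $name]);
$standard = $stmt->fetchColumn();
if ($standard !== false) {
    $name = $standard;
}

// Step 2: prefix search on the indexed name column.
$stmt = $db->prepare('SELECT * FROM stations WHERE name LIKE :pattern');
$stmt->execute([':pattern' => $name . '%']);
$stations = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Step 3: lookup by id uses the primary key on the same table.
$stmt = $db->prepare('SELECT * FROM stations WHERE id = :id');
$stmt->execute([':id' => 'some-station-id']);
$station = $stmt->fetch(PDO::FETCH_ASSOC);
```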
We could load the stations into an SQLite database (containing both tables) using a PHP script, which could be run manually after the build script, or called from the build script itself.
To load the special spellings into SQLite, we could use the same PHP script and a station-alternatives.csv file, which would contain the same data as the table (original name | special spelling). This way, we can easily add names and run the script to regenerate the database table. An alternative would be to allow an unlimited number of special spellings on one line, so you'd have the original name followed by all its special spellings.
This would also clean up the code a little, since all matching code would be replaced by a single query.
Option 2: In-memory store
We could consider keeping the results in a memory cache. This cache could be filled by a cron job, so we would always have a cache hit. This seems an easier, but less flexible, solution. It also seems more server- and config-dependent, compared to the guaranteed but likely lower performance of SQLite. A sketch of the cron-driven approach follows below.
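A sketch of what such a cron-driven warm-up could look like, using memcached rather than APC, since a CLI script started by cron does not share APC's memory with the web server. The file name and JSON-LD layout (a top-level `@graph` array with `name` fields) are assumptions based on the build output:

```php
<?php
// Hypothetical cache-warming script, run from cron so lookups always hit the
// in-memory store, e.g.: */15 * * * * php warm-cache.php
// Uses the Memcached extension; the stations.jsonld layout is an assumption.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$graph = json_decode(file_get_contents('stations.jsonld'), true)['@graph'];

foreach ($graph as $station) {
    // One entry per station, keyed on the lowercased name. No expiry (0):
    // the cron job itself keeps the data fresh.
    $cache->set('stations:' . strtolower($station['name']), $station, 0);
}
```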
Option 3: Combine
The best of both worlds: we could use SQLite backed by an in-memory store. Every cache miss would still get good performance, and the memory store would give a little extra boost.
These performance improvements might help #88 a little. (Not a duplicate, as this is one specific approach for a specific speedup, whereas #88 is a lot broader.)
Also related to iRail/iRail#265
I could implement the SQLite version in a fork to compare performance, after which we could evaluate whether to switch. Input and discussion wanted. @pietercolpaert