Use in-memory store for station lookups. #111
The problem: Too long execution time for station lookups
At this moment, both
Given that a single liveboard can easily contain 25 entries, every 10 ms shaved off a single lookup could save up to 250 ms per liveboard. Even small gains are therefore worth it. As a possible side effect, the PHP code might run more efficiently, offloading the server a little.
Option 1: SQLite
The entire matching could be moved to an SQLite database, with a table that maps each standard name to an alternative name. A station could have multiple entries here, one for every special spelling of that station. If a station doesn't have special/different spellings, it's not in the table.
We could load the stations into an SQLite database (containing both tables) using a PHP script, which could be run manually after the build script (or be called from the build script).
To load the special spellings into SQLite, we could use the same PHP script and a station-alternatives.csv file, which would contain the same data as the table (original name | special spelling). This way, we can easily add names and run the script to regenerate the database table. An alternative would be to allow an unlimited number of special names on one line, so you'd have the original name followed by all of its special spellings.
This would also clean up the code a little, since all matching code would be replaced by a single query.
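The "single query" idea could look roughly like this. It's a minimal sketch in Python with the stdlib `sqlite3` module rather than PHP, purely for illustration; the schema, column names, and station spellings are all made up, not the project's actual data:

```python
import sqlite3

# Hypothetical schema: one row per special spelling of a station.
# Stations without alternative spellings are simply absent from the table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE station_alternatives (
        standard_name    TEXT NOT NULL,
        alternative_name TEXT NOT NULL
    )
""")

# Rows as they might come from a station-alternatives.csv file
# (original name | special spelling); names here are invented examples.
rows = [
    ("Brussels-South", "Bruxelles-Midi"),
    ("Brussels-South", "Brussel-Zuid"),
    ("Ghent-Sint-Pieters", "Gent-Sint-Pieters"),
]
conn.executemany("INSERT INTO station_alternatives VALUES (?, ?)", rows)

def resolve(name):
    """All matching code replaced by a single query."""
    row = conn.execute(
        "SELECT standard_name FROM station_alternatives WHERE alternative_name = ?",
        (name,),
    ).fetchone()
    # Not in the table means the name has no special spellings.
    return row[0] if row else name

print(resolve("Brussel-Zuid"))     # Brussels-South
print(resolve("Antwerp-Central"))  # falls through unchanged
```

The fallthrough on a missing row matches the rule above: stations without special spellings never appear in the table.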
Option 2: In-memory store
We could consider keeping the results in a memory cache. This cache could be filled by a cron job, so we would always have a cache hit. This seems an easier, but less flexible, solution. It also seems more server- and configuration-dependent, compared to the guaranteed, but likely lower, performance of SQLite.
Option 3: Combine
The best of both worlds: we could use SQLite fronted by an in-memory store. Every cache miss would still get good performance, and the memory store would give a little extra boost.
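The combined option is a classic read-through cache. A minimal sketch, again in Python for illustration: a plain dict stands in for the real in-memory store (memcached/APC), and the SQLite lookup is the slow path that only runs on a miss.

```python
import sqlite3

# Slow path: the SQLite alternatives table (contents invented for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_alternatives (standard_name TEXT, alternative_name TEXT)"
)
conn.execute(
    "INSERT INTO station_alternatives VALUES ('Brussels-South', 'Brussel-Zuid')"
)

cache = {}  # stand-in for memcached/APC

def resolve(name):
    if name in cache:
        return cache[name]          # cache hit: no database work at all
    row = conn.execute(
        "SELECT standard_name FROM station_alternatives WHERE alternative_name = ?",
        (name,),
    ).fetchone()
    result = row[0] if row else name
    cache[name] = result            # every miss warms the cache for next time
    return result

resolve("Brussel-Zuid")  # first call: SQLite query, result cached
resolve("Brussel-Zuid")  # second call: served from memory
```

Even on a cold cache each miss pays only one query, and every subsequent request for the same station gets the memory-store boost.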
Also related to iRail/iRail#265
I could implement the SQLite version in a fork to compare performance, after which we could evaluate whether to switch? Input and discussion wanted. @pietercolpaert
Another advantage of SQLite is that we can run queries on other fields too, such as stations near another station, or stations within x km of a given set of coordinates. This might make the code a little more future-proof.
First tests with SQLite and memcached.
Tests are run as 50 station queries from a Brussels-South liveboard. They are run a single time, to simulate one station request (test 1), and 100 times for a server average across multiple clients (test 2). Tests are run inside a VirtualBox instance. For the memcached queries, a third test was executed to see how it scaled to 500 × 50 queries. The database is loaded and stored in memory (static) on every test run.
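The testing method above can be sketched as a small timing harness. This is an illustrative Python stand-in, not the actual benchmark code: `lookup()` is a placeholder for the real matching code, and the station names are dummies.

```python
import time

def lookup(name):
    # Placeholder for the actual station matching code under test.
    return name.lower()

# Stand-in for the 50 entries of a liveboard.
stations = ["Station %d" % i for i in range(50)]

def benchmark():
    """One benchmark = 50 station queries; returns elapsed milliseconds."""
    start = time.perf_counter()
    for name in stations:
        lookup(name)
    return (time.perf_counter() - start) * 1000

single = benchmark()                                   # test 1: one request
average = sum(benchmark() for _ in range(100)) / 100   # test 2: 100-run average
```

Averaging over 100 runs smooths out scheduler noise, which matters inside a VM.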
The VirtualBox instance, and differences in processing power, might skew these results, but the relative differences should be about the same.
This was tested with a basic prototype implementation of SQLite3, but it seems the data set is still too small to see a real performance difference here, due to the loading overhead. However, memcached proved we can get a 55% speed improvement by using an in-memory cache.
Edit: MySQL and APC
Since the overhead of SQLite is likely caused by loading the file, and not so much by the query itself, I decided to give it a run on an SQL server (MariaDB). Results are below, using the same testing method. To prevent outside factors from affecting the comparison, the normal tests were run again and used as a reference (this keeps processor load from other applications from distorting the results).
As we can see, SQL removes the speed limit imposed by SQLite's loading. It also comes relatively close to the normal approach combined with memcached. A possible disadvantage is that this requires an SQL server.
When combined with APC, we see 5 to 10% speed improvements over memcached. The final results are significant: when using SQL and APC combined, a benchmark starting from an empty cache is handled 40% faster than now. When running the benchmark multiple times, the average single benchmark is handled up to 77% faster than a request at this moment.
How can a single benchmark starting from an empty cache still improve performance?
The caching mechanism (APC or memcached) stores the stations one by one, keyed by their query.
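In other words, the cache is warmed during the benchmark itself: because a liveboard repeats station names, only the first occurrence of each name pays the slow lookup. A tiny sketch (Python stand-in, invented station names; `name.upper()` is a dummy for the slow matching code):

```python
cache = {}
misses = 0

def resolve(name):
    global misses
    if name not in cache:
        misses += 1                 # only the first occurrence is slow
        cache[name] = name.upper()  # placeholder for the slow lookup
    return cache[name]

# A liveboard repeats stations; repeats within the same run are cache hits.
for name in ["Ghent", "Bruges", "Ghent", "Ghent", "Bruges"]:
    resolve(name)

print(misses)  # 2: one slow lookup per unique station, even from a cold cache
```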
APC should be easy to install on the server, and is extremely easy to implement as well. Therefore, we should add it without hesitation.
The final implementation reduces a liveboard benchmark to between 0.75 and 5 ms, a reduction of 95% to 99.17%. Therefore, there's no need for SQL here. However, the benchmarks above give some idea of the response times of the different technologies, and might be useful for other (iRail) projects too.
Great work @Bertware! For some reason I supposed that PHP would optimize getting files from disk, yet your tests clearly indicate this is not the case.
I have left some comments in pull request #112:
I agree on abstracting using PSR-6 and on running while it's not available. Currently it already works without APC available; I only overlooked the clear-cache statement in the tests (all code is already wrapped in if statements that check whether APC is available).
For the iRail servers, I would not use memcached. The additional improvements from APC are way too big to ignore. Note that the current code achieves far better performance than mentioned in the tests above (I'd say APC easily halves the time required by memcached). Using memcached might lower the priority of installing APC ("we're already using caching, so APC is not top priority"). Therefore, I would do it right the first time and just install APC.
As I discussed in #88, we do need to move to PHP 7 too. The installation of APC could be combined with upgrading to PHP 7. PHP 7 is the way to go anyhow, and shouldn't break too much code (if it breaks anything at all). That 2000× speedup should also allow serving more requests from the same hardware, which in turn should lower costs. (I know that stations is only one part of iRail, but any improvement is good, and I might look into improving the iRail API as well.)