Optimize the realtime trips updater #100

Closed
jamespfennell opened this issue Apr 1, 2023 · 11 comments

@jamespfennell
Owner

For the new NYC buses transit system (#97), I'm seeing GTFS realtime feed updates taking up to 7 seconds. To be clear, this is probably not a huge deal from a features perspective, because the accuracy of transit data is O(30 seconds), so having slightly stale data is fine. And the update process runs in the background, so it doesn't impact HTTP latency. The main problem is that for those 7 seconds Transiter is probably using a non-trivial amount of CPU/memory/IO, and thus requires a larger machine for deployment. By analogy, when the Python-to-Go migration was merged, the minimum requirements to run Transiter with the NYC subway system dropped from ~2.5 CPUs to ~0.5 CPUs because the update process became 5 times more efficient. It's literally cheaper to run Transiter now.

As a first pass I ran pprof on a Transiter instance with the new system:

go tool pprof "http://localhost:8082/debug/pprof/profile?seconds=300"  

which dumped a new profile in my ~/pprof directory. (Note that you can always take a pprof profile of a running Transiter instance using the admin HTTP endpoint /debug/pprof/profile.) This is the flamegraph, limited to the UpdateFeed function, which drives the feed update:

[image: flamegraph of UpdateFeed]

There are three big consumers of CPU time, in decreasing order:

  1. db.(*Queries).UpdateTripStopTime - this function sends an update SQL query to Postgres and is invoked once for every single stop time update in the feed. That is a lot of stop time updates! I suspect the reason this is taking so long isn't that the SQL query itself is slow; it's the overhead of calling Postgres from the Go code for each row. It would be awesome to batch these SQL queries somehow. This is non-trivial, though, because it's an update query and Postgres/sqlc doesn't seem to offer a simple solution here.
  2. db.(*Queries).InsertServiceMapStop - this inserts a single node in a service map. Again, batching this would probably help dramatically. This is easier than the previous case because it's an insert query, so we can probably use sqlc's support for CopyFrom (see the sketch after this list).
  3. common.Hash - this takes a hash of the GTFS data for a single trip and uses it to decide if the trip needs to be updated. If not, the trip update is skipped. This was introduced in b1bc7e1 and overall is a pretty good optimization - it halves the time needed to update the NYC subway. However, there may be a way to make the hashing faster.
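
For item 2, pgx's CopyFrom API is the natural batching primitive: it streams many rows to Postgres in a single COPY operation instead of issuing one INSERT per row. The following is only a minimal sketch of the idea; the table name, column names, and argument shapes are illustrative assumptions, not Transiter's actual schema or code:

```go
package example

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v4"
)

// insertServiceMapStops sketches batching the service map inserts with pgx's
// CopyFrom, which sends every row in a single COPY operation. The table name,
// column names, and argument shape are illustrative assumptions only.
func insertServiceMapStops(ctx context.Context, conn *pgx.Conn, mapPk int64, stopPks []int64) error {
	rows := make([][]interface{}, 0, len(stopPks))
	for position, stopPk := range stopPks {
		rows = append(rows, []interface{}{mapPk, stopPk, int32(position)})
	}
	if _, err := conn.CopyFrom(
		ctx,
		pgx.Identifier{"service_map_vertex"},
		[]string{"map_pk", "stop_pk", "position"},
		pgx.CopyFromRows(rows),
	); err != nil {
		return fmt.Errorf("batch inserting service map stops: %w", err)
	}
	return nil
}
```

sqlc's CopyFrom support essentially generates a typed wrapper around this same pgx call.
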
@jamespfennell
Owner Author

Optimization 2 seems really easy: e947f72!

@jamespfennell
Owner Author

With this commit the InsertServiceMapStop portion of the flamegraph becomes tiny. Going to merge that commit.

[image: flamegraph after the service map batching commit]

jamespfennell added a commit that referenced this issue Apr 1, 2023
Per the report in the bug, this improves performance of the update task, and the
code is as simple or even simpler.
@jamespfennell
Owner Author

jamespfennell commented Apr 1, 2023

I did a first pass at the stop times too in 119c651. The results here look good, though not amazing. The update times seem to be halved compared to master before this ticket, and in the flamegraph updateStopTimesInDB is much smaller:

[image: flamegraph with updateStopTimesInDB much smaller]

At this point the code is a little simpler, too.

This code, though, operates at the level of a single trip. I wonder whether, to get a really great improvement, we need to batch stop time updates across all trips? That would make the code potentially more complex, but the savings could be awesome.
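
To make that idea concrete, here is a rough sketch of what cross-trip batching could look like under a delete-then-reinsert scheme: delete every affected trip's stop times in one query, then re-insert all the new rows in one COPY. The schema, types, and names are illustrative assumptions, not Transiter's actual code:

```go
package example

import (
	"context"

	"github.com/jackc/pgx/v4"
)

// stopTimeRow is an illustrative row shape for a trip stop time.
type stopTimeRow struct {
	tripPk       int64
	stopPk       int64
	stopSequence int32
}

// replaceStopTimes deletes the stop times of all updated trips in a single
// query and then re-inserts the new rows in a single COPY, instead of issuing
// per-trip (or per-row) statements.
func replaceStopTimes(ctx context.Context, tx pgx.Tx, tripPks []int64, rows []stopTimeRow) error {
	if _, err := tx.Exec(ctx,
		`DELETE FROM trip_stop_time WHERE trip_pk = ANY($1)`, tripPks); err != nil {
		return err
	}
	copyRows := make([][]interface{}, len(rows))
	for i, r := range rows {
		copyRows[i] = []interface{}{r.tripPk, r.stopPk, r.stopSequence}
	}
	_, err := tx.CopyFrom(
		ctx,
		pgx.Identifier{"trip_stop_time"},
		[]string{"trip_pk", "stop_pk", "stop_sequence"},
		pgx.CopyFromRows(copyRows),
	)
	return err
}
```
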

By the way, a nice thing about playing around with these optimizations is that the testing coverage is quite good, so there is not much concern about introducing regressions.

@jamespfennell
Owner Author

The last flamegraph isn't quite right - it turns out there's another goroutine that handles "copy from" operations, and it's reported as taking a lot of time. The really weird thing is that all of the time seems to be spent constructing errors, even though no errors are reported:

[image: flamegraph of the "copy from" goroutine, dominated by error construction]

@jamespfennell
Owner Author

Looking through the source code of the pgx driver, it seems the function encodePreparedStatementArgument is no longer in mainline. In fact, it seems that the "copy from" part of pgx changed a lot between v4 (which Transiter is using) and v5 (the most recent pgx version). I think all of Transiter's dependencies should be updated before doing more performance work.

jamespfennell added a commit that referenced this issue Apr 2, 2023
All of the diffs are from updating pgx to v5, which was the point of this
anyway.
@jamespfennell
Owner Author

With the dependencies updated and using the previous commit mentioned above (not yet merged), the results here look pretty nice. I think I'm seeing about a 60% performance improvement overall, both from the time taken for each update (it has gone from ~8 seconds to ~3 seconds) and from the total time spent in the update function in pprof (from ~16 seconds to ~6 seconds, over a 300-second profile).

@jamespfennell
Owner Author

The current state of play on this branch:

[image: flamegraph of the current state of the branch]

The three main things we can speed up here are:

  • Making the trip hashing algorithm better (common.HashValue). Right now it marshals the trip to JSON and takes a hash of the bytes; presumably this would be much faster if we just hashed the struct directly somehow.
  • Batching UpdateTrip (second column under realtime.(*stagedUpdates).run).
  • Batching MarkStopTimesPast (first column under realtime.(*stagedUpdates).run).

The two batching items are a little trickier because we can't simply delete the old rows and reinsert them. We may need to write some custom pgx driving code around an UPDATE ... FROM SQL query.
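
A minimal sketch of what that custom driving code might look like, using unnest-ed array parameters so that a single UPDATE ... FROM statement touches many rows. Everything here (the table, the updated column, the pgx v5 usage) is an illustrative assumption rather than Transiter's actual queries:

```go
package example

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5"
)

// updateTripsSQL updates many trips in one statement by zipping two array
// parameters into rows with unnest. The column being updated is illustrative.
const updateTripsSQL = `
UPDATE trip
SET started_at = args.started_at
FROM (
    SELECT unnest($1::bigint[]) AS pk,
           unnest($2::timestamptz[]) AS started_at
) AS args
WHERE trip.pk = args.pk`

func updateTrips(ctx context.Context, conn *pgx.Conn, pks []int64, startedAts []time.Time) error {
	// pgx encodes the Go slices as Postgres arrays, so this is one round trip
	// regardless of how many trips are being updated.
	_, err := conn.Exec(ctx, updateTripsSQL, pks, startedAts)
	return err
}
```
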

Other than these three, I think we're at the limit of what we can do, at least in idiomatic Go. Looking through other parts of the flamegraph, non-trivial time is being spent in e.g. map accesses. I think the fact that this is now the bottleneck is a good sign that we've optimized the code well.

jamespfennell added a commit that referenced this issue Apr 2, 2023
From the investigation on the bug (#100), the main slowness in the realtime
updater comes from sending stop time updates one-at-a-time to Postgres. With
this commit the updates are changed to delete+insert, and both the delete and
insert queries are batched across the whole update. For the NYC buses, this
commit seems to make the updater at least 50% faster.
@jamespfennell
Owner Author

So it turns out sqlc/pgx have a really simple way to do batch updates, using :batchexec instead of :exec. Using this makes the two SQL queries mentioned in the last comment negligible in the flamegraph. The last remaining piece is optimizing the hash calculation - it is taking up 25% of the update time right now!
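
For reference, with a query annotated `-- name: UpdateTripStopTime :batchexec`, sqlc generates a method that takes a slice of parameter structs and sends every update in one pgx batch. The sketch below shows how such a method is typically driven; the generated identifiers follow sqlc's conventions but are assumptions rather than Transiter's actual generated code, and imports are omitted:

```go
// q is the sqlc-generated *db.Queries value; params holds one entry per stop
// time update. The generated method queues all of them into a single pgx batch.
func updateStopTimes(ctx context.Context, q *db.Queries, params []db.UpdateTripStopTimeParams) error {
	results := q.UpdateTripStopTime(ctx, params)
	defer results.Close()
	var firstErr error
	results.Exec(func(i int, err error) {
		if err != nil && firstErr == nil {
			firstErr = err
		}
	})
	return firstErr
}
```
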

jamespfennell added a commit that referenced this issue Apr 2, 2023
This improves performance by the same mechanism as the other batching commits.
@jamespfennell
Owner Author

I did a proof-of-concept of a custom trip hasher in bfb910c and, as expected, performance is much better - another 25% reduction in the time taken to do updates. The code is not crazy bad, so I'm going to commit it to the GTFS library and use it. Then this bug will be finished.
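
The gist of the direct-hashing approach is to feed each field of the trip into the hash rather than going through encoding/json. This is a simplified sketch with a made-up trip struct, not the code in bfb910c or in the gtfs library:

```go
package example

import (
	"encoding/binary"
	"hash/fnv"
	"time"
)

// trip and stopTime are simplified stand-ins for the GTFS realtime types.
type trip struct {
	ID        string
	RouteID   string
	StopTimes []stopTime
}

type stopTime struct {
	StopID  string
	Arrival time.Time
}

// hashTrip writes each field directly into the hash, avoiding the JSON
// marshal and the allocations that come with it.
func hashTrip(t *trip) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	h.Write([]byte(t.ID))
	h.Write([]byte{0}) // separator so adjacent fields can't collide
	h.Write([]byte(t.RouteID))
	h.Write([]byte{0})
	for _, st := range t.StopTimes {
		h.Write([]byte(st.StopID))
		h.Write([]byte{0})
		binary.LittleEndian.PutUint64(buf[:], uint64(st.Arrival.Unix()))
		h.Write(buf[:])
	}
	return h.Sum64()
}
```
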

One other thing I've noticed in all of the profiles is that a lot of time is being spent in garbage collection. The GTFS library creates a lot of pointers, and maybe this is why the GC is so busy? Now that generics have landed, all of these pointers could be replaced with an optional type, which would be much more GC-friendly. This would be a big breaking change, though.
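
For illustration, an optional type along these lines keeps values inline instead of behind pointers, giving the GC far fewer objects to trace. This is only a sketch of the idea, not an API the gtfs library actually exposes:

```go
package example

// Optional stores the value inline, so a struct full of Optional fields is a
// single allocation, unlike a struct full of *T pointer fields.
type Optional[T any] struct {
	value T
	valid bool
}

func Some[T any](v T) Optional[T] { return Optional[T]{value: v, valid: true} }
func None[T any]() Optional[T]    { return Optional[T]{} }

// Get returns the value and whether it is present.
func (o Optional[T]) Get() (T, bool) { return o.value, o.valid }
```
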

@jamespfennell
Owner Author

jamespfennell commented Apr 3, 2023

I just took a "heap profile" (first time ever doing that!) and according to the inuse_space view, 70% of the memory is being allocated in the GTFS realtime parser. The profile seems a little janky though and is only showing a few functions:

[image: heap profile, inuse_space view]

There is also an alloc_space view, in which the distribution follows the CPU distribution.

Edit: my understanding of the profile is incorrect here. The profile is just showing live memory, and so doesn't tell us about the allocation profile of the program.
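
For reference, a heap profile can also be written from inside the process with the standard library's runtime/pprof (independent of the admin HTTP endpoint); go tool pprof's -sample_index flag then switches between the inuse_space and alloc_space views:

```go
package example

import (
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the heap profile to a file that go tool pprof can open.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // refresh the statistics behind the live-memory (inuse) view
	return pprof.WriteHeapProfile(f)
}
```
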

@jamespfennell
Owner Author

I just committed an update to the GTFS library, which incorporates a much faster hash function for trips (jamespfennell/gtfs@02cdcf2). As expected, CPU usage in the update process decreased again.

I rolled out all of the changes in this issue to demo.transiter.dev, and am seeing an awesome reduction in the amount of time taken to do updates for the NYC subway numbered trains:

[image: graph of NYC subway numbered-trains update times on demo.transiter.dev]

The graph uses a 5-minute sliding window, which is why the decrease is smooth. Note that the bottom of the axis is 0.2 seconds. The average update time thus dropped from ~800ms to ~350ms! According to my CPU monitoring, this seems to have reduced Transiter's CPU usage by 10% of a vCPU (maybe about a 50% reduction, but I'm not sure because the node is multi-tenanted with other projects I run).

Finally, the last profile I took shows maybe 45% of the CPU time being spent GCing memory. I think this is potentially expected: with these optimizations Transiter is highly IO bound, so the absolute CPU time is small and GC makes up a larger share of it. But we can revisit this later, of course.
