Batch action runner state updates into occasional DB flushes #35737

@SpeedyCraftah

Feature Description

I'll try not to bore you with the specifics of my setup, but here goes.
I recently noticed that my server's CPU usage has been sitting at around 10% almost constantly, which is unusual since it normally idles at around 5-6%.
After some digging, top showed that the load was coming from Gitea, which was also reading from and writing to the disk a lot. Running fatrace showed that the disk activity was on the SQLite3 database that I have set up with Gitea.

This was never the case in the months I had been running Gitea, so I found it unusual. After enabling SQL logging, I saw that almost every second Gitea was reading and writing the state of my recently set up action runners, once each time a runner pinged the Gitea instance to let it know that it was still alive and running (see screenshots). I have 2 action runners on different servers connected through a WireGuard tunnel. When I temporarily shut down the tunnel, effectively stopping the pings from the runners, Gitea went back to its usual low-resource idle state and the elevated CPU and disk usage disappeared (see screenshots also).

I have 2 runners with the default setting of pinging Gitea every 2 seconds (fetch_interval: 2s), which works out to roughly one ping per second in total, and each ping performs a DB read plus a write. This explains the elevated resource usage I was seeing. My server is quite low-spec and has a relatively slow disk (I prefer to squeeze as much performance as I can out of the same hardware), which is why these readings show up so clearly, but I think this highlights an inefficiency in how Gitea handles runner pings.

To mitigate this, I could simply reduce how often my runners ping Gitea, but then it would take longer for a workflow job to be picked up. I believe that would be treating the symptom rather than the cause, which is that the database is accessed on every single runner ping.

I propose that runner pings are batched in memory, so that a runner's last seen date isn't flushed to the database immediately on each ping. Either the states are tracked in some kind of hashmap and flushed to the database in one operation, or, even better, a runner's last seen date is flushed to the database ONLY if X minutes (or even seconds) have passed since the last flush, checked on each runner ping. Even if a Gitea instance crashes, in the worst case the runner's last seen date will be out of date by X minutes on startup. Since that value often isn't very important anyway, because it gets updated so frequently (unless the runner is offline), I think this is a more than acceptable trade-off for the resource usage we would save.

Screenshots

Before shutting down the runner tunnels:
Image

SQL log before shutting down the runner tunnels:
Image

After shutting down the runner tunnel (effectively stopping the runner pings), Gitea is no longer the top entry or even close:
Image

Metadata

Labels

topic/gitea-actions: related to the actions of Gitea
type/proposal: The new feature has not been accepted yet but needs to be discussed first.
