Number of audit tasks generated-per-min is not configurable #93

perama-v · 2023-03-21T22:34:00Z

Description

glados-audit can be viewed as a funnel as follows:

```mermaid
flowchart TD
subgraph generate[Trigger every AUDIT_SELECTION_PERIOD_SECONDS]
s1[Strategy Latest]
s2[Strategy Random]
s3[Strategy Random]
end
s1 & s2 & s3 --> |send KEYS_PER_PERIOD tasks|chan[Audit task channel]

chan --> |take 1 task|a1 & a2 & a3 & a4
subgraph fulfill[Continuously replenish with new threads once they complete]
a1[Auditing thread 1]
a2[Auditing thread 2]
a3[...]
a4[Auditing thread CONCURRENCY]
end
a1 & a2 & a3 & a4 --> node[Portal node]
```

At present, the CLI can control the throughput as follows:

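The --concurrency <n> flag controls the maximum funnel output rate.
The --strategy <strat> flag controls the kind of tasks generated. It has a limited effect on throughput, e.g., by passing --strategy random multiple times, since each additional strategy sends another KEYS_PER_PERIOD tasks per period.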

The two variables that control the maximum funnel input rate are:

KEYS_PER_PERIOD: currently hard-coded as 10.
AUDIT_SELECTION_PERIOD_SECONDS: currently hard-coded as 120 seconds.

Thus, the maximum number of audits per minute can be calculated:

With one active strategy, the funnel is filled at 10/120 * 60 = 5 tasks (individual content key audits) per minute.
With the current default of three active strategies, the funnel is filled at 3 * 10/120 * 60 = 15 tasks (individual content key audits) per minute.
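The arithmetic above can be written as a small function. This is a sketch only: the function name is hypothetical, and the two constants simply mirror the hard-coded values quoted in this issue.

```rust
/// Hypothetical helper: maximum funnel input rate in tasks per minute.
/// Each active strategy sends `keys_per_period` tasks every `period_seconds`.
fn max_tasks_per_minute(active_strategies: u32, keys_per_period: u32, period_seconds: f64) -> f64 {
    // Multiply before dividing so the example values stay exact in f64.
    (active_strategies * keys_per_period) as f64 * 60.0 / period_seconds
}

// Values quoted in the issue as currently hard-coded.
const KEYS_PER_PERIOD: u32 = 10;
const AUDIT_SELECTION_PERIOD_SECONDS: f64 = 120.0;

fn main() {
    for strategies in [1, 3] {
        let rate = max_tasks_per_minute(strategies, KEYS_PER_PERIOD, AUDIT_SELECTION_PERIOD_SECONDS);
        println!("{} active strategies -> {} tasks/min", strategies, rate);
    }
}
```

This is an upper bound on the input rate only; it says nothing about how many of those tasks are actually completed.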

Note that the observed audits/min rate will be lower, because audits that time out against a portal node are not recorded as either pass or fail.

The funnel has a "rim height" set to overflow at 100 pending tasks. That is, once the channel holds 100 pending tasks, any newly generated tasks are discarded.
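The overflow behaviour can be illustrated with a bounded channel and a non-blocking send. The issue does not show the actual channel type used by glados-audit, so this sketch assumes the standard library's `sync_channel` as a stand-in; `MAX_PENDING_TASKS` and `enqueue_batch` are hypothetical names, and tasks are plain `u64` values.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for the 100-task "rim height".
const MAX_PENDING_TASKS: usize = 100;

/// Enqueue generated tasks without blocking.
/// Returns how many tasks were discarded because the channel was full.
fn enqueue_batch(tx: &SyncSender<u64>, tasks: impl IntoIterator<Item = u64>) -> usize {
    let mut discarded = 0;
    for task in tasks {
        match tx.try_send(task) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => discarded += 1, // funnel overflows: drop the task
            Err(TrySendError::Disconnected(_)) => break,  // no consumer left
        }
    }
    discarded
}

fn main() {
    let (tx, rx): (SyncSender<u64>, Receiver<u64>) = sync_channel(MAX_PENDING_TASKS);
    // No auditing threads drain `rx` here, so the channel fills up.
    let discarded = enqueue_batch(&tx, 0..150);
    // prints "queued: 100, discarded: 50"
    println!("queued: {}, discarded: {}", rx.try_iter().count(), discarded);
}
```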

Resolution

Expose funnel input control from the CLI. Possible options:

1. A flag exposing the KEYS_PER_PERIOD variable.
2. A flag exposing the AUDIT_SELECTION_PERIOD_SECONDS variable.
3. Flags exposing both the KEYS_PER_PERIOD and AUDIT_SELECTION_PERIOD_SECONDS variables.
4. A new --max-task-rate <n> flag (n = maximum audits per minute) that controls the maximum number of audits generated per minute, by either:
   a. Titrating AUDIT_SELECTION_PERIOD_SECONDS to n, taking into account the number of strategies and KEYS_PER_PERIOD, or
   b. Titrating KEYS_PER_PERIOD to n, taking into account the number of strategies and AUDIT_SELECTION_PERIOD_SECONDS.
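Options 4a and 4b amount to solving the rate formula for one variable. A minimal sketch, assuming the proposed --max-task-rate value arrives as an `f64`; `titrate_period` and `titrate_keys` are hypothetical names, and rounding KEYS_PER_PERIOD down (so the realised rate never exceeds n) is a design choice of this sketch, not something the issue specifies.

```rust
/// Option a: period in seconds so that `strategies` strategies, each sending
/// `keys_per_period` tasks per period, produce `max_rate` tasks per minute.
fn titrate_period(max_rate: f64, strategies: u32, keys_per_period: u32) -> f64 {
    (strategies * keys_per_period) as f64 * 60.0 / max_rate
}

/// Option b: largest whole KEYS_PER_PERIOD that does not exceed `max_rate`
/// tasks per minute for a fixed period (rounded down).
fn titrate_keys(max_rate: f64, strategies: u32, period_seconds: f64) -> u32 {
    (max_rate * period_seconds / (60.0 * strategies as f64)).floor() as u32
}

fn main() {
    let n = 20.0; // hypothetical --max-task-rate value
    println!("a: AUDIT_SELECTION_PERIOD_SECONDS = {}", titrate_period(n, 3, 10));
    println!("b: KEYS_PER_PERIOD = {}", titrate_keys(n, 3, 120.0));
}
```

With three strategies and a 120-second period, n = 20 gives 13 keys per period, i.e. 19.5 tasks per minute, just under the cap.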

Current flags

```
Usage: glados-audit [OPTIONS] --transport <TRANSPORT>

Options:
  -d, --database-url <DATABASE_URL>
          [default: sqlite::memory:]

  -i, --ipc-path <IPC_PATH>


  -u, --http-url <HTTP_URL>


  -t, --transport <TRANSPORT>
          [possible values: ipc, http]

  -c, --concurrency <CONCURRENCY>
          number of auditing threads

          [default: 4]

  -s, --strategy <STRATEGY>
          Specific strategy to use. Default is to use all available strategies. May be passed multiple times for multiple strategies (--strategy latest --strategy random). Duplicates are permitted (--strategy random --strategy random).

          Possible values:
          - latest:
            Content that is: 1. Not yet audited 2. Sorted by date entered into glados database (newest first)
          - random:
            Randomly selected content
          - failed:
            Content that looks for failed audits and checks whether the data is still missing. 1. Key was audited previously 2. Latest audit for the key failed (data absent) 3. Keys sorted by date audited (keys with oldest failed audit first)
          - select_oldest_unaudited:
            Content that is: 1. Not yet audited. 2. Sorted by date entered into glados database (oldest first)

  -h, --help
          Print help information (use `-h` for a summary)

  -V, --version
          Print version information
```


pipermerriam · 2023-03-22T02:42:23Z

I have an alternative suggestion that I believe gives us both of the following:

The ability to control the overall audit rate.
The ability to throttle each source/strategy relative to the others (e.g., 50% towards latest, 30% towards random, 20% towards oldest-unaudited).

I suggest keeping the --concurrency flag to determine how many lookups happen in parallel.

I suggest the following flags, where N is a positive integer argument:

--latest-strategy-weight=N
--random-strategy-weight=N
... (etc)

We then interpret these as the respective weights for how many jobs we pull from each source.

Implementation detail: this would also imply that we give each strategy its own mpsc::Channel, so that the logic can spread the audits across the different sources.

Each strategy can then produce audit candidates at the maximum rate all of the time and will self-throttle when its channel is full.
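The proposal above (one bounded channel per strategy, weighted pulls, producers that block when full) can be sketched with standard-library threads. This is an illustration under assumptions, not the glados-audit implementation: it uses `std::sync::mpsc::sync_channel` in place of whatever async channel the project uses, and `AuditTask`, `WeightedSource`, `pull_round` and `spawn_producer` are hypothetical names. Each round takes up to `weight` tasks from each source, so with backlog everywhere the split follows the weights.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TryRecvError};
use std::thread;

/// Hypothetical audit task: producing strategy plus a stand-in key id.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AuditTask {
    strategy: &'static str,
    key: u64,
}

/// One bounded channel per strategy, with its weight (e.g. from --latest-strategy-weight).
struct WeightedSource {
    weight: u32,
    rx: Receiver<AuditTask>,
}

/// Pull one round: up to `weight` tasks from each source in turn.
/// A source that is currently empty is skipped for this round.
fn pull_round(sources: &[WeightedSource]) -> Vec<AuditTask> {
    let mut round = Vec::new();
    for src in sources {
        for _ in 0..src.weight {
            match src.rx.try_recv() {
                Ok(task) => round.push(task),
                Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
            }
        }
    }
    round
}

/// Producer that selects candidates as fast as it can. `send` on a
/// `sync_channel` blocks while the channel is full, so it self-throttles.
fn spawn_producer(strategy: &'static str, tx: SyncSender<AuditTask>, count: u64) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for key in 0..count {
            if tx.send(AuditTask { strategy, key }).is_err() {
                break; // receiver dropped: stop producing
            }
        }
    })
}

fn main() {
    let specs = [("latest", 5u32), ("random", 3), ("oldest_unaudited", 2)];
    let mut sources = Vec::new();
    let mut producers = Vec::new();
    for &(name, weight) in &specs {
        let (tx, rx) = sync_channel(16); // small capacity so producers block early
        producers.push(spawn_producer(name, tx, 100));
        sources.push(WeightedSource { weight, rx });
    }

    // Drain all 300 tasks; rounds are 5:3:2 while every source has backlog.
    let mut counts: HashMap<&'static str, u64> = HashMap::new();
    let mut total = 0;
    while total < 300 {
        let round = pull_round(&sources);
        if round.is_empty() {
            thread::yield_now(); // producers have not caught up yet
        }
        for task in round {
            *counts.entry(task.strategy).or_insert(0) += 1;
            total += 1;
        }
    }
    for p in producers {
        p.join().unwrap();
    }
    println!("{:?}", counts);
}
```

In this sketch the overall audit rate is set by how fast rounds are pulled (i.e. by the auditing side), while the weights only set the ratio between sources.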

perama-v · 2023-03-22T04:10:20Z

Ok that sounds good. So to check:

If there are two strategies (A with weight=1, B with weight=2) and KEYS_PER_PERIOD=10, then:

Total weight = 1 + 2 = 3
A = 10 * 1/3 ≈ 3 keys per period (rounded)
B = 10 * 2/3 ≈ 7 keys per period (rounded)

Or, since only the ratio of the weights matters, you could pass weights of 10 and 20 and get the same 3 and 7 keys per period.
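Plain floor division (10 * 2 // 3 = 6) would hand out only 9 of the 10 keys, so some rounding rule is needed to get 3 and 7. One option, swapped in here and not taken from the thread, is the largest-remainder method: give each strategy the floor of its share, then hand the leftover keys to the largest fractional remainders. `split_by_weight` is a hypothetical name.

```rust
/// Split `total` keys across strategies in proportion to `weights`
/// using the largest-remainder method, so the shares always sum to `total`.
fn split_by_weight(total: u32, weights: &[u32]) -> Vec<u32> {
    let weight_sum: u64 = weights.iter().map(|&w| w as u64).sum();
    if weight_sum == 0 {
        return vec![0; weights.len()];
    }
    let mut shares = Vec::with_capacity(weights.len());
    let mut remainders = Vec::with_capacity(weights.len());
    for (i, &w) in weights.iter().enumerate() {
        let scaled = total as u64 * w as u64;
        shares.push((scaled / weight_sum) as u32); // floor of the exact share
        remainders.push((scaled % weight_sum, i)); // fractional part, scaled
    }
    let assigned: u32 = shares.iter().sum();
    let leftover = (total - assigned) as usize;
    // Largest remainder first; ties go to the earlier strategy.
    remainders.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    for &(_, i) in remainders.iter().take(leftover) {
        shares[i] += 1;
    }
    shares
}

fn main() {
    // A: weight 1, B: weight 2, KEYS_PER_PERIOD = 10.
    println!("{:?}", split_by_weight(10, &[1, 2])); // prints [3, 7]
}
```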

pipermerriam · 2023-03-22T15:27:31Z

I think we would get rid of the KEYS_PER_PERIOD concept and probably handle each strategy differently.

all strategies would be "throttled" by their mpsc::Channel being "full", so they would select as fast as they can and then block once their channel is full.
random can just select as fast as possible, probably in some kind of batch size, since we probably don't want to hammer the database with single-row lookups.
latest should probably get "smarter" and work backwards from the front of the chain, stopping either once it hits the previous latest or at some other maximum batch size.
oldest-unaudited can probably select as fast as it wants, as long as it has protection from filling the queue with duplicates.
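Two of the behaviours described above can be sketched in isolation. The sketch assumes block numbers (`u64`) as stand-ins for content keys; `select_latest` and `InFlight` are hypothetical names, not glados-audit APIs. `select_latest` walks backwards from the chain head and stops at the previously seen head or after `max_batch` items; `InFlight` keeps a set of queued keys so a fast "oldest unaudited" selector does not enqueue the same key twice.

```rust
use std::collections::HashSet;

/// Walk backwards from `head`, stopping just above `previous_latest`
/// (if any) or after `max_batch` blocks, whichever comes first.
fn select_latest(head: u64, previous_latest: Option<u64>, max_batch: usize) -> Vec<u64> {
    let stop = previous_latest.map(|p| p.saturating_add(1)).unwrap_or(0);
    (stop..=head).rev().take(max_batch).collect() // empty if nothing new
}

/// Guard that remembers which keys are already queued for audit.
struct InFlight {
    queued: HashSet<u64>,
}

impl InFlight {
    fn new() -> Self {
        InFlight { queued: HashSet::new() }
    }

    /// Keep only candidates not already queued, and mark them as queued.
    fn admit(&mut self, candidates: &[u64]) -> Vec<u64> {
        candidates.iter().copied().filter(|k| self.queued.insert(*k)).collect()
    }

    /// Call when an audit for `key` finishes, so it may be selected again later.
    fn complete(&mut self, key: u64) {
        self.queued.remove(&key);
    }
}

fn main() {
    println!("latest: {:?}", select_latest(105, Some(100), 10)); // prints latest: [105, 104, 103, 102, 101]
    let mut in_flight = InFlight::new();
    println!("{:?}", in_flight.admit(&[1, 2, 3])); // prints [1, 2, 3]
    println!("{:?}", in_flight.admit(&[2, 3, 4])); // prints [4]
    in_flight.complete(2);
}
```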

perama-v mentioned this issue Mar 23, 2023

add strategy weights #95

Merged

pipermerriam closed this as completed in #95 Apr 24, 2023
