Number of audit tasks generated-per-min is not configurable #93

Closed
perama-v opened this issue Mar 21, 2023 · 3 comments · Fixed by #95

Comments

@perama-v
Contributor

Description

glados-audit can be viewed as a funnel as follows:

```mermaid
flowchart TD
subgraph generate[Trigger every AUDIT_SELECTION_PERIOD_SECONDS]
s1[Strategy Latest]
s2[Strategy Random]
s3[Strategy Random]
end
s1 & s2 & s3 --> |send KEYS_PER_PERIOD tasks|chan[Audit task channel]

chan --> |take 1 task|a1 & a2 & a3 & a4
subgraph fulfill[Continuously replenish with new threads once they complete]
a1[Auditing thread 1]
a2[Auditing thread 2]
a3[...]
a4[Auditing thread CONCURRENCY]
end
a1 & a2 & a3 & a4 --> node[Portal node]
```

At present, the CLI can control the throughput as follows:

  • --concurrency <n> flag controls the maximum funnel output rate.
  • --strategy <strat> flag controls the nature of the tasks generated (with a limited effect on throughput, e.g., by passing --strategy random more than once).
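The fulfill side of the funnel can be sketched as a fixed pool of CONCURRENCY worker threads sharing one task channel. This is a minimal sketch, assuming std's bounded `sync_channel` (buffer of 100, matching the rim height) in place of whatever channel glados-audit actually uses, and a hypothetical `run_workers` helper whose "audit" simply reports the key back. It shows why --concurrency caps the output rate: each worker holds at most one task at a time.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sketch of the fulfill stage: `concurrency` worker threads share one
/// bounded task channel, and each worker handles one task at a time.
fn run_workers(concurrency: usize, tasks: Vec<u64>) -> Vec<u64> {
    assert!(concurrency > 0, "need at least one auditing thread");
    // Bounded at 100 pending tasks, mirroring the funnel's "rim height".
    let (task_tx, task_rx) = sync_channel::<u64>(100);
    let task_rx: Arc<Mutex<Receiver<u64>>> = Arc::new(Mutex::new(task_rx));
    let (done_tx, done_rx) = channel::<u64>();

    let workers: Vec<_> = (0..concurrency)
        .map(|_| {
            let task_rx = Arc::clone(&task_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is a temporary, so it is released before the task is processed.
                let next = task_rx.lock().unwrap().recv();
                match next {
                    // Stand-in for auditing the key against the portal node.
                    Ok(key) => done_tx.send(key).unwrap(),
                    // All senders dropped and the channel is drained: stop.
                    Err(_) => break,
                }
            })
        })
        .collect();
    drop(done_tx); // only the workers' clones remain

    for key in tasks {
        task_tx.send(key).unwrap(); // blocks if 100 tasks are already pending
    }
    drop(task_tx); // close the channel so idle workers exit

    for worker in workers {
        worker.join().unwrap();
    }
    let mut audited: Vec<u64> = done_rx.iter().collect();
    audited.sort();
    audited
}
```

Raising `concurrency` adds workers pulling from the same channel; it does not change how fast tasks are put into it.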

The two variables that control the maximum funnel input rate are:

  • KEYS_PER_PERIOD. Currently hard-coded as 10.
  • AUDIT_SELECTION_PERIOD_SECONDS. Currently hard-coded as 120 (seconds).

Thus the maximum audits per minute can be calculated:

  • With one active strategy, the funnel is filled at 10/120 * 60 = 5 tasks (individual content key audits) per minute.
  • With the current default of three active strategies, the funnel is filled at 3 * 10/120 * 60 = 15 tasks (individual content key audits) per minute.
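The same arithmetic as a small helper, which may be handy for checking other combinations of the two constants. `max_tasks_per_minute` is a hypothetical name, not part of glados-audit.

```rust
/// Hypothetical helper: upper bound on tasks entering the funnel per minute.
/// Each of `num_strategies` strategies sends `keys_per_period` tasks every `period_secs` seconds.
fn max_tasks_per_minute(num_strategies: u32, keys_per_period: u32, period_secs: u32) -> f64 {
    let periods_per_minute = 60.0 / f64::from(period_secs);
    f64::from(num_strategies) * f64::from(keys_per_period) * periods_per_minute
}
```

For example, `max_tasks_per_minute(3, 10, 120)` gives 15.0, matching the three-strategy figure.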

Note that the observed audits/min rate will be lower, because audits that time out against a portal node are not recorded as pass/fail.

The funnel has a "rim height": it overflows at 100 pending tasks. That is, once the channel holds 100 pending tasks, any newly generated tasks are discarded.
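The overflow behaviour can be sketched as a bounded channel whose producer uses a non-blocking send and drops the task when the buffer is full. This is a minimal sketch, assuming std's `sync_channel`/`try_send` as a stand-in for the channel glados-audit actually uses; `fill_funnel` is hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical sketch: offer `n` generated tasks to a channel holding at most
/// `capacity` pending tasks, discarding any task that arrives while it is full.
/// Returns (accepted, discarded).
fn fill_funnel(capacity: usize, n: usize) -> (usize, usize) {
    // `_rx` keeps the receiving end alive (nobody consumes, so the channel fills up).
    let (tx, _rx) = sync_channel::<u64>(capacity);
    let (mut accepted, mut discarded) = (0, 0);
    for key in 0..(n as u64) {
        match tx.try_send(key) {
            Ok(()) => accepted += 1,
            // The "rim" overflowed: drop this task instead of blocking the generator.
            Err(TrySendError::Full(_)) => discarded += 1,
            // No auditing side left at all: stop generating.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, discarded)
}
```

Offering 150 tasks to a channel with a capacity of 100 and no consumer accepts 100 and discards 50.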

Resolution

Expose funnel input control from the CLI. Flag options:

  1. Expose KEYS_PER_PERIOD variable.
  2. Expose AUDIT_SELECTION_PERIOD_SECONDS variable.
  3. Expose KEYS_PER_PERIOD and AUDIT_SELECTION_PERIOD_SECONDS variables.
  4. New --max-task-rate <n = max audits per min> flag that controls maximum audits per minute that are generated.
    a. Titrate AUDIT_SELECTION_PERIOD_SECONDS to n, taking into account the number of strategies and KEYS_PER_PERIOD.
    b. Titrate KEYS_PER_PERIOD to n, taking into account the number of strategies and AUDIT_SELECTION_PERIOD_SECONDS.
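Options 4a and 4b amount to solving the rate formula (strategies * KEYS_PER_PERIOD * 60 / AUDIT_SELECTION_PERIOD_SECONDS) for one variable. A sketch with hypothetical helper names; no --max-task-rate flag exists yet.

```rust
/// Option 4a (hypothetical): AUDIT_SELECTION_PERIOD_SECONDS needed so that
/// `num_strategies` strategies, each sending `keys_per_period` keys, produce `max_rate` tasks/min.
fn period_secs_for_rate(max_rate: f64, num_strategies: u32, keys_per_period: u32) -> f64 {
    f64::from(num_strategies) * f64::from(keys_per_period) * 60.0 / max_rate
}

/// Option 4b (hypothetical): KEYS_PER_PERIOD needed at a fixed period. A real flag would
/// have to round this to a whole number of keys.
fn keys_per_period_for_rate(max_rate: f64, num_strategies: u32, period_secs: u32) -> f64 {
    max_rate * f64::from(period_secs) / (60.0 * f64::from(num_strategies))
}
```

For a target of 15 tasks/min with three strategies, 4a gives a 120 s period at 10 keys and 4b gives 10 keys at a 120 s period, recovering the current hard-coded values.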

Current flags

Usage: glados-audit [OPTIONS] --transport <TRANSPORT>

Options:
  -d, --database-url <DATABASE_URL>
          [default: sqlite::memory:]

  -i, --ipc-path <IPC_PATH>
          

  -u, --http-url <HTTP_URL>
          

  -t, --transport <TRANSPORT>
          [possible values: ipc, http]

  -c, --concurrency <CONCURRENCY>
          number of auditing threads
          
          [default: 4]

  -s, --strategy <STRATEGY>
          Specific strategy to use. Default is to use all available strategies. May be passed multiple times for multiple strategies (--strategy latest --strategy random). Duplicates are permitted (--strategy random --strategy random).

          Possible values:
          - latest:
            Content that is: 1. Not yet audited 2. Sorted by date entered into glados database (newest first)
          - random:
            Randomly selected content
          - failed:
            Content that looks for failed audits and checks whether the data is still missing. 1. Key was audited previously 2. Latest audit for the key failed (data absent) 3. Keys sorted by date audited (keys with oldest failed audit first)
          - select_oldest_unaudited:
            Content that is: 1. Not yet audited. 2. Sorted by date entered into glados database (oldest first)

  -h, --help
          Print help information (use `-h` for a summary)

  -V, --version
          Print version information

@pipermerriam
Member

I have an alternate suggestion that I believe gives us both of the following.

  • ability to control overall audit rate
  • ability to throttle each source/strategy in relation to each other (e.g. 50% towards latest, 30% towards random, 20% towards oldest-unaudited)

I suggest the --concurrency flag remains to determine how many lookups should happen in parallel.

I suggest the following flags where N is a positive integer argument:

  • --latest-strategy-weight=N
  • --random-strategy-weight=N
  • ... (etc)

We then interpret these as the respective weights for how many jobs we pull from each source.

Implementation detail: This would also imply that we give each strategy its own mpsc::Channel to enable the logic for spreading the audits across the different sources.

Each strategy can then produce audit candidates at maximum rate all of the time and will self-throttle when the channel is full.
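A minimal sketch of this design, using std's bounded `sync_channel` as a stand-in for the async mpsc channel glados-audit would use: each hypothetical strategy thread produces keys as fast as it can and blocks when its own channel is full, while the consumer pulls from the channels in weighted round-robin order. `run_strategy`, `weighted_pull` and `demo` are illustrative names, and the weights mirror the proposed --latest-strategy-weight=1 and --random-strategy-weight=2.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

// Hypothetical strategy: produce keys as fast as possible. `send` blocks while the
// bounded channel is full (self-throttling) and fails once the receiver is dropped.
fn run_strategy(name: &'static str, tx: SyncSender<String>) {
    let mut i: u64 = 0;
    while tx.send(format!("{name}-{i}")).is_ok() {
        i += 1;
    }
}

// Weighted round-robin: each round takes `weight` tasks from each strategy's channel,
// blocking until that strategy has produced them.
fn weighted_pull(sources: &[(Receiver<String>, usize)], total: usize) -> Vec<String> {
    let mut out = Vec::with_capacity(total);
    if sources.iter().all(|(_, w)| *w == 0) {
        return out; // nothing to pull from
    }
    'rounds: loop {
        for (rx, weight) in sources {
            for _ in 0..*weight {
                if out.len() == total {
                    break 'rounds;
                }
                match rx.recv() {
                    Ok(task) => out.push(task),
                    Err(_) => break 'rounds, // a strategy stopped producing
                }
            }
        }
    }
    out
}

// Hypothetical wiring: one channel per strategy, small capacity so producers block quickly.
fn demo(total: usize) -> Vec<String> {
    let (latest_tx, latest_rx) = sync_channel(4);
    let (random_tx, random_rx) = sync_channel(4);
    let producers = vec![
        thread::spawn(move || run_strategy("latest", latest_tx)),
        thread::spawn(move || run_strategy("random", random_tx)),
    ];
    let sources: Vec<(Receiver<String>, usize)> = vec![(latest_rx, 1), (random_rx, 2)];
    let tasks = weighted_pull(&sources, total);
    drop(sources); // disconnect the channels so the blocked producers exit
    for p in producers {
        p.join().unwrap();
    }
    tasks
}
```

With weights 1 and 2, `demo(6)` returns latest-0, random-0, random-1, latest-1, random-2, random-3: one latest task for every two random ones, with neither producer ever getting more than its channel capacity ahead.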

@perama-v
Contributor Author

Ok that sounds good. So to check:

If there are two strategies (A with weight=1, B with weight=2) and KEYS_PER_PERIOD=10, then:

  • Total weight = 1 + 2 = 3
  • A = 10 * 1/3 ≈ 3.33, rounded to 3 keys per period
  • B = 10 * 2/3 ≈ 6.67, rounded to 7 keys per period

Only the ratio of the weights matters, so passing 10 and 20 gives the same 3 and 7 keys per period; passing 30 and 70 (i.e., percentages) gives exactly 3 and 7.
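Plain rounding can hand out the wrong total with three or more strategies (weights 1, 1, 1 with 10 keys round to 3 + 3 + 3 = 9). A minimal sketch that always distributes exactly KEYS_PER_PERIOD keys uses the largest-remainder method, swapped in here for illustration; `apportion` is a hypothetical helper.

```rust
use std::cmp::Reverse;

/// Hypothetical helper: split `total` keys per period across strategies in proportion
/// to `weights`, using the largest-remainder method so the shares always sum to `total`.
fn apportion(total: u64, weights: &[u64]) -> Vec<u64> {
    let sum: u64 = weights.iter().sum();
    if sum == 0 {
        return vec![0; weights.len()];
    }
    // Floor of each exact share.
    let mut shares: Vec<u64> = weights.iter().map(|w| total * w / sum).collect();
    let leftover = total - shares.iter().sum::<u64>();
    // Hand the leftover keys to the strategies with the largest fractional remainders
    // (stable sort, so ties go to the earlier strategy).
    let mut order: Vec<usize> = (0..weights.len()).collect();
    order.sort_by_key(|&i| Reverse(total * weights[i] % sum));
    for &i in order.iter().take(leftover as usize) {
        shares[i] += 1;
    }
    shares
}
```

With 10 keys, weights 1 and 2 (or 10 and 20, or 30 and 70) yield 3 and 7, and weights 1, 1, 1 yield 4, 3 and 3.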

@pipermerriam
Member

I think we would get rid of the KEYS_PER_PERIOD concept and probably handle each strategy differently.

  • all strategies would be "throttled" by their mpsc::Channel being "full" so they would select as fast as they can and then block once their channel is full.
  • random can just select as fast as possible, probably in batches, since we likely don't want to hammer the database with single-row lookups.
  • latest should probably get "smarter" and work backwards from the front of the chain, stopping either once it hits the previous latest or at some other maximum batch size.
  • oldest-unaudited can probably select as fast as it wants, as long as it is protected from filling the queue with duplicates.
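A minimal sketch of the latest and oldest-unaudited behaviour described above, assuming content can be identified by a u64 (a block number for latest) and that the caller removes a key from the in-flight set once its audit completes. `latest_batch` and `enqueue_unique` are hypothetical names, not glados-audit functions.

```rust
use std::collections::HashSet;

/// Hypothetical "smarter latest": walk backwards from the chain head, stopping at the
/// head seen on the previous run (exclusive) or after `max_batch` blocks.
fn latest_batch(head: u64, previous_latest: Option<u64>, max_batch: usize) -> Vec<u64> {
    let lowest = match previous_latest {
        Some(prev) if prev >= head => return Vec::new(), // no new blocks since last run
        Some(prev) => prev + 1,
        None => 0,
    };
    (lowest..=head).rev().take(max_batch).collect()
}

/// Hypothetical duplicate guard for oldest-unaudited: only keys not already in flight
/// are queued. The caller is assumed to remove a key from `in_flight` once it is audited.
fn enqueue_unique(in_flight: &mut HashSet<u64>, candidates: &[u64]) -> Vec<u64> {
    // `HashSet::insert` returns false for a key that is already present.
    candidates.iter().copied().filter(|k| in_flight.insert(*k)).collect()
}
```

Starting from head 105 with a previous latest of 100, `latest_batch` returns 105 down to 101; a second `enqueue_unique` call with overlapping candidates only queues the keys that are not already pending.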
