sa: make authz2 table cheaper

The `authz2` table is our biggest and fastest growing. We'd like to make it cheaper from a database perspective.

The current schema is:

```sql
CREATE TABLE `authz2` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `identifierType` tinyint(4) NOT NULL,
  `identifierValue` varchar(255) NOT NULL,
  `registrationID` bigint(20) NOT NULL,
  `status` tinyint(4) NOT NULL,
  `expires` datetime NOT NULL,
  `challenges` tinyint(4) NOT NULL,
  `attempted` tinyint(4) DEFAULT NULL,
  `attemptedAt` datetime DEFAULT NULL,
  `token` binary(32) NOT NULL,
  `validationError` mediumblob DEFAULT NULL,
  `validationRecord` mediumblob DEFAULT NULL,
  `certificateProfileName` varchar(32) DEFAULT NULL,
  `created` datetime DEFAULT current_timestamp(),
  PRIMARY KEY (`id`),
  KEY `regID_expires_idx` (`registrationID`,`status`,`expires`),
  KEY `regID_identifier_status_expires_idx` (`registrationID`,`identifierType`,`identifierValue`,`status`,`expires`),
  KEY `expires_idx` (`expires`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci;
```

There are indexes here that we don't need. The `expires_idx` [was created](https://github.com/letsencrypt/boulder/pull/4226/changes) in support of the `expired-authz2-purger` tool, which no longer exists. The `regID_expires_idx` is in support of `CountPendingAuthorizations2`, which is no longer called (and should be deleted).

We *do* still need an equivalent of `regID_identifier_status_expires_idx`, which we use to find authorizations for reuse. However, this index should be moved to a separate table. See Reuse section below.

Deleting unneeded indexes should reduce the number of non-sequential writes needed when we insert an authorization row.

## Non-growing Rows

The lifecycle of an authorization row involves being inserted into the table, then updated with a `validationRecord` and possibly a `validationError`, which increases the size of the record significantly. The average `validationError` is 142 bytes (or 181 bytes if we only count nonempty `validationError`). The average `validationRecord` is 211 bytes. Growing the record is likely to cause InnoDB to split blocks, which causes unnecessary writes when updating indexes.

According to [https://dev.mysql.com/doc/refman/8.4/en/innodb-physical-structure.html](https://dev.mysql.com/doc/refman/8.4/en/innodb-physical-structure.html):

\> When new records are inserted into an InnoDB [clustered index](https://dev.mysql.com/doc/refman/8.4/en/glossary.html#glos_clustered_index), InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full.

Authorization inserts are sequential, so the index pages will start out 15/16 full, which will mean they are pretty close to a split.

Instead, we can split the two written-later fields into separate tables. I propose we use two tables: `successfulValidations` and `failedValidations`:

```sql
CREATE TABLE `successfulValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`),
);
CREATE TABLE `failedValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationError` mediumblob NOT NULL,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`),
);
```

Having separate tables for successes and failures allows us to set different retention periods for them, if we choose.

We would then give the new `authorizations` table a field that gets updated on success or failure, with the `id` of a `successfulValidations` or `failedValidations` row:

```sql
   `validationId` bigint(20) DEFAULT NULL
```

SA would choose the right table to query based on a fetched authorization's `status`.

In theory appending to these separate tables should be cheap, since they are autoincrement. More importantly, it allows us to avoid ever resizing authz2 rows. That allows us to maintain an efficient fill proportion (15/16).

Adding the columns is an instant operation, but dropping the indexes is not. However, we can manage the transition by creating a new `authorizations` table, and having Boulder temporarily `SELECT` from both `authz2` and `authorizations` for a transition period.

## Authorization Reuse

Currently, we rely on an index in the `authz2` table to find authorizations that are eligible for reuse. Instead, we should use a secondary table, `reusableAuthorizations`. This lets us be more efficient in a few ways:

- Only successful authorizations get written to the table, so we don't waste indexing work on failed ones.  
- We can set the retention period for `reusableAuthorizations` to 30 days, since that's the current maximum time an authorization can be reused.  
- We no longer query the `regID_identifier_status_expires_idx` index. Queries to this index may cause old partitions to have data read into memory unnecessarily, because InnoDB doesn't know that authorizations in old partitions are likely to be unusable (`expires` in the past).

With a separate table for reuse, we'll only ever query or update the new `authorizations` table by id, which allows the working set to be restricted to authorizations that are actually in use.

## New Schema

```sql
CREATE TABLE `authorizations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `registrationID` bigint(20) NOT NULL,
  `identifierType` tinyint(4) NOT NULL,
  `identifierValue` varchar(255) NOT NULL,
  `status` tinyint(4) NOT NULL,
  `validationID` bigint(20),
  `expires` datetime NOT NULL,
  `challenges` tinyint(4) NOT NULL,
  `attempted` tinyint(4),
  `attemptedAt` datetime,
  `token` binary(32) NOT NULL,
  `certificateProfileName` varchar(32),
  `created` datetime DEFAULT current_timestamp(),
  PRIMARY KEY (`id`)
);

-- Records are created in this table when validation succeeds.
CREATE TABLE `reusableAuthorizations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `registrationID` bigint(20) NOT NULL,
  `identifierType` tinyint(4) NOT NULL,
  `identifierValue` varchar(255) NOT NULL,
  `certificateProfileName` varchar(32),
  `authorizationID` bigint(20) NOT NULL,
  `expires` datetime NOT NULL,
  `created` datetime DEFAULT current_timestamp(),
  PRIMARY KEY (`id`)
  KEY `regID_identifier_expires_idx` (`registrationID`,`identifierValue`,`expires`)
);

-- Records are created in this table when validation succeeds.
CREATE TABLE `successfulValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`)
);

-- Records are created in this table when validation fails.
CREATE TABLE `failedValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationError` mediumblob NOT NULL,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`)
);
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

sa: make authz2 table cheaper #8702

Non-growing Rows

Authorization Reuse

New Schema

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

sa: make authz2 table cheaper #8702

Description

Non-growing Rows

Authorization Reuse

New Schema

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions