Skip to content

sa: make authz2 table cheaper #8702

@jsha

Description

@jsha

The authz2 table is our biggest and fastest growing. We'd like to make it cheaper from a database perspective.

The current schema is:

CREATE TABLE `authz2` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `identifierType` tinyint(4) NOT NULL,
  `identifierValue` varchar(255) NOT NULL,
  `registrationID` bigint(20) NOT NULL,
  `status` tinyint(4) NOT NULL,
  `expires` datetime NOT NULL,
  `challenges` tinyint(4) NOT NULL,
  `attempted` tinyint(4) DEFAULT NULL,
  `attemptedAt` datetime DEFAULT NULL,
  `token` binary(32) NOT NULL,
  `validationError` mediumblob DEFAULT NULL,
  `validationRecord` mediumblob DEFAULT NULL,
  `certificateProfileName` varchar(32) DEFAULT NULL,
  `created` datetime DEFAULT current_timestamp(),
  PRIMARY KEY (`id`),
  KEY `regID_expires_idx` (`registrationID`,`status`,`expires`),
  KEY `regID_identifier_status_expires_idx` (`registrationID`,`identifierType`,`identifierValue`,`status`,`expires`),
  KEY `expires_idx` (`expires`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci;

There are indexes here that we don't need. The expires_idx was created in support of the expired-authz2-purger tool, which no longer exists. The regID_expires_idx is in support of CountPendingAuthorizations2, which is no longer called (and should be deleted).

We do still need an equivalent of regID_identifier_status_expires_idx, which we use to find authorizations for reuse. However, this index should be moved to a separate table. See Reuse section below.

Deleting unneeded indexes should reduce the number of non-sequential writes needed when we insert an authorization row.

Non-growing Rows

The lifecycle of an authorization row involves being inserted into the table, then updated with a validationRecord and possibly a validationError, which increases the size of the record significantly. The average validationError is 142 bytes (or 181 bytes if we only count nonempty validationError). The average validationRecord is 211 bytes. Growing the record is likely to cause InnoDB to split blocks, which causes unnecessary writes when updating indexes.

According to https://dev.mysql.com/doc/refman/8.4/en/innodb-physical-structure.html:

> When new records are inserted into an InnoDB clustered index, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full.

Authorization inserts are sequential, so the index pages will start out 15/16 full, which will mean they are pretty close to a split.

Instead, we can split the two written-later fields into separate tables. I propose we use two tables: successfulValidations and failedValidations:

CREATE TABLE `successfulValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`),
);�
CREATE TABLE `failedValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationError` mediumblob NOT NULL,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`),
);

Having separate tables for successes and failures allows us to set different retention periods for them, if we choose.

We would then give the new authorizations table a field that gets updated on success or failure, with the id of a successfulValidations or failedValidations row:

   `validationId` bigint(20) DEFAULT NULL

SA would choose the right table to query based on a fetched authorization's status.

In theory appending to these separate tables should be cheap, since they are autoincrement. More importantly, it allows us to avoid ever resizing authz2 rows. That allows us to maintain an efficient fill proportion (15/16).

Adding the columns is an instant operation, but dropping the indexes is not. However, we can manage the transition by creating a new authorizations table, and having Boulder temporarily SELECT from both authz2 and authorizations for a transition period.

Authorization Reuse

Currently, we rely on an index in the authz2 table to find authorizations that are eligible for reuse. Instead, we should use a secondary table, reusableAuthorizations. This lets us be more efficient in a few ways:

  • Only successful authorizations get written to the table, so we don't waste indexing work on failed ones.
  • We can set the retention period for reusableAuthorizations to 30 days, since that's the current maximum time an authorization can be reused.
  • We no longer query the regID_identifier_status_expires_idx index. Queries to this index may cause old partitions to have data read into memory unnecessarily, because InnoDB doesn't know that authorizations in old partitions are likely to be unusable (expires in the past).

With a separate table for reuse, we'll only ever query or update the new authorizations table by id, which allows the working set to be restricted to authorizations that are actually in use.

New Schema

CREATE TABLE `authorizations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `registrationID` bigint(20) NOT NULL,
  `identifierType` tinyint(4) NOT NULL,
  `identifierValue` varchar(255) NOT NULL,
  `status` tinyint(4) NOT NULL,
  `validationID` bigint(20),
  `expires` datetime NOT NULL,
  `challenges` tinyint(4) NOT NULL,
  `attempted` tinyint(4),
  `attemptedAt` datetime,
  `token` binary(32) NOT NULL,
  `certificateProfileName` varchar(32),
  `created` datetime DEFAULT current_timestamp(),
  PRIMARY KEY (`id`)
);

-- Records are created in this table when validation succeeds.
CREATE TABLE `reusableAuthorizations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `registrationID` bigint(20) NOT NULL,
  `identifierType` tinyint(4) NOT NULL,
  `identifierValue` varchar(255) NOT NULL,
  `certificateProfileName` varchar(32),
  `authorizationID` bigint(20) NOT NULL,
  `expires` datetime NOT NULL,
  `created` datetime DEFAULT current_timestamp(),
  PRIMARY KEY (`id`)
  KEY `regID_identifier_expires_idx` (`registrationID`,`identifierValue`,`expires`)
);

-- Records are created in this table when validation succeeds.
CREATE TABLE `successfulValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`)
);

-- Records are created in this table when validation fails.
CREATE TABLE `failedValidations` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `validationError` mediumblob NOT NULL,
  `validationRecord` mediumblob NOT NULL,
  PRIMARY KEY (`id`)
);

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions