Docs: update FAQ to reflect newly-implemented features
evanelias committed Dec 30, 2016
1 parent 62b7a50 commit b04a62d
Showing 1 changed file with 27 additions and 13 deletions.
40 changes: 27 additions & 13 deletions doc/faq.md
@@ -2,7 +2,14 @@

### Is Skeema another online schema change tool?

No. Skeema is a tool for *managing* schemas, and the workflow around how schema changes are requested, reviewed, and performed. It can be used as a "glue" layer between git and existing online schema change tools, and/or as part of a continuous integration / continuous deployment system.
No. Skeema is a tool for *managing* schemas, and the workflow around how schema changes are requested, reviewed, and performed. It can be used as a "glue" layer between git and existing online schema change tools, or perhaps as part of a continuous integration / continuous deployment pipeline.

Skeema is designed to be a unified solution to the following common problems:

* Keeping schemas in sync across development, staging, and production environments
* Keeping schemas in sync across multiple shards
* Exporting schemas to a repo and managing them like code, using pull requests and code review
* Configuring use of an external online schema change tool, optionally only for certain table sizes, pools, schema names, or environments

Skeema does not implement its own method for online schema changes, but it can be configured to shell out to other arbitrary online schema change tools.

@@ -22,26 +29,26 @@ Aside from the temporary schema operations described below, only one command mod

#### Temporary schema usage

Most Skeema commands need to perform intermediate operations in a scratch space -- for example, to run CREATE TABLE statements in the *.sql files, so that the corresponding information_schema representation may be inspected. By default, Skeema creates, uses, and then drops a database called `_skeema_tmp`. (The schema name and dropping behavior may be configured via the --temp-schema and --reuse-temp-schema options.)
Most Skeema commands need to perform intermediate operations in a scratch space -- for example, to run CREATE TABLE statements in the *.sql files, so that the corresponding information_schema representation may be inspected. By default, Skeema creates, uses, and then drops a database called `_skeema_tmp`. (The schema name and dropping behavior may be configured via the [temp-schema](options.md#temp-schema) and [reuse-temp-schema](options.md#reuse-temp-schema) options.)

When operating on the temporary database, Skeema refuses to drop a table if it contains any rows, and likewise refuses to drop the database if any tables contain any rows. This prevents disaster if someone accidentally points --temp-schema at a real schema, or accidentally starts storing real data in the temporary schema.
When operating on the temporary database, Skeema refuses to drop a table if it contains any rows, and likewise refuses to drop the database if any tables contain any rows. This prevents disaster if someone accidentally points [temp-schema](options.md#temp-schema) at a real schema, or accidentally starts storing real data in the temporary schema.

#### Dropping tables and columns is prevented by default

Destructive actions only occur when specifically requested. This prevents human error with running `skeema push` from an out-of-date repo working copy, as well as misinterpreting accidental attempts to rename tables or columns (both of which are not yet supported).

* `skeema push` refuses to run any generated DROP TABLE statement, unless the --allow-drop-table option is provided.
* `skeema push` refuses to run any generated ALTER TABLE statement that drops columns, unless the --allow-drop-column option is provided.
* `skeema push` refuses to run any generated DROP TABLE statement, unless the [allow-drop-table option](options.md#allow-drop-table) is provided.
* `skeema push` refuses to run any generated ALTER TABLE statement that drops columns, unless the [allow-drop-column option](options.md#allow-drop-column) is provided.

`skeema diff` also provides the same two options, even though `skeema diff` never actually modifies tables regardless. These options are present so that `skeema diff` can serve as a safe dry-run that exactly matches the logic for `skeema push`.

A future enhancement of Skeema may allow dropping tables or columns without specifying these options *if the table is detected to be completely empty*, as a convenience when iteratively developing a new schema. This has not yet been implemented.
You may also configure Skeema to always permit dropping tables or columns below a certain size (in bytes), or always permit dropping tables or columns only for tables that have no rows. See the [allow-below-size option](options.md#allow-below-size).

#### Auto-generated DDL is verified for correctness

Skeema is a declarative tool: users declare what the table *should* look like (via CREATE TABLE files), and the tool generates the corresponding ALTER TABLE in `skeema diff` (outputted but not run) and `skeema push` (actually executed). When generating these statements, Skeema *automatically verifies their correctness* by testing them in the temporary schema. This confirms that running the generated DDL against an empty copy of the old (live) table definition correctly yields the expected new (from filesystem/repo) table definition. If verification fails, Skeema aborts.

When performing a large diff or push that affects dozens or hundreds of tables, this verification behavior may slow things down. You may skip verification for speed reasons via the --skip-verify option, but this is not recommended.
When performing a large diff or push that affects dozens or hundreds of tables, this verification behavior may slow things down. You may skip verification for speed reasons via the [skip-verify option](options.md#verify), but this is not recommended.

#### Detection of unsupported table features

@@ -57,23 +64,30 @@ Please see the [requirements doc](requirements.md#responsibilities-for-the-user)

### How do I configure Skeema to use online schema change tools?

The --alter-wrapper option for `skeema diff` and `skeema push` allows you to shell out to arbitrary external command(s) to perform ALTERs. You can set this option in `~/.skeema` or any other `.skeema` config file to automatically apply it every time. For example, to always use `pt-online-schema-change` to perform ALTERs, you might have a config file line of:
The [alter-wrapper option](options.md#alter-wrapper) for `skeema diff` and `skeema push` allows you to shell out to arbitrary external command(s) to perform ALTERs. You can set this option in `~/.skeema` or any other `.skeema` config file to automatically apply it every time. For example, to always use `pt-online-schema-change` to perform ALTERs, you might have a config file line of:

```ini
alter-wrapper=/usr/local/bin/pt-online-schema-change --alter {CLAUSES} D={SCHEMA},t={TABLE},h={HOST},P={PORT},u={USER},p={PASSWORD}
```

The brace-wrapped variables will automatically be replaced with appropriate values from the corresponding `.skeema` files. The {CLAUSES} variable returns the portion of the DDL statement after the prefix, e.g. everything after `ALTER TABLE table_name `. You can also obtain the full DDL statement via {DDL}. Variable values containing spaces or control characters will be escaped and wrapped in single-quotes, and then the entire command string is passed to /bin/sh -c.
The brace-wrapped variables will automatically be replaced with appropriate values from the corresponding `.skeema` files. The {CLAUSES} variable returns the portion of the DDL statement after the prefix, e.g. everything after `ALTER TABLE table_name `. You can also obtain the full DDL statement via {DDL}. Variable values containing spaces or control characters will be escaped and wrapped in single-quotes, and then the entire command string is passed to `/bin/sh -c`.

Currently this feature only works easily for `pt-online-schema-change`. Integration with `gh-ost` is more challenging, because its recommended execution mode requires passing it a *replica*, not the master; but meanwhile `.skeema` files should only refer to the master, since this is where `CREATE TABLE` and `DROP TABLE` statements need to be run. Similar problems exist with using `fb-osc`, which must be run on the master *and* all replicas individually. Better integration for these tools may be added in the future.

### How do I configure Skeema to use MySQL 5.6+ online DDL (algorithm=inplace)?
### How do I force Skeema to use MySQL 5.6+ online DDL (algorithm=inplace, lock=none)?

The [alter-algorithm](options.md#alter-algorithm) and [alter-lock](options.md#alter-lock) options permit configuring use of MySQL 5.6 online DDL.

This is not yet supported, but is high on the priority list. These ALTERs generally aren't replication-friendly due to lag they create, but are safe in some common scenarios (small tables; or no traditional replicas e.g. RDS without read-replicas). The plan is to make this configurable, with one option being smart auto-detection of when online DDL is safe.
Note that these ALTERs generally aren't replication-friendly due to lag they create, but are safe in some common scenarios (small tables; or no traditional replicas e.g. RDS without read-replicas). You can optionally combine these options with [alter-wrapper](options.md#alter-wrapper) and [alter-wrapper-min-size](options.md#alter-wrapper-min-size) to implement conditional logic: use online DDL for smaller tables, and an external online schema change (OSC) tool for larger tables.
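
A minimal sketch of the conditional setup described above, as it might appear in a `.skeema` file. The option names come from the linked options doc; the specific values are assumptions for illustration (the threshold is written as a plain byte count, and the wrapper line reuses the `pt-online-schema-change` example from the previous answer), so check each option's documentation before relying on them.

```ini
# Assumed values, for illustration only
alter-algorithm=inplace
alter-lock=none
# Hypothetical threshold: 1 GiB expressed in bytes
alter-wrapper-min-size=1073741824
alter-wrapper=/usr/local/bin/pt-online-schema-change --alter {CLAUSES} D={SCHEMA},t={TABLE},h={HOST},P={PORT},u={USER},p={PASSWORD}
```

With a setup along these lines, tables below the threshold would be altered using online DDL, and larger tables would be handed to the external wrapper.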

### How do I configure Skeema to use service discovery?

This isn't supported yet, but eventual integration with etcd, Consul, and ZooKeeper is planned. A nearer-term solution will be support for shelling out to an external process to determine which host(s) a given directory should apply to.
There are several possibilities here, all based on how the [host option](options.md#host) is configured:

* DNS: This works if you can provide a consistently up-to-date domain name for the master of each pool. It isn't friendly towards sharded environments though, nor is it a good solution if nonstandard port numbers are in use. (Skeema does not yet support SRV record lookups.)

* External command shellout: by setting [host](options.md#host) to a backtick-wrapped external command line, you can configure Skeema to obtain hosts (and optionally ports) dynamically from the output of any arbitrary script. This permits you to interface with any service discovery client, to do lookups like "return the master of pool foo" or "return all shard masters for sharded pool xyz".

For now, the work-around is to use DNS, or to have a configuration management system rewrite directories' .skeema config files when host roles change. Providing a better solution is high on the priority list.
* Configuration management: You could use a system like Chef or Puppet to rewrite directories' .skeema config files periodically, ensuring that an up-to-date master IP is listed in each file.

Simpler integration with etcd, Consul, and ZooKeeper is planned for future releases.
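
As a rough illustration of the external command shellout approach listed above, a `.skeema` file might set `host` to a backtick-wrapped command. The script path and its `--pool` argument below are hypothetical placeholders, not part of Skeema; any script that prints the appropriate host(s) could stand in for it.

```ini
# Hypothetical lookup script; replace with your own service discovery client
host=`/usr/local/bin/find-master --pool=foo`
```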
