Command line flags
A more in-depth discussion of various
gh-ost command line flags: implementation, implication, use cases.
Add this flag when executing on Aliyun RDS.
gh-ost would like you to connect to a replica, from where it figures out the master by itself. This wiring is required should your master execute using
If, for some reason, you do not wish
gh-ost to connect to a replica, you may connect it directly to the master and approve this via
When your migration issues a column rename (
change column old_name new_name ...)
gh-ost analyzes the statement to try and associate the old column name with the new column name. Otherwise, the new structure may look as if one column was dropped and another was added.
gh-ost will print out what it thinks the rename implied, but will not issue the migration unless you provide with
If you think
gh-ost is mistaken and that there's actually no rename involved, you may pass
--skip-renamed-columns instead. This will cause
gh-ost to disassociate the column values; data will not be copied between those columns.
gh-ost infers the identity of the master server by crawling up the replication topology. You may explicitly tell
gh-ost the identity of the master host via
--assume-master-host=the.master.com. This is useful in:
- master-master topologies (together with
gh-ost can arbitrarily pick one of the co-masters and you prefer that it picks a specific one
- tungsten replicator topologies (together with
gh-ost is unable to crawl and detect the master
If you happen to know your servers use RBR (Row Based Replication, i.e.
binlog_format=ROW), you may specify
--assume-rbr. This skips a verification step where
gh-ost would issue a
STOP SLAVE; START SLAVE.
Skipping this step means
gh-ost would not need the
SUPER privilege in order to operate.
You may want to use this on Amazon RDS.
--conf=/path/to/my.cnf: file where credentials are specified. Should be in (or contain) the following format:
[client]
user=gromit
password=123456
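As a quick way to sanity-check a credentials file of this shape, the sketch below parses the same `[client]` section with Python's standard `configparser`. It only illustrates the file layout; it is not how gh-ost itself reads the file, and the helper name `read_client_credentials` is hypothetical.

```python
import configparser

# Sample contents mirroring the format described above.
SAMPLE = """\
[client]
user=gromit
password=123456
"""


def read_client_credentials(text):
    """Return (user, password) from the [client] section of an ini-style text."""
    # interpolation=None so that a '%' in a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    client = parser["client"]
    return client["user"], client["password"]


if __name__ == "__main__":
    user, _password = read_client_credentials(SAMPLE)
    print("user:", user)
```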
Comma delimited status-name=threshold, same format as
--critical-load defines a threshold that, when met,
gh-ost panics and bails out. The default behavior is to bail out immediately when meeting this threshold.
This may sometimes lead to migrations bailing out on a very short spike that, while it does impact production and is worth investigating, isn't reason enough to kill a 10-hour migration.
When --critical-load-interval-millis is specified (e.g.
gh-ost gives a second chance: when it meets the
critical-load threshold, it doesn't bail out. Instead, it starts a timer (in this example:
2.5 seconds) and re-checks
critical-load when the timer expires. If
critical-load is met again,
gh-ost panics and bails out. If not, execution continues.
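The second-chance behavior described above can be sketched roughly as follows. This is an illustration of the logic only, not gh-ost's implementation: the function names, the `read_status` callback, and the use of `>=` for "met" are assumptions made for the example.

```python
import time


def critical_load_met(read_status, thresholds):
    """True if any status value meets its threshold (assumed here to mean >=)."""
    return any(read_status(name) >= limit for name, limit in thresholds.items())


def should_bail_out(read_status, thresholds, interval_millis=0, sleep=time.sleep):
    """Decide whether to bail out, optionally re-checking after a timer."""
    if not critical_load_met(read_status, thresholds):
        return False
    if interval_millis <= 0:
        return True  # default behavior: bail out immediately
    # Second chance: wait for the interval, then check once more.
    sleep(interval_millis / 1000.0)
    return critical_load_met(read_status, thresholds)


if __name__ == "__main__":
    # A short spike: high on the first reading, back to normal on the re-check.
    readings = iter([500, 10])
    verdict = should_bail_out(
        lambda name: next(readings),
        {"threads_running": 100},
        interval_millis=2500,
        sleep=lambda seconds: None,  # skip real waiting in this demo
    )
    print("bail out:", verdict)
```

With an interval set, a spike that has subsided by the re-check does not end the migration; a load that is still above the threshold does.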
This is somewhat similar to a Nagios
n-times test, where
n in our case is always
Optional. Default is
safe. See more discussion in
Danger: this flag will silently discard any foreign keys existing on your table.
At this time (10-2016)
gh-ost does not support foreign keys on migrated tables (it bails out when it notices a FK on the migrated table). However, it is able to support dropping of foreign keys via this flag. If you're trying to get rid of foreign keys in your environment, this is a useful flag.
gh-ost reads events from the binary log and applies them onto the ghost table. It does so in batched writes: grouping multiple events to apply in a single transaction. This gives better write throughput as we don't need to sync the transaction log to disk for each event.
The --dml-batch-size flag controls the size of the batched write. Allowed values are
1 - 100, where
1 means no batching (every event from the binary log is applied onto the ghost table in its own transaction). Default value is 10.
Why is this behavior configurable? Different workloads have different characteristics. Some workloads have very large writes, such that aggregating even
50 writes into a transaction makes for a significant transaction size. On other workloads the write rate is so high that one just can't allow for a hundred more syncs to disk per second. The default value of
10 is a modest compromise that should probably work very well for most workloads. Your mileage may vary.
Noteworthy is that setting
--dml-batch-size to a higher value does not mean
gh-ost blocks or waits on writes. The batch size is an upper limit on transaction size, not a minimal one. If
gh-ost doesn't have "enough" events in the pipe, it does not wait on the binary log, it just writes what it already has. This conveniently suggests that if write load is light enough for
gh-ost to only see a few events in the binary log at a given time, then it is also light enough for
gh-ost to apply a fraction of the batch size.
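The "upper limit, not a minimum" behavior can be pictured with a small sketch. It assumes an in-memory queue of pending events and a hypothetical `next_batch` helper; gh-ost's actual event pipeline is not shown here.

```python
from collections import deque


def next_batch(pending, batch_size):
    """Take up to batch_size events that are already queued, without waiting."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be within 1..100")
    batch = []
    # Stop as soon as the queue is empty: a partial batch is applied as-is.
    while pending and len(batch) < batch_size:
        batch.append(pending.popleft())
    return batch


if __name__ == "__main__":
    light_load = deque(["event-1", "event-2", "event-3"])
    print(len(next_batch(light_load, 10)))  # fewer events than the batch size

    heavy_load = deque(f"event-{i}" for i in range(25))
    print(len(next_batch(heavy_load, 10)), len(heavy_load))
```

Under light load the batch simply holds whatever is available; under heavy load it is capped at the configured size and the rest stays queued for the next transaction.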
A gh-ost execution needs to copy whatever rows you have in your existing table onto the ghost table. This can be, and often is, a large number. What exactly is that number?
gh-ost initially estimates the number of rows in your table by issuing an
explain select * from your_table. This will use statistics on your table and return with a rough estimate. How rough? It might go as low as half or as high as double the actual number of rows in your table. This is the same method as used in
gh-ost also supports the
--exact-rowcount flag. When this flag is given, two things happen:
- An initial, authoritative
select count(*) from your_table. This query may take a long time to complete, but is performed before we begin the massive operations. When
--concurrent-rowcount is also specified, this runs in parallel to row copy. Note:
--concurrent-rowcount now defaults to
- A continuous update to the estimate as we make progress applying events. We heuristically update the number of rows based on the queries we process from the binlogs.
While the ongoing estimated number of rows is still heuristic, it's almost exact, such that the reported ETA or percentage progress is typically accurate to the second throughout a multiple-hour operation.
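One way to picture the continuous update is the sketch below. The exact heuristic is not spelled out above, so the rule used here (an insert adds a row, a delete removes one, an update leaves the count unchanged) is an assumption, and both function names are hypothetical.

```python
# Assumed per-event adjustment; the real heuristic may differ.
ROW_DELTA = {"insert": 1, "delete": -1, "update": 0}


def updated_estimate(initial_count, event_kinds):
    """Adjust an initial row count by the events seen since it was taken."""
    return initial_count + sum(ROW_DELTA.get(kind, 0) for kind in event_kinds)


def progress_percent(rows_copied, estimated_rows):
    """Progress against the current estimate, capped at 100%."""
    if estimated_rows <= 0:
        return 100.0
    return min(100.0, 100.0 * rows_copied / estimated_rows)


if __name__ == "__main__":
    estimate = updated_estimate(1000, ["insert", "insert", "delete", "update"])
    print(estimate, progress_percent(250, estimate))
```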
Without this parameter, migration is a noop: testing table creation and validity of migration, but not touching data.
Add this flag when executing on a 1st generation Google Cloud Platform (GCP).
Default 100. See
subsecond-lag for details.
gh-ost maintains two tables while migrating: the ghost table (which is synced from your original table and finally replaces it) and a changelog table, which is used internally for bookkeeping. By default, it panics and aborts if it sees those tables upon startup. Provide
--initially-drop-old-table to let
gh-ost know it's OK to drop them beforehand.
gh-ost should not take chances or make assumptions about the user's tables. Dropping tables can be a dangerous, locking operation. We let the user explicitly approve such operations.
On a replication topology, this is perhaps the most important migration throttling factor: the maximum lag allowed for migration to work. If lag exceeds this value, migration throttles.
When using Connect to replica, migrate on master, this lag is primarily tested on the very replica
gh-ost operates on. Lag is measured by checking the heartbeat events injected by
gh-ost itself on the utility changelog table. That is, to measure this replica's lag,
gh-ost doesn't need to issue
show slave status nor have any external heartbeat mechanism.
When --throttle-control-replicas is provided, throttling also considers lag on the specified hosts. Lag measurement on the listed hosts is done by querying
gh-ost's changelog table, where
gh-ost injects a heartbeat.
See also: Sub-second replication lag throttling
List of metrics and threshold values; exceeding the threshold of any of them will cause the throttler to kick in. See also:
gh-ost is used to migrate tables on a master. If you wish to only perform the migration in full on a replica, connect
gh-ost to said replica and pass
gh-ost will briefly connect to the master but otherwise issue no changes on the master. Migration will be fully executed on the replica, while making sure to maintain a small replication lag.
Indicate a file name, such that the final cut-over step does not take place as long as the file exists.
When this flag is set,
gh-ost expects the file to exist on startup, or else tries to create it.
gh-ost exits with error if the file does not exist and
gh-ost is unable to create it.
With this flag set, the migration will cut-over upon deletion of the file or upon
cut-over interactive command.
Defaults to 99999. If you run multiple migrations then you must provide a different, unique
--replica-server-id for each
Optionally involve the process ID, for example:
It's on you to choose a number that does not collide with another
gh-ost or another running replica.
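As a hedged illustration of involving the process ID, the sketch below derives a per-process server ID by adding the PID to a fixed base. The base value, the helper name `replica_server_id`, and the range check (MySQL server IDs are unsigned 32-bit values) are choices made for this example; it remains your responsibility to confirm the result does not collide with any other replica.

```python
import os

MAX_SERVER_ID = 2**32 - 1  # MySQL server_id is an unsigned 32-bit integer


def replica_server_id(pid, base=1_000_000_000):
    """Derive a server ID from a process ID; the base is an arbitrary choice."""
    candidate = base + pid
    if not 1 <= candidate <= MAX_SERVER_ID:
        raise ValueError(f"server id {candidate} is out of range")
    return candidate


if __name__ == "__main__":
    # The printed value could be passed as --replica-server-id=<value>.
    print(replica_server_id(os.getpid()))
```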
concurrent-migrations on the cheatsheet.
gh-ost verifies no foreign keys exist on the migrated table. On servers with a large number of tables this check can take a long time. If you're absolutely certain no foreign keys exist (the table does not reference other tables nor is referenced by other tables) and wish to save the check time, provide with
Issue the migration on a replica; do not modify data on master. Useful for validating, testing and benchmarking. See
Provide an HTTP endpoint;
gh-ost will issue
HEAD requests on the given URL and throttle whenever the response status code is not
200. The URL can be queried and updated dynamically via interactive commands. Empty URL disables the HTTP check.
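The decision rule above (throttle on any status other than 200, never throttle on an empty URL) can be sketched as follows. This is not gh-ost's code: `head_status` and `should_throttle` are hypothetical helpers, the example URL is a placeholder, and connection failures are left to propagate since their handling is not described above.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=1.0):
    """Issue a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their code.
        return err.code


def should_throttle(url, fetch_status=head_status):
    """Throttle whenever the endpoint answers with anything other than 200."""
    if not url:
        return False  # an empty URL disables the HTTP check
    return fetch_status(url) != 200


if __name__ == "__main__":
    # Stubbed status fetchers keep the demo offline.
    print(should_throttle("http://throttle.example/check", lambda u: 200))
    print(should_throttle("http://throttle.example/check", lambda u: 503))
```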
Makes the old table include a timestamp value. The old table is what the original table is renamed to at the end of a successful migration. For example, if the table is
gh_ost_test, then the old table would normally be
--timestamp-old-table it would be, for example,
tungsten on the cheatsheet.