Table generator CLI usage

dbgen -i template.sql -o out_dir -N 7500000 -R 300000 -r 100

Common options

  • -i «PATH», --template «PATH»

    The path to the template file. See Template reference for details.

  • -o «DIR», --out-dir «DIR»

    The directory to store the generated files. If the directory does not exist, dbgen will try to create it.

  • -N «N», --total-count «N»

    Total number of rows to generate. Default is 1.

  • -R «N», --rows-per-file «N»

    Maximum number of rows per file generator thread. Default is 1.

  • -r «N», --rows-count «N»

    Maximum number of rows per INSERT statement. Default is 1.

The --total-count and --rows-per-file parameters also accept SI prefixes (e.g. 1.5K = 1500) and exponential notation (e.g. 2.5e4 = 25000) for convenience.
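
For instance, the following invocation is equivalent to passing -N 25000 -R 1500 (the template path and output directory are placeholders):

dbgen -i template.sql -o out_dir -N 2.5e4 -R 1.5K -r 100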

More options

  • -t «NAME», --table-name «NAME»

    Override the table name of generated data. Should be a qualified and quoted name like '"database"."schema"."table"'.

    This option cannot be used when the template has multiple tables.
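
    For example, to direct the generated data into a specific table (the table name is a placeholder; note the single quotes protecting the double quotes from the shell):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 -t '"mydb"."public"."orders"'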

  • --schema-name «NAME»

    Replaces the schema name of the generated tables. Should be a qualified and quoted name like '"database"."schema"'.

  • --qualified

    If specified, the generated INSERT statements will use the fully qualified table name (i.e. INSERT INTO "db"."schema"."table" …). Otherwise, only the table name will be included (i.e. INSERT INTO "table" …).

  • -s «SEED», --seed «SEED»

    Provide a 64-digit hex number to seed the random number generator, so that the output becomes reproducible. If not specified, the seed will be obtained from the system entropy.

    (Note: There is no guarantee that the same seed will produce the same output across major versions of dbgen.)
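
    For example (the seed value below is arbitrary; any 64 hex digits work):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 \
        -s 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef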

  • --rng «RNG»

    Choose a random number generator. The default is hc128, which should be the best choice in most situations. Supported alternatives are:

    RNG name | Algorithm
    ---------|--------------
    chacha12 | ChaCha12
    chacha20 | ChaCha20
    hc128    | HC-128
    isaac    | ISAAC
    isaac64  | ISAAC-64
    xorshift | Xorshift
    pcg32    | PCG32
    step     | Step sequence
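
    For example, to use ChaCha20 instead of the default (the template path and counts are illustrative):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 --rng chacha20
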
  • -j «N», --jobs «N»

    Use N threads to write the output in parallel. Defaults to the number of logical CPUs.

  • -q, --quiet

    Disable progress bar output.

  • --escape-backslash

    When enabled, a backslash (\) is treated as introducing a C-style escape sequence, and must itself be escaped as \\. In standard SQL, the backslash has no special meaning. This setting should match the configuration of the target database; otherwise it can lead to invalid data or a syntax error whenever a generated string contains a backslash.

    SQL dialect  | Should pass --escape-backslash
    -------------|------------------------------------------------------------
    MySQL        | Yes, if NO_BACKSLASH_ESCAPES is off (default)
    PostgreSQL   | No, if standard_conforming_strings is on (default since 9.1)
    SQLite3      | No
    Transact-SQL | No

    Backslashes in template strings are never treated as special, regardless of this setting.
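
    For example, when targeting a MySQL server running with its default configuration (NO_BACKSLASH_ESCAPES off), pass the flag explicitly (the template path and counts are illustrative):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 --escape-backslash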

  • -k «N», --files-count «N»

    (Deprecated) Total number of file generator threads.

  • -n «N», --inserts-count «N»

    (Deprecated) Number of INSERT statements per file generator thread.

  • --last-file-inserts-count «N»

    (Deprecated) In the last file generator thread, produce N INSERT statements instead of the value given by --inserts-count.

  • --last-insert-rows-count «N»

    (Deprecated) In the last INSERT statement of the last file generator thread, produce N rows instead of the value given by --rows-count.

    These four options provide an alternative way to specify the total number of rows, but they are harder to manage and are no longer recommended since v0.8.0.

  • --time-zone «TZ»

    The time zone used to parse and format timestamps. Defaults to UTC, regardless of the system time zone. Any tz database time zone name (e.g. America/New_York) can be used.
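
    For example, to generate timestamps in US Eastern time (the template path and counts are illustrative):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 --time-zone America/New_York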

  • --zoneinfo «PATH»

    The path to the tz database; defaults to /usr/share/zoneinfo. On Unix systems this can usually be left at the default. However, not every OS (especially Windows) ships a built-in tz database, or you may want precise control over which version of the tz database is used. In these cases, use this parameter to point dbgen at a custom database.

    Note:

    • The time zone data must be compiled into TZif version 2 or 3 format.
    • Leap seconds are ignored.
    • Immutable time zones such as UTC are hard-coded into dbgen and will skip the tz database.
  • --now «TIMESTAMP»

    Override the timestamp reported by current_timestamp. Defaults to the time when dbgen was started. The timestamp must be written in the format YYYY-mm-dd HH:MM:SS.fff, and it is always in UTC, regardless of the --time-zone setting.
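
    For example, to pin current_timestamp to a fixed instant (the timestamp value below is arbitrary):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 --now '2021-01-01 00:00:00.000'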

  • -e «TEMPLATE», --template-string «TEMPLATE»

    Pass the content of a template as an inline string via this command line argument.
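
    For example, a small template can be supplied inline instead of via a file (the table definition and counts here are illustrative):

    dbgen -o out_dir -N 10 -R 10 \
        -e 'create table foo (a boolean {{ rand.bool(0.5) }});'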

  • -D «EXPR», --initialize «EXPR»

    Executes the global expression before generating files. This parameter can be specified multiple times. These expressions are executed before the global expressions in the template file, which allows the user to parametrize the template. For example, here we define a @level variable that defaults to 1:

    /*{{ @level := coalesce(@level, 1) }}*/
    create table foo (
        …

    and then we can use -D to override @level:

    ./dbgen -D '@level := 2'
  • -f «FORMAT», --format «FORMAT»

    Output format of the data files. Can be one of:

    • sql

      INSERT INTO tbl (col1, col2) VALUES
      (1, 'one'),
      (3, 'three');

    • csv

      "col1","col2"
      1,"one"
      3,"three"

    • sql-insert-set

      INSERT INTO tbl SET
      col1 = 1,
      col2 = 'one';
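
    For example, to produce CSV files instead of SQL (the template path and counts are illustrative):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 -f csv
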
  • --format-true «STRING», --format-false «STRING», --format-null «STRING»

    Change the string printed for TRUE, FALSE and NULL results.

    The default values are:

    Format              | True | False | Null
    --------------------|------|-------|-----
    sql, sql-insert-set | 1    | 0     | NULL
    csv                 | 1    | 0     | \N

    Some database systems (e.g. PostgreSQL) distinguish between boolean and integer types. When targeting these systems, you may need to modify these keywords:

    ./dbgen -o bool_table -N 10 -R 10 \
        -e 'create table bool_table(a boolean {{ rand.bool(0.5) }});' \
        --format-true TRUE --format-false FALSE

    Regardless of these settings, when a boolean value is evaluated as a string it always becomes '0' or '1' (e.g. {{ true || '!' }} always produces '1!').

  • --headers

    Include column names in the output as headers.

    In the SQL format, this means the INSERT statements include the complete column name list.

    INSERT INTO tbl (col1, col2, col3) VALUES
    (1, 2, 3),
    (4, 5, 6);

    In the CSV format, this means each file starts with a header row of column names.

    "col1","col2","col3"
    1,2,3
    4,5,6
    
  • -c «ALG», --compress «ALG» / --compress-level «LEVEL»

    Compress the data output. Possible algorithms are:

    Algorithm | Levels
    ----------|-------
    gzip      | 0–9
    xz        | 0–9
    zstd      | 1–21

    The compression level defaults to 6 if not specified.

    Since the data are randomly generated, the compression ratio is typically not very high (around 70% of the uncompressed size). We do not recommend using the "xz" algorithm here, nor very high compression levels.
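
    For example, to emit zstd-compressed output at a light compression level (the template path and counts are illustrative):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 \
        -c zstd --compress-level 3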

  • -z «SIZE», --size «SIZE»

    Target size (in bytes) of each data file. Default is unlimited.

    Explicit units are accepted (e.g. 8kB = 8000, 0.5MiB = 524288).
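
    For example, to split the output of each file generator thread into files of roughly 0.5 MiB before compression (the template path and counts are illustrative):

    dbgen -i template.sql -o out_dir -N 1e6 -R 1e6 -z 0.5MiB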

    When the target size is provided, each file generator thread may produce multiple files. After a complete INSERT statement (i.e. --rows-count rows) is written, if the pre-compressed size of the file exceeds this --size parameter, dbgen will close it and start writing to a new file.

    The multiple files generated this way are written sequentially. You should still consider adjusting --total-count/--rows-per-file or --files-count to take advantage of parallelism.

    The size limits on a main table and its derived tables are considered independently. Thus, the number of files for them may not match.

    The file names are still ordered lexicographically, but not necessarily numerically. An example directory listing looks like this:

    # suppose 10102 files are generated from file generator thread 0:
    tbl.0000.csv
    tbl.0001.csv
    ...
    tbl.0099.csv
    tbl.010000.csv
    tbl.010001.csv
    ...
    tbl.019999.csv
    tbl.02000000.csv
    tbl.02000001.csv
    
    # similar for file generator thread 1:
    tbl.1000.csv
    tbl.1001.csv
    ...
    tbl.1099.csv
    tbl.110000.csv
    tbl.110001.csv
    ...
    

    In lexicographic ordering, tbl.1000.csv appears after tbl.010000.csv. Numerical or "natural" ordering would swap them, which can affect the efficiency of a subsequent import.

  • --components schema,table,data

    Which components should be generated:

    • schema (the CREATE SCHEMA SQL files)
    • table (the CREATE TABLE SQL files)
    • data (the output files)
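
    For example, to regenerate only the data files and skip the schema and table SQL files (assuming a subset of the component list is accepted; the template path and counts are illustrative):

    dbgen -i template.sql -o out_dir -N 1000 -R 1000 --components data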