Skip to content

Commit

Permalink
ovsdb: Use column diffs for ovsdb and raft log entries.
Browse files Browse the repository at this point in the history
Currently, ovsdb-server stores complete value for the column in a database
file and in a raft log in case this column changed.  This means that
transaction that adds, for example, one new acl to a port group creates
a log entry with all UUIDs of all existing acls + one new.  Same for
ports in logical switches and routers and more other columns with sets
in Northbound DB.

There could be thousands of acls in one port group or thousands of ports
in a single logical switch.  And the typical use case is to add one new
if we're starting a new service/VM/container or adding one new node in a
kubernetes or OpenStack cluster.  This generates huge amount of traffic
within ovsdb raft cluster, grows overall memory consumption and hurts
performance since all these UUIDs are parsed and formatted to/from json
several times and stored on disks.  And more values we have in a set -
more space a single log entry will occupy and more time it will take to
process by ovsdb-server cluster members.

Simple test:

1. Start OVN sandbox with clustered DBs:
   # make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run a script that creates one port group and adds 4000 acls into it:
   # cat ../memory-test.sh
   pg_name=my_port_group
   export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
   ovn-nbctl pg-add $pg_name
   for i in $(seq 1 4000); do
     echo "Iteration: $i"
     ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
   done
   ovn-nbctl acl-del $pg_name
   ovn-nbctl pg-del $pg_name
   ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
   ovn-appctl -t ovn-nbctl exit
   ---

4. Check the current memory consumption of ovsdb-server processes and
   space occupied by database files:
   # ls sandbox/[ns]b*.db -alh
   # ps -eo vsz,rss,comm,cmd | egrep '=[ns]b[123].pid'

Test results with current ovsdb log format:

   On-disk Nb DB size     :  ~369 MB
   RSS of Nb ovsdb-servers:  ~2.7 GB
   Time to finish the test:  ~2m

In order to mitigate memory consumption issues and reduce computational
load on ovsdb-servers let's store diff between old and new values
instead.  This will make size of each log entry that adds single acl to
port group (or port to logical switch or anything else like that) very
small and independent from the number of already existing acls (ports,
etc.).

Added a new marker '_is_diff' into a file transaction to specify that
this transaction contains diffs instead of replacements for the existing
data.

One side effect is that this change will actually increase the size of
file transaction that removes more than a half of entries from the set,
because diff will be larger than the resulted new value.  However, such
operations are rare.

Test results with change applied:

   On-disk Nb DB size     :  ~2.7 MB  ---> reduced by 99%
   RSS of Nb ovsdb-servers:  ~580 MB  ---> reduced by 78%
   Time to finish the test:  ~1m27s   ---> reduced by 27%

After this change new ovsdb-server is still able to read old databases,
but old ovsdb-server will not be able to read new ones.
Since new servers could join ovsdb cluster dynamically it's hard to
implement any runtime mechanism to handle cases where different
versions of ovsdb-server joins the cluster.  However we still need to
handle cluster upgrades.  For this case added special command line
argument to disable new functionality.  Documentation updated with the
recommended way to upgrade the ovsdb cluster.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  • Loading branch information
igsilya committed Jan 15, 2021
1 parent 980bca7 commit 2ccd66f
Show file tree
Hide file tree
Showing 8 changed files with 170 additions and 10 deletions.
65 changes: 65 additions & 0 deletions Documentation/ref/ovsdb.7.rst
Expand Up @@ -204,6 +204,14 @@ split-brain.

Open vSwitch 2.6 introduced support for the active-backup service model.

.. important::

There was a change of a database file format in version 2.15.
To upgrade/downgrade the ``ovsdb-server`` processes across this version
follow the instructions described under
`Upgrading from version 2.14 and earlier to 2.15 and later`_ and
`Downgrading from version 2.15 and later to 2.14 and earlier`_.

Clustered Database Service Model
--------------------------------

Expand Down Expand Up @@ -270,11 +278,68 @@ vSwitch to another, upgrading them one at a time will keep the cluster healthy
during the upgrade process. (This is different from upgrading a database
schema, which is covered later under `Upgrading or Downgrading a Database`_.)

.. important::

There was a change of a database file format in version 2.15.
To upgrade/downgrade the ``ovsdb-server`` processes across this version
follow the instructions described under
`Upgrading from version 2.14 and earlier to 2.15 and later`_ and
`Downgrading from version 2.15 and later to 2.14 and earlier`_.

Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
``ovsdb-tool`` and ``ovsdb-client`` change ephemeral columns into persistent
ones when they work with schemas for clustered databases. Future versions of
OVSDB might add support for this feature.

Upgrading from version 2.14 and earlier to 2.15 and later
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is a change of a database file format in version 2.15 that doesn't allow
older versions of ``ovsdb-server`` to read the database file modified by the
``ovsdb-server`` version 2.15 or later. This also affects runtime
communications between servers in **active-backup** and **cluster** service
models. To upgrade the ``ovsdb-server`` processes from one version of Open
vSwitch (2.14 or earlier) to another (2.15 or higher) instructions below should
be followed. (This is different from upgrading a database schema, which is
covered later under `Upgrading or Downgrading a Database`_.)

In case of **standalone** service model no special handling during upgrade is
required.

For the **active-backup** service model, administrator needs to update backup
``ovsdb-server`` first and the active one after that, or shut down both servers
and upgrade at the same time.

For the **cluster** service model recommended upgrade strategy is following:

1. Upgrade processes one at a time. Each ``ovsdb-server`` process after
upgrade should be started with ``--disable-file-column-diff`` command line
argument.

2. When all ``ovsdb-server`` processes upgraded, use ``ovs-appctl`` to invoke
``ovsdb/file/column-diff-enable`` command on each of them or restart all
``ovsdb-server`` processes one at a time without
``--disable-file-column-diff`` command line option.

Downgrading from version 2.15 and later to 2.14 and earlier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similar to upgrading covered under `Upgrading from version 2.14 and earlier to
2.15 and later`_, downgrading from the ``ovsdb-server`` version 2.15 and later
to 2.14 and earlier requires additional steps. (This is different from
upgrading a database schema, which is covered later under
`Upgrading or Downgrading a Database`_.)

For all service models it's required to:

1. Stop all ``ovsdb-server`` processes (single process for **standalone**
service model, all involved processes for **active-backup** and **cluster**
service models).

2. Compact all database files with ``ovsdb-tool compact`` command.

3. Downgrade and restart ``ovsdb-server`` processes.

Understanding Cluster Consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
8 changes: 8 additions & 0 deletions NEWS
@@ -1,6 +1,14 @@
Post-v2.14.0
---------------------
- OVSDB:
* Changed format in which ovsdb transactions are stored in database files.
Now each transaction contains diff of data instead of the whole new
value of a column.
New ovsdb-server process will be able to read old database format, but
old processes will *fail* to read database created by the new one.
For cluster and active-backup service models follow upgrade instructions
in 'Upgrading from version 2.14 and earlier to 2.15 and later' section
of ovsdb(7).
* New unixctl command 'ovsdb-server/get-db-storage-status' to show the
status of the storage that's backing a database.
* New unixctl command 'ovsdb-server/memory-trim-on-compaction on|off'.
Expand Down
82 changes: 75 additions & 7 deletions ovsdb/file.c
Expand Up @@ -34,6 +34,7 @@
#include "table.h"
#include "timeval.h"
#include "transaction.h"
#include "unixctl.h"
#include "uuid.h"
#include "util.h"
#include "openvswitch/vlog.h"
Expand All @@ -53,8 +54,33 @@ static void ovsdb_file_txn_add_row(struct ovsdb_file_txn *,
const struct ovsdb_row *new,
const unsigned long int *changed);

/* If set to 'true', file transactions will contain difference between
* datums of old and new rows and not the whole new datum for the column. */
static bool use_column_diff = true;

static void
ovsdb_file_column_diff_enable(struct unixctl_conn *conn, int argc OVS_UNUSED,
const char *argv[] OVS_UNUSED,
void *arg OVS_UNUSED)
{
use_column_diff = true;
unixctl_command_reply(conn, NULL);
}

void
ovsdb_file_column_diff_disable(void)
{
if (!use_column_diff) {
return;
}
use_column_diff = false;
unixctl_command_register("ovsdb/file/column-diff-enable", "",
0, 0, ovsdb_file_column_diff_enable, NULL);
}

static struct ovsdb_error *
ovsdb_file_update_row_from_json(struct ovsdb_row *row, bool converting,
bool row_contains_diff,
const struct json *json)
{
struct ovsdb_table_schema *schema = row->table->schema;
Expand Down Expand Up @@ -84,6 +110,20 @@ ovsdb_file_update_row_from_json(struct ovsdb_row *row, bool converting,
if (error) {
return error;
}
if (row_contains_diff
&& !ovsdb_datum_is_default(&row->fields[column->index],
&column->type)) {
struct ovsdb_datum new_datum;

error = ovsdb_datum_apply_diff(&new_datum,
&row->fields[column->index],
&datum, &column->type);
ovsdb_datum_destroy(&datum, &column->type);
if (error) {
return error;
}
ovsdb_datum_swap(&datum, &new_datum);
}
ovsdb_datum_swap(&row->fields[column->index], &datum);
ovsdb_datum_destroy(&datum, &column->type);
}
Expand All @@ -93,7 +133,7 @@ ovsdb_file_update_row_from_json(struct ovsdb_row *row, bool converting,

static struct ovsdb_error *
ovsdb_file_txn_row_from_json(struct ovsdb_txn *txn, struct ovsdb_table *table,
bool converting,
bool converting, bool row_contains_diff,
const struct uuid *row_uuid, struct json *json)
{
const struct ovsdb_row *row = ovsdb_table_get_row(table, row_uuid);
Expand All @@ -107,14 +147,16 @@ ovsdb_file_txn_row_from_json(struct ovsdb_txn *txn, struct ovsdb_table *table,
return NULL;
} else if (row) {
return ovsdb_file_update_row_from_json(ovsdb_txn_row_modify(txn, row),
converting, json);
converting, row_contains_diff,
json);
} else {
struct ovsdb_error *error;
struct ovsdb_row *new;

new = ovsdb_row_create(table);
*ovsdb_row_get_uuid_rw(new) = *row_uuid;
error = ovsdb_file_update_row_from_json(new, converting, json);
error = ovsdb_file_update_row_from_json(new, converting,
row_contains_diff, json);
if (error) {
ovsdb_row_destroy(new);
} else {
Expand All @@ -127,7 +169,9 @@ ovsdb_file_txn_row_from_json(struct ovsdb_txn *txn, struct ovsdb_table *table,
static struct ovsdb_error *
ovsdb_file_txn_table_from_json(struct ovsdb_txn *txn,
struct ovsdb_table *table,
bool converting, struct json *json)
bool converting,
bool row_contains_diff,
struct json *json)
{
struct shash_node *node;

Expand All @@ -147,6 +191,7 @@ ovsdb_file_txn_table_from_json(struct ovsdb_txn *txn,
}

error = ovsdb_file_txn_row_from_json(txn, table, converting,
row_contains_diff,
&row_uuid, txn_row_json);
if (error) {
return error;
Expand Down Expand Up @@ -176,6 +221,13 @@ ovsdb_file_txn_from_json(struct ovsdb *db, const struct json *json,
return ovsdb_syntax_error(json, NULL, "object expected");
}

struct json *is_diff = shash_find_data(json->object, "_is_diff");
bool row_contains_diff = false;

if (is_diff && is_diff->type == JSON_TRUE) {
row_contains_diff = true;
}

txn = ovsdb_txn_create(db);
SHASH_FOR_EACH (node, json->object) {
const char *table_name = node->name;
Expand All @@ -187,6 +239,10 @@ ovsdb_file_txn_from_json(struct ovsdb *db, const struct json *json,
if (!strcmp(table_name, "_date")
&& node_json->type == JSON_INTEGER) {
continue;
} else if (!strcmp(table_name, "_is_diff")
&& (node_json->type == JSON_TRUE
|| node_json->type == JSON_FALSE)) {
continue;
} else if (!strcmp(table_name, "_comment") || converting) {
continue;
}
Expand All @@ -197,7 +253,7 @@ ovsdb_file_txn_from_json(struct ovsdb *db, const struct json *json,
}

error = ovsdb_file_txn_table_from_json(txn, table, converting,
node_json);
row_contains_diff, node_json);
if (error) {
goto error;
}
Expand Down Expand Up @@ -353,6 +409,9 @@ ovsdb_file_txn_annotate(struct json *json, const char *comment)
if (comment) {
json_object_put_string(json, "_comment", comment);
}
if (use_column_diff) {
json_object_put(json, "_is_diff", json_boolean_create(true));
}
json_object_put(json, "_date", json_integer_create(time_wall_msec()));
return json;
}
Expand Down Expand Up @@ -383,17 +442,26 @@ ovsdb_file_txn_add_row(struct ovsdb_file_txn *ftxn,
const struct ovsdb_column *column = node->data;
const struct ovsdb_type *type = &column->type;
unsigned int idx = column->index;
struct ovsdb_datum datum;
struct json *column_json;

if (idx != OVSDB_COL_UUID && column->persistent
&& (old
? bitmap_is_set(changed, idx)
: !ovsdb_datum_is_default(&new->fields[idx], type)))
{
if (old && use_column_diff) {
ovsdb_datum_diff(&datum, &old->fields[idx],
&new->fields[idx], type);
column_json = ovsdb_datum_to_json(&datum, type);
ovsdb_datum_destroy(&datum, type);
} else {
column_json = ovsdb_datum_to_json(&new->fields[idx], type);
}
if (!row) {
row = json_object_create();
}
json_object_put(row, column->name,
ovsdb_datum_to_json(&new->fields[idx], type));
json_object_put(row, column->name, column_json);
}
}
}
Expand Down
2 changes: 2 additions & 0 deletions ovsdb/file.h
Expand Up @@ -23,6 +23,8 @@ struct ovsdb;
struct ovsdb_schema;
struct ovsdb_txn;

void ovsdb_file_column_diff_disable(void);

struct json *ovsdb_to_txn_json(const struct ovsdb *, const char *comment);
struct json *ovsdb_file_txn_to_json(const struct ovsdb_txn *);
struct json *ovsdb_file_txn_annotate(struct json *, const char *comment);
Expand Down
8 changes: 8 additions & 0 deletions ovsdb/ovsdb-server.c
Expand Up @@ -1793,6 +1793,7 @@ parse_options(int argc, char *argv[],
OPT_SYNC_EXCLUDE,
OPT_ACTIVE,
OPT_NO_DBS,
OPT_FILE_COLUMN_DIFF,
VLOG_OPTION_ENUMS,
DAEMON_OPTION_ENUMS,
SSL_OPTION_ENUMS,
Expand All @@ -1815,6 +1816,7 @@ parse_options(int argc, char *argv[],
{"sync-exclude-tables", required_argument, NULL, OPT_SYNC_EXCLUDE},
{"active", no_argument, NULL, OPT_ACTIVE},
{"no-dbs", no_argument, NULL, OPT_NO_DBS},
{"disable-file-column-diff", no_argument, NULL, OPT_FILE_COLUMN_DIFF},
{NULL, 0, NULL, 0},
};
char *short_options = ovs_cmdl_long_options_to_short_options(long_options);
Expand Down Expand Up @@ -1905,6 +1907,10 @@ parse_options(int argc, char *argv[],
add_default_db = false;
break;

case OPT_FILE_COLUMN_DIFF:
ovsdb_file_column_diff_disable();
break;

case '?':
exit(EXIT_FAILURE);

Expand Down Expand Up @@ -1942,6 +1948,8 @@ usage(void)
printf("\nOther options:\n"
" --run COMMAND run COMMAND as subprocess then exit\n"
" --unixctl=SOCKET override default control socket name\n"
" --disable-file-column-diff\n"
" don't use column diff in database file\n"
" -h, --help display this help message\n"
" -V, --version display version information\n");
exit(EXIT_SUCCESS);
Expand Down
8 changes: 7 additions & 1 deletion ovsdb/ovsdb-tool.c
Expand Up @@ -391,6 +391,9 @@ compact_or_convert(const char *src_name_, const char *dst_name_,
ovs_fatal(retval, "%s: failed to lock lockfile", dst_name);
}

/* Resulted DB will contain a single transaction without diff anyway. */
ovsdb_file_column_diff_disable();

/* Save a copy. */
struct ovsdb *ovsdb = (new_schema
? ovsdb_file_read_as_schema(src_name, new_schema)
Expand Down Expand Up @@ -648,6 +651,8 @@ static void
print_db_changes(struct shash *tables, struct smap *names,
const struct ovsdb_schema *schema)
{
struct json *is_diff = shash_find_data(tables, "_is_diff");
bool diff = (is_diff && is_diff->type == JSON_TRUE);
struct shash_node *n1;

int i = 0;
Expand Down Expand Up @@ -691,7 +696,8 @@ print_db_changes(struct shash *tables, struct smap *names,
printf(" insert row %.8s:\n", row_uuid);
}
} else {
printf(" row %s (%.8s):\n", old_name, row_uuid);
printf(" row %s (%.8s)%s:\n", old_name, row_uuid,
diff ? " diff" : "");
}

if (columns->type == JSON_OBJECT) {
Expand Down
4 changes: 3 additions & 1 deletion tests/ovsdb-server.at
Expand Up @@ -715,7 +715,9 @@ ovsdb_check_online_compaction() {
[0], [stdout])
if test $model = standalone; then
dnl Check that all the crap is in fact in the database log.
AT_CHECK([[uuidfilt db | grep -v ^OVSDB | sed 's/"_date":[0-9]*/"_date":0/' | ovstest test-json --multiple -]], [0],
AT_CHECK([[uuidfilt db | grep -v ^OVSDB | \
sed 's/"_date":[0-9]*/"_date":0/' | sed 's/"_is_diff":true,//' | \
ovstest test-json --multiple -]], [0],
[[{"cksum":"12345678 9","name":"ordinals","tables":{"ordinals":{"columns":{"name":{"type":"string"},"number":{"type":"integer"}},"indexes":[["number"]]}},"version":"5.1.3"}
{"_comment":"add row for zero 0","_date":0,"ordinals":{"<0>":{"name":"zero"}}}
{"_comment":"delete row for 0","_date":0,"ordinals":{"<0>":null}}
Expand Down
3 changes: 2 additions & 1 deletion tests/ovsdb-tool.at
Expand Up @@ -93,7 +93,8 @@ AT_CHECK(
done]],
[0], [stdout], [ignore])
dnl Check that all the crap is in fact in the database log.
AT_CHECK([[uuidfilt db | grep -v ^OVSDB | sed 's/"_date":[0-9]*/"_date":0/' | ovstest test-json --multiple -]], [0],
AT_CHECK([[uuidfilt db | grep -v ^OVSDB | sed 's/"_date":[0-9]*/"_date":0/' | \
sed 's/"_is_diff":true,//' | ovstest test-json --multiple -]], [0],
[[{"cksum":"12345678 9","name":"ordinals","tables":{"ordinals":{"columns":{"name":{"type":"string"},"number":{"type":"integer"}},"indexes":[["number"]]}},"version":"5.1.3"}
{"_comment":"add row for zero 0","_date":0,"ordinals":{"<0>":{"name":"zero"}}}
{"_comment":"delete row for 0","_date":0,"ordinals":{"<0>":null}}
Expand Down

0 comments on commit 2ccd66f

Please sign in to comment.