diff --git a/pages/database-management/configuration.mdx b/pages/database-management/configuration.mdx
index 8ad74d813..9d99f6470 100644
--- a/pages/database-management/configuration.mdx
+++ b/pages/database-management/configuration.mdx
@@ -455,6 +455,8 @@ in Memgraph.
 | `--storage-snapshot-interval="300`" | Define periodic snapshot schedule via cron expression or as a period in seconds. Set to empty string to disable. | `[string]` |
 | `--storage-snapshot-on-exit=true` | Controls whether the storage creates another snapshot on exit. | `[bool]` |
 | `--storage-snapshot-retention-count=3` | The number of snapshots that should always be kept. | `[uint64]` |
+| `--storage-parallel-snapshot-creation=false` | Controls whether snapshot creation runs on multiple threads. | `[bool]` |
+| `--storage-snapshot-thread-count` | The number of threads used to create snapshots. Defaults to the system's maximum thread count. | `[uint64]` |
 | `--storage-wal-enabled=true` | Controls whether the storage uses write-ahead-logging. To enable WAL, periodic snapshots must be enabled. | `[bool]` |
 | `--storage-wal-file-flush-every-n-tx=100000` | Issue a 'fsync' call after this amount of transactions are written to the WAL file. Set to 1 for fully synchronous operation. | `[uint64]` |
 | `--storage-wal-file-size-kib=20480` | Minimum file size of each WAL file. | `[uint64]` |
diff --git a/pages/fundamentals/data-durability.mdx b/pages/fundamentals/data-durability.mdx
index 95d9aef00..eb7c08beb 100644
--- a/pages/fundamentals/data-durability.mdx
+++ b/pages/fundamentals/data-durability.mdx
@@ -87,6 +87,8 @@ on the value of the `--storage-snapshot-on-exit` configuration flag.
 When a snapshot creation is triggered, the entire data storage is written to the
 drive. Nodes and relationships are divided into groups called batches.
+Snapshot creation can be made faster by using **multiple threads**. See [Parallelized execution](#parallelized-execution) for more information.
+
 On startup, the database state is recovered from the most recent snapshot file.
 Memgraph can read the data and build the indexes on multiple threads, using
 batches as a parallelization unit: each thread will recover one batch at a time
@@ -155,6 +157,15 @@ storage mode is changed to `IN_MEMORY_TRANSACTIONAL` storage mode.
 Snapshots and WAL files are presently not compatible between Memgraph versions.
+### Parallelized execution
+
+Snapshot creation in Memgraph can run on multiple threads, which can significantly reduce the time required to create snapshots of large datasets.
+
+This behavior is controlled by the following flags:
+- `--storage-parallel-snapshot-creation`: Determines whether snapshot creation runs on multiple threads. Defaults to `false`; set it to `true` to enable parallelized execution.
+- `--storage-snapshot-thread-count`: Specifies the number of threads used for snapshot creation. By default, Memgraph uses the system's maximum thread count. Override this value to tune performance to your system's resources.
+
+When parallelized execution is enabled, Memgraph divides the data into batches, with the batch size set via `--storage-items-per-batch`. The optimal batch size and thread count may vary depending on the dataset size and system configuration.
 ## Storage modes
diff --git a/pages/help-center/errors/snapshots.mdx b/pages/help-center/errors/snapshots.mdx
index c0795a624..ebe9a6cdb 100644
--- a/pages/help-center/errors/snapshots.mdx
+++ b/pages/help-center/errors/snapshots.mdx
@@ -82,6 +82,24 @@ for security reasons, it can't automatically create a new disk copy when you use
 `CREATE SNAPSHOT` in Memgraph. So, while the command creates a snapshot locally,
 it doesn't trigger a new snapshot in the Cloud interface.
+## Why am I seeing corrupt snapshot files named `_edge_part_` and `_vertex_part_`?
+
+These files are partial results from the multi-threaded execution of snapshot creation.
+When Memgraph creates snapshots using multiple threads, it divides the data into smaller parts. Each thread processes a specific part and writes intermediate results to files named with the `_edge_part_` and `_vertex_part_` patterns.
+
+If the snapshot creation process is interrupted or fails, these partial files may remain on disk and appear as corrupt.
+Memgraph cannot load these incomplete files during startup, as they do not represent a valid snapshot.
+
+### How to resolve this issue?
+
+To resolve this issue, you can safely delete the partial files and restart Memgraph. The database will attempt to recover its state using the most recent valid snapshot and the write-ahead log (WAL) files.
+
+```bash
+rm /var/lib/memgraph/snapshots/*_edge_part_*
+rm /var/lib/memgraph/snapshots/*_vertex_part_*
+```
+
+
 ---
\ No newline at end of file
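
The cleanup step described in the new help-center section could also be sketched as a small helper that lists the partial files before deleting them. This is only an illustration under stated assumptions: the function name `clean_partial_snapshots` and its `--delete` switch are hypothetical and not part of Memgraph, and the directory you pass is assumed to be the snapshot directory used in the `rm` commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Memgraph): find leftover partial snapshot
# files whose names contain _edge_part_ or _vertex_part_.
# Usage: clean_partial_snapshots DIR [--delete]
clean_partial_snapshots() {
    local dir="$1"
    local mode="${2:-}"

    if [ "$mode" = "--delete" ]; then
        # Delete only regular files directly inside DIR that match a pattern.
        find "$dir" -maxdepth 1 -type f \
            \( -name '*_edge_part_*' -o -name '*_vertex_part_*' \) \
            -exec rm -f {} +
    else
        # Default is a dry run: print the matching files and remove nothing.
        find "$dir" -maxdepth 1 -type f \
            \( -name '*_edge_part_*' -o -name '*_vertex_part_*' \) \
            -print
    fi
}
```

Calling `clean_partial_snapshots /var/lib/memgraph/snapshots` prints the candidates for review, and adding `--delete` removes them. Unlike a bare `rm` with a glob, `find` does not report an error when no file matches.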