Commit

Add batched and parallel import (#43)
gitbuda committed May 20, 2023
1 parent dbbde8f commit c1d60b7
Showing 60 changed files with 2,258 additions and 232 deletions.
2 changes: 1 addition & 1 deletion .clang-format
@@ -1,7 +1,7 @@
---
Language: Cpp
BasedOnStyle: Google
-Standard: "c++17"
+Standard: "c++20"
UseTab: Never
DerivePointerAlignment: false
PointerAlignment: Right
14 changes: 7 additions & 7 deletions .github/workflows/ci.yml
@@ -8,18 +8,18 @@ jobs:
build_and_test_ubuntu:
strategy:
matrix:
-platform: [ubuntu-20.04]
+platform: [ubuntu-22.04]
mg_version:
-- "2.1.1"
+- "2.7.0"
runs-on: ${{ matrix.platform }}
steps:
-- name: Install dependencies (Ubuntu 20.04)
-if: matrix.platform == 'ubuntu-20.04'
+- name: Install dependencies (Ubuntu 22.04)
+if: matrix.platform == 'ubuntu-22.04'
run: |
sudo apt install -y git cmake make gcc g++ libssl-dev # mgconsole deps
-sudo apt install -y libpython3.8 python3-pip # memgraph deps
+sudo apt install -y libpython3.10 python3-pip # memgraph deps
mkdir ~/memgraph
-curl -L https://download.memgraph.com/memgraph/v${{ matrix.mg_version }}/ubuntu-20.04/memgraph_${{ matrix.mg_version }}-1_amd64.deb > ~/memgraph/memgraph_${{ matrix.mg_version }}-1_amd64.deb
+curl -L https://download.memgraph.com/memgraph/v${{ matrix.mg_version }}/ubuntu-22.04/memgraph_${{ matrix.mg_version }}-1_amd64.deb > ~/memgraph/memgraph_${{ matrix.mg_version }}-1_amd64.deb
sudo systemctl mask memgraph
sudo dpkg -i ~/memgraph/memgraph_${{ matrix.mg_version }}-1_amd64.deb
@@ -65,7 +65,7 @@ jobs:
build_apple:
strategy:
matrix:
-platform: [macos-10.15]
+platform: [macos-latest]
runs-on: ${{ matrix.platform }}
steps:
- name: Set-up repository
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -21,7 +21,7 @@ include(CTest)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${PROJECT_SOURCE_DIR}/cmake)

set(CMAKE_C_STANDARD 11)
-set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD 20)

# Set default build type to 'Release'
if (NOT CMAKE_BUILD_TYPE)
46 changes: 46 additions & 0 deletions README.md
@@ -122,3 +122,49 @@ memgraph> MATCH (t:Turtle) RETURN t;
memgraph> :quit
Bye
```

## Batched and parallelized import (EXPERIMENTAL)

Since Memgraph v2 expects vertices to come first (vertices have to exist
before an edge can be created), and serial import can be slow, the goal of
batching and parallelization is to improve the import speed when ingesting
queries in the text format.

To enable faster import, use the `--import-mode="batched-parallel"` flag when
running `mgconsole` and put Memgraph into `STORAGE MODE
IN_MEMORY_ANALYTICAL;` (the statement can be part of the `.cypherl` file) to
make the best use of parallelism.

```
cat data.cypherl | mgconsole --import-mode=batched-parallel
// STORAGE MODE IN_MEMORY_ANALYTICAL; is optional
```

IMPORTANT NOTE: Inside the import file, vertices always have to come first
because `mgconsole` reads the file serially, chunk by chunk.

Additional useful runtime flags are:
- `--batch-size=10000`
- `--workers-number=64`
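
The effect of `--batch-size` can be pictured with a minimal sketch. It is an illustration only, not code from this commit: `MakeBatches` is a hypothetical helper that assumes one query per line and groups consecutive queries into batches of at most `batch_size`, preserving file order so that vertex queries stay ahead of edge queries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not mgconsole's actual code): splits queries, taken in
// file order, into consecutive batches holding at most batch_size queries.
std::vector<std::vector<std::string>> MakeBatches(
    const std::vector<std::string> &queries, std::size_t batch_size) {
  assert(batch_size > 0 && "batch size must be positive");
  std::vector<std::vector<std::string>> batches;
  for (std::size_t begin = 0; begin < queries.size(); begin += batch_size) {
    // The last batch may be shorter than batch_size.
    const std::size_t end = std::min(begin + batch_size, queries.size());
    batches.emplace_back(
        queries.begin() + static_cast<std::ptrdiff_t>(begin),
        queries.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}
```

With five queries and a batch size of 2, the helper yields three batches holding 2, 2, and 1 queries.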

### Memgraph in the TRANSACTIONAL mode

In [TRANSACTIONAL
mode](https://memgraph.com/docs/memgraph/reference-guide/storage-modes#transactional-storage-mode-default),
batching and parallelization might help, but since serialization errors are
likely, the execution times might be similar to, or even slower than, the
serial mode.
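
One way to see why conflicts can erase the speedup: assume a client that re-runs a batch whenever it hits a serialization error (an assumption for illustration; the README does not describe how `mgconsole` reacts). In the hypothetical `ExecuteWithRetry` below, `execute_batch` stands in for sending one batch inside a transaction and returns `false` on a conflict, so every conflict costs one more full execution of the batch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch (not mgconsole's actual code): runs execute_batch until
// it succeeds or max_attempts is reached. Returns the number of attempts that
// were needed, or -1 if every attempt ended in a serialization error.
int ExecuteWithRetry(const std::function<bool()> &execute_batch,
                     int max_attempts) {
  assert(max_attempts > 0 && "at least one attempt is required");
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (execute_batch()) {
      return attempt;  // committed on this attempt
    }
    // Conflict with a concurrent transaction: the whole batch runs again.
  }
  return -1;
}
```

A batch that conflicts twice before committing is executed three times, which is how a parallel import in this mode can end up no faster than a serial one.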

### Memgraph in ANALYTICAL mode

In [ANALYTICAL
mode](https://memgraph.com/docs/memgraph/reference-guide/storage-modes#analytical-storage-mode),
batching and parallelization will most likely help massively because
serialization errors don't exist. However, since Memgraph will accept any
query (e.g., when edge creation fails, vertices could be created multiple
times), special care is required:
- queries that only create vertices have to be specified first
- please use only import queries built from simple MATCH, CREATE, and MERGE
statements.
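
The "vertices first" rule can be sketched as a two-phase parallel run. This is a hedged illustration, not the implementation in this commit: `RunInParallel` and `ImportVerticesThenEdges` are hypothetical names, the callbacks stand in for sending one batch to Memgraph, and `workers` plays the role of `--workers-number`. The point is that all vertex batches finish before any edge batch starts.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls task(i) for every i in [0, task_count) using
// `workers` threads, and returns only after all tasks have finished.
void RunInParallel(std::size_t task_count, std::size_t workers,
                   const std::function<void(std::size_t)> &task) {
  assert(workers > 0 && "at least one worker is required");
  std::atomic<std::size_t> next{0};  // index of the next unclaimed task
  std::vector<std::thread> pool;
  pool.reserve(workers);
  for (std::size_t w = 0; w < workers; ++w) {
    pool.emplace_back([&] {
      // Each worker keeps claiming task indices until none are left.
      for (std::size_t i = next.fetch_add(1); i < task_count;
           i = next.fetch_add(1)) {
        task(i);
      }
    });
  }
  for (auto &worker : pool) {
    worker.join();
  }
}

// Phase 1 runs every vertex batch; phase 2 starts only after phase 1 is done,
// so edge queries can rely on their endpoints already existing.
void ImportVerticesThenEdges(
    std::size_t vertex_batches, std::size_t edge_batches, std::size_t workers,
    const std::function<void(std::size_t)> &run_vertex_batch,
    const std::function<void(std::size_t)> &run_edge_batch) {
  RunInParallel(vertex_batches, workers, run_vertex_batch);
  RunInParallel(edge_batches, workers, run_edge_batch);
}
```

Joining the worker threads between the two phases acts as the barrier; within a phase, batches may complete in any order.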

If you encounter any issue, please create a new [mgconsole GitHub issue](https://github.com/memgraph/mgconsole/issues).
4 changes: 2 additions & 2 deletions src/CMakeLists.txt
@@ -86,7 +86,7 @@ add_dependencies(${GFLAGS_LIBRARY} gflags-proj)
ExternalProject_Add(mgclient-proj
PREFIX mgclient
GIT_REPOSITORY https://github.com/memgraph/mgclient.git
-GIT_TAG v1.3.0
+GIT_TAG v1.4.1
CMAKE_ARGS "-DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>"
"-DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}"
"-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}"
@@ -115,7 +115,7 @@ if(MGCONSOLE_ON_WINDOWS)
add_compile_options(-Wno-narrowing)
endif()

-add_executable(mgconsole main.cpp)
+add_executable(mgconsole main.cpp interactive.cpp serial_import.cpp batch_import.cpp parsing.cpp)
target_compile_definitions(mgconsole PRIVATE MGCLIENT_STATIC_DEFINE)
target_include_directories(mgconsole
PRIVATE