The MapD Core database
Analyzer Fix casts on timestamps Oct 19, 2018
Archive clang-format: clang-tidy: format, tidy codebase again Sep 21, 2018
Calcite Initial support for native SSL connections, HTTPS in JDBC Oct 17, 2018
Catalog using finer locks on table descriptor level to mitigate long stalls Oct 22, 2018
Chunk fix SELECT crash from unsupported ALTER ADD geo COLUMN by supporting it Aug 9, 2018
CudaMgr clang-format: format entire codebase Jul 21, 2018
DataMgr Cleanup: EMC++ Item #42 Oct 1, 2018
Distributed delegating clear_cpu / clear_gpu calls down to leafs in distributed mode Aug 9, 2018
Fragmenter Cleanup: EMC++ Item #42 Oct 1, 2018
Import Handle CSVs without a trailing newline Oct 18, 2018
Parser Add Alter privilege in parser file Oct 19, 2018
Planner Fix warnings re: missing virtual destructors Aug 14, 2018
QueryEngine Remove more unused bushy join code Oct 22, 2018
QueryRunner Support for columnar outputs in projection queries Oct 10, 2018
SQLFrontend thrift: https: don't use the trust certificate directory Oct 18, 2018
SampleCode add profiler libs to all executable Sep 21, 2018
SampleData Dictionary encoded strings support 1/? Feb 18, 2015
Shared thrift: https: temp fix for ssl socket creation issues Oct 19, 2018
SqliteConnector Cleanup: EMC++ Item #42 Oct 1, 2018
StringDictionary Fix Clang 7 warnings Sep 21, 2018
Tests Do not use semiprivate keyless layout for varlen arguments to SAMPLE Oct 19, 2018
ThirdParty deps: add glslang and Vulkan Oct 1, 2018
ThriftHandler Initial support for native SSL connections, HTTPS in JDBC Oct 17, 2018
Utils Make ArrayDatum inherit from VarlenDatum Sep 19, 2018
cmake/Modules New CMake find script for GDAL Oct 19, 2018
config Add TSAN suppressions file. Ignore lock inversion warning in gdal. May 15, 2018
docker docker: build: add valgrind, cloc, jq Oct 12, 2018
docs docs: update link to external docs, remove internal files Aug 28, 2017
java jdbc: fix NPE if no protocol in db url Oct 22, 2018
scripts New CMake find script for GDAL Oct 19, 2018
systemd systemd: web: bump nofile limit to 65536 Jun 12, 2018
.clang-format clang-format: format entire codebase Jul 21, 2018
.clang-tidy Cleanup: EMC++ Item #42 Oct 1, 2018
.gitattributes git: remove odbc/*.xml from gitattributes Jun 27, 2018
.gitignore ignore automatically generated files from java project managers Aug 9, 2018
CLA.txt eula: rename MapD to OmniSci Sep 27, 2018
CMakeLists.txt Initial support for native SSL connections, HTTPS in JDBC Oct 17, 2018
CMakePackaging.txt cmake: add install paths for rpm, deb Mar 31, 2017
Doxyfile.in doxygen: cmake: add support for generating docs Aug 30, 2018
EULA-CE.txt eula: rename MapD to OmniSci Sep 27, 2018
LICENSE.md Expose Calcite's SqlAdvisor as a Thrift endpoint Nov 25, 2017
MapDServer.cpp Initial support for native SSL connections, HTTPS in JDBC Oct 17, 2018
MapDServer.h clang-format: format entire codebase Jul 21, 2018
MapDWebServer.go Add http->https redirect handler to mapd_web_server Jul 16, 2018
README.md readme: clarify centos deps builds Oct 17, 2018
ROADMAP.md Roadmap typos/minor style changes Jul 10, 2018
completion_hints.thrift Expose Calcite's SqlAdvisor as a Thrift endpoint Nov 25, 2017
initdb.cpp Rename default geospatial tables to use OmniSci prefix Sep 27, 2018
insert_sample_data insert_sample_data: fix cli option parsing Sep 8, 2017
mapd.conf.sample web: deprecate --quiet, add --verbose Apr 20, 2017
mapd.thrift Add has_object_privilege API to return recursive privileges of a user Oct 10, 2018
run_sanity_tests import: add initial test for type detection Apr 20, 2017
startmapd startmapd: don't pass --config option if file does not exist Feb 26, 2018

README.md

MapD Core

MapD Core is an in-memory, column store, SQL relational database that was designed from the ground up to run on GPUs.

License

This project is licensed under the Apache License, Version 2.0.

The repository includes a number of third party packages provided under separate licenses. Details about these packages and their respective licenses are at ThirdParty/licenses/index.md.

Contributing

In order to clarify the intellectual property license granted with Contributions from any person or entity, MapD must have a Contributor License Agreement ("CLA") on file that has been signed by each Contributor, indicating agreement to the Contributor License Agreement. After you make a pull request, a bot will notify you if a signed CLA is required and provide instructions for how to sign it. Please read the agreement carefully before signing and keep a copy for your records.

Building

If this is your first time building MapD Core, install the dependencies mentioned in the Dependencies section below.

MapD uses CMake for its build system.

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=debug ..
make -j 4

The following cmake/ccmake options can enable/disable different features:

  • -DCMAKE_BUILD_TYPE=release - Build type and compiler options to use. Options are Debug, Release, RelWithDebInfo, MinSizeRel, and unset.
  • -DENABLE_ASAN=off - Enable address sanitizer. Default is off.
  • -DENABLE_AWS_S3=on - Enable AWS S3 support, if available. Default is on.
  • -DENABLE_CALCITE_DELETE_PATH=on - Enable Calcite Delete Path. Default is on.
  • -DENABLE_CALCITE_UPDATE_PATH=on - Enable Calcite Update Path. Default is on.
  • -DENABLE_COMPACTION=off - Enable Compaction and Overflow/Underflow Detection. Default is off.
  • -DENABLE_CUDA=off - Disable CUDA. Default is on.
  • -DENABLE_DECODERS_BOUNDS_CHECKING=off - Enable bounds checking for column decoding. Default is off.
  • -DENABLE_FOLLY=on - Use Folly. Default is on.
  • -DENABLE_IWYU=off - Enable include-what-you-use. Default is off.
  • -DENABLE_JIT_DEBUG=off - Enable debugging symbols for the JIT. Default is off.
  • -DENABLE_JOIN_EXEC=on - Enable RA vm to execute join node. Default is on.
  • -DENABLE_ONE_TO_MANY_HASH_JOIN=on - Enable hash join on a column w/ duplicate values. Default is on.
  • -DENABLE_PROFILER=off - Enable google perftools. Default is off.
  • -DENABLE_STANDALONE_CALCITE=off - Require standalone Calcite server. Default is off.
  • -DENABLE_TESTS=on - Build unit tests. Default is on.
  • -DENABLE_TSAN=off - Enable thread sanitizer. Default is off.
  • -DENABLE_JAVA_REMOTE_DEBUG=on - Enable Java Remote Debug. Default is off.
  • -DMAPD_DOCS_DOWNLOAD=on - Download the latest master build of the documentation / docs.mapd.com. Default is off. Note: this is a >50MB download.
  • -DPREFER_STATIC_LIBS=off - Static link dependencies, if available. Default is off.
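
The options above are often combined. As a rough sketch, the helper below prints a flag set for a few common configurations, using only options from the list above. The function name mapd_cmake_args and the flavour names debug, cpu, and package are made up for this example and are not part of the repository.

```shell
# Hypothetical helper: print CMake flags for a named build flavour.
# The flavour names are invented for this example; the flags come from the list above.
mapd_cmake_args() {
  case "${1:-debug}" in
    debug)   echo "-DCMAKE_BUILD_TYPE=debug" ;;
    cpu)     echo "-DCMAKE_BUILD_TYPE=release -DENABLE_CUDA=off" ;;
    package) echo "-DCMAKE_BUILD_TYPE=release -DPREFER_STATIC_LIBS=on" ;;
    *)       echo "unknown flavour: $1" >&2; return 1 ;;
  esac
}

# Example, run from an empty build directory
# (word splitting of the command substitution is intended):
#   cmake $(mapd_cmake_args cpu) ..
```

Keeping the flag sets in one place makes it harder to reuse a build directory with a mismatched configuration.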

Testing

MapD Core uses Google Test as its main testing framework. Tests reside under the Tests directory.

The sanity_tests target runs the most common tests. If using Makefiles to build, the tests may be run using:

make sanity_tests

AddressSanitizer

AddressSanitizer can be activated by setting the ENABLE_ASAN CMake flag in a fresh build directory. At this time CUDA must also be disabled. In an empty build directory run CMake and compile:

mkdir build && cd build
cmake -DENABLE_ASAN=on -DENABLE_CUDA=off ..
make -j 4

Finally run the tests:

export ASAN_OPTIONS=alloc_dealloc_mismatch=0:handle_segv=0
make sanity_tests

ThreadSanitizer

ThreadSanitizer can be activated by setting the ENABLE_TSAN CMake flag in a fresh build directory. At this time CUDA must also be disabled. In an empty build directory run CMake and compile:

mkdir build && cd build
cmake -DENABLE_TSAN=on -DENABLE_CUDA=off ..
make -j 4

We use a TSAN suppressions file to ignore warnings in third party libraries. Source the suppressions file by adding it to your TSAN_OPTIONS env:

export TSAN_OPTIONS="suppressions=/path/to/mapd/config/tsan.suppressions"

Finally run the tests:

make sanity_tests
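
A mistyped suppressions path is easy to miss, so it may help to check that the file exists before exporting TSAN_OPTIONS. The following is a minimal sketch; the function name tsan_options_for is hypothetical, and the config/tsan.suppressions location is taken from the export line above.

```shell
# Print a TSAN_OPTIONS value for the suppressions file inside a checkout.
# $1 is the path to the top level of the repository.
tsan_options_for() {
  supp="$1/config/tsan.suppressions"
  if [ ! -f "$supp" ]; then
    echo "suppressions file not found: $supp" >&2
    return 1
  fi
  echo "suppressions=$supp"
}

# Example, from a build directory directly under the repository:
#   opts="$(tsan_options_for "$(cd .. && pwd)")" && export TSAN_OPTIONS="$opts"
```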

Generating Packages

MapD Core uses CPack to generate packages for distribution. Packages generated on CentOS with static linking enabled can be used on most other recent Linux distributions.

To generate packages on CentOS (assuming starting from top level of the mapd-core repository):

mkdir build-package && cd build-package
cmake -DPREFER_STATIC_LIBS=on -DCMAKE_BUILD_TYPE=release ..
make -j 4
cpack -G TGZ

The first command creates a fresh build directory, to ensure there is nothing left over from a previous build.

The second command configures the build to prefer linking to the dependencies' static libraries instead of the (default) shared libraries, and to build using CMake's release configuration (enables compiler optimizations). Linking to the static versions of the libraries reduces the number of dependencies that must be installed on target systems.

The last command generates a .tar.gz package. The TGZ can be replaced with, for example, RPM or DEB to generate a .rpm or .deb, respectively.
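
If several package formats are needed from the same build, the cpack step can be repeated once per generator. The sketch below assumes a hypothetical helper named build_packages and an overridable CPACK variable, neither of which is part of the repository. The RPM and DEB generators may also need the corresponding packaging tools installed on the build machine.

```shell
# Run CPack once per requested generator, stopping at the first failure.
# CPACK may be overridden (for example CPACK=echo) to preview the commands.
build_packages() {
  for gen in "$@"; do
    "${CPACK:-cpack}" -G "$gen" || return 1
  done
}

# Example, after make has finished in the build-package directory:
#   build_packages TGZ RPM DEB
```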

Using

The startmapd wrapper script may be used to start MapD Core in a testing environment. This script performs the following tasks:

  • initializes the data storage directory via initdb, if required
  • starts the main MapD Core server, mapd_server
  • starts the MapD Core web server, mapd_web_server, for serving MapD Immerse
  • offers to download and import a sample dataset, using the insert_sample_data script
  • attempts to open MapD Immerse in your web browser

Assuming you are in the build directory, and it is a subdirectory of the mapd-core repository, startmapd may be run by:

../startmapd

Starting Manually

It is assumed that the following commands are run from inside the build directory.

Initialize the data storage directory. This command only needs to be run once.

mkdir data && ./bin/initdb data
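
Because initialization only needs to happen once, a start script may want to guard it. The following is a sketch under a stated assumption: a non-empty directory is treated as already initialized, which is a heuristic and not something initdb itself guarantees. The function name init_data_dir and the INITDB override are hypothetical.

```shell
# Initialize a data directory only if it is missing or empty.
# Assumption: a non-empty directory has already been initialized.
# INITDB may be overridden; it defaults to the path used above.
init_data_dir() {
  data_dir="${1:-data}"
  if [ -d "$data_dir" ] && [ -n "$(ls -A "$data_dir")" ]; then
    echo "skipping initdb, $data_dir is not empty"
    return 0
  fi
  mkdir -p "$data_dir" && "${INITDB:-./bin/initdb}" "$data_dir"
}

# Example, from inside the build directory:
#   init_data_dir data
```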

Start the MapD Core server:

./bin/mapd_server

In a new terminal, start the MapD Core web server:

./bin/mapd_web_server

If desired, insert a sample dataset by running the insert_sample_data script in a new terminal:

../insert_sample_data

You can now start using the database. The mapdql utility may be used to interact with the database from the command line:

./bin/mapdql -p HyperInteractive

where HyperInteractive is the default password. The default user mapd is assumed if not provided.

You can also interact with the database using the web-based MapD Immerse frontend by visiting the web server's default port of 9092:

http://localhost:9092

Note: usage of MapD Immerse is governed by a separate license agreement, provided under EULA-CE.txt. The version bundled with this project may only be used for non-commercial purposes.

Code Style

A .clang-format style configuration, based on the Chromium style guide, is provided at the top level of the repository. Please format your code using a recent version (6.0+ preferred) of ClangFormat before submitting.

To use:

clang-format -i File.cpp
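
To format only the files touched by a change, the file list from git can be filtered before it is passed to clang-format. This is a sketch, not a project-provided tool: the helper name cpp_sources, the extension list, and the base branch name master are all assumptions.

```shell
# Keep only C/C++/CUDA source paths from a list of file names on stdin.
# The extension list is an assumption; adjust it to match the files you touch.
cpp_sources() {
  grep -E '\.(cpp|h|hpp|cu)$'
}

# Example: format files changed relative to a base branch, skipping deleted files.
# The branch name "master" is an assumption.
#   git diff --name-only --diff-filter=d master | cpp_sources |
#     while IFS= read -r f; do clang-format -i "$f"; done
```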

Contributed code should compile without generating warnings by recent compilers on most Linux distributions. Changes to the code should follow the C++ Core Guidelines.

Dependencies

MapD has the following dependencies:

| Package | Min Version | Required |
| --- | --- | --- |
| CMake | 3.3 | yes |
| LLVM | 3.8-4.0, 6.0 | yes |
| GCC | 5.1 | no, if building with clang |
| Go | 1.6 | yes |
| Boost | 1.65.0 | yes |
| OpenJDK | 1.7 | yes |
| CUDA | 8.0 | yes, if compiling with GPU support |
| gperftools | | yes |
| gdal | | yes |
| Arrow | 0.10.0 | yes |
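
Installed tool versions can be compared against these minimums before starting a build. The helper below is a small sketch; the name version_ge is hypothetical, and it relies on sort -V (version sort) from GNU coreutils, which may not be available on every platform.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: check the installed CMake against the minimum listed above.
#   have="$(cmake --version | awk 'NR==1 {print $3}')"
#   version_ge "$have" 3.3 || echo "CMake $have is older than 3.3" >&2
```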

Dependencies for mapd_web_server and other Go utils are in ThirdParty/go. See ThirdParty/go/src/mapd/vendor/README.md for instructions on how to add new deps.

CentOS 7

MapD Core requires a number of dependencies which are not provided in the common CentOS/RHEL package repositories. A prebuilt package containing all these dependencies is provided for CentOS 7 (x86_64).

First install the basic build tools:

sudo yum groupinstall -y "Development Tools"
sudo yum install -y \
    zlib-devel \
    epel-release \
    libssh \
    openssl-devel \
    ncurses-devel \
    git \
    maven \
    java-1.8.0-openjdk-devel \
    java-1.8.0-openjdk-headless \
    gperftools \
    gperftools-devel \
    gperftools-libs \
    python-devel \
    wget \
    curl \
    environment-modules

Next download and install the prebuilt dependencies:

curl -OJ https://dependencies.mapd.com/mapd-deps/deploy.sh
sudo bash deploy.sh

These dependencies will be installed to a directory under /usr/local/mapd-deps. The deploy.sh script also installs Environment Modules in order to simplify managing the required environment variables. Log out and log back in after running the deploy.sh script in order to activate the Environment Modules command, module.

The mapd-deps environment module is disabled by default. To activate for your current session, run:

module load mapd-deps

To disable the mapd-deps module:

module unload mapd-deps

WARNING: The mapd-deps package contains newer versions of packages such as GCC and ncurses which might not be compatible with the rest of your environment. Make sure to disable the mapd-deps module before compiling other packages.

Instructions for installing CUDA are below.

CUDA

It is preferred, but not necessary, to install CUDA and the NVIDIA drivers from the .rpm packages, following the instructions provided by NVIDIA. The rpm (network) method (preferred) will ensure you always have the latest stable drivers, while the rpm (local) method does not require Internet access during installation.

The .rpm method requires DKMS to be installed, which is available from the Extra Packages for Enterprise Linux repository:

sudo yum install epel-release

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Environment Variables

The deploy.sh script includes two files with the appropriate environment variables: mapd-deps-<date>.sh (for sourcing from your shell config) and mapd-deps-<date>.modulefile (for use with Environment Modules, yum package environment-modules). These files are placed in the mapd-deps install directory, usually /usr/local/mapd-deps/<date>. Either of these may be used to configure your environment: the .sh may be sourced in your shell config; the .modulefile needs to be moved to the modulespath.

Building Dependencies

The scripts/mapd-deps-centos.sh script is used to build the dependencies. Modify and run this script if you would like to change dependency versions or to build on alternative CPU architectures.

cd scripts
module unload mapd-deps
./mapd-deps-centos.sh --compress

macOS

The scripts/mapd-deps-osx.sh script will automatically install and/or update Homebrew and use it to install all dependencies. Please make sure macOS is completely up to date and Xcode is installed before running. Xcode can be installed from the App Store.

CUDA

mapd-deps-osx.sh will automatically install CUDA via Homebrew and add the correct environment variables to ~/.bash_profile.

Java

mapd-deps-osx.sh will automatically install Java and Maven via Homebrew and add the correct environment variables to ~/.bash_profile.

Ubuntu

Most build dependencies required by MapD Core are available via APT. Certain dependencies such as Thrift, Blosc, and Folly must be built as they either do not exist in the default repositories or have outdated versions. The provided build script will install all required dependencies (except CUDA) and build those that must be compiled from source. The built dependencies will be installed to /usr/local/mapd-deps/ by default; see the Environment Variables section below for how to add these dependencies to your environment.

Ubuntu 16.04

MapD Core requires a newer version of Boost than the version which is provided by Ubuntu 16.04. The scripts/mapd-deps-ubuntu1604.sh build script will compile and install a newer version of Boost into the /usr/local/mapd-deps/ directory.

Ubuntu 18.04

Use the scripts/mapd-deps-ubuntu.sh build script to install dependencies.

Some installs of Ubuntu 18.04 may fail while building with a message similar to:

java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

This is a known issue in 18.04 which will be resolved in Ubuntu 18.04.1. To resolve on 18.04:

sudo rm /etc/ssl/certs/java/cacerts
sudo update-ca-certificates -f

Environment Variables

The CUDA and mapd-deps lib directories need to be added to LD_LIBRARY_PATH; the CUDA and mapd-deps bin directories need to be added to PATH. The mapd-deps-ubuntu.sh script above will generate a script named mapd-deps.sh containing the environment variables which need to be set. Simply source this file in your current session (or symlink it to /etc/profile.d/mapd-deps.sh) in order to activate it:

source /usr/local/mapd-deps/mapd-deps.sh
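
The generated mapd-deps.sh remains the authoritative source for these settings, and its contents are not reproduced here. For anyone setting the variables by hand, the sketch below shows one general pattern for adding a directory to PATH or LD_LIBRARY_PATH without creating duplicate entries. The helper name prepend_path and the CUDA directory locations in the example are assumptions.

```shell
# Print a colon-separated path list with directory $2 placed in front of $1,
# unless $2 is already present in the list.
prepend_path() {
  case ":$1:" in
    *":$2:"*) echo "$1" ;;
    *)        echo "$2${1:+:$1}" ;;
  esac
}

# Example. Both directory locations are assumptions; check your install.
#   PATH="$(prepend_path "$PATH" /usr/local/cuda/bin)"
#   LD_LIBRARY_PATH="$(prepend_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)"
#   export PATH LD_LIBRARY_PATH
```

Checking for an existing entry keeps the variables from growing each time the file is sourced.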

CUDA

Recent versions of Ubuntu provide the NVIDIA CUDA Toolkit and drivers in the standard repositories. To install:

sudo apt install -y \
    nvidia-cuda-toolkit

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Arch

The following uses yaourt to install packages from the Arch User Repository.

yaourt -S \
    git \
    cmake \
    boost \
    google-glog \
    extra/jdk8-openjdk \
    clang \
    llvm \
    thrift \
    go \
    gdal \
    maven

VERS=1.21-45
wget --continue https://github.com/jarro2783/bisonpp/archive/$VERS.tar.gz
tar xvf $VERS.tar.gz
pushd bisonpp-$VERS
./configure
make -j $(nproc)
sudo make install
popd

CUDA

CUDA and the NVIDIA drivers may be installed using the following.

yaourt -S \
    linux-headers \
    cuda \
    nvidia

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Environment Variables

The CUDA bin directories need to be added to PATH. The easiest way to do so is by creating a new file named /etc/profile.d/mapd-deps.sh containing the following:

PATH=/opt/cuda/bin:$PATH
export PATH