This version of DSI uses Python 3.
If you have used 10gen/dsi or mongodb/dsi before, please use separate virtualenvs for each.
sudo apt install awscli  # or: pip install awscli, or: brew install awscli
aws configure # API credentials
ssh-keygen -m PEM -t rsa -b 2048 -C $(whoami)-dsikey \
-f ~/.ssh/$(whoami)-dsikey  # no passphrase
ssh-agent bash # initialize ssh-agent, assuming you are using bash
ssh-add ~/.ssh/$(whoami)-dsikey
for a in $(aws ec2 describe-regions --query 'Regions[].{Name:RegionName}' --output text); do
    aws ec2 import-key-pair --key-name $(whoami)-dsikey \
        --public-key-material file://~/.ssh/$(whoami)-dsikey.pub --region $a
done
git clone git@github.com:10gen/dsi.git; cd dsi; git checkout stable
# Activate virtualenv / workon here if you want (python3)
pip3 install --user -r requirements.txt
curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_linux_amd64.zip
# mac: curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_darwin_amd64.zip
sudo unzip terraform.zip -d /usr/local/bin
WORK=any-path
$EDITOR configurations/bootstrap/bootstrap.example.yml
./bin/bootstrap.py --directory $WORK --bootstrap-file configurations/bootstrap/bootstrap.example.yml
cd $WORK
# You can put the following line in .bashrc if you don't mind adding a relative path to PATH
export PATH=./.bin:$PATH
infrastructure_provisioning.py
workload_setup.py
mongodb_setup.py
test_control.py
analysis.py
infrastructure_teardown.py
- The above steps in long form: Getting Started
- Frequently Asked Questions
- DSI is a complex system with hundreds of configuration options. All of them are documented under docs/config-specs/.
DSI = Distributed Systems Infrastructure. At MongoDB we use this for system-level performance tests, where we deploy real MongoDB clusters in AWS.
DSI is the orchestrator which drives all of the below:
- bin/infrastructure_provisioning.py: Deploy EC2 resources with terraform.
  - terraform/remote-scripts/system-setup.sh: Linux configuration (mount disks, install packages...)
- bin/workload_setup.py: Install test-specific dependencies (e.g. Java for YCSB)
- bin/mongodb_setup.py: Deploy a MongoDB cluster
- bin/test_control.py: Execute a test, collect and parse results.
  - Currently supported benchmark tools: Mongo shell (Benchrun), YCSB, py-tpcc, Linkbench, Genny, Sysbench
- bin/analysis.py: Run various checks on test log files: core files, replication lag, etc.
- bin/infrastructure_teardown.py: terraform destroy
A key principle in developing DSI was that DSI owns and has access to all configuration. For example, we use vanilla AMIs, and all system setup is in terraform/remote-scripts/system-setup.sh. If you look at a file called mongodb_setup.yml, you will see that it embeds a mongod.conf file (among other things). Similarly, infrastructure_provisioning.yml embeds some input parameters to terraform *.tf files. All DSI config is in YAML. Since terraform uses JSON, DSI converts the YAML to JSON when executing terraform.
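As a concrete illustration, the embedding might look roughly like the mongodb_setup.yml fragment below. This is a hypothetical sketch: the exact key names are defined by the config specs under docs/config-specs/, not by this example.

# Hypothetical mongodb_setup.yml fragment (key names illustrative only;
# see docs/config-specs/ for the authoritative spec). The point is that
# the mongod.conf content is embedded directly in DSI's own YAML:
mongod_config_file:
  storage:
    engine: wiredTiger
  net:
    port: 27017
  replication:
    replSetName: rs0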
The reasons for having all configuration in DSI are:
- Consistency: All configuration is in the same syntax (YAML) and in a limited set of files, which always have the same names, whether you use YCSB or Linkbench.
- Tracking: All configuration changes are committed to this repo. This avoids situations where performance changes are due to changes to a specially crafted AMI, generated by scripts in another repo, by a person on a different team.
- Globally shared, "normalized" config: All DSI binaries always read the entire set of config files. For example, mongodb_setup.py will use the same SSH key as terraform used in infrastructure_provisioning.py.
You use DSI by creating a work directory and putting some configuration files into it. (At least once upon a time it was even possible to run all DSI commands using just defaults, without any configuration files.) This directory will also hold your terraform tfstate files, benchmark output, logs, etc...
A helper script, bin/bootstrap.py, is a convenient way to create a work directory and copy some canned configuration files into it. In fact, we almost always use files available under configurations/. You list the combination of configs you want to use in a simple bootstrap.yml file. See configurations/bootstrap/bootstrap.example.yml to get started!
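For example, a minimal bootstrap.yml could simply name one canned file from each group, along the lines of the hypothetical sketch below. The real option names are whatever exists under configurations/, so check bootstrap.example.yml rather than copying these values:

# Hypothetical bootstrap.yml: each key picks a canned config file from
# configurations/. The values below are placeholders, not guaranteed names.
infrastructure_provisioning: single
workload_setup: ycsb
mongodb_setup: standalone
test_control: ycsb.short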
All configuration is in files; command line options aren't supported. This way there's a permanent record of all config that was used to create a specific benchmark result. (In CI we tar and store the entire work directory, containing both your configuration and the result files.) It's also simple to rerun the exact same test without having to copy-paste CLI options from a log file or a friend.
The effective runtime configuration is a blend of three levels of configuration:
- configurations/defaults.yml
- infrastructure_provisioning.yml, workload_setup.yml, mongodb_setup.yml, test_control.yml
- overrides.yml
...where each later level overrides the earlier ones.
The second level is split into one file per section, but the files are logically a single configuration. The reason for splitting them is modularity: whether you deploy a 1-node or a 3-node replica set, you can use the same test_control.yml with both.
The file overrides.yml is a small config file where you can conveniently add manual changes if you don't want to edit the files in level 2, as they tend to be bigger. However, editing those files is perfectly allowed too. It's up to you!
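For instance, an overrides.yml along the lines of the hypothetical sketch below (the keys mirror whatever you want to override in the level-2 files) would take precedence over both defaults.yml and the level-2 files:

# Hypothetical overrides.yml: the structure mirrors the level-2 file being
# overridden, and these values win over defaults.yml and the level-2 files.
infrastructure_provisioning:
  tfvars:
    mongod_instance_type: c5.4xlarge
mongodb_setup:
  mongod_config_file:
    storage:
      wiredTiger:
        engineConfig:
          cacheSizeGB: 16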
The repo's tests are all packaged into testscripts/runtests.sh, which must be run from the repo root and requires:
- a ~/.dsi_config.yml file (see example_config.yml in the repo root; a sketch is shown below), containing:
  - Evergreen credentials: found in your local ~/.evergreen.yml file. (Instructions here if you are missing this file.)
  - A Github authentication token:
    curl -i -u <USERNAME> -H 'X-GitHub-OTP: <2FA 6-DIGIT CODE>' -d '{"scopes": ["repo"], "note": "get full git hash"}' https://api.github.com/authorizations
    (You only need -H 'X-GitHub-OTP: <2FA 6-DIGIT CODE>' if you have 2-factor authentication on.)
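A minimal ~/.dsi_config.yml might then look roughly like the sketch below; the field names here are placeholders, and the checked-in example_config.yml is the authoritative template:

# Hypothetical ~/.dsi_config.yml (placeholder field names; copy
# example_config.yml from the repo root for the real format).
evergreen:
  user: your.username
  api_key: <api_key from ~/.evergreen.yml>
github:
  token: <token returned by the curl command above>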
Run all validations, linters and tests:
testscripts/runtests.sh
Run all the unit tests:
testscripts/run-nosetest.sh
Run a specific test:
testscripts/run-nosetest.sh bin/tests/test_bootstrap.py
Info for this fork:
If you need to run this version of DSI in the Evergreen sys-perf project, you can easily change the dsi module to point to this (or any other) repo:
Setup:
DSI_REPO=$(pwd)
ln -s $DSI_REPO/bin/switch_module.py $HOME/bin/switch_module.py
Use this branch for a sys-perf patch test in Evergreen:
MONGO_REPO=$HOME/repos/mongo
cd $MONGO_REPO
switch_module.py # will edit github url in etc/system_perf.yml
Now submit Evergreen patch as usual.
When done, undo changes:
cd $MONGO_REPO
git checkout etc/system_perf.yml
Regular README continues...
There are two options: repeating a task with a different DSI module, or skipping compile entirely.
The Easy Way
You still have to suffer through the compile once, but only once.
Create a patch of sys-perf and do the usual evergreen set-module -m dsi step. Schedule the tasks you want. You can call evergreen set-module -m dsi multiple times on the same patch-build and re-schedule your tasks. The compile task isn't re-run.
cd mongo
evergreen patch -p sys-perf
cd dsi
evergreen set-module -m dsi -i <id>
# ... make changes
evergreen set-module -m dsi -i <id>
# reschedule any tasks you want to run again with updated DSI
The Slightly Harder Way
Use a hard-coded asset path and remove the compile-task dependency.
Replace the mongodb_binary_archive line with a static URL, e.g.:
mongodb_binary_archive: "https://s3.amazonaws.com/mciuploads/dsi/5c8685d3850e61268dd41be1/447847d93d6e0a21b018d5df45528e815c7c13d8/linux/mongodb-5c8685d3850e61268dd41be1.tar.gz"
(This is the artifact URL from a previous waterfall run.)
Then remove the depends_on blocks for the build variants you want to run.
Submit this as your patch-build and then do the usual set-module dance. Here too you can reuse the same patch-build multiple times, as in the example above.