Distributed Systems Infrastructure 2.0

Python 3

This version of DSI uses Python 3.

If you have used 10gen/dsi or mongodb/dsi, please use a separate virtualenv for each.

Quick Start (Ubuntu)

sudo apt install awscli  # / pip install awscli / brew install awscli
aws configure # API credentials

ssh-keygen -m PEM -t rsa -b 2048 -C $(whoami)-dsikey \
    -f  ~/.ssh/$(whoami)-dsikey # no passphrase

ssh-agent bash # initialize ssh-agent, assuming you are using bash
ssh-add ~/.ssh/$(whoami)-dsikey

for a in $(aws ec2 describe-regions --query 'Regions[].{Name:RegionName}' --output text); do
    aws ec2 import-key-pair --key-name $(whoami)-dsikey \
        --public-key-material file://~/.ssh/$(whoami)-dsikey.pub --region $a
done

git clone git@github.com:10gen/dsi.git; cd dsi; git checkout stable

# Activate virtualenv / workon here if you want (python3)
pip3 install --user -r requirements.txt

curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_linux_amd64.zip 
# mac: curl -o terraform.zip https://releases.hashicorp.com/terraform/0.12.16/terraform_0.12.16_darwin_amd64.zip 
sudo unzip terraform.zip -d /usr/local/bin

WORK=any-path
$EDITOR configurations/bootstrap/bootstrap.example.yml
./bin/bootstrap.py --directory $WORK --bootstrap-file configurations/bootstrap/bootstrap.example.yml
cd $WORK

# You can put the following line in .bashrc if you don't mind adding a relative path to PATH
export PATH=./.bin:$PATH
infrastructure_provisioning.py
workload_setup.py
mongodb_setup.py
test_control.py
analysis.py
infrastructure_teardown.py

More docs to get started

Navigating and using this repo

DSI = Distributed Systems Infrastructure. At MongoDB we use this for system-level performance tests, where we deploy real MongoDB clusters in AWS.

DSI is the orchestrator that drives everything described below.

A key principle in developing DSI was that DSI owns and has access to all configuration. For example, we use vanilla AMI images, and all system setup is in terraform/remote-scripts/system-setup.sh. If you look at a file called mongodb_setup.yml, you will see that it embeds a mongod.conf file (among other things). Similarly, infrastructure_provisioning.yml embeds some input parameters for the terraform *.tf files. All DSI config is in YAML. Since terraform uses JSON, DSI converts the YAML to JSON when executing terraform.
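
To make this concrete, the sketch below shows roughly what the embedded mongod.conf portion of a mongodb_setup.yml could look like. This is a hypothetical illustration: the wrapping key name is an assumption, and the canned files under configurations/ are the authoritative reference.

# Hypothetical sketch -- the mongod_config_file key name is an assumption;
# see the canned mongodb_setup files under configurations/ for real examples.
mongod_config_file:        # an embedded mongod.conf, expressed as YAML
  storage:
    engine: wiredTiger
  net:
    port: 27017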

The reasons for having all configuration in DSI are:

  • Consistency: All configuration is in the same syntax (YAML) and in a limited set of files, which always have the same names, whether you use YCSB or Linkbench.
  • Tracking: All configuration changes are committed to this repo. This avoids situations where performance changes are due to changes to a specially crafted AMI, generated by scripts in another repo, by a person on a different team.
  • Globally shared, "normalized" config: All DSI binaries always read the entire set of config files. For example, mongodb_setup.py will use the same SSH key as terraform used in infrastructure_provisioning.py.

You use DSI by creating a work directory and putting some configuration files into it. (At least once upon a time it was even possible to run all DSI commands using just defaults, without any configuration files.) This directory will also hold your terraform tfstate files, benchmark output, logs, etc.

A helper script, bin/bootstrap.py, is a convenient way to create a directory and copy some canned configuration files into it. In fact, we almost always use files available under configurations/. You list the combination of configs you want to use in a simple bootstrap.yml file. See configurations/bootstrap/bootstrap.example.yml to get started!
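
As a rough illustration, a bootstrap.yml mostly just names which canned configuration to use for each section. The keys and values below are illustrative assumptions about its shape, not a copy of the real example file.

# Hypothetical sketch -- names are illustrative; copy bootstrap.example.yml instead.
infrastructure_provisioning: single   # which canned infrastructure_provisioning.yml to use
mongodb_setup: standalone             # which canned mongodb_setup.yml to use
test_control: ycsb                    # which canned test_control.yml to use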

All configuration is in files; command line options aren't supported. This way there's a permanent record of all the config that was used to create a specific benchmark result. (In CI we tar and store the entire work directory, containing both your configuration and the result files.) It's also simple to rerun the exact same test without having to copy-paste CLI options from a log file or from a friend.

The effective runtime configuration is a blend of three levels of configuration:

  1. configurations/defaults.yml
  2. infrastructure_provisioning.yml, workload_setup.yml, mongodb_setup.yml, test_control.yml
  3. overrides.yml

...where each later level overrides the ones before it.

The second level is split into one file per section, but the files are logically a single configuration. The reason for splitting into multiple files is modularity: whether you want to deploy a 1-node or a 3-node replica set, you can use the same test_control.yml with both.

The file overrides.yml is a small config file where you can conveniently add manual changes if you don't want to edit the files in level 2, as they tend to be bigger. However, editing those files is perfectly allowed too. It's up to you!
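
For instance, to change a single value without touching the larger level-2 files, an overrides.yml can contain just the key you want to win in the merge. The nested keys below are hypothetical; any key that exists at level 1 or 2 can be overridden the same way.

# Hypothetical sketch: override one nested value from the level-2 files.
infrastructure_provisioning:
  tfvars:
    mongod_instance_count: 3   # illustrative key; this value wins over levels 1 and 2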

Development & Testing

The repo's tests are all packaged into /testscripts/runtests.sh, which must be run from the repo root and requires:

  • a ~/.dsi_config.yml file (see /example_config.yml and the sketch after this list), containing...
    • Evergreen credentials: found in your local ~/.evergreen.yml file. (Instructions here if you are missing this file.)
    • Github authentication token: curl -i -u <USERNAME> -H 'X-GitHub-OTP: <2FA 6-DIGIT CODE>' -d '{"scopes": ["repo"], "note": "get full git hash"}' https://api.github.com/authorizations
    • (You only need -H 'X-GitHub-OTP: <2FA 6-DIGIT CODE>' if you have 2-factor authentication enabled.)
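
A ~/.dsi_config.yml therefore ends up holding roughly these two pieces of information. The field names below are hypothetical placeholders; /example_config.yml in the repo defines the actual expected structure.

# Hypothetical sketch -- field names are placeholders; see /example_config.yml.
evergreen:
  user: your.username            # from ~/.evergreen.yml
  api_key: <evergreen api key>   # from ~/.evergreen.yml
github:
  token: <github auth token>     # from the curl command above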

Testing Examples

Run all validations, linters and tests:

testscripts/runtests.sh

Run all the unit tests:

testscripts/run-nosetest.sh

Run a specific test:

testscripts/run-nosetest.sh bin/tests/test_bootstrap.py

Evergreen Patch Test (Sys-perf)

Info for this fork:

If you need to run this version of DSI in the Evergreen Sys-perf project, you can easily change the dsi module to point to this (or any other) repo:

Setup:

DSI_REPO=$(pwd)
ln -s $DSI_REPO/bin/switch_module.py $HOME/bin/switch_module.py

Use this branch for a sys-perf patch test in Evergreen:

MONGO_REPO=$HOME/repos/mongo
cd $MONGO_REPO
switch_module.py                  # will edit github url in etc/system_perf.yml

Now submit Evergreen patch as usual.

When done, undo changes:

cd $MONGO_REPO
git checkout etc/system_perf.yml

Regular README continues...

Patch-Testing DSI Without Compile

There are two options: repeating a task with a different DSI module, or skipping compile entirely.

The Easy Way

You still have to suffer through compile once, but only once.

Create a patch of sys-perf and do the usual evergreen set-module -m dsi step. Schedule the tasks you want. You can call evergreen set-module -m dsi multiple times on the same patch-build and re-schedule your tasks. The compile task isn't re-run.

cd mongo
evergreen patch -p sys-perf
cd dsi
evergreen set-module -m dsi -i <id>
# ... make changes
evergreen set-module -m dsi -i <id>
# reschedule any tasks you want to run again with updated DSI

The Slightly Harder Way

Use a hard-coded asset path and remove the compile-task dependency.

Replace this line with a static URL, e.g.:

mongodb_binary_archive: "https://s3.amazonaws.com/mciuploads/dsi/5c8685d3850e61268dd41be1/447847d93d6e0a21b018d5df45528e815c7c13d8/linux/mongodb-5c8685d3850e61268dd41be1.tar.gz"

(This is the artifact URL from a previous waterfall run.)

Then remove the depends_on blocks for the build-variants you want to run, e.g. remove these lines.

Submit this as your patch-build and then do the usual set-module dance. Here too you can use the same patch-build multiple times like the example above.
