- TODO
- Introduction
- Features
- Terminology
- Examples
- System Story
- Configuration
- Experiment Status Spreadsheet (ESS)
- Deprecated Content
- User Story
- How does the current system work?
- Rewrite entire readme
I want to improve and revamp Autoperf (AP), so I’m going to plan everything out here and record things as they come up.
Let’s start off with the purpose of AP:
- Run lots of Perftest tests automatically with minimal (if any) human input
- Gather lots of juicy performance data
What are the extra things that AP should be able to do?
- Record problematic tests (tests that don’t produce data for any reason)
- Automatically deal with cases where the machines don’t respond for a while
- Provide an interface to track which tests are running, which have succeeded so far, and which have failed
- Deal with the situation where several consecutive tests have failed (the machines could be off)
- Notify remotely when something has gone wrong
- Continue a previous test campaign if it was interrupted
- Rerun tests up to 3 times in case something went wrong that isn’t related to the test itself (e.g., the slave machines can’t be accessed)
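The retry rule and the consecutive-failure rule above are small enough to sketch directly. This is a minimal Python sketch, not AP's actual code. The names `run_with_retries`, `should_stop` and `run_test` are hypothetical, and `run_test` is assumed to return `True` when a test produced data. The default threshold of 15 comes from the system story further down.

```python
from typing import Callable, List

# Hypothetical defaults: 3 attempts per test (from the list above) and a stop
# threshold of 15 consecutive failed tests (from the system story).
MAX_ATTEMPTS = 3
MAX_CONSECUTIVE_FAILURES = 15


def run_with_retries(run_test: Callable[[], bool], max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Call run_test until it reports success or max_attempts calls have been made.

    run_test is assumed to return True when the Perftest test produced data.
    """
    for _ in range(max_attempts):
        if run_test():
            return True
    return False


def should_stop(history: List[bool], threshold: int = MAX_CONSECUTIVE_FAILURES) -> bool:
    """Return True when the last `threshold` tests all failed (the machines could be off).

    history holds one bool per finished test, oldest first; True means success.
    """
    if len(history) < threshold:
        return False
    return not any(history[-threshold:])
```

Keeping `should_stop` separate from the retry loop means each test enters the failure history once, after all of its attempts have been used.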
- 🔃 Retry failed tests x times before moving on to the next test.
- 🗂️ Automatically compress test data after each test.
- 💿 Store test statuses in a spreadsheet for easy monitoring.
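The compression feature, together with the system story's step of moving the pub.csv and sub_n.csv files into a test directory, could look roughly like this. It is a Python sketch that uses the standard `shutil` and `tarfile` modules. The function names, the assumption that the CSV files have already been gathered into a local working directory, and the choice of `.tar.gz` as the format are all illustrative.

```python
import shutil
import tarfile
from pathlib import Path
from typing import List


def collect_test_outputs(work_dir: Path, test_dir: Path) -> List[Path]:
    """Move pub.csv and any sub_<n>.csv files from work_dir into test_dir.

    Assumes the Perftest CSV output has already been gathered into work_dir.
    """
    test_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in [work_dir / "pub.csv", *sorted(work_dir.glob("sub_*.csv"))]:
        if src.exists():
            moved.append(Path(shutil.move(str(src), str(test_dir / src.name))))
    return moved


def compress_test_dir(test_dir: Path, remove_original: bool = False) -> Path:
    """Write test_dir to <test_dir>.tar.gz beside it and return the archive path."""
    archive = test_dir.with_name(test_dir.name + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(test_dir, arcname=test_dir.name)  # keep the folder name inside the archive
    if remove_original:
        shutil.rmtree(test_dir)  # only delete once the archive has been written
    return archive
```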
Tests refer to Perftest tests.
Experiments (formerly called campaigns) are AP experiments; one AP experiment can contain many Perftest tests.
ESS stands for Experiment Status Spreadsheet: a CSV file containing details about the run of each test. Its columns are listed further down.
QoS Settings Example:

```json
{
  "duration_secs": [30],
  "datalen_bytes": [100],
  "pub_count": [1, 50, 100],
  "sub_count": [1, 50, 100],
  "use_reliable": [true, false],
  "use_multicast": [true, false],
  "durability_level": [0, 1, 2, 3],
  "latency_count": [100]
}
```

Slave Machine Example:

```json
{
  "ip": "169.254.248.55",
  "machine_name": "p1",
  "participant_allocation": "pub",
  "perftest_exec_path": "~/Documents/rtiperftest/srcCpp/objs/armv7Linux4gcc7.5.0/perftest_publisher",
  "ssh_key_path": "~/.ssh/id_rsa",
  "username": "acwh025"
}
```

This is an overview of how the system will work from start to finish.
- Validate connections to machines in config.
- For each experiment:
    - If PCG:
        - Generate all possible combinations.
        - Order them.
        - Check whether an ESS exists.
        - If the ESS exists:
            - Find the last successful test.
            - Set the PCG next test to the combination after it.
            - Match the tests that have run with the test folders that exist.
            - Make sure that every successful test has existing data.
        - If the ESS does NOT exist:
            - Create one.
            - Set the PCG next test to the first combination.
    - If RCG:
        - Generate a new combination.
        - Check whether the combination already exists in the ESS.
        - If the combination exists:
            - Go back to "Generate a new combination".
    - Start the timer.
    - Record the start time, test name, pings count, SSH check count, and attempt number in the ESS.
    - Start executing the test.
    - Finish running the test.
    - Get the end timestamp.
    - Find the row in the ESS for that test.
    - Record the end timestamp in the ESS.
    - Create a directory for the test.
    - Move the pub.csv and sub_n.csv files to that directory.
    - If the last 15 tests have failed:
        - Stop the program.
    - If PCG:
        - Compress the experiment folder.
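Both combination-generation paths can be sketched with the standard library. This sketch makes two assumptions. First, PCG means enumerating the full Cartesian product of the `qos_settings` lists. Second, RCG draws one random value per setting. The duplicate check above suggests that second point, but neither acronym is expanded here. `combination_key` is a hypothetical test-name format used to match combinations against ESS rows. `max_tries` is an added guard that stops RCG from looping forever once every combination has been run.

```python
import itertools
import random
from typing import Any, Dict, List, Optional, Set

QosSettings = Dict[str, List[Any]]
Combination = Dict[str, Any]


def generate_pcg_combinations(qos: QosSettings) -> List[Combination]:
    """PCG: every combination of the QoS values, in a repeatable order."""
    keys = sorted(qos)  # fixed key order, independent of the JSON layout
    return [dict(zip(keys, values)) for values in itertools.product(*(qos[k] for k in keys))]


def combination_key(combination: Combination) -> str:
    """Hypothetical test name used to match a combination with its ESS row."""
    return "_".join(f"{k}={combination[k]}" for k in sorted(combination))


def next_pcg_index(combinations: List[Combination], last_successful: Optional[str]) -> int:
    """Index of the next PCG test to run.

    Returns 0 when there is no ESS or nothing has succeeded yet; raises
    ValueError if the ESS names a test that is not in this combination list.
    """
    if last_successful is None:
        return 0
    names = [combination_key(c) for c in combinations]
    return names.index(last_successful) + 1


def generate_rcg_combination(
    qos: QosSettings, already_run: Set[str], rng: random.Random, max_tries: int = 1000
) -> Optional[Combination]:
    """RCG: draw one random value per setting, redrawing if the ESS already has it."""
    for _ in range(max_tries):
        combination = {k: rng.choice(qos[k]) for k in sorted(qos)}
        if combination_key(combination) not in already_run:
            return combination
    return None  # probably every combination has already been run
```

Take the QoS settings shown at the top: three pub_count values, three sub_count values, two values each for use_reliable and use_multicast, four durability levels, and single values elsewhere. For those, PCG yields 3 × 3 × 2 × 2 × 4 = 144 combinations. Sorting the keys makes the order independent of how the JSON file lists them, which is what lets a resumed experiment find its place again.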
What do we need to store?
- Experiment Details
    - Experiment Name
    - QoS Configuration
    - PCG or RCG
- Slave machine details
    - IP
    - SSH key filepath
    - Username
    - perftest executable filepath
    - Participant Allocation
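Here is a minimal sketch of these stored fields as Python dataclasses. It assumes the JSON layout used in the RCG and PCG examples below. `SlaveMachine`, `Experiment` and `load_experiments` are hypothetical names. Rejecting empty QoS value lists is an added check, because an empty list would leave PCG with no combinations at all.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SlaveMachine:
    machine_name: str
    participant_allocation: str  # "pub" or "sub" in the examples
    ip: str
    ssh_key_path: str
    username: str
    perftest_exec_path: str


@dataclass
class Experiment:
    experiment_name: str
    combination_generation_type: str  # "pcg" or "rcg"
    qos_settings: Dict[str, List[Any]]
    slave_machines: List[SlaveMachine]


def load_experiments(text: str) -> List[Experiment]:
    """Parse a JSON list of experiments and apply a few basic sanity checks."""
    experiments = []
    for raw in json.loads(text):
        name = raw["experiment_name"]
        gen_type = raw["combination_generation_type"].lower()
        if gen_type not in ("pcg", "rcg"):
            raise ValueError(f"{name}: unknown combination_generation_type {gen_type!r}")
        empty = [k for k, v in raw["qos_settings"].items() if not v]
        if empty:
            raise ValueError(f"{name}: empty QoS value lists {empty}")
        # Unknown or missing machine keys raise TypeError from the dataclass constructor.
        machines = [SlaveMachine(**m) for m in raw["slave_machines"]]
        experiments.append(Experiment(name, gen_type, raw["qos_settings"], machines))
    return experiments
```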
RCG Example:
```json
[{
  "experiment_name": "RCG #1",
  "combination_generation_type": "rcg",
  "qos_settings": {
    "duration_secs": [30],
    "datalen_bytes": [100],
    "pub_count": [1, 100],
    "sub_count": [1, 100],
    "use_reliable": [true, false],
    "use_multicast": [true, false],
    "durability_level": [0, 1, 2, 3],
    "latency_count": [100]
  },
  "slave_machines": [
    {
      "machine_name": "p1",
      "participant_allocation": "pub",
      "ip": "169.254.248.55",
      "ssh_key_path": "~/.ssh/id_rsa",
      "username": "acwh025",
      "perftest_exec_path": "~/Documents/rtiperftest/srcCpp/objs/armv7Linux4gcc7.5.0/perftest_publisher"
    },
    {
      "machine_name": "p2",
      "participant_allocation": "sub",
      "ip": "169.254.201.141",
      "ssh_key_path": "~/.ssh/id_rsa",
      "username": "acwh025",
      "perftest_exec_path": "~/Documents/rtiperftest/srcCpp/objs/armv7Linux4gcc7.5.0/perftest_publisher"
    }
  ]
}]
```

PCG Example:
```json
[{
  "experiment_name": "PCG #1",
  "combination_generation_type": "pcg",
  "qos_settings": {
    "duration_secs": [30],
    "datalen_bytes": [100],
    "pub_count": [1, 50, 100],
    "sub_count": [1, 50, 100],
    "use_reliable": [true, false],
    "use_multicast": [true, false],
    "durability_level": [0, 1, 2, 3],
    "latency_count": [100]
  },
  "slave_machines": [
    {
      "machine_name": "p1",
      "participant_allocation": "pub",
      "ip": "169.254.248.55",
      "ssh_key_path": "~/.ssh/id_rsa",
      "username": "acwh025",
      "perftest_exec_path": "~/Documents/rtiperftest/srcCpp/objs/armv7Linux4gcc7.5.0/perftest_publisher"
    },
    {
      "machine_name": "p2",
      "participant_allocation": "sub",
      "ip": "169.254.201.141",
      "ssh_key_path": "~/.ssh/id_rsa",
      "username": "acwh025",
      "perftest_exec_path": "~/Documents/rtiperftest/srcCpp/objs/armv7Linux4gcc7.5.0/perftest_publisher"
    }
  ]
}]
```

ESS stands for Experiment Status Spreadsheet and is a CSV file containing details about the run of each test.
It contains the following columns:
- start timestamp
- end timestamp
- test name
- pings count
- ssh check count
- end status
- qos settings
- comments
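Here is a sketch of reading and writing the ESS with Python's `csv` module. It covers creating the file, recording a test's start, filling in its end, and finding the last successful test for a PCG resume. The snake_case column names, the `"success"`/`"failed"` status strings, and the JSON encoding of the QoS settings column are assumptions. The source only names the columns.

```python
import csv
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional

# Assumed snake_case names for the ESS columns listed above.
ESS_COLUMNS = [
    "start_timestamp", "end_timestamp", "test_name", "pings_count",
    "ssh_check_count", "end_status", "qos_settings", "comments",
]


def read_rows(path: Path) -> List[Dict[str, str]]:
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def write_rows(path: Path, rows: List[Dict[str, str]]) -> None:
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=ESS_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def ensure_ess(path: Path) -> None:
    """Create an ESS containing only the header row if none exists yet."""
    if not path.exists():
        write_rows(path, [])


def record_start(path: Path, test_name: str, qos: Dict[str, Any],
                 pings_count: int, ssh_check_count: int) -> None:
    """Append a row for a test that is about to run; the end columns stay blank."""
    with path.open("a", newline="") as f:
        csv.DictWriter(f, fieldnames=ESS_COLUMNS).writerow({
            "start_timestamp": datetime.now().isoformat(timespec="seconds"),
            "test_name": test_name,
            "pings_count": pings_count,
            "ssh_check_count": ssh_check_count,
            "qos_settings": json.dumps(qos, sort_keys=True),
        })  # missing columns are written as "" (DictWriter's default restval)


def record_end(path: Path, test_name: str, end_status: str) -> None:
    """Fill in the end timestamp and status on the latest open row for test_name."""
    rows = read_rows(path)
    for row in reversed(rows):
        if row["test_name"] == test_name and not row["end_timestamp"]:
            row["end_timestamp"] = datetime.now().isoformat(timespec="seconds")
            row["end_status"] = end_status
            break
    else:
        raise KeyError(f"no open ESS row for {test_name!r}")
    write_rows(path, rows)


def last_successful_test(path: Path, success_status: str = "success") -> Optional[str]:
    """Name of the most recent test whose end status is success_status, if any."""
    for row in reversed(read_rows(path)):
        if row["end_status"] == success_status:
            return row["test_name"]
    return None
```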
These are the general steps that take place when using AP:
- Define experimental configurations.
- Run AP.
- Get notified if something goes wrong.
- Read the config file path and the buffer duration (in seconds) from the command-line arguments.
- For each campaign:
    - Generate combinations.
    - For each combination:
        - Generate scripts for the combination.
        - Distribute the scripts across the machines.
        - Check the last 10 tests for failures.
            - If all of the last 10 tests have failed, stop the application.
            - Otherwise, continue.
        - For each machine:
            - Ping the machine.
            - Check the SSH connection to the machine.
        - For each machine:
            - Restart the machine.
            - Ping every other machine.
            - SSH check every other machine.
        - tbc...
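The ping and SSH checks above could be sketched as follows. The same checks would also cover the "Validate connections to machines in config" step of the new design. The sketch assumes two tools:

- Linux `ping` from iputils, where `-W` is a reply timeout in seconds. macOS treats `-W` as milliseconds instead.
- OpenSSH, where `BatchMode=yes` makes `ssh` fail rather than prompt for a password.

The function names are hypothetical. Reading the ESS "pings count" and "ssh check count" as the number of attempts a check needed is also an assumption.

```python
import os
import subprocess
from typing import List


def ping_command(ip: str, timeout_s: int = 2) -> List[str]:
    # Linux (iputils) ping: -c 1 sends one packet, -W is the reply timeout in seconds.
    return ["ping", "-c", "1", "-W", str(timeout_s), ip]


def ssh_check_command(username: str, ip: str, ssh_key_path: str, timeout_s: int = 5) -> List[str]:
    # BatchMode=yes makes ssh fail instead of prompting; `true` is a no-op remote command.
    return [
        "ssh", "-i", os.path.expanduser(ssh_key_path),
        "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout_s}",
        f"{username}@{ip}", "true",
    ]


def run_ok(cmd: List[str], timeout_s: float = 30) -> bool:
    """True if cmd exits with status 0 within timeout_s seconds."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):  # e.g. command missing, or it hung
        return False
    return result.returncode == 0


def attempts_until_success(cmd: List[str], max_attempts: int = 5) -> int:
    """Number of tries cmd needed to succeed, or 0 if it never did."""
    for attempt in range(1, max_attempts + 1):
        if run_ok(cmd):
            return attempt
    return 0
```

For example, `attempts_until_success(ping_command("169.254.248.55"))` would return the number of pings machine p1 needed before it answered, or 0 if it never did.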