News 14 Sep 2006
MTT News for 14 September, 2006
Here's an update from the MTT team for you:
- Nightly mail is here!
- New version of the client requires synchronized update
- Flexible web reporting is now possible
- Plans for rolling out MTT 1.0 to the rest of the OMPI team
- Plans for development of MTT 2.0
Let's go through each of these individually...
Nightly mail is here!
That's right, sports fans, you can now get an e-mail of the executive summary of all the previous night's MTT testing. You can see failures in each of the phases listed by architecture/OS. Links in the mail provide the capability to get more information (e.g., drill down to see what the failures were, etc.). Sign up here:
This e-mail is intended to be a quick view of what happened yesterday. It is not intended to be a detailed e-mail (so don't ask for it [yet] ;-) ).
More features are planned for MTT e-mail results, but they will take time (see below).
New version of the client requires synchronized update
Tomorrow morning (15 Sep 2006), I will be rolling out a new version of the MTT client to the release branch (SVN mtt:/branches/ompi-core-testers/). It contains a bunch of new features and bug fixes, but more importantly, because it also includes some server-side changes, it requires everyone to:
- SVN update to get the latest version before your runs tomorrow night
- update your INI files in a few critical places
Here's a summary of the new stuff in this version of the client:
- Add ability to print out how long each phase takes in the client (good for planning exactly what tests to run, etc.)
- Allow command line parameters to override values in the INI file
- Guarantee that OMPI cleanup runs on all nodes
- Add support for LoadLeveler and N1GE
- Be [much] more efficient in submitting results to the IU database
- Be [much] more efficient in saving stdout/stderr
- Do not try to re-build a failed MPI Install if you run the client again
- "Ping" the IU database URL from the INI file to ensure it's correct
- Fix not submitting stdout/stderr from some phases
- Ability to create sub-groups of tests within a single section (e.g., for some tests that are supposed to "fail")
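As a rough illustration of the stdout/stderr item in the list above: this post says that MTT now keeps only the last N lines of each stream (N defaults to 100). Here is a minimal Python sketch of that idea. It is not taken from the MTT client; the function name `tail_lines` and the constant name are made up for illustration.

```python
from collections import deque
from typing import Iterable, List

# Default N mentioned in this post; the constant name is hypothetical.
DEFAULT_SAVE_LINES = 100


def tail_lines(stream: Iterable[str], max_lines: int = DEFAULT_SAVE_LINES) -> List[str]:
    """Return at most the last `max_lines` lines of `stream`.

    A bounded deque drops the oldest line whenever a new one arrives,
    so memory use depends on `max_lines`, not on how much the test printed.
    """
    if max_lines <= 0:
        return []
    return list(deque(stream, maxlen=max_lines))


if __name__ == "__main__":
    # Simulate a test that floods stdout when it fails.
    noisy_output = (f"error detail {i}" for i in range(1_000_000))
    saved = tail_lines(noisy_output)
    print(len(saved), "lines kept; last one:", saved[-1])
```

The point of the bounded buffer is that a test printing tens of MB costs the same to store as one printing a few hundred lines, which is the scalability concern this change addresses.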
You will need to update your INI file in a few places:
- There used to be a field named separate_stdout_stderr in some sections (other sections had a field named merge_stdout_stderr). All sections now support only merge_stdout_stderr. So if you use separate_stdout_stderr anywhere, please flip its value (it's a boolean) and change the field name to merge_stdout_stderr.
- The ompi-core-template.ini file has a new after_each_exec definition in the "MPI Details" section. This definition guarantees that OMPI jobs are cleaned up on all nodes (the previous definition only cleaned up on the node where the MTT client was running). You'll want to copy the new definition to your INI file.
- We used to save all stdout/stderr in some sections. This proved to be a serious scalability problem in some cases (consider that some Intel tests output tens of MB to stdout when they fail!). MTT now only saves the last N lines of stdout/stderr. N defaults to 100 -- so it should still be more than enough information to track down errors. If it's not, you can increase the value.
- The way tests are specified for the "Simple" module in Test Run sections has changed. You now must designate sub-groups of tests in the form:
simple_<group name>:<field> = <value>
where "simple" is the prefix given to all the key names (because the values belong to the "Simple" Test Run plugin), "<group name>" is an arbitrary string of alphanumeric and "_" characters, and "<field>" is one of the recognized field names. For example:
    [Test run: intel]
    test_build = intel
    pass = &or(&eq(&test_exit_status(), 0), &eq(&test_exit_status(), 77))
    timeout = &max(30, &multiply(10, &test_np()))
    save_stdout_on_pass = 1
    merge_stdout_stderr = 1
    stdout_save_lines = 100
    # The intel tests have some hard limits at 64 processes
    np = &min(64, &env_max_procs())

    module = Simple
    # This group of tests uses the defaults from above and should all
    # return an exit status of 0
    simple_pass:tests = &find_executables("src")

    # This group of tests is from the "supposed_to_fail" file, and is
    # exclusive from the other group (meaning that anything in this group
    # should be removed from the other group). The programs in the group
    # should also have a nonzero exit status.
    simple_fail:tests = &find_executables("src/" . "&cat("supposed_to_fail")")
    simple_fail:np = &env_max_procs()
    simple_fail:pass = &ne(&test_exit_status(), 0)
    # Setting "exclusive" to 1 means that any tests found in this group
    # will be removed from all other groups.
    simple_fail:exclusive = 1
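To make the "exclusive" semantics in the example above a bit more concrete, here is a small Python sketch of how the sub-groups could be resolved and judged. It is only an illustration of the behavior described in this post, not MTT's implementation. The function names, the dictionary layout, and the test file names are all hypothetical, and the sketch assumes that exclusive groups do not overlap with each other.

```python
from typing import Dict, List, Set


def resolve_groups(groups: Dict[str, dict]) -> Dict[str, List[str]]:
    """Drop from each group any test claimed by another group with exclusive set."""
    resolved: Dict[str, List[str]] = {}
    for name, group in groups.items():
        # Collect the tests claimed by *other* groups marked exclusive.
        claimed: Set[str] = set()
        for other_name, other in groups.items():
            if other_name != name and other.get("exclusive"):
                claimed.update(other["tests"])
        resolved[name] = [t for t in group["tests"] if t not in claimed]
    return resolved


def passes_default(exit_status: int) -> bool:
    # Mirrors: pass = &or(&eq(&test_exit_status(), 0), &eq(&test_exit_status(), 77))
    return exit_status in (0, 77)


def passes_supposed_to_fail(exit_status: int) -> bool:
    # Mirrors: simple_fail:pass = &ne(&test_exit_status(), 0)
    return exit_status != 0


if __name__ == "__main__":
    # Hypothetical test names standing in for the executables found under "src".
    groups = {
        "pass": {"tests": ["src/a", "src/b", "src/c"]},
        "fail": {"tests": ["src/b"], "exclusive": True},
    }
    print(resolve_groups(groups))
```

In this sketch, a test listed in the exclusive "fail" group is judged only by that group's pass rule (nonzero exit status), and never by the default rule that applies to the "pass" group.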
Flexible web reporting is now possible
A first version of more flexible MTT results reporting is now available:
You will need your organization's MTT submit username/password to access this site (or the general OMPI core username/password -- if you're not OMPI core, please don't ask for it). With this interface, you can drill down to the results that you want to see. Experiment with it a bit. You should be able to bookmark specific results that you want to see (e.g., failures on your cluster within the last 24 hours).
We expect there to be a bunch of feedback on this interface. :-) Please note that most of your feedback will likely be directed towards the next version of MTT (2.0) -- see below.
Plans for rolling out MTT 1.0 to the rest of the OMPI team
Once we are able to get the current set of testers (Sun, Cisco, IU, HLRS) up and going on this "1.0" interface, we'll be releasing it to the rest of the team to get widespread, distributed testing. We hope to do this next week.
Plans for development of MTT 2.0
Sun and Cisco have been working heavily on MTT 1.0. Throughout the process, we have learned quite a bit -- both what is good about the current MTT and what is bad about the current MTT. In several cases, we decided to make the bad things "just good enough" to be useful (although certainly not optimal). Some users will likely have some [potentially violent] feedback about the reporting, for example. The web reporting, we have found, is fairly religious. What currently exists is a conglomeration of compromises between different sets of people. I think that we have found that it works, but is far from optimal.
We plan to overhaul that interface based on feedback from the group (and we have a bunch of ideas of our own). We also plan to overhaul a bunch of the back-end / server-side plumbing for MTT. Our first cut (using Perfbase) was already scrapped; we're currently using a 2nd generation effort. But even that isn't good enough; we have a 3rd generation planned out and will be moving towards that. Users won't notice too much of a difference, but it will definitely lead to more scalability on the server, both in terms of query speeds and storage requirements.
We're expecting MTT 2.0 to take a few months (we haven't fully scoped it out yet, so we don't have a timeline). But we do expect that most of the feedback that the OMPI team gives us will be applied towards 2.0, not 1.0. The rationale is that the 1.0 interface is usable and good enough to move the Open MPI v1.2 release process forward. MTT 2.0 will be better, but we'll never get there if we have to keep continually updating the MTT 1.0 stuff. So bear with us and keep the feedback coming -- it'll make MTT a better system, even if it'll take time to get there.