MCA/base: Add new MCA variable system

Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file cannot be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE flag enables the
   use of mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few components actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.
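
To illustrate the notes on backing stores, the settable flag, and constant scope, here is a minimal toy model in C. It is not the mca_base_var implementation: every identifier (`toy_var_register`, `toy_var_set_value`, `TOY_FLAG_SETTABLE`, `enum toy_scope`) is hypothetical, and only int variables are modeled.

```c
/* Toy model only; NOT the Open MPI API. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

enum toy_scope { TOY_SCOPE_CONSTANT, TOY_SCOPE_READONLY, TOY_SCOPE_LOCAL };
#define TOY_FLAG_SETTABLE 0x1u
#define TOY_MAX_VARS 16

struct toy_var {
    const char *name;
    int *storage;            /* caller-owned backing store */
    unsigned flags;
    enum toy_scope scope;
};

static struct toy_var toy_vars[TOY_MAX_VARS];
static int toy_var_count = 0;

/* Register a variable; returns its index, or -1 on error.
 * A constant-scope variable may not also be settable. */
int toy_var_register(const char *name, int *storage,
                     unsigned flags, enum toy_scope scope)
{
    if (NULL == name || NULL == storage || toy_var_count >= TOY_MAX_VARS) {
        return -1;
    }
    if (TOY_SCOPE_CONSTANT == scope && (flags & TOY_FLAG_SETTABLE)) {
        return -1;
    }
    toy_vars[toy_var_count] = (struct toy_var){ name, storage, flags, scope };
    return toy_var_count++;
}

/* Write through to the backing store, but only for settable variables. */
int toy_var_set_value(int index, int value)
{
    if (index < 0 || index >= toy_var_count) {
        return -1;
    }
    if (!(toy_vars[index].flags & TOY_FLAG_SETTABLE)) {
        return -1;
    }
    assert(NULL != toy_vars[index].storage);
    *toy_vars[index].storage = value;
    return 0;
}
```

Because the registry keeps only a pointer, the int passed as `storage` has to outlive the registration, which is the same constraint the notes place on real backing stores.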
1 parent e85f8ac commit be525bef85f89078df164807df09b70daedbb23a @hjelmn committed Feb 5, 2013
Showing 443 changed files with 11,078 additions and 8,404 deletions.
@@ -0,0 +1,322 @@
+New prefix: mca_base_cvar_<foo>
+Old prefix: mca_base_param_<foo>
+
+Goal: for 1.7.x, we support both, but mca_base_param will be
+deprecated. For 1.9.x, mca_base_param will be dropped.
+
+-------------------------
+
+project  framework  component  name
+1        0          0          0     does not make sense
+1        0          0          1     only allowed for project names that are not actual projects (e.g., project=mpi)
+1        0          1          0     does not make sense
+1        0          1          1     does not make sense
+1        1          0          0     yes
+1        1          0          1     do not allow (use "base")
+1        1          1          0     does not make sense
+1        1          1          1     yes
+
+-------------------------
+
+--> do we deprecate all <v2.0 names (e.g., btl)?
+NO
+
+Support: btl_<foo> and ompi_btl_<foo>
+
+BUT: if there are multiple frameworks with the same name in different
+projects, then all MCA param specifications (file, env, CLI) *must* be
+fully qualified with project name (e.g., if opal/mca/btl and
+ompi/mca/btl exist, then all specifications must be opal_btl_<foo> and
+ompi_btl_<foo>). AS A CONSEQUENCE: during registration, we'll
+register in order. opal/mca/btl will register and all will be fine.
+Then ompi/mca/btl will register and it can see that there's another
+framework named "btl" already registered. At this point, it needs to
+go check the file cache and run through the environment and see if any
+param name specifications are ambiguous (e.g., btl_<foo> instead of
+ompi_btl_<foo> or opal_btl_<foo>). If so, show_help and abort -- let
+a human figure out the ambiguity.
+
+-------------------------
+
+This goes to v1.7.x.
+
+register_param(
+- 4 strings for names
+- bit flags:
+ - internal
+ - default_only
+ - settable
+ - deprecated
+ - type: int and string
+- scope: local, global, constant, ...a few others
+- pointer to user-provided storage
+)
+--> returns unique index of created param
+
+When we create the param, look up the current value (environment,
+files, etc.) and assign the value back to the user-provided storage.
+Forever after that, we will NOT look up in env/files/etc. MPIT_Get
+will only return the value from the user-provided storage, and
+MPIT_Set will only set the value in the user-provided storage.
+
+Synonyms:
+
+register_synonym(
+- index of real param
+- 4 strings for names
+- bit flags:
+ - deprecated
+)
+
+Same rules as creating a param: we do the lookup env/file upon
+creation of a syn, and possibly change the value in the user-provided
+storage (according to the precedence rules listed in current
+mca_base_param.h).
+
+Add functionality: if create param or synonym is called in a framework
+after that framework is opened, error/show_help/abort. This is a
+programmer error. We need to force all param/syn creation in the
+registration phase so that ompi_info will see everything.
+
+--> Jeff found work from Jeff+Ralph tree from recently (Sep/Oct/Nov
+ 2012) where a component register function can return BAD_PARAM if
+ it does validation of params and finds an error. The component
+ register function can show_help and then return BAD_PARAM,
+ thereby telling ompi_info that it should dump what it has so far
+ and then abort. This tells users if they have an invalid value.
+
+-------------------------
+
+Defer to v1.9
+
+Move the framework verbosity/stream ID into the framework struct, and
+therefore handle registration of the fw_verbose param (and possible
+opening of the stream) during framework_register, and then rip it all
+down during framework_finalize.
+
+Current code always creates a priority MCA param for all components.
+So let's just put a priority param in the component struct and have it
+all handled automatically.
+
+framework functions:
+- open: standardized
+- close: standardized
+- select: not consistent/standardized (we're not changing this)
+--> need to add a standardized framework register function
+
+component functions:
+- open: standardized
+- close: standardized
+- register: standardized
+- query: standardized
+
+Component registration function should never return errors based on
+invalid values. ompi_info, for example, will simply display all
+current values, even if some are invalid. Invalid detection of values
+only happens during component open (or later), and therefore show_help
+kinds of messages about invalid param values only happen at mpirun
+time.
+
+-----------------
+
+Defer to v1.9:
+
+MCA component param validation function:
+
+1. It can't be NULL, unless the MCA param registration function is
+ also NULL.
+
+2. It'll never abort, but it can show_help() and return BAD_PARAM if
+ it finds an invalid value (perhaps returning the invalid param name
+ if it finds one...?).
+
+3. Having it as a separate, mandatory function *makes* programmers
+ think about MCA param validation, and increases the chance that
+ they'll actually check the values of their MCA params.
+
+4. ompi_info can call all the registration functions to register
+ *everything*, and then it can call all the validation functions
+ after everything has been registered. Especially since we're using
+ user-provided storage for MCA param values, this makes the pain of
+ validation significantly less (i.e., at least you don't have to do
+ separate lookups to get param values). This way, ompi_info will
+ always be able to print *all* values, but also print an error
+ message(s) if it finds one (or more?) invalid param values.
+
+Another concept: adding more types of MCA params:
+
+enum: component supplies list of acceptable int values/descriptions
+ for MPIT; param system does validation for you.
+
+bool: might be a special case of enum?
+
+...?
+
+-----
+
+Can be done v1.7 or v1.9:
+
+Add param to new MCA register function for the 9 verbosity levels of
+MPIT (this is basically ticket #1390):
+
+- basic, detail, all
+- user, tuner, MPI implementors
+
+Make ompi_info grow a new CLI option for selecting this verbosity
+level and displaying only these. Default to the lowest layer:
+USER_BASIC.
+
+Make ompi_info show in the last line (or something like this):
+("...and X more parameters not shown because verbosity level is too
+low")
+
+-----
+
+For v1.7
+
+Random note: if we set a param to valueA, and then we set a synonym of
+that param to valueB, print a warning. If valueA==valueB, DON'T print
+a warning.
+
+-----
+
+For v1.9:
+
+Optimization note: have orted read in all the MCA param files and then:
+
+1. set mca_base_param_files to none
+2. set all the env variables for all the file-based values
+3. set companion env variables (e.g.,
+OMPI_MCA_SOURCE_btl=FILE:filename) to say what the source was for all
+of these values
+
+This allows MPI procs to get all the file values without actually
+reading the files. Non-ORTE-based RTE's can do this or not -- i.e.,
+if they don't do this, MPI procs will read from files as they do
+today.
+
+Also add the ability to know whether an MCA param was set from the CLI
+or environment. Sources will be:
+
+- CLI
+- Environment
+- Code (i.e., via API)
+- File
+
+Modify mpirun and ompi_info to call a new API function that sets the
+source to one of the above enums. This keeps the idea of the
+2nd/shadow environment variable indicating the source hidden from the
+callers of the MCA param API.
+
+-----
+
+v1.9 (but may be difficult to separate if it's implemented early)
+
+Non-overrideable MCA params (this is basically ticket #75)
+
+Idea: have a hard-coded filename that is installed under $prefix
+somewhere. Any MCA value that is set in there, users cannot
+override. And if they try to (e.g., via CLI, file, or environment),
+we show_help a warning saying that they can't because sysadmin has set
+an immutable value, and then error/abort.
+
+This is on the rationale that user asked for something we can't
+deliver, so we need to make a human figure it out.
+
+Format of the file is same as any other MCA param file. Sysadmins can
+use this to set values that they don't want users to override.
+
+-----
+
+v1.9
+
+Warn about param name misspellings
+
+Have a "validate" function (or a better name?) in the MCA base, in
+each project, and in each framework.
+
+- base function registers everything (i.e., calls project register
+ functions, ....etc)
+- base function then loops over calling the validate function in each
+ project
+- project function loops over calling the validate function in each of
+ its frameworks
+- each framework will loop over all the project/framework MCA params
+ that were set via CLI, env or file (e.g., BTL framework will look at
+ all btl_* and ompi_btl_* params set on CLI/env/file). If it can't
+ find a matching registered MCA param for a given set value, then it
+ adds that MCA param name to a list of "these were set but do not
+ exist" param names.
+- Once all validate functions have finished, show_help the "these were
+ set but do not exist" param names.
+
+CORNER CASE: If MPI_T_INIT_THREAD is invoked, force loading of all
+frameworks (including those that would normally lazy load, like
+MPI/IO) so that they can mpit_set those params and not have warnings
+about trying to set nonexistent MCA params.
+
+-----
+
+Josh comment:
+
+Josh just needs the ability to "re-examine an MCA param" -- he doesn't
+really care about recaching the files. He just wants to say "I've
+re-loaded, so do the ordered/precedence-based lookup for this param
+again."
+
+-----
+
+Comment from George:
+
+1. De-dup the filename cached for where a value was set from. E.g., have
+an argv[] of all the filenames, and just save a reference to the
+filename's argv[] entry in each param that was set by a file.
+
+2. Use hash functions / AVL search for lookup of names. He suggested
+murmur32:
+http://www.qdecoder.org/svn/qlibc/trunk/src/utilities/qhash.c?revision=94&pathrev=128.
+
+This may or may not be relevant -- group lookup makes it faster
+already, and the groups end up being kinda small.
+
+-----
+
+stages:
+
+For v1.7:
+1. provide new mca_base_cvar_* API (including convert mca_base_param_*
+ API to use the new mca_base_cvar_* API)
+ --> include API parameters for enum MCA params (see below) that are
+ ignored for now
+ --> include API param for MPIT verbosity levels, but this info will
+ be ignored for now
+2. put in new mca_base_framework_t structure; create standardized
+ component fw registration (separates fw register and open)
+3. convert frameworks to the new framework_t struct
+4. provide new MPIT API (including touching mpi.h.in and opal_ddt)
+5. convert to new mca_base_cvar_* API (probably intermingled):
+ - projects
+ - frameworks
+ - components
+
+For v1.9:
+6. remove old mca_base_param_* API (including any additional gorp we
+ added in #1 to support the old API)
+7. update new API back-end to handle multiple frameworks with same name
+ (e.g., btl_<foo> == ompi_btl_<foo> now supported)
+8. update component_t to include the validate function; update
+ ompi_info to validate after all registrations complete, and
+ gracefully handle a validation failure
+9. update ompi_info (opal_info) to be completely generalized (read
+ list of frameworks and components)
+10. update ompi_info to handle MPIT verbosity levels
+11. audit all MCA parameters and assign MPIT verbosity level
+ --> including making developer guidelines for assigning these MPIT
+ verbosity levels
+12. move framework verbosity/stream ID to framework_t
+13. move component priority to component_t
+14. add support for "enum" type of MCA param
+15. add orted optimization to read in MCA params and set them in the
+ environment for its children
+16. non-overrideable MCA params
+17. warn about MCA param name misspellings
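
The misspelling check described in the notes above (a framework collects every value set via CLI, env, or file under its prefix and reports those with no matching registered param) could be sketched as follows. This is an illustration only, not code from this commit: `toy_find_unknown` and `toy_is_registered` are hypothetical names, and the parameter names used with it are just sample strings.

```c
/* Hypothetical sketch of the "set but do not exist" check; not Open MPI code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` exactly matches one of the registered names. */
static int toy_is_registered(const char *name,
                             const char *const *registered, size_t nreg)
{
    for (size_t i = 0; i < nreg; ++i) {
        if (0 == strcmp(name, registered[i])) {
            return 1;
        }
    }
    return 0;
}

/* Scan the names that were set (CLI/env/file) and collect those that
 * start with `prefix` but match no registered name. Returns how many
 * entries were written to `unknown` (at most `max_unknown`). */
size_t toy_find_unknown(const char *prefix,
                        const char *const *set, size_t nset,
                        const char *const *registered, size_t nreg,
                        const char **unknown, size_t max_unknown)
{
    assert(NULL != prefix && NULL != set && NULL != registered && NULL != unknown);
    size_t plen = strlen(prefix);
    size_t found = 0;

    for (size_t i = 0; i < nset && found < max_unknown; ++i) {
        if (0 != strncmp(set[i], prefix, plen)) {
            continue;   /* belongs to some other framework */
        }
        if (!toy_is_registered(set[i], registered, nreg)) {
            unknown[found++] = set[i];
        }
    }
    return found;
}
```

A framework that accepts both btl_* and ompi_btl_* would call the helper once per prefix and merge the two lists before handing them to show_help.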
@@ -61,7 +61,7 @@
#
# Basic behavior to smooth startup
-mca_component_show_load_errors = 0
+mca_base_component_show_load_errors = 0
orte_abort_timeout = 10
opal_set_max_sys_limits = 1
orte_report_launch_progress = 1
@@ -128,6 +128,8 @@ OMPI_DECLSPEC volatile int MPIR_being_debugged = 0;
OMPI_DECLSPEC volatile int MPIR_debug_state = 0;
OMPI_DECLSPEC char *MPIR_debug_abort_string = "";
+static char *ompi_debugger_dll_path = NULL;
+
/* Check for a file in few direct ways for portability */
static void check(char *dir, char *file, char **locations)
{
@@ -164,18 +166,19 @@ extern void
ompi_debugger_setup_dlls(void)
{
int i;
- char *a, *b, **dirs, **tmp1 = NULL, **tmp2 = NULL;
-
- a = strdup(opal_install_dirs.pkglibdir);
- mca_base_param_reg_string_name("ompi",
- "debugger_dll_path",
- "List of directories where MPI_INIT should search for debugger plugins",
- false, false, a, &b);
- free(a);
+ char **dirs, **tmp1 = NULL, **tmp2 = NULL;
+
+ ompi_debugger_dll_path = opal_install_dirs.pkglibdir;
+ (void) mca_base_var_register("ompi", "ompi", "debugger", "dll_path",
+ "List of directories where MPI_INIT should search for debugger plugins",
+ MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0,
+ OPAL_INFO_LVL_9,
+ MCA_BASE_VAR_SCOPE_READONLY,
+ &ompi_debugger_dll_path);
/* Search the directory for MPI debugger DLLs */
- if (NULL != b) {
- dirs = opal_argv_split(b, ':');
+ if (NULL != ompi_debugger_dll_path) {
+ dirs = opal_argv_split(ompi_debugger_dll_path, ':');
for (i = 0; dirs[i] != NULL; ++i) {
check(dirs[i], OMPI_MPIHANDLES_DLL_PREFIX, tmp1);
check(dirs[i], OMPI_MSGQ_DLL_PREFIX, tmp2);
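
The hunk above splits a colon-separated directory list and walks the NULL-terminated result. For readers without the OPAL sources at hand, the sketch below shows that pattern in plain C. `toy_split` and `toy_free` are hypothetical stand-ins written for this example, not the OPAL argv functions, and skipping empty tokens is an assumption of the sketch.

```c
/* Hypothetical stand-in for a delimiter split; not the OPAL implementation. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array produced by toy_split(). */
void toy_free(char **argv)
{
    if (NULL == argv) {
        return;
    }
    for (size_t i = 0; NULL != argv[i]; ++i) {
        free(argv[i]);
    }
    free(argv);
}

/* Split `list` on `delim` into a NULL-terminated array of new strings.
 * Empty tokens are skipped. Returns NULL on allocation failure. */
char **toy_split(const char *list, char delim)
{
    assert(NULL != list && '\0' != delim);

    /* Upper bound on the number of tokens: delimiters + 1. */
    size_t count = 1;
    for (const char *p = list; '\0' != *p; ++p) {
        if (*p == delim) {
            ++count;
        }
    }

    char **out = calloc(count + 1, sizeof(*out));
    if (NULL == out) {
        return NULL;
    }

    size_t n = 0;
    const char *start = list;
    for (;;) {
        const char *end = strchr(start, delim);
        size_t len = (NULL != end) ? (size_t)(end - start) : strlen(start);
        if (len > 0) {
            out[n] = malloc(len + 1);
            if (NULL == out[n]) {
                toy_free(out);
                return NULL;
            }
            memcpy(out[n], start, len);
            out[n][len] = '\0';
            ++n;
        }
        if (NULL == end) {
            break;
        }
        start = end + 1;
    }
    return out;   /* out[n] is already NULL thanks to calloc */
}
```

The caller iterates with `for (i = 0; dirs[i] != NULL; ++i)`, as the hunk does, and releases the array with `toy_free` when done.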