Skip to content

Conversation

@vkarak
Copy link
Contributor

@vkarak vkarak commented Mar 25, 2020

Summary of this PR

  • Support for the new-style configuration files
    • Configuration files can be either Python or JSON
  • All configuration files have been migrated to the new syntax.
  • More configuration options
  • Ability to associate command line options to environment and/or configuration variables.
  • Configuration component is completely refactored
  • Framework's runtime context component refactored and simplified
  • Internal representations for system and system partitions adapted
  • The bulk of unit tests using a modified runtime have been migrated to pytest using fixtures and CI scripts have been adapted accordingly.
  • Better mechanism for registering scheduler and launcher backends that avoids spurious imports
  • Free functions of the environments module that manipulate Environments have been moved to the runtime module, in order to avoid circular dependencies and ensure a better conceptual fit

Changes of this PR from the user's point of view

  • The new configuration syntax introduced in [feat] Introduce new configuration syntax and validation schema #1128 is now the default. The old configuration style is still accepted, but a deprecation warning is issued and it is converted to the new syntax in a temporary file as implemented by [feat] Add a tool to convert from the old to the new configuration syntax #1186.

  • The new syntax allows users to define configuration parameters per system or per system/partition combination. However some configuration options, such as the modules to load before starting a regression test run, can only be treated at the system level, even if a user defines them at a finer level.

  • Several command line options are associated with environment variables and/or configuration variables. There is a precedence order, such that, command-line options are considered first, environment variables second and, finally, configuration variables. See cli.py for all the associations.

  • Configuration files can be either in JSON format or a Python file that defines a site_configuration variable which holds essentially the JSON configuration formatted in Python syntax.

  • ReFrame now looks for configuration files in the following locations and in that order:

    1. ${HOME}/.reframe/settings.{py,json}
    2. ${REFRAME_INSTALL_PREFIX}/settings.{py,json}
    3. /etc/reframe.d/settings.{py,json}

    If both a .py and .json configuration file are found, the Python one will be preferred.

  • If no configuration file is found or none is supplied, ReFrame will use a generic fallback configuration file, that is located in reframe/core/settings.py and will allow ReFrame to run minimally on any system. This is NOT meant to be altered by users. If the user supplies a configuration file, ReFrame will not fall back to the generic configuration. This is different from the current behavior.

  • ReFrame starts always using the generic fallback configuration and then switches to the one provided by the user. This is for allowing ReFrame to enable logging right from the beginning, so that early messages are nicely formatted and colored.

  • The very old logging configuration syntax (prior to 2.13) is no more recognized and the configuration conversion tool does not take it into account.

  • The --show-env-config option is eliminated and the functionality of the --show-config option is expanded. You can use that option to query any configuration parameter of the framework. By default, without an argument, it will show the entire ReFrame configuration for the current system, but you can request a configuration parameter using the syntax described below for the _SiteConfig.get() method. The output of this command is in JSON forrmat.

  • The default value for the different flags in build systems is now the empty list [] instead of None. The behavior of the default flags is still the same, though, such that no XFLAGS is emitted if flags' list is empty. If the user wants to have XFLAGS explicitly emitted even if empty, [''] has to be specified as the flags' value.

  • The default values for the different compiler name variables is the empty string '' instead of None. This and the previous changes were made in order for the default values to be consistent with the JSON schema requirements.

  • ShifterNG container backend is changed to Shifter.

  • The changes of [feat] Do not allow chaining of the -c option; pass multiple paths as a colon separated list  #1125 are reverted and -c option's behaviour is as before. The reason for that is that now the check search path can be completely overriden through an environment variable.

  • Printing of "ReFrame paths" section has changed as well as how list of checks are printed. IMHO look more modern now.

Implementation details

The configuration component has been redesigned entirely and implemented from ground up. All other components attached to it have also been profoundly changed and redesigned. Here are the core concepts of the new configuration mechanism implementation.

The heart of the configuration is the reframe.core.config._SiteConfig. This is the ultimate reference for any configuration option throughout the lifetime of ReFrame run and is responsible for supplying the default values for any configuration parameter. Defaults are no more scattered among the different internal APIs of the framework.

The reframe.core.systems.System and reframe.core.systems.SystemPartition abstractions still remain and the pipeline interfaces with them as in the past. A System and all its partitions and environments are created from a _SiteConfig objects through the System.create() factory method.

The reframe.core.runtime.RuntimeContext is also considerably changed. It serves its purpose as a wrapper to the site configuration providing some additional logic on top of it, e.g., computed stage/output directory prefixes and the information about the current test run (used in the retries feature). It also offers the facility to directly access a configuration option through its get_option() method. The HostResources and HostSystem objects are eliminated and any related functionality has been moved to either the RuntimeContext or the System. Also the system auto-detection is no more part of the RuntimeContext.

How the _SiteConfig works

The _SiteConfig objects stores the raw site configuration as this was extracted from the configuration file.
It also holds the JSON schema for validating the configuration, and some other variables, relating to selecting the sub-configuration for a specific system or system/partition combination.

Loading and validating the configuration

This is pretty much simple. We just load the JSON file in one of the _create_from_xxx() methods and we check for any JSON decoding error, which we propagate upward as a ConfigError. The validation against the schema happens in the validate() method, which clients must call separately.

Selecting a sub-configuration for a system or system/partition combination

This happens in the select_subconfig() method which optionally takes a system name in the system:partition format. If a system name is not provided, the auto-detection takes place as in the past.

The purpose of this method is to construct a valid configuration just for the requested system or system/partition combination. This configuration is kept separately in the _local_config attribute, while the _local_system attribute acts as a flag in order to avoid unnecessary re-evaluations of the sub-configuration.

In the new syntax, all configuration sections can have a target_systems property that allows them to be configured differently per system and partition. In order to correctly address this, we create a ScopedDict() for each section and use that to resolve the configuration values against the requested system/partition combination.

After the new local configuration is built, we perform some additional validation steps to ensure that this configuration is valid: (a) all required fields are present in the local configuration as well, (b) all environments of the current system are defined in the local configuration. We perform these checks manually, not through the schema validation, because we don't want to do a full validation (this is just a subset of the original configuration) and step (b) cannot be validated through the schema anyway.

You may call select_subconfig() several times on the same _SiteConfig object in order to retrieve different sub-configurations.

Retrieving configuration options

The method for retrieving configuration options is the get() method of _SiteConfig. This will take care also of returning the default values defined in the schema. The format of the option argument can be one of the following:

  • section/subsection/.../property
  • section/<index>/subsection/.../property
  • section/@<name>/.../property

If a section contains a list of elements, you will have to use and index to retrieve the exact element or @name if the element are name addressable. An element is considered name addressable if it has a name property. Here are some examples of calling get():

site_config.get('environments/0/cxx')
site_config.get('environments/@PrgEnv-gnu/cc')

See the test_config.py unit test for more use cases.

If any option does not exist in the current configuration, its default value will be looked up in the schema.
The lookup key is the same as the option name with any indexes or @name accessors removed. So for the above get() calls the following keys will be looked up in the schema:

defaults/environments/cxx
defaults/environments/cc

If a default value cannot be retrieved from the schema the default argument passed to get() will be returned (which defaults to None). This argument is also returned in case of index errors or if an element cannot by found by its name.

The idea is that any other component of the framework that needs a special configuration variable asks it from the site configuration through the runtime. This is for example the case of the scheduler backends, which do get('schedulers/@backend_name/job_submit_timeout') in order to retrieve the job submit timeout parameter. And this parameter can be configured differently for different backends and for different system/partitions combinations!

Sticky options

You can add and remove sticky options to the site configuration through the add/remove_stick_option() methods. A sticky option overrides any other reference to that option in the configuration, including the defaults. This is the mechanism that the frontend uses to impose command line options and/or environment variables over their configuration option counterparts. The format for setting a sticky option is similar to the defaults described above but without the defaults/ prefix. So to override completely the cxx value in the configuration file, one has to call site_config.add_sticky_option('environments/cxx', 'clang').

Associating command line options with environment variables and configuration parameters

You can now associate a command-line argument with an environment variable and/or a configuration parameter. This is achieved by extending the interface of the reframe.frontend.argparse.ArgumentParser and specifically the add_argument() method. This method now accepts two more keyword arguments for specifying the association:

  1. envvar: The name of the associated environment variable.
  2. configvar: The name of the associated configuration variable (without indexes or name accessors).

When you retrieve a value of an option of the namespace object returned by ArgumentParser.parse_args(), if the option was not specified in the command-line (i.e., has a value of None), the associated environment variable will be queried if it exists. We explicitly do not query the associated configuration variable here, because the namespace object is only visible to the cli.py and, as discussed previously, we want the _SiteConfig to be the ultimate reference of configuration parameters throughout the framework. For this reason, the ArgumentParser is equipped with an update_config() method, whose main purpose is to add sticky options to the site configuration based on the namespace's options.

Other changes and improvements

Unit tests

Unit tests that required switching the runtime configuration have been all migrated to pure pytest syntax using fixtures. This makes each test more self contained and avoids excessive monkey patching. There is still some duplication of common fixtures among different files, but I leave that as future work.

Notice also the changes required to run the unit tests on the different systems. Since I am using the tmp_path fixture wherever I need a temporary directory, you have to pass the --basetemp option to pytest in order to run on Daint/Dom etc, since $TMPDIR is not mounted on the compute nodes.

Registration of scheduler and parallel job launcher backends

The logic of registration has been consolidated and moved inside the reframe.core.backends module. Also the modules of the backends are imported programmatically upon request from the getscheduler() and getlauncher() methods. This avoids nasty circular import problems. Nothing changes in terms of the registration itself or retrieval of the scheduler and launchers except the module to import. For this reason, I had to change all the checks that make use of any of these methods.

Environment loading and snapshots

I have moved the logic of loading an environment and taking an environment snapshot inside the runtime module. The reason for that is that these operations need the runtime due to the modules system, which led to intricate import patterns.

Open Issues / Todos

  • Fine tune printing of the "ReFrame directories"
  • Fix FIXME notes
  • Remove stale code if any
  • Allow add_argument() to define environment variable/configuration options combinations only
  • Convert environment variable values to the correct type based on the const argument of the argument parser.
  • Update comments in ArgumentParser
  • Fix crash reported below.
  • Logging is still configured the old way, ignoring the defaults set in the schema.
  • Verify Graylog backend and consider using environment variables for specifying the server address.
  • Update the documentation
  • Fix crash when trying to initialize non-configured modules systems.

Fixes #1116.
Fixes #1115.
Fixes #66.
Addresses #1157.

- Support for the new-style JSON configuration files
- All configuration files have been migrated to the new syntax.
- More configuration options
- Ability to associate command line options to environment and/or configuration
  variables.
- Configuration component is completely refactored
- Framework's runtime context component refactored and simplified
- Internal representations for system and system partitions adapted
- The bulk of unit tests using a modified runtime have been migrated to pytest
  using fixtures and CI scripts have been adapted accordingly.
- Better mechanism for registering scheduler and launcher backends that avoids
  spurious imports
- Free functions of the environments module that manipulate Environments have
  been moved to the runtime module, in order to avoid circular dependencies and
  ensure a better conceptual fit
@pep8speaks
Copy link

pep8speaks commented Mar 25, 2020

Hello @vkarak, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request!Do see the ReFrame Coding Style Guide

Comment last updated at 2020-05-06 20:22:43 UTC

Vasileios Karakasis added 5 commits March 25, 2020 21:48
- This bug is present also in the current master, but it was not
  triggered by the unittests.
- I explicitly allow long lines in the configuration files, since this is rather
  meaningful for the log message formatting.
@vkarak
Copy link
Contributor Author

vkarak commented Mar 26, 2020

Code crashes if you run it from a separate directory:

$ ../reframe/bin/reframe -C ../reframe/config/cscs.py -c checks/mch/ncl/ncl.py -r --performance-report
Traceback (most recent call last):
  File "../reframe/bin/reframe", line 11, in <module>
    cli.main()
  File "/users/karakasv/Devel/reframe/reframe/frontend/cli.py", line 311, in main
    site_config = config.load_config('reframe/core/settings.py')
  File "/users/karakasv/Devel/reframe/reframe/core/config.py", line 480, in load_config
    return _SiteConfig.create(filename)
  File "/users/karakasv/Devel/reframe/reframe/core/config.py", line 128, in create
    return cls._create_from_python(filename)
  File "/users/karakasv/Devel/reframe/reframe/core/config.py", line 137, in _create_from_python
    mod = util.import_module_from_file(filename)
  File "/users/karakasv/Devel/reframe/reframe/utility/__init__.py", line 72, in import_module_from_file
    return _do_import_module_from_file(filename, module_name)
  File "/users/karakasv/Devel/reframe/reframe/utility/__init__.py", line 54, in _do_import_module_from_file
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 780, in get_code
  File "<frozen importlib._bootstrap_external>", line 832, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/users/karakasv/Devel/reframe-tests/reframe/core/settings.py'

This is now fixed.

Vasileios Karakasis added 4 commits March 28, 2020 22:48
- Replace `--show-config` and `--show-env-config` options with
  `--show-config=PARAM` which can be used to query any configuration parameter
  from the framework. It doesn't work ideally yet, because if you request a
  full configuration section that is not defined explicitly in the configuration
  file, it will issue an error, but if you request a single terminal parameter,
  it will return its default value. For example, using the generic
  configuration, `--show-config=general` yields an error, but
  `--show-config=general/check_search_path` returns the default value of this
  terminal parameter.

- `System`, `SystemPartition` and `Environment` acquired propert str() and
  repr() methods
- This check is moved over to `_SiteConfig`
- Other checks have been added and unit tests were adapted.
Also:
- Remove prefix from `RegressionCheckLoader`
- Some moving around of code
- Typo fixes
Vasileios Karakasis added 3 commits March 29, 2020 22:19
Also:

- Add more unit tests for the `--show-config` opiton
- More robust Tmod implementation when accessing MODULEPATH
@codecov-io
Copy link

codecov-io commented Mar 29, 2020

Codecov Report

Merging #1242 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1242   +/-   ##
=======================================
  Coverage   91.69%   91.69%           
=======================================
  Files          83       83           
  Lines       12488    12488           
=======================================
  Hits        11451    11451           
  Misses       1037     1037           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4567249...4567249. Read the comment docs.

@vkarak vkarak requested a review from rsarm March 30, 2020 08:15
@vkarak vkarak changed the title [feat] Add a new powerful configuration mechanism [wip] [feat] Add a new powerful configuration mechanism Mar 31, 2020
Copy link
Contributor

@teojgo teojgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am doing the review. The first comment I have is, shouldn't the file reframe/core/launchers/registry.py be removed?

@vkarak
Copy link
Contributor Author

vkarak commented Mar 31, 2020

@teojgo Yes, it should.

@teojgo
Copy link
Contributor

teojgo commented Mar 31, 2020

@vkarak ok, don't change it for the moment it does not affect the review in any case.

@vkarak vkarak changed the title [wip] [feat] Add a new powerful configuration mechanism [feat] Add a new powerful configuration mechanism Apr 30, 2020
@vkarak vkarak requested a review from teojgo May 4, 2020 21:18
@vkarak
Copy link
Contributor Author

vkarak commented May 5, 2020

@jenkins-cscs retry trsa

Copy link
Contributor

@teojgo teojgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only the two conflicts remain to be fixed, other that this lgtm

@vkarak
Copy link
Contributor Author

vkarak commented May 6, 2020

@teojgo I will fix them as soon as I get all the approvals. I have fixed conflicts multiple times for this PR so far.

@vkarak vkarak changed the title [feat] Add a new powerful configuration mechanism [feat] Introduce a new powerful configuration mechanism May 6, 2020
Add a note about the `-c` option's behavior, which is reverted to its original
2.x behavior.
@vkarak
Copy link
Contributor Author

vkarak commented May 7, 2020

@jenkins-cscs retry daint

@vkarak vkarak merged commit 35e93b2 into reframe-hpc:master May 7, 2020
@vkarak vkarak deleted the feat/new-config-mechanism branch May 7, 2020 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

6 participants