New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationalize the configuration #368

Open
jaimeperez opened this Issue Apr 20, 2016 · 6 comments

Comments

Projects
None yet
4 participants
@jaimeperez
Member

jaimeperez commented Apr 20, 2016

The current configuration we have is a mess. We are not taking advantage at all of the fact that we are using PHP files, having the configuration as a simple set of keys and values, instead of grouping options that belong together, for example. Besides, the option names have nothing to do with each other, using different formats and notations.

We should put some order in the configuration, grouping things that should go together, and unifying the naming conventions as much as possible.

Additionally, if we change the configuration we should provide an automatic script that could consume an existing configuration with the old options and generate an updated configuration file.

@jaimeperez jaimeperez added this to the v2.0 milestone Apr 20, 2016

@sgomez

This comment has been minimized.

Contributor

sgomez commented Jul 20, 2016

What about the possibility of use YAML config files instead PHP arrays?

@jaimeperez

This comment has been minimized.

Member

jaimeperez commented Jul 20, 2016

¡Hola Sergio!

While I'm not against using YAML, I don't think it's a good idea to move away from PHP. In the end, PHP will probably be easier for someone using SimpleSAMLphp, while YAML could be somehow unknown in general. Besides, the flexibility that we have with PHP configuration files we wouldn't have with YAML.

See for instance the feature request in #420. Today, it's very simple to solve an issue like that with some trivial code in the configuration. If we move to YAML, we would need to add even more configuration options than we already have, and I'm already starting to joke about the software calling it NotSoSimpleSAMLphp...

@pmeulen

This comment has been minimized.

Contributor

pmeulen commented Jul 20, 2016

+1 for continuing to allow the SSP config to be in PHP. I've seen many SSP installations that use the fact that config can be more than static values. If you would like to use YAML for (part of your SSP) configuration. You could actually write this in PHP the configuration file :)

Simply format shifting the configuration format to YAML, JSON, XML or whatever does not automagically improve matters. I doubt syntax is the issue as config though PHP arrays common enough. IMO reducing the mental effort required to figure out what goes where, and what the available options are is key, and the real challenge.

@jaimeperez

This comment has been minimized.

Member

jaimeperez commented Jul 21, 2016

Thanks for the feedback Pieter!

It could indeed be a good idea to prepare brief tutorials on how to use configuration files in JSON or YAML and add them to the wiki, for example.

In any case, I absolutely agree. Take Feide for example. We depend absolutely on the fact that we can change the configuration programmatically. From loading different configuration files depending on whether a server is testing or production, to going through the list of backends configured and generate a list of scopes for the metadata out of it.

What I would really, really appreciate is a discussion and feedback on how to make the configuration more intuitive and rational. I'll start sharing my ideas around this:

  • We should agree on a naming standard and stick to it. I don't mind to useCapitalizedWords or to separate_them_with_underscores. We can even continue separating.words.with.dots, but whatever we decide, we should always use that and avoid mixing formats.
  • Some options should be unified as they have the same meaning, but one or the other should be used depending on other options. The best example of this is the session.phpsession.* set of options. It doesn't make sense to have some configuration options to configure sessions if you are using memcache or SQL backends, and another different set if you are using PHP sessions.
  • Some options should also be renamed. We have terrible examples of configuration options with names expressing concepts completely different to what they are actually used for. baseurlpath is a good example, as nowadays we always tend to put a full URL there, and even recommend it. debug is another good example, as it doesn't really switch debugging on, it just dumps SAML messages to the logs.
  • Related options should be grouped, instead of given common prefixes. For example, all logging-related options should be in an array with the key logging.
  • A naming convention for how to access the configuration options via SimpleSAML_Configuration should be agreed upon. I can imagine something like $config->getString('logging/format', $default_format);, where that would return whatever string is in $config['logging']['format'].
  • A translation file should be provided, not only to be able to update an old configuration file and dump a new one, but also for SimpleSAML_Configuration to be able to look for both the new and the old option transparently for the code asking for it. Coming back to the previous example, SimpleSAML_Configuration should look for the aforementioned array keys. If there's nothing defined in there, it should go to that translation table, look for the old name corresponding to that option, and then search for the old name. This way, we can migrate all the code to use the new configuration options, and it will work at the same time with both old and new configuration files. This would have been helpful to avoid those lines of code repeated all over the place, testing if there is a configuration option with the new name, and then testing for the old name (some times even testing for a third name, which I find excessive in terms of backwards compatibility).
  • As I said previously, an automated tool should be provided to convert old configuration files into the new files.
  • Finally, I think we need proper documentation for the configuration file. The current documentation covers mostly metadata and the authsources.php file, but the config.php file is left apart. I tried some time ago to reorganize the file template and add comments serving as documentation, but I think it's not enough. People tends to ignore those comments or go somewhere else for instructions, so it's better if we provide those instructions ourselves.

What do you think in general? Things you agree or disagree with? More ideas and suggestions?

Again, thanks for the input guys!

@bradjones1

This comment has been minimized.

Contributor

bradjones1 commented Jul 21, 2016

Thanks Jaime for your leadership on this. Excellent discussion. Though, this thread becomes more of a meta-issue regarding code style, config, architectural issues... but that's fine, too.

We should agree on a naming standard and stick to it. I don't mind to useCapitalizedWords or to separate_them_with_underscores. We can even continue separating.words.with.dots, but whatever we decide, we should always use that and avoid mixing formats.

It might just be helpful to adopt someone else's code style and take the "fix while you're at it" mentality. There's PHP-FIG PSR-1, and a more specific PSR-2. There's an example guide at Symfony that could be some inspiration.

Some options should be unified as they have the same meaning, but one or the other should be used depending on other options. The best example of this is the session.phpsession.* set of options. It doesn't make sense to have some configuration options to configure sessions if you are using memcache or SQL backends, and another different set if you are using PHP sessions...Some options should also be renamed...

Sounds like a good task for deprecation and breaking changes in the next major version?

A naming convention for how to access the configuration options via SimpleSAML_Configuration should be agreed upon. I can imagine something like $config->getString('logging/format', $default_format);, where that would return whatever string is in $config['logging']['format'].

Since this project utilizes Composer and can import vendor dependencies, how about outsourcing the problem? There are libraries like hassankhan/config that can provide a front-end, and it itself uses the Symfony YAML parser component, which is itself decopuled from the rest of Symfony.

A translation file should be provided, not only to be able to update an old configuration file and dump a new one, but also for SimpleSAML_Configuration to be able to look for both the new and the old option transparently for the code asking for it...

Seems like this is a lot of heavy lifting for what is already a significant project... why not just revamp the configuration system for 2.x (following semantic versioning principles) and provide a migration document?

Finally, I think we need proper documentation for the configuration file. The current documentation covers mostly metadata and the authsources.php file, but the config.php file is left apart. I tried some time ago to reorganize the file template and add comments serving as documentation, but I think it's not enough. People tends to ignore those comments or go somewhere else for instructions, so it's better if we provide those instructions ourselves.

Moving to a more portable format like JSON or YAML would help a lot; YAML can be commented, JSON not so much. But regardless, that would open up the opportunity for extending configs, providing sensible default template configs, and the like.

In general I think taking a decoupled approach of using configuration management tools already existing in the ecosystem would be a great improvement, and allow us to offload this technical debt.

@jaimeperez

This comment has been minimized.

Member

jaimeperez commented Jul 22, 2016

Hi Brad! Thanks a lot for your comments!

It might just be helpful to adopt someone else's code style and take the "fix while you're at it" mentality. There's PHP-FIG PSR-1, and a more specific PSR-2. There's an example guide at Symfony that could be some inspiration.

Yes, in fact, we are already migrating everything to PSR-2 and asking for it on PRs. However, I was referring to the naming of the configuration options themselves, which is something not covered by the PSR-* standards, as far as I know.

Sounds like a good task for deprecation and breaking changes in the next major version?

Indeed. The idea would be to have the new configuration structure available in 1.15.0 in parallel with the old one, so that people can start migrating if they want. Later on, 2.0 would remove support for the old configuration and provide a tool for those who didn't migrate before.

Since this project utilizes Composer and can import vendor dependencies, how about outsourcing the problem? There are libraries like hassankhan/config that can provide a front-end, and it itself uses the Symfony YAML parser component, which is itself decopuled from the rest of Symfony.

I wasn't aware of that library, thanks for the pointer! I'm absolutely in favour of not reinventing the wheel. However, I have two concerns here:

  1. I would prefer to keep the list of external dependencies to a minimum. Our latest developments are focused on stripping down the code and removing clutter, externalizing things that are not common for everybody, etc. Adding an external library for something as critical as the configuration seems a bit of an overkill, though I would probably just use it if I was starting from scratch.
  2. This library is pretty basic and lacks quite a lot of the functionality that we have today, some of it being absolutely critical. A few examples I see at first sight:
    • There's no type checking or enforcement. The API is very simple and that's great, but at the same time it forces you to do the type checking on your own. For us, being able to ask the configuration for an array and have that array guaranteed is crucial.
    • There's, apparently, no way to express a singleton with it. We would then need to start creating new Config objects all around, and it would be impossible to preload a configuration for other parts of the code to use (which is the only way we can do a lot of the unit testing we are doing today). Even worse, that would have an impact in performance, as we would need to go again and again to the configuration files to load and parse them, unless we modify all our APIs to start passing a configuration object around.
    • I might be missing something, but I don't see a way to load a configuration from an array either. In a way, that's completely logical, since the Config class does not provide anything other than loading a configuration in some format into an array, and offers no type checking at all. For us this is important though, because that's also a handy way to unit test our code.

Of course, we can just use this library inside SimpleSAML_Configuration to be able to load other file formats (as opposed to replacing SimpleSAML_Configuration with it), retaining all the functionality we already have. I'll definitely have a look at that.

Seems like this is a lot of heavy lifting for what is already a significant project... why not just revamp the configuration system for 2.x (following semantic versioning principles) and provide a migration document?

Paraphrasing people up here in Norway: tja (which would be something like yes but no 😄).

It's a significant amount of work, but it's mostly mechanical work. The thing is we always try to keep backwards compatibility as far as we can. At the same time, of course, you need to be able to break stuff. That's why we have 2.0 on the horizon. In between, however, we would like to be able to provide a smooth transition, rather than breaking abruptly all existing configurations without previous notice.

In any case, as I said, it's not that hard. We don't have so many configuration options (even though the list has grown a lot), and providing a list of what's the new name for this old configuration option is just a matter of adding a line to an array every time we refactor an option. Supporting both old and new at the same time is just a matter of an if clause and a call to array_key_exists() in this list we've been building, and it can be done in one single place (SimpleSAML_Configuration). Now, regarding the script to convert from the old format to the new one, it's actually pretty simple:

<?php
include_once($argv[1]);
$new_config = array();
for ($config as $option => $value) {
    if (!in_array($option, SimpleSAML_Configuration::configurationMap)) {
        echo "Unknown option $option\n";
        continue;
    }
    $new_config[SimpleSAML_Configuration::configurationMap[$option]] = $value;
}
echo "$config = ".var_export($new_config, true));

Ok, I admit it, that's a bit oversimplified, but you get my point 😄

So I think most of it is an extensive use of grep, and using an IDE that can help you out with automatic replacements, it's actually pretty simple and fast. The most difficult part, in my opinion, is actually to build that list, meaning to agree on what needs to be changed, and how (names, groupping, naming conventions, etc).

Moving to a more portable format like JSON or YAML would help a lot; YAML can be commented, JSON not so much. But regardless, that would open up the opportunity for extending configs, providing sensible default template configs, and the like.

I'm not entirely sure why we would need portability here. I mean, I don't doubt YAML is a great language for configurations, but I don't see the benefit or what does it provide compared to PHP files as we have today. We have comments in our PHP configuration files. We have of course templates for them, and it's definitely possible to extend the configuration. In fact, there's nothing more flexible than PHP in this case. So, as I was telling Sergio before, I don't really see much benefit in migrating to YAML (though I see the value of allowing YAML configuration files for those who want them). Maybe I am missing something, given that I'm no expert in YAML at all?

In general I think taking a decoupled approach of using configuration management tools already existing in the ecosystem would be a great improvement, and allow us to offload this technical debt.

I would agree if we didn't have anything today and had to write our configuration management code from the ground up, but that's not the case. We already have a pretty flexible configuration class that allows us to do whatever we need and works perfectly fine, and it's been there for much longer than other libraries like the one you are suggesting, so I consider it stable and mature.

Just to be clear: I think there's two different dicussions here: one is on the contents, or how our configuration should look like, more in a semantic plane if you want. The other is on the tools, or how should we manage the configurations and what to use for that. The idea behind this issue is to agree on the former, so that it is easier for users to configure SimpleSAMLphp, though, of course, the discussion on the latter is also welcome and enriching, especially if we can find out a way to satisfy the needs of more people.

I would also love to have input from @thijskh, @DRvanR and @relaxnow if they have time for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment