Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Fixit Config File Requirements #183

Closed
lisroach opened this issue Apr 8, 2021 · 3 comments
Closed

Discussion: Fixit Config File Requirements #183

lisroach opened this issue Apr 8, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@lisroach
Copy link
Contributor

lisroach commented Apr 8, 2021

When scaling Fixit to large monolithic repos we have run into challenges with the config file. In monorepos it is important to be able to do directory level configurations, that way users can tweak rules for just their projects without affecting the entire repository.

I wanted to kick off a discussion around this problem first by getting an agreement on the requirements for fixit’s configs. Here is the problem space I wish to address:

Problem: I want to turn on (or off) a lint rule just for my directory

In Fixit today there is only one top-level config, and only one way to turn off rules for the repo: block_list_rules.
This means we can turn off rules for everyone or no one, with no subdirectory level support.

Problem: I want to customize a rule_config just for my directory

Same as the issue above, since there is only one root-level config file there is no way to customize rules for subdirectories.

Proposals

Inheritance of config files

The problems listed can be mostly addressed by supporting subdirectories with overriding configs. This way a top-level config file can exist that contains the defaults the majority of the repo will utilize, but subdirectories can override these defaults with their configurations.

Based on our experiences with Flake8, I propose these files support inheritance. When there is no config inheritance, users usually make copies of the top-level config to keep most of the defaults and only change the few they need. Unfortunately, these copies are not linked with the top-level config and get out of sync very quickly. Worse still, many users assume inheritance is a property already of the config files and do not copy the top-level configs at all, losing valuable and sometimes necessary customizations.

Enabling inheritance of config files will enable custom configurations per-directory, while keeping in sync with the top-level config and its defaults.

Inheritance opens up a lot of questions, which I have listed in the next section below.

Create an allow_list_rules key

To fully address the first problem I propose we add a new key, allow_list_rules.
Adding an allow_list_rules section to the config will enable us to mark rules as ‘on’, instead of defaulting to everything running. Using this, directories can turn on rules they want without the rule also being turned on everywhere else.

This will also make it less risky to upgrade to a new version of Fixit. New rules can be added to Fixit, but they won’t start firing until we explicitly turn them on. This gives us a chance to test and vet new rules.

block_list_rules should take precedence. This way inherited rules that are on can be overridden and turned off, as well as it enables the root config to turn off dangerous rules for all leaf configs in a single location.

Inheritance Questions

Assuming people are on board with the idea of inheritance, there are several questions that should be discussed.

Should leaf configs override the root, or merge with the root config?

Example:
root/.fixit.config.yaml

block_list_rules:
- BlockListedRule

root/subdir/.fixit.config.yaml

block_list_rules:
- OtherBlockListedRule

Overridden the final file looks like...

block_list_rules:
- OtherBlockListedRule

Merging/appending instead of overriding becomes...

block_list_rules:
- OtherBlockListedRule
- BlockListedRule

I think the answer here is merging should be the default, at least for keys like block_list_rules. This way the root config has the power to en-mass turn off rules, which could be valuable if a rule is found to be dangerous.

Let’s check again with a more complicated example showing a deep merge in the rule_config.

root/.fixit.config.yaml

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
repo_root: .
rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [["*", "allow"]]

root/subdir/.fixit.config.yaml

block_list_rules:
- OtherBlockListedRule
packages:
- myproject.rules
rule_config:
    ImportConstraintsRule:
        mysubdir:
            rules: [
                ["fixit", "allow"],
                ["other_module", "deny"],
                ["*", "deny"]
            ]
            ignore_tests: True
            ignore_types: True
            message: "'{imported}' cannot be imported from within '{current_file}'."

Merged config

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
- OtherBlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
- myproject.rules
repo_root: .
rule_config:
  ImportConstraintsRule:
    fixit:
    - - '*'
      - allow
    mysubdir:
      rules:
      - - fixit
        - allow
      - - other_module
        - deny
      - - '*'
        - deny
      ignore_tests: true
      ignore_types: true
      message: '''{imported}'' cannot be imported from within ''{current_file}''.'

This again looks right to me. We’ve added some custom subdir configurations to the ImportConstraintsRule, and we’ve extended the packages and block_list_rules.

Should leaf configs always deep merge with the root config?

Assuming people agree with my assessment that the above config looks good, does it make sense for it to always deep merge? Let’s look at an example where keys in both the root and the leaf are identical:

root/.fixit.config.yaml

rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [["*", "allow"]]

root/subdir/.fixit.config.yaml

rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [
                ["fixit", "allow"],
                ["other_module", "deny"],
                ["*", "deny"]
            ]
            ignore_tests: True
            ignore_types: True
            message: "'{imported}' cannot be imported from within '{current_file}'."

Merged file

rule_config:
  ImportConstraintsRule:
    fixit:
      rules:
      - - '*'
        - allow
      - - fixit
        - allow
      - - other_module
        - deny
      - - '*'
        - deny
      ignore_tests: true
      ignore_types: true
      message: '''{imported}'' cannot be imported from within ''{current_file}''.'

The rule_config’s rules section has been appended, but this results in an invalid rule: there are two * rules which will raise an exception. Here the deep merge technique failed.

What if we override duplicate keys in the rule_config section, and merge the rest?

This is possible to do. Overriding the rule_config keys and merging everywhere else would result in:

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
- OtherBlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
- myproject.rules
repo_root: .
rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [
                ["fixit", "allow"],
                ["other_module", "deny"],
                ["*", "deny"]
            ]
            ignore_tests: True
            ignore_types: True
            message: "'{imported}' cannot be imported from within '{current_file}'."

Which is a correct config file. This is not prohibitively complicated to implement, we can choose which keys we want to be merged and which we want to be overridden with a tool like jsonmerge.

This strategy works well for the few rules I have tested it out on. My only concern is the variety of rules. I wonder if some rules would be better configured with a deep merge, instead of overriding. I’d appreciate any thoughts around this. It is possible to require each rule author to define whether they want their rule configured via a deep merge or overriding, but that is adding a layer of complexity.

Config resolution order?

How do we want to define the order of config inheritance? There’s several questions here:

  • Should configs be limited to inheriting only from the root config?
  • Should configs be able to inherit from configs outside of their directory tree?
    • If so, how does the user define this (a new key in the config?)
    • How do we avoid the diamond problem and determine order?

I think the most appealing way is to inherit via the directory hierarchy. We can read yaml files starting from the root (where default values would be) to the leaf (where most specific values would be, which should take precedence).
Like so:

root
├──root_config.yaml
└──myproject
    ├──sub_config.yaml
    └──subdir
        └── leaf_config.yaml

Here leaf_config.yaml inherits from sub_config.yaml and root_config.yaml. sub_config.yaml inherits just from root_config.yaml. The user isn’t be required to do any extra configurations to indicate from where they wish to inherit, it happens automatically. I think this is intuitive to users, and keeps the implementation and maintenance simple.

Embedding partial configs?

This idea adds some complexity, but it is worth discussing. We could use interpolation or includes to embed whole configs in parts of another config. This might be interesting for individual rules in the rule_config section.

Instead of defining default rule customizations in the root config, rule authors could generate a config just for their rule, and embed that into the root config. Something like this:

rule_config: !include root/rules/**/*.yaml

That would embed all the configs defined under rules.

This adds the overhead that rule authors would need to determine “default” rule configurations, and would result in the creation of more config files. You also would not be able to see the configuration of all the rules at a glance. But it would keep the root config a lot smaller and more manageable, and make it relatively easy to find and update rule_configs specific to individual rules.

Thanks for reading this far! This is a brain dump of my thoughts, please comment on anything you disagree (or agree) with, or anything I have not thought of.

@lisroach
Copy link
Contributor Author

To get things moving here I'll work on implementing allow_list_rules and see if there are any objections on the PR

@lisroach
Copy link
Contributor Author

While waiting on review of #184, I'll put out my plan for what I think is the right design for inheritable configs (open to suggestions!).

TL;DR on the post above, I propose:

  • We create inheritable configs
  • The configs are inheritable via simple hierarchy
    • leaf configs merge with or override root configs
  • We are selective with how each section of a config is inherited
    • Certain keys are merged the leaf configs, while others are directly overridden

I am also interested in investigating the partial configs as well, but I believe that can be evaluated later.

For implementation, I evaluated a few different libraries. I'll go into detail into those below in case people want to read into it, I sorted them from least viable to most viable. But I'll give another TL;DR here:

I believe using jsonmerge is the best option. It gives us the flexibility to configure how each section of the config should be overridden, and it should be simple to implement.

The steps to enable are basically:

  1. Create a JSON Schema for the existing Config file
  2. Create a directory crawler that uses jsonmerge to merge configs in each directory
  3. Update the JSON Schema with the merging techniques for each key

And that's it. I am interested in anyone's thoughts (especially @jimmylai :)) and happy to change course if there are objections.


Library Evaluations

PyYAML Root Key Merging

PyYAML and ruamel.yaml provide a very basic inheritance functionality that allows usage of defaults defined in one config to be used elsewhere. It does this via key [merging)(https://yaml.org/type/merge.html). This proposal is essentially that there is one root config and it is the only config any file can inherit from. We would define all our defaults there and users would use their configs to add to these.

Documentation is pretty sparse here, ruyamel appears to be better documented and more up to date, but functionality looks relatively the same.

Rudimentary example:
root/config.yaml

root: &default
    block_list_patterns:
    - '@generated'
    - '@nolint'
   
subdir:
    <<: *default
    block_list_patterns:
    - '@test'

becomes

block_list_patterns:
- '@generated'
- '@nolint'
- '@test'

This is not very configurable from a users perspective, and based on the few docs I’ve found I’m not sure if you can use the inheritance across config files at all. I bring it up in case people do want to discuss the option of having only one config that is possible to inherit from, but I do not find the idea compelling.

Includes and Interpolation

This is what I think we can use for embedding partial configs for the rule_config, I also evaluated it for use as the whole thing. Instead of true inheritance, we could use includes or interpolation to share config information. This requires some more work on the users’ part as they need to manually add the correct values to their config, but it provides a good deal of flexibility.

Using includes embeds another config file into the config file that uses it, support comes from a library called pyyaml-include. An example:

root/config.yaml

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule

root/myproject/config.yaml

root_config: !include root/config.yaml
packages:
- myproject.rules

becomes...

root_config:
  block_list_patterns:
  - '@generated'
  - '@nolint'
  block_list_rules:
  - BlockListedRule
packages:
- myproject.rules

Interpolation support exists in a few libraries, including hiyapyco which I will talk about more in the next section. An example:
root/config.yaml

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule

root/myproject/config.yaml

{{ block_list_patterns }}
{{ block_list_rules }}
packages:
- myproject.rules

becomes...

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
packages:
- myproject.rules

I think these ideas are interesting, but don’t fit our entire use-case well enough on their own. Going back to my experience with Flake8, users assume there will be inheritance, and may omit entirely the needed configuration sections from the root config. There also isn’t an easy way to override or add to individual keys.

There might be some potential use for this, though. A concern I have about the single root config is that it will become bloated with a huge rule_config section. We may be able to use one of these tools to define rule_configs in their own files and pull them into configs (or the root config) in a way that will result in a much shorter file.

Merge Configs via Hierarchy

With this technique all config files in the directory are merged together into one using a merge (optionally a deep merge). There are a number of libraries that support this, like himland hiyapyco. hiyapyco is more configurable, so I’ll discuss it here.

A list of paths is passed to the tool and merged with latest path taking precedence over older path. There are two supported configuration options, METHOD_SIMPLE which replaces older config objects, and METHOD_MERGE which performs a deep merge. METHOD_MERGE is closest to what we need, so I have shown examples here:

root/config.yaml

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
repo_root: .
rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [["*", "allow"]]

root/myproject/config.yaml

block_list_rules:
- OtherBlockListedRule
packages:
- myproject.rules
rule_config:
    ImportConstraintsRule:
        mysubdir:
            rules: [
                ["fixit", "allow"],
                ["other_module", "deny"],
                ["*", "deny"]
            ]
            ignore_tests: True
            ignore_types: True
            message: "'{imported}' cannot be imported from within '{current_file}'."

Merged config with METHOD_MERGE...

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
- OtherBlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
- myproject.rules
repo_root: .
rule_config:
  ImportConstraintsRule:
    fixit:
    - - '*'
      - allow
    mysubdir:
      rules:
      - - fixit
        - allow
      - - other_module
        - deny
      - - '*'
        - deny
      ignore_tests: true
      ignore_types: true
      message: '''{imported}'' cannot be imported from within ''{current_file}''.'

This works well, as we were able to merge the common items into one larger config.

But there is an issue if a leaf config attempts to override the root config:

root/myproject/config.yaml

block_list_rules:
- OtherBlockListedRule
packages:
- myproject.rules
rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [
                ["fixit", "allow"],
                ["other_module", "deny"],
                ["*", "deny"]
            ]
            ignore_tests: True
            ignore_types: True
            message: "'{imported}' cannot be imported from within '{current_file}'."

Merged with config METHOD_MERGE...

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
- OtherBlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
- myproject.rules
repo_root: .
rule_config:
  ImportConstraintsRule:
    fixit:
      rules:
      - - '*'
        - allow
      - - fixit
        - allow
      - - other_module
        - deny
      - - '*'
        - deny
      ignore_tests: true
      ignore_types: true
      message: '''{imported}'' cannot be imported from within ''{current_file}''.'

The rule_config rules section has been appended, resulting in an invalid rule: there are two * rules. This will raise an exception, as I discussed in my first post. Because of this, I do not think we can use this technique without breaking some existing rules.

jsonmerge + custom code

This has led me to jsonmerge. Jsonmerge can better handle some of the issues that the hierarchical merge tools can’t do, like merging different keys in different ways. For example, we may want to always append to block_list_patterns but always override rule_config configurations. With jsonmerge we’d provide our own JSON schema which will define the merge strategies we want per key. This can give us a result that looks like this:

root/config.yaml

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
repo_root: .
rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [["*", "allow"]]

root/myproject/config.yaml

block_list_rules:
- OtherBlockListedRule
packages:
- myproject.rules
rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [
                ["fixit", "allow"],
                ["other_module", "deny"],
                ["*", "deny"]
            ]
            ignore_tests: True
            ignore_types: True
            message: "'{imported}' cannot be imported from within '{current_file}'."

merged config...

block_list_patterns:
- '@generated'
- '@nolint'
block_list_rules:
- BlockListedRule
- OtherBlockListedRule
fixture_dir: ./tests/fixtures
formatter:
- black
- '-'
packages:
- fixit.rules
- myproject.rules
repo_root: .
rule_config:
    ImportConstraintsRule:
        fixit:
            rules: [
                ["fixit", "allow"],
                ["other_module", "deny"],
                ["*", "deny"]
            ]
            ignore_tests: True
            ignore_types: True
            message: "'{imported}' cannot be imported from within '{current_file}'."

This is correct, and looks to be easy to implement. Another benefit of using jsonmerge is we have the potential for rules to define their own custom JSON Schemas. This gives rule creators granular control over how their rule is inherited, instead of us deciding one single solution for all rules. This is harder to implement and maintain, plus it is added complexity for rule owners, but may be a nice future feature.

@amyreese
Copy link
Member

amyreese commented Oct 8, 2022

Most of this discussion is being handled with the new configuration planned as part of FP1: #236

@amyreese amyreese closed this as completed Oct 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants