Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use commands.Config instead of config.Config #1390

Merged
merged 10 commits into from Aug 1, 2016
Merged

Conversation

technoweenie
Copy link
Contributor

I want to live in a world where most of the LFS functions don't use a global config.Config value. The commands package is the only exception, so I created a commands.Config value to use.

This is a small first step towards removing config.Config around the app.

@ttaylorr
Copy link
Contributor

ttaylorr commented Jul 21, 2016

I, too, want to live in that 🌎 .

I wonder whether or not we should export the new commands.Config variable. At the moment, I'm thinking not, since it makes it easy for other packages to depend on that package-public variable, much the same way as config.Config currently is.

Thoughts?

@technoweenie
Copy link
Contributor Author

Probably shouldn't export it, but we can't use config because it clashes with the config package. cfg works. I wish there was a better convention for global/private args like that.

@ttaylorr
Copy link
Contributor

Hm.. I'd be okay with calling it cfg

@technoweenie
Copy link
Contributor Author

I'm stopping here. This replaces most of the config.Config uses outside of the following packages: api, lfs, and transfer. Those packages will need some more refactoring.

@technoweenie
Copy link
Contributor Author

@sinbad: Any thoughts? I cut a release-1.3 branch, with v1.3.0 shipping tomorrow hopefully, so I'd like to get this in for v1.4.0.

@sinbad
Copy link
Contributor

sinbad commented Jul 28, 2016

Makes sense to localize the scope for testing like this, even though it makes the interfaces more noisy. An more specific alternative would be to identify which configuration elements are needed by each method and require those individually - even more noisy but arguably clearer and more 'contract like'.

A middle ground would be to have a single package-level Config struct for each package, enumerating the config elements each package needs and adding helper functions to convert the global config to them. That way it's clear what each package is using, and can also be used to break import cycles; at the expense of being a little more cumbersome.

@technoweenie
Copy link
Contributor Author

One thing that bothers me is that config.Configuration has helpers like BatchTransfer(). I'm thinking about making config.Configuration a struct without those helpers:

// package config
type Configuration struct {
  Git *ConfigValues
  Env *ConfigValues
  // ...
}

Calls like cfg.SetEnv() would instead be cfg.Env.Set(). Maybe other packages could define what they need through their own config wrapping struct then:

// package lfs
type Configuration struct {
  *config.Configuration
}

func (c *Configuration) BatchTransfer() bool {
    return c.Git.Bool("lfs.batch", true)
}

I'm not sure how we could do per-method contracts, but that may be interesting to experiment with.

My goal in this PR is to stop referring to a global config.Configuration in most of the "library" packages first, before moving on to bigger changes like what you suggested.

@ttaylorr
Copy link
Contributor

ttaylorr commented Jul 29, 2016

@technoweenie I love your idea of making *Configuration as simple as .Git.Foo() and .Env.Foo(). This should definitely be something that we get out there.

I'm not sure yet how I feel about package-level configuration. In theory it seems nice, but I feel like inter-package dependencies might make things more difficult than I'm hoping they'll be. As an alternative, I'd love it if we had a better set of types that were able to accept a configuration object and maintain their own set of relevant data by parsing it out from cfg.Env.Get() and the like.

I think the problem at large is that we have to have (either globally in package config, or per-package) a object that knows about the domain of all its callers. Obviously, this is way worse when the *Configuration object lives in package config, but I think things might still be hairy when we have a per-package config.

If we limited the responsibility of *config.Configuration to just be a data-fetching type, I think things could get pretty sweet. Here's what I'm thinking:

package config

type Config struct {
        Env Fetcher
        Git Fetcher
}

type Fetcher interface {
        String(key string) string
        Int(key string) int
        // ...
}

So to fetch data by hand, all you would need is:

func byHand() {
        var cfg config.Config

        fmt.Sprintf("Hello, %s", cfg.Env.Get("world"))
}

But my ideal scenario is that we have a set of well-factored types that represent the bulk of business logic in LFS and know how to fetch config values, as well as calculate results based upon the values stored in the configuration. Take the HTTP proxy stuff, for example:

type HttpProxy struct {
        HttpProxy string `config:"env=GIT_HTTP_PROXY,git=http.proxy"`
        HttpsProxy string `config:"env=GIT_HTTPS_PROXY,git=https.proxy"` 
} 

func (p *HttpProxy) ProxyFor(remote, repo string) *url.URL {
        // Complicated business logic goes _here_, not in the `*config.Configuration`.
}

Now, the tricky part is how do we get data into that type? One option I thought of would be having something like an initializer that works like:

func NewHttpProxy(cfg *config.Config) *HttpProxy {
        http := cfg.Env.Get("GIT_HTTP_PROXY")
        if len(http) == 0 {
                http = cfg.Git.Get("http.proxy")
        }

        https := cfg.Env.Get("GIT_HTTPS_PROXY")
        if len(https) == 0 {
                https = cfg.Git.Get("https.proxy")
        }

        return &HttpProxy{http, https}
}

This works, but is pretty gross even without the env-falls-back-to-git behavior. Something cooler could be an unmarshalling function, provided by the config package that would accept a struct and fill its values tagged with the config based on the data that was in the env and Git configuration at that time. Perhaps:

package config

type Configuration {
        Env Fetcher
        Git Fetcher
}

var (
        // We still have a singleton in the `config` package, but it doesn't matter
        // any more because it's a) not global, and b) doesn't store tons of stuff, 
        // it _only fetches data_.
        cfg Configuration
)

func Unmarshal(v interface{}) error {
        val := reflect.ValueOf(v).Elem()
        typ := reflect.TypeOf(v).Elem().Type()

        for i := 0; i < typ.NumField(); i++ {
                tag := typ.Field(i).Tag
                if cfgTag := tag.Get("config"); len(cfgTag) != 0 {
                        // Fetch the config value based on the data in the tag
                        // and reflect.Value.Set() it into the struct field.
                } 
        }
}

By the end, you'd have a struct that is initialized with all of the data needed from the configuration in order to make complex decisions about the data within.

This got kind of long, but my main ideas are:

  1. Remove as much logic from the config package as possible by making it a data-fetching object only.
  2. Factor out small, composable types that are capable of making decisions about data in the configuration.
  3. Make loading configuration data into the types mentioned in (2) as easy as possible with a config.Unmarshal() error function.

The great thing about something like this is that we can preform this refactor without having to remove all of the code that currently lives in the *config.Configuration type, so we don't have to refactor all of LFS in a single swoop.

Thoughts?

@sinbad
Copy link
Contributor

sinbad commented Jul 29, 2016

@ttaylorr that's kind of what I was thinking of, except that to keep the function signature noise down, each package (or function) could have a config struct which aggregates those small domain-specific config structs (like HttpProxy), identifying which ones it uses, and which are explicitly passed in to functions (like Rick is doing in this PR, but more specific rather than 'all config'). If a function could identify what config it needs simply with a struct that looked something like (say):

type SomeSpecificConfig struct {
    *HttpProxyConfig
    *ConcurrencyConfig
}
func DoSomethingInAPackage(cfg *SomeSpecificConfig, otherArgs...) {
    ...

This has 2 benefits:

  1. passing the config inward rather than each package pulling config from inside means we have better control for testing (which is what this PR does)
  2. it makes it far easier to identify what specific subset of config each package/function is actually using.

Your Unmarshal style approach could potentially work to populate these.

But this is all longer term design thinking, as an initial step of making config pass explicitly into packages this PR is fine.

@ttaylorr
Copy link
Contributor

ttaylorr commented Jul 29, 2016

each package (or function) could have a config struct

This is a neat idea. I think my longer term thinking is that I want to get away from methods that take no receiver and move towards types that have methods that can act on the data embedded in those types. Instead of:

type HttpProxyConfig struct {
        HttpProxy  string
        HttpsProxy string
}

func ProxyForRepo(cfg *HttpProxyConfig, remote, repo string) (*url.URL, error) { ... }

I want to move towards:

type HttpProxy struct {
        HttpProxy  string
        HttpsProxy string
}

func (p *HttpProxy) Proxy(remote, repo string) (*url.URL, error) { ... }

I think having meaningful types that have well-defined methods that can act and make decisions based on their embedded data will yield more organized and uniformed code within LFS.

Your Unmarshal style approach could potentially work to populate these.

To keep the amount of exported functions down within a package, I think implementing the config.Unmarshal(v interface{}) error func would be a good step. That way we get 1 type and N methods on that type per domain group, and behaviors within that group, respectively.

But this is all longer term design thinking, as an initial step of making config pass explicitly into packages this PR is fine.

I agree, and I think we should move forward with merging this PR. My main goal is to get away so much from the idea of "configuration" types and more towards data-fetching types and business-logic types which can fetch and act on data.

@sinbad
Copy link
Contributor

sinbad commented Jul 29, 2016

If we want to go fully go-friendly, it could just be interfaces:

type HttpProxyConfigurator interface {
    Proxy(remote, repo string) (*url.URL, error)
}
type SomeSpecificConfigurator interface {
    HttpProxyConfigurator
    ConcurrencyConfigurator
}
func DoSomethingInAPackage(cfg SomeSpecificConfigurator, otherArgs...) {
    ...
}

Then it doesn't matter how the innards are populated.

@technoweenie
Copy link
Contributor Author

Great feedback. I'm going to merge this, as it cleans up our tests, doesn't affect the client behavior. It's nowhere near where I want things to be, but it's a step in the right direction.

I do want to experiment with some of those proposals in future PRs though.

@ttaylorr ttaylorr merged commit 701cc0c into master Aug 1, 2016
@ttaylorr ttaylorr deleted the commands-config branch August 1, 2016 19:01
ttaylorr added a commit that referenced this pull request Aug 3, 2016
As per my comments in
#1390 (comment), I'd like
the `*config.Config` type to be focused solely on data fetching, not having to
know about the domain of each of its callers. The `Fetcher` type is the first
step in that direction.

`Fetcher` is an interface designed to be implemented in three ways, concrete
types that wrap:
  1) Interaction with the `.gitconfig`.
  2) Calls to `os.Getenv`.
  3) A `map[string]string` for testing.

In this initial implementation, the `Fetcher` is responsible for type
conversion, providing methods like `Fetcher.Bool` and (eventually)
`Fetcher.Int`.

The above is undesirable for a number of reasons:
  1) It forces implementers of the `Fetcher` type to be responsible not only
    for data-fetching, but also data conversion.
  2) At the time of writing, data conversion is uniform throughout the LFS
    codebase. This encourages code smells by way of duplication in that all
    types will have to implement the same sort of type conversions with the
    same sort of code. The alternative is to have static helper methods, or
    weird abstraction trees, but this is similarly undesirable.

In subsequent PRs, I plan to demote the Fetcher type to a function:

```
type FetchFn func(key string) (val string)
```

and introduce an `Environment` type which *has* a `FetchFn`. In doing so, I'll
delegate the responsibility of implementing the `Bool` and `Int` (etc) methods
to the concrete `Enviornment` type, and only implement it once.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants