Skip to content
This repository has been archived by the owner on Jan 19, 2018. It is now read-only.

[epic] Rework params to support xpath-like replacing and non-scalar values #70

Open
vpavlin opened this issue May 20, 2015 · 26 comments
Open
Assignees

Comments

@vpavlin
Copy link
Contributor

vpavlin commented May 20, 2015

Problem statement

Currently params work with simple substitution. You need to specify the param name prefixed with $ in the artifact and it's then replaced with the value given either by default, answers file or by the user in case of interactive mode. This is a problem because a developer of nulecule application has to modify (upstream) kube files and replace real values with something like $image. This brings debuging problems right now (you cannot use artifacts shipped in nulecule app directly) and will bring a lot of maintainance problems in the future (you need to mirror all changes in upstream artifacts in your modified files).

Based on the description in previous paragraph it is obvious (almost) all artifacts has to be modified from upstream and thus maintained directly in the nulecule app repo - i.e. it does not make much sense to reference artifacts by remote URL (https://...) and deffer their download to the point of installation.

Also there already were some discussions about supporting non-scalar types, thus part of this proposal is addition of type key to the params definition.

Proposal

JSON Pointer

Based on an idea of @markllama we came to the conclusion we need to be able to replace param values in artifacts not based on the placeholder, but based on the position in the json/yaml object. This will enable developers of nulecule applications to exactly specify what should be replaced without a need to modify the artifacts.

This concept is described in a JSON Pointer (https://tools.ietf.org/html/rfc6901)

For this kubernetes pod definition

{
    "apiVersion": "v1beta3",
    "kind": "Pod",
    "metadata": {
        "labels": {
            "app": "helloapache"
        },
        "name": "helloapache"
    },
    "spec": {
        "containers": [
            {
                "image": "centos/httpd",
                "name": "helloapache",
                "ports": [
                    {
                        "containerPort": 80,
                        "hostPort": 8080,
                        "protocol": "TCP"
                    }
                ]
            }
        ]
    }
}

The param image should look like

params:
  - name: image
    pointer: [/spec/containers/0/image]
    description: The webserver image
    default: centos/httpd

As you can see, pointer is defined as an array which means you can specify more places in an artifact(s) where the param should be used. Also the default is optional and if not specified (i.e. if the key default is omitted) the actual value in artifact can be used. Existing key default set to null means that a user needs to provide the value in answers.

Although using the pointer will be strongly recommended, current substitution of $ prefixed variables will continue to work as there might be some use cases for it.

To preserve the simplicity of answers file, params will be refferenced by name there. On the other hand extension to support JSON Pointer notaition as keys in answers file should be simple (and would potentially enable user to replace not only the given params, but, for debugging purposes, any value in any artifact)

Problems

  • Same JSON Pointer path might exist in multiple artifacts although user might want to change the value only in a single specific artifact

    • This might trigger a discussion about adding artifacts specific params again.
  • Exact location specified by JSON Pointer might lead to failing runs in case of unexpected changes in artifact.
    f.e. value of environment variable PASS specified in kubernetes pod definition:

    ...
        "containers": [
          {
            "name": "mariadb",
            "image": "$image",
            "env": [
              {"name": "USER", "value": "$db_user"},
              {"name": "PASS", "value": "$db_pass"},
              {"name": "NAME", "value": "$db_name"}
            ],
            "cpu": 1,
    ...
    

    would have path .../containers/env/1/value. If upstream decides to add environment variable FOO above the PASS variable, the path to PASS changes to .../containers/env/2/value (note the change 1 -> 2).

    • This is probably a smaller issue then current state of mirroring all artifacts as a part of the app

Allow URL and git description for artifacts

Although specification does not block anybody from specifying remote location, current implementation and the way how we substitute params basically supports only artifacts stored locally. As a part of this proposal we should document ability of the spec to accept remote locations for artifacts - i.e.

artifacts:
  - kubernetes:
     - https://raw.githubusercontent.com/projectatomic/nulecule/master/examples/skydns-atomicapp/artifacts/kubernetes/skydns-replicationcontroller.yaml
     - ftp://example.com/nulecule/my-app/artifact.json
     - file:artifacts/artifact2.json
     - ...

I'd also like to propose a special type of artifact which would be specified by object in a format similar to:

artifacts:
  - kubernetes:
     - git_url: https://github.com/projectatomic/nulecule/
       checkout: fda3fb8c48a7f8e6b73f70f55035c21fa22d7941
       path: examples/skydns-atomicapp/

Where

  • git_url: is mandatory and represents a string which can be used in git clone command
  • checkout: is optional and represents a string which can be used in git checkout command
  • path: is optional and represent a string which is
    1. a directory and can be used in cd command (all files found in that directory are then expected to be artifacts)
    2. a file and can be read

Artifacts can be then specified by a combination by URL like strings and above specified objects describing contents of git repository.

Problems

The only problem I can think of regarding this change is potentially conflicting file names. This is not really a spec problem, but mostly an implementation problem thus at least some naming suggestions might be provided to developers.

Add type to the params definition

Currently we use INI format to store answers.conf(.sample). Although this is nice from user experience point of view, it blocks us from using non-scalar values for params as is.

Let's assume we have an application which expects a list of IP addresses as a parameter. With current params structure and INI file it's not possible to parse and pass such value to the Nulecule app without assumptions being done about the param - i.e. mathing the value to some regular expressions or something similar. In case of adding a type key to params definition we would be able to recognize the value type easily and by parsing it also validate it against the given type.

For yaml and json types are implicit, for INI we would need to introduce some conventions - f.e.:

nulecule:
params:
  - name: public_ips
    pointer: /spec/containers/0/public_ip
    type: array
    default:
      - 10.0.0.3
      - 10.0.0.4

-----

answers.conf:
[my-app]
public_ips = '["192.168.1.2", "192.168.1.3"]'

The type key might be optional and default to current behaviour - i.e. type string.

Problems

This would obviously bring more complexity to both implementation of the spec and creation of nulecule apps.

@aweiteka
Copy link
Contributor

Related #64

@goern
Copy link
Contributor

goern commented May 20, 2015

do you guys think we should put that in the Summit release? I think so...

@vpavlin
Copy link
Contributor Author

vpavlin commented May 20, 2015

I think so too

@markllama
Copy link

If possible. I sure would like it as soon as is reasonable.

On Wed, May 20, 2015 at 2:28 PM, Vaclav Pavlin notifications@github.com
wrote:

I think so too


Reply to this email directly or view it on GitHub
#70 (comment)
.


Mark Lamourine markllama@gmail.com
Dad, Hubbie, Software Developer, System Administrator, Road Cyclist

@tradej
Copy link
Contributor

tradej commented May 21, 2015

Just wondering - why does the artifacts object contain an array instead of another object? shouldn't it be like this?

artifacts:
  kubernetes:
    - https://raw.githubusercontent.com/projectatomic/nulecule/master/examples/skydns-atomicapp/artifacts/kubernetes/skydns-replicationcontroller.yaml
    - ...

@vpavlin
Copy link
Contributor Author

vpavlin commented May 21, 2015

I am wondering the very same thing atm as I was looking at rework of atomicapp to comply with 0.0.2 - we changed components and params to be a list to get rid of magic keys..shouldn't we do the same for providers? Although that would have to be

artifacts:
  - provider: kubernetes
    content:
      - file:...

@aweiteka @goern Are you ok with using provider names as keys here? Or should we also change it to a list?

@markllama
Copy link

using names as keys breaks schema validation, doesn't it?

array of things with one attribute "name: foo" or .... .not sure

On Thu, May 21, 2015 at 12:49 PM, Vaclav Pavlin notifications@github.com
wrote:

I am wondering the very same thing atm as I was looking at rework of
atomicapp to comply with 0.0.2 - we changed components and params to be a
list to get rid of magic keys..shouldn't we do the same for providers?
Although that would have to be

artifacts:

  • provider: kubernetes
    content:
    • file:...

@aweiteka https://github.com/aweiteka @goern https://github.com/goern
Are you ok with using provider names as keys here? Or should we also change
it to a list?


Reply to this email directly or view it on GitHub
#70 (comment)
.


Mark Lamourine markllama@gmail.com
Dad, Hubbie, Software Developer, System Administrator, Road Cyclist

@goern
Copy link
Contributor

goern commented May 21, 2015

ja, there shall be no magic key as of #35

@goern goern added this to the Red Hat Summit 2015 milestone May 21, 2015
@imcleod
Copy link

imcleod commented Jul 1, 2015

Python implementation of json pointers - https://pypi.python.org/pypi/jsonpointer

@veillard
Copy link
Contributor

veillard commented Jul 9, 2015

since rfc6901 is a fragment identifier specification, I would really suggest for standard and potential future uses to use the syntax # for specifying the path:
pointer: #/spec/containers/0/public_ip

This means in URI terms, take the current document and extract the following fragment identifier.
Potentially we could use this to expand by fetching from other resources via URI-References

@bexelbie
Copy link

bexelbie commented Jul 9, 2015

This has the potential to break multi-provider support. Perhaps we should consider a two-level process. The Nulecule file remains a key/value structure like today. Each provider needs to provide a transformation matrix that translates the keys from the Nulecule file to their pointers (or other method) for the specific provider. It creates more work for application providers, but retains easy use by end-users with single and multiple providers.

This means we may need to define built-in transformers (probably xslt or something similar for json, and $XX substitution for docker) and the ability to load an external transformer from a container for more esoteric types.

@aweiteka
Copy link
Contributor

aweiteka commented Jul 9, 2015

@bexelbie I think you nailed it.

  • We can use provider artifacts unaltered
  • It addresses the provider-specific params issue Support provider-specific params #109
  • It allows for huge flexibility in using params while keeping the developer cost low
  • It provides a way to map unstructure provider files such as docker using the $<var> substitution method
  • Parameter re-use

Fleshing this out from our conversation...

Given Nulecule file

...
graph:
  - name: my-app
    params:
      - name: foo
        description: My param
        default: bar
      - name: bee
        description: Your param
        default: buzz
    artifacts:
      kubernetes:
        - file://artifacts/kubernetes/template.json
      docker:
        - file://artifacts/docker/template.run
...

artifacts/kubernetes/params.yaml maps the Nulecule param names to the provider file path.

foo: #/spec/template/metadata/name
bee: #/spec/some/other/json/pointer/path

artifacts/docker/params.yaml maps the Nulecule param names to the provider file using $<var> substitution. The bee param is not used for this provider since there is no mapping.

foo: $value

File artifacts/docker/template.run might look like this:

docker run -e FOO=$value ...

@aweiteka
Copy link
Contributor

Above I proposed putting the provider param translation in another file. This is problematic since we would probably need to agree on a naming convention and it abstracts the parameters making it difficult to develop since the params are referenced in two different files.

A more compact way would be to refactor the artifacts object so it has more structure than just a list of files. This would allow us to embed the params mapping into the artifacts object. Something like this:

...
    artifacts:
      kubernetes:
        sources:
          - file://artifacts/kubernetes/template.json
        params:
          - foo: #/spec/template/metadata/name
          - bee: #/spec/some/other/json/pointer/path
      docker:
        sources:
          - file://artifacts/docker/template.run
        params:
          - foo: $value
...

cc @veillard

@veillard
Copy link
Contributor

Seeing
sources:
- file://artifacts/kubernetes/template.json
params:
- foo: #/spec/template/metadata/name

urges me to simplify it to
params:
- foo: file://artifacts/kubernetes/template.json#/spec/template/metadata/name

And you have something which is a proper URI

Daniel

@aweiteka
Copy link
Contributor

urges me to simplify it to
params:

  • foo: file://artifacts/kubernetes/template.json#/spec/template/metadata/name

That does seem appealing and I think we should support that so params can be targeted to an exact thing using fully qualified URI. The trouble I see with this is...

  • sources is a list of artifacts to deploy.
  • one may have many params per source file potentially resulting in a lot of redundancy in specifying the file.
  • some sources are in git remotes. See artifacts object. We would need to think through how these are referenced.

One of the use cases I'm considering is there may be multiple places to reference a single param value. This is not possible with the list of key:value pairs suggested here. A way to approach this could be to us a list of pointers under each param name key. For example...

...
    artifacts:
      kubernetes:
        sources:
          - file://artifacts/kubernetes/template.json
        params:
          db_pass:
            - #/spec/template/metadata/database/passwd
            - #/spec/template/metadata/frontend/database_passwd
          foo:
            - #/spec/some/other/json/pointer/path
      docker:
        sources:
          - file://artifacts/docker/template.run
        params:
          db_pass:
            - $value
...

@veillard
Copy link
Contributor

One nasty thing is that referencing local files is always a bit of a mess, file:// scheme works fine for absolute patch but in practice it's rarely where the files sits. The RFC for file:// is (was) a big mess for anything not an absolute path, so you end up forgetting the scheme and doing things like
../config/templates.json which is an URI references to the base which is the URI of the current file. and then you hit Windows with C:\MyStuff\myapp.json for the base and all bets are off :-
So that 'Let's use URIs' is great up to a point for referencing files. Where it would shine is very dynamic
environment where app templates are just kept online and parameters are fetched from databases
or from the config parameter exported by existing applications or run-times.

Since we need to keep $ based substitution anyway, we could just make source optional for pointers.
That's the equivalent of a base (html or XML) and a commonly accepted concept in URI world :-)

@vpavlin
Copy link
Contributor Author

vpavlin commented Jul 17, 2015

@veillard I am not sure f I understand what you are proposing/suggesting in your comment - file:// refs are bad, but not using them is worse...so what is the solution?:-)

@veillard
Copy link
Contributor

no solution, just a warning. for example file:/../foo vs. file:///../foo will get probably more confusion
and errors than simply accepting ../foo for relative file paths

@veillard
Copy link
Contributor

Hum, params should really be associated to a given source. If you start giving multiple sources in the
current suggested syntax, how do we know on which source the param should be evaluated ?
single source and associated params need to be bound together. Either we do:

params:
  foo: artifacts/kubernetes/template.json#/spec/template/metadata/foo
  bar: artifacts/kubernetes/template.json#/spec/template/metadata/bar

and then we need to check the resource is in the provided list of sources,
or when we define the sources we do a liste like

artifacts:
   kubernetes:
      sources:
         - resource: artifacts/kubernetes/template.json
           params:
              - foo:
                   - /spec/template/metadata/foo
              - bar:
                   - /spec/template/metadata/bar
        - resource: artifacts/kubernetes/extra.json

@vpavlin
Copy link
Contributor Author

vpavlin commented Sep 24, 2015

@veillard You are right, @aweiteka's proposal will not work in case pointers are not same for all artifacts in a provider..so it seems we will have to go your way - params mapping will need to be specified for each artifact (i.e. source)

@vpavlin
Copy link
Contributor Author

vpavlin commented Sep 24, 2015

Maybe don't use array for params as the keys are not really "magic keys" we want to avoid but has to exist in component.params section

artifacts:
   kubernetes:
       - resource: artifacts/kubernetes/template.json
         params:
            foo:
               - /spec/template/metadata/foo
            bar:
               - /spec/template/metadata/bar
      - resource: artifacts/kubernetes/extra.json

@veillard
Copy link
Contributor

Yeah that works

artifacts/kubernetes/template.json#/spec/template/metadata/foo

is more web like but it can get repetitive and we already need to provide the URI for the
resource so ACK on vavlin suggested syntax

@vpavlin vpavlin self-assigned this Sep 29, 2015
vpavlin added a commit to vpavlin/atomicapp that referenced this issue Oct 23, 2015
New Nulecule_Base methods for getting and templating artifacts were added.
These are backwards compatible with pre-xpath implementation.

New format for artifacts is described in:

projectatomic/nulecule#70 (comment)

Fixes projectatomic#192
@aweiteka
Copy link
Contributor

We need to update the spec.

@cdrage
Copy link
Member

cdrage commented Nov 19, 2015

To be honest, I haven't had much experience in regards to spec writing.

@aweiteka @bexelbie Where in particular should I be adding the changes of xpathing into the spec?

@aweiteka
Copy link
Contributor

@cdrage
Copy link
Member

cdrage commented Nov 19, 2015

Awesome, I'll send in a PR :) thanks @aweiteka !

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants