Feature: pipeline configuration from source control #1133

Closed
tomzo opened this issue May 16, 2015 · 132 comments

Comments

@tomzo
Member

tomzo commented May 16, 2015

This feature made it! The documentation in GoCD is here.

I have collected the most notable information from the comments below so that no one has to read all of that again to get an idea of what has changed in the system and what is expected from it.

Overview of the feature

  • a user can define pipelines and environments in many source code repositories - configuration repositories
  • a pipeline belonging to a specific group can be specified in a configuration repository.
  • configuration in repos may have references to the main xml or to other repos. E.g. a pipeline in repo A depends on another pipeline in the main xml.
  • a user can provide a plugin to interpret the contents of a single checkout of a config repository in any custom way. E.g. pipelines defined in yaml
  • if a configuration repo is the same (by global fingerprint) as one of a pipeline's scm materials then they are treated as one.
  • scm-config consistency - a pipeline running on source code at commit C1 will use configuration at commit C1 as long as they are in the same repository.
  • environments from many config repo sources get merged together with environments from the main xml, so an environment definition can be all in one repository or spread across many configuration sources.
  • pipelines from many config repo sources get summed together with pipelines from the main xml.

Example configuration repositories

I have prepared example config repositories. In order of complexity:

  1. https://github.com/tomzo/gocd-main-config contains main cruise XML configuration. The one stored in /etc/go/cruise-config.xml
  2. https://github.com/tomzo/gocd-indep-config-part - XML configuration part with no external references.
  3. https://github.com/tomzo/gocd-refmain-config-part - XML configuration part that refers to pipelines from main.
  4. https://github.com/tomzo/gocd-refpart-config-part - XML configuration part with references to other configuration part repository
  5. https://github.com/tomzo/gocd-json-config-example - JSON configuration part

In the main config repository https://github.com/tomzo/gocd-main-config there is a config-repos branch with config-repo sections that import elements from the other repositories.

Domain and concepts

Configuration repository

A configuration repository is a source control repository of any kind that holds part of the GoCD configuration.
So far we referred to this as config-repo or partial. However, 'partial' should really be reserved for the configuration object, while the repository is the remote code, yet to be fetched and parsed.

ConfigOrigin

Tells where some part of the configuration comes from. It had to be added because some services now need this extra information to operate. There are 3 types of configuration origins:

  • the old XML file
  • one of configuration repositories
  • web UI (added in last branch 1133-ui)

Base and Merged configuration

There are 2 scopes of configuration:

  • base - all configuration in cruise-config.xml, also stored and committed in internal configuration git repository
  • merged - cruise-config.xml + all remote, parsed elements.

These are important at system level because we consider validity twice, first at base scope, then at merged.

Behavior and assumptions

  • cruise-config.xml is always valid on its own when no parts are appended yet. Just like it was so far - meaning this feature does not break current xml-config stability.
  • when the server (re)starts it loads only the main configuration from xml. So for a while remote pipelines are not in the Go server. Then we wait for updates of the materials defined as config-repos and configuration merging kicks in; each partial gets merged into the current config.
  • there is never a situation where an invalid merged config is considered the current config
  • Elements defined in a configuration repository should be rendered in the UI but editing them via the UI should be disabled.
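The two validation scopes and the "never adopt an invalid merged config" rule can be sketched roughly as below. This is only an illustration under simplifying assumptions: a configuration is reduced to a list of pipeline names, the validator is supplied by the caller, and `MergedConfigHolder` with its methods is a hypothetical name, not a GoCD class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; not a GoCD class. A config is modeled as a list of pipeline names. */
final class MergedConfigHolder {
    private final Predicate<List<String>> validator; // true when a config is valid
    private List<String> current;                    // last config that passed validation

    MergedConfigHolder(List<String> base, Predicate<List<String>> validator) {
        // Base scope: the main config must be valid on its own before any partial is considered.
        if (!validator.test(base)) {
            throw new IllegalArgumentException("base configuration is invalid");
        }
        this.validator = validator;
        this.current = new ArrayList<>(base);
    }

    /** Merged scope: adopts base + partial only if the result validates. */
    boolean tryMerge(List<String> base, List<String> partial) {
        List<String> candidate = new ArrayList<>(base);
        candidate.addAll(partial);
        if (!validator.test(candidate)) {
            return false; // the previous config stays current
        }
        current = candidate;
        return true;
    }

    List<String> current() {
        return new ArrayList<>(current);
    }
}
```

With a validator that rejects duplicate names, a partial that redefines an existing pipeline is refused and `current()` keeps returning the last good configuration.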

Significant cases

When pipeline is defined in configuration repository, there are always 2 cases which actually define how Go server should behave.

When configuration repository that defines the pipeline is the same as one of materials

In automated builds we expect that when pipeline is triggered with material at revision C1, then configuration of the pipeline will be from the same commit - C1.
There is a small (unavoidable) inconsistency here - when a few quick commits (C1, C2, C3) that change the pipeline configuration are made, Go may pick them up before already running builds finish (e.g. the configuration has been updated to C2 while stages on C1 are still running). It may lead to failing a build that would have passed if the commits had been slower. However, IMO this is acceptable after all, because quick commits are usually made when somebody wants to fix the previous configuration. There is no way to avoid it because only one pipeline configuration can exist at a time.

In manually triggered builds Go always fetches materials first, which may change the configuration of pipeline that we just triggered.

  • when changes fetched changed the pipeline configuration then it just runs on new configuration
  • when changes fetched removed current pipeline then build is canceled.
  • when changes fetched made current merged configuration invalid, then it will run on old configuration and display a warning.
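The three manual-trigger rules above amount to a small decision table. A minimal sketch, with hypothetical enum and method names that do not come from the GoCD code base:

```java
/** Hypothetical sketch of the manual-trigger rules; names are illustrative only. */
final class ManualTriggerRules {
    /** What the material fetch did to the configuration of the triggered pipeline. */
    enum FetchOutcome { PIPELINE_CONFIG_CHANGED, PIPELINE_REMOVED, MERGED_CONFIG_INVALID }

    enum Action { RUN_ON_NEW_CONFIG, CANCEL_BUILD, RUN_ON_OLD_CONFIG_WITH_WARNING }

    static Action decide(FetchOutcome outcome) {
        switch (outcome) {
            case PIPELINE_CONFIG_CHANGED:
                return Action.RUN_ON_NEW_CONFIG;
            case PIPELINE_REMOVED:
                return Action.CANCEL_BUILD;
            case MERGED_CONFIG_INVALID:
                return Action.RUN_ON_OLD_CONFIG_WITH_WARNING;
            default:
                throw new IllegalArgumentException("unknown outcome: " + outcome);
        }
    }
}
```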

In timer triggered builds Go also fetches materials first, which may change the configuration of pipeline that is being triggered.

  • when changes fetched changed the pipeline configuration then it just runs on new configuration
  • when changes fetched removed current pipeline then build is canceled.
  • when changes fetched made current merged configuration invalid, then what? (I can't see what option is sensible at all, each has major drawbacks).

When configuration repository that defines the pipeline is not one of materials

This case is much less complex. Go is always polling for changes in configuration repositories and tries to merge them to current configuration.
The rules are the same as if the incoming changes were done from UI.

Failures

Hung material

What happens when one material polling gets hung:

  • when config repo and pipeline material are the same - the latest partial is used. Pipelines that use that material do not get auto-scheduled anyway. No harm. A manual trigger can still be issued.
  • when config repo and pipeline material are different - the latest partial is then old. If the pipeline were scheduled, it would use configuration from an old commit in the config repo with new commits from the material repos.

Failed parsing

When the plugin fails, the configuration has an invalid format, or migration fails in the configuration repo checkout, then the material update completes but the config partial is old.

  • when config repo and pipeline material are the same - if the pipeline were scheduled, it would use old configuration with a new commit, violating scm-config consistency. It is not allowed to schedule until the partial is fixed. (Actually this is implemented by canceling the build.)
  • when config repo and pipeline material are different - same as in the hung case. (The latest partial is then old. If the pipeline were scheduled, it would use configuration from an old commit in the config repo with a new commit from the scm repo.)

Handling merges and conflicts

How to handle merging configuration parts and main configuration?

  1. Merges are done at object-level. (Meaning first all XML and all repositories are parsed to create BasicCruiseConfig and PartialConfig, then an aggregate object is created - BasicCruiseConfig with merge strategy)
  2. According to rules written below

Environments

Pipelines in environment

Most liberal approach possible:

  • if any new pipeline name appears then consider it a member of the environment.
  • if a pipeline name repeats among many configuration parts then just ignore the repetition.

Agents in environment

Most liberal approach possible:

  • if any new agent uuid appears then consider it a member of the environment.
  • if an agent uuid repeats among many configuration parts then just ignore the repetition.

Environment variables in environment

  • if any new 'variable1=value1' appears then consider it a member of the environment.
  • if 'variable1=value1' repeats then just ignore it
  • if the first part has 'variable1=firstvalue' and the second part has 'variable1=othervalue' then it is a conflict and the merged config is invalid.

There could be optional overrides but we can consider it future work.
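The environment merge rules above can be illustrated roughly as follows. This is a sketch under stated assumptions: each configuration part contributes a set of member names (pipelines or agent uuids) and a map of non-null variable values, and `EnvironmentMerge` is a hypothetical helper, not the actual merge strategy class.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the environment merge rules; not GoCD code. */
final class EnvironmentMerge {
    /** Pipelines or agent uuids: union of all parts, repetitions are ignored. */
    static Set<String> mergeMembers(List<Set<String>> parts) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> part : parts) {
            merged.addAll(part);
        }
        return merged;
    }

    /** Variables: identical repeats are ignored, differing values are a conflict. */
    static Map<String, String> mergeVariables(List<Map<String, String>> parts) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> part : parts) {
            for (Map.Entry<String, String> variable : part.entrySet()) {
                String previous = merged.putIfAbsent(variable.getKey(), variable.getValue());
                if (previous != null && !previous.equals(variable.getValue())) {
                    // Same name with a different value: the merged config is invalid.
                    throw new IllegalStateException("conflicting values for " + variable.getKey());
                }
            }
        }
        return merged;
    }
}
```

Throwing on conflict mirrors the rule that a conflicting variable makes the whole merged configuration invalid rather than letting one part silently win.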

Pipelines

  • Final pipeline groups get created as a sum of pipelines in groups in partial configurations
  • if there are 2 pipelines with the same (case insensitive) name then it is a conflict, configuration is invalid.

Authorization can be only in main xml so it cannot conflict when merging.
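A rough sketch of the pipeline rules: groups are summed across parts and a repeated pipeline name, compared case-insensitively, invalidates the merge. The representation (group name mapped to a list of pipeline names) and the class name `PipelineGroupMerge` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the pipeline merge rules; not GoCD code. */
final class PipelineGroupMerge {
    /** Each part maps a group name to the pipeline names it defines in that group. */
    static Map<String, List<String>> merge(List<Map<String, List<String>>> parts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        Set<String> seenNames = new HashSet<>(); // lower-cased names seen so far
        for (Map<String, List<String>> part : parts) {
            for (Map.Entry<String, List<String>> group : part.entrySet()) {
                for (String pipeline : group.getValue()) {
                    if (!seenNames.add(pipeline.toLowerCase(Locale.ROOT))) {
                        throw new IllegalStateException("duplicate pipeline name: " + pipeline);
                    }
                    groups.computeIfAbsent(group.getKey(), key -> new ArrayList<>()).add(pipeline);
                }
            }
        }
        return groups;
    }
}
```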

System

Some notes about changes in how Go services work and what is happening when configuration repositories are present.

Services

Here is a summary of new services layout:

Below GoConfigService

  • renamed (with refactoring) GoConfigDataSource to GoFileConfigDataSource
  • moved implementation of CachedGoConfig to CachedFileGoConfig
  • added GoRepoConfigDataSource - holds recent configuration repository parse result (PartialConfig or exception). It is called from top with clean checkout prepared already.
  • added GoPartialConfig which holds latest set of successfully parsed partial configurations.
  • added MergedGoConfig where CachedGoConfig used to be - there were many references to old class CachedGoConfig. Now they all reference MergedGoConfig instead. MergedGoConfig understands multiple configuration sources (parts and main).
  • added CachedGoConfig interface. Implemented by MergedGoConfig and CachedFileGoConfig. Public methods look like in the old CachedGoConfig class. It is used only to test against. The best explanation is in commit message tomzo@80706b3
  • GoConfigFileDao is renamed to GoConfigDao. Almost no changes here.
  • added GoConfigWatchList - keeps track of list config-repos that should be polled and parsed. Fires events when list has changed.
  • added GoConfigPluginService - provides a config plugin implementation by name. This service is still TODO and currently always returns default gocd-xml plugin.

The best analogy to get the whole point here is that MergedGoConfig has replaced the old CachedGoConfig. It used to be that CachedGoConfig had 2 instances of configuration in memory (for edit and current config). Now MergedGoConfig has these two. The main difference is that MergedGoConfig may return the merged configuration as the current config or for edit. If there are no extra configuration parts then it returns the main configuration.

Above GoConfigService

This is implemented mostly how we discussed here

New material update queue

  • Added new queue - config-material-update-required - Materials which are configuration repositories are always requested on that queue.
  • All other materials are on the old queue material-update-required
  • MaterialUpdateService understands both these queues and schedules update accordingly.

Unloading queues

  • Previously 10 MaterialUpdateListeners were unloading material-update-required, then talking to MaterialDatabaseUpdater and posting MaterialUpdateCompleted messages to material-update-completed. Now using the same classes there are additional 2 MaterialUpdateListeners unloading from config-material-update-required and posting to config-material-update-completed.
  • MaterialUpdateService does not listen on config-material-update-completed topic.

ConfigMaterialUpdater - new component

Added new component - ConfigMaterialUpdater which listens on config-material-update-completed topic. So when MDU is done then ConfigMaterialUpdater gets its chance to work with material being updated:

  • It uses MaterialRepository to check if there were any changes
  • It uses existing pollers code to checkout material to directory
  • It calls GoRepoConfigDataSource (where parsing happens) and when done
  • it posts to material-update-completed which is picked up by MaterialUpdateService using standard procedure as if this was an old-school material. This removes material from inProgress status.
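The message flow described above can be pictured with a tiny in-memory model. Everything here is a simplification: the `Topic` class, the string-typed materials and the field names are hypothetical, delivery is synchronous, and the database update (MDU) step that precedes the first topic is collapsed into `requestUpdate`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical in-memory model of the config material flow; not GoCD code. */
final class ConfigMaterialFlow {
    /** Minimal topic: delivers each message to every subscriber synchronously. */
    static final class Topic {
        private final List<Consumer<String>> subscribers = new ArrayList<>();
        void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }
        void post(String material) {
            for (Consumer<String> subscriber : subscribers) {
                subscriber.accept(material);
            }
        }
    }

    final Topic configMaterialUpdateCompleted = new Topic();
    final Topic materialUpdateCompleted = new Topic();
    final List<String> parsed = new ArrayList<>();   // materials handed to the parsing step
    final Set<String> inProgress = new HashSet<>();  // materials with an update in flight

    ConfigMaterialFlow() {
        // Stand-in for ConfigMaterialUpdater: parse first, then forward to the regular topic.
        configMaterialUpdateCompleted.subscribe(material -> {
            parsed.add(material); // stands in for checkout + GoRepoConfigDataSource parsing
            materialUpdateCompleted.post(material);
        });
        // Stand-in for MaterialUpdateService: clears the inProgress status.
        materialUpdateCompleted.subscribe(material -> inProgress.remove(material));
    }

    void requestUpdate(String material) {
        inProgress.add(material);
        configMaterialUpdateCompleted.post(material);
    }
}
```

The point of the extra hop is that a config repository only leaves the inProgress state after its contents have been through the parsing step.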

Final service notes

Reuse pollers directories

The checkouts (in pipelines/flyweight) are NOT done/updated by standard material pollers when doing update on db (MDU).

But now there is new type of poller that creates full checkout on each update. These directories are now read and parsed by configrepo plugins.

Handling edits

Merged cruise config is returned for edits. When some service is editing the config it does not know if the config is merged or not. It does not have to know.

Adding

When a method to add a pipeline or environment is called, it reaches the merged cruise config at some point. The merged config is then aware that we meant to add to the main part and changes the main config instance (inside the merged cruise config instance).

Removing

Removing is like adding. We can locate where to remove from. If a user tries to remove a remote element then it fails. Usually it would fail in the cruise config code.

Modifications

Modifications get complex because there are many ways in which they are introduced. This is where there is real benefit from returning merged config instance. Changes are made on the config instance in full merged context so that when anything invalid is attempted then it will throw. E.g. when trying to change name of pipeline group defined remotely.

Saving changes

Each config edit ends with attempt to save some config for edit instance (or deep clone of it, or clone of a clone, etc.). To deal with that - magical writer is aware of possibility that merged config might be passed to be serialized. If so then it takes out only locally defined configuration elements. Actual extraction of local elements is implemented in config-api and it is very easy because we keep and maintain the main configuration instance inside merged config anyway.
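To illustrate what "takes out only locally defined configuration elements" means, here is a minimal sketch. Note the swap: it filters by an origin tag, whereas the description above says the real code keeps the main configuration instance inside the merged config. `LocalElementsOnly`, `PipelineEntry` and the two-value `Origin` enum are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; filters by origin instead of keeping the main config instance. */
final class LocalElementsOnly {
    enum Origin { MAIN_XML, CONFIG_REPO }

    static final class PipelineEntry {
        final String name;
        final Origin origin;

        PipelineEntry(String name, Origin origin) {
            this.name = name;
            this.origin = origin;
        }
    }

    /** Returns the names that belong in cruise-config.xml, dropping remotely defined ones. */
    static List<String> localPipelineNames(List<PipelineEntry> mergedConfig) {
        List<String> local = new ArrayList<>();
        for (PipelineEntry entry : mergedConfig) {
            if (entry.origin == Origin.MAIN_XML) {
                local.add(entry.name);
            }
        }
        return local;
    }
}
```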

Pull requests

These are either merged or planned pull requests to make all above work:

Original post from May 2015

Motivation

Being a big fan of keeping all project-related code in its source code repository I would really like to be able to declare pipeline configuration in the source code of each individual project instead of the global cruise-config.xml.
Many people will agree that each project's code should know how to build and test itself. Following this concept it should also know how to CI-itself.

Problem

Currently when all go configuration is in global configuration file on server we basically end up with 2 sources of projects configuration - one being git repository, the other a file on go server. There are lots of scenarios when new changes in git repo will cause the build to break because they expected different pipeline configuration. Or rather pipeline configuration expected the older git repo contents.

Concept

In order to avoid such conflicts probably the <pipeline> section should never be in the global cruise-config.xml, instead go-server should configure pipelines after polling from source repositories.

Final notes

  • Is anyone interested in such a feature or am I crazy? Please provide some feedback on how you would like to see this.
  • How do you (or your organization) handle the problem described above?
  • I am not a gocd developer and I am unfamiliar with its source code or development process. But I learn fast and I am very determined to get this done.
  • I would like to kindly ask the core developers of gocd to point me in the right direction on getting this implemented. What components will need to be updated? How invasive would it be? Can configuration loading and applying be easily replaced with the general schema I described above?

@matt-richardson
Contributor

+1

One thing to consider is that you'd have to know the xml schema - you wouldn't be able to use the UI to edit it... Unless there was some way of sync'ing the config for a pipeline back to source control.

@zabil
Contributor

zabil commented May 18, 2015

There's another conversation on #838 which talks about this.
Here are a few to consider.

Source code is a type of material to a pipeline

Handling pipeline groups and role based auth.

Making pipelines available while configuring environments (http://www.go.cd/documentation/user/current/configuration/managing_environments.html)

Related to code.

For learning how the config xml is loaded/parsed/validated
https://github.com/gocd/gocd/blob/master/config/config-server/src/com/thoughtworks/go/config/MagicalGoConfigXmlLoader.java

For operations on the pipeline
https://github.com/gocd/gocd/blob/master/server/src/com/thoughtworks/go/server/service/PipelineService.java

Please feel free to join our gitter for more help/discussions.

@tomzo
Member Author

tomzo commented May 18, 2015

@zabil thanks for the tips. I think considerations from #838 are very much related. I am actually starting to think that the way configuration is handled should be rewritten.

There are a few points in #838 that form a great specification. Just to clarify:

  • 100% backward compatibility
  • no more big global xml configuration
  • validations must exist, but they should not be a bottleneck
  • xml should not be the only option - suggested by @mrmanc
  • so possible many sources of pipeline configuration

There is this long comment #838 (comment) which I think is great insight on this.

My idea of work towards new config implementation

  1. Create abstract configuration provider.
    • many implementations are possible (including the old one)
    • Components above should reference only this class. Its role would be to load and validate the configuration object from whatever backend is used. It would provide the only source of truth for the rest of go.
  2. Put old implementation as first implementation of the abstract provider. This will guarantee backwards compatibility.
  3. Create a new shiny configuration provider - that should be separate issue.

The implementation could be auto-detected based on contents in go-server configuration directory. E.g. when there is cruise-config.xml then load the old one.
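The provider idea and the auto-detection step could look roughly like this. It is only a sketch of the proposal, not of anything that exists in GoCD: the `ConfigProvider` interface, `LegacyXmlProvider` and `detect` are hypothetical, and `xmlPipelineNames` stands in for the result of actually parsing cruise-config.xml.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the abstract provider proposal; these types are not part of GoCD. */
final class ConfigProviders {
    /** The only source of configuration truth for the rest of the server. */
    interface ConfigProvider {
        List<String> loadAndValidatePipelineNames();
    }

    /** Stand-in for the old XML-backed implementation, kept for backward compatibility. */
    static final class LegacyXmlProvider implements ConfigProvider {
        private final List<String> pipelineNames;

        LegacyXmlProvider(List<String> pipelineNames) {
            this.pipelineNames = pipelineNames;
        }

        @Override
        public List<String> loadAndValidatePipelineNames() {
            return pipelineNames;
        }
    }

    /** Picks an implementation from the file names found in the server config directory. */
    static ConfigProvider detect(Set<String> fileNames, List<String> xmlPipelineNames) {
        if (fileNames.contains("cruise-config.xml")) {
            return new LegacyXmlProvider(xmlPipelineNames);
        }
        throw new IllegalStateException("no known configuration backend detected");
    }
}
```

A new provider would then be one more `ConfigProvider` implementation plus one more branch in `detect`, leaving the components above it untouched.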

Pipeline configuration as material?

Disclaimer: this is just an idea
Haven't you noticed that updating pipeline configuration triggers a build just like it would in case of a new commit in material? I think this is a hidden symptom of a not entirely true domain model. Let's say that every pipeline has at least a config material which defines all contents of the pipeline. Then

  • when central go-configuration source is used it is just a config material of all pipelines of organization. Which I think it is the truth when using global config.
  • when many config sources are used they would show up as config materials only to the pipelines that these sources define.
  • in very specific case config material and git material could be the same repository - like I wanted in the first place.

What do you think?

Finally I would like to ask for the status of work towards any of the above. The performance issue discussion seems to have ended on the conclusion that the configuration has to be split into parts, but I don't see too many of them. I can see the work on moving the go agents config, but IMHO this is just a minor solution to a larger problem.
Please approve (or disapprove) this direction, because I really do not want to take on such heavy work alone and then not get it merged.

PS: @matt-richardson I think it would be easy to sync back from UI to source. It could work like editing code on gitlab or github where you just have to add commit message in the end.

@zabil
Contributor

zabil commented May 19, 2015

@tomzo we are still looking at putting in more fixes to solve immediate issues described on #838.

On direction, waiting for our BDFL @arvindsv to get back from vacation to comment.

@mrmanc
Contributor

mrmanc commented May 19, 2015

If we do go down the route of having pipeline configuration as a material it would be great if the first thing the pipeline did was verify the integrity of the pipeline: named dependencies etc. Then you still fail fast and highlight any problems with the person who made the change.

@arvindsv
Member

@tomzo: I think it's a big discussion, part of which has happened in #838, as you mention. Most of what you say about the abstract configuration provider makes sense. In my view, within the code today, CruiseConfig and its whole tree, is a representation of the configuration. It doesn't have any real link to the XML directly (once it is created). So, it could be a start towards that. The config-api was started so that everything related to the config (its "interface", if you will) was moved there. The config-server module has all the XML reading and writing code.

Another approach to thinking about this could be to think about a clean-slate and ideal implementation and see what it will take to get there, rather than trying to extract a good abstraction out of what exists today. That might be harder, in which case we can continue with the approach you brought up.

The fact that you mention that you're determined to get this done is the reason I think this can actually happen, and the part I admire the most, actually. :) So, if you can keep 100% backward compatibility, write well tested code and make different backend providers nearly pluggable, then I don't see why it cannot get merged.

My worry is that this is a bigger task than you or I anticipate and if you get bogged down, you might get discouraged. :) So, if you take small steps towards it, and involve the rest of us in what you're doing, we might be able to help. Of course, you might be the kind who doesn't get discouraged, and that's great! But, still if you involve others, it'll be easier to see progress and maybe merge smaller bits of code than the whole thing at once.

@arvindsv
Member

You asked about the status of work: For the performance bit, my feeling is that the config save does not need to move out of XML for it to be fast (though it can/should move out for flexibility reasons as you mention). I think that keeping the state of the configuration in memory and not deserializing the whole thing all the time and validating everything all the time will help. @ketan and I have talked about this, and started work towards it, but got distracted. We should get back to it. Elaborating on that idea: For instance, adding a task does not require a full config revalidation. It cannot affect another pipeline or be invalid, unless it is a fetch artifact task. So, handling the validation at that level, and so on. This should be quick. Needs some work to put in the framework to allow that, as well as some changes to the UI controllers.

@arvindsv
Member

The idea about config material repo(s) might need a little more clarity for me to comment on it, but from what I understand, I think it can be handled as one of the providers. The provider just happens to be reading from a repository of files, and deciding that some of them are pipelines. As long as the config interface is maintained, it should be fine. Of course, there might need to be some work done to poll that repository for changes, especially if you say that the <pipeline> tag itself shouldn't be in the config, but should be in the config material repository itself.

@mrmanc
Contributor

mrmanc commented May 29, 2015

If the pipeline tag is not in the configuration, then where would the configuration repository be specified?

I’m still really keen that configuration integrity is enforced, so if configuration was read from a repo, then could it be validated as the first thing a build does? That would keep things failing fast…

@arvindsv
Member

... then could it be validated as the first thing a build does?

I agree. I think it should, and it probably would be implemented that way.

If you look at the original post, @tomzo said:

In order to avoid such conflicts probably the <pipeline> section should never be in the global cruise-config.xml, instead go-server should configure pipelines after polling from source repositories.

That's what I was talking about. If <pipeline> is not in the config, then maybe the config has something like:

<config>
    ...
    <pipeline-repo url="..." type="git">
   ...

I don't know. It's just a guess. That's why I mentioned needing more clarity.

But, at a high level, this is about the "pluggability" of the whole config, essentially. That can be done, if we have an abstract provider interface. If we have that, then making it read from a repository of pipeline-level declaration files, instead of one XML should be easy, I feel.

@tomzo
Member Author

tomzo commented May 29, 2015

Validation

For instance, adding a task does not require a full config revalidation. It cannot affect another pipeline or be invalid, unless it is a fetch artifact task. So, handling the validation at that level, and so on. This should be quick.

This is something that I was thinking about as well. There is a lot of configuration that can be completely validated at lower level than global.

Config as material

The idea about config material repo(s) might need a little more clarity for me to comment on it, but from what I understand, I think it can be handled as one of the providers

This is what I had in mind. It seems simple to do assuming going into abstract provider direction.

if configuration was read from a repo, then could it be validated as the first thing a build does?

I think it is the only way to do it. The pipeline objects have to be built from that repo. I assume that we validate while building.

If the pipeline tag is not in the configuration, then where would the configuration repository be specified?

It would be something similar to what @arvindsv guessed. The way I see configuration in far future would be something like

<config>
    <config-part-provider type="file" url="/some/local/path" />
    <config-part-provider type="git" url="some git repo url" />
    ... 
</config>

The idea about config material repo(s) might need a little more clarity for me to comment on it

I do not have details yet but a few problems made me think that pipeline configuration should be a material. Consider this:
Let's assume:

  • we have the git config provider implemented
  • there is some project with all its code and pipelines defined in single git repo

Then go pulls that repository for 2 reasons - because it is a config and a git material of a pipeline. It should be aware that configuration repository and pipeline material is the same entity in this case. Otherwise it potentially could use different commits for config and the rest of source code, which is the inconsistency that I wanted to avoid in the first place.
When there are more pipelines the situation gets even more complex. And the fan-out and fan-in core feature would just solve it as long as it 'sees' configuration as just another material.

There is also this observation that I mentioned earlier - go already behaves as if configuration was a material. But it does not model it like one.

Next steps

So, if you take small steps towards it, and involve the rest of us in what you're doing, we might be able to help.

I was hoping to hear that. I see that it is major work of actually unknown effort. I think I will make the first move towards that abstract configuration provider and implement the xml config provider. Then I think I should merge to prove that the config provider is abstracted and still working.
Then I would head towards newer implementation. As I stated initially - I am interested in having provider from git repo(s) so that would be the one I would implement.
I do not want to specify more details now before digging deeper into what old implementation looks like.

I was waiting on @arvindsv to comment and started working on something else in the meantime. So I will get back to gocd in a few days. When I start I hope to get some help on how old configuration works, (and annoy you a little bit during the day). I hope that is OK with you.

But, at a high level, this is about the "pluggability" of the whole config, essentially. That can be done, if we have an abstract provider interface. If we have that, then making it read from a repository of pipeline-level declaration files, instead of one XML should be easy, I feel

Exactly. I think this is most critical part to start any progress with configuration issues.
I also think there should be a very detailed abstract test suite to be the de facto specification of it and to run against any configuration provider implementation.

@arvindsv
Member

When I start I hope to get some help on how old configuration works, (and annoy you a little bit during the day). I hope that is OK with you.

Sure. Let me know when and I'll help you get started.

@arvindsv
Member

Then go pulls that repository for 2 reasons - because it is a config and a git material of a pipeline. It should be aware that configuration repository and pipeline material is the same entity in this case. Otherwise it potentially could use different commits for config and the rest of source code, which is the inconsistency that I wanted to avoid in the first place.

Right. If commit C1 changes code and commit C2 of the same repo changes pipeline config (which is just a file in the same repo), then there should be a build with both code and pipeline config at C1, and another build with both code and pipeline config at C2. But, there should never be a build where code is at C1 and pipeline config is at C2.

One of the reasons for mentioning <include file="my-file.xml" repo="scripts" type="go-xml" /> in #838 was to get to that. If modeled as:

<pipeline name=...>
  <materials>
    <git name="code" .../>
    <git name="scripts" .../>
  </materials>

  <include file="my-file.xml" repo="scripts" type="go-xml" />
</pipeline>

... then, a change to either repository will trigger a build. But, when modeled like this:

<pipeline name=...>
  <materials>
    <git name="code_and_scripts" .../>
  </materials>

  <include file="my-file.xml" repo="code_and_scripts" type="go-xml" />
</pipeline>

... then, a change to the repository, for either code or pipeline config, will always be consistent. This just reuses Go's usual material and does not explicitly bring in the concept of a config material. I felt that it would make it easier to implement and understand. It is also flexible enough to model both cases.

When there are more pipelines the situation gets even more complex. And the fan-out and fan-in core feature would just solve it as long as it 'sees' configuration as just another material.

Yes. There is a concept of uniqueness of a material, in Go. Materials which are considered unique across different pipelines are not polled multiple times. That'll need to apply here as well.

There is also this observation that I mentioned earlier - go already behaves as if configuration was a material. But it does not model it like one.

You're right. It is modeled only implicitly as a material which is checked for changes. Any changes to it from the filesystem or from the UI does cause the system to change, and potentially trigger pipelines. But, I think changing a task definition (for instance) doesn't cause a pipeline to trigger. Changing a material could. In that sense, it is not a full fledged material.

@tomzo
Member Author

tomzo commented Jun 3, 2015

Yesterday I started getting familiar with the current implementation of Go. I've set up a dev environment and I can debug the 'development server', which helps a lot together with searching through classes in IDEA.
I have already spent a few hours trying to figure out how it works and where a good starting point would be. Could you please correct me if I am wrong and answer some of the questions below?
Is there some technical/architectural documentation on Go that I do not know of?

PipelineConfig class

It seems to me that this is what actually defines how the most recent pipeline should be built. Correct?
Is the PipelineConfig class fully defined by what the <pipeline> tag contains?
It is just a part of the bigger CruiseConfig.
It is what I eventually want to be created from source control.

PipelineGroups class

What is this class responsible for? And what are these map objects for?

private Map<String, List<Pair<PipelineConfig, PipelineConfigs>>> packageToPipelineMap;
private Map<String, List<Pair<PipelineConfig, PipelineConfigs>>> pluggableSCMMaterialToPipelineMap;

PipelineConfigService class

This class does not seem to be used anywhere. Is there a reason for that, or is it just 'tech debt', an old file?
I am asking because it seems that it would be easier to start from something like PipelineConfigService than GoConfigService.

@Autowired, @Service and @Component

What are these for and how do I use them?
It seems they are used in consumer-service cases.

Current git repo for cruise-config.xml

There are some branching and merging functions in the code. Could you explain how the current cruise-config.xml git repository is used?

Polling

Please explain how polling currently works. I can see the poller classes and some messaging code, and I was running the debugger, but I cannot get my head around it.
What components, events, services, messages are involved?

@arvindsv you have said that

In my view, within the code today, CruiseConfig and its whole tree, is a representation of the configuration. It doesn't have any real link to the XML directly (once it is created). So, it could be a start towards that.

While that is true, it seems to me that CruiseConfig is way too big to build a 'configuration provider' out of it. Mostly because, if you look at its contents, it does not make sense to use anything more sophisticated than XML to store things like agents, passwords, etc.

I think I am not going to create an abstraction, because that would be too hard with just one implementation. Instead I am going to head towards feeding configuration from git and then make it work with the older XML config.

@arvindsv You mentioned this approach instead of config as a material:

<pipeline name=...>
  <materials>
    <git name="code_and_scripts" .../>
  </materials>

  <include file="my-file.xml" repo="code_and_scripts" type="go-xml" />
</pipeline>

But do you see how it could work when many pipelines would be defined from same git repository?
I do see that making config as material would be very invasive:

  • all pollers are in the server module while PipelineConfig is in config-server.
  • there is a lot of code that expects PipelineConfig(s) to be present before running pipelines (and polling materials?). That makes a chicken-and-egg problem when the config itself has to be polled, which makes me think that another component, running earlier, must be responsible for polling all configuration sources.

There is an <scms> section in CruiseConfig which I had never heard of before.
If there was a single service responsible for polling all of these scms and presenting their changes further then GoConfigSource could use that and GoConfigFileDao to assemble final CruiseConfig.

Let's call this problem SCM-config consistency:

Right. If commit C1 changes code and commit C2 of the same repo changes pipeline config (which is just a file in the same repo), then there should be a build with both code and pipeline config at C1, and another build with both code and pipeline config at C2. But, there should never be a build where code is at C1 and pipeline config is at C2.

This problem will persist for as long as it is not addressed. I mean that, no matter what changes we make now, the current model is that there is only one valid, latest PipelineConfig. In reality that is not true: there is a whole history of pipeline config. It should be possible to trigger a build for any commit C2 and use both config and code from that commit consistently. That again makes me think that there should be a service responsible for:

  • polling (at least) all source materials, some of which may contain pipeline definitions and therefore PipelineConfig objects. But not just the latest ones. Something like ConfigRepository but with a smaller scope.
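To make the SCM-config consistency rule concrete, here is a minimal Java sketch. It is not Go code: `BuildInputs` and `isConsistent` are hypothetical names invented for this illustration, and repositories are identified by a plain string.

```java
import java.util.Objects;

// Hypothetical illustration, not GoCD code: the inputs one build was scheduled with.
final class BuildInputs {
    final String codeRepo;       // identity (e.g. URL) of the repository holding the code
    final String codeRevision;   // commit the code is checked out at
    final String configRepo;     // identity of the repository holding the pipeline config
    final String configRevision; // commit the pipeline config was read from

    BuildInputs(String codeRepo, String codeRevision, String configRepo, String configRevision) {
        this.codeRepo = Objects.requireNonNull(codeRepo);
        this.codeRevision = Objects.requireNonNull(codeRevision);
        this.configRepo = Objects.requireNonNull(configRepo);
        this.configRevision = Objects.requireNonNull(configRevision);
    }

    // Same repository: code and config must come from the same commit.
    // Different repositories: any combination is accepted by this check.
    boolean isConsistent() {
        return !codeRepo.equals(configRepo) || codeRevision.equals(configRevision);
    }
}
```

With this check, a build with code at C1 and config at C2 of the same repository is rejected, while C1/C1 and C2/C2 are accepted.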

@arvindsv
Member

arvindsv commented Jun 3, 2015

Note: Some part of conversation about this, here.

@tomzo
Member Author

tomzo commented Jun 4, 2015

I'd like to add to list of those questions:

How is uniqueness of materials achieved now?

I am asking because I might have some clue on making a config material. But I must hook into previous identity system.

@arvindsv
Member

arvindsv commented Jun 4, 2015

I'll answer the earlier questions today (in a few hours, sorry). About the uniqueness of materials: It's a hash of all the relevant properties of a material. The definition of "relevant" is dependent on the kind of material.

Take a look at this, this and this. In this case (SVN), the fields used for uniqueness are type ("svn"), url, username and checkExternals flag.
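As a rough illustration of "a hash of all the relevant properties", here is a minimal Java sketch. It is an assumption-laden example, not the code linked above: the class `MaterialFingerprint`, the property names and the choice of SHA-256 are all invented for this sketch, and Go's real implementation may join and hash the fields differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: identify a material by hashing only its "relevant" properties.
final class MaterialFingerprint {

    static String of(Map<String, String> relevantProperties) {
        // Sort by key so the fingerprint does not depend on insertion order.
        StringBuilder joined = new StringBuilder();
        for (Map.Entry<String, String> property : new TreeMap<>(relevantProperties).entrySet()) {
            joined.append(property.getKey()).append('=').append(property.getValue()).append('\n');
        }
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(joined.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException missing) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", missing);
        }
    }

    // The fields mentioned above for SVN: type, url, username and the checkExternals flag.
    static Map<String, String> svn(String url, String username, boolean checkExternals) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("type", "svn");
        properties.put("url", url);
        properties.put("username", username);
        properties.put("checkExternals", Boolean.toString(checkExternals));
        return properties;
    }
}
```

Two materials with the same relevant properties get the same fingerprint, so they would be polled once even if several pipelines use them. Properties that are not "relevant" (a material name, for example, in this sketch) simply stay out of the map.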

@arvindsv
Member

arvindsv commented Jun 4, 2015

PipelineConfig class

It seems to me that this is what actually defines how the most recent pipeline should be built. Correct?

Correct.

Is the PipelineConfig class fully defined by what the <pipeline> tag contains?
It is just a part of the bigger CruiseConfig.

Yes. It's part of the config, and it fully defines what the <pipeline> tag has.

It is what I eventually want to be created from source control.

Looks like it to me.

PipelineGroups class

What is this class responsible for? And what are these map objects for?

It is responsible for the <pipelines> tag. It is what holds a group of pipelines. A group of pipelines can have authorization related to them. It can be used to separate pipelines in the system into groups or teams, giving them the ability to administer only their own pipelines.

Those two maps can be ignored. They're a local cache for these two methods. They're creating a map from a material to the pipelines they're in, etc. That's an extremely expensive operation, and those maps help to not run them again and again.

PipelineConfigService class

This class does not seem to be used anywhere. Is there a reason for that, or is it just 'tech debt', an old file?
I am asking because it seems that it would be easier to start from something like PipelineConfigService than GoConfigService.

It's used here. This is from the admin UI. If you try to delete a pipeline which is a dependency material for another pipeline, this code should get executed.

@Autowired, @Service and @Component

What are these for and how do I use them?
It seems they are used in consumer-service cases.

They're from Spring. They're used for dependency injection. Maybe these help:
http://simplespringtutorial.com/annotations.html
http://stackoverflow.com/questions/6594908/spring-autowire-fundamentals

For most service packages, etc., autowiring has been set up (meaning, Spring has been told to scan those packages for annotations such as @Service, etc. and to automatically instantiate them and inject their dependencies).

Current git repo for cruise-config.xml

There are some branching and merging functions in the code. Could you explain how the current cruise-config.xml git repository is used?

The current git repository is just a store of historical changes to the config. While the real (current) config is usually in a location like /etc/go/cruise-config.xml, the one in /var/lib/go-server/db/config.git is just a copy, and is the latest valid config known. git log on that will show you all the changes made to the config, over time. The log message of every commit has a specific format and the format is used by Go. Making a commit there directly or indirectly is not recommended. The repo is also used to try and merge concurrent changes. If there are 10 pipelines, and I make a change to pipeline 1 and you make a change to pipeline 5, Go uses git to try and merge those changes, so that a user does not need to redo their changes.

I'd recommend not trying to use this repository at all.

I'll reply to the rest of the post in a separate reply.

@arvindsv
Member

arvindsv commented Jun 4, 2015

Polling

Please explain how polling currently works. I can see the poller classes and some messaging code, and I was running the debugger, but I cannot get my head around it.
What components, events, services, messages are involved?

What polling are you thinking about? Material polling (git, etc)? This might help, as a start. If you mean polling for config changes, Go just checks the config at /etc/go/cruise-config.xml every few seconds to see if it has changed from the latest known config.
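The config-file side of this can be pictured with a small Java sketch. It only illustrates the idea of comparing the file with the latest known config; `ConfigFileChangeDetector` is a hypothetical name and comparing SHA-256 digests is an assumption of this sketch, not a description of how Go actually detects the change.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch: remember a digest of the last config seen and report when it differs.
final class ConfigFileChangeDetector {
    private byte[] lastKnownDigest; // null until the first check

    // Returns true when the content differs from the last content seen.
    // The very first call always returns true, because nothing is known yet.
    synchronized boolean hasChanged(byte[] currentContent) {
        byte[] currentDigest = sha256(currentContent);
        boolean changed = !Arrays.equals(currentDigest, lastKnownDigest);
        lastKnownDigest = currentDigest;
        return changed;
    }

    // Convenience overload that reads the file, e.g. /etc/go/cruise-config.xml.
    boolean hasChanged(Path configFile) {
        try {
            return hasChanged(Files.readAllBytes(configFile));
        } catch (IOException unreadable) {
            throw new UncheckedIOException(unreadable);
        }
    }

    private static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException missing) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", missing);
        }
    }
}
```

A scheduler (for instance a `ScheduledExecutorService`) could call `hasChanged` every few seconds and trigger a reload whenever it returns true.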

I think I am not going to create abstraction because it would be too hard with just one implementation. Instead I am going to head towards implementing feeding configuration from git and then make it work with older xml config.

Ok. You need to remember that not all material information is inside <pipeline>. Package repository plugins and SCM plugins store their material information at the top level, outside of <pipeline> and they have a reference inside the <pipeline> tag.

Later, related to this, you said:

There is an <scms> section in CruiseConfig which I had never heard of before.

It is very new. It's for SCM plugins. It's similar to <repositories> tag. That's what I mentioned just above.

But do you see how it could work when many pipelines would be defined from same git repository?
I do see that making config as material would be very invasive:

all pollers are in the server module while PipelineConfig is in config-server.
there is a lot of code that expects PipelineConfig(s) to be present before running pipelines (and polling materials?). That makes a chicken-and-egg problem when the config itself has to be polled, which makes me think that another component, running earlier, must be responsible for polling all configuration sources.

I'm not exactly sure I get what you're saying about the pollers. Since the server module can access config-server, we could set it up so that the pollers in the server notify some service in config-server when some "config repositories" have changes that need a reload of the config.

You're also right that the current code expects the configs to be present. We talked about this earlier, over chat, I guess. We might have to have a different poller, for config materials only. I can see it as something like this:

  1. Existing pollers continue to do what they do. They don't poll config materials.
  2. We write different pollers for config materials (using the code for git, svn, etc. already present).
  3. These new pollers are used to get a globally valid config, that can be used. Or, we can bring in the concepts of scopes, etc. Once they have a config, the original pollers can use the config from this module (config module) to get the new set of code-materials they need to poll.

If there was a single service responsible for polling all of these scms and presenting their changes further then GoConfigSource could use that and GoConfigFileDao to assemble final CruiseConfig.

Yes, that's another option, pulling out the polling as a module below both the server and config-server, so that they can both access them. The polling module will need to be quite generic. Today, the pollers take the commits they find, and put them into the DB. For config-material, the pollers don't need to do that. They need to provide the commits to the config-server module, which can then take action (refresh its notion of config).
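A generic polling module of that kind could look roughly like the Java sketch below. Everything in it is hypothetical (`Commit`, `RepositoryPoller`, `subscribe`, `publish`); the point is only that the poller knows nothing about what happens to the commits it finds, so the server can store them in the DB while config-server refreshes its notion of config.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical value object for one commit found while polling.
final class Commit {
    final String repoUrl;
    final String revision;

    Commit(String repoUrl, String revision) {
        this.repoUrl = repoUrl;
        this.revision = revision;
    }
}

// Hypothetical generic poller: it only hands commits to whoever subscribed.
final class RepositoryPoller {
    private final List<Consumer<Commit>> listeners = new ArrayList<>();

    void subscribe(Consumer<Commit> listener) {
        listeners.add(listener);
    }

    // Called with the new commits one polling round discovered.
    void publish(List<Commit> newCommits) {
        for (Commit commit : newCommits) {
            for (Consumer<Commit> listener : listeners) {
                listener.accept(commit);
            }
        }
    }
}
```

In this arrangement the server module would subscribe a listener that persists commits for code materials, and config-server would subscribe a listener that reloads configuration for config materials, without either module depending on the other's handling.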

Final post on this coming up.

@arvindsv
Member

arvindsv commented Jun 4, 2015

Let's call this problem SCM-config consistency:
... [snip] ...
This problem will persist for as long as it is not addressed. I mean that, no matter what changes we make now, the current model is that there is only one valid, latest PipelineConfig. In reality that is not true: there is a whole history of pipeline config. It should be possible to trigger a build for any commit C2 and use both config and code from that commit consistently.

Currently there's only one valid latest config. It's useful in one way. If you have a build, and it fails, you can change the config and re-run the stage or some jobs and it will use the new config. Not the old failed config. When a pipeline run is tied to its config through a repo, then you lose this ability. A rerun of a stage or job will re-use the config for that time.

I have mixed feelings about this. Though I like the ability to re-run a job, I think re-using the old config is the correct thing to do. However, the more you put into the config from the repository, the greater the chance of something going wrong.

Going back to C1 and C2: As I said, C1 and C2 were commits in the same repository (say R1). So, if config is at C2, then code is also (should also be) at C2, since the repository is the same. Doesn't that address the problem? If we're flexible, we should be able to have it such that code commit C1 comes from repo R1 and config commit C2 comes from repo R2, which lets the user mix and match (and possibly run inconsistent config with inconsistent code). Right? There's flexibility in that approach, but it allows inconsistencies. Either way, tying code to config in the same repository should solve the problem we were talking about (SCM-config consistency).

@arvindsv
Member

arvindsv commented Jun 4, 2015

Maybe our thinking needs to be broader? Starting with something like:

Aspiration: I want to be able to have all my pipeline configuration information in an (one only, for now) external repository. The config that Go knows about should be like this:

<cruise> <!-- or <go> -->
  <server>
    ...
  </server>

  <pipeline-repo plugin="git.config.repo" url="git://something" />

  <agents>
     ...
  </agents>
</cruise>

We could then give that config information (url, etc.) to the plugin and wash our hands of it. It is the plugin's responsibility now, whenever asked by Go, to give back a list of pipelines. We'd need to figure out environments and other concepts, if we're returning something equivalent to a list of <pipeline> (or PipelineConfig) objects. But, it's a way of thinking.

The plugin can then poll that repository and follow some kind of convention. Say, every file with the extension .pipeline is a candidate to be considered a pipeline. It then polls all of them, resolves dependencies between them and gives the result back to Go.
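The file-name convention could be sketched in Java as follows. This is only an illustration of the ".pipeline extension" idea: the class `PipelineFileConvention` and its methods are hypothetical and are not part of any Go plugin API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the convention: a file is a pipeline candidate if it ends in ".pipeline".
final class PipelineFileConvention {
    static final String EXTENSION = ".pipeline";

    static boolean isCandidate(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().endsWith(EXTENSION);
    }

    // Pure filter over already-known paths, returned in sorted order.
    static List<Path> candidates(List<Path> files) {
        return files.stream()
                .filter(PipelineFileConvention::isCandidate)
                .sorted()
                .collect(Collectors.toList());
    }

    // Walks one checkout of the config repository and returns the candidate files.
    static List<Path> scan(Path checkoutDir) {
        try (Stream<Path> walk = Files.walk(checkoutDir)) {
            return candidates(walk.filter(Files::isRegularFile).collect(Collectors.toList()));
        } catch (IOException unreadable) {
            throw new UncheckedIOException(unreadable);
        }
    }
}
```

Parsing each candidate and resolving dependencies between the resulting pipelines would come after this step and is left out here.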

To take it further, we can even allow both <pipeline> and <pipeline-repo> to exist in the configuration. It could make it a little harder to make sure that the whole config is valid, but it allows flexibility to move from the current config, without forcing anyone to. Something like this:

<cruise> <!-- or <go> -->
  <server>
    ...
  </server>

  <scms>...</scms> <!-- Used by the <pipeline> tag below. -->
  <repositories>...</repositories> <!-- Used by the <pipeline> tag below. -->
  <templates>...</templates> <!-- Used by the <pipeline> tag below. -->

  <pipeline-repo plugin="git.config.repo" url="git://something_team1" />

  <pipeline name="...">
    ...
  </pipeline>

  <pipeline-repo plugin="git.config.repo" url="git://something_team2" />

  <agents>
     ...
  </agents>
</cruise>

I'd recommend not having a frankenstein config like this, but it keeps the old config valid, while allowing the big config pieces to be moved out and managed elsewhere. This is just an opinion/idea. I'd like others like @jyotisingh, @mdaliejaz, @zabil, @ketan, etc. to weigh in.

@tomzo
Member Author

tomzo commented Jun 4, 2015

Thank you very much for all these answers.

What polling are you thinking about? Material polling (git, etc)? This might help

Thanks, I skipped that by mistake. It is enough.

They're from Spring

I'll learn Spring then.

I'd recommend not trying to use this repository at all.

I thought so, but had to be sure because it could be a hint.

Today, the pollers take the commits they find, and put them into the DB. For config-material, the pollers don't need to do that. They need to provide the commits to the config-server module, which can then take action (refresh its notion of config).

I do not agree with that. That would imply that there is only one valid, most-fresh config.

Currently there's only one valid latest config. It's useful in one way. If you have a build, and it fails, you can change the config and re-run the stage or some jobs and it will use the new config. Not the old failed config. When a pipeline run is tied to its config through a repo, then you lose this ability. A rerun of a stage or job will re-use the config for that time.

  1. I was assuming that if a user opts for config in a repo then he/she is willing to give up some operations. The above would be an example of that trade-off.
  2. At least in git it could still be done by amending commits.

I have a few points more. I will post soon.

@tomzo
Member Author

tomzo commented Jun 4, 2015

I have now considered many of the approaches referenced above and in #838.
I was digging in the code to get an idea of what can actually be changed with relatively small modifications, adding code rather than changing the old one.

There are a few points which I am quite certain about:

  • CruiseConfig and all XML loading is designed to be static and globally valid. Let's keep it that way. We will just add a new section <scm-configs> with a list of extra sources to poll and load. This is similar to what @arvindsv just mentioned above.
  • the configuration object should definitely be modeled as 2 parts: static + dynamic. Static is delivered via XML; it is globally valid; it is the old XML config. Dynamic has to be polled for; objects of dynamic configuration are then created and passed on.
  • if there is an SCM configuration material then it is not a material of a single pipeline. It cannot be a member of the PipelineConfig class. A configuration material is a member of a class with larger scope, something like PipelineConfigGroup. It would be a first step towards those validation scopes we talked about.
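The static + dynamic split could be pictured with the Java sketch below. The classes `PartialConfig` and `MergedConfig` are hypothetical names made up for this illustration, and pipelines are reduced to their names; the sketch only shows that polled parts can be replaced per source repository without ever mutating the static part.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: the dynamic part read from one config repository at one revision.
final class PartialConfig {
    final String sourceRepo;
    final String revision;
    final List<String> pipelineNames;

    PartialConfig(String sourceRepo, String revision, List<String> pipelineNames) {
        this.sourceRepo = sourceRepo;
        this.revision = revision;
        this.pipelineNames = Collections.unmodifiableList(new ArrayList<>(pipelineNames));
    }
}

// Hypothetical: a view combining the static XML config with the latest dynamic parts.
final class MergedConfig {
    private final List<String> staticPipelines; // from the main XML, never modified here
    private final Map<String, PartialConfig> dynamicParts = new LinkedHashMap<>(); // keyed by source repo

    MergedConfig(List<String> staticPipelines) {
        this.staticPipelines = Collections.unmodifiableList(new ArrayList<>(staticPipelines));
    }

    // A newly polled part replaces the previous part from the same repository.
    void updatePart(PartialConfig part) {
        dynamicParts.put(part.sourceRepo, part);
    }

    List<String> allPipelines() {
        List<String> all = new ArrayList<>(staticPipelines);
        for (PartialConfig part : dynamicParts.values()) {
            all.addAll(part.pipelineNames);
        }
        return Collections.unmodifiableList(all);
    }
}
```

Validation (for example, rejecting a dynamic pipeline whose name clashes with a static one) is deliberately left out of this sketch.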

@arvindsv
Member

arvindsv commented Jun 4, 2015

Today, the pollers take the commits they find, and put them into the DB. For config-material, the pollers don't need to do that. They need to provide the commits to the config-server module, which can then take action (refresh its notion of config).

I do not agree with that. That would imply that there is only one valid, most-fresh config.

Not necessarily only. There is (and I think will be) the concept of a valid, most-fresh config (maybe not at the global level, but at least at the pipeline level). You need this to schedule a new pipeline, when a code commit happens. Of course, that could be a config commit itself, in which case the most-fresh config is the one for that commit. Will have to check validity.

However, you need older commits of the config, only for reruns of an old pipeline, right? So, I don't see why they need to be in the Go DB. Especially if all of this is happening in a plugin. I'd just get the config for that point in time using the repository itself, on demand.

That's what I think. Let me know what I'm not considering.

[Update: Of course, if we want to store it in the DB for some reason for a rerun, it's doable]

@tomzo
Member Author

tomzo commented Jun 4, 2015

However, you need older commits of the config, only for reruns of an old pipeline, right? So, I don't see why they need to be in the Go DB. Especially if all of this is happening in a plugin. I'd just get the config for that point in time using the repository itself, on demand.

That will do it. I was going for ability to rerun.

There is (and I think will be) the concept of a valid, most-fresh config (maybe not at the global level, but at least at the pipeline level).

Yes. But I was referring to 'refresh its notion of config'

They need to provide the commits to the config-server module, which can then take action (refresh its notion of config)

I just wanted to note that we should not implement a situation where a polled configuration part updates some configuration instance, especially CruiseConfig.

I would use the static config and dynamic config separation I mentioned above. I think server module should be aware of that separation. config-server would only provide urls to configuration repos.
This is what I came up with after digging in code.

There is also this problem:
If there is more than one pipeline in config repo then how do we imagine rerunning just one pipeline at some older revision?

But I think it is already answered above to some extent.

@tomzo
Member Author

tomzo commented Jun 5, 2015

@arvindsv in the broader approach #1133 (comment) you mention <pipeline-repo> and that it could return a list of pipelines, and that we then have to figure out environments and other elements.
Why not <config-repo> that can return much more than just pipelines? It would be allowed to return pipelines and environments. Anything that would make sense to store in a repo.

I am not in favor of adding a huge feature at once. I just think the work towards <config-repo> and <pipeline-repo> would be very much alike.

I would also love to hear opinions of others.

tomzo added a commit to tomzo/gocd that referenced this issue Jun 5, 2015
@arvindsv
Member

arvindsv commented Jun 5, 2015

Why not <config-repo> that can return much more than just pipelines?

Sure. That's fine. As long as it is not too complicated. I'd leave it at only pipelines and environments for now. Not everything else. There needs to be something that merges information in the config, with information from the (multiple?) <config-repo> sections. That becomes more complicated as we add more things. For instance, if we allow environments there, what happens if there's an environment with the same name outside. Is it an error? Should they be merged? Etc.
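To illustrate the "should they be merged?" option, here is a minimal Java sketch in which environments with the same name are merged by taking the union of their pipelines. It is a hypothetical example (`EnvironmentMerger` is not a Go class) and it models an environment only as a name plus a set of pipeline names; treating a name clash as an error would be the alternative policy.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: merge environments from several config sources by name.
final class EnvironmentMerger {

    // Each source maps an environment name to the pipelines it assigns to that environment.
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> sources) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, Set<String>> source : sources) {
            for (Map.Entry<String, Set<String>> environment : source.entrySet()) {
                merged.computeIfAbsent(environment.getKey(), name -> new LinkedHashSet<>())
                      .addAll(environment.getValue());
            }
        }
        return merged;
    }
}
```

With the main XML listed first, an environment defined both there and in a config repo ends up containing the pipelines from both.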

@arvindsv
Member

arvindsv commented Jun 5, 2015

I just wanted to note that we should not implement a situation where a polled configuration part updates some configuration instance, especially CruiseConfig.

I would use the static config and dynamic config separation I mentioned above. I think server module should be aware of that separation. config-server would only provide urls to configuration repos.

This might become hard to do, given that the rest of the system (for instance the scheduler, material subsystem, the dashboard, the whole admin UI bit) expects to call something like GoConfigService.give_me_all_pipelines and to get the current known set of pipelines, so that they can be edited, shown on the dashboard, materials polled for them, etc.

[I'm away for a bit. Will be back and think about this some more]

@tomzo
Member Author

tomzo commented Jun 5, 2015

expects to call something like GoConfigService.give_me_all_pipelines and to get the current known set of pipelines

I noticed that already. I am currently evaluating how much it would take to have a GoConfigService that understands the concept of historical configuration, so that there wouldn't be methods like

public boolean isPipelineEmpty()

But rather something like

public boolean isPipelineEmpty(unambiguous definition of configuration at some point in time)

The good news is that both these methods could co-exist. So maybe this can be implemented in some low-level components and gradually introduced upwards.

But these are killers at the moment:

public CruiseConfig getCurrentConfig();

And CruiseConfig has

public List<PipelineConfig> allPipelines()
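How the two method styles could co-exist is sketched below in Java. All names are hypothetical (`ConfigRevision`, `HistoricalConfigService`, `record`), a config is reduced to a list of pipeline names, and "empty" is assumed here to mean "no pipelines defined"; the real GoConfigService is far larger than this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical: an unambiguous pointer to configuration at some point in time.
final class ConfigRevision {
    final String repo;
    final String commit;

    ConfigRevision(String repo, String commit) {
        this.repo = repo;
        this.commit = commit;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ConfigRevision)) {
            return false;
        }
        ConfigRevision that = (ConfigRevision) other;
        return repo.equals(that.repo) && commit.equals(that.commit);
    }

    @Override
    public int hashCode() {
        return Objects.hash(repo, commit);
    }
}

// Hypothetical service that answers questions for the latest or for a historical config.
final class HistoricalConfigService {
    private final Map<ConfigRevision, List<String>> pipelinesByRevision = new HashMap<>();
    private ConfigRevision latest;

    void record(ConfigRevision revision, List<String> pipelineNames) {
        pipelinesByRevision.put(revision, new ArrayList<>(pipelineNames));
        latest = revision;
    }

    // Old style: implicitly asks about the latest known config.
    boolean isPipelineEmpty() {
        return isPipelineEmpty(latest);
    }

    // New style: the caller states which configuration it is asking about.
    boolean isPipelineEmpty(ConfigRevision revision) {
        List<String> pipelines = pipelinesByRevision.get(revision);
        if (pipelines == null) {
            throw new IllegalArgumentException("Unknown config revision");
        }
        return pipelines.isEmpty();
    }
}
```

The no-argument method simply delegates to the revision-aware one, which is what would let the new style be introduced in lower components first while existing callers keep working.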

tomzo added a commit to tomzo/gocd that referenced this issue Jul 6, 2016
tomzo added a commit that referenced this issue Jul 6, 2016
tomzo added a commit to tomzo/gocd that referenced this issue Jul 6, 2016
tomzo added a commit that referenced this issue Jul 6, 2016
tomzo added a commit to tomzo/gocd that referenced this issue Jul 7, 2016
ketan added a commit that referenced this issue Jul 7, 2016
#1133 config repo contract tests stop printing to stdout
@tomzo
Member Author

tomzo commented Jul 29, 2016

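I am closing this because the core feature is done and released in 16.7.0 (https://www.go.cd/releases/). User documentation is here: https://docs.go.cd/current/extension_points/configrepo_extension.html and the XML reference is here: https://docs.go.cd/current/configuration/configuration_reference.html#config-repos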

There are 2 plugins available:

We should open smaller issues with enhancements and bugs as they come along.

@cintiadr if you are interested in discussing templates see tomzo/gocd-yaml-config-plugin#2

@tomzo tomzo closed this as completed Jul 29, 2016
@tomzo tomzo modified the milestones: Release 16.7, Release: Near term Jul 29, 2016
@ketan
Member

ketan commented Jul 30, 2016

Woot!


@cintiadr

Thank you!

tomzo added a commit to tomzo/gocd that referenced this issue Aug 2, 2016
arvindsv added a commit that referenced this issue Aug 2, 2016
varshavaradarajan added a commit that referenced this issue Aug 30, 2016
…tegration

#1133 config repos integration with post commit hooks