Feature: pipeline configuration from source control #1133
This feature made it! The documentation in GoCD is here.
I have collected the most notable information from the comments below so that no one has to read through all of them again to get an idea of what has changed in the system and what is expected from it.
Overview of the feature
Example configuration repositories
I have prepared example config repositories. In order of complexity:
In the main config repository (https://github.com/tomzo/gocd-main-config) there is a config-repos branch with
Domain and concepts
A configuration repository is a source control repository of any kind that holds part of the GoCD configuration.
A configuration origin tells where a part of the configuration comes from. It had to be added because some services now need this extra information to operate. There are 3 types of configuration origins:
Base and Merged configuration
There are 2 scopes of configuration:
These matter at the system level because validity is checked twice: first at the base scope, then at the merged scope.
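To illustrate why the two scopes are checked separately, here is a minimal sketch under a deliberately simplified, hypothetical model: a configuration part is just a list of pipeline names, the base-scope rule is "names are not blank" (checkable on one part alone), and the merged-scope rule is "names are unique across all parts" (checkable only after merging). `ConfigPart` and `TwoPhaseValidator` are illustrative names, not GoCD's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model: a config part is only a list of pipeline names.
record ConfigPart(String source, List<String> pipelineNames) {}

final class TwoPhaseValidator {
    // Base scope: rules that can be checked on one part in isolation.
    static List<String> validateBase(ConfigPart part) {
        List<String> errors = new ArrayList<>();
        for (String name : part.pipelineNames()) {
            if (name == null || name.isBlank()) {
                errors.add(part.source() + ": pipeline name must not be blank");
            }
        }
        return errors;
    }

    // Merged scope: rules that only make sense once all parts are combined.
    static List<String> validateMerged(List<ConfigPart> parts) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (ConfigPart part : parts) {
            for (String name : part.pipelineNames()) {
                if (!seen.add(name)) {
                    errors.add(part.source() + ": duplicate pipeline '" + name + "'");
                }
            }
        }
        return errors;
    }

    // Validate every part at base scope first; only if all pass, validate the merged whole.
    static List<String> validate(List<ConfigPart> parts) {
        List<String> errors = new ArrayList<>();
        for (ConfigPart part : parts) {
            errors.addAll(validateBase(part));
        }
        if (errors.isEmpty()) {
            errors.addAll(validateMerged(parts));
        }
        return errors;
    }
}
```

Here a repository part can be perfectly valid on its own and still be rejected once merged, because it redefines a pipeline that the main part already has.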
Behavior and assumptions
When a pipeline is defined in a configuration repository, there are always 2 cases, which determine how the Go server should behave.
When the configuration repository that defines the pipeline is also one of its materials
In automated builds, we expect that when a pipeline is triggered with the material at revision C1, the configuration of the pipeline will come from the same commit, C1.
In manually triggered builds, Go always fetches materials first, which may change the configuration of the pipeline that was just triggered.
In timer-triggered builds, Go also fetches materials first, which may change the configuration of the pipeline that is being triggered.
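The consistency expectation for this case can be written as a check on the inputs of a single run. This is a sketch under a hypothetical model: each material is identified by its URL, and a run records the revision of every material plus the repository and revision its pipeline configuration came from. `BuildInputs` and `ScmConsistency` are illustrative names only.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical inputs of one pipeline run.
record BuildInputs(Map<String, String> materialRevisions, // material URL -> revision
                   String configRepoUrl,                   // repository the pipeline config came from
                   String configRevision) {}               // revision of that config

final class ScmConsistency {
    // If the config repository is also one of the pipeline's materials, the config must come
    // from exactly the same commit as that material (C1 with C1, never C1 with C2).
    // If it is not a material, there is nothing to compare, so the run counts as consistent.
    static boolean isConsistent(BuildInputs inputs) {
        String materialRevision = inputs.materialRevisions().get(inputs.configRepoUrl());
        return materialRevision == null || Objects.equals(materialRevision, inputs.configRevision());
    }
}
```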
When the configuration repository that defines the pipeline is not one of its materials
This case is much less complex. Go continuously polls configuration repositories for changes and tries to merge them into the current configuration.
What happens when polling of one material hangs:
When the plugin fails, the configuration has an invalid format, or a migration fails in the configuration repository checkout, the material update completes but the config partial remains the old one.
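One way to model this behaviour: the update itself always finishes, but a failed parse does not replace the partial Go already holds. A minimal sketch, assuming a hypothetical `PartialCache` that keeps the last successfully parsed partial per repository and records the latest error. The parser is passed in as a function, so a plugin failure, an invalid format or a failed migration all surface the same way, as an exception.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical cache of parsed config partials, keyed by configuration repository.
final class PartialCache<P> {
    private final Map<String, P> lastValid = new HashMap<>();
    private final Map<String, String> lastError = new HashMap<>();

    // Called after every material update of a config repository checkout.
    // On success the new partial replaces the old one; on any failure the old one stays.
    void onCheckoutUpdated(String repo, String checkoutContents, Function<String, P> parser) {
        try {
            P parsed = parser.apply(checkoutContents);
            lastValid.put(repo, parsed);
            lastError.remove(repo);
        } catch (RuntimeException e) {
            lastError.put(repo, String.valueOf(e.getMessage()));
        }
    }

    Optional<P> partial(String repo) { return Optional.ofNullable(lastValid.get(repo)); }

    Optional<String> error(String repo) { return Optional.ofNullable(lastError.get(repo)); }
}
```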
Handling merges and conflicts
How should configuration parts be merged with the main configuration?
Pipelines in environment
Most liberal approach possible:
Agents in environment
Most liberal approach possible:
Environment variables in environment
There could be optional overrides, but we consider that future work.
Authorization can be defined only in the main XML, so it cannot conflict when merging.
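The exact merge rules are not listed above, so the following is only a sketch under stated assumptions: an environment may be declared in several parts, its pipelines and agents are combined by set union (one possible reading of "most liberal"), and, since overrides are left as future work, the same environment variable declared with two different values is treated as a conflict. `EnvironmentPart` and `EnvironmentMerger` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical fragment of one environment as declared in a single config part.
record EnvironmentPart(String name, Set<String> pipelines, Set<String> agents,
                       Map<String, String> variables) {}

final class EnvironmentMerger {
    // Merges all fragments of the same environment; throws on conflicting variable values.
    static EnvironmentPart merge(List<EnvironmentPart> parts) {
        if (parts.isEmpty()) throw new IllegalArgumentException("no parts to merge");
        String name = parts.get(0).name();
        Set<String> pipelines = new TreeSet<>();
        Set<String> agents = new TreeSet<>();
        Map<String, String> variables = new TreeMap<>();
        for (EnvironmentPart part : parts) {
            if (!part.name().equals(name)) {
                throw new IllegalArgumentException("cannot merge " + part.name() + " into " + name);
            }
            pipelines.addAll(part.pipelines()); // union: any part may add pipelines
            agents.addAll(part.agents());       // union: any part may add agents
            for (Map.Entry<String, String> v : part.variables().entrySet()) {
                String previous = variables.putIfAbsent(v.getKey(), v.getValue());
                if (previous != null && !previous.equals(v.getValue())) {
                    throw new IllegalStateException("conflicting values for variable " + v.getKey());
                }
            }
        }
        return new EnvironmentPart(name, pipelines, agents, variables);
    }
}
```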
Some notes about how Go services work now and what happens when configuration repositories are present.
Here is a summary of the new services layout:
The best analogy for the whole point here is that MergeGoConfig has replaced the old CachedGoConfig. CachedGoConfig used to hold 2 instances of the configuration in memory (the current config and the config for edit). Now MergeGoConfig holds these two. The main difference is that MergeGoConfig may return a merged configuration as the current config or as the config for edit. If there are no extra configuration parts, it returns the main configuration.
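The behaviour described for MergeGoConfig can be sketched as follows. This is a simplified model, not GoCD's class: configurations are reduced to lists of pipeline names, and `SimpleConfig`, `MergeGoConfigSketch`, `currentConfig()` and `configForEdit()` are hypothetical. The point it shows is that callers always ask for "the config" and receive either the main configuration or a merged one, depending on whether any partials exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified configuration: only the names of the pipelines it defines.
record SimpleConfig(List<String> pipelines) {
    // Returns a new config with this config's pipelines followed by the partials' pipelines.
    SimpleConfig mergedWith(List<SimpleConfig> partials) {
        List<String> all = new ArrayList<>(pipelines);
        for (SimpleConfig partial : partials) {
            all.addAll(partial.pipelines());
        }
        return new SimpleConfig(List.copyOf(all));
    }
}

// Keeps two in-memory instances of the main config (current and for edit), like the old
// cached config, but hands out merged views of them when config partials exist.
final class MergeGoConfigSketch {
    private SimpleConfig mainCurrent;
    private SimpleConfig mainForEdit;
    private List<SimpleConfig> partials = List.of();

    MergeGoConfigSketch(SimpleConfig main) {
        onMainConfigLoaded(main);
    }

    void onMainConfigLoaded(SimpleConfig main) {
        this.mainCurrent = main;
        this.mainForEdit = new SimpleConfig(List.copyOf(main.pipelines()));
    }

    void onPartialsChanged(List<SimpleConfig> newPartials) {
        this.partials = List.copyOf(newPartials);
    }

    SimpleConfig currentConfig() {
        return partials.isEmpty() ? mainCurrent : mainCurrent.mergedWith(partials);
    }

    SimpleConfig configForEdit() {
        return partials.isEmpty() ? mainForEdit : mainForEdit.mergedWith(partials);
    }
}
```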
This is implemented mostly as we discussed here.
New material update queue
ConfigMaterialUpdater - new component
Added a new component, ConfigMaterialUpdater, which listens on config-material-update-
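The benefit of a dedicated queue is that config repository updates do not wait behind regular material updates, and vice versa. A minimal sketch using the JDK's `BlockingQueue` with one worker thread per queue; `UpdateQueues`, `MaterialUpdate` and the handler callbacks are hypothetical and only stand in for GoCD's actual messaging setup.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

record MaterialUpdate(String materialUrl) {}

// Two independent queues, each drained by its own worker, so a slow regular
// material update cannot block a config material update.
final class UpdateQueues implements AutoCloseable {
    final BlockingQueue<MaterialUpdate> materialUpdates = new LinkedBlockingQueue<>();
    final BlockingQueue<MaterialUpdate> configMaterialUpdates = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    UpdateQueues(Consumer<MaterialUpdate> materialHandler, Consumer<MaterialUpdate> configHandler) {
        workers.submit(() -> drain(materialUpdates, materialHandler));
        workers.submit(() -> drain(configMaterialUpdates, configHandler));
    }

    // Takes updates one by one until the worker is interrupted.
    private static void drain(BlockingQueue<MaterialUpdate> queue, Consumer<MaterialUpdate> handler) {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                handler.accept(queue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void close() {
        workers.shutdownNow(); // interrupts both workers
    }
}
```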
Final service notes
Reusing poller directories
The checkouts (in
But now there is a new type of poller that creates a full checkout on each update. These directories are now read and parsed by
The merged cruise config is returned for edits. A service that edits the config does not know whether the config is merged, and it does not need to.
When a method that adds a pipeline or environment is called, it reaches the merged cruise config at some point. The merged config knows that the addition is meant for the main part and changes the main config instance (held inside the merged cruise config instance).
Removing works like adding: we can determine where to remove from. If the user tries to remove a remote element, the operation fails, usually in the cruise config code.
Modifications get complex because there are many ways in which they are introduced. This is where returning the merged config instance really pays off. Changes are made on the config instance in the full merged context, so any invalid attempt throws, e.g. trying to change the name of a pipeline group defined remotely.
Each config edit ends with an attempt to save some config-for-edit instance (or a deep clone of it, or a clone of a clone, etc.). To deal with that, the magical writer is aware that a merged config might be passed to it for serialization. If so, it takes out only the locally defined configuration elements. The actual extraction of local elements is implemented in config-api, and it is very easy because we keep and maintain the main configuration instance inside the merged config anyway.
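The editing rules above (additions go to the main part, removing a remote element fails, and the writer serializes only local elements) can be illustrated with a small model. It is a sketch under assumptions, not the config-api implementation: the only elements are pipelines identified by name, each tagged with a hypothetical `Origin` (LOCAL or REMOTE), and `localOnly()` stands in for what the writer extracts before saving.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

enum Origin { LOCAL, REMOTE }

// Hypothetical merged config that remembers where each pipeline was defined.
final class MergedConfigSketch {
    private final Map<String, Origin> pipelines = new LinkedHashMap<>();

    MergedConfigSketch(List<String> mainPipelines, List<String> remotePipelines) {
        mainPipelines.forEach(p -> pipelines.put(p, Origin.LOCAL));
        for (String p : remotePipelines) {
            if (pipelines.putIfAbsent(p, Origin.REMOTE) != null) {
                throw new IllegalStateException("pipeline defined twice: " + p);
            }
        }
    }

    // Additions always land in the main (local) part; a clash with any part is an error.
    void addPipeline(String name) {
        if (pipelines.putIfAbsent(name, Origin.LOCAL) != null) {
            throw new IllegalArgumentException("pipeline already exists: " + name);
        }
    }

    // Removal is allowed only for locally defined elements.
    void removePipeline(String name) {
        Origin origin = pipelines.get(name);
        if (origin == null) throw new IllegalArgumentException("no such pipeline: " + name);
        if (origin == Origin.REMOTE) {
            throw new UnsupportedOperationException("cannot remove remote pipeline: " + name);
        }
        pipelines.remove(name);
    }

    // What the writer would serialize: only the locally defined elements.
    List<String> localOnly() {
        List<String> local = new ArrayList<>();
        pipelines.forEach((name, origin) -> {
            if (origin == Origin.LOCAL) local.add(name);
        });
        return local;
    }
}
```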
These are the merged or planned pull requests that make all of the above work:
Original post from May 2015
Being a big fan of keeping all project-related code in its source code repository, I would really like to be able to declare pipeline configuration in the source code of each individual project instead of in the global cruise-config.xml.
Currently, when all Go configuration is in the global configuration file on the server, we basically end up with 2 sources of project configuration: one is the git repository, the other a file on the Go server. There are lots of scenarios where new changes in the git repo cause the build to break because they expected a different pipeline configuration, or rather, the pipeline configuration expected the older git repo contents.
In order to avoid such conflicts probably the
There's another conversation on #838 which talks about this.
Source code is a type of material for a pipeline
Handling pipeline groups and role-based auth.
Making pipelines available while configuring environments (http://www.go.cd/documentation/user/current/configuration/managing_environments.html)
Related to code.
For learning how the config xml is loaded/parsed/validated
For operations on the pipeline
Please feel free to join our Gitter channel for more help and discussion.
There are a few points in #838 that form a great specification. Just to clarify:
There is this long comment #838 (comment) which I think gives great insight into this.
My idea of work towards new config implementation
The implementation could be auto-detected based on the contents of the go-server configuration directory, e.g. when there is a cruise-config.xml, load the old one.
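The auto-detection idea can be sketched with `java.nio.file`. Apart from the `cruise-config.xml` file name, everything here is an assumption: `ConfigProvider`, `XmlConfigProvider`, `NewConfigProvider` and the fallback rule are hypothetical names used only to show the selection logic.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction over where the configuration comes from.
interface ConfigProvider {
    String name();
}

record XmlConfigProvider(Path file) implements ConfigProvider {
    public String name() { return "xml"; }
}

record NewConfigProvider(Path directory) implements ConfigProvider {
    public String name() { return "new"; }
}

final class ProviderDetector {
    // If the old cruise-config.xml is present, keep loading it (backward compatibility);
    // otherwise use the (hypothetical) new implementation for the whole directory.
    static ConfigProvider detect(Path configDir) {
        Path xml = configDir.resolve("cruise-config.xml");
        if (Files.isRegularFile(xml)) {
            return new XmlConfigProvider(xml);
        }
        return new NewConfigProvider(configDir);
    }
}
```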
Pipeline configuration as material?
Disclaimer: this is just an idea
What do you think?
Finally, I would like to ask about the status of work on any of the above. The performance discussion seems to have concluded that the config has to be split into parts, but I don't see many such changes. I can see the move of the Go agents config, but IMHO this is just a minor solution to a larger problem.
PS: @matt-richardson I think it would be easy to sync back from the UI to the source. It could work like editing code on GitLab or GitHub, where you just have to add a commit message at the end.
If we do go down the route of having pipeline configuration as a material, it would be great if the first thing the pipeline did was verify the integrity of the pipeline: named dependencies etc. Then you still fail fast and highlight any problems to the person who made the change.
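A sketch of such a first integrity step: check that every named upstream dependency actually exists before anything else runs, and report the problems. The model is hypothetical (a pipeline is a name plus the names of the pipelines it depends on); a real check would also cover stages and other references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DependencyCheck {
    // pipelines: pipeline name -> names of the upstream pipelines it depends on.
    // Returns one message per dependency that points at a pipeline that does not exist.
    static List<String> missingDependencies(Map<String, List<String>> pipelines) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pipelines.entrySet()) {
            for (String upstream : e.getValue()) {
                if (!pipelines.containsKey(upstream)) {
                    problems.add(e.getKey() + " depends on unknown pipeline " + upstream);
                }
            }
        }
        return problems;
    }
}
```

A non-empty result would fail the run immediately, which keeps the feedback with whoever committed the broken configuration.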
@tomzo: I think it's a big discussion, part of which has happened in #838, as you mention. Most of what you say about the abstract configuration provider makes sense. In my view, within the code today, CruiseConfig and its whole tree is a representation of the configuration. It doesn't have any real link to the XML directly (once it is created). So, it could be a start towards that. The config-api was started so that everything related to the config (its "interface", if you will) was moved there. The config-server module has all the XML reading and writing code.
Another approach to thinking about this could be to think about a clean-slate and ideal implementation and see what it will take to get there, rather than trying to extract a good abstraction out of what exists today. That might be harder, in which case we can continue with the approach you brought up.
The fact that you mention that you're determined to get this done is the reason I think this can actually happen, and the part I admire the most, actually. :) So, if you can keep 100% backward compatibility, write well tested code and make different backend providers nearly pluggable, then I don't see why it cannot get merged.
My worry is that this is a bigger task than you or I anticipate and if you get bogged down, you might get discouraged. :) So, if you take small steps towards it, and involve the rest of us in what you're doing, we might be able to help. Of course, you might be the kind who doesn't get discouraged, and that's great! But, still if you involve others, it'll be easier to see progress and maybe merge smaller bits of code than the whole thing at once.
You asked about the status of work: For the performance bit, my feeling is that the config save does not need to move out of XML for it to be fast (though it can/should move out for flexibility reasons as you mention). I think that keeping the state of the configuration in memory and not deserializing the whole thing all the time and validating everything all the time will help. @ketan and I have talked about this, and started work towards it, but got distracted. We should get back to it. Elaborating on that idea: For instance, adding a task does not require a full config revalidation. It cannot affect another pipeline or be invalid, unless it is a fetch artifact task. So, handling the validation at that level, and so on. This should be quick. Needs some work to put in the framework to allow that, as well as some changes to the UI controllers.
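The idea of validating at the level of the change can be sketched as a dispatch on the kind of edit. This is an illustrative assumption, not GoCD code: `ChangeKind`, `ValidationScope` and the extra change kinds are hypothetical, and the only rule taken from the comment above is that adding a plain task affects only its own pipeline, while a fetch artifact task can refer to another pipeline.

```java
// Hypothetical kinds of config edits; only the first two come from the discussion above.
enum ChangeKind { ADD_TASK, ADD_FETCH_ARTIFACT_TASK, ADD_MATERIAL, RENAME_PIPELINE }

enum ValidationScope { PIPELINE_ONLY, FULL_CONFIG }

final class IncrementalValidation {
    static ValidationScope scopeFor(ChangeKind change) {
        return switch (change) {
            // A plain task cannot affect any other pipeline.
            case ADD_TASK -> ValidationScope.PIPELINE_ONLY;
            // Everything else conservatively falls back to full validation
            // (e.g. a fetch artifact task refers to another pipeline's artifacts).
            case ADD_FETCH_ARTIFACT_TASK, ADD_MATERIAL, RENAME_PIPELINE -> ValidationScope.FULL_CONFIG;
        };
    }
}
```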
The idea about config material repo(s) might need a little more clarity for me to comment on it, but from what I understand, I think it can be handled as one of the providers. The provider just happens to be reading from a repository of files, and deciding that some of them are pipelines. As long as the config interface is maintained, it should be fine. Of course, there might need to be some work done to poll that repository for changes, especially if you say that the
If the pipeline tag is not in the configuration, then where would the configuration repository be specified?
I’m still really keen that configuration integrity is enforced, so if configuration was read from a repo, then could it be validated as the first thing a build does? That would keep things failing fast…
I agree. I think it should, and it probably would be implemented that way.
If you look at the original post, @tomzo said:
That's what I was talking about. If
I don't know. It's just a guess. That's why I mentioned needing more clarity.
But, at a high level, this is about the "pluggability" of the whole config, essentially. That can be done if we have an abstract provider interface. If we have that, then making it read from a repository of pipeline-level declaration files, instead of one XML file, should be easy, I feel.
This is something that I was thinking about as well. There is a lot of configuration that can be completely validated at a lower level than the global one.
Config as material
This is what I had in mind. It seems simple to do, assuming we go in the abstract provider direction.
I think it is the only way to do it. The pipeline objects have to be built from that repo. I assume that we validate while building.
It would be something similar to what @arvindsv guessed. The way I see configuration in the far future would be something like
I do not have details yet but a few problems made me think that pipeline configuration should be a material. Consider this:
Then Go pulls that repository for 2 reasons: because it is a config and because it is a git material of a pipeline. It should be aware that the configuration repository and the pipeline material are the same entity in this case. Otherwise, it could potentially use different commits for the config and for the rest of the source code, which is the inconsistency that I wanted to avoid in the first place.
There is also the observation I mentioned earlier: Go already behaves as if configuration were a material, but it does not model it as one.
I was hoping to hear that. I see that it is major work of unknown effort. I think I will make the first move towards the abstract configuration provider and implement an XML config provider. Then I think it should be merged, to prove that the config provider is abstracted and still working.
I was waiting for @arvindsv to comment and started working on something else in the meantime, so I will get back to GoCD in a few days. When I start, I hope to get some help on how the old configuration works (and annoy you a little bit during the day). I hope that is OK with you.
Exactly. I think this is the most critical part for making any progress with configuration issues.
Right. If commit C1 changes code and commit C2 of the same repo changes pipeline config (which is just a file in the same repo), then there should be a build with both code and pipeline config at C1, and another build with both code and pipeline config at C2. But, there should never be a build where code is at C1 and pipeline config is at C2.
One of the reasons for mentioning
... then, a change to either repository will trigger a build. But, when modeled like this:
... then, a change to the repository, for either code or pipeline config, will always be consistent. This just reuses Go's usual material and does not explicitly bring in the concept of a config material. I felt that it would make it easier to implement and understand. It is also flexible enough to model both cases.
Yes. There is a concept of uniqueness of a material, in Go. Materials which are considered unique across different pipelines are not polled multiple times. That'll need to apply here as well.
You're right. It is modeled only implicitly, as a material which is checked for changes. Any change to it from the filesystem or from the UI does cause the system to change, and potentially trigger pipelines. But I think changing a task definition (for instance) doesn't cause a pipeline to trigger. Changing a material could. In that sense, it is not a full-fledged material.
Yesterday I started getting familiar with the current implementation of Go. I've set up the dev environment and I can debug the 'development server', which helps a lot, together with searching through classes in IDEA.
It seems to me that this is what actually defines how the most recent pipeline should be built. Correct?
What is this class responsible for? And what are these map objects for?
private Map<String, List<Pair<PipelineConfig, PipelineConfigs>>> packageToPipelineMap;
private Map<String, List<Pair<PipelineConfig, PipelineConfigs>>> pluggableSCMMaterialToPipelineMap;
This class does not seem to be used anywhere. Is there a reason, or is this just 'tech debt', an old file?
I'll answer the earlier questions today (in a few hours, sorry). About the uniqueness of materials: It's a hash of all the relevant properties of a material. The definition of "relevant" is dependent on the kind of material.
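A minimal sketch of such a uniqueness key, assuming a git-like material whose "relevant" properties are its URL and branch (chosen here only for illustration; each material type would pick its own) and hashing them with SHA-256 via the JDK's `MessageDigest`. Two pipelines declaring the same repository and branch get the same fingerprint and could share one poller. `MaterialFingerprint` and the canonical string format are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.SortedMap;
import java.util.TreeMap;

final class MaterialFingerprint {
    // Hashes the material type plus its relevant properties in a stable (sorted) order.
    static String of(String type, SortedMap<String, String> relevantProperties) {
        StringBuilder canonical = new StringBuilder(type);
        relevantProperties.forEach((k, v) -> canonical.append("<|>").append(k).append('=').append(v));
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    // Illustrative choice of relevant properties for a git material: URL and branch only.
    static String git(String url, String branch) {
        SortedMap<String, String> props = new TreeMap<>();
        props.put("url", url);
        props.put("branch", branch);
        return of("git", props);
    }
}
```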
Yes. It's part of the config, and it fully defines what the
Looks like it to me.
It is responsible for the
Those two maps can be ignored. They're a local cache for these two methods. They're creating a map from a material to the pipelines they're in, etc. That's an extremely expensive operation, and those maps help to not run them again and again.
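The caching pattern described here (build the expensive material-to-pipelines map once, then reuse it) can be sketched as below. The names are hypothetical and the config is reduced to a map from pipeline name to material keys. Because the index is derived from one config instance, a config change means building a new index; the sketch is also not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MaterialIndex {
    private final Map<String, List<String>> pipelineToMaterials; // pipeline -> material keys
    private Map<String, List<String>> materialToPipelines;      // built lazily, then reused
    int builds = 0;                                             // how many times the index was built

    MaterialIndex(Map<String, List<String>> pipelineToMaterials) {
        this.pipelineToMaterials = pipelineToMaterials;
    }

    List<String> pipelinesUsing(String materialKey) {
        if (materialToPipelines == null) {
            // The expensive inversion runs only on the first lookup.
            materialToPipelines = new HashMap<>();
            pipelineToMaterials.forEach((pipeline, materials) -> {
                for (String m : materials) {
                    materialToPipelines.computeIfAbsent(m, k -> new ArrayList<>()).add(pipeline);
                }
            });
            builds++;
        }
        return materialToPipelines.getOrDefault(materialKey, List.of());
    }
}
```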
It's used here. This is from the admin UI. If you try to delete a pipeline which is a dependency material for another pipeline, this code should get executed.
They're from Spring. They're used for dependency injection. Maybe these help:
For most service packages, etc., autowiring has been set up (meaning Spring has been told to scan those packages for annotations such as @Service and to automatically instantiate them and inject their dependencies).
The current git repository is just a store of historical changes to the config. While the real (current) config is usually in a location like /etc/go/cruise-config.xml, the one in /var/lib/go-server/db/config.git is just a copy, and is the latest valid config known.
I'd recommend not trying to use this repository at all.
I'll reply to the rest of the post in a separate reply.
What polling are you thinking about? Material polling (git, etc)? This might help, as a start. If you mean polling for config changes, Go just checks the config at /etc/go/cruise-config.xml every few seconds to see if it has changed from the latest known config.
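Checking the file "every few seconds to see if it has changed from the latest known config" amounts to comparing against a remembered fingerprint. A sketch, assuming the comparison is done on a SHA-256 of the file content (GoCD's actual check may differ) and leaving the scheduling, for example with a `ScheduledExecutorService`, out. `ConfigFileWatcher` is hypothetical; its first check always reports a change, since nothing is known yet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class ConfigFileWatcher {
    private final Path file;
    private String lastKnownHash; // null until the first check

    ConfigFileWatcher(Path file) {
        this.file = file;
    }

    // Returns true when the content differs from the last content this watcher saw.
    boolean checkForChange() throws IOException {
        String hash = sha256(Files.readAllBytes(file));
        boolean changed = !hash.equals(lastKnownHash);
        lastKnownHash = hash;
        return changed;
    }

    private static String sha256(byte[] content) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```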
Ok. You need to remember that not all material information is inside
Later, related to this, you said:
It is very new. It's for SCM plugins. It's similar to
Not exactly sure I get what you're saying about the pollers. Since the server module can access config-server, we could set it up so that the pollers in the server notify some service in the config-server when some "config repositories" have changes that need a reload of the config.
You're also right that the current code expects the configs to be present. We talked about this earlier, over chat, I guess. We might have to have a different poller, for config materials only. I can see it as something like this:
Yes, that's another option, pulling out the polling as a module below both the server and config-server, so that they can both access them. The polling module will need to be quite generic. Today, the pollers take the commits they find, and put them into the DB. For config-material, the pollers don't need to do that. They need to provide the commits to the config-server module, which can then take action (refresh its notion of config).
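The split described here, where the same pollers feed either the database or the config-server, can be sketched as a poller that reports new revisions to listeners. `RevisionListener`, `GenericPoller` and the `Supplier` standing in for the SCM query are hypothetical. In this model, a code material would register a listener that stores revisions, and a config material one that refreshes the config.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Receives revisions found by a poller; what happens next depends on the listener.
interface RevisionListener {
    void onNewRevision(String materialUrl, String revision);
}

final class GenericPoller {
    private final String materialUrl;
    private final Supplier<String> latestRevision; // stands in for querying the SCM
    private final List<RevisionListener> listeners = new ArrayList<>();
    private String lastSeen;

    GenericPoller(String materialUrl, Supplier<String> latestRevision) {
        this.materialUrl = materialUrl;
        this.latestRevision = latestRevision;
    }

    void addListener(RevisionListener listener) {
        listeners.add(listener);
    }

    // One polling cycle: notify listeners only when the revision has moved.
    void poll() {
        String revision = latestRevision.get();
        if (!revision.equals(lastSeen)) {
            lastSeen = revision;
            for (RevisionListener l : listeners) {
                l.onNewRevision(materialUrl, revision);
            }
        }
    }
}
```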
Final post on this coming up.
Currently there's only one valid latest config. It's useful in one way. If you have a build, and it fails, you can change the config and re-run the stage or some jobs and it will use the new config. Not the old failed config. When a pipeline run is tied to its config through a repo, then you lose this ability. A rerun of a stage or job will re-use the config for that time.
I have mixed feelings about this. Though I like the ability to re-run a job, I think re-using the old config is the correct thing to do. However, the more you put into the config from the repository, the more chance there is for something to go wrong.
Going back to C1 and C2: as I said, C1 and C2 were commits in the same repository (say R1). So, if the config is at C2, then the code is also (should also be) at C2, since the repository is the same. Doesn't that address the problem? If we're flexible, we could have code commit C1 come from repo R1 and config commit C2 come from repo R2, which lets the user mix and match (and possibly run inconsistent config with inconsistent code), right? There's flexibility in that approach, but it allows inconsistencies. Either way, tying code to config in the same repository should solve the problem we were talking about (scm-consistency).
Maybe our thinking needs to be broader? Starting with something like:
Aspiration: I want to be able to have all my pipeline configuration information in an (one only, for now) external repository. The config that Go knows about should be like this:
We could then give that config information (url, etc.) to the plugin and wash our hands of it. It is then the plugin's responsibility, whenever asked by Go, to give back a list of pipelines. We'd need to figure out environments and other concepts, if we're returning something equivalent to a list of
The plugin can then poll that repository and have some kind of a convention. Save every file with the extension
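A sketch of such a convention on the plugin side: walk the checkout and treat every file with a given extension as one pipeline declaration. The extension is elided above, so `.pipeline` here is purely hypothetical, as is using the file name as the pipeline name and the `PipelineFileScanner` class itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

final class PipelineFileScanner {
    static final String EXTENSION = ".pipeline"; // hypothetical convention

    // Returns the pipeline names declared anywhere in a repository checkout, sorted for stable output.
    static List<String> pipelinesIn(Path checkout) {
        try (Stream<Path> files = Files.walk(checkout)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(n -> n.endsWith(EXTENSION))
                    .map(n -> n.substring(0, n.length() - EXTENSION.length()))
                    .sorted()
                    .toList();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```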
To take it further, we can even allow both
I'd recommend not having a Frankenstein config like this, but it keeps the old config valid while allowing the big config pieces to be moved out and managed elsewhere. This is just an opinion/idea. I'd like others like @jyotisingh, @mdaliejaz, @zabil, @ketan, etc. to weigh in.
Thank you very much for all these answers.
Thanks, I skipped that by mistake. It is enough.
I'll learn spring then.
I thought so, but had to be sure because it could be a hint.
I do not agree with that. That would imply that there is only one valid, most-fresh config.
I have a few points more. I will post soon.
I have now considered many of the approaches referenced above and in #838.
There are a few points which I am quite certain about:
Not necessarily only. There is (and I think will be) the concept of a valid, most-fresh config (maybe not at the global level, but at least at the pipeline level). You need this to schedule a new pipeline, when a code commit happens. Of course, that could be a config commit itself, in which case the most-fresh config is the one for that commit. Will have to check validity.
However, you need older commits of the config, only for reruns of an old pipeline, right? So, I don't see why they need to be in the Go DB. Especially if all of this is happening in a plugin. I'd just get the config for that point in time using the repository itself, on demand.
That's what I think. Let me know what I'm not considering.
[Update: Of course, if we want to store it in the DB for some reason for a rerun, it's doable]
That will do it. I was going for ability to rerun.
Yes. But I was referring to 'refresh its notion of config'
I just wanted to note that we should avoid a situation where a polled configuration part updates some configuration instance, especially CruiseConfig.
I would use the static config and dynamic config separation I mentioned above. I think server module should be aware of that separation. config-server would only provide urls to configuration repos.
There is also this problem:
But I think it is already answered above to some extent.
@arvindsv in the broader approach #1133 (comment) you mention
I am not in favor of adding a huge feature at once. I just think the work towards
I would also love to hear opinions of others.
Sure. That's fine. As long as it is not too complicated. I'd leave it at only pipelines and environments for now. Not everything else. There needs to be something that merges information in the config, with information from the (multiple?)
This might become hard to do, given that the rest of the system (for instance the scheduler, material subsystem, the dashboard, the whole admin UI bit) expect to call something like
[I'm away for a bit. Will be back and think about this some more]
I noticed that already. I am currently evaluating how much it would take to have a GoConfigService that understands the concept of historical configuration, so that there wouldn't be methods like
public boolean isPipelineEmpty()
But rather something like
public boolean isPipelineEmpty(unambiguous definition of configuration at some point in time)
The good news is that both these methods could co-exist. So maybe this can be implemented in some low components and gradually introduced up.
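The co-existence could look like an overload in which the old-style method delegates to the new one using "now", as suggested further down in this thread. Everything here is hypothetical: `ConfigPoint` models an unambiguous point in time as the revision of every configuration source, the service takes a pipeline name to stay self-contained, and "empty" is taken to mean "has no stages".

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical unambiguous definition of configuration at a point in time:
// the revision of every configuration source (main config, config repositories).
record ConfigPoint(Map<String, String> revisionsBySource) {}

final class ConfigServiceSketch {
    private final Function<ConfigPoint, Map<String, List<String>>> loadPipelines; // pipeline -> stages
    private final Supplier<ConfigPoint> now;

    ConfigServiceSketch(Function<ConfigPoint, Map<String, List<String>>> loadPipelines,
                        Supplier<ConfigPoint> now) {
        this.loadPipelines = loadPipelines;
        this.now = now;
    }

    // Old-style method: implicitly asks about "now".
    boolean isPipelineEmpty(String pipeline) {
        return isPipelineEmpty(pipeline, now.get());
    }

    // New-style method: the point in time is explicit.
    boolean isPipelineEmpty(String pipeline, ConfigPoint at) {
        return loadPipelines.apply(at).getOrDefault(pipeline, List.of()).isEmpty();
    }
}
```

Callers that do not care about history keep using the old signature, while lower components can be migrated to the explicit one gradually.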
But these are killers at the moment:
public CruiseConfig getCurrentConfig();
And CruiseConfig has
public List<PipelineConfig> allPipelines()
I wonder if the "some point in time" part of "unambiguous definition of configuration at some point in time" can be "now". :) It is unambiguous at that time. It can change over time, and that's ok.
As you say, getCurrentConfig is a killer. I think that's because it's fundamental to how config is, in the system (presently). It is assumed to exist. I hesitate to try and change it, because I feel it is too ingrained to change. With the idea about "now" above, I'm trying to see if there's a way to reconcile the two, bringing in the concept of a default of "now", unless specified otherwise. Just a thought.
If some global configuration consists of only 3 parts, all of them SCMs, then "unambiguous definition of configuration at some point in time" could be something like
I guess "now" would be the latest commit in each of those
Yes, that's right. The equivalent of HEAD. We can then put all of those together and then see what to do about possible invalidity of the config.
When I think about a config such as:
then, the "now" (and the "current/latest config"), according to me is a combination of the current config of "P1" and "P2" from this config and the configs of "P3" and "P4" as described by the HEAD commit of repo "abc".
This is what will be used to show the dashboard (assume no pipelines are running), and for polling of materials (code materials, not config materials).
Reading what you said about "-latest" in the previous comment, I think we're thinking the same thing.
If I've understood you correctly, the config for your first-svn-latest example looks something like this:
There are 2 plugins available:
We should open smaller issues with enhancements and bugs as they come along.