node lists support implemented #9

Closed
wants to merge 2 commits into from

@octo47 octo47 commented Nov 25, 2013

The idea of this patch is to create a special file that denotes a list of nodes; by design,
reclass assumes that a .yml file with the same name exists. That gives us several
identical nodes, whose names are listed in the .hosts file.

Example (also available in the examples directory):

We can create the files group1.hosts and group1.yml, which has the same effect as several identical .yml
files with the same content as group1.yml.

As for performance, it seems to work reasonably fast, even for 4k hosts.
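
To make the proposal concrete, here is a minimal Python sketch of the expansion described above. It is not the code from this patch: the function name `expand_host_lists` and the one-hostname-per-line format of the .hosts file are assumptions made for illustration.

```python
import os


def expand_host_lists(nodes_dir):
    """Map every host named in <group>.hosts to the shared <group>.yml file.

    Hypothetical helper: assumes one hostname per line and that the .yml
    file sits next to the .hosts file, as the description above suggests.
    """
    mapping = {}
    for name in sorted(os.listdir(nodes_dir)):
        base, ext = os.path.splitext(name)
        if ext != ".hosts":
            continue
        definition = os.path.join(nodes_dir, base + ".yml")
        if not os.path.isfile(definition):
            # The proposal assumes the matching .yml file always exists.
            raise FileNotFoundError(definition)
        with open(os.path.join(nodes_dir, name)) as fh:
            for line in fh:
                host = line.strip()
                if host:  # skip blank lines
                    mapping[host] = definition
    return mapping
```

Every host listed in group1.hosts would then resolve to group1.yml, exactly as if an identical per-node file existed for it.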

The idea of this patch is to create a special file that denotes a group of nodes; by design,
reclass assumes that a .yml file with the same name exists. That gives us several
identical nodes, whose names are listed in the .grp file.
@octo47
Author

octo47 commented Nov 26, 2013

But maybe 'groups' is not the correct name for that; 'aliases' looks like a more natural name.
Right now, the current implementation doesn't allow a node to be included in two or more groups.

@octo47
Author

octo47 commented Nov 26, 2013

Additionally, this patch fixes a bug with inner directories. If we create a node definition in
an inner directory, we get an error like this:

├── classes
│   ├── basenode.yml
│   ├── mysite.yml
│   └── unixnode.yml
├── nodes
│   ├── localhost.yml
│   └── m1
│       └── ttt.yml

$ ./reclass.py -b examples -n ttt
No such file: /Users/octo/Projects/github/reclass/examples/nodes/ttt.yml
$ ./reclass.py -b examples -i
No such file: /Users/octo/Projects/github/reclass/examples/nodes/ttt.yml
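
One way to avoid this error is to search the nodes directory recursively instead of only its top level. The following is a minimal sketch under that assumption; `find_node_file` is a hypothetical helper, not a function from reclass or from this patch.

```python
import os


def find_node_file(nodes_dir, nodename, suffix=".yml"):
    """Return the path of <nodename>.yml anywhere below nodes_dir, or None.

    Hypothetical helper: walks all subdirectories and refuses to guess
    when the same node name is defined more than once.
    """
    wanted = nodename + suffix
    matches = []
    for dirpath, _dirnames, filenames in os.walk(nodes_dir):
        if wanted in filenames:
            matches.append(os.path.join(dirpath, wanted))
    if len(matches) > 1:
        raise ValueError("duplicate definitions for %s: %s" % (nodename, matches))
    return matches[0] if matches else None
```

With the layout above, looking up `ttt` would return the path ending in `nodes/m1/ttt.yml`.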

@madduck
Owner

madduck commented Nov 26, 2013

Could you help me understand the problem you are trying to solve?

@octo47
Author

octo47 commented Nov 26, 2013

I need to make a bunch of identical nodes, and I don't want to create over 1000 identical files.
As a side effect, I've fixed the problem mentioned in my comment: I can't place nodes in subdirectories, because the code breaks on such definitions.

@octo47
Author

octo47 commented Nov 26, 2013

Another use case: I have an external source of hosts (say, some registry) and I want to dump a list of hosts (for example my_cluster_hadoop_workers) into a file and have them automatically assigned a node definition.

@madduck
Owner

madduck commented Nov 26, 2013

Would a mapping of nodenames to classes help? Like this: http://reclass.pantsfullofunix.net/todo.html#wildcards-regexpclass-mapping

@octo47
Author

octo47 commented Nov 26, 2013

Actually not:

  1. Explicit host lists are safer than matching, because a misconfigured match can let hosts slip through or include unwanted hosts.
  2. With an external registry, it is not (easily) possible to construct a mask.

Can you tell me why the proposed solution does not fit reclass? It doesn't break any compatibility, but gives huge flexibility. Even host generators can be implemented in that case (we can easily implement support for generators like some.host[1-30].domain in hosts files).

@madduck
Owner

madduck commented Nov 26, 2013

I am not saying it's not acceptable, I just wanted to understand your use-case before I look at the code!

Just to confirm that I understand correctly what you want: if matching were implemented, this would also have the same effect, right:

/^node1\.example\.org$/ → group1
/^node2\.example\.org$/ → group1
/^node3\.example\.org$/ → group1
…
/^node99\.example\.org$/ → group1

That is, matching each node name explicitly and assigning a group.
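
As an illustration of the matching approach, here is a small Python sketch of an ordered regexp-to-class mapping. The data structure and the name `classes_for` are assumptions for this example and do not reflect the actual syntax used in reclass.

```python
import re

# Hypothetical, ordered list of (pattern, class) pairs.
CLASS_MAPPINGS = [
    (re.compile(r"^node\d+\.example\.org$"), "group1"),
    (re.compile(r"^www\d+\.example\.org$"), "webservers"),
]


def classes_for(nodename, mappings=CLASS_MAPPINGS):
    """Return the classes of all matching patterns, in mapping order."""
    return [cls for pattern, cls in mappings if pattern.search(nodename)]
```

Note that such a mapping can answer "which classes does this name get?", but it cannot by itself list which node names exist.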

Could you file an issue about the error you found? I would like to fix this separately.

@octo47
Author

octo47 commented Nov 26, 2013

The solution with matchers does not solve this completely:

  1. Inventory will not work (-i argument), because this solution can't enumerate all hosts.
  2. My solution separates host lists from node definitions, which makes it easier to use external tools to generate host lists.

Btw, where should these matches be defined? In class or node .yml files?

@madduck
Owner

madduck commented Nov 28, 2013

How would you feel about adding a new top-level key to node YAML files, e.g.:

# www1.example.org.yml
classes:
  - virtual
  - debiannode
parameters:
  …
aliases:
  - www2.example.org
  - www3.example.org
  - www{003..100}.example.org

Furthermore, the parameters ${reclass:nodename} and ${reclass:nodename_short} could be defined to be usable in parameters (www2.example.org and www2 respectively).
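
A rough Python sketch of how such aliases might be expanded is shown below. The helper names `expand_alias` and `reclass_parameters` are hypothetical, and the padding rule (pad to the width of the widest bound when a bound is written with a leading zero) is an assumption about how `{003..100}` should behave.

```python
import re

_BRACE_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_alias(alias):
    """Expand a single {first..last} range in alias into a list of names."""
    match = _BRACE_RANGE.search(alias)
    if match is None:
        return [alias]
    first, last = match.group(1), match.group(2)
    # Assumption: a leading zero on either bound requests zero-padding.
    padded = (len(first) > 1 and first[0] == "0") or (len(last) > 1 and last[0] == "0")
    width = max(len(first), len(last)) if padded else 0
    head, tail = alias[:match.start()], alias[match.end():]
    return [
        "%s%0*d%s" % (head, width, number, tail)
        for number in range(int(first), int(last) + 1)
    ]


def reclass_parameters(nodename):
    """Values that ${reclass:nodename} and ${reclass:nodename_short} might take."""
    return {"nodename": nodename, "nodename_short": nodename.split(".", 1)[0]}
```

Each expanded alias would then be rendered with its own `nodename` and `nodename_short`, so that one definition can still produce per-node values.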

@octo47
Author

octo47 commented Nov 28, 2013

I like it. Really clever idea. We could even add a notation for file inclusion (like aliases: @somefile.hosts).
Will you make a patch?

@madduck
Owner

madduck commented Nov 28, 2013

Yes, I will work on it. I am still not 100% sold on it though. Let me merge/implement subdirectories first.

@octo47
Author

octo47 commented Nov 28, 2013

Ok, thank you.

@madduck
Owner

madduck commented Nov 30, 2013

I've given this issue some thought today while out on a walk.

What we are trying to achieve is to treat several nodes equally, i.e.
www0.example.org through www999.example.org should all be webservers,
without having to maintain 1000 equal files.

The class_mappings branch already provides a way to assign the webservers
class to all hosts that match e.g. the glob www*.example.org.

However, this is not enough, as runs across the entire inventory need to be
able to enumerate all hosts.

So somewhere, we need to keep a list of nodes.

There are four proposed ideas:

  1. Keeping a file webservers.group with one line for each node, which
    would add the webservers class to each node listed therein;
  2. Like (1.), but without the implicit class. Instead, there would be
    a webservers.yml file, which is like a node definition, which gets used
    for all nodes listed in webservers.group;
  3. Adding to file www0.example.org a key such as alternative_names,
    which lists the nodes to which this node definition also applies;
  4. Using class mappings in combination with empty files, possibly in
    subdirectories.

If we can agree that a directory of empty files is the same as a file listing
one-name-per-line, then I think all of these four ideas achieve the goal, with
minor differences between them.
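
The equivalence assumed here can be sketched in Python as two functions that yield the same node list from either representation. Both function names and the `.yml` suffix default are assumptions for illustration.

```python
import os


def nodes_from_group_file(path):
    """Read node names from a one-name-per-line file, ignoring blank lines."""
    with open(path) as fh:
        return sorted(line.strip() for line in fh if line.strip())


def nodes_from_directory(path, suffix=".yml"):
    """Derive node names from the (possibly empty) files in a directory."""
    return sorted(
        name[:len(name) - len(suffix)]
        for name in os.listdir(path)
        if name.endswith(suffix)
    )
```

If both functions return the same list for webservers.group and webservers/, the two layouts carry the same information and differ only in how they are edited.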

(3.) is identical in functionality to (2.), except it combines two files into
one. Here, one node gets "promoted" as representative of all the other nodes.
I fear this could get a little intransparent, so I don't like (3.) at all.

(1.) and (4.) are identical, if the node files are created with a common
prefix/suffix (to allow for glob/regexp matching). Once class mappings can
handle subdirectories, the file webservers.group (one node per line) and the
directory webservers/ would have exactly the same function, except updates
to (4.) would be truly atomic.

(2.) and (1.)/(4.) are really not any different, because whether
webservers.yml is a node definition (in case of 2.), or an extra class (for
1./4.), doesn't really matter. It feels more like a class, which is an
argument against (2.).

So there's really only (1.) and (4.). Let's compare those a bit more.

Updates to (4.) are atomic, we had that. (4.) is mostly implemented. What's
missing is that class mappings right now only apply to the node name, not any
subdirectory under which the node might be defined. (4.) would allow for
a pretty straight-forward view of the inventory with just /bin/ls and
friends.

(1.), OTOH, would allow for lines such as www{0..999}.example.org to
enumerate all 1,000 hosts.

However, this is common shell parlance, and touch webservers/www{0..999}.example.org may well be more expressive, which would
put (4.) back into the race.

I am therefore tending towards (4.), and I would like to invite you to take
a look at the class_mappings branch. While subdirectory-matching is not
implemented, use a glob such as www*.example.org to get the same effect.

What do you think?

@octo47
Author

octo47 commented Dec 2, 2013

Schemes (1), (3) and (4) have one significant drawback: it is not possible to guarantee the order of class application, and scheme (4) doesn't guarantee application at all.
For example (for simplicity, let it be scheme 1):
Suppose we have two classes, one of which has applications: abc, and the other has applications: -abc. The result depends on class inheritance and the order of class application.

Your assumption that we have www0.example.org through www999.example.org without holes is not really true (especially with nodes disabled for maintenance, or when applying some experimental classes).
I think that (3) or (2) are the leaders ((4) can coexist, but it must be clear to the user that reclass doesn't guarantee class application for nodes).
(3) looks better, because it gives us not only host lists atomicity, but also whole class/properties assignment to whole group. And (3) also makes it possible to implement type (2) via some 'file reference', like a special '@' prefix or a custom YAML tag, for example:

alternative_names: 
  - @file.group
  - @file2.group
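
A sketch of how such file references might be resolved after the YAML has been parsed is given below; `resolve_names` is a hypothetical helper. One caveat: `@` is a reserved indicator in YAML and cannot start a plain scalar, so in a real file the entries would need to be quoted (for example `"@file.group"`) or expressed through a custom tag.

```python
import os


def resolve_names(entries, base_dir):
    """Expand '@file' entries into the host names listed in that file.

    Hypothetical helper: entries without the '@' prefix are kept as
    literal node names; referenced files hold one name per line.
    """
    names = []
    for entry in entries:
        if entry.startswith("@"):
            with open(os.path.join(base_dir, entry[1:])) as fh:
                names.extend(line.strip() for line in fh if line.strip())
        else:
            names.append(entry)
    return names
```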

And we don't need to restrict the user to some ephemeral node name (actually a group name); we could even allow reclass to return information about the group as a whole (by requesting the group by its name).

As for (4), it depends on node names; in my case it is nearly impossible to control node names.
In my case I have a pool of machines (say host[0-100].some.domain) and those hosts can be tossed between different roles or even clusters. So any solution that lets me create a file with explicitly listed hosts is much better and safer for me than any matching scheme (btw, that is why I use reclass instead of Salt's default scheme; it is very tedious to maintain matches for hosts in my environment).

@madduck
Owner

madduck commented Dec 2, 2013

I don't understand your point about order of class application. It would be well-defined: if a node foo.example.org is defined in webservers.group or if a class mapping matches, the class will be applied first. Class mappings are also ordered, and their application is guaranteed. I think maybe you are working off wrong assumptions. Have you tried class mappings?

If you have holes in a sequence, i.e. because nodes are disabled, then scheme (4.) makes this easy: instead of empty file www42.example.org, you write into it

classes:
  - disabled

and it will receive the disabled class after the webservers class.

I also don't understand your "host file atomicity" comment. Writing/changing files on Unix is not atomic. Adding and removing files to/from a directory is atomic. Can you please try to help me understand what you are saying?
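
For reference, the usual way to make a file update appear atomic on a POSIX filesystem is to write a temporary file in the same directory and then rename it over the target. The sketch below illustrates that general pattern; `write_atomically` is a hypothetical helper and is not part of reclass or of this patch.

```python
import os
import tempfile


def write_atomically(path, lines):
    """Replace path with the given lines via a temp file and a rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write("".join(line + "\n" for line in lines))
            fh.flush()
            os.fsync(fh.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The rename is itself a directory operation, which is why it falls on the atomic side of the distinction drawn above.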

"whole class/properties assignment to whole group." is also possible with scheme (4.), you just have to go via a class, which makes a lot of sense.

If you have "a pool of machines (say host[0-100].some.domain) and those hosts can be tossed between different roles or even clusters", then scheme (4.) would work just fine: if a host changes role, you just mv it to a different directory, which is truly atomic.

I am sorry, I really do not understand your problems with (4.). Could you maybe give some examples, after you look at class mappings?

@octo47
Author

octo47 commented Dec 2, 2013

"I also don't understand your "host file atomicity" comment. Writing/changing files on Unix is not atomic. Adding and removing files to/from a directory is atomic. Can you please try to help me understand what you are saying?"
Files can be moved around as one consistent unit. It is much easier to exchange two files than to move many files around.

"I don't understand your point about order of class application." OK, I've looked at your branch; it looks like I need to manage the reclass config and carefully edit the mappings. And this solution still uses matching, so in the degenerate case I need to put all of my hosts in one config file. That is a complicated solution, instead of clean and easy management of simple files or a list of aliases in the node definition.

""whole class/properties assignment to whole group." is also possible with scheme (4.), you just have to go via a class, which makes a lot of sense."
Scheme 4 doesn't make it easy to assign to a group: you need to prepare some matches, which in general can't be automated. (4) also can't be used from parallel processes. And in general it is a bad idea to put data in a config file (a node definition is 'data', not 'configuration' for reclass).

"If you have "a pool of machines (say host[0-100].some.domain) and those hosts can be tossed between different roles or even clusters", then scheme (4.) would work just fine: if a host changes role, you just mv it to a different directory, which is truly atomic."
In the case of files I don't need to do anything, just rewrite all files (but OK, that will not be an atomic operation; atomicity is really hard to achieve here, only by copying the 'inventory' to a new directory as a whole, and files with hosts play well here, because we need to copy far fewer files).

Examples:

  1. I have an external source of groups (let it be some LDAP database).
  2. I have a cron job that reads those groups and synchronizes class and group definitions with the reclass inventory.
  3. Say I have classes and parameters stored in an external source (actually, classes and parameters are simply dumped to the node definition, and the list of hosts in the group is dumped to the .hosts file).
  4. I need to have one source of classes in git for several Salt installations.
  5. And yes, I need to use reclass from Salt.

With my patch it works like a charm: I dump group members to the group.hosts file, dump the role definition to group.yml, and if I need atomicity, I can simply copy the whole inventory and switch it with two moves (this could be supported by reclass if needed, say by a special .lock file). Classes are merged from different Salt states git repos (where the actual states are defined, and default classes too). Nodes can override some settings.

As a result I have an installation with no parts that need to be managed by hand. That gives me the ability to construct a multi-master scheme, where all hosts only read configuration from elsewhere and don't store anything that needs to be replicated to other masters.

@madduck
Owner

madduck commented Dec 2, 2013

It sounds to me more like you need a different storage backend, yaml_fs is really not made for automated use. Wouldn't an LDAP backend work for you?

Other question: if nodes in a subdirectory would always get assigned the class named like the subdirectory (without mappings), this would work for you, right? I.e. a list of files in group1/* is the same as a file group1.group, right? If we do not agree on that, then I think we have a different issue to solve first.

@madduck
Owner

madduck commented Dec 2, 2013

Btw, I agree with you that the class mappings should not really be put into the configuration file. That's not final, but it was convenient to start with.

@octo47
Author

octo47 commented Dec 2, 2013

"It sounds to me more like you need a different storage backend, yaml_fs is really not made for automated use. Wouldn't an LDAP backend work for you?"
LDAP introduces a runtime dependency (and it is actually our custom source).
Why is yaml_fs not suited? It is already well suited with my patch:

  1. .hosts files suit such a use case (an external group of hosts) well.
  2. Subdirectories work.
  3. Inventory works; moreover, it is possible to implement name generators for .hosts files (i.e. the line w[0-99] would be expanded automatically).

I don't understand why the .hosts solution doesn't fit. It can be implemented as I did in this pull request, or as you suggested via an attribute in the node definition, but it is much simpler than any other solution:

  1. Class mappings introduce undefined behavior (unmatched hosts, one config, etc.).
  2. Directory names with meaning should be documented explicitly (i.e. I can't create any number of groups with the same node type).

I really, really don't understand why you decline such a simple solution as host lists and subdirectory scanning, as done by this patch.

@octo47
Author

octo47 commented Dec 2, 2013

"Btw, I agree with you that the class mappings should not really be put into the configuration file. That's not final, but it was convenient to start with."
When you implement support for different places, you will stumble on the same question: how to order the applied classes.

@madduck
Owner

madduck commented Dec 2, 2013

The classes applied with mappings are in well-defined order. If you think otherwise, I suggest you provide a working case proving me wrong.

One reason why I am against .hosts files is that "yaml_fs" is a filesystem-based storage based on YAML files. The .hosts file you describe isn't a YAML file.

And the reason why I am not so keen on node_aliases is because it's intransparent to the person administering yaml_fs as a filesystem.

I understand your concerns about class mappings, but I think they are the way to go. We just have to come up with a proper way to define them.

Btw, node_aliases are really just like class mappings, except they map node definitions (which are identical to classes, really), and they make enumeration a bit more explicit (but at the expense of not being able to use regexps or backreferences).

@octo47
Author

octo47 commented Dec 2, 2013

If you have several sources of class mappings, you should define how they are applied to nodes. Now you have one source, the config. But what if there are two sources?

Why do you need such purity and so many details at the upper level? With another storage implementation, there can be many optimizations at the storage level in how groups are stored. For example, if we have, say, SQL storage, we can use one SQL query to get all hosts for a node, but if you don't allow implementation details, reclass will ask for all nodes one by one.
I already gave the example with a remote filesystem: it is not meaningful to read dozens of files just to be a pure YAML filesystem.
On the other hand, node_aliases gives a clear clue where to find a host: in nodes/. With class mappings you will end up looking through all classes, the config and the nodes.

And what do you mean by 'administering like yaml filesystem'? The abstraction of one node, one file breaks with class mappings and makes administering much, much harder than with hosts files or node mappings.

But it is OK, you have your vision of how your system should look. Still, making a wide interface to the filesystem is a really bad idea, because it limits the implementation (just like in this pull request: your current patches are not compatible at all, because they pull details and caching to the upper level, and the storage has no chance to hint to the upper level that node1 and node2 are actually the same node and you don't need to ask the storage about them once more).

@madduck
Owner

madduck commented Dec 2, 2013

Either there will be only one source of class mappings, or the sources will themselves be ordered. Reclass will always have well-defined, reproducible output.

An SQL storage backend can run a single query if that's how it's implemented, just like the new yaml_fs enumerates the inventory only once and stores the listing in a cache.

At the moment, there is only a get_node function, but that could trivially be changed to allow things like single-SQL-queries.
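
To illustrate what such an interface change could look like, here is a sketch of a storage class that offers both a per-node lookup and a bulk lookup backed by a single query. Everything in it is an assumption: the class name, the `get_all_nodes` method and the table schema are invented for this example, and only `get_node` is a name taken from the discussion.

```python
import sqlite3


class SqlNodeStorage:
    """Hypothetical backend; assumes a table nodes(name, classes)."""

    def __init__(self, connection):
        self._db = connection

    def get_node(self, name):
        """Fetch one node: one query per call."""
        row = self._db.execute(
            "SELECT classes FROM nodes WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return row[0].split(",")

    def get_all_nodes(self):
        """Fetch the whole inventory with a single query."""
        rows = self._db.execute("SELECT name, classes FROM nodes ORDER BY name")
        return {name: classes.split(",") for name, classes in rows}
```

A caller building the full inventory would use `get_all_nodes()` once rather than calling `get_node()` for every host.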

yaml_fs is going to stay like it is.

But if another storage backend comes around and the interface between the "upper level" and the storage backend isn't suitable, then it will evolve.

@madduck madduck closed this Dec 2, 2013
@madduck
Owner

madduck commented Dec 2, 2013

I would accept a patch that

  1. enabled support in reclass for multiple storage backends;
  2. added a storage backend that just returned a list of nodes from .hosts files.

The idea is to parse .hosts files and then to merge in the context of later storage backends ("yaml_fs").

2 participants