v2-compatible 'roles' or similar #1594

Open
bitprophet opened this Issue Apr 22, 2017 · 3 comments

bitprophet commented Apr 22, 2017

Synopsis

At time of writing, the v2 branch has a Group class that should be capable of serving as the units formerly known as 'roles', aka "a bunch of hosts to do stuff with/on".

However, there's no specific way of organizing or labeling Group objects yet; it's "done" enough for the pure API use case of advanced users who want to roll their own specific way of creating them, but lacks anything for CLI-oriented users or intermediate folks who want something frameworky to build around.

Put another way, unless you're rolling purely with the API, having Group objects lying around somewhere is useless if the CLI or task-calling bits have no way of finding them!

Background

In v1, roles were effectively a single flat namespace mapping simple string labels to what would be Groups in v2; they could be selected on the CLI at runtime (fab --roles=web,db) and/or registered as default targets for tasks (e.g. a @roles('db') decorator above def migrate():), much like hosts.

Users defined them in env.roledefs, a simple dict; any intermediate to advanced functionality revolved around modifying it, usually at runtime (via pre-task or subroutine), sometimes at module load time.
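The v1 behavior described above can be sketched in plain Python (no Fabric import needed for illustration; roledefs here is a stand-in for env.roledefs, and hosts_for is a hypothetical helper approximating what fab --roles=web,db did at runtime):

```python
# Stand-in for v1's env.roledefs: a plain dict of role name -> host strings.
roledefs = {
    "web": ["web1.example.com", "web2.example.com"],
    "db": ["db1.example.com"],
}

def hosts_for(roles):
    """Flatten a list of role names into the union of their hosts,
    preserving order and dropping duplicates."""
    seen, result = set(), []
    for role in roles:
        for host in roledefs[role]:
            if host not in seen:
                seen.add(host)
                result.append(host)
    return result

print(hosts_for(["web", "db"]))
# ['web1.example.com', 'web2.example.com', 'db1.example.com']
```

In a real v1 fabfile the dict lived on fabric.api.env, and the flattening was done by the execution machinery rather than user code.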

Specific use cases / needs / subfeatures

  • Basic, naive mapping for use/reference anywhere else in the system: put in a name, get back some iterable of Groups and/or Connections.
    • Aliasing often wants to go along with that, so e.g. a Lexicon instead of a dict.
    • Even deeper constructs, such as 'bundling', e.g. you have direct mappings named db, web, lb, but then a 2nd-tier name called prod that is always the union of the other three. I forget if I added that to Lexicon yet. Possible there's other map subclasses out there that already do it too.
    • Additionally/alternately, things like globbing or other string syntaxes, though I personally would prefer to leverage the fact that Python is not "stringly typed"...
  • Useful 'reverse mapping' such that you can identify which groups a given Connection belongs to.
    • Problematic: because there's currently no global shared state, the naive answer to this - using identity - falls down because you can technically create multiple identical Connection objects.
      • Especially since Group can create them implicitly on your behalf if you just give it shorthand host strings, though that is only a convenience option.
    • However, given that constraint of no global state, I can't see obvious problems with using equality testing instead, so that should be doable, e.g. if cxn in group would work even if cxn is a distinct object from the equal member inside group.
      • The only thing that comes to mind is if there were strong, stateful links from a Connection to a Group (would have to be groups, plural) holding it, instead of vice versa, but I can't see great reasons for that offhand.
  • Strongly related to the previous: the ability to inspect/display what the "currently running role" is (something folks wanted for a long time in v1, which was nontrivial due to its design).
    • Main issue is that this is really two semi-distinct questions: "what role(s) is the current host part of, generally speaking" (basically, that previous use case of the reverse lookup) but also "what role(s) was the execution machinery specifically asked to run against".
    • In other words, given host 'foo' belonging to roles A, B and C: within a given task whose context is 'foo', but which was run because of a request to 'execute on role A', is a user looking for an answer of "A, B and C" (the roles 'foo' is in overall) or just "A" (the currently executing role)?
    • This really feels like two distinct API calls, even though the feature requests I remember getting conflate the two.
  • Target selection on the CLI, globally and/or per-task
    • An extension of Invoke's CLI system to account for "flags that all tasks get on top of what they define" may be useful or required for this. Which falls firmly into pyinvoke/invoke#205 territory, in fact, so that just got higher priority than it already was (which was pretty high.)
  • Ditto task-level defaults
    • Though task-level target defaults really want to be any of: connection, connections, group obj, group objs, or name evaluating to group objs (that last is the only thing that directly pertains to this ticket, arguably)
  • Ditto collection-level defaults (NEW in v2!)
    • I.e. "all tasks in $submodule default to running against the db role"
    • Same deal as previous point - this default wants to allow a number of different values, not just a string key.
  • Anything else new and exciting enabled by an OO approach that really wants to go along with this? Remember emphasis should be on building blocks and enabling advanced users, not on e.g. totally reinventing systems like Chef or Ansible.
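The equality-based reverse mapping from the list above can be sketched with hypothetical stand-ins for Connection and Group (the real v2 classes differ; the point is that membership via equality works even when the queried object is a distinct-but-equal instance):

```python
# Hypothetical stand-ins, not real Fabric v2 classes.
class Connection:
    def __init__(self, host, user="deploy", port=22):
        self.host, self.user, self.port = host, user, port

    def __eq__(self, other):
        # Equality, not identity: two Connections to the same target compare equal.
        return (self.host, self.user, self.port) == (other.host, other.user, other.port)

    def __hash__(self):
        return hash((self.host, self.user, self.port))

class Group(list):
    pass

groups = {
    "web": Group([Connection("web1"), Connection("web2")]),
    "db": Group([Connection("db1")]),
}

def roles_of(cxn):
    """Reverse lookup: which group names contain an equal Connection?"""
    # `cxn in group` uses __eq__, so a freshly built Connection("web1")
    # matches the member inside groups["web"] despite being a new object.
    return sorted(name for name, group in groups.items() if cxn in group)

print(roles_of(Connection("web1")))  # ['web']
```

This answers the first of the two semi-distinct questions ("which roles is this host in, overall"); the second ("which role was execution asked to run against") would need the executor to pass that context along separately.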

Implementation ideas/concerns

  • If we used the config system as the main storage vector, values "want" to be primitives so they can be stored in yaml, json etc, but that's a can of worms ending with "store all Group/Connection kwargs in a big ol' list-o-dicts", etc.
  • If we expect the definitions to primarily be in Python, we can simply say "instantiate Group objects", and then we have the option of merging that data into the config system or leaving it standalone somehow.
    • I think I prefer the latter because stuffing literally everything into the nested config dicts feels like it'll lead to bad news.
  • The deeper constructs like aliasing and bundling add complexity & ordering issues (i.e. imagine a trivial alias setup where key1's value is a group but key2's value is key1; now you have to crawl the structure twice to resolve or check key2)
    • though if we go for a mostly "do it in-python" approach, it becomes much like the config system's API, where you can start out with a declarative structure but anything more is enabled by method calls after that initial setup. I don't think that's awful? EDIT: and I think that's exactly how Lexicon works anyways.
  • Regardless of format, we have to figure out how advanced users will want to generate it on the fly from external sources or similar; this plus the issues with aliasing and such, implies we may not want this in a naive structure "stored" somewhere, but as an API on some object or objects that is called to generate it.
    • I suspect we may want to work 'downwards' from the selection of roles/groups, arriving at whatever the highest level API is for "turn what the user supplied into an actionable unit of targets", because the most advanced users will necessarily want to have complete control over the implementation of that API call. Then we can as always supply what feels like a useful common case but which is clearly marked as "just one way to do it".
    • @RedKrieg has a nifty idea along these lines where we have @group like @task, and the functions aren't executable units of work, but instead yield Group objects.
      • This approach natively reuses the task hierarchy (Collection), which is practical (why reinvent the wheel) and elegant (because in real world cases, role/group definitions frequently DO map very closely to the tasks using them!)
        • It also works well even if your groups DON'T map to your tasks, because you can simply write the definitions at your root collection level. Easy peasy.
      • It's unclear to me whether this is best returning a single Group from each function, or if we want the ability to yield multiple groups (or connections), or if it's best to do it not as decorated functions at all but as just API calls on Collection (like how collection-level configs are stored).
      • For example, the use case where group/role data is dynamic and outside of Fabric still needs solving here (which is why earlier I noted that we first must identify the highest-level API for this space; then we need to see how that meshes with this intermediate-level idea.)
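A rough sketch of the @group idea discussed above (entirely hypothetical API, with plain lists standing in for Group objects and a bare dict standing in for the Collection hierarchy):

```python
# Hypothetical registry + decorator; none of this is real Fabric v2 API.
_registry = {}

def group(fn):
    """Register a group-producing function under its own name."""
    _registry[fn.__name__] = fn
    return fn

@group
def web():
    return ["web1.example.com", "web2.example.com"]  # stand-in for a Group

@group
def prod():
    # Bundling: a 2nd-tier name defined as the union of other groups.
    return web() + ["db1.example.com"]

def resolve(name):
    # Evaluated lazily at lookup time, so dynamic sources (REST APIs,
    # IP-range generation, etc.) slot in naturally.
    return _registry[name]()

print(resolve("prod"))
# ['web1.example.com', 'web2.example.com', 'db1.example.com']
```

Because the definitions are functions, the "generate on the fly from external sources" case falls out for free; the open questions about one-group-vs-many return values and Collection integration remain as stated.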
max-arnold commented Apr 24, 2017

From the mailing list:

We implemented our own internal REST API which populates env.roledefs dynamically depending on the project being deployed, and we rely heavily on not embedding host strings in the project's fabfile or specifying them on the CLI.

Our use cases are:

  1. Environment-free codebase https://12factor.net/config. Environments (roles) and their respective host strings are stored in a centralized database. Each fabfile.py has something like this (it populates env.roledefs when the file is imported):
EnvironmentDatabaseAPIClient(
    'https://rest.api.url/schema/',
    env.service_name,
).apply_env()
  2. Number of server environments - multiple testing environments (some of them private, some public) and multiple production environments (for different clients). Each environment consists of one or more hosts and is mapped to a fabric role.

  3. Each service (env.service_name in the example above) has a different set of environments.

  4. Also we have meta-roles (groups of roles). They are prefixed with group-: group-production, group-test, group-external, group-internal, group-all. This allows us to deploy to multiple server roles without specifying them one-by-one; for example, group-all deploys to all roles, both production and test.

  5. We have special fabric tasks to print information about role groups, roles and hosts.

  6. We also rely heavily on reverse mapping host strings back to role names (host strings are unique per service_name). This is used for deployment logging and notifications. Basically, we log service deployments to each host and send a Slack notification when a service has been deployed to all hosts in a role. The EnvironmentDatabaseAPI server is responsible for this (it keeps logs and deployment state). This is done by decorating fabric tasks with a decorator which submits env.host, env.port and env.service_name (plus commit info) back to the API server.

  7. We plan to add deployment authentication in the future, and will also very likely pull more env variables from the server to make them available within task context.
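The meta-role (group- prefix) expansion described above could be sketched like this (names and structure are illustrative, not the poster's actual implementation):

```python
# Plain roles map directly to host lists; meta-roles map to other role names.
roledefs = {
    "web-prod": ["w1", "w2"],
    "db-prod": ["d1"],
    "web-test": ["t1"],
}

meta_roles = {
    "group-production": ["web-prod", "db-prod"],
    "group-all": ["web-prod", "db-prod", "web-test"],
}

def expand(role):
    """Resolve a meta-role to the union of its member roles' hosts;
    plain roles resolve to their own host list."""
    if role in meta_roles:
        hosts = []
        for member in meta_roles[role]:
            hosts.extend(roledefs[member])
        return hosts
    return roledefs[role]

print(expand("group-production"))  # ['w1', 'w2', 'd1']
```

This is exactly the 'bundling' / aliasing-with-ordering problem noted in the implementation concerns: a second-tier name must be resolved through the first tier before it yields hosts.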


bitprophet commented Apr 24, 2017
Thanks @max-arnold! I recognize many of those from my own use cases in the past as well. The reverse mapping bit in particular I remember coming up in v1 a few times, so I added it to the list.


urzds commented Sep 18, 2018
For Fabric v2 to become useful to me, I would need a way to tell fab which set of hosts to execute a task on.

Previously I defined roles and then ran fab -R .... (Actually the roles were defined programmatically from an IP address range, but that is not a requirement; a static list inside a YAML file would be fine.)

I have not found an equivalent in Fabric v2, and I also failed to emulate this feature using:

  • a fabric.yaml configuration file containing
active_hostset: null
hostsets:
  myhostset:
  - ...
  • active_hostset = config["hostsets"][config["active_hostset"]] in fabfile.py
  • env INVOKE_ACTIVE_HOSTSET=myhostset fab ...

Instead of the expected list of hosts I get KeyError: 'active_hostset'.
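One way to avoid that KeyError is to treat the active hostset name as optional and fall back gracefully instead of indexing blindly (a workaround sketch mirroring the fabric.yaml structure above; not built-in Fabric v2 behavior):

```python
import os

# Mirrors the hostsets structure from the fabric.yaml sketch above.
config = {
    "hostsets": {
        "myhostset": ["host1.example.com", "host2.example.com"],
    },
}

def select_hostset(config, name):
    """Return the named hostset, or [] when the name is unset or unknown."""
    if not name:
        return []
    return config.get("hostsets", {}).get(name, [])

# The name could come from the environment,
# e.g. `env ACTIVE_HOSTSET=myhostset fab ...`
hosts = select_hostset(config, os.environ.get("ACTIVE_HOSTSET"))
```

The underlying problem the issue tracks remains, though: even with the hosts in hand, v2 has no standard way to feed them to per-task execution the way fab -R did.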
