
self deployment / pull mode #320

Closed
Smana opened this issue Jun 30, 2016 · 29 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@Smana
Contributor

Smana commented Jun 30, 2016

Configure the local node.
There will probably be some caveats, such as certs/tokens management.

@Smana
Contributor Author

Smana commented Jun 30, 2016

#321

@Smana
Contributor Author

Smana commented Jul 2, 2016

For the pull mode, what do you think about:

1 - Create the etcd cluster in a way that it can be scaled; we need to discuss the proper way to do it: static or discovery. How can we scale up in the case of autoscaling?

2 - Use the etcd cluster to store the inventory and use a dynamic inventory (see the sketch after this list).
example: https://gist.github.com/justenwalker/09698cfd6c3a6a49075b

3 - A big issue is secrets management: where do we store the certs/tokens? How do we sync them between the nodes? Do we have to create a cert per node? ...

4 - Use ansible pull mode
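
A minimal sketch of what such an etcd-backed dynamic inventory could look like, assuming etcd v2 etcdctl and a made-up key layout (one key per node under /kargo/inventory/kube-node/<hostname>); Ansible runs the script with --list and reads JSON from stdout:

#!/bin/sh
# Hypothetical etcd-backed dynamic inventory (the key layout is an assumption).
case "$1" in
  --list)
    hosts=""
    for key in $(etcdctl ls /kargo/inventory/kube-node); do
      hosts="$hosts\"$(basename "$key")\","
    done
    hosts=${hosts%,}                        # drop the trailing comma
    echo "{ \"kube-node\": { \"hosts\": [$hosts] } }"
    ;;
  --host)
    echo '{}'                               # no per-host vars in this sketch
    ;;
esac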

@mattymo
Copy link
Contributor

mattymo commented Jul 2, 2016

For public cloud consumers, etcd discovery is probably optimal since it almost never results in a broken cluster. For anyone deploying in-house, they might be reluctant to use discovery. An initial cluster array is adequate already.
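
For reference, the two bootstrap modes differ mainly in a couple of etcd settings; a rough illustration (hostnames, IPs and the token are placeholders):

# Static bootstrap (the "initial cluster array"):
ETCD_INITIAL_CLUSTER="node1=http://10.1.1.21:2380,node2=http://10.1.1.22:2380,node3=http://10.1.1.23:2380"
ETCD_INITIAL_CLUSTER_STATE="new"

# Public discovery service bootstrap:
# curl -s https://discovery.etcd.io/new?size=3   (returns a one-time discovery URL)
ETCD_DISCOVERY="https://discovery.etcd.io/<token>"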

Dynamic inventory and deploying etcd via ansible creates a chicken-and-egg problem. You can't use an inventory from etcd until etcd is up. Also, you need a way to populate this etcd in the first place. I would vote against adding complexity just for the sake of finding an innovative way to consume etcd.

Secrets management is a topic I've dealt with in previous projects. We currently have 1 master host which knows all the information. If you want to move to client-pull mode, all clients need to know where host(s) are located that know the secrets. Secret file storage should be replicated and transmitted using an encrypted method (ansible's SSH/rsync transport is totally fine).

I think you should add a new role for secrets, where the first node (alphabetically) actually generates the secrets while the others take a full copy. All other nodes only take the secrets as needed. It's important to ensure that scale-up/scale-down scenarios are covered.
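
A rough sketch of that pattern in shell form, with illustrative hostnames and paths (kargo's actual secrets role may differ): the first node generates the CA, the others pull a copy over SSH.

# run on every node; "node1" stands in for the first node of the group
if [ "$(hostname -s)" = "node1" ]; then
  openssl genrsa -out /etc/kubernetes/ssl/ca-key.pem 2048
  openssl req -x509 -new -nodes -key /etc/kubernetes/ssl/ca-key.pem \
    -days 3650 -subj "/CN=kube-ca" -out /etc/kubernetes/ssl/ca.pem
else
  rsync -a node1:/etc/kubernetes/ssl/ /etc/kubernetes/ssl/
fi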

@Smana
Contributor Author

Smana commented Jul 2, 2016

Thank you mattymo for your answer.

We can let the user choose the way he wants to deploy the etcd cluster.
The pull mode would just be an option.

I understand that etcd would become a strong dependency, but when, for instance, a new node is added, it needs to know about the cluster topology (where the api is, where etcd is, ...).
If you can think of another option, we can evaluate it too.

Regarding the secrets,

all clients need to know where host(s) are located that know the secrets.

This is the reason why we need an inventory.
What you describe is exactly the current kargo behaviour and it works just fine.
We can probably keep it if we have an inventory somewhere (e.g. etcd).

@Smana Smana added the feature label Jul 2, 2016
@v1k0d3n
Contributor

v1k0d3n commented Jul 2, 2016

i'm probably missing something, but why not consider DNS discovery with SRV records vs. etcd discovery?

@Smana
Contributor Author

Smana commented Jul 2, 2016

This is one of the discovery options that etcd offers, and i'm actually considering it @v1k0d3n

@Smana
Contributor Author

Smana commented Jul 2, 2016

@RustyRobot , i need your input here too :)

@v1k0d3n
Contributor

v1k0d3n commented Jul 2, 2016

dns discovery has always been the easiest approach for me when building and tearing down etcd clusters for kubernetes during testing (granted, i've been pulled away from doing this in recent months, so some of the syntax may have changed with etcd2/3).

i just created srv records on my dns server:

; Kubernetes ETCD Server Cluster Information
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd01.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd02.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd03.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd04.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd05.domain.com.

; Kubernetes ETCD Client Cluster Information
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd01.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd02.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd03.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd04.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd05.domain.com.

; 10.1.1.0/24 - A Records: Kubernetes/Etcd Members
kubetcd01              IN      A       10.1.1.21
kubetcd02              IN      A       10.1.1.22
kubetcd03              IN      A       10.1.1.23
kubetcd04              IN      A       10.1.1.24
kubetcd05              IN      A       10.1.1.25

and then configure the etcd cluster for dns discovery (example for kubetcd01)...

# [member]
ETCD_NAME=kubetcd01
ETCD_DATA_DIR="/var/lib/etcd/default.etcd" 
ETCD_SNAPSHOT_COUNTER="1000" 
ETCD_ELECTION_TIMEOUT="1000" 
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://127.0.0.1:4001,http://kubetcd01.domain.com:2379,http://kubetcd01.domain.com:4001" 
ETCD_LISTEN_PEER_URLS="http://kubetcd01.domain.com:2380" 

#[cluster]
ETCD_DISCOVERY_SRV="domain.com" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://kubetcd01.domain.com:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="domain-etcd" 
ETCD_INITIAL_CLUSTER_STATE="new" 
ETCD_ADVERTISE_CLIENT_URLS="http://kubetcd01.domain.com:2379,http://kubetcd02.domain.com:2379,http://kubetcd03.domain.com:2379,http://kubetcd04.domain.com:2379,http://kubetcd05.domain.com:2379" 
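
If the zone above is served for domain.com, the discovery data etcd will see can be sanity-checked with dig, e.g.:

dig +short SRV _etcd-server._tcp.domain.com
dig +short SRV _etcd-client._tcp.domain.com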

@ant31
Contributor

ant31 commented Jul 2, 2016

I would not use ansible-pull if possible. Also, about the all-in-one image:
there are 2 images:
1 with the deployment scripts, 1 with all the tools.

I'll detail more later.

@Smana
Contributor Author

Smana commented Jul 2, 2016

@ant31 yes, please do :)

@Smana
Contributor Author

Smana commented Jul 2, 2016

@v1k0d3n how would you delete or add members?

@v1k0d3n
Contributor

v1k0d3n commented Jul 2, 2016

i would let the users control that on the DNS side, and use a proxy on the etcd members: destroy and/or rebuild... add via dns. i mean, it's Raft... so 3 or 5 members is ideal; how many members do you really want over that? my biggest stumbling block right now with kargo is that I have this great srv/dns framework in place that i can't use to bring up etcd. :(

@yoojinl
Contributor

yoojinl commented Jul 4, 2016

@Smana @v1k0d3n From what I know about the current state of etcd, there is no other way to manage it except having a static list of etcd members and synchronizing it with the etcd cluster by explicitly calling etcd member add and etcd member remove. The documentation also explicitly states that discovery should be used only for cluster bootstrapping; after the cluster is created, discovery becomes kind of useless. Also, public discovery is not always an option when we are talking about data centers with a firewall in front (which may block some or all traffic for security reasons); in that case, deploying our own HA discovery system is another issue.

There is a good video on lifecycle management of etcd from CoreOS Fest, which took place in Berlin. Basically, the presenter had to invent a new tool on top of etcd in order to do proper cluster management; it's not a trivial task, and I would suggest going with a static list as the simplest and most straightforward solution until something like that is supported by etcd natively.
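
For reference, scaling a statically-configured cluster means explicit membership calls along these lines (the member ID and URLs are placeholders):

etcdctl member add node4 http://10.1.1.24:2380   # prints the env vars the new member must start with
etcdctl member list                              # shows member IDs
etcdctl member remove 2f7d2ab5f7f9c7a8           # remove a member by the ID shown above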

@ant31
Contributor

ant31 commented Jul 4, 2016

  1. If we use ansible-pull it will add an ansible dependency on every host instead of just one (the deployment host). That's the opposite of the target (lower requirements / fewer host modifications).
  2. We should have:
    • 1 image with the deployment scripts: kargo-deploy
    • 1 image with all the binaries/tools (all-in-one)

-> the all-in-one is optional and that's another subject

The only requirement on the host would be docker.
The base idea is to run:

docker run -e options=... -v /:/rootfs/ --rm kargo-deploy -- init

The kargo-deploy image contains ansible + the kargo scripts; we mount the host filesystem into the container, and with privileged access we can configure it.

  • Each node should be able to configure itself,
  • We can give it the list of other nodes, or some options (like the etcd-cluster addresses),
  • We assume that a container_engine is running (docker / rkt / ...).

To configure the container_engine, I propose to keep and use the current playbooks. Maybe later we can switch to shell scripts instead of ansible to remove the 'python' dependency from hosts.
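
A minimal sketch of that boot-time hook, purely illustrative: the image name, the mount and the init argument come from the comment above, while the etcd endpoint and the wrapper itself are assumptions.

#!/bin/sh
# Run once when the node starts (e.g. from cloud-init or a systemd unit):
# the only host dependency is the container engine itself.
ETCD_ENDPOINTS="http://etcd.domain.com:2379"   # where the shared inventory/config would live (assumed)

docker run --rm --privileged \
  -e ETCD_ENDPOINTS="$ETCD_ENDPOINTS" \
  -v /:/rootfs/ \
  kargo-deploy -- init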

@v1k0d3n
Contributor

v1k0d3n commented Jul 4, 2016

i think i'm losing track of what's being discussed in this thread, which is why I started #324 @Smana. giving users an option for how they want to bootstrap etcd distances us from arguing about which method is better and why. in my use case, i'm very specifically looking for a DNS SRV bootstrap discovery method for etcd, and i like the approach of "bring your own [xyz component]" to the project.

if users are tied to hard dependencies like ansible-pull, kpm built-in, etc., or if the project becomes less democratic and more opinionated about the etcd bootstrap method, i feel like the target audience will become narrower over time.

@ant31
Contributor

ant31 commented Jul 4, 2016

The discussion has deviated onto etcd; maybe we should open a new issue to settle the etcd question.

This issue is about how to switch kargo from push to pull.
The idea is to:

  1. scale deployment to large clusters
  2. allow auto-scaling of nodes
  3. remove host dependencies (python, etc.)

if users are tied to hard dependencies like ansible-pull, kpm built-in, etc.

That's the opposite of what we're trying to solve with this issue.
We want the only requirement on hosts to be a 'container engine' (docker/rkt).

This is why using ansible-pull is out! I don't want to install ansible on every host.
We have to find something other than ansible-pull (a shell script?) to deploy the kargo image.

@Smana
Contributor Author

Smana commented Jul 5, 2016

@ant31 i agree with you about the docker image which deploys the node where it resides.
Actually, that's why i opened issue #321.
That said, how would you configure the local node without using the pull mode (inside the container, of course)?
The main issue is not running inside a container, which is easy to do, but how to configure the local node.

@Smana
Contributor Author

Smana commented Jul 5, 2016

Pull is not mandatory in the case of a docker image (the ansible playbooks are inside the container), but we need to get the inventory from somewhere, automatically (when the node starts).

@v1k0d3n
Contributor

v1k0d3n commented Jul 5, 2016

Maybe I'm missing something, but why not stand up an etcd cluster with discovery and store secrets in etcd?

@ant31
Contributor

ant31 commented Jul 5, 2016

@v1k0d3n yes, it's a good idea to use the etcd cluster to store the secrets and configuration shared by the nodes/masters.
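
A rough illustration of that idea (etcd v2 syntax, key paths are made up); in practice the etcd endpoints would need TLS and auth, otherwise the secrets are readable by anyone on the network:

# first master publishes the shared CA
etcdctl set /kargo/secrets/ca.pem "$(cat /etc/kubernetes/ssl/ca.pem)"

# other nodes read it back when they self-configure
etcdctl get /kargo/secrets/ca.pem > /etc/kubernetes/ssl/ca.pem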

@Smana Smana changed the title Pull mode deployment Configure the local node Jul 5, 2016
@Smana Smana changed the title Configure the local node self deployment / pull mode Jul 7, 2016
@Smana
Contributor Author

Smana commented Jul 7, 2016

Please refer to https://github.com/kubespray/kargome

@v1k0d3n
Contributor

v1k0d3n commented Jul 7, 2016

private repo?

@Smana
Contributor Author

Smana commented Jul 8, 2016

@v1k0d3n Sorry, i changed my mind and closed the repo; i'll try to do a PR instead.

@bogdando bogdando added this to the v2.2.0 milestone Jan 10, 2017
@billyoung
Contributor

hello! i'm curious if this is still in the works? i'd be interested in contributing :)

@Atoms
Member

Atoms commented Jun 7, 2018

@billyoung we're thinking about something, but no real work has been done as far as i know.

@ant31 ant31 modified the milestones: v2.2, 3.0 Aug 15, 2018
@Atoms
Member

Atoms commented Sep 12, 2018

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2018
@woopstar woopstar removed this from the 3.0 milestone Sep 28, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 10, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
