
self deployment / pull mode #320

Closed
Smana opened this issue Jun 30, 2016 · 29 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@Smana
Contributor

Smana commented Jun 30, 2016

Configure the local node.
There will probably be some caveats, such as certs/tokens management.

@Smana
Contributor Author

Smana commented Jun 30, 2016

#321

@Smana
Contributor Author

Smana commented Jul 2, 2016

For the pull mode, what do you think about:

1 - Create the etcd cluster in a way that it can be scaled; we need to discuss the proper way to do it: static or discovery. How can we scale up in the case of autoscaling?

2 - Use the etcd cluster to store the inventory and use a dynamic inventory (see the sketch after this list).
example: https://gist.github.com/justenwalker/09698cfd6c3a6a49075b

3 - A big issue is secrets management: where do we store the certs/tokens? How do we sync them between the nodes? Do we have to create a cert per node? ...

4 - Use ansible pull mode
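
A minimal sketch of what such an etcd-backed dynamic inventory could look like, assuming etcd v2 etcdctl and a made-up key layout (one key per node under /kargo/inventory/kube-node/<hostname>); Ansible runs the script with --list and reads JSON from stdout:

#!/bin/sh
# Hypothetical etcd-backed dynamic inventory (the key layout is an assumption).
case "$1" in
  --list)
    hosts=""
    for key in $(etcdctl ls /kargo/inventory/kube-node); do
      hosts="$hosts\"$(basename "$key")\","
    done
    hosts=${hosts%,}                        # drop the trailing comma
    echo "{ \"kube-node\": { \"hosts\": [$hosts] } }"
    ;;
  --host)
    echo '{}'                               # no per-host vars in this sketch
    ;;
esac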

@mattymo
Copy link
Contributor

mattymo commented Jul 2, 2016

For public cloud consumers, etcd discovery is probably optimal since it almost never results in a broken cluster. For anyone deploying in-house, they might be reluctant to use discovery. An initial cluster array is adequate already.
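
For reference, the two bootstrap modes differ mainly in a couple of etcd settings; a rough illustration (hostnames, IPs and the token are placeholders):

# Static bootstrap (the "initial cluster array"):
ETCD_INITIAL_CLUSTER="node1=http://10.1.1.21:2380,node2=http://10.1.1.22:2380,node3=http://10.1.1.23:2380"
ETCD_INITIAL_CLUSTER_STATE="new"

# Public discovery service bootstrap:
# curl -s https://discovery.etcd.io/new?size=3   (returns a one-time discovery URL)
ETCD_DISCOVERY="https://discovery.etcd.io/<token>"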

Dynamic inventory and deploying etcd via ansible creates a chicken-and-egg problem. You can't use an inventory from etcd until etcd is up. Also, you need a way to populate this etcd in the first place. I would vote against adding complexity just for the sake of finding an innovative way to consume etcd.

Secrets management is a topic I've dealt with in previous projects. We currently have 1 master host which knows all the information. If you want to move to client-pull mode, all clients need to know where host(s) are located that know the secrets. Secret file storage should be replicated and transmitted using an encrypted method (ansible's SSH/rsync transport is totally fine).

I think you should add a new role for secrets, where the first node (alphabetically) actually generates the secrets while the others take a full copy. All other nodes only take the secrets as needed. It's important to ensure that scale-up/scale-down scenarios are covered.
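
A rough sketch of that pattern in shell form, with illustrative hostnames and paths (kargo's actual secrets role may differ): the first node generates the CA, the others pull a copy over SSH.

# run on every node; "node1" stands in for the first node of the group
if [ "$(hostname -s)" = "node1" ]; then
  openssl genrsa -out /etc/kubernetes/ssl/ca-key.pem 2048
  openssl req -x509 -new -nodes -key /etc/kubernetes/ssl/ca-key.pem \
    -days 3650 -subj "/CN=kube-ca" -out /etc/kubernetes/ssl/ca.pem
else
  rsync -a node1:/etc/kubernetes/ssl/ /etc/kubernetes/ssl/
fi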

@Smana
Contributor Author

Smana commented Jul 2, 2016

Thank you mattymo for your answer.

We can let the user choose the way he wants to deploy the etcd cluster.
The pull mode would just be an option.

I understand that etcd would become a strong dependency, but when, for instance, a new node is added, it needs to know about the cluster topology (where the api is, where etcd is, ...).
If you can think of another option, we can evaluate it too.

Regarding the secrets,

all clients need to know where host(s) are located that know the secrets.

This is the reason why we need an inventory.
What you describe is exactly the current kargo behaviour and it works just fine.
We can probably keep it if we have an inventory somewhere (e.g. etcd).

@Smana Smana added the feature label Jul 2, 2016
@v1k0d3n
Contributor

v1k0d3n commented Jul 2, 2016

i'm probably missing something, but why not consider DNS discovery with SRV records vs. etcd discovery?

@Smana
Contributor Author

Smana commented Jul 2, 2016

This is one of the discovery options that etcd offers, and i'm actually considering it @v1k0d3n

@Smana
Contributor Author

Smana commented Jul 2, 2016

@RustyRobot , i need your input here too :)

@v1k0d3n
Contributor

v1k0d3n commented Jul 2, 2016

dns discovery has always been the easiest approach for me when building and tearing down etcd clusters for kubernetes during testing (granted, i've been pulled away from doing this in recent months, so some of the syntax may have changed with etcd2/3).

i just created srv records on my dns server:

; Kubernetes ETCD Server Cluster Information
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd01.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd02.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd03.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd04.domain.com.
_etcd-server._tcp.domain.com.   300     IN      SRV 0 0 2380    kubetcd05.domain.com.

; Kubernetes ETCD Client Cluster Information
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd01.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd02.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd03.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd04.domain.com.
_etcd-client._tcp.domain.com.   300     IN      SRV 0 0 2379    kubetcd05.domain.com.

; 10.1.1.0/24 - A Records: Kubernetes/Etcd Members
kubetcd01              IN      A       10.1.1.21
kubetcd02              IN      A       10.1.1.22
kubetcd03              IN      A       10.1.1.23
kubetcd04              IN      A       10.1.1.24
kubetcd05              IN      A       10.1.1.25

and then configure the etcd cluster for dns discovery (example for kubetcd01)...

# [member]
ETCD_NAME=kubetcd01
ETCD_DATA_DIR="/var/lib/etcd/default.etcd" 
ETCD_SNAPSHOT_COUNTER="1000" 
ETCD_ELECTION_TIMEOUT="1000" 
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://127.0.0.1:4001,http://kubetcd01.domain.com:2379,http://kubetcd01.domain.com:4001" 
ETCD_LISTEN_PEER_URLS="http://kubetcd01.domain.com:2380" 

#[cluster]
ETCD_DISCOVERY_SRV="domain.com" 
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://kubetcd01.domain.com:2380" 
ETCD_INITIAL_CLUSTER_TOKEN="domain-etcd" 
ETCD_INITIAL_CLUSTER_STATE="new" 
ETCD_ADVERTISE_CLIENT_URLS="http://kubetcd01.domain.com:2379,http://kubetcd02.domain.com:2379,http://kubetcd03.domain.com:2379,http://kubetcd04.domain.com:2379,http://kubetcd05.domain.com:2379" 
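
If the zone above is served for domain.com, the discovery data etcd will see can be sanity-checked with dig, e.g.:

dig +short SRV _etcd-server._tcp.domain.com
dig +short SRV _etcd-client._tcp.domain.com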

@ant31
Contributor

ant31 commented Jul 2, 2016

I would not use ansible-pull if possible. Also, about the all-in-one image:
there are 2 images:
1 with the deployment scripts, 1 with all the tools.

I'll detail more later.

@Smana
Contributor Author

Smana commented Jul 2, 2016

@ant31 yes, please do :)

@Smana
Contributor Author

Smana commented Jul 2, 2016

@v1k0d3n how would you delete or add members?

@v1k0d3n
Contributor

v1k0d3n commented Jul 2, 2016

i would let the users control that on the DNS side, and use a proxy on the etcd members: destroy and/or rebuild... add via dns. i mean, it's Raft... so 3 or 5 members is ideal; how many members do you really want over that? my biggest stumbling block right now with kargo is that I have this great srv/dns framework in place that i can't use to bring up etcd. :(

@yoojinl
Contributor

yoojinl commented Jul 4, 2016

@Smana @v1k0d3n From what I know about the current state of etcd, there is no other way to manage it except having a static list of etcd members and synchronizing it with the etcd cluster by explicitly calling etcd member add and etcd member remove. The documentation also explicitly states that discovery should be used only for cluster bootstrapping; after the cluster is created, discovery becomes kind of useless. Also, public discovery is not always an option when we are talking about data centers with a firewall in front (which may block some or all traffic for security reasons); in that case, deploying our own HA discovery system is another issue.

There is a good video on lifecycle management of etcd from CoreOS Fest, which took place in Berlin. Basically, the presenter had to invent a new tool on top of etcd in order to do proper cluster management; it's not a trivial task, and I would suggest going with a static list as the simplest and most straightforward solution until something like that is supported by etcd natively.
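
For reference, scaling a statically-configured cluster means explicit membership calls along these lines (the member ID and URLs are placeholders):

etcdctl member add node4 http://10.1.1.24:2380   # prints the env vars the new member must start with
etcdctl member list                              # shows member IDs
etcdctl member remove 2f7d2ab5f7f9c7a8           # remove a member by the ID shown above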

@ant31
Contributor

ant31 commented Jul 4, 2016

  1. If we use ansible-pull it will add an ansible dependency on every host instead of just one (the deployment host). That's the opposite of the target (lower requirements / fewer host modifications).
  2. We should have:
    • 1 image with the deployment scripts: kargo-deploy
    • 1 image with all the binaries/tools (all-in-one)

-> the all-in-one is optional and that's another subject

The only requirement on the host would be docker.
The base idea is to run:

docker run -e options=... -v /:/rootfs/ --rm kargo-deploy -- init

The kargo-deploy image contains ansible + the kargo scripts; we mount the host filesystem into the container, and with privileged access we can configure it.

  • Each node should be able to configure itself,
  • We can give it the list of other nodes, or some options (like the etcd-cluster addresses),
  • We assume that a container_engine is running (docker / rkt / ...).

To configure the container_engine, I propose to keep and use the current playbooks. Maybe later we can switch to shell scripts instead of ansible to remove the 'python' dependency from hosts.
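
A minimal sketch of that boot-time hook, purely illustrative: the image name, the mount and the init argument come from the comment above, while the etcd endpoint and the wrapper itself are assumptions.

#!/bin/sh
# Run once when the node starts (e.g. from cloud-init or a systemd unit):
# the only host dependency is the container engine itself.
ETCD_ENDPOINTS="http://etcd.domain.com:2379"   # where the shared inventory/config would live (assumed)

docker run --rm --privileged \
  -e ETCD_ENDPOINTS="$ETCD_ENDPOINTS" \
  -v /:/rootfs/ \
  kargo-deploy -- init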

@v1k0d3n
Contributor

v1k0d3n commented Jul 4, 2016

i think i'm losing track of what's being discussed in this thread, which is why I started #324 @Smana. giving users an option for how they want to bootstrap etcd distances us from arguing about which method is better and why. in my use case, i'm very specifically looking for a DNS SRV bootstrap discovery method for etcd, and i like the approach of "bring your own [xyz component]" to the project.

if users are tied to hard dependencies like ansible-pull, kpm built-in, etc., or if the project becomes less democratic and more opinionated about the etcd bootstrap method, i feel like the target audience will become narrower over time.

@ant31
Contributor

ant31 commented Jul 4, 2016

The discussion has deviated onto etcd; maybe we should open a new issue to settle the etcd question.

This issue is about how to switch kargo from push to pull.
The idea is to:

  1. scale deployment to large clusters
  2. allow auto-scaling of nodes
  3. remove host dependencies (python, etc.)

if users are tied to hard dependencies like ansible-pull, kpm built-in, etc.

That's the opposite of what we're trying to solve with this issue.
We want the only requirement on hosts to be a 'container engine' (docker/rkt).

This is why using ansible-pull is out! I don't want to install ansible on every host.
We have to find something other than ansible-pull (a shell script?) to deploy the kargo image.

@Smana
Contributor Author

Smana commented Jul 5, 2016

@ant31 i agree with you about the docker image which deploys the node where it resides.
Actually, that's why i opened issue #321.
That said, how would you configure the local node without using the pull mode (inside the container, of course)?
The main issue is not running inside a container, which is easy to do, but how to configure the local node.

@Smana
Contributor Author

Smana commented Jul 5, 2016

Pull is not mandatory in the case of a docker image (the ansible playbooks are inside the container), but we need to get the inventory from somewhere, automatically (when the node starts).

@v1k0d3n
Contributor

v1k0d3n commented Jul 5, 2016

Maybe I'm missing something, but why not stand up an etcd cluster with discovery and store secrets in etcd?

@ant31
Contributor

ant31 commented Jul 5, 2016

@v1k0d3n yes, it's a good idea to use the etcd cluster to store the secrets and configuration shared by the nodes/masters.
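
A rough illustration of that idea (etcd v2 syntax, key paths are made up); in practice the etcd endpoints would need TLS and auth, otherwise the secrets are readable by anyone on the network:

# first master publishes the shared CA
etcdctl set /kargo/secrets/ca.pem "$(cat /etc/kubernetes/ssl/ca.pem)"

# other nodes read it back when they self-configure
etcdctl get /kargo/secrets/ca.pem > /etc/kubernetes/ssl/ca.pem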

@Smana Smana changed the title Pull mode deployment Configure the local node Jul 5, 2016
@Smana Smana changed the title Configure the local node self deployment / pull mode Jul 7, 2016
@Smana
Contributor Author

Smana commented Jul 7, 2016

Please refer to https://github.com/kubespray/kargome

@v1k0d3n
Contributor

v1k0d3n commented Jul 7, 2016

private repo?

@Smana
Contributor Author

Smana commented Jul 8, 2016

@v1k0d3n Sorry, i changed my mind and closed the repo; i'll try to do a PR instead.

@bogdando bogdando added this to the v2.2.0 milestone Jan 10, 2017
@billyoung
Contributor

hello! i'm curious if this is still in the works? i'd be interested in contributing :)

@Atoms
Member

Atoms commented Jun 7, 2018

@billyoung we're thinking about something, but no real work has been done as far as i know.

@ant31 ant31 modified the milestones: v2.2, 3.0 Aug 15, 2018
@Atoms
Member

Atoms commented Sep 12, 2018

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2018
@woopstar woopstar removed this from the 3.0 milestone Sep 28, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 10, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
