Kubernetes integration for Nelson #49

Merged
merged 1 commit into getnelson:master Jan 12, 2018

Conversation

@adelbertc
Member

adelbertc commented Dec 5, 2017

  • Add Kubernetes interpreter for SchedulerOp
  • Add new Canopus workflow for Kubernetes deployments
  • Add new 'scheduler' field in config to choose between Nomad or Kubernetes
  • Add support for SSLContext in HTTP4S Client creation
  • Add namespace to Summary ctor since Kubernetes needs that

Also:

  • Change Alpine APK endpoint as the original one no longer seemed to work
  • Remove Allocations; it was only used in reconciliation, which was removed in #48
@timperrett


Member

timperrett commented Dec 5, 2017

@rossabaker can you comment on the stability of #35 and its mergeability status?

@timperrett

Broadly this looks fantastic - thanks for putting the work in, dude!

// Auto-mounted at this path for pods
// https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod
val path = "/var/run/secrets/kubernetes.io/serviceaccount/token"
val serviceAccountToken = scala.io.Source.fromFile(path).getLines.toList.head


@timperrett

timperrett Dec 6, 2017

Member

Interesting. Am I right in reading this as "Nelson for k8s must be deployed with k8s"?


@adelbertc

adelbertc Dec 6, 2017

Member

Currently yeah, which implies either a single datacenter/region or awkward things like having a blessed master region that has line of sight to every other region. I should get it working with plain Docker like there is for Nomad, but this was done mostly as part of my "let's just get it working" effort.
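
For what it's worth, a minimal sketch of how that could be relaxed, reading a configured token first and falling back to the auto-mounted one; the kubernetes.token key and the helper name are hypothetical, not from this PR:

import java.io.File

// Hypothetical sketch (not the PR's code): prefer an explicitly configured
// token so Nelson can run outside the cluster, and fall back to the
// auto-mounted service-account token when running inside Kubernetes.
def serviceAccountToken(kfg: KConfig): Option[String] = {
  val inClusterPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  kfg.lookup[String]("kubernetes.token").orElse {
    val f = new File(inClusterPath)
    if (f.exists) Some(scala.io.Source.fromFile(f).getLines.toList.head)
    else None
  }
}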

def readScheduler(kfg: KConfig, proxy: Option[Infrastructure.ProxyCredentials]): Option[SchedulerOp ~> Task] =
  kfg.lookup[String]("scheduler") match {
    case Some("nomad") => readNomadScheduler(kfg.subconfig("nomad"))
    case Some("k8s")   => readK8sScheduler(kfg.subconfig("k8s"))


@timperrett

timperrett Dec 6, 2017

Member

This feels nitpicky, but can we refer to Kubernetes by its fully qualified name throughout?


@adelbertc

adelbertc Dec 6, 2017

Member

Sure :-) Did you want this change just in the config or for the classes as well? e.g. s/K8sHttp/KubernetesHttp

import java.net.URI
import java.time.Instant
import scala.concurrent.duration.FiniteDuration
import com.amazonaws.regions.Region


@timperrett

timperrett Dec 6, 2017

Member

This seems errant? I don't think this type is used anywhere


@adelbertc

adelbertc Dec 6, 2017

Member

This happened since I was moving around/sorting the imports; my PR specifically does not use it, but it's being used somewhere else in the file.

@@ -51,7 +51,7 @@ trait Workflow[O] {
object Workflow {
// There's currently only one workflow implementation.
-val workflows = List(Magnetar)
+val workflows = List(Magnetar, Canopus)


@timperrett

timperrett Dec 6, 2017

Member

Great name!

private val log = Logger[K8sHttp.type]
}
import K8sHttp.log


@timperrett

timperrett Dec 6, 2017

Member

Prefer member imports to have an enclosing scope (import inside the module that wants to use it)

@@ -73,7 +73,7 @@ dockerCommands ++= Seq(
ExecCmd("RUN", "chmod", "555", s"${(defaultLinuxInstallLocation in Docker).value}/bin/${normalizedName.value}"),
ExecCmd("RUN", "chown", "-R", "nelson:nelson", s"${(defaultLinuxInstallLocation in Docker).value}"),
ExecCmd("RUN", "apk", "add", "--update-cache", "bash", "graphviz", "wget", "libc6-compat"),
ExecCmd("RUN", "apk", "add", "docker=1.9.1-r2", "--update-cache", "--repository", "http://dl-3.alpinelinux.org/alpine/v3.3/community/")
ExecCmd("RUN", "apk", "add", "docker=1.9.1-r2", "--update-cache", "--repository", "http://dl-2.alpinelinux.org/alpine/v3.3/community/")
)


@timperrett

timperrett Dec 6, 2017

Member

This is ok for now - ideally we should modernize this but it's future work

"spec" := Json(
"replicas" := plan.environment.desiredInstances.getOrElse(1),
"selector" := Json(
"matchLabels" := Json("app" := name)


@timperrett

timperrett Dec 6, 2017

Member

What does this do? Stack name here is canonical, but depending on what the intention is you might want some other variables as well.


@adelbertc

adelbertc Dec 6, 2017

Member

Kubernetes has a notion of labels for each pod (a pod is analogous to a unit of deployment; it could be a service, or a service + sidecars), and this matchLabels determines which pods are controlled by this Deployment. This becomes important since the pods deployed through a Kubernetes Deployment aren't special: there could be existing pods that match a label, in which case the Deployment will consider those as part of its control.

For example, if there is an existing pod foo with label name=foo, and you create a Deployment for foo with "matchLabels": { "name": "foo" } for 2 replicas, it will notice there is already an existing one and only create one additional pod.

tl;dr matchLabels should match on labels that uniquely identify a particular deployment, so I think name (which is the stack name) makes sense here. Do I need anything else?
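
To make that concrete, here is a minimal sketch of the selector shape in the same argonaut style as the diff above, using the foo example from this comment (illustrative only, not the PR's code):

// Sketch of the example above: a Deployment asking for 2 replicas whose
// selector matches label name=foo. If a pod labeled name=foo already
// exists, the Deployment adopts it and creates only one additional pod.
val fooDeploymentSpec =
  Json(
    "replicas" := 2,
    "selector" := Json(
      "matchLabels" := Json("name" := "foo")
    )
  )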

  ),
  "template" := Json(
    "metadata" := Json(
      "labels" := Json(


@timperrett

timperrett Dec 6, 2017

Member

Might be worth adding version and namespace labels here for operability? "Show me all the QA things", for example, is useful for billing and admin purposes.


@adelbertc

adelbertc Dec 6, 2017

Member

For sure, I've added stack name, unit name, and version in an incoming commit.
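
As a sketch of what that could look like in the template metadata (the label keys and the unitName/version accessors here are illustrative, not necessarily what the incoming commit uses):

// Illustrative only: labels carrying stack name, unit name, and version so
// operators can slice pods by any of them.
"labels" := Json(
  "stackName" := stackName.toString,
  "unitName"  := unitName,
  "version"   := version.toString
)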

  // schedule: Option[Schedule], // ignored
  ports: Option[Ports]
): Json = {
  // TODO: Cron/Job not supported atm, only Service


@timperrett

timperrett Dec 6, 2017

Member

Presumably we can support jobs too? It looks like kube supports them https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#creating-a-cron-job


@adelbertc

adelbertc Dec 6, 2017

Member

Yes sir, I just haven't done it yet; I wanted to get something working so I focused on services. But absolutely we can support jobs!
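
For reference, a sketch of the shape the job case might emit, in the same argonaut style; batch/v1beta1 was the CronJob API group/version at the time, and everything here (names, schedule) is made up for illustration:

// Hypothetical CronJob payload for the future job case; only the shape is
// meant, none of this is in the PR.
val cronJobJson =
  Json(
    "apiVersion" := "batch/v1beta1",
    "kind" := "CronJob",
    "metadata" := Json("name" := name),
    "spec" := Json(
      "schedule" := "*/5 * * * *",
      "jobTemplate" := Json(
        "spec" := Json(
          "template" := Json(
            "spec" := Json("restartPolicy" := "OnFailure")
          )
        )
      )
    )
  )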

k8s.endpoint / "apis" / "apps" / "v1beta2" / "namespaces" / "default" / "deployments" / name
))
// TODO: Potential 404 or similar here if trying to delete something that doesn't exist


@timperrett

timperrett Dec 6, 2017

Member

Should absolutely do something more meaningful here; if we know it's unrecoverable, for example.
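
For example, a minimal sketch of that kind of handling, assuming an http4s 0.16-era Client with scalaz Task (the names here are illustrative, not the PR's code):

// Sketch: treat 404 on delete as "already gone" rather than a failure, and
// surface any other unexpected status as an error.
def deleteDeployment(client: Client, uri: Uri, name: String): Task[Unit] =
  client.fetch(Request(Method.DELETE, uri)) { resp =>
    resp.status match {
      case Status.Ok | Status.NotFound => Task.now(())
      case other =>
        Task.fail(new RuntimeException(s"unexpected status $other deleting $name"))
    }
  }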

@timperrett timperrett changed the title from Nelson on k8s to Kubernetes integration for Nelson Dec 6, 2017

@coltfred


coltfred commented Dec 6, 2017

I'm super pumped you did this @adelbertc. I'll need to look at it in more detail later.

@adelbertc


Member

adelbertc commented Dec 6, 2017

@coltfred Yes! Is your company using Kubernetes? I'm planning on writing a "Setup Nelson for K8s" guide after we finish up this initial PR. Also feel free to ask me (or anyone else) questions in the Gitter channel.

@daddykotex


Contributor

daddykotex commented Dec 7, 2017

👍 I'm impressed by how little code it took. Really looking forward to it. Thanks for that

@adelbertc adelbertc changed the base branch from k8s to master Dec 7, 2017

@codecov-io


codecov-io commented Dec 8, 2017

Codecov Report

Merging #49 into master will decrease coverage by 3.19%.
The diff coverage is 2.15%.


@@            Coverage Diff            @@
##           master      #49     +/-   ##
=========================================
- Coverage   62.76%   59.57%   -3.2%     
=========================================
  Files         129      129             
  Lines        4034     4245    +211     
  Branches      107      104      -3     
=========================================
- Hits         2532     2529      -3     
- Misses       1502     1716    +214
Impacted Files Coverage Δ
core/src/main/scala/Datacenter.scala 82.45% <ø> (+1.42%) ⬆️
core/src/main/scala/scheduler/NomadHttp.scala 44.59% <ø> (ø) ⬆️
core/src/main/scala/Nelson.scala 43.44% <0%> (-0.12%) ⬇️
core/src/main/scala/scheduler/SchedulerOp.scala 25% <0%> (ø) ⬆️
core/src/main/scala/KubernetesClient.scala 0% <0%> (ø)
core/src/main/scala/scheduler/KubernetesHttp.scala 0% <0%> (ø)
...e/src/main/scala/scheduler/DeploymentSummary.scala 0% <0%> (ø)
core/src/main/scala/Workflow.scala 22.22% <100%> (ø) ⬆️
core/src/main/scala/workflows/Canopus.scala 8.33% <8.33%> (ø)
core/src/main/scala/Config.scala 77.61% <9.67%> (-9.06%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e7a65d...b71e36a.

x509Cert.map { cert =>
val trustManager = cacert(Array(cert))
val sslContext = SSLContext.getInstance("TLS")
sslContext.init(null, Array(trustManager), new SecureRandom())


@timperrett

timperrett Dec 11, 2017

Member

Doesn't this make a new secure random for every request? It's not clear if this code block is called frequently or not.


@adelbertc

adelbertc Dec 11, 2017

Member

Appears so, yeah, but it should only be called once on startup I think, since (currently) it's only being used to create the K8s scheduler interpreter. I could move it out, but I'm not sure what the security or mutability ramifications of that are.
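
If it did need hoisting later, a sketch of the idea (assuming the cacert/x509Cert definitions from the diff above; SSLContext is thread-safe once initialized, and the val name is made up):

// Sketch: build the SSLContext, and its SecureRandom, once and reuse it
// across clients instead of per call site.
lazy val kubernetesSslContext: Option[SSLContext] =
  x509Cert.map { cert =>
    val ctx = SSLContext.getInstance("TLS")
    ctx.init(null, Array(cacert(Array(cert))), new SecureRandom())
    ctx
  }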

new scheduler.KubernetesHttp(kubernetes, http4sClient(kubernetes.timeout, sslContext = Some(sslContext)), serviceAccountToken)
}
def readScheduler(kfg: KConfig, proxy: Option[Infrastructure.ProxyCredentials]): Option[SchedulerOp ~> Task] =


@timperrett

timperrett Dec 11, 2017

Member

Can we just double check there's no documentation that needs updating pertaining to this?


@adelbertc

adelbertc Dec 11, 2017

Member

Just checked again: I did a grep through the repo for nomad { and made sure there was a scheduler = "nomad" in those locations:

$ rg "nomad \{" --context 1
etc/development/http/http.dev.cfg
93-          scheduler = "nomad"
94:          nomad {
95-           endpoint = "http://localhost:4646"
--
117-      }
118:      # nomad {
119-      #   endpoint = "$(NOMAD_ADDR)"

etc/development/docker/nelson.cfg
47-          scheduler = "nomad"
48:          nomad {
49-            endpoint = ""

docs/src/hugo/content/index.md
689-          scheduler = "nomad"
690:          nomad {
691-            # Where can Nelson access the Nomad leader cluster

core/src/test/resources/nelson/datacenters-missing-consul.cfg
33-          scheduler = "nomad"
34:          nomad {
35-            endpoint  = "https://nomad.service"
--
78-          scheduler = "nomad"
79:          nomad {
80-            endpoint  = "https://nomad.service"

core/src/test/resources/nelson/datacenters-missing-domain.cfg
32-          scheduler = "nomad"
33:          nomad {
34-            endpoint  = "https://nomad.service"

core/src/test/resources/nelson/datacenters.cfg
32-          scheduler = "nomad"
33:          nomad {
34-            endpoint  = "https://nomad.service"
--
87-          scheduler = "nomad"
88:          nomad {
89-            endpoint  = "https://nomad.service"

core/src/main/resources/nelson/defaults.cfg
135-
136:  nomad {
137-    # These service tags will be automatically added (via set union)
--
236-  #         scheduler = "nomad"
237:  #         nomad {
238-  #           endpoint = "http://nomad.service.texas.your.company.com:1234/"
@@ -17,10 +17,28 @@
package nelson
package scheduler
import scalaz.Monoid
import scalaz.Scalaz._


@timperrett

timperrett Dec 11, 2017

Member

Can we use a la carte imports within the scope of a module? I don't like importing these globally in a file.

summary(dc, ns, stackName)
// TODO: Legacy, no longer used
case RunningUnits(dc, prefix) =>


@timperrett

timperrett Dec 11, 2017

Member

This is actually used in nelson stack runtime I believe :-)


@adelbertc

adelbertc Dec 11, 2017

Member

Hm, I tried tracing it in the codebase and also looking at https://verizon.github.io/nelson/reference.html#cli; I think that only uses Summary? It seems RunningUnits is mostly used in NomadHttp so it can kill child processes for jobs and such?


@adelbertc

adelbertc Dec 11, 2017

Member

My trace:

private def jobSpecJson(
  stackName: StackName,
  image: Image,
  // dc: Datacenter, // ignored


@timperrett

timperrett Dec 11, 2017

Member

Zombie code

@@ -0,0 +1,4 @@
# Nelson has moved!
Please find Nelson at its new home: [getnelson/nelson](https://getnelson.github.io/nelson/)


@timperrett

timperrett Dec 26, 2017

Member

@adelbertc is this a bad merge? In this repo we don't want to change it from what it was, IMHO.


@adelbertc

adelbertc Dec 26, 2017

Member

Yes definitely a bad merge, will fix...

@timperrett


Member

timperrett commented Dec 26, 2017

@kaiserpelagic @rossabaker @adelbertc so we got the healthcheck stuff on the other PR, but what else will it take to get this over the line? Cheers

@timperrett timperrett closed this Dec 26, 2017

@timperrett timperrett reopened this Dec 26, 2017

@adelbertc


Member

adelbertc commented Dec 26, 2017

So my goal for this PR is to get Nelson running on a single K8s cluster. With that in mind, I think these are the missing pieces:

  • Health check algebra implementation
  • Service discovery
  • Load balancer support

Health check algebra is just a matter of getting the other PR merged and then implementing it; should be straightforward.

For service discovery, it seems at least short term we have two options:

  • Write a key-value store algebra (since currently things are tied to Consul) and use it for Lighthouse; Kubernetes will probably implement this with etcd. A sketch follows this list.
  • Punt to the routing layer, e.g. teach Envoy how to fetch/read/interpret Lighthouse, if possible
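
A minimal sketch of what that key-value algebra could look like, following the same free-algebra pattern as SchedulerOp (all names here are hypothetical):

// Hypothetical: a storage-agnostic key-value algebra for Lighthouse data.
// A Consul interpreter would be a KeyValueOp ~> Task today; an etcd one
// could back it on Kubernetes later.
sealed abstract class KeyValueOp[A]
object KeyValueOp {
  final case class Get(key: String)                extends KeyValueOp[Option[String]]
  final case class Put(key: String, value: String) extends KeyValueOp[Unit]
  final case class Delete(key: String)             extends KeyValueOp[Unit]
}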

For load balancing we have a couple of options:

Single ingress

Seems to be the predominant pattern in Kubernetes; or rather, nobody I've asked has refuted that claim yet. This means all requests go to the same IP address and share the same load balancer (including # of nodes handling requests, etc.), with two options for routing:

  • Hostname - every service gets its own hostname, all of which point to the single ingress IP address. Routing rules are defined on a per-hostname basis
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: s1
          servicePort: 80
  - host: bar.foo.com
    http:
      paths:
      - backend:
          serviceName: s2
          servicePort: 80
  • Routes - all services share a hostname, and are routed via the path, e.g. <load balancer>/foo/bar
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80

Multiple Ingress

Seems to be less common in Kubernetes, or at least doesn't seem to be a direction the Kubernetes folks are heading. It would, however, match more closely with Nelson's current design of exposing a load balancer per service.

I don't have many thoughts re: single vs. multiple ingress at the moment; I do think many users currently do not operate at a scale/domain where having a single ingress point matters much? At least it doesn't for us, and we can certainly implement both suggestions I made above and somehow make the load balancer implementation pluggable. I do agree the multiple load balancer approach is ultimately more scalable, though.

Kubernetes integration for Nelson!
* Add Kubernetes interpreter for SchedulerOp
* Add Kubernetes interpreter for HealthCheckOp
* Add new Canopus workflow for Kubernetes deployments
* Add new 'scheduler' field in config to choose between Nomad or Kubernetes
* Add support for SSLContext in HTTP4S Client creation
* Add namespace to Summary ctor since Kubernetes needs that

Also:
* Change Alpine APK endpoint as the original one no longer seemed to work
* Remove Allocations; it was only used in reconciliation, which was removed in #48

@timperrett timperrett merged commit 9fa33d9 into getnelson:master Jan 12, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed