Silent error getting kube config in-cluster #331

Closed
benjamin-wright opened this issue Oct 23, 2020 · 27 comments

@benjamin-wright

Hi, I'm having a really basic problem getting kube working in-cluster on my k3s cluster (v1.18.9-k3s1). I have a very simple application, which is basically just the following:

#![feature(proc_macro_hygiene, decl_macro)]

extern crate kube;
extern crate serde;
extern crate serde_json;

use kube::{ Config };

#[tokio::main]
async fn main() -> Result<(), kube::Error> {
    println!("Starting...");

    println!("Getting kube config...");
    let config_result = Config::from_cluster_env(); //This is breaking

    println!("Resolving kube config result...");
    let config = match config_result {
        Ok(config) => config,
        Err(e) => {
            println!("Error getting config {:?}", e);
            return Err(e);
        }
    };

    println!("Finished!");
    Ok(())
}

Cargo.toml:

[dependencies]
kube = { version = "0.43.0", default-features = false, features = ["derive", "native-tls"] }
kube-runtime = { version = "0.43.0", default-features = false, features = [ "native-tls" ] }
k8s-openapi = { version = "0.9.0", default-features = false, features = ["v1_18"] }
serde =  { version = "1.0", features = ["derive"] }
serde_derive = "1.0"
serde_json = "1.0"
tokio = { version = "0.2.22", features = ["full"] }
reqwest = { version = "0.10.8", default-features = false, features = ["json", "gzip", "stream", "native-tls"] }

The output I'm getting is:

[server] Starting...
[server] Getting kube config...
[K8s EVENT: Pod auth-controller-6fb8f87b4d-5stf5 (ns: ponglehub)] Back-off restarting failed container

I'm hoping this is something obvious in my dependencies, but am suspicious that it's a K3S compatibility issue, since I tried using rustls originally and had to switch to native openssl because the k3s in-cluster api server address is an IP address rather than a hostname...

@clux
Member

clux commented Oct 23, 2020

The Config::from_cluster_env fn isn't doing much other than reading environment variables.
My first thought is that you might not have automountServiceAccountToken: true set, which is needed to get the env vars in there.

Weird that you are not getting Error information from the call. It should return an Error (and clearly it's crashing). What produces the [server] and [k8s EVENT] output? This doesn't look like straight kubectl logs which probably would have the error.

Dependency-wise, if you are not using rustls, you can take out the default-features = false on kube and kube-runtime and just pick the features you want (like derive for kube).
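
To make the environment check above concrete, here is a minimal sketch (not kube's actual implementation) that reports which in-cluster prerequisites are missing. The names KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, token and ca.crt are the standard Kubernetes ones; that kube reads exactly this set is an assumption here, and the function name missing_incluster_prereqs is made up for illustration.

```rust
use std::path::Path;

/// Environment variables Kubernetes normally injects into a pod
/// so that clients can locate the API server.
const REQUIRED_ENV: [&str; 2] = ["KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT"];

/// Files normally present in the mounted service account directory.
const REQUIRED_FILES: [&str; 2] = ["token", "ca.crt"];

/// Hypothetical helper: returns a description of every missing prerequisite.
/// An empty vector means everything was found.
/// `lookup` stands in for `std::env::var` so the check can run outside a pod.
fn missing_incluster_prereqs<F>(lookup: F, sa_dir: &Path) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut missing = Vec::new();
    for &name in REQUIRED_ENV.iter() {
        // Treat an unset or empty variable as missing.
        if lookup(name).map_or(true, |value| value.is_empty()) {
            missing.push(format!("env var {}", name));
        }
    }
    for &file in REQUIRED_FILES.iter() {
        let path = sa_dir.join(file);
        if !path.is_file() {
            missing.push(format!("file {}", path.display()));
        }
    }
    missing
}
```

Inside a pod this could be called as missing_incluster_prereqs(|k| std::env::var(k).ok(), Path::new("/var/run/secrets/kubernetes.io/serviceaccount")) and the result printed before calling Config::from_cluster_env.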

@benjamin-wright
Author

Hi @clux, thanks for getting back to me so quickly. Sorry, the output is from Tilt, which I'm using to deploy and pipe back the logs. But k9s gives a similar story:

Starting...           
Getting kube config...
stream closed         

Yeah, it's the lack of logs that's really getting me. I haven't created an explicit service account for it yet, will try giving it a service account and see if that was throwing it for a loop somehow. Ta for the pointer on default-features :)

@benjamin-wright
Author

Nope, explicit serviceaccount with explicit automountServiceAccountToken: true, same output 😞

@clux
Member

clux commented Oct 23, 2020

are you associating the deployment with the service account?
https://github.com/clux/kube-rs/blob/master/tests/deployment.yaml#L114

@benjamin-wright
Author

Yup:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: auth-controller
    app.kubernetes.io/managed-by: tilt
  name: auth-controller
  namespace: ponglehub
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: auth-controller
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: auth-controller
        app.kubernetes.io/managed-by: tilt
        tilt.dev/pod-template-hash: ad41af2b3e7eaa93efd5
    spec:
      containers:
      - image: auth-controller:tilt-20f9932817b95f85
        imagePullPolicy: IfNotPresent
        name: server
        resources:
          limits:
            cpu: 100m
            memory: 32Mi
          requests:
            cpu: 100m
            memory: 32Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: auth-controller
      serviceAccountName: auth-controller
      terminationGracePeriodSeconds: 30

@benjamin-wright
Author

And k9s is saying that the service account tokens are mounted:
Mounts: /var/run/secrets/kubernetes.io/serviceaccount from auth-controller-token-wx5ll (ro)

@clux
Member

clux commented Oct 23, 2020

Ok, that should be fine then 🤔

I would try using kubectl logs on the pod to get the full logs; Tilt might be hiding the last error from you.
If this is all the code you use on k3s, I could maybe try to reproduce it later.

@benjamin-wright
Author

Same from kubectl, last line is Getting kube config.... Is there a trick to turning on the trace! logs?

If you could try to replicate, that would be amazing! Code above is literally everything, I deleted everything extraneous I could find :)

@clux
Member

clux commented Oct 23, 2020

You can turn on traces by setting an env var: RUST_LOG=info,kube=trace (if you are using env_logger). Some of the examples show how to do this.
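
As a rough illustration of how a value like info,kube=trace is read, the sketch below splits it into a default level and per-target overrides. This is a simplified stand-in written for this thread, not env_logger's own parser: the function name parse_directives is hypothetical, level names are not validated, and regex filters are ignored.

```rust
/// Hypothetical helper: splits a RUST_LOG-style string into
/// (default level, per-target overrides).
/// Simplification: a bare entry is always treated as the default level.
fn parse_directives(spec: &str) -> (Option<String>, Vec<(String, String)>) {
    let mut default_level = None;
    let mut per_target = Vec::new();
    for part in spec.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        match part.split_once('=') {
            // "kube=trace" -> target "kube" logs at level "trace"
            Some((target, level)) => {
                per_target.push((target.to_string(), level.to_string()))
            }
            // "info" -> level applied to everything without an override
            None => default_level = Some(part.to_string()),
        }
    }
    (default_level, per_target)
}
```

With the value suggested above, everything logs at info except the kube target, which logs at trace.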

@benjamin-wright
Author

Thanks, am still getting my head round a lot of how rust works!

Turning on the trace gives no more info, which I guess means it's falling over somewhere before it gets to the error path 😞

If I get a chance I'll try pulling the kube-rs source and dropping some more traces in to see if I can pinpoint where it's coming unstuck

@nightkr
Member

nightkr commented Oct 23, 2020

What exit code and reason does kubectl describe pod show?

@nightkr
Member

nightkr commented Oct 23, 2020

FWIW it seems to run fine for me in K3d 0.8.0.6, but I couldn't get Tilt to cooperate.

@benjamin-wright
Author

Thanks @teozkr, that confirms that there's no reason it shouldn't work in principle.

    State:          Terminated
      Reason:       Error
      Exit Code:    139
      Started:      Fri, 23 Oct 2020 21:58:11 +0100
      Finished:     Fri, 23 Oct 2020 21:58:11 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    139
      Started:      Fri, 23 Oct 2020 21:57:57 +0100
      Finished:     Fri, 23 Oct 2020 21:57:57 +0100
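
Exit code 139 follows the common convention that a code above 128 means the process was killed by signal (code - 128), so 139 corresponds to signal 11, SIGSEGV. That fits the missing error output: a segfault ends the process before any Rust error path can run. A small sketch of the arithmetic, with hypothetical helper names and Linux signal numbering assumed:

```rust
/// Hypothetical helper: applies the "128 + signal number" convention
/// used by shells and container runtimes for signal-terminated processes.
fn signal_from_exit_code(code: i32) -> Option<i32> {
    if code > 128 && code < 256 {
        Some(code - 128)
    } else {
        None
    }
}

/// Names for a few common signals (Linux numbering); others return None.
fn signal_name(signal: i32) -> Option<&'static str> {
    match signal {
        6 => Some("SIGABRT"),
        9 => Some("SIGKILL"),
        11 => Some("SIGSEGV"),
        15 => Some("SIGTERM"),
        _ => None,
    }
}
```

Here signal_from_exit_code(139) gives Some(11), and signal_name(11) gives Some("SIGSEGV").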

I was on brew's k3d v3.0.1, and v3.1.5 is available now. Will try rolling the version tomorrow and see what happens... 🤞

FYI: On the tilt front, there's a script of theirs that got it working with k3d for me: https://github.com/tilt-dev/k3d-local-registry/

@nightkr
Member

nightkr commented Oct 23, 2020 via email

@benjamin-wright
Author

Found some more info from valgrind while trying to figure out how to get the stack size, it's a bit chonky but kinda interesting:

==124== Memcheck, a memory error detector
==124== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==124== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==124== Command: ./rust_binary
==124== 
==124== Conditional jump or move depends on uninitialised value(s)
==124==    at 0x4C90A5: strlen (in /rust_binary)
==124==    by 0x4C172C: setenv (in /rust_binary)
==124==    by 0x400AE9F: ???
==124==    by 0x400989F: ???
==124==    by 0x400AE9F: ???
==124==    by 0x8: ???
==124==    by 0x4D91C7: ??? (in /rust_binary)
==124==    by 0x4953B3: setenv (os.rs:560)
==124==    by 0x4953B3: std::env::_set_var (env.rs:322)
==124==    by 0x1AB6FC: std::env::set_var (in /rust_binary)
==124==    by 0x1AC1BB: auth_controller::main::{{closure}} (in /rust_binary)
==124==    by 0x1AB0F9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll (in /rust_binary)
==124==    by 0x1AAA51: tokio::runtime::enter::Enter::block_on::{{closure}} (in /rust_binary)
==124==  Uninitialised value was created
==124==    at 0x4CDB68: __expand_heap (in /rust_binary)
==124==    by 0x2: ???
==124==    by 0x67D457: ??? (in /rust_binary)
==124==    by 0x4C5F00: sched_yield (in /rust_binary)
==124==    by 0x1FFF000607: ???
==124== 
==124== Conditional jump or move depends on uninitialised value(s)
==124==    at 0x4C90A5: strlen (in /rust_binary)
==124==    by 0x495026: from_ptr (c_str.rs:1172)
==124==    by 0x495026: getenv (os.rs:548)
==124==    by 0x495026: std::env::_var_os (env.rs:241)
==124==    by 0x494E15: var_os<&std::ffi::os_str::OsStr> (env.rs:237)
==124==    by 0x494E15: std::env::_var (env.rs:209)
==124==    by 0x1C378F: std::env::var (in /rust_binary)
==124==    by 0x1B0A60: env_logger::Var::get (in /rust_binary)
==124==    by 0x1B096C: env_logger::Env::get_filter (in /rust_binary)
==124==    by 0x1AFCDE: env_logger::Builder::parse_env (in /rust_binary)
==124==    by 0x1AFC10: env_logger::Builder::from_env (in /rust_binary)
==124==    by 0x1B0CED: env_logger::try_init_from_env (in /rust_binary)
==124==    by 0x1B0C2C: env_logger::try_init (in /rust_binary)
==124==    by 0x1B0C59: env_logger::init (in /rust_binary)
==124==    by 0x1AC1C6: auth_controller::main::{{closure}} (in /rust_binary)
==124==  Uninitialised value was created
==124==    at 0x4CDB68: __expand_heap (in /rust_binary)
==124==    by 0x2: ???
==124==    by 0x67D457: ??? (in /rust_binary)
==124==    by 0x4C5F00: sched_yield (in /rust_binary)
==124==    by 0x1FFF000607: ???
==124== 
[2020-10-24T07:17:47Z INFO  auth_controller] Starting...
[2020-10-24T07:17:47Z INFO  auth_controller] An info message
[2020-10-24T07:17:47Z TRACE auth_controller] A trace message
[2020-10-24T07:17:47Z INFO  auth_controller] Getting kube configs...
==124== Conditional jump or move depends on uninitialised value(s)
==124==    at 0x4C2E55: calloc (in /rust_binary)
==124==    by 0x67D457: ??? (in /rust_binary)
==124==    by 0x1FFEFF248F: ???
==124==    by 0x1FFF000607: ???
==124==    by 0x39180B: alloc::alloc::alloc_zeroed (in /rust_binary)
==124==    by 0x39198E: alloc::alloc::Global::alloc_impl (in /rust_binary)
==124==    by 0x3921BD: <alloc::alloc::Global as core::alloc::AllocRef>::alloc_zeroed (in /rust_binary)
==124==    by 0x3861F5: alloc::raw_vec::RawVec<T,A>::allocate_in (in /rust_binary)
==124==    by 0x38C127: alloc::raw_vec::RawVec<T,A>::with_capacity_zeroed_in (in /rust_binary)
==124==    by 0x383779: alloc::raw_vec::RawVec<T>::with_capacity_zeroed (in /rust_binary)
==124==    by 0x2ED444: <T as alloc::vec::SpecFromElem>::from_elem (in /rust_binary)
==124==    by 0x2F31AD: alloc::vec::from_elem (in /rust_binary)
==124==  Uninitialised value was created by a stack allocation
==124==    at 0x3B0A4A: regex_syntax::ast::parse::ParserI<P>::parse_with_comments (in /rust_binary)
==124== 
==124== Jump to the invalid address stated on the next line
==124==    at 0x0: ???
==124==    by 0x27ACF1: openssl_sys::init::{{closure}} (in /rust_binary)
==124==    by 0x27AE16: std::sync::once::Once::call_once::{{closure}} (in /rust_binary)
==124==    by 0x49BF51: std::sync::once::Once::call_inner (once.rs:419)
==124==    by 0x27AD97: std::sync::once::Once::call_once (in /rust_binary)
==124==    by 0x27ACD0: openssl_sys::init (in /rust_binary)
==124==    by 0x27A333: openssl::x509::X509::from_pem (in /rust_binary)
==124==    by 0x1FA840: native_tls::imp::Certificate::from_pem (in /rust_binary)
==124==    by 0x1FADF1: native_tls::Certificate::from_pem (in /rust_binary)
==124==    by 0x1F51AB: reqwest::tls::Certificate::from_pem (in /rust_binary)
==124==    by 0x1E33A8: kube::config::incluster_config::load_cert::{{closure}} (in /rust_binary)
==124==    by 0x1D200D: core::iter::adapters::map_try_fold::{{closure}} (in /rust_binary)
==124==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==124== 
==124== 
==124== Process terminating with default action of signal 11 (SIGSEGV)
==124==  Bad permissions for mapped region at address 0x0
==124==    at 0x0: ???
==124==    by 0x27ACF1: openssl_sys::init::{{closure}} (in /rust_binary)
==124==    by 0x27AE16: std::sync::once::Once::call_once::{{closure}} (in /rust_binary)
==124==    by 0x49BF51: std::sync::once::Once::call_inner (once.rs:419)
==124==    by 0x27AD97: std::sync::once::Once::call_once (in /rust_binary)
==124==    by 0x27ACD0: openssl_sys::init (in /rust_binary)
==124==    by 0x27A333: openssl::x509::X509::from_pem (in /rust_binary)
==124==    by 0x1FA840: native_tls::imp::Certificate::from_pem (in /rust_binary)
==124==    by 0x1FADF1: native_tls::Certificate::from_pem (in /rust_binary)
==124==    by 0x1F51AB: reqwest::tls::Certificate::from_pem (in /rust_binary)
==124==    by 0x1E33A8: kube::config::incluster_config::load_cert::{{closure}} (in /rust_binary)
==124==    by 0x1D200D: core::iter::adapters::map_try_fold::{{closure}} (in /rust_binary)
==124== 
==124== HEAP SUMMARY:
==124==     in use at exit: 0 bytes in 0 blocks
==124==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==124== 
==124== All heap blocks were freed -- no leaks are possible
==124== 
==124== For lists of detected and suppressed errors, rerun with: -s
==124== ERROR SUMMARY: 14 errors from 4 contexts (suppressed: 0 from 0)
Segmentation fault

Looks like something weird is going on in openssl. I'll carry on digging, but thought I'd post this in case it's obvious to someone!

@nightkr
Member

nightkr commented Oct 24, 2020

What's your Dockerfile and Cargo.lock? I was using https://gist.github.com/teozkr/00f2106ba4bd18d05cc5d5ada681e098

@benjamin-wright
Author

https://gist.github.com/benjamin-wright/4bdf55dc304fa82c388fa93f25df40c9

There's a dockerfile that defines the build environment, the command in build.sh which creates the binary, then another dockerfile which makes the image.

(I've got this setup rather than a multistage build only because I was mucking about with making a simple multi-language monorepo build tool, and up until incorporating rust it had been quite convenient to build the artefacts locally with this tool, then let tilt deal with building the images.)

@nightkr
Member

nightkr commented Oct 24, 2020

Okay, I've been able to narrow it down to the following:

Cargo.toml:

[package]
name = "kube-331"
version = "0.1.0"
authors = ["Teo Klestrup Röijezon <teo@nullable.se>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
openssl-sys = "0.9.58"

[profile.release]
opt-level = 0
debug = true
debug-assertions = false
overflow-checks = false
lto = false
panic = 'unwind'
incremental = true
codegen-units = 256
rpath = true

main.rs:

fn main() {
    openssl_sys::init();
}

Curiously, it doesn't seem to happen when I build and run it inside of the same container...

@clux
Member

clux commented Oct 24, 2020

Oh, you can't use openssl in alpine if you are not building openssl with musl. You need something like muslrust for that.

@clux
Member

clux commented Oct 24, 2020

Although, you are grabbing openssl-dev from apk. In theory it looks okay, but I've never managed to get that combo to work, which is why muslrust is cross compiling from ubuntu.

@clux
Member

clux commented Oct 24, 2020

You might be able to make it work if you set a bunch of env vars and have pkgconfig installed so that the openssl-sys build script can detect openssl-dev, but it's a bit of a rabbit hole. That's generally why people don't compile straight from alpine if they have C dependencies.

@nightkr
Member

nightkr commented Oct 24, 2020

Oh hey, found the difference. Turns out, it seems to work if the builder installs cargo via apk, but not via rustup. I guess Alpine applies some patch on rustc somewhere.
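
The thread does not establish what exactly differs between the two toolchains. One thing that could be compared between an apk-built and a rustup-built binary (an assumption about where to look, not a confirmed cause) is the target libc and whether the C runtime is linked statically. A minimal sketch using compile-time cfg! checks, with a hypothetical function name:

```rust
/// Hypothetical helper: reports how this binary was compiled.
/// The values are fixed at compile time, so the output reflects the
/// toolchain and target that built the binary, not the host running it.
fn build_report() -> String {
    let libc = if cfg!(target_env = "musl") {
        "musl"
    } else if cfg!(target_env = "gnu") {
        "gnu"
    } else {
        "other"
    };
    // `crt-static` is set when the C runtime is linked statically.
    let crt = if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    };
    format!("target_env={} crt={}", libc, crt)
}
```

Printing build_report() at startup in both builds would show whether they differ on these two settings.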

@clux
Member

clux commented Oct 24, 2020

Oh wow, ok, that's a less horrible solution :D

@benjamin-wright
Author

Just tried out installing cargo from apk, works like a charm! 😁

Thanks so much for all your help with this @teozkr and @clux, really appreciate the effort 😄

@benjamin-wright
Author

Is there anything you'd like to do off the back of this ticket? Happy to close as resolved otherwise

@clux
Member

clux commented Oct 24, 2020

I might leave a caveat in the readme for when others come by trying to use musl. Will close after that.

@Tinkesh-Kumar

> Just tried out installing cargo from apk, works like a charm! 😁
>
> Thanks so much for all your help with this @teozkr and @clux, really appreciate the effort 😄

Hey, can you share your Dockerfile after the change? Is your base image still alpine?
