m3db docker image crashes on rancher desktop, runs fine on other docker environments #1324
Comments
Hi @nick-stephen, I tried to replicate this but could not reproduce it.
❯ docker run -d -p 7201:7201 -p 7203:7203 --name m3db -v $(pwd)/m3db_data:/var/lib/m3db quay.io/m3db/m3dbnode:v1.4.2
Unable to find image 'quay.io/m3db/m3dbnode:v1.4.2' locally
v1.4.2: Pulling from m3db/m3dbnode
79e9f2f55bf5: Pull complete
e4706972f07d: Pull complete
712072678c7e: Pull complete
1a35e5f058d4: Pull complete
Digest: sha256:b5ebfd1dfe1a4478c988d48deb7b647aace3662ffc00727099c55af5d54f5614
Status: Downloaded newer image for quay.io/m3db/m3dbnode:v1.4.2
eb8ed40222d180c90cc5f2cdd98f43523fface326f3397c05a6a1a855d9f28d3
❯ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
93d70cf93ca0 897ce3c5fc8f "entry" 4 seconds ago Up 4 seconds k8s_lb-port-443_svclb-traefik-s5slz_kube-system_ebc823ce-fb35-4a3d-b240-d9ba002cc31c_0
51ca20ba3c9a rancher/klipper-lb "entry" 5 seconds ago Up 4 seconds k8s_lb-port-80_svclb-traefik-s5slz_kube-system_ebc823ce-fb35-4a3d-b240-d9ba002cc31c_0
838faa540b9d rancher/library-traefik "/entrypoint.sh --gl…" 10 seconds ago Up 9 seconds k8s_traefik_traefik-97b44b794-d6pd7_kube-system_62582030-7fc5-4af1-9286-34ff36fa09b8_0
fb3c303226f3 rancher/pause:3.1 "/pause" 16 seconds ago Up 15 seconds k8s_POD_svclb-traefik-s5slz_kube-system_ebc823ce-fb35-4a3d-b240-d9ba002cc31c_0
ef2fe0a1fec7 rancher/pause:3.1 "/pause" 17 seconds ago Up 16 seconds k8s_POD_traefik-97b44b794-d6pd7_kube-system_62582030-7fc5-4af1-9286-34ff36fa09b8_0
eb8ed40222d1 quay.io/m3db/m3dbnode:v1.4.2 "/bin/m3dbnode -f /e…" 21 seconds ago Up 20 seconds 0.0.0.0:7201->7201/tcp, :::7201->7201/tcp, 2379-2380/tcp, 9000-9004/tcp, 0.0.0.0:7203->7203/tcp, :::7203->7203/tcp m3db
d9a2a8342ebe rancher/coredns-coredns "/coredns -conf /etc…" 29 seconds ago Up 29 seconds k8s_coredns_coredns-7448499f4d-h68k6_kube-system_b3091229-2f46-437b-bd87-e07d6b733596_0
911fdb23d010 rancher/local-path-provisioner "local-path-provisio…" 35 seconds ago Up 34 seconds k8s_local-path-provisioner_local-path-provisioner-5ff76fc89d-g7dhr_kube-system_c1c934ba-5ffa-4dbf-a40a-35602948dbb4_0
6a7d4edb3315 rancher/metrics-server "/metrics-server" 40 seconds ago Up 39 seconds k8s_metrics-server_metrics-server-86cbb8457f-7vm47_kube-system_ee246292-b0f2-4dde-905d-6964d76698d9_0
7372da5d5315 rancher/pause:3.1 "/pause" 46 seconds ago Up 45 seconds k8s_POD_metrics-server-86cbb8457f-7vm47_kube-system_ee246292-b0f2-4dde-905d-6964d76698d9_0
22f471134b1a rancher/pause:3.1 "/pause" 47 seconds ago Up 45 seconds k8s_POD_coredns-7448499f4d-h68k6_kube-system_b3091229-2f46-437b-bd87-e07d6b733596_0
3c2fc391030c rancher/pause:3.1 "/pause" 47 seconds ago Up 45 seconds k8s_POD_local-path-provisioner-5ff76fc89d-g7dhr_kube-system_c1c934ba-5ffa-4dbf-a40a-35602948dbb4_0
❯ curl --location --request POST 'http://localhost:7201/api/v1/database/create' \
--header 'Content-Type: application/json' \
--data-raw '{
"namespaceName": "default_unaggregated",
"retentionTime": "24h",
"type": "local"
}'
{"namespace":{"registry":{"namespaces":{"default_unaggregated":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodNanos":"86400000000000","blockSizeNanos":"3600000000000","bufferFutureNanos":"120000000000","bufferPastNanos":"600000000000","blockDataExpiry":true,"blockDataExpiryAfterNotAccessPeriodNanos":"300000000000","futureRetentionPeriodNanos":"0"},"snapshotEnabled":true,"indexOptions":{"enabled":true,"blockSizeNanos":"3600000000000"},"schemaOptions":null,"coldWritesEnabled":false,"runtimeOptions":null,"cacheBlocksOnRetrieve":false,"aggregationOptions":{"aggregations":[{"aggregated":false,"attributes":null}]},"stagingState":{"status":"UNKNOWN"},"extendedOptions":null}}}},"placement":{"placement":{"instances":{"m3db_local":{"id":"m3db_local","isolationGroup":"local","zone":"embedded","weight":1,"endpoint":"127.0.0.1:9000","shards":[{"id":0,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":1,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":2,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":3,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":4,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":5,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":6,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":7,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":8,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":9,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","
redirectToShardId":null},{"id":10,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":11,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":12,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":13,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":14,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":15,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":16,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":17,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":18,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":19,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":20,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":21,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":22,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":23,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":24,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":25,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":26,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":27,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":28,"sta
te":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":29,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":30,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":31,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":32,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":33,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":34,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":35,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":36,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":37,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":38,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":39,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":40,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":41,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":42,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":43,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":44,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":45,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":46,"state":"INITIALIZING","sourceId":"","cuto
verNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":47,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":48,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":49,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":50,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":51,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":52,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":53,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":54,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":55,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":56,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":57,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":58,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":59,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":60,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":61,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":62,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":63,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null}],"shardSetId":0,"hostname":"localhost","port":9000,"metadata":{"debugPort":0}}},"replicaF
actor":1,"numShards":64,"isSharded":true,"cutoverTime":"0","isMirrored":false,"maxShardSetId":0},"version":0}}%
# expected error - need a few minutes to be ready
❯ curl --location --request POST 'http://localhost:7201/api/v1/services/m3db/namespace/ready' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "default_unaggregated"
}'
{"status":"error","error":"namepace default_unaggregated not yet ready, err: unable to satisfy consistency requirements, shards=64: failed to meet consistency level unstrict_majority with 1/1 success, 1 nodes responded, errors: []"}
❯ curl --location --request POST 'http://localhost:7201/api/v1/services/m3db/namespace/ready' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "default_unaggregated"
}'
{"ready":true}
❯ curl --location --request POST 'http://localhost:7201/api/v1/json/write' \
--header 'Content-Type: application/json' \
--data-raw '{
"tags": {
"__name__": "third_avenue",
"city": "new_york",
"checkout": "1"
},
"timestamp": "1642741531",
"value": 3347.26
}'
curl: (52) Empty reply from server
Tried to replicate using Rancher Desktop
@evertonlperes Thanks for looking into this! Did you check whether your Docker container was still running? I think you will find that the container crashed, and if you look at its logs you'll see the stack traces I showed you. In any case, I just reproduced it again using the beta-1 bits: Version: 1.0.0-beta.1
Thanks for clarifying @nick-stephen. Here's the log from container: |
Good that you can reproduce. What you show doesn't appear to be the full log. If you run |
By the way, as I found was already documented on the m3db web site, the error messages in your log about rlimits etc. are a red herring and can be ignored; they appear on all Docker environments, even those that don't crash. It's the crash (and the illegal instruction exception) that's the problem with Rancher Desktop's setup... Thx!
Red herring, not related:
[{"error":"current value for vm.swappiness(60) is above recommended threshold(1)"},{"error":"current value for RLIMIT_NOFILE(1048576) is below recommended threshold(3000000)"},{"error":"max value for RLIMIT_NOFILE(1048576) is below recommended threshold(3000000)"},{"error":"current value for vm.max_map_count(65530) is below recommended threshold(3000000)"}]}
Sweet, thanks for sharing it. |
Looking at the stack, it seems like it's this file, so it's probably the hard-coded [Edit] I can't seem to find anything in their documentation regarding the CPU requirements (which would be good given that they are using assembly with hand-coded instructions)… |
This looks like a frequent issue with m3db in virtual machines, e.g. m3db/m3#3105, m3db/m3#3659, m3db/m3#3827. The answer is always: Make sure you run on a machine that supports the
We are running qemu with
Sounds like @evertonlperes can reproduce the problem, but just for reference, @nick-stephen and @evertonlperes, what is the exact model of your CPU, e.g.
$ sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
sysctl -n machdep.cpu.brand_string
I'd interpreted this error as coming from the golang core runtime (x/sync) rather than the m3db source, but that could be wrong. In any case, yes, it appears to be a software bug triggered by the fact that Rancher Desktop's Lima does not support this instruction, which is supported on the host, whereas other solutions (Docker Desktop, VMware Fusion, ...) support it by default. Thanks for investigating so quickly!
For what it's worth, it looks like the go runtime thinks that it is checking the existence of the rdtscp instruction before invoking it.. not sure what's happening here. maybe there's a golang bug? Or the QEMU emulator is saying that the instruction is supported but then it fails when invoked... not sure... or |
FTR, unfortunately, that does not seem to be sufficient. I tried editing I might try to dig in deeper in the next days. |
I checked the instruction and apparently, independently from the https://gist.github.com/moio/1f50c268c48c8ece0b1f857838f75dca I tried to look into QEMU's sources but could not make much sense, so I asked in the mailing list. |
No response from the mailing list, but I could figure out that QEMU's RDTSCP detection code on macOS (via Hypervisor.framework) seems not to be functioning correctly (problem does not occur in Linux). Culprit seems to be in code changed by this proposed (not yet merged) patch: https://lore.kernel.org/qemu-devel/20211101054836.21471-1-dirty@apple.com/ In fact, the current QEMU code does not detect RDTSCP support on my host, despite obviously having it, while the new code does detect it correctly. I created a small reproducer here: https://gist.github.com/moio/cab4b5d0f05128ec1a6b8b4be94cafa0 I am now testing a patched QEMU. |
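One quick way to see the mismatch described here is to check whether the Linux guest is told about the instruction at all. The sketch below is only an illustration: `has_cpu_flag` is a hypothetical helper name, and it assumes the guest exposes CPU feature names on the `flags` lines of `/proc/cpuinfo`, as Linux on x86 does.

```shell
#!/bin/sh
# has_cpu_flag FLAG [FILE]
# Hypothetical helper (not part of Rancher Desktop, Lima, or m3db).
# Succeeds if FLAG appears as a whole word on a "flags" line of FILE.
# FILE defaults to /proc/cpuinfo, where Linux lists CPU feature names.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Keep only the "flags : ..." lines, then look for the flag as a whole word.
    grep -E '^flags[[:space:]]*:' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# Example, to be run inside the Linux guest (for instance the Lima VM):
#   if has_cpu_flag rdtscp; then echo "rdtscp advertised"; else echo "rdtscp missing"; fi
```

If the flag is missing in the guest while the host CPU is known to support it, that points at the virtualization layer rather than at m3db, which matches the QEMU detection problem described in this comment.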
@jandubois: I could replicate the problem and verified a working fix in QEMU (taken from a patch submitted to the QEMU mailing list but not yet merged in master). I opened a QEMU issue: https://gitlab.com/qemu-project/qemu/-/issues/1011 In the meantime I submitted a PR on Homebrew: https://github.com/Homebrew/homebrew-core/pull/100645/files When it comes to Rancher Desktop, would it be worth it to integrate the above patch into https://github.com/rancher-sandbox/lima-and-qemu? Would it help if I open a PR there? I was thinking about patching around here, although it might not be the cleanest approach: Opinions welcome 😄 |
@moio Thanks for being pro-active about this and opening the issues and pull requests. Right now I would want to wait a week to see if homebrew accepts your PR; then we could pick this up in |
Patch was accepted in upstream qemu, and also accepted in upstream homebrew: Homebrew/homebrew-core#100645 Alas, homebrew moved to qemu 7.0 in the meantime :-( |
I think we will move Rancher Desktop to the latest qemu release from homebrew for the next release, so it should all work out, wouldn't it? |
Yes, I do not expect any problem from my perspective. Thanks! |
Rancher Desktop Version
0.7.1
Rancher Desktop K8s Version
1.21.0
What operating system are you using?
macOS
Operating System / Build Version
macOS Monterey
What CPU architecture are you using?
x64
Linux only: what package format did you use to install Rancher Desktop?
No response
Windows User Only
No response
Actual Behavior
The official m3db Docker image crashes during execution with a SIGILL on Rancher Desktop, but works fine on Docker Desktop, minikube/VMware Fusion and other Docker environments on Mac.
See full output here: OUTPUT.txt
Steps to Reproduce
Run the docker image:
The first two steps only need to be done once to configure the service.
Invoke the REST API to create a namespace in m3db (this is asynchronous and takes 1-2 minutes to complete):
Wait approximately 1-2 minutes for the namespace to be created; only then will the next step (marking the namespace ready) succeed:
The crash occurs on the next step, inserting a metric into the namespace:
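The 1-2 minute wait before the namespace reports ready can be scripted instead of retried by hand. This is a minimal sketch, not something from the m3db docs: `namespace_is_ready` and `wait_for_namespace` are hypothetical helper names, the endpoint and payload are copied from the curl commands in the comment thread, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# namespace_is_ready BODY
# Hypothetical helper: succeeds if the response body reports {"ready":true}.
namespace_is_ready() {
    printf '%s' "$1" | grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# wait_for_namespace NAME [ATTEMPTS] [DELAY_SECONDS]
# Polls the namespace/ready endpoint until it reports ready or attempts run out.
# The defaults (30 attempts, 10 seconds apart) are assumptions, not m3db guidance.
wait_for_namespace() {
    name=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # A failed request (for example, container not up yet) counts as "not ready".
        body=$(curl --silent --request POST \
            'http://localhost:7201/api/v1/services/m3db/namespace/ready' \
            --header 'Content-Type: application/json' \
            --data-raw "{\"name\": \"$name\"}") || body=''
        if namespace_is_ready "$body"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_namespace default_unaggregated && echo "namespace ready, safe to write"
```

Once the helper returns successfully, the write request from the final step can be sent; on an affected Rancher Desktop setup that is the point at which the container crashes.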
Result
SIGILL: illegal instruction
PC=0xd01bf0 m=5 sigcode=2
goroutine 14333 [running]:
github.com/m3db/m3/src/x/sync.getCore(0xc00fc88d80, 0x3, 0x3, 0x1, 0x230eba0, 0xc0a4637500, 0x22bc140, 0xc0a4637500, 0x22bc120, 0xc0a4637500, ...)
/go/src/github.com/m3db/m3/src/x/sync/cpu_linux_amd64.s:9 fp=0xc00609a218 sp=0xc00609a210 pc=0xd01bf0
github.com/m3db/m3/src/x/sync.CPUCore(...)
/go/src/github.com/m3db/m3/src/x/sync/index_cpu.go:72
Expected Behavior
The Docker image should run without crashing.
Additional Information
No response