m3db docker image crashes on rancher desktop, runs fine on other docker environments #1324

Open
nick-stephen opened this issue Jan 25, 2022 · 18 comments
Labels
kind/bug Something isn't working platform/macos
Comments

@nick-stephen

Rancher Desktop Version

0.7.1

Rancher Desktop K8s Version

1.21.0

What operating system are you using?

macOS

Operating System / Build Version

MacOS Monterey

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

Actual Behavior

The official m3db Docker image crashes during execution with a SIGILL on Rancher Desktop, but works fine on Docker Desktop, minikube/VMware Fusion, and other Docker environments on Mac.

See full output here: OUTPUT.txt

Steps to Reproduce

Run the docker image:

docker run -d -p 7201:7201 -p 7203:7203 --name m3db -v $(pwd)/m3db_data:/var/lib/m3db quay.io/m3db/m3dbnode:v1.4.2

The first two steps only need to be done once to configure the service.

Invoke REST API to create a namespace in m3db (this is async and takes 1-2 mins to complete):

curl --location --request POST 'http://localhost:7201/api/v1/database/create' \
--header 'Content-Type: application/json' \
--data-raw '{
    "namespaceName": "default_unaggregated",
    "retentionTime": "24h",
    "type": "local"
}'

Wait approximately 1-2 minutes for the namespace to be created; the next step (marking the namespace ready) will not succeed until then:

curl --location --request POST 'http://localhost:7201/api/v1/services/m3db/namespace/ready' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "default_unaggregated"
}'
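
Rather than waiting a fixed 1-2 minutes, the ready call can be retried until it succeeds. The following is only a sketch: the function name wait_for_namespace, the attempt count, and the delay are assumptions made for illustration, and the success check assumes the endpoint replies with {"ready":true} once the namespace is available.

```shell
# Retry the namespace "ready" call until it succeeds or attempts run out.
# wait_for_namespace, the attempt count and the delay are illustrative choices.
wait_for_namespace() {
  name=$1
  attempts=${2:-30}   # how many times to try
  delay=${3:-10}      # seconds to wait between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    reply=$(curl --silent --location --request POST \
      'http://localhost:7201/api/v1/services/m3db/namespace/ready' \
      --header 'Content-Type: application/json' \
      --data-raw "{\"name\": \"$name\"}") || reply=""
    case $reply in
      *'"ready":true'*)
        echo "namespace $name is ready"
        return 0
        ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "namespace $name not ready after $attempts attempts" >&2
  return 1
}
```

Calling wait_for_namespace default_unaggregated before the write step would then block until the namespace is usable, or give up after the configured number of attempts.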

The crash occurs on the next step, inserting a metric into the namespace:

curl --location --request POST 'http://localhost:7201/api/v1/json/write' \
--header 'Content-Type: application/json' \
--data-raw '{
    "tags": {
        "__name__": "third_avenue",
        "city": "new_york",
        "checkout": "1"
    },
    "timestamp": "1642741531",
    "value": 3347.26
}'

Result

SIGILL: illegal instruction
PC=0xd01bf0 m=5 sigcode=2

goroutine 14333 [running]:
github.com/m3db/m3/src/x/sync.getCore(0xc00fc88d80, 0x3, 0x3, 0x1, 0x230eba0, 0xc0a4637500, 0x22bc140, 0xc0a4637500, 0x22bc120, 0xc0a4637500, ...)
/go/src/github.com/m3db/m3/src/x/sync/cpu_linux_amd64.s:9 fp=0xc00609a218 sp=0xc00609a210 pc=0xd01bf0
github.com/m3db/m3/src/x/sync.CPUCore(...)
/go/src/github.com/m3db/m3/src/x/sync/index_cpu.go:72

Expected Behavior

Docker image should run correctly

Additional Information

No response

@nick-stephen nick-stephen added the kind/bug Something isn't working label Jan 25, 2022
@evertonlperes
Contributor

Hi @nick-stephen
Thanks for opening this issue.

I tried to replicate it, but I could not reproduce the crash.
Following the steps described, I got this output (which seems to be correct):

❯ docker run -d -p 7201:7201 -p 7203:7203 --name m3db -v $(pwd)/m3db_data:/var/lib/m3db quay.io/m3db/m3dbnode:v1.4.2
Unable to find image 'quay.io/m3db/m3dbnode:v1.4.2' locally
v1.4.2: Pulling from m3db/m3dbnode
79e9f2f55bf5: Pull complete
e4706972f07d: Pull complete
712072678c7e: Pull complete
1a35e5f058d4: Pull complete
Digest: sha256:b5ebfd1dfe1a4478c988d48deb7b647aace3662ffc00727099c55af5d54f5614
Status: Downloaded newer image for quay.io/m3db/m3dbnode:v1.4.2
eb8ed40222d180c90cc5f2cdd98f43523fface326f3397c05a6a1a855d9f28d3


❯ docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED          STATUS          PORTS                                                                                                                NAMES
93d70cf93ca0   897ce3c5fc8f                     "entry"                  4 seconds ago    Up 4 seconds                                                                                                                         k8s_lb-port-443_svclb-traefik-s5slz_kube-system_ebc823ce-fb35-4a3d-b240-d9ba002cc31c_0
51ca20ba3c9a   rancher/klipper-lb               "entry"                  5 seconds ago    Up 4 seconds                                                                                                                         k8s_lb-port-80_svclb-traefik-s5slz_kube-system_ebc823ce-fb35-4a3d-b240-d9ba002cc31c_0
838faa540b9d   rancher/library-traefik          "/entrypoint.sh --gl…"   10 seconds ago   Up 9 seconds                                                                                                                         k8s_traefik_traefik-97b44b794-d6pd7_kube-system_62582030-7fc5-4af1-9286-34ff36fa09b8_0
fb3c303226f3   rancher/pause:3.1                "/pause"                 16 seconds ago   Up 15 seconds                                                                                                                        k8s_POD_svclb-traefik-s5slz_kube-system_ebc823ce-fb35-4a3d-b240-d9ba002cc31c_0
ef2fe0a1fec7   rancher/pause:3.1                "/pause"                 17 seconds ago   Up 16 seconds                                                                                                                        k8s_POD_traefik-97b44b794-d6pd7_kube-system_62582030-7fc5-4af1-9286-34ff36fa09b8_0
eb8ed40222d1   quay.io/m3db/m3dbnode:v1.4.2     "/bin/m3dbnode -f /e…"   21 seconds ago   Up 20 seconds   0.0.0.0:7201->7201/tcp, :::7201->7201/tcp, 2379-2380/tcp, 9000-9004/tcp, 0.0.0.0:7203->7203/tcp, :::7203->7203/tcp   m3db
d9a2a8342ebe   rancher/coredns-coredns          "/coredns -conf /etc…"   29 seconds ago   Up 29 seconds                                                                                                                        k8s_coredns_coredns-7448499f4d-h68k6_kube-system_b3091229-2f46-437b-bd87-e07d6b733596_0
911fdb23d010   rancher/local-path-provisioner   "local-path-provisio…"   35 seconds ago   Up 34 seconds                                                                                                                        k8s_local-path-provisioner_local-path-provisioner-5ff76fc89d-g7dhr_kube-system_c1c934ba-5ffa-4dbf-a40a-35602948dbb4_0
6a7d4edb3315   rancher/metrics-server           "/metrics-server"        40 seconds ago   Up 39 seconds                                                                                                                        k8s_metrics-server_metrics-server-86cbb8457f-7vm47_kube-system_ee246292-b0f2-4dde-905d-6964d76698d9_0
7372da5d5315   rancher/pause:3.1                "/pause"                 46 seconds ago   Up 45 seconds                                                                                                                        k8s_POD_metrics-server-86cbb8457f-7vm47_kube-system_ee246292-b0f2-4dde-905d-6964d76698d9_0
22f471134b1a   rancher/pause:3.1                "/pause"                 47 seconds ago   Up 45 seconds                                                                                                                        k8s_POD_coredns-7448499f4d-h68k6_kube-system_b3091229-2f46-437b-bd87-e07d6b733596_0
3c2fc391030c   rancher/pause:3.1                "/pause"                 47 seconds ago   Up 45 seconds                                                                                                                        k8s_POD_local-path-provisioner-5ff76fc89d-g7dhr_kube-system_c1c934ba-5ffa-4dbf-a40a-35602948dbb4_0

❯ curl --location --request POST 'http://localhost:7201/api/v1/database/create' \
--header 'Content-Type: application/json' \
--data-raw '{
    "namespaceName": "default_unaggregated",
    "retentionTime": "24h",
    "type": "local"
}'

{"namespace":{"registry":{"namespaces":{"default_unaggregated":{"bootstrapEnabled":true,"flushEnabled":true,"writesToCommitLog":true,"cleanupEnabled":true,"repairEnabled":false,"retentionOptions":{"retentionPeriodNanos":"86400000000000","blockSizeNanos":"3600000000000","bufferFutureNanos":"120000000000","bufferPastNanos":"600000000000","blockDataExpiry":true,"blockDataExpiryAfterNotAccessPeriodNanos":"300000000000","futureRetentionPeriodNanos":"0"},"snapshotEnabled":true,"indexOptions":{"enabled":true,"blockSizeNanos":"3600000000000"},"schemaOptions":null,"coldWritesEnabled":false,"runtimeOptions":null,"cacheBlocksOnRetrieve":false,"aggregationOptions":{"aggregations":[{"aggregated":false,"attributes":null}]},"stagingState":{"status":"UNKNOWN"},"extendedOptions":null}}}},"placement":{"placement":{"instances":{"m3db_local":{"id":"m3db_local","isolationGroup":"local","zone":"embedded","weight":1,"endpoint":"127.0.0.1:9000","shards":[{"id":0,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":1,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":2,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":3,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":4,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":5,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":6,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":7,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":8,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":9,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","
redirectToShardId":null},{"id":10,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":11,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":12,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":13,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":14,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":15,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":16,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":17,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":18,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":19,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":20,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":21,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":22,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":23,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":24,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":25,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":26,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":27,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":28,"sta
te":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":29,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":30,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":31,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":32,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":33,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":34,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":35,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":36,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":37,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":38,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":39,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":40,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":41,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":42,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":43,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":44,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":45,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":46,"state":"INITIALIZING","sourceId":"","cuto
verNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":47,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":48,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":49,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":50,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":51,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":52,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":53,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":54,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":55,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":56,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":57,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":58,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":59,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":60,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":61,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":62,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null},{"id":63,"state":"INITIALIZING","sourceId":"","cutoverNanos":"0","cutoffNanos":"0","redirectToShardId":null}],"shardSetId":0,"hostname":"localhost","port":9000,"metadata":{"debugPort":0}}},"replicaF
actor":1,"numShards":64,"isSharded":true,"cutoverTime":"0","isMirrored":false,"maxShardSetId":0},"version":0}}%

# expected error - need a few minutes to be ready
❯ curl --location --request POST 'http://localhost:7201/api/v1/services/m3db/namespace/ready' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "default_unaggregated"
}'

{"status":"error","error":"namepace default_unaggregated not yet ready, err: unable to satisfy consistency requirements, shards=64: failed to meet consistency level unstrict_majority with 1/1 success, 1 nodes responded, errors: []"}


❯ curl --location --request POST 'http://localhost:7201/api/v1/services/m3db/namespace/ready' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "default_unaggregated"
}'

{"ready":true}


❯ curl --location --request POST 'http://localhost:7201/api/v1/json/write' \
--header 'Content-Type: application/json' \
--data-raw '{
    "tags": {
        "__name__": "third_avenue",
        "city": "new_york",
        "checkout": "1"
    },
    "timestamp": "1642741531",
    "value": 3347.26
}'

curl: (52) Empty reply from server

Tried to replicate using Rancher Desktop 1.0.0-beta1 and K8s version 1.21.0 (macOS Monterey 12.1 amd64).

@evertonlperes evertonlperes added triage/needs-information Further information is requested platform/macos labels Jan 25, 2022
@nick-stephen
Author

@evertonlperes Thx for looking into this! Did you check to see if your docker container was still running? I think you may have found that the container crashed, and if you look at its logs you'll see the stacktraces I showed you.

In any case I just reproduced it again using the beta-1 bits: Version: 1.0.0-beta.1

@evertonlperes
Contributor

> @evertonlperes Thx for looking into this! Did you check to see if your docker container was still running? I think you may have found that the container crashed, and if you look at its logs you'll see the stacktraces I showed you.
>
> In any case I just reproduced it again using the beta-1 bits: Version: 1.0.0-beta.1

Thanks for clarifying @nick-stephen.
You're right, the container is gone after inserting the metric, as you have described.

Here's the log from container:
container-output.log

@evertonlperes evertonlperes removed the triage/needs-information Further information is requested label Jan 25, 2022
@evertonlperes evertonlperes added this to To do in Stripey Jan 25, 2022
@nick-stephen
Author

Good that you can reproduce it. What you show doesn't appear to be the full log. If you run docker logs <container_id> after the crash, or docker logs -f just before it, you'll see the stacktrace information I showed you, which indicates that the golang runtime is triggering an illegal instruction exception in its thread sync core library.
This exception doesn't occur in the other docker environments I've tried. I suspect this is because of the options given to qemu in terms of supported CPU characteristics, but I'm not enough of an x86 expert to know more.
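
To confirm the crash on a given setup, it may help to check the container's state and pull just the crash header out of its logs. This is a small sketch, assuming the container is named m3db as in the docker run command above; the helper names container_state and sigill_trace are made up for illustration.

```shell
# Show whether the container is still running and, if not, its exit code.
container_state() {
  docker inspect --format '{{.State.Status}} (exit code {{.State.ExitCode}})' "$1"
}

# Keep only the Go crash header plus the N lines after it (default 8).
sigill_trace() {
  grep -A "${1:-8}" 'SIGILL: illegal instruction'
}

# Example (not run here):
#   container_state m3db
#   docker logs m3db 2>&1 | sigill_trace
```

The 2>&1 redirect is there because docker logs replays the container's stderr on stderr, which is typically where a Go crash trace ends up.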

@nick-stephen
Author

By the way, as is already documented on the m3db web site, the error messages in your log about rlimits etc. are a red herring and can be ignored; they appear in all docker environments, even those that don't crash.

It's the crash (and the illegal instruction exception) that's the problem with rancher-desktop's setup... Thx!

Red herring, not related:

[{"error":"current value for vm.swappiness(60) is above recommended threshold(1)"},{"error":"current value for RLIMIT_NOFILE(1048576) is below recommended threshold(3000000)"},{"error":"max value for RLIMIT_NOFILE(1048576) is below recommended threshold(3000000)"},{"error":"current value for vm.max_map_count(65530) is below recommended threshold(3000000)"}]}

@evertonlperes
Contributor

Sweet, thanks for sharing it.
Tagging @jandubois in this thread.

@mook-as
Contributor

mook-as commented Jan 25, 2022

Looking at the stack, it seems like it's this file, so it's probably the hard-coded rdtscp. I have no idea which CPUs we should be using to support that… but either way, that code should be checking CPUID before calling that. (Linking to Intel's architecture manual is hard because it's a PDF, but see 2A 3-233 (CPUID reference chart, EAX = 80000001H, EDX bit 27).)

[Edit] I can't seem to find anything in their documentation regarding the CPU requirements (which would be good given that they are using assembly with hand-coded instructions)…
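
On Linux, the CPUID bit mentioned above surfaces as the rdtscp entry on the flags line of /proc/cpuinfo, so the guest can be checked without writing any assembly. A rough sketch follows; has_cpu_flag is a hypothetical helper name, and the optional second argument exists only so the check can be pointed at a saved copy of the file.

```shell
# Succeed if the named flag appears on the first "flags" line of cpuinfo.
# $1: flag name (e.g. rdtscp); $2: file to read, defaults to /proc/cpuinfo.
has_cpu_flag() {
  grep -m 1 '^flags' "${2:-/proc/cpuinfo}" | tr ' \t' '\n\n' | grep -qx "$1"
}

# Example, run inside the VM or a container (not run here):
#   has_cpu_flag rdtscp && echo "rdtscp exposed" || echo "rdtscp missing"
```

Splitting the line into one token per row and matching whole lines avoids a false positive from rdtsc-style substrings.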

@jandubois
Member

This looks like a frequent issue with m3db in virtual machines, e.g. m3db/m3#3105, m3db/m3#3659, m3db/m3#3827.

The answer is always: Make sure you run on a machine that supports the rdtscp instruction!

We are running qemu with -cpu host, but it looks like qemu is disabling rdtscp even on some CPUs that would support it. Not quite sure where to go from here; I've filed a Lima issue yesterday to make the qemu cpu setting configurable: lima-vm/lima#592. But I'm not sure if this will be sufficient.

Sounds like @evertonlperes can reproduce the problem, but just for reference, @nick-stephen and @evertonlperes, what is the exact model of your CPU, e.g.

$ sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
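
The brand string alone does not show whether the host CPU advertises RDTSCP. On Intel Macs the machdep.cpu.extfeatures sysctl lists extended CPU features, which gives a quick host-side check to compare against what the guest sees. This is a sketch only; lists_rdtscp is an illustrative helper name, not an existing tool.

```shell
# Succeed if the whitespace-separated feature list on stdin contains RDTSCP.
lists_rdtscp() {
  tr ' \t' '\n\n' | grep -qx 'RDTSCP'
}

# Example on the macOS host (not run here):
#   sysctl -n machdep.cpu.extfeatures | lists_rdtscp && echo "host has RDTSCP"
```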

@nick-stephen
Author

nick-stephen commented Jan 26, 2022

sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

I'd interpreted this error as coming from the golang core runtime (x/sync) rather than the m3db source, but that could be wrong. In any case, yes, it appears to be a software bug triggered by the fact that Rancher Desktop's Lima VM does not expose this instruction even though the host supports it, whereas other solutions (docker desktop, vmware fusion, ...) do support it by default.

Thanks for investigating so quickly!

@nick-stephen
Author

For what it's worth, it looks like the go runtime thinks that it is checking for the existence of the rdtscp instruction before invoking it, so I'm not sure what's happening here. Maybe there's a golang bug? Or maybe the QEMU emulator says that the instruction is supported, but it then fails when invoked. Not sure.

https://github.com/golang/go/blob/5b1b80beb1a2a9a353738e80777d1e25cfdfa095/src/runtime/asm_amd64.s#L1052

or

https://github.com/golang/go/blob/5b1b80beb1a2a9a353738e80777d1e25cfdfa095/src/runtime/asm_386.s#L844

@moio
Contributor

moio commented Apr 21, 2022

> I've filed a Lima issue yesterday to make the qemu cpu setting configurable [...] But I'm not sure if this will be sufficient.

FTR, unfortunately, that does not seem to be sufficient.

I tried editing lima.yaml manually to add cpuType: {x86_64: Haswell} and, while the -cpu Haswell flag is on the qemu command line according to ps, the error is still reproducible on my machine (Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz).

I might try to dig deeper in the next few days.
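
The ps check mentioned above can be narrowed down to just the -cpu value. This is a sketch with an assumed helper name (qemu_cpu_value); it only reports what was passed on the command line, which, as noted, does not guarantee that the guest actually sees RDTSCP.

```shell
# Print the argument that follows -cpu on each qemu command line read from stdin.
qemu_cpu_value() {
  sed -n 's/.* -cpu \([^ ]*\).*/\1/p'
}

# Example on the host (not run here); [q] keeps grep from matching itself:
#   ps -ax -o command | grep '[q]emu-system-x86_64' | qemu_cpu_value
```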

@moio
Contributor

moio commented Apr 22, 2022

I checked for the instruction and, regardless of the -cpu flag, RDTSCP never seems to be exposed by QEMU:

https://gist.github.com/moio/1f50c268c48c8ece0b1f857838f75dca

I tried to look into QEMU's sources but could not make much sense of them, so I asked on the mailing list.

@moio
Contributor

moio commented May 2, 2022

No response from the mailing list, but I figured out that QEMU's RDTSCP detection code on macOS (via Hypervisor.framework) does not seem to work correctly (the problem does not occur on Linux).

The culprit seems to be in the code changed by this proposed (not yet merged) patch:

https://lore.kernel.org/qemu-devel/20211101054836.21471-1-dirty@apple.com/

In fact, the current QEMU code does not detect RDTSCP support on my host, despite obviously having it, while the new code does detect it correctly. I created a small reproducer here:

https://gist.github.com/moio/cab4b5d0f05128ec1a6b8b4be94cafa0

I am now testing a patched QEMU.

@moio
Contributor

moio commented May 2, 2022

@jandubois: I was able to replicate the problem and verified a working fix in QEMU (taken from a patch submitted to the QEMU mailing list but not yet merged into master).

I opened a QEMU issue: https://gitlab.com/qemu-project/qemu/-/issues/1011

In the meantime I submitted a PR on Homebrew: https://github.com/Homebrew/homebrew-core/pull/100645/files

When it comes to Rancher Desktop, would it be worth it to integrate the above patch into https://github.com/rancher-sandbox/lima-and-qemu? Would it help if I open a PR there?

I was thinking about patching around here, although it might not be the cleanest approach:
https://github.com/rancher-sandbox/lima-and-qemu/blob/main/.github/workflows/release.yml#L37

Opinions welcome 😄

@jandubois
Member

@moio Thanks for being proactive about this and opening the issues and pull requests.

Right now I would want to wait a week to see if homebrew accepts your PR; then we could pick this up in lima-and-qemu automatically without additional effort (as long as they stay with qemu 6.2 and do not move to 7.0).

@moio
Contributor

moio commented Jul 18, 2022

> Right now I would want to wait a week to see if homebrew accepts your PR; then we could pick this up in lima-and-qemu automatically without additional effort (as long as they stay with qemu 6.2 and do not move to 7.0).

Patch was accepted in upstream qemu, and also accepted in upstream homebrew: Homebrew/homebrew-core#100645

Alas, homebrew moved to qemu 7.0 in the meantime :-(

@jandubois
Member

> Alas, homebrew moved to qemu 7.0 in the meantime :-(

I think we will move Rancher Desktop to the latest qemu release from homebrew for the next release, so it should all work out, shouldn't it?

@moio
Contributor

moio commented Jul 18, 2022

> I think we will move Rancher Desktop to the latest qemu release from homebrew for the next release, so it should all work out, shouldn't it?

Yes, I do not expect any problem from my perspective. Thanks!
