Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki no longer runs on 32bit architecture #5388

Closed
slim-bean opened this issue Feb 14, 2022 · 19 comments
Closed

Loki no longer runs on 32bit architecture #5388

slim-bean opened this issue Feb 14, 2022 · 19 comments
Labels
keepalive An issue or PR that will be kept alive and never marked as stale. stale A stale issue or PR that will automatically be closed.

Comments

@slim-bean
Copy link
Collaborator

This is mostly for posterity because I don't think we will likely fix this. ☹️

panic: unaligned 64-bit atomic operation
goroutine 2314 [running]:
github.com/grafana/loki/pkg/util/server.onPanic({0x17de240, 0x1eb8cd8})
        /src/loki/pkg/util/server/recovery.go:41 +0x4c
github.com/grpc-ecosystem/go-grpc-middleware/recovery.WithRecoveryHandler.func1.1({0x1ef6a48, 0x3d7d308}, {0x17de240, 0x1eb8cd8})
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/options.go:33 +0x2c
github.com/grpc-ecosystem/go-grpc-middleware/recovery.recoverFrom({0x1ef6a48, 0x3d7d308}, {0x17de240, 0x1eb8cd8}, 0x30a59a0)
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:61 +0x44
github.com/grpc-ecosystem/go-grpc-middleware/recovery.StreamServerInterceptor.func1.1(0x3d65b0a, {0x1f12ed4, 0x4660ee0}, 0x30a5998, 0x3d65b40)
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:47 +0x90
panic({0x17de240, 0x1eb8cd8})
        /usr/local/go/src/runtime/panic.go:1038 +0x23c
runtime/internal/atomic.panicUnaligned()
        /usr/local/go/src/runtime/internal/atomic/unaligned.go:8 +0x24
runtime/internal/atomic.Xadd64(0x3d3aa2c, 0x1)
        /usr/local/go/src/runtime/internal/atomic/atomic_arm.s:256 +0x14
github.com/grafana/loki/pkg/logqlmodel/stats.(*Context).AddHeadChunkLines(...)
        /src/loki/pkg/logqlmodel/stats/context.go:222
github.com/grafana/loki/pkg/chunkenc.(*unorderedHeadBlock).forEntries.func1(0x4639488)
        /src/loki/pkg/chunkenc/unordered.go:169 +0x54
github.com/grafana/loki/pkg/chunkenc.(*unorderedHeadBlock).forEntries(0x334b480, {0x1ef6a48, 0x3d7d368}, 0x1, 0x16d3bcf301c31540, 0x16d3c039327bb540, 0x3d64ff0)
        /src/loki/pkg/chunkenc/unordered.go:201 +0x220
github.com/grafana/loki/pkg/chunkenc.(*unorderedHeadBlock).Iterator(0x334b480, {0x1ef6a48, 0x3d7d368}, 0x1, 0x16d3bcf301c31540, 0x16d3c039327bb540, {0x1ef2c78, 0x4662538})
        /src/loki/pkg/chunkenc/unordered.go:225 +0xc4
github.com/grafana/loki/pkg/chunkenc.(*MemChunk).Iterator(0x3016e10, {0x1ef6a48, 0x3d7d368}, {0x30756540, 0xed99ca0f8, 0x0}, {0x30756540, 0xed99caf08, 0x0}, 0x1, ...)
        /src/loki/pkg/chunkenc/memchunk.go:808 +0x308
github.com/grafana/loki/pkg/ingester.(*stream).Iterator(0x30db600, {0x1ef6a48, 0x3d7d368}, 0x3d3a960, {0x30756540, 0xed99ca0f8, 0x0}, {0x30756540, 0xed99caf08, 0x0}, ...)
        /src/loki/pkg/ingester/stream.go:456 +0x268
github.com/grafana/loki/pkg/ingester.(*instance).Query.func1(0x30db600)
        /src/loki/pkg/ingester/instance.go:337 +0xfc
github.com/grafana/loki/pkg/ingester.(*instance).forMatchingStreams(0x365c3f0, {0x1ef6a48, 0x3d7d368}, {0x4662510, 0x1, 0x1}, 0x0, 0x3d65584)
        /src/loki/pkg/ingester/instance.go:571 +0x264
github.com/grafana/loki/pkg/ingester.(*instance).Query(0x365c3f0, {0x1ef6a48, 0x3d7d368}, {0x3d719a0})
        /src/loki/pkg/ingester/instance.go:332 +0x1c4
github.com/grafana/loki/pkg/ingester.(*Ingester).Query(0x31d7c00, 0x3d719a0, {0x1f135a8, 0x4662500})
        /src/loki/pkg/ingester/ingester.go:580 +0x130
github.com/grafana/loki/pkg/logproto._Querier_Query_Handler({0x1a7c860, 0x31d7c00}, {0x1f12ed4, 0x4660ee0})
        /src/loki/pkg/logproto/logproto.pb.go:3163 +0xdc
github.com/grpc-ecosystem/go-grpc-middleware/recovery.StreamServerInterceptor.func1({0x1a7c860, 0x31d7c00}, {0x1f12ed4, 0x4660ee0}, 0x4660e30, 0x1b5dc24)
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/recovery/interceptors.go:51 +0x98
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1({0x1a7c860, 0x31d7c00}, {0x1f12ed4, 0x4660ee0})
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49 +0x4c
github.com/grafana/loki/pkg/util/fakeauth.glob..func3({0x1a7c860, 0x31d7c00}, {0x1f123e4, 0x4660ed0}, 0x4660e30, 0x4660e40)
        /src/loki/pkg/util/fakeauth/fake_auth.go:65 +0xe4
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1({0x1a7c860, 0x31d7c00}, {0x1f123e4, 0x4660ed0})
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49 +0x4c
github.com/weaveworks/common/middleware.StreamServerInstrumentInterceptor.func1({0x1a7c860, 0x31d7c00}, {0x1f123e4, 0x4660ed0}, 0x4660e30, 0x4660e50)
        /src/loki/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:43 +0x54
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1({0x1a7c860, 0x31d7c00}, {0x1f123e4, 0x4660ed0})
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49 +0x4c
github.com/opentracing-contrib/go-grpc.OpenTracingStreamServerInterceptor.func1({0x1a7c860, 0x31d7c00}, {0x1f125c4, 0x3d2dd50}, 0x4660e30, 0x4660e60)
        /src/loki/vendor/github.com/opentracing-contrib/go-grpc/server.go:114 +0x364
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1({0x1a7c860, 0x31d7c00}, {0x1f125c4, 0x3d2dd50})
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49 +0x4c
github.com/weaveworks/common/middleware.GRPCServerLog.StreamServerInterceptor({{0x1f235dc, 0x30a5a40}, 0x0}, {0x1a7c860, 0x31d7c00}, {0x1f125c4, 0x3d2dd50}, 0x4660e30, 0x4660e70)
        /src/loki/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:49 +0x4c
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1({0x1a7c860, 0x31d7c00}, {0x1f125c4, 0x3d2dd50})
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:49 +0x4c
github.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1({0x1a7c860, 0x31d7c00}, {0x1f125c4, 0x3d2dd50}, 0x4660e30, 0x1b5dc24)
        /src/loki/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:58 +0xc4
google.golang.org/grpc.(*Server).processStreamingRPC(0x31a4fc0, {0x1f1fdb0, 0x3642500}, 0x3d61860, 0x39593b0, 0x2d37398, 0x0)
        /src/loki/vendor/google.golang.org/grpc/server.go:1539 +0xde4
google.golang.org/grpc.(*Server).handleStream(0x31a4fc0, {0x1f1fdb0, 0x3642500}, 0x3d61860, 0x0)
        /src/loki/vendor/google.golang.org/grpc/server.go:1612 +0xa44
google.golang.org/grpc.(*Server).serveStreams.func1.2(0x30bc8b0, 0x31a4fc0, {0x1f1fdb0, 0x3642500}, 0x3d61860)
        /src/loki/vendor/google.golang.org/grpc/server.go:923 +0x8c
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /src/loki/vendor/google.golang.org/grpc/server.go:921 +0x1b8

I think the problem starts here:

runtime/internal/atomic.Xadd64(0x3d3aa2c, 0x1)
        /usr/local/go/src/runtime/internal/atomic/atomic_arm.s:256 +0x14
github.com/grafana/loki/pkg/logqlmodel/stats.(*Context).AddHeadChunkLines(...)
        /src/loki/pkg/logqlmodel/stats/context.go:222

I don't totally understand what's happening here but this isn't the first time I've seen an offset problem in Loki trying to run on 32bit architecture.

The object being added, and the struct it's being added it to, don't seem to be the problem directly it's a 64bit variable in a struct where everything is 64bit.

I think the problem is this struct is included in another struct with other structs, one of which has a 32bit variable:

int32 totalReached = 1 [(gogoproto.jsontag) = "totalReached"];

If this is in fact the problem, I think there is a bigger problem. I don't know how to fix this without making a breaking protobuf change. We need basically to make that totalReached 64 bit or remove it and put a 64bit version of it at the end.

Pretty sure making it 64bit would be a breaking change, I think we would have to remove it and add a new var at the end which is 64bit. But I think for this to be not breaking it would need to be deprecated for a few versions and removed.

If I'm right and if this works, Is it worth it? This takes time, it's hard to test (I have one raspberry pi 3b+ i'm trying to run Loki on and it's probably gonna be running a 64bit OS in a few hours since finding this)

What else is gonna be broken, I'm updating this Pi after 2 years, I have no idea where else there may be problems and I'm sorry to say without someone else who would really want to champion this I think 32bit support has ended....

If there was a way to lint for this problem and we could scope the work to fix what's currently broken and most importantly be able to find new bugs when they are introduced, then I think we could continue support for 32bit.

@cyriltovena
Copy link
Contributor

We should be able to reserved the field and create a new one. Code wise it should be very quick.

@rfratto
Copy link
Member

rfratto commented Feb 15, 2022

👋 sorry for randomly dropping in :)

The problem does originate from the protobuf, but it might be helpful to distinguish it not as a protobuf problem, but rather a problem of the Go structs which are generated from the protobuf. (i.e., if gogoproto generated aligned structs with padding, you'd have no issue here).

There's a few things you can do:

  1. Like Cyril suggested, reserve the old field and add a new 64-bit one, as long as you're sure old code won't completely break before it updates to the new definitions.
  2. Changing the type from int32 to int64 may actually work since they're both varint encoded and the unmarshaler can detect overflows.
  3. Add an unused 32-bit padding field after totalReached so the generated Go structs are aligned (see example below)
  4. Add padding to the Go structs which contain these unaligned structs as members

Adding an unused padding field so the generated Go structs are aligned:

message Ingester {
  // Total ingester reached for this query.
  int32 totalReached = 1 [(gogoproto.jsontag) = "totalReached"];
  int32 unusedTotalReachedPadding = 6;
 
  // Total of chunks matched by the query from ingesters
  int64 totalChunksMatched = 2 [(gogoproto.jsontag) = "totalChunksMatched"];
  // Total of batches sent from ingesters.
  int64 totalBatches = 3 [(gogoproto.jsontag) = "totalBatches"];
  // Total lines sent by ingesters.
  int64 totalLinesSent = 4 [(gogoproto.jsontag) = "totalLinesSent"];

  Store store = 5 [(gogoproto.nullable) = false, (gogoproto.jsontag) = "store"];
}

@slim-bean
Copy link
Collaborator Author

Very interesting idea @rfratto in fact I wonder if it could be even easier to just reorder the structs without adding a padding variable?

I think however the bigger question is: is it even worth it anymore.

Without a way to lint for this problem it feels like we'll always be chasing our tails here and I wonder if it's just time to throw in the towel and say 32bit doesn't work anymore.

@rfratto
Copy link
Member

rfratto commented Feb 15, 2022

Without a way to lint for this problem it feels like we'll always be chasing our tails here

Yeah, it'd probably be hard to have a linting rule which isn't overaggressive but still catches most of the alignment problems that'd you'd care about

@stale
Copy link

stale bot commented Apr 17, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Apr 17, 2022
@stale stale bot closed this as completed Apr 28, 2022
@MeikelLP
Copy link

Hey, I actually have this problem on my Raspberrypi 4. I'm using docker with the image 2.5.0-arm. Is there any way I can fix this?

@littleblacklb
Copy link

Hey, I actually have this problem on my Raspberrypi 4. I'm using docker with the image 2.5.0-arm. Is there any way I can fix this?

I tried the 2.4.0 on my Raspberry 3b+ and it works

@slim-bean
Copy link
Collaborator Author

The 3b+ and 4 both support 64bit operating systems and the Raspberry Pi OS 64bit OS is officially released now.

So as long as you run the 64bit OS you should have no problem running Loki on a 3b+ or 4.

@MeikelLP
Copy link

Any way to debug this? I explicitly tried 2.5.0-arm64 as well. No change.

@slim-bean
Copy link
Collaborator Author

Make sure your OS is 64bit:

pi@k04:~ $ uname -m
aarch64

(If it's 32bit it will say armv7l or armv6)

If it's not aarch64 I'm afraid you will have to re-install your OS making sure to use the 64bit distro

@MeikelLP
Copy link

user@host:~ $ uname -m
aarch64

@slim-bean
Copy link
Collaborator Author

I'm not sure then? that looks like the correct arch. It should just work even without specifying the arch on the image, docker will choose the correct architecture.

I know Loki runs on a pi4 with 64bit os. I have several running myself.

If you are seeing an 'unaligned 64bit operation' error that really feels like something is still 32bit... I know the arm chips will support running as 32 and 64 so maybe make sure the docker images downloaded are all cleared out? Maybe download the 64bit binary and try running it directly 69 narrow it down to a docker vs OS issue?

@MeikelLP
Copy link

I went into the container and executed the same command:

user@host:~ $ sudo docker exec -it bf7b659aedff /bin/sh
/ $ uname -m
aarch64

@frepkovsky
Copy link

I'm facing this problem when trying to run Loki 2.6.1 on 32-bit ARM architecture (armv7l) running on Synology NAS. If I understand this is not going to be fixed in future release, right? Will you officially obsolete 32-bit ARM platform?

@jakubzieba
Copy link

I'm facing this problem when trying to run Loki 2.6.1 on 32-bit ARM architecture (armv7l) running on Synology NAS. If I understand this is not going to be fixed in future release, right? Will you officially obsolete 32-bit ARM platform?

This is true ?

@abferm
Copy link
Contributor

abferm commented Aug 23, 2022

This is a particularly painful problem for me. I recently started using Loki to manage logs on a fleet of 32-bit arm devices, and ran into this problem when trying to upgrade my test devices from 2.4.2-arm to 2.6.1-arm.

In my personal experience developing golang applications for ARM that use protobuf, it is better to use a go defined type as much as possible throughout the application, and convert to the generated protobuf message before sending across the wire.

This allows more flexibility and control within the application. In this case it would allow for proper alignment for atomic operations by either padding or re-ordering the fields.

gogoproto has a face plugin that may be useful here, as it would generate the code to convert to the proto message as long as the getter interface matches. This would also help enforce that the generated code and go code stay in sync.

@gtaws
Copy link

gtaws commented Oct 7, 2022

does anyone have a comment on @adferm approach here? from the sounds of it, the code is somehow using protobof as a value holder on the stack instead of a transport? if so, from my past experience programming in protobuf, I agree as well.

@gtaws
Copy link

gtaws commented Oct 16, 2022

If anything, can you please remove the loki-linux-arm.zip (arm 32) builds as that is obviously broken now.

@thomas123xyz
Copy link

thanks for this information. I stumbled over the same on my raspberry pi 2 and went back to vesion 2.4.0 which is working.
I just used this to get to know grafana/loki/promtail.
if 32Bit will be supported in the future, some people would still use it and appreciate :)

thanks

@slim-bean slim-bean added the keepalive An issue or PR that will be kept alive and never marked as stale. label Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive An issue or PR that will be kept alive and never marked as stale. stale A stale issue or PR that will automatically be closed.
Projects
None yet
Development

No branches or pull requests

10 participants