Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(hole-punch): Add support for arm64 architecture #311

Merged

Conversation

diegomrsantos
Copy link
Contributor

@diegomrsantos diegomrsantos commented Oct 17, 2023

Currently, only the x86-64 arch is supported when running tests locally.

@diegomrsantos
Copy link
Contributor Author

diegomrsantos commented Oct 17, 2023

The initial commit doesn't fix the tests inside the impl dir as they are being downloaded from archives, not sure how to handle that. Also not sure if the current approach to detect the host arch is the right choice. @thomaseizinger could you please advise? Thanks.

@thomaseizinger
Copy link
Contributor

Hmm, can we detect the architecture inside the Docker container instead? Docker already detects the host architecture for us and pulls the correct image, right?

I.e. should we run uname -a inside the docker container?

@diegomrsantos
Copy link
Contributor Author

I found this and seems the proper way of doing that. It's pretty late for me now, feel free to handle it if you have time. Otherwise, I'll look into it tomorrow.

@diegomrsantos diegomrsantos marked this pull request as draft October 17, 2023 21:48
@thomaseizinger
Copy link
Contributor

I found this and seems the proper way of doing that. It's pretty late for me now, feel free to handle it if you have time. Otherwise, I'll look into it tomorrow.

Just skimmed it and looks like a good approach to me. Would be great if you can handle this!

@diegomrsantos
Copy link
Contributor Author

@thomaseizinger Could you please take a look? Additionally, what commit hash should we use?

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

A few suggestions :)

hole-punch-interop/rust-relay/setRustTarget.sh Outdated Show resolved Hide resolved
hole-punch-interop/rust-relay/Dockerfile Outdated Show resolved Hide resolved
hole-punch-interop/rust-relay/Dockerfile Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

Additionally, what commit hash should we use?

I don't understand what you mean here.

@diegomrsantos diegomrsantos changed the title fix(hole-punch) support arm64 arch fix(hole-punch): Add support for arm64 architecture Oct 19, 2023
@diegomrsantos
Copy link
Contributor Author

Additionally, what commit hash should we use?

I don't understand what you mean here.

Sorry, should have been clearer. I meant the commitSha in test-plans/hole-punch-interop/impl/rust/v0.52/Makefile, but I get how it works now.

@diegomrsantos
Copy link
Contributor Author

It seems that If we are building and running on the same platform, you don't need to specify a --target option when building with Cargo, as Rust will default to the host system's architecture and operating system. The --target option is primarily used for cross-compilation scenarios where the build platform and the target platform are different.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this? It doesn't seem to be working :)

Plus, if we are no longer build for -musl, we don't need the musl-tools installed. I wonder if it still works on alpine then though. Shouldn't be a big deal though I think having it auto-select the ARCH is better this way.

@diegomrsantos
Copy link
Contributor Author

Sorry, I didn't run the test, thought if it was building the image without errors it'd work. When I run npm run test on my machine I see:

> libp2p hole-punch test@0.0.1 test
> ts-node src/*.test.ts && ts-node testplans.ts

Running 2 tests
Running test spec: rust-v0.52 x rust-v0.52 (quic)
Running test spec: rust-v0.52 x rust-v0.52 (tcp)
2 failures:
---------- rust-v0.52 x rust-v0.52 (quic) ---------- (1 / 2)

undefined
---------- rust-v0.52 x rust-v0.52 (tcp) ---------- (2 / 2)

undefined
Run complete

@thomaseizinger
Copy link
Contributor

We don't seem to be copying the relay binary correctly into the final layer. If you run docker build . in the rust-relay directory and try to run the image, it fails.

@diegomrsantos
Copy link
Contributor Author

The problem was the lack of glibc on the Alpine container. I managed to fix this part using Debian, but the test still fails and the logs don't say why when running locally. It could be because of the difference in the installed packages like bind-tools and iproute2-tc. Is Alpine essential? It seems the image we use in the building is based on Debian. This is the Dockerfile:

# syntax=docker/dockerfile:1.5-labs
FROM --platform=$BUILDPLATFORM rust:1.72.0 as builder

# Run with access to the target cache to speed up builds
WORKDIR /workspace
ADD . .

RUN --mount=type=cache,target=./target \
    --mount=type=cache,target=/usr/local/cargo/registry \
    cargo build --release --package relay && \
    mv ./target/release/relay /usr/local/bin/relay

FROM --platform=$TARGETPLATFORM debian:bookworm-slim
COPY --from=builder /usr/local/bin/relay /usr/bin/relay
RUN --mount=type=cache,target=/var/cache/apt apt-get update && apt-get install -y iproute2
ENV RUST_BACKTRACE=1
CMD ["/usr/bin/relay"]

@diegomrsantos
Copy link
Contributor Author

This is proving to be more complicated than I thought. I think my last commit is the simplest solution for now. I ran it locally and it seems to work, although the QUIC test seems flaky.

First run:

Running 2 tests
Running test spec: rust-v0.52 x rust-v0.52 (quic)
Running test spec: rust-v0.52 x rust-v0.52 (tcp)
Expected RTT of direct connection to be ~200ms but was 214ms
1 failures:
---------- rust-v0.52 x rust-v0.52 (quic) ---------- (1 / 1)

Second run:

Running 2 tests
Running test spec: rust-v0.52 x rust-v0.52 (quic)
Expected RTT of direct connection to be ~200ms but was 212ms
Running test spec: rust-v0.52 x rust-v0.52 (tcp)
Expected RTT of direct connection to be ~200ms but was 216ms
0 failures:
Run complete

@thomaseizinger
Copy link
Contributor

the QUIC test seems flaky.

Damn, I thought it was stable now. I'll look into it.

@diegomrsantos diegomrsantos marked this pull request as ready for review October 23, 2023 22:01
@thomaseizinger
Copy link
Contributor

Let me know if what I pushed builds for you locally. It seems that the QUIC test is actually broken so I'll look into that.

@diegomrsantos
Copy link
Contributor Author

It works and it's much cleaner, thanks!

@diegomrsantos
Copy link
Contributor Author

But it seems the TCP test can also be flaky:

> libp2p hole-punch test@0.0.1 test
> ts-node src/*.test.ts && ts-node testplans.ts

Running 2 tests
Running test spec: rust-v0.52 x rust-v0.52 (quic)
Expected RTT of direct connection to be ~200ms but was 207ms
Running test spec: rust-v0.52 x rust-v0.52 (tcp)
1 failures:
---------- rust-v0.52 x rust-v0.52 (tcp) ---------- (1 / 1)

undefined
Run complete

@diegomrsantos
Copy link
Contributor Author

@thomaseizinger the tests are still flaky for me locally. I'm using the last test-plan version and rust-libp2p def98ebb6fc4fcd6b7e796859c0408241c5ff90f. Is it worth it to have arm64 on the CI?

@thomaseizinger
Copy link
Contributor

@thomaseizinger the tests are still flaky for me locally. I'm using the last test-plan version and rust-libp2p def98ebb6fc4fcd6b7e796859c0408241c5ff90f. Is it worth it to have arm64 on the CI?

That is odd, I ran them literally about a hundred times on CI (in parallel too) and didn't get a single failure. Can you attach the logs (found in the runs directory).

@thomaseizinger thomaseizinger merged commit 5332a26 into libp2p:master Oct 26, 2023
1 check passed
@diegomrsantos
Copy link
Contributor Author

I've attached rust-v0_52_x_rust-v0_52__tcp_ stdout.log

@thomaseizinger
Copy link
Contributor

I've attached rust-v0_52_x_rust-v0_52__tcp_ stdout.log

I can find Successfully hole-punched to twice in there. Is this actually the log of a failed test?

@diegomrsantos
Copy link
Contributor Author

Yes. When I run I see this:

> libp2p hole-punch test@0.0.1 test
> ts-node src/*.test.ts && ts-node testplans.ts

Running 5 tests
Running test spec: nim-v1.1 x rust-v0.52 (tcp)
Running test spec: nim-v1.1 x nim-v1.1 (tcp)
Running test spec: rust-v0.52 x rust-v0.52 (quic)
Expected RTT of direct connection to be ~200ms but was 215ms
Running test spec: rust-v0.52 x rust-v0.52 (tcp)
Running test spec: rust-v0.52 x nim-v1.1 (tcp)
4 failures:
---------- nim-v1.1 x rust-v0.52 (tcp) ---------- (1 / 4)

Network nim-v1_1_x_rust-v0_52__tcp__internet  Creating
Network nim-v1_1_x_rust-v0_52__tcp__internet  Created
Network nim-v1_1_x_rust-v0_52__tcp__lan_dialer  Creating
Network nim-v1_1_x_rust-v0_52__tcp__lan_dialer  Created
Network nim-v1_1_x_rust-v0_52__tcp__lan_listener  Creating
Network nim-v1_1_x_rust-v0_52__tcp__lan_listener  Created
Container nim-v1_1_x_rust-v0_52__tcp_-redis-1  Creating
Container nim-v1_1_x_rust-v0_52__tcp_-redis-1  Created
Container nim-v1_1_x_rust-v0_52__tcp_-listener_router-1  Creating
Container nim-v1_1_x_rust-v0_52__tcp_-relay-1  Creating
Container nim-v1_1_x_rust-v0_52__tcp_-dialer_router-1  Creating
Container nim-v1_1_x_rust-v0_52__tcp_-relay-1  Created
Container nim-v1_1_x_rust-v0_52__tcp_-listener_router-1  Created
Container nim-v1_1_x_rust-v0_52__tcp_-listener-1  Creating
Container nim-v1_1_x_rust-v0_52__tcp_-dialer_router-1  Created
Container nim-v1_1_x_rust-v0_52__tcp_-dialer-1  Creating
Container nim-v1_1_x_rust-v0_52__tcp_-dialer-1  Created
Container nim-v1_1_x_rust-v0_52__tcp_-listener-1  Created
Container nim-v1_1_x_rust-v0_52__tcp_-listener-1  Stopping
Container nim-v1_1_x_rust-v0_52__tcp_-dialer-1  Stopping
Container nim-v1_1_x_rust-v0_52__tcp_-dialer-1  Stopped
Container nim-v1_1_x_rust-v0_52__tcp_-dialer_router-1  Stopping
Container nim-v1_1_x_rust-v0_52__tcp_-listener-1  Stopped
Container nim-v1_1_x_rust-v0_52__tcp_-relay-1  Stopping
Container nim-v1_1_x_rust-v0_52__tcp_-listener_router-1  Stopping
Container nim-v1_1_x_rust-v0_52__tcp_-dialer_router-1  Stopped
Container nim-v1_1_x_rust-v0_52__tcp_-relay-1  Stopped
Container nim-v1_1_x_rust-v0_52__tcp_-listener_router-1  Stopped
Container nim-v1_1_x_rust-v0_52__tcp_-redis-1  Stopping
Container nim-v1_1_x_rust-v0_52__tcp_-redis-1  Stopped

---------- nim-v1.1 x nim-v1.1 (tcp) ---------- (2 / 4)

Network nim-v1_1_x_nim-v1_1__tcp__lan_dialer  Creating
Network nim-v1_1_x_nim-v1_1__tcp__lan_dialer  Created
Network nim-v1_1_x_nim-v1_1__tcp__lan_listener  Creating
Network nim-v1_1_x_nim-v1_1__tcp__lan_listener  Created
Network nim-v1_1_x_nim-v1_1__tcp__internet  Creating
Network nim-v1_1_x_nim-v1_1__tcp__internet  Created
Container nim-v1_1_x_nim-v1_1__tcp_-redis-1  Creating
Container nim-v1_1_x_nim-v1_1__tcp_-redis-1  Created
Container nim-v1_1_x_nim-v1_1__tcp_-dialer_router-1  Creating
Container nim-v1_1_x_nim-v1_1__tcp_-listener_router-1  Creating
Container nim-v1_1_x_nim-v1_1__tcp_-relay-1  Creating
Container nim-v1_1_x_nim-v1_1__tcp_-relay-1  Created
Container nim-v1_1_x_nim-v1_1__tcp_-dialer_router-1  Created
Container nim-v1_1_x_nim-v1_1__tcp_-dialer-1  Creating
Container nim-v1_1_x_nim-v1_1__tcp_-listener_router-1  Created
Container nim-v1_1_x_nim-v1_1__tcp_-listener-1  Creating
Container nim-v1_1_x_nim-v1_1__tcp_-listener-1  Created
Container nim-v1_1_x_nim-v1_1__tcp_-dialer-1  Created
Container nim-v1_1_x_nim-v1_1__tcp_-dialer-1  Stopping
Container nim-v1_1_x_nim-v1_1__tcp_-listener-1  Stopping
Container nim-v1_1_x_nim-v1_1__tcp_-dialer-1  Stopped
Container nim-v1_1_x_nim-v1_1__tcp_-dialer_router-1  Stopping
Container nim-v1_1_x_nim-v1_1__tcp_-listener-1  Stopped
Container nim-v1_1_x_nim-v1_1__tcp_-relay-1  Stopping
Container nim-v1_1_x_nim-v1_1__tcp_-listener_router-1  Stopping
Container nim-v1_1_x_nim-v1_1__tcp_-relay-1  Stopped
Container nim-v1_1_x_nim-v1_1__tcp_-dialer_router-1  Stopped
Container nim-v1_1_x_nim-v1_1__tcp_-listener_router-1  Stopped
Container nim-v1_1_x_nim-v1_1__tcp_-redis-1  Stopping
Container nim-v1_1_x_nim-v1_1__tcp_-redis-1  Stopped

---------- rust-v0.52 x rust-v0.52 (tcp) ---------- (3 / 4)

undefined
---------- rust-v0.52 x nim-v1.1 (tcp) ---------- (4 / 4)

undefined
Run complete

@diegomrsantos
Copy link
Contributor Author

Could it be related to the logs Connection closed with error?

@thomaseizinger
Copy link
Contributor

I think the fact that this says undefined means no log was captured and you probably sent me an old one. Might be problem with the test runner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants