Floating point exception(core dump) while running ib_send_bw #14

UntaggedRui opened this issue Oct 7, 2019 · 2 comments

UntaggedRui commented Oct 7, 2019

Problem

I'm trying to reproduce FreeFlow on my own machines. When I run ib_send_bw in two containers on the same physical machine, a floating point exception (core dumped) occurs. The same error occurs when the two containers are on two different machines, even though those two machines can run ib_send_bw correctly on their own.
I am strictly following the quick start on GitHub. Here is my environment and how I run FreeFlow.

Environment

  • OS: Ubuntu 14.04.6 with Linux kernel 4.4.0-142-generic
  • RDMA NIC:
05:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
		Product Name: CX516A - ConnectX-5 QSFP28
05:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
		Product Name: CX516A - ConnectX-5 QSFP28
  • OFED version: MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64
  • Docker version: Docker version 1.13.0, build 49bf474
  • weave: 2.5.2
  • gcc: gcc-4.8.5, gcc-5.5.0, gcc-6.5.0, gcc-7.4.0

When I searched for this problem on Google, I found reports that this error is caused by a wrong gcc version. So I tried building with gcc-4.8.5, gcc-5.5.0, gcc-6.5.0, and gcc-7.4.0. However, the error is the same with all of them.

host IPs and virtual IP to host IP mapping

I have two machines, 192.168.2.203 and 192.168.2.206. They are connected by a weave overlay. I have modified the host IPs and the virtual IP to host IP mapping in my code. In ffrouter.h#L76,

const char HOST_LIST[HOST_NUM][16] = {
    "192.168.2.13",
    "192.168.2.15"
};

In ffrouter.cpp#L215,

    this->vip_map["10.47.128.0"] = "192.168.2.203";
    this->vip_map["10.47.0.5"] = "192.168.2.206";

Implementation

On host 203, I enter the FreeFlow router container and execute ./router router1, then run a container named node1, whose IP is 10.47.128.0, and run ib_send_bw in it.
On host 206, I enter the FreeFlow router container and execute ./router router1, then run a container named node2, whose IP is 10.47.0.8, and run ib_send_bw 10.47.128.0 in it.
Then the error happens.
In the container that runs ib_send_bw, the log is:
(screenshot: error)
In the container that runs ib_send_bw 10.47.128.0, the log is:
(screenshot: error2)
How should I solve this? And can you tell me your gcc version?
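
One thing that can be checked directly from the values quoted in this post: the vip_map entries use 10.47.0.5 while node2 is reported as 10.47.0.8, and HOST_LIST names 192.168.2.13 and 192.168.2.15 while the hosts are 192.168.2.203 and 192.168.2.206. Whether this mismatch is related to the crash is not established here. Below is a minimal sketch of such a consistency check. The helper names unmapped_vips and unknown_hosts and the VipMap alias are hypothetical and not part of FreeFlow; only the IP values come from this post.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical alias: virtual (container) IP -> host IP, mirroring vip_map above.
using VipMap = std::map<std::string, std::string>;

// Returns the container IPs that have no entry in vip_map.
std::vector<std::string> unmapped_vips(const VipMap& vip_map,
                                       const std::vector<std::string>& container_ips) {
    std::vector<std::string> missing;
    for (const auto& ip : container_ips) {
        if (vip_map.find(ip) == vip_map.end()) {
            missing.push_back(ip);
        }
    }
    return missing;
}

// Returns the host IPs referenced by vip_map that do not appear in host_list
// (each host is reported once).
std::vector<std::string> unknown_hosts(const VipMap& vip_map,
                                       const std::vector<std::string>& host_list) {
    std::vector<std::string> unknown;
    for (const auto& entry : vip_map) {
        const std::string& host = entry.second;
        const bool listed =
            std::find(host_list.begin(), host_list.end(), host) != host_list.end();
        const bool seen =
            std::find(unknown.begin(), unknown.end(), host) != unknown.end();
        if (!listed && !seen) {
            unknown.push_back(host);
        }
    }
    return unknown;
}
```

With the values from this post (vip_map keys 10.47.128.0 and 10.47.0.5, container IPs 10.47.128.0 and 10.47.0.8, HOST_LIST 192.168.2.13 and 192.168.2.15), unmapped_vips returns 10.47.0.8 and unknown_hosts returns both 192.168.2.203 and 192.168.2.206, so it may be worth making these three sets of addresses agree before retrying.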

sglee0323 commented Mar 5, 2021

I have the same problem.
Has it been solved? Could you share the solution?

wwwzrb commented Jun 8, 2022

Hi, we encountered a similar problem when trying to reproduce FreeFlow with the MT27710 Family [ConnectX-4]. Do you have any suggestions for this problem?

(screenshot attached)
