ix0 no carrier #2591

Closed
abplfab opened this issue Jul 31, 2018 · 129 comments

Comments

@abplfab

abplfab commented Jul 31, 2018

After upgrading to OPNsense 18.7, the ix NIC (attached to a switch with a DAC) reports "media: no carrier". Setting the media to fixed 10Gbase-Twinax doesn't help...

@fichtner
Member

the joys of intel driver updates :(

run this and reboot

# opnsense-update -kr 18.1.11 -n "18.1\/dummy"

@fichtner
Member

ps booting from kernel.old should also work via boot menu

@fichtner fichtner added the support Community support label Jul 31, 2018
@abplfab
Author

abplfab commented Jul 31, 2018

Thanks

@fichtner
Member

if this indeed works we need to take this to FreeBSD soon if 11.2 has the same defect

@abplfab
Author

abplfab commented Jul 31, 2018

Doesn't help :(

@fichtner
Member

makes no sense at all ?!

what's the output of:

 # uname -a

@abplfab
Author

abplfab commented Aug 1, 2018

FreeBSD asterix2.lan.neratec.com 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11 21b4c8ea1d5(stable/18.7) amd64
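
The token in parentheses at the end of this output appears to name the source branch the running kernel was built from, which is how the 18.7 and 18.1 kernels are told apart in this thread. A minimal sketch for extracting it, assuming the output keeps the `hash(branch)` form shown above; `kernel_branch` is a hypothetical helper name, not an OPNsense tool:

```sh
# kernel_branch: read `uname -a` output on stdin and print the branch named
# in the trailing "hash(branch)" token, for example "stable/18.7".
# Hypothetical helper; assumes the output format seen in this thread.
kernel_branch() {
    sed -n 's/.*[0-9a-f]\{1,\}(\([^)]*\)).*/\1/p'
}

# On the firewall itself this would be: uname -a | kernel_branch
echo 'FreeBSD asterix2.lan.neratec.com 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11 21b4c8ea1d5(stable/18.7) amd64' | kernel_branch
```

Running it before and after `opnsense-update -kr ...` plus a reboot is a quick way to confirm that the kernel was really replaced.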

@abplfab
Author

abplfab commented Aug 1, 2018

Ran opnsense-update -kr 18.1.11 -n "18.1\/dummy" again, now:
root@asterix2:~ # uname -a
FreeBSD asterix2.lan.neratec.com 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11 116e406d37f(stable/18.1) amd64
but still:
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 0c:c4:7a:97:ee:ee
hwaddr 0c:c4:7a:97:ee:ee
inet6 fe80::ec4:7aff:fe97:eeee%ix0 prefixlen 64 scopeid 0x1
inet6 2a02:aa08:e000:902::253 prefixlen 64
inet6 2a02:aa08:e000:902::10 prefixlen 64 vhid 18
inet 192.168.11.253 netmask 0xffffff00 broadcast 192.168.11.255
inet 192.168.11.10 netmask 0xffffff00 broadcast 192.168.11.255 vhid 11
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
carp: INIT vhid 11 advbase 1 advskew 100
carp: INIT vhid 18 advbase 1 advskew 100
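
When several interfaces are involved, it can help to reduce `ifconfig` output to one line per interface. The sketch below does that for any output in the format shown above; `iface_status` is a hypothetical helper name, and it assumes each interface block starts with `name: flags=` and carries a `status:` line:

```sh
# iface_status: read `ifconfig` output on stdin and print "<interface> <status>"
# for every interface that reports a "status:" line.
# Hypothetical helper; relies only on the layout seen in this thread.
iface_status() {
    awk '
        # Start of an interface block, e.g. "ix0: flags=8943<...>"
        /^[a-z][a-z0-9_.]*: flags=/ { sub(/:$/, "", $1); ifname = $1 }
        # Status line, e.g. "status: no carrier"
        $1 == "status:" { $1 = ""; sub(/^ +/, ""); print ifname, $0 }
    '
}

# On the firewall itself this would be: ifconfig | iface_status
```

Piping the result through `grep 'no carrier'` would list only the interfaces that are down.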

@fichtner
Member

fichtner commented Aug 1, 2018

ok, kernels are correctly replaced. I'm unsure how any other change would relate to the reported issue of the driver reporting no carrier anymore.

@abplfab
Author

abplfab commented Aug 1, 2018

Would it make sense to try a fresh install of 18.7 and import the config?
This system is the slave of a carp cluster, so currently no hurry...

@fichtner
Member

fichtner commented Aug 1, 2018

going back to 18.1.13 would make more sense than trying again with 18.7 (18.1.6 config import + install and update back to 18.1.13). but the chances of ix0 finding its carrier are no better than 50%. it could be hardware related.

@abplfab
Author

abplfab commented Aug 1, 2018

  • Live booting 18.1 -> no link
  • Booting a live linux -> no link
    Will go check hardware in the next hours (where's my 1. August Party?!)

@fichtner
Member

fichtner commented Aug 1, 2018

Could it be that FreeBSD 11.2 ships with a new binary firmware blob that bricked your NIC? That's the only theory I have besides it's fully the hardware's fault.

@abplfab
Author

abplfab commented Aug 1, 2018

No idea, the server has a second NIC (ix1), will try this one and report...

@abplfab
Author

abplfab commented Aug 1, 2018

  • ix1 works
  • Completely removed power and reboot -> ix0 has link (FreeBSD asterix2.lan.neratec.com 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11 116e406d37f(stable/18.1) amd64)
  • Updated Kernel to 18.7 -> ix0 dead (FreeBSD asterix2.lan.neratec.com 11.1-RELEASE-p11 FreeBSD 11.1-RELEASE-p11 21b4c8ea1d5(stable/18.7) amd64)
  • Completely removed power and reboot -> ix0 still dead
  • Go back to Kernel 18.1.11 and remove power -> ix0 has link

So with ix NICs better not update to 18.7.

System
Supermicro SYS-5018D-FN8T

Mainboard
X10SDV-TP8F

Firmware versions
BIOS 1.3,
IPMI/BMC: 3.68
Redfish Version : 1.0.1

NIC
ix0@pci0:4:0:0: class=0x020000 card=0x15ac15d9 chip=0x15ac8086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = 'Ethernet Connection X552 10 GbE SFP+'
class = network
subclass = ethernet
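
For anyone comparing hardware, the `chip=` value packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits. A small sketch that splits it, assuming an 8-digit hex value as printed above; `split_chip` is a hypothetical helper name:

```sh
# split_chip: take a pciconf "chip=0x........" value and print the vendor ID
# (lower 16 bits) and device ID (upper 16 bits) separately.
# Hypothetical helper; assumes exactly 8 hex digits after "0x".
split_chip() {
    chip=${1#chip=}   # drop an optional "chip=" prefix
    chip=${chip#0x}   # drop the "0x" prefix
    # Last four digits are the vendor, first four the device.
    printf 'vendor=0x%s device=0x%s\n' "${chip#????}" "${chip%????}"
}

split_chip chip=0x15ac8086
```

Vendor 0x8086 is Intel; the device ID is what distinguishes this X552 SFP+ part from other ix(4) hardware mentioned in the thread.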

CPU
Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz (8 cores)

@mimugmail
Member

@fichtner I have two identical boxes like the one above, configured but not in production. Ping me the usual way when you need access.

@abplfab
Author

abplfab commented Aug 1, 2018

FYI: a server reset or power off/on by ipmi/bmc doesn't "fix" the NIC hang. Had to remove power from the box.

@fichtner fichtner added upstream Third party issue and removed support Community support labels Aug 1, 2018
@fichtner fichtner self-assigned this Aug 1, 2018
@Tsuroerusu

Tsuroerusu commented Aug 2, 2018

I am seeing something that could well be related to this issue. After upgrading my company setup (A HA setup of two firewalls) to 18.7, the result was that none of my VLAN interfaces on ix1 activate properly, however the non-VLAN ix0 works fine. On the main page, the VLAN interfaces are marked with red saying "Ethernet autoselect", and under System --> Interfaces --> Overview their status says "no carrier". All of my igb* interfaces come up without any issue. I have tried changing the VLAN configuration a bit hoping that a re-write of the configuration would solve it, but to no avail.

Update: I just tried booting kernel.old, as was suggested earlier in this thread, and just doing that has resolved the problem on both of my nodes! I did not have to do any actual power cycling, just booting into kernel.old did it.

@fichtner
Member

fichtner commented Aug 2, 2018

@Tsuroerusu Can you try the 18.1.11 kernel as well?

# opnsense-update -kr 18.1.11 -n "18.1\/dummy"

(reboot)

@abplfab said you need to remove the power, otherwise the carrier will not come up on a quick reboot.

fichtner added a commit to opnsense/src that referenced this issue Aug 2, 2018
It was mentioned that several ix(4) devices are stuck in "no
carrier", see opnsense/core#2591

This reverts commit ae8c90c.
fichtner added a commit to opnsense/src that referenced this issue Aug 2, 2018
It was mentioned that several ix(4) devices are stuck in "no
carrier", see opnsense/core#2591

This reverts commit ae8c90c.
@fichtner
Member

fichtner commented Aug 2, 2018

Relevant driver update was reverted and will be gone from 18.7.1. It's unclear if the issue exists in FreeBSD 11.2 but we'll find out soon (@mimugmail could you check this with your test system).

@fichtner fichtner added this to the 19.1 milestone Aug 2, 2018
@mimugmail
Member

@fichtner Update is in progress .. need to figure out what exactly happens with and without VLANs.

@fichtner
Member

fichtner commented Aug 2, 2018

@mimugmail thanks a lot!

@abplfab
Author

abplfab commented Aug 2, 2018

Just started a live CD of FreeBSD 11.2 -> ix0 status: no carrier
[screenshot]

@Tsuroerusu

@fichtner I am using the hardware that I have, because it was working just fine with 18.1, I also said that I fully understand why developers cannot test my particular setup. The only thing I insisted on is that I am not using incompatible hardware as everything I am using is officially validated by Supermicro, WHICH WAS WORKING WITH 18.1, and thus my modest claim was that it was unreasonable for you guys to keep telling me that I just need to change my hardware and spend hundreds of euros on that.

Why do you keep putting words into my mouth? I never said anything about the "stance" of the OPNsense project, all I wanted to ask was about whether there were driver backports planned in 19.1, that was all!

I am not using this platform for anything! I simply asked a question about drivers, and then I ended up responding to the absurd accusations that you made against me, how is that unreasonable?

I fully accept that I am responsible for my hardware.

I am, frankly, a little bit shocked that this is how you choose to treat your users, who, 1. have reported a problem, 2. tried to help, and 3. just asked a question.

But do rest assured, I can promise you that I have no intention of participating in this discussion anymore, and I will not be reporting any problems in the future given that I simply got insulted and shown contempt for simply asking a question.

I wish you and everybody else a pleasant Sunday.
Good bye.

@fichtner
Member

@Tsuroerusu Okay, listen. Tell me what you in very precise words want us to do and I'll objectively explain why that may or may not be feasible.

@Tsuroerusu

@Tsuroerusu Okay, listen. Tell me what you in very precise words want us to do and I'll objectively explain why that may or may not be feasible.

This is the heart of the matter, Franco. I did not, and still do not actually want you to do anything, and that is why I have been so amazed (in the negative sense) by your responses today (specifically).

The ONLY thing that I requested was a simple "yes/no" answer to this question:
Will OPNsense 19.1 contain any backported Intel drivers?

@enoch85

enoch85 commented Nov 18, 2018

@Tsuroerusu I understand that you're frustrated. Believe me, I am too! But instead of arguing, why not test the latest 19.1 release to see if it works? That would help all users far more IMHO.

Maybe I can save a few bucks on a new cable. :D

@Tsuroerusu

@Tsuroerusu I understand that you're frustrated. Believe me, I am too! But instead of arguing, why not test the latest 19.1 release to see if it works? That would help all users far more IMHO.

Maybe I can save a few bucks on a new cable. :D

@enoch85 I actually agree with you on that, and that is why I was so disappointed that instead of a simple answer to the question I asked, I got absurd accusations thrown at me (not by you), which I then had to respond to.

Your suggestion of testing the 19.1 beta, I have no issue with considering that, it is a perfectly reasonable thing to ask of me. In fact, let me just go further and say, that the reason I asked about the drivers in 19.1 was precisely because I was interested in potentially testing it with my setup, however for that it would be useful to know which drivers I would actually be testing (Because earlier I got the vanilla FreeBSD 11.2 live media to work fine).

My setup is at production-level and because of that, I have to image it before testing things, and before spending an hour or two doing that, I simply needed some information as I have explained.

@cvbkf

cvbkf commented Nov 18, 2018

Some update from me, i tried the current 19.1 beta image via live cd, and both ports of the intel x520-da2 are online now.

the driver version is the same as before:

[1] ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xc020-0xc03f mem 0xdfe80000-0xdfefffff,0xdff04000-0xdff07fff irq 22 at device 0.0 on pci4
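
To compare driver versions between releases without reading the whole boot log, the version can be pulled out of a probe line like the one above. A minimal sketch, assuming the `Version - x.y.z` wording stays as shown; `ix_driver_version` is a hypothetical helper name:

```sh
# ix_driver_version: read kernel messages on stdin and print the ix(4) driver
# version from probe lines of the form "ixN: <... Version - X> ...".
# Hypothetical helper; assumes the message format quoted in this thread.
ix_driver_version() {
    sed -n 's/.*ix[0-9]*: <.*Version - \([^>]*\)>.*/\1/p'
}

# On the firewall itself this would be: dmesg | ix_driver_version
echo '[1] ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xc020-0xc03f mem 0xdfe80000-0xdfefffff,0xdff04000-0xdff07fff irq 22 at device 0.0 on pci4' | ix_driver_version
```

A matching version string does not rule out other differences in the kernel, so it is only a first check.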

I'll install the beta now, hopefully config from 18.7 can be imported.

edit: performance is really bad, iperf3 measures around 1.2 gbit/s while hitting massive iowait.

@fichtner
Member

Will OPNsense 19.1 contain any backported Intel drivers?

The answer is no, but that does not mean the drivers in 19.1 differ from the backported ones in 18.7: the backport came from FreeBSD 11.2, and 19.1 is based on stock HardenedBSD/FreeBSD 11.2, so the same drivers are included.

I was under the impression that this had been communicated clearly in several places and I apologise if that wasn't actually the case.

@Tsuroerusu

Will OPNsense 19.1 contain any backported Intel drivers?

The answer is no, but that does not mean the drivers in 19.1 differ from the backported ones in 18.7: the backport came from FreeBSD 11.2, and 19.1 is based on stock HardenedBSD/FreeBSD 11.2, so the same drivers are included.

Thank you for answering my question. :)

I was under the impression that this had been communicated clearly in several places and I apologise if that wasn't actually the case.

Earlier you linked to a forum post, which stated the following:
"The key difference is the operating system switch from FreeBSD 11.1 to HardenedBSD 11.2."

However, for me, that did not exclude the possibility of any potential driver backports from 11-STABLE or from Intel directly, so that was why I sought clarification from you regarding that point.

The reason why I was interested in this is that shortly after I started experiencing problems with 18.7, I tried booting up the FreeBSD 11.2 live USB media, and configured my ix interfaces with VLANs and that worked in the way I expected them to. So my thinking was that if 19.1 did not contain any modifications to the ixgbe driver compared to the one in FreeBSD 11.2, perhaps it would fix my problem.

I hope this makes sense from my perspective of being a user/sysadmin, and not developer, just trying to use pattern matching and logical inferences to try to find solutions.

@enoch85

enoch85 commented Nov 29, 2018

The cable won't show up until 2018-12-13 , so I won't be able to test if it's a cabling issue or not until then.

Can someone else please confirm that 19.1 works?

@cvbkf

cvbkf commented Nov 30, 2018

19.1 (installed from ISO) works for me without changing the cables.

interestingly, i switched back to 18.7 by doing a fresh install and found the following: running in live cd mode: ix0 and ix1 are working. the first boot of the fresh install from disk: ix0 & ix1 online. but if i restore a backup, or change the interface settings via GUI ix0 "dies" after the reboot with the ominous "no carrier" status. ix1 stays online.

@mimugmail
Member

Can you make a diff of config.xml and your backup?
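
One way to produce such a diff is sketched below. The default path `/conf/config.xml` for the live configuration is an assumption, as is the backup file name in the usage comment; `config_diff` is a hypothetical helper name:

```sh
# config_diff BACKUP [LIVE]: print a unified diff between a saved backup and
# the live configuration file.
# Assumption: the live configuration is at /conf/config.xml unless a second
# argument says otherwise.
config_diff() {
    diff -u "$1" "${2:-/conf/config.xml}"
}

# Example (file name is hypothetical):
#   config_diff /root/config-backup.xml > config.diff
```

Review the output for passwords, keys and public addresses before posting it to a public issue.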

@enoch85

enoch85 commented Dec 5, 2018

@cvbkf

Can you make a diff of config.xml and your backup?

Please

I will get the cable tomorrow, so I will test this weekend. :D

@enoch85

enoch85 commented Dec 7, 2018

IT WORKS!

What I did:

  1. Replaced the cable and did a live boot with 18.7
    a) Got a warning that my WAN interface didn't work
    b) Renewed the IP --> Worked
    c) Noticed that I couldn't reach the servers sites from my WIFI net, but it was reachable from LTE (or outside of my own network)

  2. Figured I could live with not reaching the DNS for the sites I host (could just use network outside my own, or thought it might be a firewall rule issue) and went ahead and installed 18.7
    a) After installation I changed back to LibreSSL (which I had originally)
    b) Ran an update to reach 18.7.8

  3. Rebooted --> MISSION SUCCESS, everything now works as expected and I can reach my sites from my own network again. Everything regarding ix1 is "up" and so far I have no issues.

Conclusion
Maybe it was a combined error with my SFP+ cable and that something got fixed in the update to 18.7.6. Anyway, I'm happy again since I now can use the latest stable release. :D

@cvbkf

cvbkf commented Jan 7, 2019

I am on 19.1-rc1 for a few weeks now, until today both ix0 and ix1 were fine. After a reboot ix0 didn't come online -> the good old "no carrier" problem is back to haunt me. I tried a few things, but for now ix0 stays dead.

But, i made an interesting discovery: If i reset the machine via reset button, the link comes up at 10 Gbit/s - then, the link stays online until "Configuring LAN interface..." at the OPNSense bootup, then it goes down until the next reboot.
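
To pin down the moment the link drops, the status could be sampled in a loop and only the changes logged. This is a rough sketch, assuming a shell is available early enough during boot to run the loop; `link_changes` is a hypothetical helper name:

```sh
# link_changes: read one link status per line on stdin and print only the
# samples where the status differs from the previous one, prefixed with the
# sample number.
# Hypothetical helper for spotting when a link goes up or down.
link_changes() {
    awk 'NR == 1 || $0 != prev { print NR ": " $0 } { prev = $0 }'
}

# Possible polling loop on the firewall (one sample per second):
#   while :; do
#       ifconfig ix0 | awk '$1 == "status:" { $1 = ""; sub(/^ +/, ""); print }'
#       sleep 1
#   done | link_changes
```

Matching the sample number against the boot messages would show whether the drop really coincides with "Configuring LAN interface...".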

Maybe there is some invalid config applied ? Are there any files or something i can provide ?

@Tsuroerusu

I just upgraded one of my firewalls to 19.1.5 from 18.7 (but using "kernel.old", i.e. the kernel from 18.1), which I had been running since August because of the issue I had with VLANs on my ix interfaces saying "no carrier", as described earlier in this thread. However, I am sad to say that the issue still persists despite the update to FreeBSD 11.2 in OPNsense, which is really depressing :-(

The strange thing for me is that, as I mentioned before, when 11.2 came out, I tried using the Live boot with my machines, and I could configure VLANs without any issue and run network traffic through them. Which makes this issue even more baffling to me.

And before anybody jumps in with this. Before I went to do the upgrade, the firewalls were working fine with the kernel from 18.1. So this is not a cabling issue, unless the newer Intel drivers have some change that in itself causes compatibility issues with the Supermicro cables that I am using.

At this point, I am millimeters from giving up and buying some different NIC card, and hoping that I will not face the same issue, because at this point I have been without security updates since August. Unfortunately, that will cost me 500 euros before I even know whether it will actually solve the problem.

@cvbkf

cvbkf commented Apr 5, 2019

I switched to a Mellanox Connect-X3 EN 2x SFP+ (used), which is working without any flaws.

You could try to compile a newer version of the intel driver (last post in this thread)
https://forum.opnsense.org/index.php?topic=11384.0;topicseen

@Tsuroerusu

I switched to a Mellanox Connect-X3 EN 2x SFP+ (used), which is working without any flaws.

You could try to compile a newer version of the intel driver (last post in this thread)
https://forum.opnsense.org/index.php?topic=11384.0;topicseen

Thanks for the suggestion, I appreciate that. Unfortunately, the newer driver has the same issue.
Are you using VLANs with that Mellanox card?

@cvbkf

cvbkf commented Apr 5, 2019

yes, i do, but only one (which led to the "no carrier" problem on the intel x550)

@Tsuroerusu

yes, i do, but only one (which led to the "no carrier" problem on the intel x550)

I must say, the ix driver sure is a wuss, 'ey? A single VLAN, in your case, and it melts down! It would be funny if it wasn't so annoying. I run something like 8 VLANs and I REALLY need them to work, so Mellanox it is for me then. Thanks for the recommendation, as I think I have found a good source for them! Hopefully this will solve my problem. :-)

@DesruX

DesruX commented Apr 17, 2019

I switched to a Mellanox Connect-X3 EN 2x SFP+ (used), which is working without any flaws.

You could try to compile a newer version of the intel driver (last post in this thread)
https://forum.opnsense.org/index.php?topic=11384.0;topicseen

Mention of no carrier bug - recommending the 3.3.6 driver
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235918

This may also be related
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221967

Talk of permanent allow unsupported SFP in driver, dated but similar principle.
https://sourceforge.net/p/e1000/mailman/message/28694855/

Another patch for ix driver:
http://www.grosbein.net/freebsd/patches/patch-if_ix.c

Would be great if no artificial restrictions were present in the driver and we only had to worry about actual hw compatibility.

@abplfab
Author

abplfab commented Jan 31, 2020

With 20.1 the problem is back. "no carrier" on the 10Ge I/F. :(

@abplfab
Author

abplfab commented Jan 31, 2020

Booting kernel.old (19.7) doesn't help.

@mimugmail
Member

Sounds intermittent. What happens when you unplug the cable and plug it in again?

@abplfab
Author

abplfab commented Jan 31, 2020

Doesn't help. Exactly the same behavior as in the beginning of this thread. Hardware unchanged, "only" updated to 20.1.

@mimugmail
Member

ifconfig -vvvvvv please

It worked with 19.7.10?

@abplfab
Author

abplfab commented Jan 31, 2020

Yes with 19.7.10 it worked.

ix0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 0c:c4:7a:97:ed:ce
hwaddr 0c:c4:7a:97:ed:ce
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: no carrier
plugged: SFP/SFP+/SFP28 Unknown (Copper pigtail)
vendor: Panduit Corp. PN: PSF1PXA3MBLLN SN: 15219315U-0017B DATE: 2017-01-10
Class: (null)
Length: (null)
Tech: Passive Cable
Media: (null)
Speed: (null)

    SFF8472 DUMP (0xA0 0..127 range):
    03 04 21 00 00 00 00 00 04 00 00 00 67 00 00 00
    00 00 03 00 50 61 6E 64 75 69 74 20 43 6F 72 70
    2E 20 20 20 00 00 0F 9C 50 53 46 31 50 58 41 33
    4D 42 4C 4C 4E 20 20 20 34 20 20 20 01 00 00 F8
    00 00 00 00 31 35 32 31 39 33 31 35 55 2D 30 30
    31 37 42 20 31 37 30 31 31 30 00 00 00 00 00 71
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
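
The identity fields in this dump can be read back by hand. In the SFF-8472 A0h page, the vendor name sits at bytes 20-35, the part number at 40-55, the serial number at 68-83 and the date code at 84-91 (the first six bytes being YYMMDD). A minimal sketch that decodes those ranges from the first 96 bytes posted above; `sff_field` is a hypothetical helper name:

```sh
# sff_field START END: read space-separated hex bytes on stdin and print bytes
# START..END (0-based, inclusive) as ASCII with trailing spaces trimmed.
# Hypothetical helper; only meant for the printable ASCII fields of the dump.
sff_field() {
    start=$1
    end=$2
    i=0
    out=""
    for byte in $(cat); do
        if [ "$i" -ge "$start" ] && [ "$i" -le "$end" ]; then
            # Convert the hex byte to an octal escape and let printf emit it.
            out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out" | sed 's/ *$//'
}

# First 96 bytes of the dump posted above.
dump='03 04 21 00 00 00 00 00 04 00 00 00 67 00 00 00
00 00 03 00 50 61 6E 64 75 69 74 20 43 6F 72 70
2E 20 20 20 00 00 0F 9C 50 53 46 31 50 58 41 33
4D 42 4C 4C 4E 20 20 20 34 20 20 20 01 00 00 F8
00 00 00 00 31 35 32 31 39 33 31 35 55 2D 30 30
31 37 42 20 31 37 30 31 31 30 00 00 00 00 00 71'

printf '%s\n' "$dump" | sff_field 20 35   # vendor name
printf '%s\n' "$dump" | sff_field 40 55   # part number
printf '%s\n' "$dump" | sff_field 68 83   # serial number
printf '%s\n' "$dump" | sff_field 84 89   # date code (YYMMDD)
```

The decoded values agree with the `vendor:` line that ifconfig printed, so the cable's EEPROM is at least being read correctly even though the link stays down.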

@fichtner
Member

19.7.x OS = 20.1.x OS with no modifications. It does not look like this is a software issue if it works sometimes, but not always.

@abplfab
Author

abplfab commented Jan 31, 2020

19.7.x always stable. After upgrading to 20.1.x no chance to get it working.

@fichtner
Member

fichtner commented Jan 31, 2020 via email

@AdSchellevis
Member

other sfp modules (or cables) very often help fix these kinds of issues in our experience; unstable connections often point to issues there. The connector contains the transceiver, which is responsible for the connection (some cards even check for specific firmware in the transceiver).
