Bluetooth TX Timeout Issue #66

Closed
ratherDashing opened this Issue Jul 24, 2012 · 36 comments

Projects

None yet
@ratherDashing

Seems to be an issue with the 3.1 kernel: https://bugs.archlinux.org/task/27449

Steps to recreate:

  • Run bluetooth-agent 0000
  • Try to pair phone with computer

Background information:

hci0: Type: BR/EDR Bus: USB
BD Address: 00:02:72:BB:E5:54 ACL MTU: 1022:8 SCO MTU: 121:3
UP RUNNING PSCAN
RX bytes:18134 acl:205 sco:0 events:632 errors:0
TX bytes:14192 acl:221 sco:0 commands:195 errors:0
Features: 0xff 0xfe 0x0d 0xfe 0x98 0x7f 0x79 0x87
Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3
Link policy: RSWITCH HOLD SNIFF
Link mode: SLAVE ACCEPT
Name: 'raspberrypi-0'
Class: 0x420100
Service Classes: Networking, Telephony
Device Class: Computer, Uncategorized
HCI Version: 3.0 (0x5) Revision: 0x9999
LMP Version: 3.0 (0x5) Subversion: 0x9999
Manufacturer: Atheros Communications, Inc. (69)

Distro: Raspbian
Kernel: [ 0.000000] Linux version 3.1.9+ (dc4@dc4-arm-01) (gcc version 4.5.1 (Broadcom-2708) ) #168 PREEMPT Sat Jul 14 18:56:31 BST 2012

@popcornmix
Contributor

Well, latest firmware is now 3.2.27. Also the USB driver has been udpated. Can you test again?
You could also try adding:
dwc_otg.microframe_schedule=1
to cmdline.txt.

@gregd72002

I'm facing the same issue. Just tested it with the latest kernel:
Linux raspberrypi 3.2.27+ #60 PREEMPT Thu Aug 23 15:33:51 BST 2012 armv6l GNU/Linux

And the issues is still there.

It looks like we need kernel 3.3 to fix it (based on the archlinux bug report)
or the following patches:
http://article.gmane.org/gmane.linux.bluez.kernel/19950/
http://article.gmane.org/gmane.linux.bluez.kernel/19951/

@popcornmix
Contributor

@gregd72002
Have you tried applying the patches? Does it fix it?

@gregd72002

No, I have not. I do not have the environment just now to compile it.

@gregd72002

Tried it overnight and it works! However the patches do no apply cleanly.. Will try to issue a diff in a moment

@gregd72002

This is a consolidated patch that fixes the issue in the current kernel sources.

I also tested it myself by issuing some scan commands and all seems to be fine!

diff -rupN linux.old/include/net/bluetooth/hci_core.h linux.new/include/net/bluetooth/hci_core.h
--- linux.old/include/net/bluetooth/hci_core.h  2012-08-25 10:09:56.532713215 +0100
+++ linux.new/include/net/bluetooth/hci_core.h  2012-08-25 10:17:33.167036880 +0100
@@ -130,7 +130,7 @@ struct hci_dev {
    __u8        major_class;
    __u8        minor_class;
    __u8        features[8];
-   __u8        extfeatures[8];
+   __u8        host_features[8];
    __u8        commands[64];
    __u8        ssp_mode;
    __u8        hci_ver;
@@ -618,7 +618,7 @@ void hci_conn_del_sysfs(struct hci_conn
 #define lmp_le_capable(dev)        ((dev)->features[4] & LMP_LE)

 /* ----- Extended LMP capabilities ----- */
-#define lmp_host_le_capable(dev)   ((dev)->extfeatures[0] & LMP_HOST_LE)
+#define lmp_host_le_capable(dev)   ((dev)->host_features[0] & LMP_HOST_LE)

 /* ----- HCI protocols ----- */
 struct hci_proto {
diff -rupN linux.old/net/bluetooth/hci_event.c linux.new/net/bluetooth/hci_event.c
--- linux.old/net/bluetooth/hci_event.c 2012-08-25 10:11:25.409662054 +0100
+++ linux.new/net/bluetooth/hci_event.c 2012-08-25 10:18:06.035908486 +0100
@@ -695,7 +695,14 @@ static void hci_cc_read_local_ext_featur
    if (rp->status)
        return;

-   memcpy(hdev->extfeatures, rp->features, 8);
+   switch (rp->page) {
+   case 0:
+       memcpy(hdev->features, rp->features, 8);
+       break;
+   case 1:
+       memcpy(hdev->host_features, rp->features, 8);
+       break;
+   }

    hci_req_complete(hdev, HCI_OP_READ_LOCAL_EXT_FEATURES, rp->status);
 }
@gregd72002

Right, this does work partially on my brand-less dongle. The timeout issue now appears randomly. While I was able to pair one device once now the timeout issue is back. Not sure what it depends on.

Also, I tried to use a different bluetooth dongle (Belkin mini) and with this one nothing works at all..

I believe we really need kernel-3.3/3.4 in here... any timeframes for this?

@gregd72002

Did some debugging:

[  940.548222] Bluetooth: Generic Bluetooth USB driver ver 0.6
[  940.548424] hci_register_dev: c8594800 name  bus 1 owner bf296d08
[  940.557617] hci_register_sysfs: c8594800 name hci0 bus 1
[  940.560798] hci_power_on: hci0
[  940.560857] hci_dev_get: 0
[  940.560878] hci_dev_open: hci0 c8594800
[  940.560970] __hci_request: hci0 start
[  940.561014] hci_init_req: hci0 0
[  940.561054] hci_send_cmd: hci0 opcode 0xc03 plen 0
[  940.561112] hci_send_cmd: skb len 3
[  940.561181] hci_cmd_task: hci0 cmd 1
[  940.561248] hci_send_frame: hci0 type 1 len 3
[  940.561387] hci_send_cmd: hci0 opcode 0x1003 plen 0
[  940.561484] hci_send_cmd: skb len 3
[  940.561527] hci_send_cmd: hci0 opcode 0x1001 plen 0
[  940.561600] hci_cmd_task: hci0 cmd 0
[  940.561655] hci_send_cmd: skb len 3
[  940.561722] hci_cmd_task: hci0 cmd 0
[  940.561756] hci_send_cmd: hci0 opcode 0x1005 plen 0
[  940.561792] hci_send_cmd: skb len 3
[  940.561810] hci_send_cmd: hci0 opcode 0x1009 plen 0
[  940.561862] hci_cmd_task: hci0 cmd 0
[  940.561915] hci_send_cmd: skb len 3
[  940.561934] hci_send_cmd: hci0 opcode 0xc23 plen 0
[  940.561969] hci_cmd_task: hci0 cmd 0
[  940.562007] hci_send_cmd: skb len 3
[  940.562042] hci_send_cmd: hci0 opcode 0xc14 plen 0
[  940.562084] hci_cmd_task: hci0 cmd 0
[  940.562136] hci_send_cmd: skb len 3
[  940.562172] hci_send_cmd: hci0 opcode 0xc25 plen 0
[  940.562207] hci_cmd_task: hci0 cmd 0
[  940.562260] hci_send_cmd: skb len 3
[  940.562296] hci_send_cmd: hci0 opcode 0xc05 plen 1
[  940.562332] hci_cmd_task: hci0 cmd 0
[  940.562384] hci_send_cmd: skb len 4
[  940.562420] hci_send_cmd: hci0 opcode 0xc16 plen 2
[  940.562455] hci_cmd_task: hci0 cmd 0
[  940.562507] hci_send_cmd: skb len 5
[  940.562543] hci_send_cmd: hci0 opcode 0xc12 plen 7
[  940.562587] hci_cmd_task: hci0 cmd 0
[  940.562618] hci_send_cmd: skb len 10
[  940.562674] hci_cmd_task: hci0 cmd 0
[  940.562793] hci_sock_dev_event: hdev hci0 event 1
[  940.562861] hci_send_to_sock: hdev   (null) len 8
[  940.563467] usbcore: registered new interface driver btusb
[  940.861966] hci_rx_task: hci0
[  940.862040] hci_cc_reset: hci0 status 0x0
[  940.862079] hci_req_complete: hci0 command 0x0c03 result 0x00
[  940.862145] hci_cmd_task: hci0 cmd 1
[  940.862192] hci_send_frame: hci0 type 1 len 3
[  940.863942] hci_rx_task: hci0
[  940.864011] hci_cc_read_local_features: hci0 status 0x0
[  940.864056] hci_cc_read_local_features: hci0 features 0xffff8ffe9bff7983
[  940.864103] hci_cmd_task: hci0 cmd 1
[  940.864164] hci_send_frame: hci0 type 1 len 3
[  940.865963] hci_rx_task: hci0
[  940.866034] hci_cc_read_local_version: hci0 status 0x0
[  940.866075] hci_cc_read_local_version: hci0 manufacturer 15 hci ver 4:20868
[  940.866121] hci_send_cmd: hci0 opcode 0xc01 plen 8
[  940.866175] hci_send_cmd: skb len 11
[  940.866211] hci_send_cmd: hci0 opcode 0x1002 plen 0
[  940.866248] hci_send_cmd: skb len 3
[  940.866284] hci_send_cmd: hci0 opcode 0xc56 plen 1
[  940.866444] hci_send_cmd: skb len 4
[  940.866508] hci_send_cmd: hci0 opcode 0xc45 plen 1
[  940.866561] hci_send_cmd: skb len 4
[  940.866596] hci_send_cmd: hci0 opcode 0xc58 plen 0
[  940.866641] hci_send_cmd: skb len 3
[  940.866676] hci_send_cmd: hci0 opcode 0x1004 plen 1
[  940.866727] hci_send_cmd: skb len 4
[  940.866802] hci_cmd_task: hci0 cmd 1
[  940.866874] hci_send_frame: hci0 type 1 len 3
[  940.867920] hci_rx_task: hci0
[  940.867979] hci_cc_read_buffer_size: hci0 status 0x0
[  940.868024] hci_cc_read_buffer_size: hci0 acl mtu 1021:8 sco mtu 64:1
[  940.868066] hci_cmd_task: hci0 cmd 1
[  940.868117] hci_send_frame: hci0 type 1 len 3
[  940.868939] hci_rx_task: hci0
[  940.868994] hci_cc_read_bd_addr: hci0 status 0x0
[  940.869033] hci_req_complete: hci0 command 0x1009 result 0x00
[  940.869091] hci_cmd_task: hci0 cmd 1
[  940.869156] hci_send_frame: hci0 type 1 len 3
[  940.870930] hci_rx_task: hci0
[  940.870977] hci_cc_read_class_of_dev: hci0 status 0x0
[  940.871018] hci_cc_read_class_of_dev: hci0 class 0x000000
[  940.871075] hci_cmd_task: hci0 cmd 1
[  940.871138] hci_send_frame: hci0 type 1 len 3
[  940.876953] hci_rx_task: hci0
[  940.877025] hci_event_packet: hci0 event 0x69
[  940.877969] hci_rx_task: hci0
[  940.878035] hci_event_packet: hci0 event 0x0
[  940.878090] hci_event_packet: hci0 event 0x0
[  940.878134] hci_event_packet: hci0 event 0x0
[  940.878170] hci_event_packet: hci0 event 0x0
[  940.878208] hci_event_packet: hci0 event 0x0
[  940.878249] hci_event_packet: hci0 event 0x0
[  940.878285] hci_event_packet: hci0 event 0x0
[  940.878320] hci_event_packet: hci0 event 0x0
[  940.878964] hci_rx_task: hci0
[  940.879014] hci_event_packet: hci0 event 0x0
[  940.879054] hci_event_packet: hci0 event 0x0
[  940.879091] hci_event_packet: hci0 event 0x0
[  940.879129] hci_event_packet: hci0 event 0x0
[  940.879179] hci_event_packet: hci0 event 0x0
[  940.879215] hci_event_packet: hci0 event 0x0
[  940.879251] hci_event_packet: hci0 event 0x0
[  940.879287] hci_event_packet: hci0 event 0x0
[  940.879956] hci_rx_task: hci0
[  940.880020] hci_event_packet: hci0 event 0x0
[  940.880073] hci_event_packet: hci0 event 0x0
[  940.880118] hci_event_packet: hci0 event 0x0
[  940.880154] hci_event_packet: hci0 event 0x0
[  940.880191] hci_event_packet: hci0 event 0x0
[  940.880226] hci_event_packet: hci0 event 0x0
[  940.880268] hci_event_packet: hci0 event 0x0
[  940.880304] hci_event_packet: hci0 event 0x0
[  940.880964] hci_rx_task: hci0
[  940.881028] hci_event_packet: hci0 event 0x0
[  940.881081] hci_event_packet: hci0 event 0x0
[  940.881102] hci_event_packet: hci0 event 0x0
[  940.881154] hci_event_packet: hci0 event 0x0
[  940.881190] hci_event_packet: hci0 event 0x0
[  940.881226] hci_event_packet: hci0 event 0x0
[  940.881268] hci_event_packet: hci0 event 0x0
[  940.881303] hci_event_packet: hci0 event 0x0
[  940.883971] hci_rx_task: hci0
[  940.884040] hci_event_packet: hci0 event 0x0
[  940.884094] hci_event_packet: hci0 event 0x0
[  940.884139] hci_event_packet: hci0 event 0x0
[  940.884175] hci_event_packet: hci0 event 0x0
[  940.884211] hci_event_packet: hci0 event 0x0
[  940.884253] hci_event_packet: hci0 event 0x0
[  940.884289] hci_event_packet: hci0 event 0x0
[  940.884325] hci_event_packet: hci0 event 0x0
[  940.884958] hci_rx_task: hci0
[  940.885003] hci_event_packet: hci0 event 0x0
[  940.885042] hci_event_packet: hci0 event 0x0
[  940.885095] hci_event_packet: hci0 event 0x0
[  940.885123] hci_event_packet: hci0 event 0x0
[  940.885159] hci_event_packet: hci0 event 0x0
[  940.885195] hci_event_packet: hci0 event 0x0
[  940.885232] hci_event_packet: hci0 event 0x0
[  940.885269] hci_event_packet: hci0 event 0x0
[  940.885967] hci_rx_task: hci0
[  940.886028] hci_event_packet: hci0 event 0x0
[  940.886081] hci_event_packet: hci0 event 0x0
[  940.886135] hci_event_packet: hci0 event 0x0
[  940.886172] hci_event_packet: hci0 event 0x0
[  940.886208] hci_event_packet: hci0 event 0x0
[  940.886261] hci_event_packet: hci0 event 0x0
[  940.886298] hci_event_packet: hci0 event 0x0
[  940.886450] hci_event_packet: hci0 event 0x0
[  940.886966] hci_rx_task: hci0
[  940.887017] hci_event_packet: hci0 event 0x0
[  940.887057] hci_event_packet: hci0 event 0x0
[  940.887108] hci_event_packet: hci0 event 0x0
[  940.887150] hci_event_packet: hci0 event 0x0
[  940.887186] hci_event_packet: hci0 event 0x0
[  940.887221] hci_event_packet: hci0 event 0x0
[  940.887259] hci_event_packet: hci0 event 0x0
[  940.887295] hci_event_packet: hci0 event 0x0
[  940.887965] hci_rx_task: hci0
[  940.888015] hci_event_packet: hci0 event 0x0
[  940.888054] hci_event_packet: hci0 event 0x0
[  940.888092] hci_event_packet: hci0 event 0x0
[  940.888131] hci_event_packet: hci0 event 0x0
[  940.888166] hci_event_packet: hci0 event 0x0
[  940.888202] hci_event_packet: hci0 event 0x0
[  940.888244] hci_event_packet: hci0 event 0x0
[  940.888279] hci_event_packet: hci0 event 0x0
[  940.888961] hci_rx_task: hci0
[  940.889010] hci_event_packet: hci0 event 0x0
[  940.889049] hci_event_packet: hci0 event 0x0
[  940.889085] hci_event_packet: hci0 event 0x0
[  940.889127] hci_event_packet: hci0 event 0x0
[  940.889163] hci_event_packet: hci0 event 0x0
[  940.889198] hci_event_packet: hci0 event 0x0
[  940.889242] hci_event_packet: hci0 event 0x0
[  940.889283] hci_event_packet: hci0 event 0x0
[  940.889967] hci_rx_task: hci0
[  940.890014] hci_event_packet: hci0 event 0x0
[  940.890053] hci_event_packet: hci0 event 0x0
[  940.890104] hci_event_packet: hci0 event 0x0
[  940.890147] hci_event_packet: hci0 event 0x0
[  940.890181] hci_event_packet: hci0 event 0x0
[  940.890217] hci_event_packet: hci0 event 0x0
[  940.890254] hci_event_packet: hci0 event 0x0
[  941.866443] Bluetooth: hci0 command tx timeout
[  941.866522] hci_cmd_task: hci0 cmd 1
[  941.866585] hci_send_frame: hci0 type 1 len 3
[  942.866455] Bluetooth: hci0 command tx timeout
[  942.866564] hci_cmd_task: hci0 cmd 1
[  942.866638] hci_send_frame: hci0 type 1 len 4
[  943.866492] Bluetooth: hci0 command tx timeout
[  943.866569] hci_cmd_task: hci0 cmd 1
[  943.866635] hci_send_frame: hci0 type 1 len 5
[  943.868358] hci_rx_task: hci0
[  943.868426] hci_event_packet: hci0 event 0x0
[  943.868491] hci_conn_request_evt: hci0 bdaddr 00:00:CF:E4:00:16 type 0x0
[  943.868553] sco_connect_ind: hdev hci0, bdaddr 00:00:CF:E4:00:16
[  943.868613] hci_inquiry_cache_lookup: cache c8594bdc, 00:00:CF:E4:00:16
[  943.868673] hci_conn_add: hci0 dst 00:00:CF:E4:00:16
[  943.868789] hci_conn_init_sysfs: conn c8587200
[  943.868848] hci_send_cmd: hci0 opcode 0x429 plen 21
[  943.868915] hci_send_cmd: skb len 24
[  943.868969] hci_remote_version_evt: hci0
[  943.869010] hci_cmd_task: hci0 cmd 0
[  944.866530] Bluetooth: hci0 command tx timeout
[  944.866607] hci_cmd_task: hci0 cmd 1
[  944.866681] hci_send_frame: hci0 type 1 len 10
[  944.944545] hci_rx_task: hci0
[  944.944601] hci_cc_delete_stored_link_key: hci0 status 0x0
[  944.944641] hci_req_complete: hci0 command 0x0c12 result 0x00
[  944.944708] hci_cmd_task: hci0 cmd 1
[  944.944755] hci_send_frame: hci0 type 1 len 11
[  944.946609] hci_rx_task: hci0
[  944.946658] hci_cc_set_event_mask: hci0 status 0x0
[  944.946699] hci_req_complete: hci0 command 0x0c01 result 0x00
[  944.946818] hci_cmd_task: hci0 cmd 1
[  944.946878] hci_send_frame: hci0 type 1 len 3
[  945.316991] usb 1-1.3.2: reset full-speed USB device number 6 using dwc_otg
[  945.647063] usb 1-1.3.1: reset full-speed USB device number 5 using dwc_otg
[  945.857047] usb 1-1.3.1: reset full-speed USB device number 5 using dwc_otg
[  945.946586] Bluetooth: hci0 command tx timeout
[  945.946659] hci_cmd_task: hci0 cmd 1
[  945.946738] hci_send_frame: hci0 type 1 len 4
[  946.946611] Bluetooth: hci0 command tx timeout
[  946.946671] hci_cmd_task: hci0 cmd 1
[  946.946747] hci_send_frame: hci0 type 1 len 4
[  947.717055] usb 1-1.3.1: reset full-speed USB device number 5 using dwc_otg
[  947.937177] usb 1-1.3.1: reset full-speed USB device number 5 using dwc_otg
[  947.946651] Bluetooth: hci0 command tx timeout
[  947.946788] hci_cmd_task: hci0 cmd 1
[  947.946885] hci_send_frame: hci0 type 1 len 3
[  948.457167] usb 1-1.3.2: reset full-speed USB device number 6 using dwc_otg
[  948.946676] Bluetooth: hci0 command tx timeout
[  948.946749] hci_cmd_task: hci0 cmd 1
[  948.946814] hci_send_frame: hci0 type 1 len 4
[  948.948068] hci_rx_task: hci0
[  948.948135] hci_cc_read_local_commands: hci0 status 0x0
[  948.948190] hci_send_cmd: hci0 opcode 0x80f plen 2
[  948.948236] hci_send_cmd: skb len 5
[  948.948274] hci_req_complete: hci0 command 0x1002 result 0x00
[  948.948339] hci_event_packet: hci0 event 0x10
[  948.948392] hci_inquiry_complete_evt: hci0 status 1
[  948.948427] hci_req_complete: hci0 command 0x0401 result 0x01
[  948.948467] hci_conn_check_pending: hdev hci0
[  948.948519] hci_event_packet: hci0 event 0x0
[  948.948554] hci_event_packet: hci0 event 0x0
[  948.948617] hci_event_packet: hci0 event 0x0
[  948.948655] hci_cmd_task: hci0 cmd 1
[  948.948698] hci_send_frame: hci0 type 1 len 24
[  949.227131] usb 1-1.3.1: reset full-speed USB device number 5 using dwc_otg
[  949.946695] Bluetooth: hci0 command tx timeout
[  949.946786] hci_cmd_task: hci0 cmd 1
[  949.946837] hci_send_frame: hci0 type 1 len 5
[  950.556840] __hci_request: TIMEOUT
[  950.556893] __hci_request: hci0 end: err -110
[  950.946690] Bluetooth: hci0 command tx timeout
[  950.946786] hci_cmd_task: hci0 cmd 1

and one more run of the same case but it fails in a different stage?

[  164.541215] Bluetooth: Core ver 2.16
[  164.543562] NET: Registered protocol family 31
[  164.543591] Bluetooth: HCI device and connection manager initialized
[  164.543611] Bluetooth: HCI socket layer initialized
[  164.543625] Bluetooth: L2CAP socket layer initialized
[  164.543698] Bluetooth: SCO socket layer initialized
[  205.951115] Bluetooth: Generic Bluetooth USB driver ver 0.6
[  205.951335] bluetooth:hci_register_dev: c84da000 name  bus 1 owner bf239d08
[  205.951619] bluetooth:hci_register_sysfs: c84da000 name hci0 bus 1
[  205.954824] bluetooth:hci_power_on: hci0
[  205.954861] bluetooth:hci_dev_get: 0
[  205.954880] bluetooth:hci_dev_open: hci0 c84da000
[  205.954974] bluetooth:__hci_request: hci0 start
[  205.954997] bluetooth:hci_init_req: hci0 0
[  205.955047] bluetooth:hci_send_cmd: hci0 opcode 0xc03 plen 0
[  205.955114] bluetooth:hci_send_cmd: skb len 3
[  205.955191] bluetooth:hci_cmd_task: hci0 cmd 1
[  205.955252] bluetooth:hci_send_frame: hci0 type 1 len 3
[  205.955350] bluetooth:hci_send_cmd: hci0 opcode 0x1003 plen 0
[  205.955528] bluetooth:hci_send_cmd: skb len 3
[  205.955585] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.955621] bluetooth:hci_send_cmd: hci0 opcode 0x1001 plen 0
[  205.955660] bluetooth:hci_send_cmd: skb len 3
[  205.955698] bluetooth:hci_send_cmd: hci0 opcode 0x1005 plen 0
[  205.955735] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.955792] bluetooth:hci_send_cmd: skb len 3
[  205.955829] bluetooth:hci_send_cmd: hci0 opcode 0x1009 plen 0
[  205.955865] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.955907] bluetooth:hci_send_cmd: skb len 3
[  205.955943] bluetooth:hci_send_cmd: hci0 opcode 0xc23 plen 0
[  205.955979] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.956054] bluetooth:hci_send_cmd: skb len 3
[  205.956093] bluetooth:hci_send_cmd: hci0 opcode 0xc14 plen 0
[  205.956130] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.956170] bluetooth:hci_send_cmd: skb len 3
[  205.956208] bluetooth:hci_send_cmd: hci0 opcode 0xc25 plen 0
[  205.956243] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.956289] bluetooth:hci_send_cmd: skb len 3
[  205.956326] bluetooth:hci_send_cmd: hci0 opcode 0xc05 plen 1
[  205.956362] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.956428] bluetooth:hci_send_cmd: skb len 4
[  205.956466] bluetooth:hci_send_cmd: hci0 opcode 0xc16 plen 2
[  205.956502] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.956542] bluetooth:hci_send_cmd: skb len 5
[  205.956579] bluetooth:hci_send_cmd: hci0 opcode 0xc12 plen 7
[  205.956614] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.956661] bluetooth:hci_send_cmd: skb len 10
[  205.956735] bluetooth:hci_cmd_task: hci0 cmd 0
[  205.956839] bluetooth:hci_sock_dev_event: hdev hci0 event 1
[  205.956887] bluetooth:hci_send_to_sock: hdev   (null) len 8
[  205.957509] usbcore: registered new interface driver btusb
[  205.959127] bluetooth:hci_rx_task: hci0
[  205.959195] bluetooth:hci_cc_reset: hci0 status 0x0
[  205.959237] bluetooth:hci_req_complete: hci0 command 0x0c03 result 0x00
[  205.959304] bluetooth:hci_cmd_task: hci0 cmd 1
[  205.959350] bluetooth:hci_send_frame: hci0 type 1 len 3
[  206.950997] Bluetooth: hci0 command tx timeout
[  206.951085] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.951166] bluetooth:hci_send_frame: hci0 type 1 len 3
[  206.952235] bluetooth:hci_rx_task: hci0
[  206.952304] bluetooth:hci_cc_read_local_version: hci0 status 0x0
[  206.952367] bluetooth:hci_cc_read_local_version: hci0 manufacturer 15 hci ver 4:20868
[  206.952428] bluetooth:hci_send_cmd: hci0 opcode 0xc01 plen 8
[  206.952484] bluetooth:hci_send_cmd: skb len 11
[  206.952521] bluetooth:hci_send_cmd: hci0 opcode 0x1002 plen 0
[  206.952565] bluetooth:hci_send_cmd: skb len 3
[  206.952625] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.952692] bluetooth:hci_send_frame: hci0 type 1 len 3
[  206.955263] bluetooth:hci_rx_task: hci0
[  206.955335] bluetooth:hci_cc_read_buffer_size: hci0 status 0x0
[  206.955378] bluetooth:hci_cc_read_buffer_size: hci0 acl mtu 1021:8 sco mtu 64:1
[  206.955444] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.955491] bluetooth:hci_send_frame: hci0 type 1 len 3
[  206.957223] bluetooth:hci_rx_task: hci0
[  206.957301] bluetooth:hci_cc_read_bd_addr: hci0 status 0x0
[  206.957341] bluetooth:hci_req_complete: hci0 command 0x1009 result 0x00
[  206.957387] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.957455] bluetooth:hci_send_frame: hci0 type 1 len 3
[  206.959230] bluetooth:hci_rx_task: hci0
[  206.959301] bluetooth:hci_cc_read_class_of_dev: hci0 status 0x0
[  206.959340] bluetooth:hci_cc_read_class_of_dev: hci0 class 0x000000
[  206.959384] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.959447] bluetooth:hci_send_frame: hci0 type 1 len 3
[  206.976246] bluetooth:hci_rx_task: hci0
[  206.976320] bluetooth:hci_cc_read_local_name: hci0 status 0x0
[  206.976388] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.976454] bluetooth:hci_send_frame: hci0 type 1 len 3
[  206.978251] bluetooth:hci_rx_task: hci0
[  206.978316] bluetooth:hci_cc_read_voice_setting: hci0 status 0x0
[  206.978356] bluetooth:hci_cc_read_voice_setting: hci0 voice setting 0x0060
[  206.978411] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.978471] bluetooth:hci_send_frame: hci0 type 1 len 4
[  206.980220] bluetooth:hci_rx_task: hci0
[  206.980269] bluetooth:hci_cc_set_event_flt: hci0 status 0x0
[  206.980310] bluetooth:hci_req_complete: hci0 command 0x0c05 result 0x00
[  206.980368] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.980433] bluetooth:hci_send_frame: hci0 type 1 len 5
[  206.982259] bluetooth:hci_rx_task: hci0
[  206.982326] bluetooth:hci_cc_write_ca_timeout: hci0 status 0x0
[  206.982365] bluetooth:hci_req_complete: hci0 command 0x0c16 result 0x00
[  206.982420] bluetooth:hci_cmd_task: hci0 cmd 1
[  206.982480] bluetooth:hci_send_frame: hci0 type 1 len 10
[  207.060284] bluetooth:hci_rx_task: hci0
[  207.060357] bluetooth:hci_cc_delete_stored_link_key: hci0 status 0x0
[  207.060397] bluetooth:hci_req_complete: hci0 command 0x0c12 result 0x00
[  207.060485] bluetooth:hci_cmd_task: hci0 cmd 1
[  207.060552] bluetooth:hci_send_frame: hci0 type 1 len 11
[  207.062256] bluetooth:hci_rx_task: hci0
[  207.062330] bluetooth:hci_cc_set_event_mask: hci0 status 0x0
[  207.062373] bluetooth:hci_req_complete: hci0 command 0x0c01 result 0x00
[  207.062429] bluetooth:hci_cmd_task: hci0 cmd 1
[  207.062489] bluetooth:hci_send_frame: hci0 type 1 len 3
[  208.061037] Bluetooth: hci0 command tx timeout
[  208.061111] bluetooth:hci_cmd_task: hci0 cmd 1
[  215.951426] bluetooth:__hci_request: TIMEOUT
[  215.951479] bluetooth:__hci_request: hci0 end: err -110


So, what does it tell us? What is the next step?

@Diaoul
Diaoul commented Aug 27, 2012

@gregd72002: I am interested in how you did to enable that much logging, how did you achieve this?

@gregd72002

Dialoul: you need to compile kernel with CONFIG_DYNAMIC_DEBUG option (http://www.mjmwired.net/kernel/Documentation/dynamic-debug-howto.txt)

then once using the new kernel mount debugfs:

mount -t debugfs none /sys/kernel/debug/

and activate debug for items you interested in (refer to dynamic-debug-howto.txt for more info)):

echo -n "file hci_core.c +pf" > /sys/kernel/debug/dynamic_debug/control
echo -n "file hci_event.c +pf" > /sys/kernel/debug/dynamic_debug/control
echo -n "file mgmt.c +pf" > /sys/kernel/debug/dynamic_debug/control
echo -n "file btusb.c +pf" > /sys/kernel/debug/dynamic_debug/control
echo -n "module bluetooth +pf" > /sys/kernel/debug/dynamic_debug/control
echo -n "module btusb +pf" > /sys/kernel/debug/dynamic_debug/control

Then load the module you interested in and view the debug in 'dmesg'

@gregd72002

All my investigations are leading to an issue with USB subsystem and not bluetooth subsystem...

Cross referencing to a possible root cause: #19
raspberrypi/firmware#19

@gregd72002

Good progress!! Resolved!!

As thought - nothing to do with bluetooth subsystem.

Modify your cmdline.txt to have:

dwc_otg.microframe_schedule=1 dwc_otg.speed=1

my full cmdline.txt looks like:

dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline dwc_otg.microframe_schedule=1 dwc_otg.speed=1 rootwait

and everything works now as expected!

Just to share my findings:

  1. the issue with kernel 3.1.x and 3.2.x that is mentioned in the initial comment is to do with broken bluetooth dongles; this is ones that have issues with providing features lists (extfeatures).
  2. the above issue was also resolved in kernels >3.3; so I tried to incorporate kernel 3.5.4 bluetooth system into RPi kernel and this did not resolve my problem either
  3. the timeout issue I had was appearing randomly on a different occasions so nothing to do really with initiation
  4. thus, usb was suspected to be broken...
@popcornmix
Contributor

@gregd72002
Glad it's working. Are both microframe_schedule and speed required?
(in latest firmware microframe_schedule is actually on by default, but can be disabled with =0)

@gregd72002

@popcornmix
I just tried it and yes, indeed one just need 'speed' option, microframe_schedule is not required based on my last hour testing

@ratherDashing

dwc_otg.speed=1

my guess is that this pushes thing back to USB 1.1 speed. I have a USB hard drive also attached to the system so that would be an issue with me.

Is that the case?

@popcornmix
Contributor

Yes speed=1 will limit you to USB 1 speeds.

@gregd72002

@ratherDashing
I wouldn't be surprised to see USB1 being faster than USB2 on RPi as of USB2 having discussed issues, so they might affecting the speed...

@sanderjo
sanderjo commented Sep 3, 2012

When I add "dwc_otg.microframe_schedule=1 dwc_otg.speed=1" to /boot/cmdline.txt and reboot my raspi (3.1.9+, firmware newest version), the raspi boot gets into an endless loop, saying

ERROR::handle_hc_chhltd_intr_dma:2040: handle_hc_chhltd_intr_dma: Channel 0, DMA Mode -- ChHltd set, but reason for halting is unkown, hcint 0x00000002, intsts 0x06000021

Am I doing something wrong? Or is the entry only valid voor kernels above 3.1?

@popcornmix
Contributor

@sanderjo
microframe_schedule is not supported on 3.1.9. Can you update with rpi-update?

@sanderjo
sanderjo commented Sep 3, 2012

"sudo rpi-update" says
Your firmware is already up to date

But do you mean I should update the kernel? Can I do that with rpi-update?

FWIW: just adding "dwc_otg.speed=1" (so no microframe_schedule) to cmdline.txt also results in the loop with error messages.

@gregd72002

@sanderjo

sudo rm /boot/.firmware_version
sudo rpi-update

reboot and paste the output of:
uname -a

@sanderjo
sanderjo commented Sep 3, 2012

Ah, cool:

Linux raspbian-armhf-SJ 3.2.27+ #102 PREEMPT Sat Sep 1 01:00:50 BST 2012 armv6l GNU/Linux

That's good, isn't it?

However, adding "dwc_otg.speed=1" to /boot/cmdline.txt and rebooting, still results in the endless loop.

@gregd72002

@sanderjo

Unplug all USB devices from you RPi and check if it starts normally with speed=1 ..

@sanderjo
sanderjo commented Sep 3, 2012

OK, unplugging the USB devices (keyboard + mouse) before the boot and then booting makes the boot work. I can ssh into the Raspi. Good.

However, plugging in the USB devices later on (so: after the succesful boot), results in this:

  • keyboard and mouse not working (even no red light under the mouse)
  • ssh stops reacting, no network
  • nothing extra on tail -f /var/log/messages in the ssh session (probably because the network is gone?)
    So: connecting the USB device stops all USB stuff, including the ethernet-on-USB?

Anything else I can do?

Just to be sure, here's some info from my system:

pi@raspbian-armhf-SJ ~ $ cat /boot/cmdline.txt
dwc_otg.lpm_enable=0 dwc_otg.speed=1  console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

pi@raspbian-armhf-SJ ~ $ uname -a
Linux raspbian-armhf-SJ 3.2.27+ #102 PREEMPT Sat Sep 1 01:00:50 BST 2012 armv6l GNU/Linux
pi@raspbian-armhf-SJ ~ $
@gregd72002

What is the sequence - do you firstly plug keyboard which works and then plug mouse and everything stops to work? What if you do it way around? Plug mouse - does it work? then plug keyboard does it still work?

My guess is power supply - get a better power supply and your problem should be gone

@sanderjo
sanderjo commented Sep 3, 2012

The Raspi already blocks with one device: keyboard or mouse.

I tried three different power supplies (one HTC, one Nokia, one Jabra), two different Raspi's, and so far no different response.

I removed the dwc_otg.speed=1 from cmdline.txt and the Raspi booted succesfully, with and without keyboard / mouse connected during boot.

I know a bad power supply can cause problems, but how can it cause the above problems?

pi@raspbian-armhf-SJ ~ $ lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp. 
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp. 
Bus 001 Device 004: ID 046d:c00e Logitech, Inc. M-BJ58/M-BJ69 Optical Wheel Mouse
Bus 001 Device 005: ID 03f0:0024 Hewlett-Packard KU-0316 Keyboard
pi@raspbian-armhf-SJ ~ $
@popcornmix
Contributor

Can you measure the voltage on board:
http://elinux.org/R-Pi_Troubleshooting#Troubleshooting_power_problems

If you boot with no USB plugged in, does plugging in just the keyboard work? Does plugging in just the mouse work?

@sanderjo
sanderjo commented Sep 5, 2012

I measured the voltage:

  • 4.85 V with USB keyboard and mouse connected
  • 4.88 V with no USB keyboard and mouse connected
    So only a small voltage drop of 0.03V. Seems OK to me.

With a plain /boot/cmdline.txt, the Raspi works well with the USB keyboard and/or mouse:

  • booting with both plugged in works,
  • first booting and then connecting them works too

The problem appears as soon as I add "dwc_otg.speed=1" to /boot/cmdline.txt and reboot:

  • booting without USB mouse and/or keyboard goes well, ... until I plug in USB mouse or keyboard: my SSH session dies, nothing in my SSH-ed /var/log/messages or syslog (but: I can't check what's going on onside the Raspi ... maybe there is some message?) ... no ping, no ssh to the Raspi. Nothing on the physically connected screen. When I remove the keyboard, ping/ssh etc do NOT come back. So it looks like the USB (and thus ethernet-via-USB) is really dead. The Raspi itself is still running: still output on my physically connected screen, and the four LEDs are on.
  • booting with USB mouse and/or keyboard connected, results in a loop of the mentioned error message. Disconnecting the USB mouse/keyboard after booting does NOT end that loop. No response from the Raspi via ping or ssh.

So I really think "dwc_otg.speed=1" is the cause of the problem. Not my power supply.

PS: something strange: with "dwc_otg.speed=1" activated but no keyboard/mouse connected, the power is 4.85 V. After connecting the USB keyboard, the power goes up to 4.94 V. So, hypothesis: by connecting the USB keyboard the whole USB system turns off, causing a lower load on the power supply, causing a lower voltage drop.

I wish I had a powered USB hub to see if that makes a difference.

I still repeat my statement that "dwc_otg.speed=1" is a cause of the problem.

@sanderjo
sanderjo commented Sep 7, 2012

Some more tests:

I built a power supply with a 7805, with on the 7805's input side a big power supply, just to make sure the power supply to the Raspi is not the bottle neck.
All works well with with "dwc_otg.speed=1" with no USB connected. The current drawn by the Raspi is 0.37A - 0.40A (which happens to be the same current as charging my HTC smartphone). Everything is stable and works.

When I connect the USB keyboard, the current drops to 0.28A, and the Raspi immediately stops responding via SSH. So that proofs my earlier hypothesis "USB keyboard connected => Raspi's USB system turned off completely => lower current => lower voltage drop"
Needless to say the keyboard is not working.

So I rebooted without keyboard, removed "dwc_otg.speed=1" from /boot/cmdline.txt, rebooted, and - as expected - the keyboard is no problem: connecting it does not change the current (stays around 0.37A) so apparently a USB keyboard draws very little current, and dmesg shows:

[   57.612008] usb 1-1.2: new low-speed USB device number 4 using dwc_otg
[   57.721419] usb 1-1.2: New USB device found, idVendor=03f0, idProduct=0024
[   57.721460] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   57.721478] usb 1-1.2: Product: HP Basic USB Keyboard
[   57.721491] usb 1-1.2: Manufacturer: CHICONY
[   57.738476] input: CHICONY HP Basic USB Keyboard as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2:1.0/input/input0
[   57.738624] generic-usb 0003:03F0:0024.0001: input: USB HID v1.10 Keyboard [CHICONY HP Basic USB Keyboard] on usb-bcm2708_usb-1.2/input0

So:

  • putting in a bigger power supply does not take away the problem with "dwc_otg.speed=1" active
  • the current does NOT change when connecting a USB keyboard with "dwc_otg.speed=1" disabled

Tips welcome ...

@lmirel
lmirel commented Nov 5, 2012

I have a similar issue with 3.2.27+. After patching with the suggested changes, I don't have the error in dmegs but I still have the behavior.
With two different dongles I have:

  • either: Bluetooth: hci0 ACL packet for unknown connection handle 11
  • or: Bluetooth: hci0 link tx timeout

dwc_otg.speed=1 trick doesn't work: I can't use any USB devices after booting with this.
I have 2100 mA and 2300 mA powered hubs. The PI is powered by a standalone 750 mA power adapter.

@ghollingworth
Contributor

Does the same driver work anywhere else? Do we actually have any
evidence where the problem is, the error message looks more like a
protocol issue than a USB one...

Gordon

On 05/11/2012, lmirel notifications@github.com wrote:

I have a similar issue with 3.2.27+. After patching with the suggested
changes, I don't have the error in dmegs but I still have the behavior.
With two different dongles I have:

  • either: Bluetooth: hci0 ACL packet for unknown connection handle 11
  • or: Bluetooth: hci0 link tx timeout

dwc_otg.speed=1 trick doesn't work: I can't use any USB devices after
booting with this.
I have 2100 mA and 2300 mA powered hubs. The PI is powered by a standalone
750 mA power adapter.


Reply to this email directly or view it on GitHub:
#66 (comment)

Sent from my mobile device

@arcanon
arcanon commented Nov 23, 2012

I see the issue as well, as soon as I use "dwc_otg.speed=1", no usb works, on hub or not. eth0 also not.

It finds all the devices, but later trying to use them fails:
Nov 23 20:28:24 raspberrypi kernel: [ 2.495640] devtmpfs: mounted
Nov 23 20:28:24 raspberrypi kernel: [ 2.520432] Freeing init memory: 124K
Nov 23 20:28:24 raspberrypi kernel: [ 2.709922] usb 1-1: not running at top speed; connect to a high speed hub
Nov 23 20:28:24 raspberrypi kernel: [ 2.724092] usb 1-1: New USB device found, idVendor=0424, idProduct=9512
Nov 23 20:28:24 raspberrypi kernel: [ 2.737462] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Nov 23 20:28:24 raspberrypi kernel: [ 2.752232] hub 1-1:1.0: USB hub found
Nov 23 20:28:24 raspberrypi kernel: [ 2.762840] hub 1-1:1.0: 3 ports detected
Nov 23 20:28:24 raspberrypi kernel: [ 3.050721] usb 1-1.1: new full-speed USB device number 3 using dwc_otg
Nov 23 20:28:24 raspberrypi kernel: [ 3.181731] usb 1-1.1: not running at top speed; connect to a high speed hub
Nov 23 20:28:24 raspberrypi kernel: [ 3.211813] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
Nov 23 20:28:24 raspberrypi kernel: [ 3.230682] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Nov 23 20:28:24 raspberrypi kernel: [ 3.249971] smsc95xx v1.0.4
Nov 23 20:28:24 raspberrypi kernel: [ 3.381568] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:63:92:a4
Nov 23 20:28:24 raspberrypi kernel: [ 3.490724] usb 1-1.2: new low-speed USB device number 4 using dwc_otg

... skip ...

Nov 23 20:28:24 raspberrypi kernel: [ 19.499570] hub 1-1:1.0: cannot reset port 2 (err = -110)
Nov 23 20:28:24 raspberrypi kernel: [ 20.499569] hub 1-1:1.0: cannot reset port 2 (err = -110)
Nov 23 20:28:24 raspberrypi kernel: [ 21.499560] hub 1-1:1.0: cannot reset port 2 (err = -110)
Nov 23 20:28:24 raspberrypi kernel: [ 21.929560] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000014
Nov 23 20:28:24 raspberrypi kernel: [ 21.929585] smsc95xx 1-1.1:1.0: eth0: Failed to write HW_CFG_LRST_ bit in HW_CFG
Nov 23 20:28:24 raspberrypi kernel: [ 22.499577] hub 1-1:1.0: cannot reset port 2 (err = -110)
Nov 23 20:28:24 raspberrypi kernel: [ 23.499571] hub 1-1:1.0: cannot reset port 2 (err = -110)
Nov 23 20:28:24 raspberrypi kernel: [ 23.499595] hub 1-1:1.0: Cannot enable port 2. Maybe the USB cable is bad?
Nov 23 20:28:24 raspberrypi kernel: [ 24.499571] hub 1-1:1.0: cannot disable port 2 (err = -110)
Nov 23 20:28:24 raspberrypi kernel: [ 25.499566] hub 1-1:1.0: cannot reset port 2 (err = -110)
Nov 23 20:28:24 raspberrypi kernel: [ 26.499569] hub 1-1:1.0: cannot reset port 2 (err = -110)

uname -a
"Linux raspberrypi 3.2.27+ #285 PREEMPT Tue Nov 20 17:49:40 GMT 2012 armv6l GNU/Linux"

@cioban
cioban commented Nov 24, 2012

Hi all,

I think this problem may be the same as mine... http://raspberrypi.org/phpBB3/viewtopic.php?f=53&t=23022&p=220334
I'm trying to use bluetooth audio and my dmesg shows "Bluetooth: hci0 link tx timeout" and "Bluetooth: hci0 SCO packet for unknown connection handle" and somtimes I got an Kernel Panic... :(

I tested Raspbian and Arch Linux with default kernel...

@djmdjm
djmdjm commented Nov 26, 2012

dwc_otg.speed=1 fixes the problem for me, but the kernel patches do not. My device is:

Bus 001 Device 009: ID 0a5c:4500 Broadcom Corp. BCM2046B1 USB 2.0 Hub (part of BCM2046 Bluetooth)
Bus 001 Device 010: ID 0a5c:4502 Broadcom Corp. Keyboard (Boot Interface Subclass)
Bus 001 Device 011: ID 0a5c:4503 Broadcom Corp. Mouse (Boot Interface Subclass)
Bus 001 Device 012: ID 0a5c:2154 Broadcom Corp.

and my kernel is

Linux version 3.2.27+ (dc4@dc4-arm-01) (gcc version 4.7.2 20120731 (prerelease) (crosstool-NG linaro-1.13.1+bzr2458 - Linaro GCC 2012.08) ) #250

I've had no problems with dwc_otg.speed=1, but my Pi does reboot when I plug or unplug my USB wifi. I'll see if adding a powered hub between the Pi and the bluetooth dongle makes a difference.

@dichro
dichro commented Feb 24, 2013

Blacklisting bnep, as reported at http://www.raspberrypi.org/phpBB3/viewtopic.php?t=9585&f=28, appears to solve the problem without needing to drop the USB bus speed.

EDIT: No, I take it back. The device comes up, but commands still time out. Still need to set dwc_otg.speed=1 for it to work. >_<

@swarren swarren pushed a commit to swarren/linux-rpi that referenced this issue Nov 22, 2013
Borislav Petkov + Ingo Molnar x86/boot: Further compress CPUs bootup message
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
a17bce4
@popcornmix popcornmix pushed a commit that referenced this issue Jul 1, 2014
@gregkh Jianguo Wu + gregkh ARM: 8037/1: mm: support big-endian page tables
commit 86f4062 upstream.

When enable LPAE and big-endian in a hisilicon board, while specify
mem=384M mem=512M@7680M, will get bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x40000400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after debugging,
I find in page fault handler, will get wrong pfn from pte just after set pte,
as follow:
do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llxn", pfn, pte_val(entry));

	//read out the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llxn", pfn, pte_val(entry));
	...
}

pfn:   0x1fa4f5,     pte:0xc00001fa4f575f
new_pfn:0xfa4f5, new_pte:0xc00000fa4f5f5f	//new pfn/pte is wrong.

The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 registers.
On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
On a BE kernel, the assignment is reversed.

Unfortunately, the current code always assumes the LE case,
leading to corruption of the PTE when clearing/setting bits.

This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
00d5951
@M1cha M1cha pushed a commit to M1cha/android_kernel_broadcom_rpi that referenced this issue Jul 4, 2014
@gregkh Jianguo Wu + gregkh ARM: 8037/1: mm: support big-endian page tables
commit 86f4062 upstream.

When enable LPAE and big-endian in a hisilicon board, while specify
mem=384M mem=512M@7680M, will get bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x40000400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after debugging,
I find in page fault handler, will get wrong pfn from pte just after set pte,
as follow:
do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llxn", pfn, pte_val(entry));

	//read out the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llxn", pfn, pte_val(entry));
	...
}

pfn:   0x1fa4f5,     pte:0xc00001fa4f575f
new_pfn:0xfa4f5, new_pte:0xc00000fa4f5f5f	//new pfn/pte is wrong.

The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 registers.
On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
On a BE kernel, the assignment is reversed.

Unfortunately, the current code always assumes the LE case,
leading to corruption of the PTE when clearing/setting bits.

This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1a73877
@popcornmix popcornmix pushed a commit that referenced this issue Jul 5, 2014
Jianguo Wu + Jiri Slaby ARM: 8037/1: mm: support big-endian page tables
commit 86f4062 upstream.

When enable LPAE and big-endian in a hisilicon board, while specify
mem=384M mem=512M@7680M, will get bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x40000400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after debugging,
I find in page fault handler, will get wrong pfn from pte just after set pte,
as follow:
do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llxn", pfn, pte_val(entry));

	//read out the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llxn", pfn, pte_val(entry));
	...
}

pfn:   0x1fa4f5,     pte:0xc00001fa4f575f
new_pfn:0xfa4f5, new_pte:0xc00000fa4f5f5f	//new pfn/pte is wrong.

The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 registers.
On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
On a BE kernel, the assignment is reversed.

Unfortunately, the current code always assumes the LE case,
leading to corruption of the PTE when clearing/setting bits.

This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
8f375cf
@popcornmix popcornmix pushed a commit that referenced this issue Aug 4, 2014
Jianguo Wu + Russell King ARM: 8037/1: mm: support big-endian page tables
When enable LPAE and big-endian in a hisilicon board, while specify
mem=384M mem=512M@7680M, will get bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x40000400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after debugging,
I find in page fault handler, will get wrong pfn from pte just after set pte,
as follow:
do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llxn", pfn, pte_val(entry));

	//read out the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llxn", pfn, pte_val(entry));
	...
}

pfn:   0x1fa4f5,     pte:0xc00001fa4f575f
new_pfn:0xfa4f5, new_pte:0xc00000fa4f5f5f	//new pfn/pte is wrong.

The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 registers.
On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
On a BE kernel, the assignment is reversed.

Unfortunately, the current code always assumes the LE case,
leading to corruption of the PTE when clearing/setting bits.

This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.

CC stable <stable@vger.kernel.org>

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
86f4062
@popcornmix popcornmix pushed a commit that referenced this issue Nov 7, 2014
@gregkh Jianguo Wu + gregkh ARM: 8037/1: mm: support big-endian page tables
commit 86f4062 upstream.

When enable LPAE and big-endian in a hisilicon board, while specify
mem=384M mem=512M@7680M, will get bad page state:

Freeing unused kernel memory: 180K (c0466000 - c0493000)
BUG: Bad page state in process init  pfn:fa442
page:c7749840 count:0 mapcount:-1 mapping:  (null) index:0x0
page flags: 0x40000400(reserved)
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66
[<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14)
[<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104)
[<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c)
[<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0)
[<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8)
[<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120)
[<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354)
[<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90)
[<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40)

The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after debugging,
I find in page fault handler, will get wrong pfn from pte just after set pte,
as follow:
do_anonymous_page()
{
	...
	set_pte_at(mm, address, page_table, entry);

	//debug code
	pfn = pte_pfn(entry);
	pr_info("pfn:0x%lx, pte:0x%llxn", pfn, pte_val(entry));

	//read out the pte just set
	new_pte = pte_offset_map(pmd, address);
	new_pfn = pte_pfn(*new_pte);
	pr_info("new pfn:0x%lx, new pte:0x%llxn", pfn, pte_val(entry));
	...
}

pfn:   0x1fa4f5,     pte:0xc00001fa4f575f
new_pfn:0xfa4f5, new_pte:0xc00000fa4f5f5f	//new pfn/pte is wrong.

The bug is happened in cpu_v7_set_pte_ext(ptep, pte):
An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 registers.
On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB.
On a BE kernel, the assignment is reversed.

Unfortunately, the current code always assumes the LE case,
leading to corruption of the PTE when clearing/setting bits.

This patch fixes this issue much like it has been done already in the
cpu_v7_switch_mm case.

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
d4a06c5
@P33M
Contributor
P33M commented Jul 14, 2015

There have been no reported issues with bluetooth since fiq_fsm (USB rewrite) went into general usage.

@P33M P33M closed this Jul 14, 2015
@anholt anholt referenced this issue in anholt/linux Jan 28, 2016
@mmarcini @dledford mmarcini + dledford IB/qib: fix mcast detach when qp not attached
The code produces the following trace:

[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
knlGS:0000000000000000
[1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035]  RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]

The fix is to note the qib_mcast_qp that was found.   If none is found, then
return EINVAL indicating the error.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
09dc9cd
@popcornmix popcornmix pushed a commit that referenced this issue Feb 17, 2016
@mmarcini @sashalevin mmarcini + sashalevin IB/qib: fix mcast detach when qp not attached
[ Upstream commit 09dc9cd ]

The code produces the following trace:

[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
knlGS:0000000000000000
[1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035]  RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]

The fix is to note the qib_mcast_qp that was found.   If none is found, then
return EINVAL indicating the error.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
dfaff0e
@popcornmix popcornmix pushed a commit that referenced this issue Mar 4, 2016
@mmarcini @gregkh mmarcini + gregkh IB/qib: fix mcast detach when qp not attached
commit 09dc9cd upstream.

The code produces the following trace:

[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
0000000000000000
[1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
knlGS:0000000000000000
[1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
00000000000007e0
[1750924.420364] Stack:
[1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
000000007af1de20
[1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
ffffffffa00cb313
[1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
[ib_qib]
[1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
[ib_uverbs]
[1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
[ib_uverbs]
[1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
[ib_qib]
[1750924.568035]  RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]

The fix is to note the qib_mcast_qp that was found.   If none is found, then
return EINVAL indicating the error.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7bf68a0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment