Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[i2c_aspeed]: i2c_aspeed reports error when calling "i2c_smbus_read_block_data()" #58

Closed
adamliyi opened this issue Mar 3, 2016 · 12 comments
Assignees
Labels

Comments

@adamliyi
Copy link
Member

adamliyi commented Mar 3, 2016

obmc linux version:

root@barreleye:~# uname -a
Linux barreleye 4.3.6-openbmc-20160222-1 #1 Mon Feb 29 11:07:55 CST 2016 armv5tejl GNU/Linux

When run a short test like bellow:

 u8 block_buffer[I2C_SMBUS_BLOCK_MAX + 1];
 ret = i2c_smbus_read_block_data(client, 0x99, block_buffer);

i2c_aspeed driver will give error:

i2c_aspeed i2c-4: Invalid state (msg cfb1bcac, pending 8)?
@shenki
Copy link
Member

shenki commented Mar 4, 2016

Can you please reproduce this with dynamic debugging enabled for the i2c-aspeed driver, and paste the log here?

Something like:

cd /sys/kernel/debug/dynamic_debug
echo "file i2c-aspeed.c +p" > control

@adamliyi
Copy link
Member Author

adamliyi commented Mar 7, 2016

For "log" do you mean "dmesg"?
Looks the i2c_aspeed driver will generate too many log, and the error message will be overwrite.
How can I only produce debug log just for this test?

# echo adm1278 0x10 > /sys/class/i2c-adapter/i2c-4/new_device
# dmesg
[131123.600000] i2c_aspeed i2c-6: irq! status 0x00000001, cmd 0x0c430000
[131123.600000] i2c_aspeed i2c-6: ast_i2c_master_xfer_done xfer 0S
[131123.600000] i2c_aspeed i2c-6: ast_i2c_do_byte_xfer W-: addr 2e xfer 0, len 1
[131123.600000] i2c_aspeed i2c-6: issuing cmd: 2
[131123.600000] i2c_aspeed i2c-6: irq! status 0x00000001, cmd 0x0c430000
[131123.600000] i2c_aspeed i2c-6: ast_i2c_master_xfer_done xfer 0
[131123.600000] i2c_aspeed i2c-6: ast_i2c_do_byte_xfer R: addr 2e start, len 1
[131123.600000] i2c_aspeed i2c-6: issuing cmd: 3
[131123.600000] i2c_aspeed i2c-6: irq! status 0x00000001, cmd 0x0c410000
[131123.600000] i2c_aspeed i2c-6: ast_i2c_master_xfer_done xfer 0S
[131123.600000] i2c_aspeed i2c-6: ast_i2c_do_byte_xfer RT: addr 2e xfer 0, len 1
[131123.600000] i2c_aspeed i2c-6: issuing cmd: 38
[131123.600000] i2c_aspeed i2c-6: irq! status 0x00000014, cmd 0x0bdd0000
[131123.600000] i2c_aspeed i2c-6: ast_i2c_master_xfer_done xfer 0
[131123.610000] systemd-journald[515]: /dev/kmsg buffer overrun, some messages lost.

@shenki
Copy link
Member

shenki commented Mar 7, 2016

Okay, it looks like systemd might be stuffing us up. I'll try debugging this with a sane userspace. Can you please describe to me how to replicate the issue? Do I need a barreleye?

@adamliyi
Copy link
Member Author

adamliyi commented Mar 9, 2016

Thanks Milton. I increased kmesg buffer to 16MB and catch the log:

[    5.860000] i2c_aspeed i2c-4: i2c bus 4 registered, irq 224
[  627.500000] i2c_aspeed i2c-4: state[0], SCL[1], SDA[1], BUS[0]
[  627.500000] i2c_aspeed i2c-4: ast_i2c_do_byte_xfer W: addr 10 start, len 1
[  627.500000] i2c_aspeed i2c-4: issuing cmd: 3
[  627.500000] i2c_aspeed i2c-4: irq! status 0x00000001, cmd 0x0c430000
[  627.500000] i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0S
[  627.500000] i2c_aspeed i2c-4: ast_i2c_do_byte_xfer W-: addr 10 xfer 0, len 1
[  627.500000] i2c_aspeed i2c-4: issuing cmd: 2
[  627.500000] i2c_aspeed i2c-4: irq! status 0x00000001, cmd 0x0c430000
[  627.500000] i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0
[  627.510000] i2c_aspeed i2c-4: ast_i2c_do_byte_xfer R: addr 10 start, len 1
[  627.510000] i2c_aspeed i2c-4: issuing cmd: 3
[  627.510000] i2c_aspeed i2c-4: irq! status 0x00000001, cmd 0x0c410000
[  627.510000] i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0S
[  627.510000] i2c_aspeed i2c-4: ast_i2c_do_byte_xfer RT: addr 10 xfer 0, len 1
[  627.510000] i2c_aspeed i2c-4: issuing cmd: 38
[  627.510000] i2c_aspeed i2c-4: irq! status 0x00000004, cmd 0x135d0000
[  627.510000] i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0
[  627.510000] i2c_aspeed i2c-4: got rx len: 3
[  627.510000] i2c_aspeed i2c-4: ast_i2c_do_byte_xfer R-: addr 10 xfer 1, len 4
[  627.510000] i2c_aspeed i2c-4: issuing cmd: 8
[  627.510000] i2c_aspeed i2c-4: irq! status 0x00000010, cmd 0x0a060008
[  627.510000] i2c_aspeed i2c-4: Invalid state (msg ce833c94, pending 8)?
[  632.510000] i2c_aspeed i2c-4: controller timed out
[  632.530000] i2c i2c-4: new_device: Instantiated device adm1278 at 0x10

@adamliyi
Copy link
Member Author

adamliyi commented Mar 9, 2016

To reproduce the issue:
(Latest obmc at this moment has built adm1275 driver into kernel, so we have to use an earlier version):

  1. Kernel version:
root@barreleye:/sys/kernel/debug/dynamic_debug# uname -a
Linux barreleye 4.3.6-openbmc-20160222-1 #2 Thu Mar 3 16:31:12 CST 2016 armv5tejl GNU/Linux
  1. Need to test on Barreleye.
  2. Please use the modules in the adm1278.zip
    adm1278.zip
  3. insmod pmbus_core.ko
  4. insmod adm1275.ko
  5. echo adm1278 0x10 > /sys/class/i2c-adapter/i2c-4/new_device
    And you can see above error message in dmesg.
    (As Milton suggested, I add log_buf_len=16M in the kernel boot option. Otherwise, dmesg log will be overwrite.)

The error is triggered by code:

        ret = i2c_smbus_read_block_data(client, PMBUS_MFR_ID, block_buffer);
        if (ret < 0) {
                dev_err(&client->dev, "Failed to read Manufacturer ID\n");
                return ret;
        }

@shenki
Copy link
Member

shenki commented Mar 15, 2016

I've had a read of our driver, and it appears we should be okay with this. I'll take a closer look tomorrow.

@shenki
Copy link
Member

shenki commented Mar 16, 2016

Upstream has gained support for this device!

Here's the pull request for 4.6 that adds support for the adm1278:

https://lkml.org/lkml/2016/3/14/404

The commit is here:

https://git.kernel.org/cgit/linux/kernel/git/groeck/linux-staging.git/commit/?h=hwmon-for-linus-v4.6&id=709066acdd12c3312c94ebccc37630932e381949

@shenki
Copy link
Member

shenki commented Mar 16, 2016

I reproduced using our dev-4.4 kernel and the upstream driver:

i2c_aspeed i2c-4: state[0], SCL[1], SDA[1], BUS[0]
i2c_aspeed i2c-4: ast_i2c_do_byte_xfer W: addr 10 start, len 1
i2c_aspeed i2c-4: issuing cmd: 3
i2c_aspeed i2c-4: irq! status 0x00000001, cmd 0x0c430000
i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0S
i2c_aspeed i2c-4: ast_i2c_do_byte_xfer W-: addr 10 xfer 0, len 1
i2c_aspeed i2c-4: issuing cmd: 2
i2c_aspeed i2c-4: irq! status 0x00000001, cmd 0x0c430000
i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0 
i2c_aspeed i2c-4: ast_i2c_do_byte_xfer R: addr 10 start, len 1
i2c_aspeed i2c-4: issuing cmd: 3
i2c_aspeed i2c-4: irq! status 0x00000001, cmd 0x0c410000
i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0S
i2c_aspeed i2c-4: ast_i2c_do_byte_xfer RT: addr 10 xfer 0, len 1
i2c_aspeed i2c-4: issuing cmd: 38
i2c_aspeed i2c-4: irq! status 0x00000004, cmd 0x135d0000
i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0 
i2c_aspeed i2c-4: got rx len: 3
i2c_aspeed i2c-4: ast_i2c_do_byte_xfer R-: addr 10 xfer 1, len 4
i2c_aspeed i2c-4: issuing cmd: 8
i2c_aspeed i2c-4: irq! status 0x00000010, cmd 0x0a060008
i2c_aspeed i2c-4: Invalid state (msg cfaf3c94, pending 8)?
i2c_aspeed i2c-4: controller timed out
adm1275 4-0010: Failed to read Manufacturer ID
adm1275: probe of 4-0010 failed with error -5

@shenki
Copy link
Member

shenki commented Mar 16, 2016

We issue command AST_I2CD_M_RX_CMD (8).

irq status 0x00000010 (at offset 0x10, I2CD10: Interrupt Status Register):

Bit 4 is (WC) Normal Stop Condition Detection interrupt status.

irq cmd 0x0a060008 is I2CD14: Command/Status Register, bits set: 27, 25, 18, 17, 3.

                     SDA_OE: 0x0
                      SDA_O: 0x1
                      SCL_O: 0x0
                     SCL_OE: 0x1
 Transfer mode timing stage: 0x0
Transfer mode state machine: 0x0 [IDLE]
                Sampled SCL: 0x1
                Sampled SDA: 0x1
                   Bus busy: 0x0
      SDA_OE direct control: 0x0
       SDA_O direct control: 0x0
      SCL_OE direct control: 0x0
       SCL_O direct control: 0x0
         Enable bus recover: 0x0
         Enable slave alert: 0x0
 Enable receive data buffer: 0x0
Enable transmit data buffer: 0x0
                Master stop: 0x0
       Reveive command last: 0x0
             Master receive: 0x1
             Slave transmit: 0x0
            Master transmit: 0x0
               Master start: 0x0

@jk-ozlabs
Copy link
Member

OK, here's what I think is happening:

i2c_aspeed i2c-4: ast_i2c_do_byte_xfer R: addr 10 start, len 1
i2c_aspeed i2c-4: issuing cmd: 3
i2c_aspeed i2c-4: irq! status 0x00000001, cmd 0x0c410000
i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0S

The client has done a read from address 0x10, so we're sending a start along with the one-byte write to provide the address value. The actual command is transmit | start. That command completes sucessfully.

Next:

i2c_aspeed i2c-4: ast_i2c_do_byte_xfer RT: addr 10 xfer 0, len 1
i2c_aspeed i2c-4: issuing cmd: 38
i2c_aspeed i2c-4: irq! status 0x00000004, cmd 0x135d0000
i2c_aspeed i2c-4: ast_i2c_master_xfer_done xfer 0 
i2c_aspeed i2c-4: got rx len: 3

Here we're doing a read with the I2C_M_RECV_LEN flag (so a one-byte read to get the length, then a length-byte read to get the rest of the data). RT in the first line indicates that the client wants the driver to also send a stop after that read is done.

However, the command we've issued there is receive | last | stop. We shouldn't have set the stop or last bits there, as we expect the M_RECV_LEN flag to indicate that there are actually more bytes to read, once we've updated len in ast_i2c_master_xfer_done.

So, we should be suppressing the stop command if query_len is set. I think the most sensible way to do this would be in the is_last check in ast_i2c_do_byte_xfer.

@shenki
Copy link
Member

shenki commented Mar 16, 2016

Thanks Jeremy! I took your suggestion and this patch resulted in working transfers:

--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -471,7 +471,7 @@ static bool ast_i2c_do_byte_xfer(struct ast_i2c_bus *bus)

                if (bus->msg->flags & I2C_M_RD) {
                        cmd = AST_I2CD_M_RX_CMD;
-                       if (bus->send_stop && is_last)
+                       if (bus->send_stop && is_last && !bus->query_len)
                                cmd |= AST_I2CD_M_S_RX_CMD_LAST |
                                        AST_I2CD_M_STOP_CMD;

You also metioned we might need to ensure we do the right thing if the received length is zero. I'll think about that and send out a final patch tomorrow.

@shenki
Copy link
Member

shenki commented Mar 29, 2016

This is fixed as of c2c9856 in the dev-4.4 tree.

@shenki shenki closed this as completed Mar 29, 2016
shenki pushed a commit that referenced this issue Jun 5, 2017
commit 8ee9fdb upstream.

On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace,

XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026

kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048
DEBUG_PAGEALLOC
NUMA
pSeries
Modules linked in:
CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58
task: c000000102606d80 task.stack: c0000001026b8000
NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290
REGS: c0000001026bb090 TRAP: 0700   Not tainted  (4.10.0-rc5)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>
CR: 28004428  XER: 00000000
CFAR: c0000000004ef180 SOFTE: 1
GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea
GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0
GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990
GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000
GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0
GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001
GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000
GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0
NIP [c0000000004ef798] .assfail+0x28/0x30
LR [c0000000004ef798] .assfail+0x28/0x30
Call Trace:
[c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable)
[c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0
[c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480
[c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90
[c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820
[c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320
[c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610
[c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0
[c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790
[c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410
[c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230
[c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120
[c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc
Instruction dump:
4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78
38600000 f8010010 f821ff91 4bfff94d <0fe00000> 60000000 7c0802a6 7c892378

When block size is larger than inode cluster size, the call to
XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
would have set xfs_sb->sb_inoalignmt to 0. This causes
xfs_ialloc_cluster_alignment() to return 0.  Due to this
args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned
equivalent of -1 assigned to it. This later causes alloc_len in
xfs_alloc_space_available() to have a value of 0. In such a scenario
when args.total is also 0, the assert statement "ASSERT(args->maxlen >
0);" fails.

This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in
xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb().

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
eddiejames pushed a commit to eddiejames/linux that referenced this issue Jul 10, 2018
When the controller supports the Read LE Resolv List size feature, the
maximum list size are read and now stored.

Before patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 17.979791
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 17.980629
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 17.980786
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 17.981627
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#59 [hci0] 17.981786
> HCI Event: Command Complete (0x0e) plen 12               openbmc#60 [hci0] 17.982636
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

After patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 13.338168
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 13.338842
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 13.339029
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 13.339939
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0  openbmc#59 [hci0] 13.340152
> HCI Event: Command Complete (0x0e) plen 5                openbmc#60 [hci0] 13.340952
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#61 [hci0] 13.341180
> HCI Event: Command Complete (0x0e) plen 12               openbmc#62 [hci0] 13.341898
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
eddiejames pushed a commit to eddiejames/linux that referenced this issue Jul 10, 2018
Check for Resolv list supported by controller. So check the supported
commmand first before issuing this command i.e.,HCI_OP_LE_CLEAR_RESOLV_LIST

Before patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 13.338168
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 13.338842
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 13.339029
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 13.339939
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0  openbmc#59 [hci0] 13.340152
> HCI Event: Command Complete (0x0e) plen 5                openbmc#60 [hci0] 13.340952
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#61 [hci0] 13.341180
> HCI Event: Command Complete (0x0e) plen 12               openbmc#62 [hci0] 13.341898
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

After patch:
< HCI Command: LE Read White List... (0x08|0x000f) plen 0  openbmc#55 [hci0] 28.919131
> HCI Event: Command Complete (0x0e) plen 5                openbmc#56 [hci0] 28.920016
      LE Read White List Size (0x08|0x000f) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear White List (0x08|0x0010) plen 0    openbmc#57 [hci0] 28.920164
> HCI Event: Command Complete (0x0e) plen 4                openbmc#58 [hci0] 28.920873
      LE Clear White List (0x08|0x0010) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Resolving L.. (0x08|0x002a) plen 0  openbmc#59 [hci0] 28.921109
> HCI Event: Command Complete (0x0e) plen 5                openbmc#60 [hci0] 28.922016
      LE Read Resolving List Size (0x08|0x002a) ncmd 1
        Status: Success (0x00)
        Size: 25
< HCI Command: LE Clear Resolving... (0x08|0x0029) plen 0  openbmc#61 [hci0] 28.922166
> HCI Event: Command Complete (0x0e) plen 4                openbmc#62 [hci0] 28.922872
      LE Clear Resolving List (0x08|0x0029) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Read Maximum Dat.. (0x08|0x002f) plen 0  openbmc#63 [hci0] 28.923117
> HCI Event: Command Complete (0x0e) plen 12               openbmc#64 [hci0] 28.924030
      LE Read Maximum Data Length (0x08|0x002f) ncmd 1
        Status: Success (0x00)
        Max TX octets: 251
        Max TX time: 17040
        Max RX octets: 251
        Max RX time: 17040

Signed-off-by: Ankit Navik <ankit.p.navik@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
shenki pushed a commit that referenced this issue Oct 7, 2019
…1 if 0.

[ Upstream commit aa9c218 ]

Syzbot reported divide error in vivid_thread_vid_cap, which has been
seen only once and does not have a reproducer.
This patch adds sanity checks for the
denominator value with WARN_ON if it is 0 and replaces it with 1.

divide error: 0000 [#1] PREEMPT SMP KASAN
kobject: 'tx-0' (0000000017161f7f): kobject_uevent_env
CPU: 0 PID: 23689 Comm: vivid-003-vid-c Not tainted 5.0.0-rc4+ #58
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:vivid_cap_update_frame_period
drivers/media/platform/vivid/vivid-kthread-cap.c:661 [inline]
RIP: 0010:vivid_thread_vid_cap+0x221/0x284
drivers/media/platform/vivid/vivid-kthread-cap.c:789
Code: 48 c1 e9 03 0f b6 0c 11 48 89 f2 48 69 c0 00 ca 9a 3b 83 c2 03 38
ca
7c 08 84 c9 0f 85 f0 1e 00 00 41 8b 8f 24 64 00 00 31 d2 <48> f7 f1 49
89
c4 48 89 c3 49 8d 87 28 64 00 00 48 89 c2 48 89 45
RSP: 0018:ffff88808b4afd68 EFLAGS: 00010246
kobject: 'tx-0' (0000000017161f7f): fill_kobj_path: path
= '/devices/virtual/net/gre0/queues/tx-0'
RAX: 000000de5a6f8e00 RBX: 0000000100047b22 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000004
RBP: ffff88808b4aff00 R08: ffff88804862e1c0 R09: ffffffff89997008
R10: ffffffff89997010 R11: 0000000000000001 R12: 00000000fffffffc
R13: ffff8880a17e0500 R14: ffff88803e40f760 R15: ffff8882182b0140
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004cdc90 CR3: 000000005d827000 CR4: 00000000001426f0
Call Trace:
kobject: 'gretap0' (00000000d7549098): kobject_add_internal: parent:
'net',
set: 'devices'
kobject: 'loop2' (0000000094ed4ee4): kobject_uevent_env
kobject: 'loop2' (0000000094ed4ee4): fill_kobj_path: path
= '/devices/virtual/block/loop2'
  kthread+0x357/0x430 kernel/kthread.c:246
kobject: 'gretap0' (00000000d7549098): kobject_uevent_env
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Modules linked in:
kobject: 'gretap0' (00000000d7549098): fill_kobj_path: path
= '/devices/virtual/net/gretap0'
---[ end trace bc5c8b25b64d768f ]---
kobject: 'loop1' (0000000032036b86): kobject_uevent_env
RIP: 0010:vivid_cap_update_frame_period
drivers/media/platform/vivid/vivid-kthread-cap.c:661 [inline]
RIP: 0010:vivid_thread_vid_cap+0x221/0x2840
drivers/media/platform/vivid/vivid-kthread-cap.c:789
kobject: 'loop1' (0000000032036b86): fill_kobj_path: path
= '/devices/virtual/block/loop1'
Code: 48 c1 e9 03 0f b6 0c 11 48 89 f2 48 69 c0 00 ca 9a 3b 83 c2 03 38
ca
7c 08 84 c9 0f 85 f0 1e 00 00 41 8b 8f 24 64 00 00 31 d2 <48> f7 f1 49
89
c4 48 89 c3 49 8d 87 28 64 00 00 48 89 c2 48 89 45
kobject: 'loop0' (00000000dd9927c3): kobject_uevent_env
RSP: 0018:ffff88808b4afd68 EFLAGS: 00010246
RAX: 000000de5a6f8e00 RBX: 0000000100047b22 RCX: 0000000000000000
kobject: 'queues' (000000007ed20666): kobject_add_internal:
parent: 'gretap0', set: '<NULL>'
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000004
RBP: ffff88808b4aff00 R08: ffff88804862e1c0 R09: ffffffff89997008
kobject: 'loop0' (00000000dd9927c3): fill_kobj_path: path
= '/devices/virtual/block/loop0'
R10: ffffffff89997010 R11: 0000000000000001 R12: 00000000fffffffc
kobject: 'queues' (000000007ed20666): kobject_uevent_env
R13: ffff8880a17e0500 R14: ffff88803e40f760 R15: ffff8882182b0140
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000)
knlGS:0000000000000000
kobject: 'loop5' (00000000a41f9e79): kobject_uevent_env
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kobject: 'queues' (000000007ed20666): kobject_uevent_env: filter
function
caused the event to drop!
CR2: 00000000004cdc90 CR3: 000000005d827000 CR4: 00000000001426f0
kobject: 'loop5' (00000000a41f9e79): fill_kobj_path: path
= '/devices/virtual/block/loop5'

Reported-by: syz...@syzkaller.appspotmail.com
Signed-off-by: Vandana BN <bnvandana@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
shenki pushed a commit that referenced this issue Apr 14, 2022
[ Upstream commit 8d3ea3d ]

GCC12 appears to be much smarter about its dependency tracking and is
aware that the relaxed variants are just normal loads and stores and
this is causing problems like:

[  210.074549] ------------[ cut here ]------------
[  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out
[  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
[  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat]
[  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
[  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E     5.17.0-rc7G12+ #58
[  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters
[  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
[  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  210.161358] pc : dev_watchdog+0x234/0x240
[  210.161364] lr : dev_watchdog+0x234/0x240
[  210.161368] sp : ffff8000080a3a40
[  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20
[  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08
[  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000
[  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff
[  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a
[  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0
[  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c
[  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
[  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
[  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044
[  210.269682] Call trace:
[  210.272133]  dev_watchdog+0x234/0x240
[  210.275811]  call_timer_fn+0x3c/0x15c
[  210.279489]  __run_timers.part.0+0x288/0x310
[  210.283777]  run_timer_softirq+0x48/0x80
[  210.287716]  __do_softirq+0x128/0x360
[  210.291392]  __irq_exit_rcu+0x138/0x140
[  210.295243]  irq_exit_rcu+0x1c/0x30
[  210.298745]  el1_interrupt+0x38/0x54
[  210.302334]  el1h_64_irq_handler+0x18/0x24
[  210.306445]  el1h_64_irq+0x7c/0x80
[  210.309857]  arch_cpu_idle+0x18/0x2c
[  210.313445]  default_idle_call+0x4c/0x140
[  210.317470]  cpuidle_idle_call+0x14c/0x1a0
[  210.321584]  do_idle+0xb0/0x100
[  210.324737]  cpu_startup_entry+0x30/0x8c
[  210.328675]  secondary_start_kernel+0xe4/0x110
[  210.333138]  __secondary_switched+0x94/0x98

The assumption when these were relaxed seems to be that device memory
would be mapped non reordering, and that other constructs
(spinlocks/etc) would provide the barriers to assure that packet data
and in memory rings/queues were ordered with respect to device
register reads/writes. This itself seems a bit sketchy, but the real
problem with GCC12 is that it is moving the actual reads/writes around
at will as though they were independent operations when in truth they
are not, but the compiler can't know that. When looking at the
assembly dumps for many of these routines its possible to see very
clean, but not strictly in program order operations occurring as the
compiler would be free to do if these weren't actually register
reads/write operations.

Its possible to suppress the timeout with a liberal bit of dma_mb()'s
sprinkled around but the device still seems unable to reliably
send/receive data. A better plan is to use the safer readl/writel
everywhere.

Since this partially reverts an older commit, which notes the use of
the relaxed variants for performance reasons. I would suggest that
any performance problems with this commit are targeted at relaxing only
the performance critical code paths after assuring proper barriers.

Fixes: 69d2ea9 ("net: bcmgenet: Use correct I/O accessors")
Reported-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Peter Robinson <pbrobinson@gmail.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220310045358.224350-1-jeremy.linton@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit that referenced this issue Mar 27, 2024
[ Upstream commit 32019c6 ]

When trying to use copy_from_kernel_nofault() to read vsyscall page
through a bpf program, the following oops was reported:

  BUG: unable to handle page fault for address: ffffffffff600000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
  ......
  Call Trace:
   <TASK>
   ? copy_from_kernel_nofault+0x6f/0x110
   bpf_probe_read_kernel+0x1d/0x50
   bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
   trace_call_bpf+0xc5/0x1c0
   perf_call_bpf_enter.isra.0+0x69/0xb0
   perf_syscall_enter+0x13e/0x200
   syscall_trace_enter+0x188/0x1c0
   do_syscall_64+0xb5/0xe0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   </TASK>
  ......
  ---[ end trace 0000000000000000 ]---

The oops is triggered when:

1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
page and invokes copy_from_kernel_nofault() which in turn calls
__get_user_asm().

2) Because the vsyscall page address is not readable from kernel space,
a page fault exception is triggered accordingly.

3) handle_page_fault() considers the vsyscall page address as a user
space address instead of a kernel space address. This results in the
fix-up setup by bpf not being applied and a page_fault_oops() is invoked
due to SMAP.

Considering handle_page_fault() has already considered the vsyscall page
address as a userspace address, fix the problem by disallowing vsyscall
page read for copy_from_kernel_nofault().

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240202103935.3154011-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants