Unreliable pnor access through /dev/mtd #114

Closed
cyrilbur-ibm opened this issue Nov 15, 2016 · 6 comments

@cyrilbur-ibm

I've been doing some work which relies on /dev/mtd7 "pnor" and using it has been unreliable. I've essentially been wanting to put all of the pnor into ram or into another file.

Depending on the kernel version different things happen, with:

uname -a
Linux palmetto 4.7.3-be2001133835be80ba4657232b823ba2daaac0d6 #1 Wed Sep 14 12:05:02 UTC 2016 armv5tejl GNU/Linux

My test code succeeds, once. The second time it reads all 0xff into the file.

With a recent rework by Cedric there's a nice oops:

./mtd_test /dev/mtd7
Using /dev/mtd7
mmaped /tmp/mtd_test.out to 0xb2f82000
Reading from /dev/mtd7 to /tmp/mtd_test.out (0xb2f82000) for 0x04000000
Unable to handle kernel paging request at virtual address e6000000
pgd = dd7f8000
[e6000000] *pgd=00000000
Internal error: Oops: 5 [#1] ARM
CPU: 0 PID: 817 Comm: mtd_test Not tainted 4.7.10cyrilbf528b5ac8dc9148be599117b1db769079f58face #1
Hardware name: ASpeed SoC
task: dd7dac20 ti: dd7ce000 task.ti: dd7ce000
PC is at aspeed_smc_from_ahb+0x50/0x94
LR is at 0xddc00000
pc : [<c02d2fc0>]    lr : [<ddc00000>]    psr: 20000013
sp : dd7cfdf4  ip : e6000000  fp : 00000000
r10: 02000000  r9 : ddc00000  r8 : dd7cfe94
r7 : 00400000  r6 : 02000000  r5 : de512010  r4 : 00400000
r3 : 00400000  r2 : 00400000  r1 : e6000000  r0 : ddc00000
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 0005317f  Table: 5d7f8000  DAC: 00000051
Process mtd_test (pid: 817, stack limit = 0xdd7ce190)
Stack: (0xdd7cfdf4 to 0xdd7d0000)
fde0:                                              c02d35cc de512038 c02d3524
fe00: de512038 00000000 02000000 00400000 00000000 c02d198c 00400000 dd7cfe94
fe20: ddc00000 00000000 de512038 dd7cff78 02000000 00000000 dd80be20 c02c9ee8
fe40: 00400000 dd7cfe94 ddc00000 00000000 02000000 de512038 b4f82000 dd80be20
fe60: ddc00000 02000000 00000000 c02cf4ac 00400000 dd7cfe94 ddc00000 00000100
fe80: 60000013 02000000 dd7ce000 c060200c 00000048 00000000 00400000 c012da74
fea0: 00000004 00000001 de793c00 c029b0e0 dd4e63c0 de6fdd80 c029f580 bd9d30d1
fec0: de76cdc0 c02cf2c4 04000000 c060200c b2f82000 dd7cff78 04000000 00000000
fee0: befe4cb4 c01a15bc dd767d10 dd767d24 dd767d20 bd9d30d1 00000002 000000ff
ff00: 04000000 de76cdc0 dd469dc8 00000001 00000007 00000000 dd4e63c0 00000000
ff20: 00000000 bd9d30d1 00000000 dd469b40 dd7cff78 bd9d30d1 dd469b40 04000000
ff40: 00000000 b2f82000 dd7cff78 c01a174c dd469b40 b2f82000 04000000 dd469b40
ff60: 00000000 c060200c dd469b40 b2f82000 04000000 c01a1b5c 02000000 00000000
ff80: 00000000 bd9d30d1 00000001 00010908 00000000 000104e4 00000003 c0102344
ffa0: dd7ce000 c01021a0 00010908 00000000 00000003 b2f82000 04000000 04000000
ffc0: 00010908 00000000 000104e4 00000003 00000000 00000000 47f00000 befe4cb4
ffe0: 00000000 befe4c6c 000107e8 47fd308c 60000010 00000003 5effd871 5effdc71
[<c02d2fc0>] (aspeed_smc_from_ahb) from [<c02d35cc>] (aspeed_smc_read+0xa8/0xd8)
[<c02d35cc>] (aspeed_smc_read) from [<c02d198c>] (spi_nor_read+0x74/0x98)
[<c02d198c>] (spi_nor_read) from [<c02c9ee8>] (mtd_read+0x64/0x9c)
[<c02c9ee8>] (mtd_read) from [<c02cf4ac>] (mtdchar_read+0x1e8/0x240)
[<c02cf4ac>] (mtdchar_read) from [<c01a15bc>] (__vfs_read+0x28/0x128)
[<c01a15bc>] (__vfs_read) from [<c01a174c>] (vfs_read+0x90/0xfc)
[<c01a174c>] (vfs_read) from [<c01a1b5c>] (SyS_read+0x4c/0x9c)
[<c01a1b5c>] (SyS_read) from [<c01021a0>] (ret_fast_syscall+0x0/0x38)
Code: e1510002 1a000005 e3a00000 e49df004 (e59cc000) 
---[ end trace fcfa3868667a9aed ]---
Segmentation fault

Note: I believe there was a v2 of Cedric's patches that was actually applied. I will retest with it.

Test code:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <mtd/mtd-abi.h>

int main(int argc, char *argv[])
{
	struct mtd_info_user mtd_info;
	int mtd_fd, tmp_fd, rc;
	char *tmp_ptr;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s /dev/mtdx\n", argv[0]);
		return 1;
	}

	printf("Using %s\n", argv[1]);
	mtd_fd = open(argv[1], O_RDONLY, 0);
	if (mtd_fd == -1) {
		perror("Opening mtd");
		return 1;
	}

	rc = ioctl(mtd_fd, MEMGETINFO, &mtd_info);
	if (rc) {
		perror("MEMGETINFO ioctl");
		close(mtd_fd);
		return 1;
	}

	tmp_fd = open("/tmp/mtd_test.out", O_RDWR | O_CREAT | O_TRUNC, S_IRWXU | S_IRWXG | S_IRWXO);
	if (tmp_fd == -1) {
		perror("Opening tmp file");
		close(mtd_fd);
		return 1;
	}

	rc = fallocate(tmp_fd, 0, 0, mtd_info.size);
	if (rc == -1) {
		perror("fallocate");
		close(tmp_fd);
		close(mtd_fd);
		return 1;
	}

	tmp_ptr = mmap(NULL, mtd_info.size, PROT_READ | PROT_WRITE, MAP_SHARED, tmp_fd, 0);
	if (tmp_ptr == MAP_FAILED) {
		perror("mmaping tmp file");
		close(tmp_fd);
		close(mtd_fd);
		return 1;
	}
	printf("mmaped /tmp/mtd_test.out to %p\n", tmp_ptr);

	printf("Reading from %s to /tmp/mtd_test.out (%p) for 0x%08x\n", argv[1], tmp_ptr, mtd_info.size);
	rc = read(mtd_fd, tmp_ptr, mtd_info.size);
	if (rc == -1) {
		perror("read");
		munmap(tmp_ptr, mtd_info.size);
		close(tmp_fd);
		close(mtd_fd);
		return 1;
	}
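	/* rc is non-negative here; cast to avoid a signed/unsigned comparison */
	if ((unsigned int)rc != mtd_info.size)
		fprintf(stderr, "Short read %d out of %u\n", rc, mtd_info.size);

	/* Cast so the check also works where plain char is signed */
	if ((unsigned char)*tmp_ptr == 0xff)
		fprintf(stderr, "The first byte of the flash read was 0xff, this can't be good\n");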

	printf("You can now inspect what was read from %s in /tmp/mtd_test.out\n", argv[1]);
	munmap(tmp_ptr, mtd_info.size);
	close(tmp_fd);
	close(mtd_fd);

	return 0;
}
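
The program above issues a single read() for the whole device and only reports a short read. A minimal sketch of a retrying read is shown below, assuming short reads and EINTR should be retried rather than reported. `read_full` is a hypothetical helper name, not something from the original test code or from mtd-utils.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the original test program): keep calling
 * read() until `len` bytes have arrived, EOF is hit, or an error occurs.
 * Returns the number of bytes read, or -1 on error (errno is set by read()).
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		ssize_t n = read(fd, p + done, len - done);

		if (n == 0)			/* EOF: return what we have so far */
			break;
		if (n == -1) {
			if (errno == EINTR)	/* interrupted by a signal, retry */
				continue;
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

In the test program this would stand in for the direct read() call, for example `read_full(mtd_fd, tmp_ptr, mtd_info.size)`. It would not hide the failure described in this issue, since a read that succeeds but returns 0xff bytes still counts as a full read.
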
@shenki
Member

shenki commented Nov 15, 2016

I tested on a Palmetto with an mx66l51235l (65536 Kbytes) flash chip and could not reproduce. The flash contents looked good to me.

4.7.10-0edfa645374197350f6e64e32ef142bb4ab41b31

@cyrilbur-ibm
Author

Apologies for the delay!

I couldn't find that sha in my tree.

I've done testing against: 9e654d6

The reliability of /dev/mtd7 has greatly improved, but I have still seen it return all 0xff in my buffer while the read() call succeeds. This was detected by the mtd test program above. Unfortunately it is now quite rare and I haven't found a good way to reproduce it.

I have tried:

  • Concurrent accesses to /dev/mtd7 (two mtd tests running - modified to not use the same output file)
  • Stressing the CPU from userspace while running mtd test
  • Booting the host (with my box changes)
  • reboot
  • reboot -f
  • running pflash (mtd access only) intermittently

Perhaps my flash chip is unreliable?
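
The check in the test program above looks only at the first byte of the buffer. A sketch of a stricter check that scans the whole buffer is shown below. `buf_is_all_ff` is a hypothetical name, and treating an empty buffer as "not all 0xff" is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true only if `len` is non-zero and every
 * byte in the buffer is 0xff (what erased or unread flash looks like).
 */
static bool buf_is_all_ff(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	assert(buf != NULL);

	if (len == 0)
		return false;

	for (i = 0; i < len; i++) {
		if (p[i] != 0xff)
			return false;	/* found real data */
	}
	return true;
}
```

Called as `buf_is_all_ff(tmp_ptr, mtd_info.size)` after the read, this would distinguish a fully blank result from an image that merely starts with an 0xff byte.
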

@legoater

Could you run "md5sum /dev/mtd7" in a loop?
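
As a sketch of the same idea in C: hash the device several times and count how many passes differ from the first. This uses 32-bit FNV-1a purely as a simple stand-in for md5sum, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a step; a stand-in for md5sum, not a cryptographic hash */
static uint32_t fnv1a32_update(uint32_t h, const unsigned char *p, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/* Hash the whole file at `path`; returns 0 on success, -1 on I/O error. */
static int hash_file(const char *path, uint32_t *out)
{
	unsigned char buf[4096];
	uint32_t h = 2166136261u;	/* FNV offset basis */
	size_t n;
	FILE *f;

	assert(path != NULL && out != NULL);

	f = fopen(path, "rb");
	if (!f)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		h = fnv1a32_update(h, buf, n);
	if (ferror(f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	*out = h;
	return 0;
}

/*
 * Hash `path` `passes` times. Returns how many passes produced a hash that
 * differs from the first pass, or -1 on I/O error.
 */
static int count_unstable_reads(const char *path, int passes)
{
	uint32_t first = 0, cur;
	int i, bad = 0;

	for (i = 0; i < passes; i++) {
		if (hash_file(path, &cur) != 0)
			return -1;
		if (i == 0)
			first = cur;
		else if (cur != first)
			bad++;
	}
	return bad;
}
```

A call such as `count_unstable_reads("/dev/mtd7", 100)` returning anything other than 0 would point at an unstable read. Note that the first pass is taken as the reference, so a bad first read would make every later good pass count as a mismatch.
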

@legoater

I have run some mtd-utils torture tools on the evb and the palmetto and could not reproduce.
Here are the binaries for OpenBMC if you want to give them a try: http://kaod.org/openbmc/mtd-utils/

Unfortunately, I cannot boot the palmetto host at the moment, but that would be a good test.

@legoater

Is this still an issue?

@shenki
Member

shenki commented Feb 13, 2017

Let's close.

@shenki shenki closed this as completed Feb 13, 2017
dkodihal pushed a commit to NVIDIA/linux that referenced this issue May 7, 2024
When enabling the CRTC after waking up from a power-saving mode, the
primary plane's framebuffer might be NULL, which leads to a stack trace
as shown below.

  [  632.624608] BUG: kernel NULL pointer dereference, address: 0000000000000048
  [  632.624631] #PF: supervisor read access in kernel mode
  [  632.624639] #PF: error_code(0x0000) - not-present page
  [  632.624647] PGD 0 P4D 0
  [  632.624654] Oops: 0000 [#1] SMP PTI
  [  632.624662] CPU: 0 PID: 2082 Comm: gnome-shell Tainted: G            E     5.4.0-rc7-1-default+ #114
  [  632.624673] Hardware name: Sun Microsystems SUN FIRE X2270 M2/SUN FIRE X2270 M2, BIOS 2.05    07/01/2010
  [  632.624689] RIP: 0010:ast_crtc_helper_atomic_enable+0x7d/0x680 [ast]
  [  632.624698] Code: 48 8b 80 e0 02 00 00 4c 8b 60 10 31 c0 f3 48 ab 48 8b 83 78 04 00 00 4c 89 ef 48 8d 70 18 e8 9a e9 55 ce 48 8b 83 78 04 00 00 <49> 8b 7c 24 48 4c 89 ea 4c 8d 44 24 28 48 8d 4c 24 20 48 8d 70 18
  [  632.624718] RSP: 0018:ffffbe9ec123fa40 EFLAGS: 00010246
  [  632.624726] RAX: ffff95a13cfd3400 RBX: ffff95a13cf32000 RCX: 0000000000000000
  [  632.624735] RDX: 0000000000000000 RSI: ffff95a13cfd34e8 RDI: ffffbe9ec123fb40
  [  632.624744] RBP: ffffbe9ec123fb80 R08: 0000000000000000 R09: 0000000000000003
  [  632.624753] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  [  632.624762] R13: ffffbe9ec123fa70 R14: ffff95a13beb7000 R15: ffff95a13cf32800
  [  632.624772] FS:  00007f6d2763e140(0000) GS:ffff95a134000000(0000) knlGS:0000000000000000
  [  632.624782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  632.624790] CR2: 0000000000000048 CR3: 00000001192f8004 CR4: 00000000000206f0
  [  632.624800] Call Trace:
  [  632.624811]  ? __lock_acquire+0x409/0x7c0
  [  632.624830]  drm_atomic_helper_commit_modeset_enables+0x1af/0x200
  [  632.624840]  drm_atomic_helper_commit_tail+0x32/0x70
  [  632.624849]  commit_tail+0xc7/0x110
  [  632.624857]  drm_atomic_helper_commit+0x121/0x130
  [  632.624867]  drm_atomic_connector_commit_dpms+0xd7/0x100
  [  632.624878]  set_property_atomic+0xaf/0x110
  [  632.624890]  drm_mode_obj_set_property_ioctl+0xbb/0x190
  [  632.624899]  ? drm_mode_obj_find_prop_id+0x40/0x40
  [  632.624909]  drm_ioctl_kernel+0x86/0xd0
  [  632.624918]  drm_ioctl+0x1e4/0x36b
  [  632.624925]  ? drm_mode_obj_find_prop_id+0x40/0x40
  [  632.624939]  do_vfs_ioctl+0x4bd/0x6e0
  [  632.624949]  ksys_ioctl+0x5e/0x90
  [  632.624957]  __x64_sys_ioctl+0x16/0x20
  [  632.624966]  do_syscall_64+0x5a/0x220
  [  632.624976]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [  632.624984] RIP: 0033:0x7f6d2b0de387
  [  632.624991] Code: 00 00 90 48 8b 05 f9 9a 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 9a 0c 00 f7 d8 64 89 01 48
  [  632.625011] RSP: 002b:00007fffb49def38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [  632.625021] RAX: ffffffffffffffda RBX: 00007fffb49def70 RCX: 00007f6d2b0de387
  [  632.625030] RDX: 00007fffb49def70 RSI: 00000000c01864ba RDI: 0000000000000009
  [  632.625040] RBP: 00000000c01864ba R08: 0000000000000000 R09: 00000000c0c0c0c0
  [  632.625049] R10: 0000000000000030 R11: 0000000000000246 R12: 000055bc367eb920
  [  632.625058] R13: 0000000000000009 R14: 0000000000000002 R15: 0000000000000000
  [  632.625071] Modules linked in: ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) scsi_transport_iscsi(E) dmi_sysfs(E) msr(E) xfs(E) intel_powerclamp(E) coretemp(E) k)
  [  632.625185] CR2: 0000000000000048

The STR is

	* start gdm and wait for it to switch off the display
	* wake up the display by pressing a key

CRTC modesetting depends on the new state of the CRTC and the primary
plane's framebuffer. The bugfix moves the modesetting code into the
CRTC's atomic_flush() function, where it is protected from the plane's
framebuffer being NULL.

The CRTC's atomic-enable function, which is the modesetting's original
location, still contains DPMS state handling. It's exactly the inverse
of the atomic-disable function.

v3:
	* protect modesetting from fb == NULL
v2:
	* do an atomic check for plane
	* reject invisible primary planes

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Fixes: b48e1b6 ("drm/ast: Add CRTC helpers for atomic modesetting")
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Y.C. Chen" <yc_chen@aspeedtech.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202111557.15176-2-tzimmermann@suse.de