kernel Oops / General Protection Fault / NULL pointer dereference on write operations on samba smb:// mounts #2677

Open
2 tasks done
ermo opened this issue May 22, 2024 · 12 comments
Assignees
Labels
Bug Something isn't working Upstream This is an upstream issue.

Comments

@ermo
Contributor

ermo commented May 22, 2024

Please confirm there isn't an existing open bug report

  • I have searched open bugs for this issue

Summary

Certain write operations to a mounted smb:// share in the Dolphin file manager appear to cause kernel BUG/Oops/GPF events.

Steps to reproduce

Server setup

/etc/samba/smb.conf repro_share configuration on server:
    [repro_share]
            comment = solus kernel oops repro_share
            path = /srv/repro_share
            public = no
            writeable = yes
            write list = your_user
            printable = no
            force group = users
            create mask = 0644
            directory mask = 2775
creation of repro_share on server filesystem
    sudo mkdir -pv /srv/repro_share
    sudo chown -c your_user:users /srv/repro_share
    # setting the setgid bit on repro_share keeps the rws permissions and group
    # ownership on newly created directories inside the share
    sudo chmod -c g+rws /srv/repro_share
    ls -ld /srv/repro_share
ensure samba picks up the new share definition
    sudo systemctl restart smb
    sudo smbstatus
    sudo smbclient -U your_user -L //localhost
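
The setgid note above can be checked mechanically before testing from the client. A minimal sketch, assuming a POSIX shell; the helper name `has_setgid` is hypothetical and only inspects local permissions, it does not talk to Samba:

```shell
#!/bin/sh
# has_setgid DIR: succeed if DIR is a directory with the setgid bit set.
# Hypothetical helper for illustration only.
has_setgid() {
    [ -d "$1" ] && [ -g "$1" ]
}

# Example (path taken from the share definition above):
# has_setgid /srv/repro_share && echo "setgid is set" || echo "setgid is missing"
```

If the check fails, new directories created inside the share will not inherit the `users` group.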

Client Setup

mount point setup
    sudo mkdir -pv /mnt/repro_share
/etc/fstab line on client
    # please double check uid and gid for your user in, say, /etc/passwd or with ls -lan ${HOME}
    //server/repro_share /mnt/repro_share cifs noauto,_netdev,username=your_user,uid=1000,gid=100,file_mode=00644,dir_mode=02755,vers=3.1.1,sec=ntlmssp  0   0
Mount the share
    sudo systemctl daemon-reload # systemctl will need to refresh its internal /etc/fstab representation
    sudo mount /mnt/repro_share
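
Before running the repro steps it may help to confirm that the share really is mounted as cifs. A minimal sketch, assuming a Linux client where `/proc/mounts` is available; `is_cifs_mount` is a hypothetical helper name, and mount points containing spaces (which `/proc/mounts` escapes) are not handled:

```shell
#!/bin/sh
# is_cifs_mount MOUNTPOINT [MOUNTS_FILE]
# Succeed if MOUNTPOINT is listed in MOUNTS_FILE (default: /proc/mounts)
# with filesystem type "cifs". Fields per line are:
#   device mountpoint fstype options dump pass
is_cifs_mount() {
    awk -v mp="$1" '$2 == mp && $3 == "cifs" { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example:
# is_cifs_mount /mnt/repro_share && echo "mounted" || echo "not mounted as cifs"
```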

Repro steps after doing the above

  1. Install dbginfo for better debugging output:
    • sudo eopkg it {linux-current,glibc,kate,kio,kio-fuse,kio-extras,kio-admin}-dbginfo
  2. Open dolphin and navigate to the mounted /mnt/repro_share
  3. Right click in empty dir, select "Create New Text File" and save it as "test.txt"
  4. Double click on "test.txt" (should open in an editor -- this makes it hang for me with the above steps)
  5. If it doesn't hang, write the word "foo" and save the file (on my end, this has exhibited hangs)
  6. If the above didn't hang, close kwrite and delete the file to the Recycle Bin (on my end with an existing file, this has exhibited hangs)
  7. If you see a hang, open a terminal and type in sudo dmesg and check for a kernel oops
  8. Any attempt to create a new file on the SMB share afterwards fails and makes Dolphin or the App in which the file is created hang in my tests
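
For step 7, the relevant lines can be picked out of a long dmesg with a small filter. This is a sketch, not an exhaustive matcher: the `general protection fault` and `RIP: 0010:` patterns match the log posted later in this thread, while `BUG:` and `Oops` are assumptions based on typical kernel messages. `kernel_faults` is a hypothetical helper name.

```shell
#!/bin/sh
# kernel_faults: read kernel log text on stdin and print only the lines that
# announce a fault or name the faulting kernel instruction.
# "RIP: 0010:" is the kernel code segment on x86-64; user-space RIP lines
# (e.g. "RIP: 0033:") are deliberately not matched.
kernel_faults() {
    grep -E 'general protection fault|BUG:|Oops|RIP: 0010:'
}

# Example:
# sudo dmesg | kernel_faults
```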

kernel oops output from dmesg:

Expected result

Mount a read/write smb share and create/save/delete files as normal with no observed hangs or Kernel BUG/Oops/GPF events.

Actual result

After mounting a read/write smb share and creating/saving/deleting files, Dolphin hangs and the system experiences reproducible Kernel BUG/Oops/GPF events.

Environment

  • Is system up to date?

Repo

Unstable

Desktop Environment

Plasma

System details

Operating System: Solus 4.5
KDE Plasma Version: 6.0.4
KDE Frameworks Version: 6.2.0
Qt Version: 6.7.1
Kernel Version: 6.8.10-291.current (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 2700X Eight-Core Processor
Memory: 31,3 GiB of RAM
Graphics Processor: AMD Radeon RX 580 Series

Other comments

No response

@ermo ermo added the Bug Something isn't working label May 22, 2024
@ermo ermo changed the title kioworker appears to be causing a kernel panic on write operations on smb:// mounts? kioworker appears to be causing a kernel BUG/Oops on write operations on smb:// mounts? May 22, 2024
@TraceyC77
Contributor

I'm not able to follow the reproduction steps. By default, mount.cifs is not setuid, so I need to use sudo to mount the samba share. Thus, my user is not able to create a file in the mounted directory.

Do you have anything set up that allows your normal user to use the mount command as written?
Or is there something else that might be missing from the steps?

Also wanted to note that after mounting a samba share, I didn't see any errors in dmesg.

@ermo
Contributor Author

ermo commented May 23, 2024

> I'm not able to follow the reproduction steps. By default, mount.cifs is not setuid, so I need to use sudo to mount the samba share. Thus, my user is not able to create a file in the mounted directory.
>
> Do you have anything set up that allows your normal user to use the mount command as written? Or is there something else that might be missing from the steps?

I added a sudo prefix to the command.

> Also wanted to note that after mounting a samba share, I didn't see any errors in dmesg.

The errors only show up after I attempt to write/delete on the mounted share; I have never seen things behave like this before, despite having used the exact same type of setup for literally as long as I've been maintaining Samba, because mounting shares like this gives higher throughput.

@TraceyC77
Contributor

Thanks for the updated reproduction steps. I also got errors in dmesg after trying to edit the text file I created, but not an oops. I got a general protection fault. Attaching.

samba_kernel_gen_proteciton_fault.txt

@ermo ermo changed the title kioworker appears to be causing a kernel BUG/Oops on write operations on smb:// mounts? kioworker appears to be causing a kernel BUG/Oops/GPF on write operations on smb:// mounts? May 23, 2024
@TraceyC77
Contributor

Today I can't even browse samba mounts with Dolphin. I see the error
"The process for the smb://htpc protocol died unexpectedly."

I see more general protection faults in the logs

@ReillyBrogan
Contributor

Is this an issue with the LTS kernel as well?

@TraceyC77
Contributor

I can't test on the XPS with the LTS kernel because Plasma won't start with that kernel
(I've tried removing all the optional kernel boot parameters one by one but no joy)
However, this may be related to something in the Plasma settings.

I'm able to browse samba shares in Dolphin on my Flex, running the same kernel and version of Plasma and version of Dolphin (both systems are on unstable)

Kernel 6.8.11-292.current
Dolphin 24.02.2

Also, I remembered this upstream bug, which was opened in March. I'm adding debug info to it. I'll keep this issue open so that other Solus users who have the problem but don't search the KDE bug tracker can hopefully find it.

@TraceyC77 TraceyC77 added the Upstream This is an upstream issue. label May 30, 2024
@Jezurko

Jezurko commented Jun 8, 2024

Hey, I've been having a similar issue to the one described by @TraceyC77, although for me it's a page fault caused by a null pointer dereference.
The thing is, I'm not using KDE software, so it's not triggered by kioworker.
I am using caja as a file browser and gedit to edit a text file. I am able to perform some small writes, but if I try to add something bigger to the file (I tried a few hundred characters), it triggers the bug.
I have also encountered this when using restic (which is how I discovered it). It was basically unable to write any files and just hung indefinitely.

I have been using the same samba share for a long time and it suddenly stopped working a few weeks ago.

I'm not sure what the correct place to report this bug is, as I don't think it's KDE related. I can provide the systemd/dmesg log documenting the kernel error, if relevant.
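
Since the trigger does not seem to depend on the desktop, the size-dependent writes described above can be tried without any file manager or editor. A rough sketch, assuming GNU coreutils `timeout` and `head -c` are installed; `write_sizes` and the `repro_SIZE.bin` file names are made up for illustration. A task stuck inside the kernel may not respond to `timeout`, so a hang is still possible.

```shell
#!/bin/sh
# write_sizes DIR SIZE...
# For each SIZE (in bytes), write that many zero bytes to DIR/repro_SIZE.bin
# and report whether the write finished within 10 seconds.
write_sizes() {
    dir=$1
    shift
    for size in "$@"; do
        if timeout 10 sh -c 'head -c "$1" /dev/zero > "$2"' sh "$size" "$dir/repro_$size.bin"; then
            echo "ok $size"
        else
            echo "FAILED $size (exit $?)"
        fi
    done
}

# Example (replace the path with the cifs mount point):
# write_sizes /path/to/cifs/mount 1 16 512 4096
# sudo dmesg | tail -n 50
```

Comparing which sizes succeed on an affected and an unaffected machine would show whether the write size matters at all.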

@ermo
Contributor Author

ermo commented Jun 11, 2024

> I'm not sure what the correct place to report this bug is, as I don't think it's KDE related. I can provide the systemd/dmesg log documenting the kernel error, if relevant.

Please do!

And thanks for the report, +1 informative!

@TraceyC77
Contributor

Agreed, please do share the system logs (as text pasted in a message please so it can be found with searches). That will help us pinpoint the root cause.

@TraceyC77 TraceyC77 changed the title kioworker appears to be causing a kernel BUG/Oops/GPF on write operations on smb:// mounts? kernel Oops / general proteciton fault / NULL pointer dereference on write operations on samba smb:// mounts Jun 11, 2024
@ermo ermo changed the title kernel Oops / general proteciton fault / NULL pointer dereference on write operations on samba smb:// mounts kernel Oops / general protection fault / NULL pointer dereference on write operations on samba smb:// mounts Jun 11, 2024
@ermo ermo changed the title kernel Oops / general protection fault / NULL pointer dereference on write operations on samba smb:// mounts kernel Oops / General Protection Fault / NULL pointer dereference on write operations on samba smb:// mounts Jun 11, 2024
@Jezurko

Jezurko commented Jun 11, 2024

Okay, maybe I was a bit hasty in claiming it's a null pointer, but it is still an invalid address.
An interesting observation is that this time I tried to write a single character into the file and it already triggered the bug.

I have two machines running Solus and both of them trigger this bug.
The dmesg log:

[   49.479752] general protection fault, probably for non-canonical address 0xe72aca85113b5427: 0000 [#1] PREEMPT SMP NOPTI
[   49.479762] CPU: 8 PID: 2652 Comm: pool-gedit Not tainted 6.8.11-292.current #1
[   49.479767] Hardware name: LENOVO 21AES02800/21AES02800, BIOS R1QET45W (1.31 ) 09/13/2023
[   49.479770] RIP: 0010:__fscache_use_cookie+0x23/0x380 [netfs]
[   49.479787] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 48 65 48 8b 04 25 28 00 00 00 48 89 44 24 40 31 c0 <48> 8b af 88 00 00 00 83 e5 01 0f 85 2f 02 00 00 4c 8d 67 14 48 89
[   49.479790] RSP: 0018:ffffc90002e7bc80 EFLAGS: 00010246
[   49.479794] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   49.479797] RDX: 0000000000000001 RSI: 0000000000000001 RDI: e72aca85113b539f
[   49.479799] RBP: ffff8881b4ff0000 R08: 0000000000000001 R09: ffffea0006bf5680
[   49.479801] R10: 0000000000000001 R11: ffffea0006bf5680 R12: e72aca85113b539f
[   49.479803] R13: ffff8881b4ff0088 R14: ffff88818a0f4c00 R15: 000000000000000c
[   49.479806] FS:  00007f0aaa0006c0(0000) GS:ffff8883dee00000(0000) knlGS:0000000000000000
[   49.479809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   49.479811] CR2: 00007f0aa000d028 CR3: 000000018b0be000 CR4: 0000000000f50ef0
[   49.479814] PKRU: 55555554
[   49.479816] Call Trace:
[   49.479820]  <TASK>
[   49.479823]  ? die_addr+0x32/0x80
[   49.479831]  ? exc_general_protection+0x24d/0x480
[   49.479840]  ? asm_exc_general_protection+0x22/0x30
[   49.479848]  ? __fscache_use_cookie+0x23/0x380 [netfs]
[   49.479863]  netfs_dirty_folio+0x84/0x90 [netfs]
[   49.479877]  cifs_write_end+0x203/0x480 [cifs]
[   49.479920]  generic_perform_write+0x120/0x230
[   49.479929]  cifs_strict_writev+0x25a/0x310 [cifs]
[   49.479966]  vfs_write+0x2a6/0x480
[   49.479975]  ksys_write+0x6b/0xf0
[   49.479980]  do_syscall_64+0x5f/0x180
[   49.479985]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   49.479989] RIP: 0033:0x7f0ab693477f
[   49.480025] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 09 08 f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c 08 f9 ff 48
[   49.480027] RSP: 002b:00007f0aa9fffb50 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[   49.480031] RAX: ffffffffffffffda RBX: 00007f0aa9fffbd0 RCX: 00007f0ab693477f
[   49.480033] RDX: 000000000000000c RSI: 000055cf174ca760 RDI: 000000000000000e
[   49.480036] RBP: 00007f0aa9fffbc0 R08: 0000000000000000 R09: 000000000000000c
[   49.480038] R10: 00007f0a880008e0 R11: 0000000000000293 R12: 0000000000000000
[   49.480040] R13: 000000000000000c R14: 000055cf174ca760 R15: 00007f0a88009ac0
[   49.480046]  </TASK>
[   49.480048] Modules linked in: nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace sunrpc netfs rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep intel_rapl_msr tps6598x intel_rapl_common edac_mce_amd kvm_amd qrtr snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir kvm snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_ctl_led irqbypass iwlmvm snd_sof snd_hda_codec_realtek snd_sof_utils snd_hda_codec_generic snd_soc_core rapl mac80211 snd_compress snd_hda_codec_hdmi ac97_bus btusb snd_pcm_dmaengine btrtl uvcvideo snd_pci_ps btintel snd_rpl_pci_acp6x uvc snd_hda_intel ptp videobuf2_vmalloc btbcm think_lmi snd_acp_pci videobuf2_memops snd_intel_dspcfg pps_core btmtk libarc4 firmware_attributes_class wmi_bmof videobuf2_v4l2 snd_acp_legacy_common snd_intel_sdw_acpi k10temp snd_pci_acp6x snd_pci_acp5x iwlwifi videodev snd_hda_codec bluetooth snd_rn_pci_acp3x r8169 snd_acp_config snd_hda_core
[   49.480142]  videobuf2_common snd_soc_acpi ecdh_generic realtek mdio_devres mc i2c_piix4 ecc snd_pci_acp3x snd_hwdep snd_pcm thinkpad_acpi ledtrig_audio cfg80211 libphy platform_profile snd_timer snd soundcore rfkill hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer serial_multi_instantiate kfifo_buf i2c_scmi industrialio i2c_designware_platform hid_sensor_iio_common i2c_designware_core joydev evdev sch_fq_codel msr i2c_dev fuse configfs dm_crypt trusted asn1_encoder tee hid_logitech_hidpp hid_logitech_dj wacom usbhid rtsx_pci_sdmmc amdgpu mmc_core hid_sensor_hub amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm crct10dif_pclmul agpgart crc32_pclmul crc32c_intel i2c_algo_bit xhci_pci polyval_clmulni drm_suballoc_helper xhci_pci_renesas drm_display_helper polyval_generic ucsi_acpi ghash_clmulni_intel nvme cec sha512_ssse3 xhci_hcd typec_ucsi i2c_hid_acpi sha256_ssse3 drm_kms_helper i2c_hid nvme_core roles sha1_ssse3 psmouse amd_sfh sp5100_tco usbcore ccp video rtsx_pci t10_pi typec usb_common
[   49.480241]  wmi drm serio_raw aesni_intel crypto_simd
[   49.480250] ---[ end trace 0000000000000000 ]---
[   49.608072] RIP: 0010:__fscache_use_cookie+0x23/0x380 [netfs]
[   49.608109] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 48 65 48 8b 04 25 28 00 00 00 48 89 44 24 40 31 c0 <48> 8b af 88 00 00 00 83 e5 01 0f 85 2f 02 00 00 4c 8d 67 14 48 89
[   49.608113] RSP: 0018:ffffc90002e7bc80 EFLAGS: 00010246
[   49.608119] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   49.608121] RDX: 0000000000000001 RSI: 0000000000000001 RDI: e72aca85113b539f
[   49.608124] RBP: ffff8881b4ff0000 R08: 0000000000000001 R09: ffffea0006bf5680
[   49.608127] R10: 0000000000000001 R11: ffffea0006bf5680 R12: e72aca85113b539f
[   49.608129] R13: ffff8881b4ff0088 R14: ffff88818a0f4c00 R15: 000000000000000c
[   49.608132] FS:  00007f0aaa0006c0(0000) GS:ffff8883dee00000(0000) knlGS:0000000000000000
[   49.608135] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   49.608138] CR2: 00007f0aa000d028 CR3: 000000018b0be000 CR4: 0000000000f50ef0
[   49.608140] PKRU: 55555554
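
Reading the log above: the faulting address 0xe72aca85113b5427 is exactly 0x88 above the value in RDI (e72aca85113b539f), and the marked instruction bytes `48 8b af 88 00 00 00` appear to decode to `mov 0x88(%rdi),%rbp`. That would suggest `__fscache_use_cookie` was handed a garbage pointer rather than NULL. Below is a minimal sketch of the check behind the "non-canonical address" wording, assuming 48-bit virtual addresses (4-level paging); `is_canonical` is a hypothetical helper, and it works on the hex string because the value does not fit in signed shell arithmetic.

```shell
#!/bin/sh
# is_canonical ADDR: succeed if the hex address ADDR (with or without "0x",
# at most 16 hex digits) is canonical under 48-bit virtual addressing,
# i.e. bits 63..47 are either all zeros or all ones.
is_canonical() {
    addr=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    # Left-pad to 16 hex digits so the top bits are always in the same place.
    while [ "${#addr}" -lt 16 ]; do addr="0$addr"; done
    case "$addr" in
        0000[0-7]*|ffff[89a-f]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# is_canonical 0xe72aca85113b5427 || echo "non-canonical"
# is_canonical ffff8881b4ff0000 && echo "canonical kernel address"
```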

@Jezurko

Jezurko commented Jun 11, 2024

Huh, looking at the systemd log, I also noticed some gvfsd issues that I didn't see earlier because they were far away from the GPF.

čen 11 18:50:59 paddle gvfsd[2202]: smbXcli_negprot_smb1_done: No compatible protocol selected by server.
čen 11 18:50:59 paddle gvfsd[2202]: smbXcli_negprot_smb1_done: No compatible protocol selected by server.
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: GFileInfo created without standard::content-type
čen 11 18:50:59 paddle gvfsd-network[2188]: file ../gio/gfileinfo.c: line 1821 (g_file_info_get_content_type): should not be reached
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_new_intern: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_ref_string_release: assertion 'str != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_file_info_set_content_type: assertion 'content_type != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_file_info_set_content_type: assertion 'content_type != NULL' failed
čen 11 18:50:59 paddle gvfsd-network[2188]: g_file_info_set_content_type: assertion 'content_type != NULL' failed

I tried looking into it and discovered this Linux Mint forum post mentioning an Ubuntu Launchpad report.

But I don't understand this, as I believe cifs should not interact with gvfs, right? For the record, my mount is created using a systemd mount service.

@TraceyC77
Contributor

I don't think gvfs is related to this particular failure. The upstream gvfs bug was fixed (apparently Ubuntu still has to implement it in their distro)

Also, while I have those same gvfsd errors in my logs, they do not appear when I trigger the bug (by attempting to browse a Samba share in Dolphin).

Projects
Status: Ready

4 participants