Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[freebsd] add support for the FreeBSD >= 13 namecache #180

Closed
wants to merge 1 commit into from

Conversation

DamjanJovanovic
Copy link
Contributor

The nc_hash field changed from a doubly linked list
to a singly linked list in commit 2b86f9d6d013a9ad8b1f8d03286018e785d5b3f6.

The nc_hash field changed from a doubly linked list
to a singly linked list in commit 2b86f9d6d013a9ad8b1f8d03286018e785d5b3f6.
@lrosenman
Copy link
Contributor

@emaste can you look at and validate this change, please?

@@ -545,7 +545,11 @@ struct namecache {
# else
LIST_ENTRY(namecache) nc_src; /* source vnode list */
TAILQ_ENTRY(namecache) nc_dst; /* destination vnode list */
# if __FreeBSD_version >= 1300000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1300105 is the __FreeBSD_version bump after 2b86f9d6d013a9ad8b1f8d03286018e785d5b3f6.

@DamjanJovanovic
Copy link
Contributor Author

Thank you. I am now thinking this is the wrong approach, and we should drop use of the namecache entirely, especially on newer FreeBSD versions. Even with this patch, msdosfs gives incomplete "/mountpoint/ -- /filename" paths, and zfs still doesn't work. cd9660 was broken since FreeBSD 6. And it keep going through ABI breaking changes. Using the namecache is too fragile, and simply unmaintainable.

On the other hand, a 16 line patch to gather_proc_info() in dialects/freesbd/dproc.c, to use libutil's kinfo_getfile() and provide paths from there, gets both msdosfs and zfs working perfectly, without the need for any filesystem-specific code, and would even support future filesystems without lsof changes. I am now trying to see whether that can be taken further, and eliminate the need for most or even all kernel memory access, while vastly simplifying the code and keeping it working with little future maintenance.

@lrosenman
Copy link
Contributor

@DamjanJovanovic sounds good. Let's see how your investigation goes.

@emaste
Copy link
Contributor

emaste commented Dec 6, 2021

use libutil's kinfo_getfile()

Indeed this is the best approach and (something like this) has been discussed in the past as the path forward for lsof. I hadn't looked into details before though, it's great that the patch seems like it will be straightforward.

@DamjanJovanovic
Copy link
Contributor Author

In my freebsd-nokvm branch, so far, lsof can:

  • stop using kvm and namecache completely on FreeBSD >= 7, while fully preserving compatibility with FreeBSD < 7.
  • run as user, not root, but warn if security.bsd.see_other_uids==0 and the like.
  • show processes and open files, with the COMMAND, PID, USER, FD, and NAME columns populated so far.

The flow of control through lsof is pretty simple. Execution starts in main.c, calling initialize() then gather_proc_info() in the dialect, in our case dialects/freebsd/dproc.c. The latter function populates all processes and their open files into in-memory data structures. Then it prints them. If there is no path set on a file by the time it tries to print it, then it uses the namecache to look it up.

How were my changes made? On FreeBSD >= 7:

  • instead of kvm initialization in initialize(), I just do the security.bsd.* warnings.
  • in gather_proc_info(), instead of using kvm_getprocs(), use sysctl with CTL_KERN+KERN_PROC+KERN_PROC_ALL/PROC (depending on whether we want threads). Both kvm_getprocs() and the sysctl return struct kinfo_proc, so we can reuse a lot of the code in that function.
  • The horrible file descriptor code was moved from gather_proc_info() into separate functions, and a new version was written using kinfo_getfile().
  • The path is always populated before gather_proc_info() returns, so the namecache isn't used.

There is now thousands of lines of vnode-specific code to convert from using kernel data to using kinfo_file and the like. So far I've managed to eliminate kvm and namecache access from the core of lsof, but kvm usage may be much harder to eliminate completely. It is used for reporting on all kinds of things, eg. file locks, which seems unavailable through other APIs. So it may have to continue existing and be used in certain cases, possibly with degraded functionality when it's unavailable.

@emaste
Copy link
Contributor

emaste commented Dec 7, 2021

stop using kvm and namecache completely on FreeBSD >= 7, while fully preserving compatibility with FreeBSD < 7.

Excellent. FreeBSD 6.x was EOL as of November 30, 2010, so we could remove the compatibility code in the near future as well.

It is used for reporting on all kinds of things, eg. file locks, which seems unavailable through other APIs.

If we can make a list of the information that lsof typically reports that is not available from a proper API these can be added to the kernel.

@DamjanJovanovic
Copy link
Contributor Author

If we can make a list of the information that lsof typically reports that is not available from a proper API these can be added to the kernel.

For starters:

  • The CTL_KERN + KERN_VNODE sysctl needs to work. Without it we cannot access the inode ("NODE" column) nor filter by inode or vnode id, nor determine an open file's filesystem efficiently or sometimes at all (if kinfo_file.kf_path is empty, then statfs() can't work, whereas we can cache results from previous statfs() and match by xvnode.xv_mount). There could be further limitations. This sysctl was turned off in 2003 in commit acb18acfec97aa7fe26ff48f80a5c3f89c9b542d.
  • Some way to access link count and possibly other properties in sys/vnode.h "struct vattr" will be necessary. Currently lsof obtains that by digging through filesystem-specific data structures with custom code for every supported filesystem.
  • For unsupported files lsof wants to print the value of struct file's f_ops (a kernel address). I can get struct xfile through the CTL_KERN + KERN_FILE sysctl, but it doesn't have f_ops. This might not matter much.

@DamjanJovanovic
Copy link
Contributor Author

So I've gotten quite far now. All vnodes, pipes, fifos, unix sockets, even some new things like pts, work. Can search by filename etc. Still need to do kqueue, TCP/UDP sockets, pipe's "buffer_size" / "in" / "out" fields (a kernel patch is easy), byte range locks, KF_FD_TYPE_CTTY, and a few other details. Then I'll clean up and prepare a nice patch set with incremental changes that can be reviewed and merged.

I'll be dropping support for legacy FreeBSD versions after all: they're already very broken anyway, eg. procfs only works on FreeBSD < 5, dev/rdev values seem wrong on FreeBSD 13 and change endlessly between versions, namecache doesn't fully work even with the patch here. Also kvm will continue to be used, for cases where there's no other way, but it will be optional and lsof will do what it can without it.

@lrosenman
Copy link
Contributor

Sounds very cool.

@DamjanJovanovic
Copy link
Contributor Author

I've now finished and submitted those changes as a new pull request:
#184

Closing this one. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants