
Store inode when reading /proc/*/smaps in _pslinux.py/get_memory_maps() #444

Open
giampaolo opened this issue May 23, 2014 · 7 comments

Comments

@giampaolo
Owner

From remi.cha...@gmail.com on November 20, 2013 12:12:24

At the moment, the function get_memory_maps() in the Linux implementation 
(_pslinux.py) correctly parses the content of /proc/<pid>/smaps and stores some 
of the memory-map data in a container.

Unfortunately, it does not store the inode, although it is correctly parsed 
from first_line.

This is specifically necessary with SYSV00000000 maps, because their path names 
are not unique.

It would be great to store the inode as a new item in the returned container. 
It would not require changes to existing applications.
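Here is a minimal sketch of the proposal. It assumes the header-line layout that /proc/<pid>/smaps uses: address range, perms, offset, dev, inode, path. SmapsHeader and parse_smaps_header are hypothetical names for illustration, not psutil API. The only point is that the inode column is kept instead of discarded.

```python
from typing import NamedTuple, Optional


class SmapsHeader(NamedTuple):
    """Fields of one mapping header line in /proc/<pid>/smaps (hypothetical type)."""
    addr: str
    perms: str
    offset: str
    dev: str
    inode: int
    path: str


def parse_smaps_header(line: str) -> Optional[SmapsHeader]:
    """Parse a header line, keeping the inode; return None for "Key: value" lines."""
    # Split at most 5 times so a path containing spaces, such as
    # "/SYSV00000000 (deleted)", stays in one piece.
    fields = line.split(None, 5)
    if len(fields) < 5 or fields[0].endswith(":"):
        return None  # e.g. "Rss: 4 kB" or "VmFlags: rd wr ..."
    if len(fields) == 5:
        fields.append("")  # anonymous mapping: the path column is empty
    addr, perms, offset, dev, inode, path = fields
    return SmapsHeader(addr, perms, offset, dev, int(inode), path.rstrip("\n"))
```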

On the other hand, I do not know how this can be emulated on other operating systems.

Original issue: http://code.google.com/p/psutil/issues/detail?id=444

@giampaolo
Owner Author

From g.rodola on November 20, 2013 03:23:27

I doubt this can be emulated on most operating systems.
By "path names are not unique", do you mean that the path name is the same for 
different entries and you would like to tell them apart? Is that your use case?
If that's the case, can't you just use the address ('addr')?

@giampaolo
Owner Author

From remi.cha...@gmail.com on November 20, 2013 03:48:44

Exactly, I need to differentiate the anonymous ("SYSV00000000") shared memory blocks.

In the use case you described, it is definitely necessary because I need to 
detect related processes which share a unique memory segment (and are 
therefore part of the same logical application).

The mapped address is not unique because different processes can map the same 
shared memory segment at different addresses in their address space.

On the other hand, the inode really represents the shared memory segment: it is 
equivalent to the shmid returned by the command ipcs.
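To illustrate the claimed equivalence, here is a hedged sketch that indexes Linux's /proc/sysvipc/shm table by shmid, so that the inode of a SYSV map can be looked up there. It assumes the file starts with a header row that names its columns, including shmid and nattch. parse_sysvipc_shm is a hypothetical helper, and it takes the file's text as input rather than reading the file itself.

```python
from typing import Dict


def parse_sysvipc_shm(text: str) -> Dict[int, Dict[str, str]]:
    """Map shmid -> {column name: value} from the text of /proc/sysvipc/shm."""
    lines = text.strip().splitlines()
    if not lines:
        return {}
    # Assumption: the first row holds the column names (key, shmid, perms, ...).
    header = lines[0].split()
    rows: Dict[int, Dict[str, str]] = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        rows[int(row["shmid"])] = row
    return rows
```

A SYSV map whose smaps inode is 32769 would then be found as `rows[32769]`, including how many processes have it attached (`nattch`).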

I guess there is something very similar on all Unices, BSD etc. On Windows, 
there is probably a unique integer id which could serve the same purpose.

Thanks

@giampaolo
Owner Author

From g.rodola on November 20, 2013 10:31:07

...but what would you do once you can differentiate between unique memory mappings?
I'm not sure I understand your use case, I'm sorry (maybe some pseudo code 
could help?).

Generally speaking, if the use case is not common enough and the feature is not 
portable across many platforms, I usually decide not to add it. My impression so 
far is that this is the case (but again: I'm probably misinterpreting the use case).

@giampaolo
Owner Author

From remi.cha...@gmail.com on November 20, 2013 12:50:54

One of the uses of shared memory segments is to let different processes 
communicate with each other. For this, they need to map the segment into their 
memory space, using the unique shared id, called "shmid" by the command "ipcs".

The pathname of a shared memory segment is usable only if it is a file 
mapped into memory, but this is a special case (mostly read-only files mapped into 
memory for ease of access). For interprocess communication, memory segments are 
defined by their shmid (equivalent to an inode), not by their path names, which 
are non-existent (or given a default value): they are anonymous.

In other words, memory segments used in interprocess communication cannot be 
used without their id (that is, the inode as returned by the /proc file 
system), simply because they cannot otherwise be identified. Internally, Linux does not 
use the pathname but, of course, the inode.

For example, these ids are necessary to check whether two different applications are 
communicating via shared memory segments. This technique is widely used (I can 
give specific examples by email, if necessary).
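Here is a rough sketch of that check. It assumes per-process (path, inode) pairs were available, which psutil does not provide today and which is exactly what this issue asks for. shared_sysv_segments is a hypothetical helper. It groups SYSV maps by inode and keeps only the segments attached by more than one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_sysv_segments(
    maps_by_pid: Dict[int, Iterable[Tuple[str, int]]]
) -> Dict[int, Set[int]]:
    """Return {inode: pids} for SysV segments attached by more than one process."""
    owners: Dict[int, Set[int]] = defaultdict(set)
    for pid, maps in maps_by_pid.items():
        for path, inode in maps:
            # All anonymous segments share the name "SYSV00000000", so only
            # the inode (the shmid) tells them apart.
            if path.lstrip("/").startswith("SYSV"):
                owners[inode].add(pid)
    return {inode: pids for inode, pids in owners.items() if len(pids) > 1}
```

Processes that end up in the same set share a segment and are, by the reasoning above, likely part of the same logical application.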

It would make sense, for example to concatenate the shmid to the string 
"SYSV00000000" which is not usable anyway.
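The proposed label could look like the following. label_sysv_map is a hypothetical helper, and the ":" separator is an arbitrary choice made for this sketch.

```python
def label_sysv_map(path: str, inode: int) -> str:
    """Append the shmid (inode) to SysV segment names so they become unique."""
    if path.lstrip("/").startswith("SYSV"):
        return f"{path}:{inode}"
    return path  # file-backed maps keep their real path
```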

On Solaris, you might use this id: http://www.cs.cf.ac.uk/Dave/C/node27.html

Thanks

@ganguin

ganguin commented Mar 9, 2018

Is this still on track?

I'm developing a tool that tracks processes that are running with outdated mmap'ed libraries (containing bugs or security issues, or even a malware-infected file using this to hide). /proc/<pid>/smaps flags them by appending " (deleted)" to the end of the path.

psutil does not allow getting the inode, which would be required to verify that the mmap'ed file differs from the file on the filesystem. psutil also strips the " (deleted)" suffix. Therefore I see no way of getting this information for my case.
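Here is a sketch of the detection step described here. It works on raw smaps text so that the " (deleted)" suffix is still visible. deleted_file_maps is a hypothetical helper. It skips SysV segments, whose paths also carry the suffix, and returns each path without the suffix together with its inode.

```python
from typing import List, Tuple

DELETED_SUFFIX = " (deleted)"


def deleted_file_maps(smaps_text: str) -> List[Tuple[str, int]]:
    """Return (path, inode) for file-backed maps whose path ends in ' (deleted)'."""
    found: List[Tuple[str, int]] = []
    for line in smaps_text.splitlines():
        fields = line.split(None, 5)
        # Header lines have six columns; "Key: value" lines start with "Key:".
        if len(fields) < 6 or fields[0].endswith(":"):
            continue
        path = fields[5]
        if not path.endswith(DELETED_SUFFIX) or path.startswith("/SYSV"):
            continue  # still on disk, or a SysV segment rather than a file
        found.append((path[: -len(DELETED_SUFFIX)], int(fields[4])))
    return found
```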

Like remi mentioned in the first comment, the patch seems trivial:

  • Line 1672 already reads the inode from /proc/*/smaps

    psutil/psutil/_pslinux.py

    Lines 1672 to 1674 in f0094db

        addr, perms, offset, dev, inode, path = hfields
    except ValueError:
        addr, perms, offset, dev, inode, path = \

  • The inode number could be stored as a new item in the returned container

    psutil/psutil/_pslinux.py

    Lines 1685 to 1696 in f0094db

    ls.append((
        decode(addr), decode(perms), path,
        data[b'Rss:'],
        data.get(b'Size:', 0),
        data.get(b'Pss:', 0),
        data.get(b'Shared_Clean:', 0),
        data.get(b'Shared_Dirty:', 0),
        data.get(b'Private_Clean:', 0),
        data.get(b'Private_Dirty:', 0),
        data.get(b'Referenced:', 0),
        data.get(b'Anonymous:', 0),
        data.get(b'Swap:', 0)
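Put together, the change proposed in this comment might look like the sketch below. pmmap_ext_inode and build_record are hypothetical names, not psutil's. The fields mirror the tuple built in the snippet above, with inode appended as the last item so the positions of the existing fields stay the same.

```python
from collections import namedtuple

# Hypothetical record type: the fields of the appended tuple above, plus
# "inode" added last so existing field positions do not move.
pmmap_ext_inode = namedtuple(
    "pmmap_ext_inode",
    "addr perms path rss size pss shared_clean shared_dirty "
    "private_clean private_dirty referenced anonymous swap inode",
)


def build_record(addr, perms, path, data, inode):
    """Build one record from parsed header fields and a {b'Key:': value} dict."""
    return pmmap_ext_inode(
        addr, perms, path,
        data[b"Rss:"],
        data.get(b"Size:", 0),
        data.get(b"Pss:", 0),
        data.get(b"Shared_Clean:", 0),
        data.get(b"Shared_Dirty:", 0),
        data.get(b"Private_Clean:", 0),
        data.get(b"Private_Dirty:", 0),
        data.get(b"Referenced:", 0),
        data.get(b"Anonymous:", 0),
        data.get(b"Swap:", 0),
        int(inode),  # the only new item: the inode parsed from the header line
    )
```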

I don't know about other OSes; however, as far as I can see in the docs, the information is already very different depending on the OS:
http://psutil.readthedocs.io/en/latest/#psutil.Process.memory_maps

@giampaolo
Owner Author

Yes, the information differs depending on the OS, but all the values that differ are integers that are summed together when using the grouped=True argument. Adding the inode would make this kinda twisted because we would have to exclude it from the sum() logic.
Your use case (identifying "dead" / "(deleted)" mmap files) looks very specific and wouldn't benefit from adding the inode alone (correct?).
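One hedged way to reconcile the inode with grouped=True is to keep summing the numeric fields per path but collect the inodes into a set instead of adding them. MapRecord, Grouped and group_by_path are hypothetical names, trimmed to three numeric fields for brevity.

```python
from collections import namedtuple
from typing import Dict, Iterable, List, Set

# Hypothetical, trimmed-down records (the real ones carry more fields).
MapRecord = namedtuple("MapRecord", "path rss size swap inode")
Grouped = namedtuple("Grouped", "path rss size swap inodes")


def group_by_path(records: Iterable[MapRecord]) -> List[Grouped]:
    """Sum the numeric fields per path; collect inodes instead of summing them."""
    sums: Dict[str, List[int]] = {}
    inodes: Dict[str, Set[int]] = {}
    for r in records:
        totals = sums.setdefault(r.path, [0, 0, 0])
        totals[0] += r.rss
        totals[1] += r.size
        totals[2] += r.swap
        # The inode is an identifier, not a quantity: it must not be summed.
        inodes.setdefault(r.path, set()).add(r.inode)
    # dicts keep insertion order, so groups appear in first-seen order.
    return [Grouped(path, *totals, frozenset(inodes[path]))
            for path, totals in sums.items()]
```

A set also covers the case raised in the next comment, where the same path is mapped with different inodes.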

@ganguin

ganguin commented Mar 12, 2018

For my case it would be OK for the inode to be excluded from grouped=True. It's probably unlikely in practice, but the same path may be mapped with different inodes, so using grouped=False may even be required.

Once I have the inode, I can "stat" the file in the filesystem and check whether the inodes are the same. If the mmap'ed path has been deleted from the filesystem, the logic gets a bit more complicated to avoid false positives (basically, if the file is a shared library, I can check whether a newer version is present in the filesystem).
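The comparison described above could be sketched like this. It assumes the dev ("major:minor" in hex) and inode columns of the smaps header line are available. mapping_is_stale is a hypothetical helper. Comparing the device as well as the inode guards against equal inode numbers on different filesystems.

```python
import os


def mapping_is_stale(path: str, dev: str, inode: int) -> bool:
    """Return True if the file now at `path` is not the file that was mapped.

    `dev` is the "major:minor" (hex) column and `inode` the inode column of
    the smaps header line; any " (deleted)" suffix is assumed already removed.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return True  # nothing exists at that path any more
    major, minor = (int(part, 16) for part in dev.split(":"))
    same_file = (
        st.st_ino == inode
        and os.major(st.st_dev) == major
        and os.minor(st.st_dev) == minor
    )
    return not same_file
```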
