Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux: getProcess may return null when the current thread is interrupted #1589

Closed
joshiste opened this issue Mar 29, 2021 · 4 comments · Fixed by #1593
Closed

Linux: getProcess may return null when the current thread is interrupted #1589

joshiste opened this issue Mar 29, 2021 · 4 comments · Fixed by #1593
Labels
design discussion Ways to improve OSHI's design

Comments

@joshiste
Copy link

The information from the procfs are read using the Files.readAllLines method and all exceptions are swallowed including the java.nio.channels.ClosedByInterruptException.
Which means that the getProcess() might return null due to the fact that the reading thread was interrupted on reading the procfs. And lead to an unexpected behavior in our case.

Imho it would be better to wrap and throw the exception instead of swallowing and returning null.

@dbwiddis
Copy link
Member

Thanks for this report. There's been quite a bit of discussion regarding the appropriate way to handle exceptions here: oshi/oshi6#2

I'm not quit sure I understand exactly where you would want to throw an exception as opposed to failing gracefully. It seems more common that the Linux process query may fail due to a race condition (pid terminates between collecting pids from /proc before looking into /proc/[pid]) and handling the expected possible null there should be documented.

@dbwiddis dbwiddis added the design discussion Ways to improve OSHI's design label Mar 29, 2021
@joshiste
Copy link
Author

joshiste commented Mar 30, 2021

@dbwiddis thanks for your response.

Thanks for this report. There's been quite a bit of discussion regarding the appropriate way to handle exceptions here: oshi/oshi6#2

I'll have a look into this.

It seems more common that the Linux process query may fail due to a race condition (pid terminates between collecting pids from /proc before looking into /proc/[pid]) and handling the expected possible null there should be documented.

The process was running the entire time and long after that (so the whole time throughout the getProcess(int) call. So imho this is no race condition.

Imho it would be ok to return null when the process is terminated and the procfs files are missing. But this interception basically tells that reading the file failed, because the reading thread was aborted, should be propagated.

The stack looks like this:

java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at java.nio.file.Files.readAllLines(Files.java:3205)
        at oshi.util.FileUtil.readFile(FileUtil.java:209)  <--- doesn't even log the error
        at oshi.util.FileUtil.getKeyValueMapFromFile(FileUtil.java:89)
        at oshi.software.os.linux.LinuxOSProcess.updateAttributes(LinuxOperatingSystem.java:290)   <--- reading the proc status
        at oshi.software.os.linux.LinuxOSProcess.<init>(LinuxOSProcess.java:101)
        at oshi.software.os.linux.LinuxOperatingSystem.getProcess(LinuxOperatingSystem.java:183)

@dbwiddis
Copy link
Member

I understand that this particular case was not caused by a race condition; the point I was trying to make is that exceptions in the process of reading procfs are going to be reasonably common. OSHI attempts to gracefully handle exceptions, particularly in very common scenarios.

It's a fair criticism that this exception wasn't logged, because file reading exceptions are also reasonably common. There is a boolean which controls logging -- it was put there to avoid logspam!

I'm not sure what is causing this exception for you; is it common to be unable to read files from procfs? How would you suggest separating out that particular exception vs. the very common "file not found" types of exceptions? Should we log all file exceptions (or make it a configurable value)?

It's not clear to me the best way to handle this rare condition without generating a bunch of noise for normal conditions.

Long term, with an API rewrite, it makes complete sense to either throw a lot more exceptions to the user and let them catch/handle them, or provide more detail in a result object regarding a failed read and the reason for failure, but that goes well beyond this particular case.

I'm open to a PR if you think you have a way to improve OSHI here, but from my perspective, it seems to be a very subjective matter which exceptions a user wants to have thrown, have logged, or considers noise.

@dbwiddis
Copy link
Member

I think there's room for debug-level logging of the file errors here. Do you think that would be sufficient? Thinking about this case, it seems to stem from a problem (e.g., not synchronizing multithreading well so an interrupt happens) and a log message would help resolve the problem better than a random null somewhere not expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design discussion Ways to improve OSHI's design
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants