8329961: Buffer overflow in os::Linux::kernel_version #18697

Closed
wants to merge 6 commits into from

jdksjolen
Contributor

@jdksjolen jdksjolen commented Apr 9, 2024

Hi,

There was a bug in the original implementation of os::Linux::kernel_version, which this PR fixes. Namely, the comparison walker != nullptr is wrong; the intended comparison was *walker != '\0' (equivalently, walker[0] != '\0'). This means that if a bad or unexpected version string is encountered, the walker reads past the end of the string.

We fix this by applying the correct comparison and adding some basic tests.
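
For illustration only, the corrected loop condition can be sketched as below. This is a minimal standalone sketch and not the HotSpot code; the helper name parse_version_sketch and its "leave -1 when missing" convention are assumptions made for the example.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical helper, not the HotSpot implementation.
// Extracts the first two decimal numbers found in `release`.
// *major and *minor stay at -1 when the corresponding number is missing.
void parse_version_sketch(const char* release, long* major, long* minor) {
  *major = -1;
  *minor = -1;
  long* target = major;
  const char* walker = release;
  // Correct termination test: stop at the NUL terminator.
  // Testing `walker != nullptr` instead would never become false here,
  // because advancing a valid pointer does not turn it into a null pointer.
  while (*minor == -1 && *walker != '\0') {
    if (std::isdigit(static_cast<unsigned char>(*walker))) {
      char* end = nullptr;
      *target = std::strtol(walker, &end, 10);  // base 10 given explicitly
      walker = end;
      target = minor;
    } else {
      ++walker;  // skip separators such as '.'
    }
  }
}
```

With a string that holds fewer than two numbers, the loop now ends at the terminator instead of walking on through memory.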

@luhenry , @robehn. You attempted to create automatic backport branches on this in the original PR, can you check whether this fix also needs to be backported to the mentioned versions? The original PR link is this: #17889


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8329961: Buffer overflow in os::Linux::kernel_version (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18697/head:pull/18697
$ git checkout pull/18697

Update a local copy of the PR:
$ git checkout pull/18697
$ git pull https://git.openjdk.org/jdk.git pull/18697/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18697

View PR using the GUI difftool:
$ git pr show -t 18697

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18697.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 9, 2024

👋 Welcome back jsjolen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 9, 2024

@jdksjolen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8329961: Buffer overflow in os::Linux::kernel_version

Reviewed-by: rehn, stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 507 new commits pushed to the master branch:

  • 9731b1c: 8327137: Add test for ConcurrentModificationException in BasicDirectoryModel
  • c5150c7: 8309751: Duplicate constant pool entries added during default method processing
  • 86cb767: 8326568: jdk/test/com/sun/net/httpserver/bugs/B6431193.java should use try-with-resource and try-finally
  • b49ba42: 8330002: Remove redundant public keyword in BarrierSet
  • dd6e453: 8329767: G1: Move G1BlockOffsetTable::set_for_starts_humongous to HeapRegion
  • e0fd6c4: 8329545: [s390x] Fix garbage value being passed in Argument Register
  • 51ed69a: 8327621: Check return value of uname in os::get_host_name
  • bea9acc: 8328482: Convert and Open source few manual applet test to main based
  • d037a59: 8311248: Refactor CodeCache::initialize_heaps to simplify adding new CodeCache segments
  • bab7019: 8329431: Improve speed of writing CDS heap objects
  • ... and 497 more: https://git.openjdk.org/jdk/compare/ae5e3fdd5922a232c9b48fc846c4fcdc8f2b2645...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title 8329961 8329961: Buffer overflow in os::Linux::kernel_version Apr 9, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 9, 2024
@openjdk

openjdk bot commented Apr 9, 2024

@jdksjolen The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Apr 9, 2024
@mlbridge

mlbridge bot commented Apr 9, 2024

Webrevs

Contributor

@robehn robehn left a comment

Thank you!

Yes it should be backported to jdk21u-dev and jdk22u.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 9, 2024
Member

@tstuefe tstuefe left a comment

Hmm, wouldn't sscanf be simpler and safer? No need to factor out the parser. IMHO there is no need to even add a gtest, since parsing would be really simple and not loop-based. E.g.

if (sscanf(release, "%d.%d", &major, &minor) != 2) {
  log_warning blabla
}

As a bonus, you avoid accidental conversion from hex numbers and such that strtol provides and that we don't really want here.
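
A self-contained version of this suggestion might look like the sketch below. The function name parse_version_sscanf and the bool return convention are hypothetical; the logging call from the snippet above is replaced by a plain failure return, since the exact logging macro is not spelled out in this comment.

```cpp
#include <cstdio>

// Hypothetical helper illustrating the sscanf approach.
// Returns true only when both "<major>.<minor>" numbers were converted.
// On failure the output parameters are left untouched.
bool parse_version_sscanf(const char* release, int* major, int* minor) {
  int major_tmp = 0;
  int minor_tmp = 0;
  // sscanf returns the number of successful conversions (or EOF),
  // so anything other than 2 means the string did not match.
  if (std::sscanf(release, "%d.%d", &major_tmp, &minor_tmp) != 2) {
    return false;
  }
  *major = major_tmp;
  *minor = minor_tmp;
  return true;
}
```

Note that %d also accepts leading whitespace and a sign, so this sketch is slightly more permissive than a digits-only parser.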

@tstuefe
Member

tstuefe commented Apr 9, 2024

As a bonus, you avoid accidental conversion from hex numbers and such that strtol provides and that we don't really want here.

Okay, scratch the last sentence: since you manually specify the base, this would not be an issue.

@jdksjolen
Contributor Author

Hi, according to C11 standard (and my man pages) it is UB to call the scanf-family of functions with "invalid" data and strtol is recommended instead. So, unfortunately, it might not be safer.

Another thing: We shouldn't call uname. Reading /proc/sys/kernel/osrelease is sufficient.

@robehn
Contributor

robehn commented Apr 10, 2024

Another thing: We shouldn't call uname. Reading /proc/sys/kernel/osrelease is sufficient.

I don't follow that. uname() is POSIX and portable, while /proc/sys/kernel/osrelease is Linux-specific.
Now, this code is in the Linux part, but I don't see why a less portable way that requires more code would be superior.

@tstuefe
Member

tstuefe commented Apr 10, 2024

Hi, according to C11 standard (and my man pages) it is UB to call the scanf-family of functions with "invalid" data and strtol is recommended instead. So, unfortunately, it might not be safer.

Can you point me to the relevant man page? If that were not to work, it would drastically limit the usefulness of scanf, and a lot of code in hotspot would be UB.

My manpage:

The format string consists of a sequence of directives which describe how to process the sequence of input characters. If processing of a directive fails, no further input is read, and scanf() returns. A "failure" can be either of the following: input failure, meaning that input characters were unavailable, or matching failure, meaning that the input was inappropriate (see below). 

Matching failures are part of the API.

Another thing: We shouldn't call uname. Reading /proc/sys/kernel/osrelease is sufficient.

I agree with @robehn, uname is POSIX and portable. It may also be cached inside glibc, saving us a procfs read.

@jdksjolen
Contributor Author

See BUGS section: https://man7.org/linux/man-pages/man3/sscanf.3.html

OK, I'm probably too paranoid regarding uname. I don't like that we've got a C string and there's no way of knowing whether it actually ends in a NUL byte or not. At least with a file you know that it has a specific length and can account for that. We probably have to assume that if someone is being malicious with uname then we're compromised regardless.

@jdksjolen
Contributor Author

And on top of that: If we don't trust uname, then can we trust fopen to open the file for us :-)? No. So, the issue is largely moot.

@tstuefe
Member

tstuefe commented Apr 10, 2024

See BUGS section: https://man7.org/linux/man-pages/man3/sscanf.3.html

Okay, that. Note that the deprecation of numerical conversions has been hugely controversial, and it was a step taken by the glibc manpage maintainers. This is just about numerical overflow. E.g. 3333333333333333333333333333333.333333333333333333333333333333 will overflow the int range and return true. Are we honestly concerned about this here?

By avoiding sscanf, we swap very readable and simple-hence-safe code for manual parsing, which is both unreadable and, as this issue shows, in practice a lot less safe than had we used sscanf to begin with.
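
For comparison, if the overflow case ever did matter, a strtol-based conversion can detect it through errno. The sketch below is only an illustration of that alternative, not something proposed in this PR; the helper name parse_int_checked and its parameters are hypothetical.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: converts a leading decimal number in `s` to int.
// Returns false when there are no digits or the value does not fit in int.
// When `rest` is non-null it receives the position after the number.
bool parse_int_checked(const char* s, int* out, const char** rest) {
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(s, &end, 10);
  if (end == s) {
    return false;  // nothing was converted
  }
  if (errno == ERANGE) {
    return false;  // value overflowed long; strtol clamped the result
  }
  if (value < INT_MIN || value > INT_MAX) {
    return false;  // fits in long but not in int
  }
  *out = static_cast<int>(value);
  if (rest != nullptr) {
    *rest = end;
  }
  return true;
}
```

The price is the extra bookkeeping that the sscanf one-liner avoids, which is the trade-off described above.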

OK, I'm probably too paranoid regarding uname. I don't like that we've got a C string and there's no way of knowing whether it actually ends in a NUL byte or not. At least with a file you know that it has a specific length and can account for that. We probably have to assume that if someone is being malicious with uname then we're compromised regardless.

If you are paranoid, do this:

struct {
  utsname uts; char c;
} v;

then set v.c to 0 before calling uname.

@jdksjolen
Contributor Author

Aha, it's just overflow issues? Well, then let's just use sscanf. Thanks for the hint, interesting code pattern! That won't work as the string isn't necessarily embedded within the struct, but it's an interesting strategy nonetheless. I'll just trust that it ends at some point.

@tstuefe
Member

tstuefe commented Apr 10, 2024

Aha, it's just overflow issues? Well, then let's just use sscanf. Thanks for the hint, interesting code pattern! That won't work as the string isn't necessarily embedded within the struct, but it's an interesting strategy nonetheless.

I think it is required to be inline, though. See https://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/utsname.h.html , where the member is defined as an array, not a pointer:

char  release[]  Current release level of this implementation. 
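
One way to check this on a given platform is a compile-time assertion. The snippet below is a small sketch assuming a POSIX system whose sys/utsname.h follows the definition linked above; release_capacity is a hypothetical name.

```cpp
#include <sys/utsname.h>
#include <cstddef>
#include <type_traits>

// Fails to compile if `release` were a pointer rather than an inline array.
static_assert(std::is_array<decltype(utsname::release)>::value,
              "utsname::release is expected to be an inline char array");

// Size in bytes of the inline release buffer, including the terminator.
constexpr std::size_t release_capacity() {
  return sizeof(utsname::release);
}
```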

I'll just trust that it ends at some point.

It has to. User address space usually ends at 128 TB. :-)

Contributor

@robehn robehn left a comment

Looks good, thanks!

Member

@tstuefe tstuefe left a comment

neat

@jdksjolen
Contributor Author

Cool! Thank you for the reviews.

/integrate

@openjdk

openjdk bot commented Apr 10, 2024

Going to push as commit 279ed0d.
Since your change was applied there have been 507 commits pushed to the master branch:

  • 9731b1c: 8327137: Add test for ConcurrentModificationException in BasicDirectoryModel
  • c5150c7: 8309751: Duplicate constant pool entries added during default method processing
  • 86cb767: 8326568: jdk/test/com/sun/net/httpserver/bugs/B6431193.java should use try-with-resource and try-finally
  • b49ba42: 8330002: Remove redundant public keyword in BarrierSet
  • dd6e453: 8329767: G1: Move G1BlockOffsetTable::set_for_starts_humongous to HeapRegion
  • e0fd6c4: 8329545: [s390x] Fix garbage value being passed in Argument Register
  • 51ed69a: 8327621: Check return value of uname in os::get_host_name
  • bea9acc: 8328482: Convert and Open source few manual applet test to main based
  • d037a59: 8311248: Refactor CodeCache::initialize_heaps to simplify adding new CodeCache segments
  • bab7019: 8329431: Improve speed of writing CDS heap objects
  • ... and 497 more: https://git.openjdk.org/jdk/compare/ae5e3fdd5922a232c9b48fc846c4fcdc8f2b2645...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 10, 2024
@openjdk openjdk bot closed this Apr 10, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 10, 2024
@openjdk

openjdk bot commented Apr 10, 2024

@jdksjolen Pushed as commit 279ed0d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@robehn
Contributor

robehn commented Apr 12, 2024

@jdksjolen are you backporting this?

@jerboaa
Contributor

jerboaa commented Apr 12, 2024

@robehn I've got PRs for it in testing:
openjdk/jdk22u#141
openjdk/jdk21u-dev#498

@jdksjolen
Contributor Author

@jdksjolen are you backporting this?

@robehn, thanks. I forgot about doing that. Good for me to have a try at doing that, but I'll need sponsorship.

/backport jdk21u-dev
/backport jdk22u

@openjdk

openjdk bot commented Apr 12, 2024

@jdksjolen the backport was successfully created on the branch backport-jdksjolen-279ed0dd in my personal fork of openjdk/jdk21u-dev. To create a pull request with this backport targeting openjdk/jdk21u-dev:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit 279ed0dd from the openjdk/jdk repository.

The commit being backported was authored by Johan Sjölen on 10 Apr 2024 and was reviewed by Robbin Ehn and Thomas Stuefe.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk21u-dev:

$ git fetch https://github.com/openjdk-bots/jdk21u-dev.git backport-jdksjolen-279ed0dd:backport-jdksjolen-279ed0dd
$ git checkout backport-jdksjolen-279ed0dd
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk21u-dev.git backport-jdksjolen-279ed0dd

⚠️ @jdksjolen You are not yet a collaborator in my fork openjdk-bots/jdk21u-dev. An invite will be sent out and you need to accept it before you can proceed.

@openjdk

openjdk bot commented Apr 12, 2024

@jdksjolen the backport was successfully created on the branch backport-jdksjolen-279ed0dd in my personal fork of openjdk/jdk22u. To create a pull request with this backport targeting openjdk/jdk22u:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit 279ed0dd from the openjdk/jdk repository.

The commit being backported was authored by Johan Sjölen on 10 Apr 2024 and was reviewed by Robbin Ehn and Thomas Stuefe.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk22u:

$ git fetch https://github.com/openjdk-bots/jdk22u.git backport-jdksjolen-279ed0dd:backport-jdksjolen-279ed0dd
$ git checkout backport-jdksjolen-279ed0dd
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk22u.git backport-jdksjolen-279ed0dd

⚠️ @jdksjolen You are not yet a collaborator in my fork openjdk-bots/jdk22u. An invite will be sent out and you need to accept it before you can proceed.

Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
4 participants