Buffer(number) is unsafe #4660

Closed
feross opened this Issue Jan 13, 2016 · 208 comments

Projects

None yet
@feross
Member
feross commented Jan 13, 2016

tl;dr

This issue proposes:

  1. Change new Buffer(number) to return safe, zeroed-out memory
  2. Create a new API for creating uninitialized Buffers, Buffer.alloc(number)

Update: Jan 15, 2016

Upon further consideration, I think that returning zeroed out memory is a separate issue. The core issue is: unsafe buffer allocation should be in a different API.

I now support adding two APIs:

  • Buffer.from(value) - convert from any type to a buffer
  • Buffer.alloc(size) - create an uninitialized buffer with given size

This solves the core problem that affected ws and bittorrent-dht which is Buffer(variable) getting tricked into taking a number argument.

Why is Buffer unsafe?

Today, the node.js Buffer constructor is overloaded to handle many different argument types like String, Array, Object, TypedArrayView (Uint8Array, etc.), ArrayBuffer, and also Number.

The API is optimized for convenience: you can throw any type at it, and it will try to do what you want.

Because the Buffer constructor is so powerful, you often see code like this:

// Convert UTF-8 strings to hex
function toHex (str) {
  return new Buffer(str).toString('hex')
}

_But what happens if toHex is called with a Number argument?_

Remote Memory Disclosure

If an attacker can make your program call the Buffer constructor with a Number argument, then they can make it allocate uninitialized memory from the node.js process. This could potentially disclose TLS private keys, user data, or database passwords.

When the Buffer constructor is passed a Number argument, it returns an UNINITIALIZED block of memory of the specified size. When you create a Buffer like this, you MUST overwrite the contents before returning it to the user.

Would this ever be a problem in real code?

Yes. It's surprisingly common to forget to check the type of your variables in a dynamically-typed language like JavaScript.

Usually the consequences of assuming the wrong type is that your program crashes with an uncaught exception. But the failure mode for forgetting to check the type of arguments to the Buffer constructor is more catastrophic.

Here's an example of a vulnerable service that takes a JSON payload and converts it to hex:

// Take a JSON payload {str: "some string"} and convert it to hex
var server = http.createServer(function (req, res) {
  var data = ''
  req.setEncoding('utf8')
  req.on('data', function (chunk) {
    data += chunk
  })
  req.on('end', function () {
    var body = JSON.parse(data)
    res.end(new Buffer(body.str).toString('hex'))
  })
})

server.listen(8080)

In this example, an http client just has to send:

{
  "str": 1000
}

and it will get back 1,000 bytes of uninitialized memory from the server.

This is a very serious bug. It's similar in severity to the the Heartbleed bug that allowed disclosure of OpenSSL process memory by remote attackers.

Which real-world packages were vulnerable?

bittorrent-dht

@mafintosh and I found this issue in one of our own packages, bittorrent-dht. The bug would allow anyone on the internet to send a series of messages to a user of bittorrent-dht and get them to reveal 20 bytes at a time of uninitialized memory from the node.js process.

Here's the commit that fixed it. We released a new fixed version, created a Node Security Project disclosure, and deprecated all vulnerable versions on npm so users will get a warning to upgrade to a newer version.

ws

That got us wondering if there were other vulnerable packages. Sure enough, within a short period of time, we found the same issue in ws, the most popular WebSocket implementation in node.js.

If certain APIs were called with Number parameters instead of String or Buffer as expected, then uninitialized server memory would be disclosed to the remote peer.

These were the vulnerable methods:

socket.send(number)
socket.ping(number)
socket.pong(number)

Here's a vulnerable socket server with some echo functionality:

server.on('connection', function (socket) {
  socket.on('message', function (message) {
    message = JSON.parse(message)
    if (message.type === 'echo') {
      socket.send(message.data) // send back the user's message
    }
  })
})

socket.send(number) called on the server, will disclose server memory.

Here's the release where the issue was fixed, with a more detailed explanation. Props to @3rd-Eden for the quick fix. Here's the Node Security Project disclosure.

What's the solution?

It's important that node.js offers a fast way to get memory otherwise performance-critical applications would needlessly get a lot slower.

But we need a better way to signal our intent as programmers. When we want uninitialized memory, we should request it explicitly.

Sensitive functionality should not be packed into a developer-friendly API that loosely accepts many different types. This type of API encourages the lazy practice of passing variables in without checking the type very carefully.

Buffer.alloc(number)

The functionality of creating buffers with uninitialized memory should be part of another API. We propose Buffer.alloc(number). This way, it's not part of an API that frequently gets user input of all sorts of different types passed into it.

var buf = Buffer.alloc(16) // careful, uninitialized memory!

// Immediately overwrite the uninitialized buffer with data from another buffer
for (var i = 0; i < buf.length; i++) {
  buf[i] = otherBuf[i]
}

How do we fix node.js core?

We sent a PR (merged as semver-major) which defends against one case:

var str = 16
new Buffer(str, 'utf8')

In this situation, it's implied that the programmer intended the first argument to be a string, since they passed an encoding as a second argument. Today, node.js will allocate uninitialized memory in the case of new Buffer(number, encoding), which is probably not what the programmer intended.

But this is only a partial solution, since if the programmer does new Buffer(variable) (without an encoding parameter) there's no way to know what they intended. If variable is sometimes a number, then uninitialized memory will sometimes be returned.

What's the real long-term fix?

We could deprecate and remove new Buffer(number) and use Buffer.alloc(number) when we need uninitialized memory. But that would break 1000s of packages. So that's a no-go.

Instead, we believe the best solution is to:

  1. Change new Buffer(number) to return safe, zeroed-out memory
  2. Create a new API for creating uninitialized Buffers. We propose: Buffer.alloc(number)

This way, existing code continues working and the impact on the npm ecosystem will be minimal. Over time, npm maintainers can migrate performance-critical code to use Buffer.alloc(number) instead of new Buffer(number).

Conclusion

We think there's a serious design issue with the Buffer API as it exists today. It promotes insecure software by putting high-risk functionality into a convenient API with friendly "developer ergonomics".

This wasn't merely a theoretical exercise because we found the issue in some of the most popular npm packages.

Eventually, we hope that node.js core can switch to this new, safer behavior. We believe the impact on the ecosystem would be minimal since it's not a breaking change. Well-maintained, popular packages would be updated to use Buffer.alloc quickly, while older, insecure packages would magically become safe from this attack vector.

@feross
Member
feross commented Jan 13, 2016

For those interested in getting the new behavior we propose without a change to node core, we published a user-land package: safe-buffer.

@JacksonTian
Contributor

Now, the new Buffer(number, encoding) will throw exception, detail is here: 3b27dd5 . I introduce a new API Buffer.encode() for encoding case.

@silverwind silverwind added the buffer label Jan 13, 2016
@rvagg
Member
rvagg commented Jan 13, 2016

You could make similar arguments about child_process being unsafe, about loading native addons being unsafe, even about fs being unsafe. The more bubble-wrap we put in core, the more users are likely to make the false assumption that programming in Node.js is like programming in the browser, which it absolutely is not. That's not a -1 to this proposal (in fact, this exact discussion is an ongoing on that keeps on coming up, we have an active thread in the security repo to thrash this out too), just a word of caution because we keep on having discussions like this where we want to move closer to a sandbox but that will never be the case for Node and it would be even more unsafe to lull users into the perception that they are as protected as in the browser (there's a very good chance that's exactly what happened to TrendMicro).

It should be noted that the primary reason for not zero-filling by default is performance. So new Buffer() should be considered a form of malloc(). In the past, it has been demonstrated that allocating filled memory has significant performance penalties to a plain malloc(). I keep on suggesting that those who are advocating switching the default to a clean allocation should do some benchmarking to demonstrate what kind of impact it might have. Otherwise, this argument is dead-in-the-water because the "it's faster" argument currently holds this one in place.

@trevnorris
Contributor

This discussion has been had several times in the last few weeks.

If a "high-risk functionality" application doesn't have proper type checking in place then it already has a security issue. This problem can't be dished into Buffer as if it's at fault.

Not zero filling the Buffer is completely a performance issue, but the impact on performance is more than trivial. Even at allocations as low as 1KB. And while this type of impact probably wouldn't harm your standard web app, it can noticeably affect performance of a node process.

The concern is when a number is passed when something else is expected. Again this is a bug in the developers code, and TBH it doesn't make much sense to impact the performance of the many modules out there today and force them to update their code to use new syntax because of that.

And let's be honest, in the years that Buffer has worked exactly this way how many times has this been reportedly seen as an issue in the wild. The only reason you're here now is because it affected modules you were directly involved with.

@feross
Member
feross commented Jan 13, 2016

@rvagg @trevnorris You both raise good points. Node is not the browser, and performance is critical.

The difference with modules like child_process and fs is that they're very obviously doing powerful and sensitive operations. Even the most trivial use of them will confront the developer with the fact that they're interacting with their OS on a low level and need to be careful.

Buffer is different. 95% of the time, it's safe and idiot-proof. 5% of the time you need to be very, very careful.

You can pass in:

  • Array –> safe
  • Buffer –> safe
  • ArrayBuffer –> safe
  • Uint8Array –> safe
  • Uint16Array –> safe
  • the rest of the typed array types –> safe
  • Any array-like –> safe
  • Object (reverse of .toJSON()) –> safe
  • String –> safe
  • Number –> _SURPRISE! REMOTE MEMORY DISCLOSURE!_

The API seems like it's trying to lay a trap for the user.

Look at the contributors on each of those repos. Some of the most talented node.js developers looked at this code extensively and didn't notice this issue for years.

@feross
Member
feross commented Jan 13, 2016

And while this type of impact probably wouldn't harm your standard web app, it can noticeably affect performance of a node process.

To be clear, I'm not suggesting that node core should use zeroed-out buffers. It would continue using unallocated buffers, but that functionality would move to a different API away from the one that looks and behaves (and now actually is) a Uint8Array.

@mscdex
Contributor
mscdex commented Jan 13, 2016

-1 to changing new Buffer(number) to zero-out by default. Even if there was a separate API to preserve the behavior, that means a lot of code change for a lot of packages and application code. I'd personally be more open to a separate method that created zeroed-out Buffers, that way it's an explicit opt-in and you are (or should be) at that point aware of the tradeoff you're making.

@ChALkeR
Member
ChALkeR commented Jan 13, 2016

separate method that created zeroed-out Buffers

That would not fix anything, there is already a new Buffer(number).fill(0).

@silverwind
Contributor

I see around 20-25% perf regression using .fill() on small buffers (on ARM here):

node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3)"  6.16s user 0.19s system 100% cpu 6.287 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3)"  6.10s user 0.19s system 101% cpu 6.222 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3)"  5.97s user 0.17s system 101% cpu 6.074 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3).fill()"  8.41s user 0.16s system 100% cpu 8.504 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3).fill()"  7.89s user 0.14s system 100% cpu 7.968 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3).fill()"  7.94s user 0.24s system 100% cpu 8.111 total
@feross
Member
feross commented Jan 13, 2016

I see around 20-25% perf regression using .fill() on small buffers

Nice.

Remember too, this is not a 20-25% perf regression across the board. It's only going to apply when all of these conditions are true:

  • it's user-land code
  • they're calling new Buffer(number) because they need uninitialized memory
  • and, they don't upgrade to Buffer.alloc()

Node.js core would not be slower. Uses of new Buffer(string), new Buffer(array), etc. would not be slower.

@silverwind
Contributor

I think I'm in support of this, but I'm not aware of all the use cases for Buffer(number), so take that with a grain of salt.

Instead of introducing new API, how about an opt-in?

Buffer(100); // safe
Buffer.unsafe = true;
Buffer(100); // unsafe
@karissa
karissa commented Jan 13, 2016

Thanks feross for explaining this well, I learned something tonight! I am relatively new to Node - I gained my open source chops in Python land. In Python, slow performance is sort of a given. If you want to make a Python program more performant, you can optimize it in certain ways, which are well known. Performance then is an advanced problem. The old adage "make it work, then make it fast" works well for these kinds of dynamic, interpreted languages. After all, we are not writing in C here, and the beauty of languages like Node and Python is the ease with which external, more performant packages can be accessed when necessary.

20-25% is actually quite a lot slower, true -- but how many calls to Buffer are using Buffer(Number) and require high performance? It seems as though this being fixed in core will bring great gratitude from newcomers and old hats alike. As a relatively new node programmer, I am certainly grateful - my memory won't be stolen!

@finnp
finnp commented Jan 13, 2016

I generally agree that the API is problematic.

I just want to point out, that fixing Buffer to be zero-filled potentially introduces security risks to code that would use the constructor to get some entropy. Consider this (arguably badly coded) example, where someone might use it instead of crypto.randomBytes.

var someEntropy = (new Buffer(256)).toString('hex')

Updating to a new node major version with the proposed new Buffer API would not break the code, but would introduce (an even higher) security risk to that code, without the user noticing right away.

I just wanted to point that out for the sake of argument.

@mcollina
Member

@finnp that API is so insecure that I would flag it as a security vulnerability, for the same assumption of this PR.

So, I'm 👎 on anything that make node (or apps that runs on node) slower. However, I see the security vulnerability.

My gut tells me: let's benchmark this change in node. Let's benchmark this change in userland (all the popular modules have benchmarks), and let's see where we stand. Does express gets slower? or Hapi? or socket.io? To what degree?

I think the problem is that malloc and encodings should not be on the same API. I propose to deprecate anything that is not new Buffer(number), and prepare a factory method for all the other cases. new Buffer(number) will stay like malloc, and we migrate all users to a safe API, that does not support the 'number' argument.

@steve-gray

I think anyone depending on that kind of buffer entropy is already walking in the land of the undocumented, and the security risk represented by the issue versus the pretty modest performance ding make this one a no brainer. To put this back into a C context, returning memory without a memset in response to a 'client' request is effectively insanity.

@joepie91
Contributor

+1 on making uninitialized Buffer creation a separate function in the API, and removing it from the "type-guessing" constructor, or replacing it with a 'safe' variant. This kind of potentially dangerous functionality should be explicitly opt-in.

I'm not sure why there's any discussion about it at all, to be honest. It's well understood that having safe defaults considerably decreases the chance of vulnerabilities, and moving uninitialized creation to a separate function would not incur any performance issues for maintained code (because they can simply use the new function if they know what they are doing, and actually intend uninitialized creation).

The unmaintained code would not receive security patches anyway, and in those cases a slowdown is better than an outright vulnerability. Node.js, as a project that follows semantic versioning, is in a fairly unique position to actually be able to make such patches without widespread ecosystem breakage.

@silverwind: Instead of introducing new API, how about an opt-in?

I would vote against a global opt-in. It doesn't really fix the problem (as a single intended opt-in means your entire codebase is now unprotected), while still introducing the same kind of break. That, plus global state gets messy in general.

@trevnorris: The concern is when a number is passed when something else is expected. Again this is a bug in the developers code, and TBH it doesn't make much sense to impact the performance of the many modules out there today and force them to update their code to use new syntax because of that.

It absolutely does. Safe defaults are the single most important thing a language/runtime can possibly provide for fostering a secure ecosystem around it. No matter how competent people are, they make mistakes, and the amount of possibly footguns should be reduced to an absolute minimum. The code change would be trivial, and provide significant security gains.

@trevnorris: And let's be honest, in the years that Buffer has worked exactly this way how many times has this been reportedly seen as an issue in the wild. The only reason you're here now is because it affected modules you were directly involved with.

Observance of security issues does not correlate to severity or how badly they need to be solved. Not to mention that you have no idea whether this has been discovered (and exploited) in the wild or not, nor any way to check, nor would the perpetrators have any reason to disclose as such.

The security community has spent several decades now understanding how to assess the severity of a security issue and how to prevent common issues, and it would probably be advisable to work from that knowledge, rather than repeating that entire process again in the Node.js community and exposing many users to danger in the process.

@mscdex: I'd personally be more open to a separate method that created zeroed-out Buffers, that way it's an explicit opt-in and you are (or should be) at that point aware of the tradeoff you're making.

Secure opt-ins don't work. See also this and the general principle of people following the path of least resistance.

@ChALkeR
Member
ChALkeR commented Jan 13, 2016

While I first thought that making Buffer(number) zero-filled by default is a good idea, I do not think that now. Imo it should better be deprecated and replaced by two separate methods.

Making Buffer(number) zero-filled will solve the issue long-term, but will cause havoc short-term. For example, if Alice writes a lib, sees that with Node.js 6.x, 7.x etc Buffer(number) is zero-filled, and does not zero-fill the buffer in her code (she doesn't want to zero-fill it twice!), and Bob uses that lib under Node.js 5.x where Buffer is not zero-filled by default — it will cause more damage that it solves.

Documenting that in a way «zero-fills since 6.x, zero-fill it twice if you use a lower version» would not help, Alice could think «I don't need 5.x» and publish a lib that does not manually zero-fill Buffers, but some other person could use that lib under 5.x and will be affected. There is no way around that — if Buffer(number) becomes zero-filled by default, the user will have to choose between zero-filling it two times in newer versions of Node.js or not zero-filling it at all in less recent versions of Node.js (and the third option would be — write some ugly version-detection code). Guess what will the user choose?

Alternate proposal:

  1. Introduce Buffer.allocateRaw(number) to allocate a non-zero filled Buffer, document it, try hard to note the possible security consequences of careless usage.
  2. Introduce Buffer.allocate(number) to allocate a zero-filled Buffer.
  3. Soft-deprecate Buffer(number) (first in doc for at least a whole major release, perhaps), tell people to use Buffer.allocate(number) instead.
  4. Introduce a command-line flag for opt-in to zero-filling all Buffer(number) calls, so that the topmost app could enforce zero-filling all Buffers. I think this point was already discussed and supported.
  5. Hard-deprecate Buffer(number), point people to Buffer.allocate(number). Perhaps make Buffer(number) zero-filled by default at the same time (but don't let people rely on that) and remove the command-line flag, because it being hard-deprecated would mean that the usage is low and no new code is expected to use that.

Thoughts?

This will also solve unintentional calls to Buffer(number) when a person wanted to call Buffer('200') instead (and called Buffer(200)).

@ChALkeR
Member
ChALkeR commented Jan 13, 2016

@finnp Anyone who uses the code you provided is already doomed.

@vkurchatkin
Member

At this point it's easier just to soft-deprecate Buffer altogether and encourage devs to use typed arrays instead.

@mcollina
Member

At this point it's easier just to soft-deprecate Buffer altogether and encourage devs to use typed arrays instead.

I think this is even harder than changing Buffer api, as Buffer is part of the Stream API, and also a lot of C++ code.

@ChALkeR
Member
ChALkeR commented Jan 13, 2016

@vkurchatkin Buffer(number) has much lower usage than general Buffer, plus the perfomance concerns of Buffer vs typed arrays. Let's be realistic here and let's try to solve this somehow. Deprecating Buffer, if that would ever happen, will take long time.

@vkurchatkin
Member

I think this is even harder than changing Buffer api, as Buffer is part of the Stream API, and also a lot of C++ code.

it shouldn't be that hard to change, since Buffers ARE typed arrays now.

@ChALkeR

plus the perfomance concerns of Buffer vs typed arrays

The only concern is instantiation. We can still provide a function to create uninitialized typed array or array buffer.

Let's be realistic here and let's try to solve this somehow. Deprecating Buffer, if that would ever happen, will take long time.

I mean not really deprecating them, but discouraging and maybe eventually removing from docs.

@jasnell jasnell added the ctc-agenda label Jan 13, 2016
@jasnell
Member
jasnell commented Jan 13, 2016

I'm +1 on the general approach @ChALkeR suggests in #4660 (comment)

By introducing two new alternative factory functions (one safe, one unsafe) and deprecating the current unsafe constructor, we ensure that existing code isn't adversely affected while giving new code an appropriate path forward. Changing the expected behavior of the Buffer constructor or even deprecating Buffer altogether does not address all of the requirements here.

One tweak I would make to @ChALkeR's suggestion is: instead of allocate and allocateRaw, I would use allocateSafe and allocateUnsafe, to make sure it's clearly indicated that there is a safety/security choice being made.

@Fishrock123
Member

Some comments, in no particular order:

At this point it's easier just to soft-deprecate Buffer altogether and encourage devs to use typed arrays instead.

@vkurchatkin Except that Buffer has extra APIs that make it more useful both to core and userland.

20-25% is actually quite a lot slower, true -- but how many calls to Buffer are using Buffer(Number) and require high performance? It seems as though this being fixed in core will bring great gratitude from newcomers and old hats alike. As a relatively new node programmer, I am certainly grateful - my memory won't be stolen!

@karissa Core's are probably most important, but if you are, say, running a heavy realtime websocket application, and ws were using zero-filled buffers for everything, you'd likely loose a lot of throughput.

In Python, slow performance is sort of a given. If you want to make a Python program more performant, you can optimize it in certain ways, which are well known. Performance then is an advanced problem. The old adage "make it work, then make it fast" works well for these kinds of dynamic, interpreted languages.

Python is a tricky comparison because the language just comes with everything. Sure, we expect JavaScript to not execute as fast as C, or even Java, but in Node, being able to achieve high throughput of I/O operations is pretty darn important. That's what we do in a nutshell after all.

Secure opt-ins don't work. See also this and the general principle of people following the path of least resistance.

I would generally agree with this.

Alternate proposal:

  1. Introduce Buffer.allocateRaw(number) to allocate a non-zero filled Buffer, document it, try hard to note the possible security consequences of careless usage.
  2. Introduce Buffer.allocate(number) to allocate a zero-filled Buffer.
  3. Soft-deprecate Buffer(number) (first in doc for at least a whole major release), tell people to use Buffer.allocate(number) instead.
  4. Introduce a command-line flag for opt-in to zero-filling all Buffer(number) calls, so that the topmost app could enforce zero-filling all Buffers. I think this point was already discussed and supported.
  5. Hard-deprecate Buffer(number), point people to Buffer.allocate(number). Perhaps make Buffer(number) zero-filled by default at the same time (but don't let people rely on that) and remove the command-line flag, because it being hard-deprecated would mean that the usage is low and no new code is expected to use that.

I think this the most sound proposal here, although I wouldn't mind tweaking it like @jasnell pointed out.

@mscdex
Contributor
mscdex commented Jan 13, 2016

Alternate proposal:

Introduce Buffer.allocateRaw(number) to allocate a non-zero filled Buffer, document it, try hard to note the possible security consequences of careless usage.

-1

Introduce Buffer.allocate(number) to allocate a zero-filled Buffer.

+1

Soft-deprecate Buffer(number) (first in doc for at least a whole major release, perhaps), tell people to use Buffer.allocate(number) instead.

-1

Introduce a command-line flag for opt-in to zero-filling all Buffer(number) calls, so that the topmost app could enforce zero-filling all Buffers. I think this point was already discussed and supported.

+2

Hard-deprecate Buffer(number), point people to Buffer.allocate(number). Perhaps make Buffer(number) zero-filled by default at the same time (but don't let people rely on that) and remove the command-line flag, because it being hard-deprecated would mean that the usage is low and no new code is expected to use that.

-1

@mcollina
Member

I agree with @jasnell. Buffer.allocateSafe() and Buffer.allocateUnsafe() are the best ones. I would shorten them up to Buffer.safe() and Buffer.unsafe(), but that's my taste.

@Fishrock123
Member

@mscdex I actually disagree more with that, allocate() does not sound safe-by-default to me.

@silverwind
Contributor

Imho, new API would just complicate things unneccessary. +1 for a --safe-buffers flag.

@ChALkeR
Member
ChALkeR commented Jan 13, 2016

@silverwind Just an opt-in command-line flag will not fix anything in the ecosystem. It will fix things only for those setups that care, and those will receive unneeded performance penalties (by double-zeroing the Buffers). It should be viewed only as a temporary measure.

@mscdex
Contributor
mscdex commented Jan 13, 2016

@Fishrock123 I was +1 to that particular idea, not the specific function name. I do agree a better name could be chosen if that route was taken (allocZeroed()?).

@silverwind
Contributor

@ChALkeR we could just make it --unsafe-buffers if we really want to go that far, but I think a few people here would rather have zeroed buffers as an opt-in because performance is a key feature of Node.js after all.

@ChALkeR
Member
ChALkeR commented Jan 13, 2016

@silverwind

When trying to solve a problem, we should ask these questions:

  1. How should this been done from the start, putting all current situation aside?
  2. How do we get there in a way that solves things asap, but still breaks the ecosystem in the least possible way?

Preserving new Buffer(string), new Buffer(number), and adding just the weird command-line flag that affects the behaviour of the latter is not the way how this should have been done.

@joepie91
Contributor

I'm a bit iffy about a long hard-deprecation period (as this is a critical security issue that will affect existing applications right now, likely without awareness of the operator), but in principle, I agree with @ChALkeR's suggestion to deprecate the current uninitialized constructor entirely, and to introduce two new methods.

That having been said, I think @jasnell's suggested tweak is a very good idea, and even an essential one, from an "encouraging safe implementations where possible" point of view. Rust takes the same approach with unsafe blocks.

It requires the developer to commit to the choice of using something 'unsafe', which will make them less likely to do so without understanding the consequences. As a bonus, it acts as a 'red flag' to others reviewing the code - a clear indication that they need to be careful when modifying it, as an 'unsafe' feature is being used.

I think the --safe-buffers flag as a temporary mitigation strategy is a good proposal as well, but it should absolutely not be the only mitigation against this issue in the long term. It comes with a severe performance penalty, and will thus introduce a "security vs. performance" trade-off, and I think we all know how that is going to turn out when managers get involved.

@ChALkeR
Member
ChALkeR commented Jan 13, 2016

@joepie91 A hard deprecation done sooner would leave us in a situation when module authors can't use the old API because it's hard deprecated and can't use the new API becase old Node.js versions do not support it.

I guess that will not object to hard-deprecating Buffer(number) sooner that one major release away, if all the other supported Node.js branches receive the two new methods (as backports). But I doubt that anyone else would support that.

@joepie91
Contributor

I guess that will not object to hard-deprecating Buffer(number) sooner that one major release away, if all the other supported Node.js branches receive the two new methods (as backports). But I doubt that anyone else would support that.

Releasing a minor update for existing branches with the new methods would probably be a good idea anyway, for module compatibility reasons. Requiring developers to "detect" which API to use is a great way to ensure that they will just use the old (dangerous) API for as long as they can get away with it instead...

@trevnorris
Contributor

To be clear, I'm not suggesting that node core should use zeroed-out buffers.

This is mostly moot. The majority of memory for core purposes is allocated on the native side, bypassing the ArrayBufferAllocator. The reason for how it operates today was always done for the best interest of users, and our assumption that developers understood the clearly documented API they were using.

The API is optimized for convenience: you can throw any type at it, and it will try to do what you want.

This makes it sound like Buffer does a best guess. Every acceptable value is documented, and what will happen with that value is also documented.

The difference with modules like child_process and fs is that they're very obviously doing powerful and sensitive operations.

node is full of operations that allow us to do power OS interactive things with a simplistic API. This is one of the most powerful aspects of node. I remember when I first started using node (and was completely new to server side development in general) I did all sorts of stupid things to my machine. I learned, admitted my stupidity and didn't do it again. Truth be told I'm definitely not done doing stupid things.

@karissa
karissa commented Jan 14, 2016

Those who want performance will figure out how to get it.

Let's zero fill buffers by default. Please let's not create a
mysql_real_escape_string situation. Just fix it.

On Wednesday, January 13, 2016, Sven Slootweg notifications@github.com
wrote:

I guess that will not object to hard-deprecating Buffer(number) sooner
that one major release away, if all the other supported Node.js branches
receive the two new methods (as backports). But I doubt that anyone else
would support that.

Releasing a minor update for existing branches with the new methods would
probably be a good idea anyway, for module compatibility reasons. Requiring
developers to "detect" which API to use is a great way to ensure that they
will just use the old (dangerous) API for as long as they can get away with
it instead...


Reply to this email directly or view it on GitHub
#4660 (comment).

Karissa McKelvey
http://karissa.github.io/

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@karissa Zero-filling new Buffer(number) by default will actually make the situation worse from the security point of view, taking the current ecosystem into account. See #4660 (comment) for explanation and an alternate approach.

@silverwind silverwind added security and removed memory labels Jan 14, 2016
@silverwind
Contributor

(added a security label, memory is about memory leaks)

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

Btw, I made that old note (that was unfinished before) public: https://github.com/ChALkeR/notes/blob/master/Buffer-knows-everything.md

@feross
Member
feross commented Jan 14, 2016

This makes it sound like Buffer does a best guess. Every acceptable value is documented, and what will happen with that value is also documented.

You're right: it's already documented. But the disclosure is a total of 39 words, amidst a 6000 word document. We can blame users for not reading the docs – but there's no denying it's bad API design to mix safe and unsafe functionality in the same constructor. It's laying a trap for the user.

node is full of operations that allow us to do power OS interactive things with a simplistic API. This is one of the most powerful aspects of node.

It's interesting that you bring up other core APIs.

There are many core APIs that helpfully coerce from String to Number, or the reverse. For example, setTimeout(fn, '1000') becomes setTimeout(fn, 1000).

Even buffer itself does coercion: buf[0] = '40' sets the index to the number 40, and buf.readInt8('0') reads out the value from index 0. Both automatically coerce from String to Number.

Can users really be blamed for expecting new Buffer(myVariable) to be safe?

@feross
Member
feross commented Jan 14, 2016

I also think the existence of Uint8Array and the other TypedArrayView types plays a role here.

Buffer's similarity to these safe typed arrays lulls users into a false sense of security. The fact that Buffer is now also a subclass of Uint8Array makes the potential for confusion even greater.

@bricss
bricss commented Jan 14, 2016

Every new Buffer should allocate new sandboxed memset, fulfilled with zeros depends on it size.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@bricss Switching to zero-filled (by default) Buffer(number) will most likely make things even worse in the current situation, even from the security point of view. Please read my explanation above.

@feross
Member
feross commented Jan 14, 2016

@ChALkeR Btw, the point you raised is valid and I hadn't considered it. Thanks for bringing it to light. Any solution should take that into account.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@feross What do you think about the proposal at #4660 (comment)?

@jasnell
Member
jasnell commented Jan 14, 2016

@nodejs/ctc: Distilling the conversation up to this point: the existing behavior of Buffer(number) is likely not going to change. Given that zero-filling by default would be a significant breaking behavioral change, it would take at least one full semver-major cycle to do and would make Buffer(number) behavior expectations different across different versions of Node, which will lead to more issues than it would solve. Given that, changing the current behavior of Buffer(number) is unlikely to happen. Scanning through the thread, I believe there is consensus on this point among the core contributor participants.

There also appears to be consensus among the core contributors that adding a new factory method for creating a "safe" zero-filled Buffer instance is a good thing. There is some disagreement over what to call it. I believe simply calling it Buffer.safe(number) is sufficient. The implementation of Buffer.safe(number) would essentially be return new Buffer(n).fill(0).

There appears to be a difference of opinion among the core contributors about whether the Buffer(number) constructor should be deprecated and replaced by a Buffer.unsafe(number) type of factory method. I can understand the reasoning for this. For my part, I'm in favor of introducing a new Buffer.unsafe(number) factory method to be symmetrical with Buffer.safe(number) -- which would actually make it easier to understand. But I get why folks don't want Buffer(number) deprecated and I can live with that.

There appears to be consensus among the core contributors about having a command line switch that would explicitly change the default behavior of Buffer(number) to zero-fill by default.

There also appears to be consensus that the documentation can do a better job of spelling out the risks.

The tl/dr; version is:

  • Change or Deprecate Buffer(number): No
  • Add Buffer.safe(number): Yes
  • Add --safe-buffers (or similarly named): Yes
  • Improve documentation: Yes

If we went with this now, we would still have the ability to add Buffer.unsafe(number) and deprecate Buffer(number) later.

I suggest we go with this approach.

@joepie91
Contributor

Change or Deprecate Buffer(number): No

This is unacceptable. This API is extremely dangerous and likely already an issue in many applications due to the ease of introducing it (any typed user-supplied input is a problem!). It absolutely needs to be deprecated as soon as possible, at the very least a soft deprecation.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@jasnell Without deprecation of Buffer(number), this has no sense. There is already an opt-in (Buffer(number).fill(0)) for those who care, but users should be guarded from inadvertently creating unitialized Buffers in the long term, so Buffer(number) must be deprecated or changed. Changing it will break more stuff, so the deprecation must happen.

@jasnell
Member
jasnell commented Jan 14, 2016

@ChALkeR ... ok. As I said, I prefer that option also. @trevnorris and @mscdex , if we had symmetrical Buffer.unsafe(number) and Buffer.safe(number) with a soft-deprecated Buffer(number), is that something you could live with?

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

Add Buffer.safe(number): Yes

There is no point in just adding Buffer.safe(number) without targeting deprecation of Buffer(number). Buffer.safe(number) would be better than the existing Buffer(number).fill(0) only because it wouldn't use the deprecated Buffer(number) method.

@feross
Member
feross commented Jan 14, 2016

@jasnell Just to be clear, this new proposal is:

  • Deprecate Buffer(number) and suggest using Buffer.unsafe(number): Yes
  • Add Buffer.safe(number): Yes
  • Add Buffer.unsafe(number): Yes
  • Add --safe-buffers (or similarly named): Yes
  • Improve documentation: Yes
@jasnell
Member
jasnell commented Jan 14, 2016

@feross ... yes, my goal right now is to see what changes we actually have consensus on. So far a couple of the other core contributors have -1'd the idea of deprecating Buffer(number) so I'm wanting to see what we actually do have consensus on.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@feross

Deprecate Buffer(number) and suggest using Buffer.unsafe(number).

No. Deprecate Buffer(number) and suggest using Buffer.safe(number).

Buffer.unsafe(number) should be used only by those who are absolutely sure that they need it and are not leaking anything.

@jesseschalken

This makes it sound like Buffer does a best guess. Every acceptable value is documented, and what will happen with that value is also documented.

I think in a dynamically typed language it is unsafe to assume that the developer knows what the types are going to be (and that they therefore know which constructor signature is going to be used). In practice values are thrown around and problems solved until things seem to work. The developer will never know they exposed random memory. Even in Flow, TypeScript and Closure it is easy to use any (? in Closure) or a union type which happens to include number without knowing the consequences.

If, for example, Buffer were a Java class with different constructor signatures, the problem wouldn't exist, because the choice of constructor happens at compile time using static types, and so the developer knows exactly which constructor signature is being used (Ctrl-click will take them straight there). It would be impossible for some unknown value to be passed into new Buffer(...) and have it use the new Buffer(number) constructor unintentionally.

(Personally, if the Buffer API were being designed afresh I would move new Buffer(number) to Buffer.alloc(number) for this reason, but I don't know if it's a sufficient problem for a breaking change.)

@tqbf
tqbf commented Jan 14, 2016

You will perish in flames.

@feross
Member
feross commented Jan 14, 2016

Deprecate Buffer(number) and suggest using Buffer.safe(number), Buffer.unsafe(number)

@ChALkeR You're right. Better to push people to use Buffer.safe in the deprecation message. Buffer.unsafe can be used by those who absolutely need it.

@joepie91
Contributor

@bricss There are definitely usecases for unsafe Buffers, even if they are edge cases - notably, when you can be certain that you will write contents to the full Buffer before using it, and you are working on a performance-critical component.

Keep the discussion constructive, please.

@mscdex
Contributor
mscdex commented Jan 14, 2016

@jasnell The main reason I'm against the deprecation is because I don't want to have to do a conditional everywhere I create a Buffer to be backwards compatible with older versions of node and because I always overwrite my entire Buffers with data. Not to mention that I'd have to do so in other people's modules that also write the entirety of their Buffers, just to keep performance from taking a nosedive.

@joepie91
Contributor

@mscdex The backporting of safe and unsafe methods to older (supported) branches of Node resolves that issue.

@feross
Member
feross commented Jan 14, 2016

@mscdex What about just adding trivial Buffer.safe and Buffer.unsafe implementations in a node v4 and v5 minor release:

Buffer.safe = function (size) {
  if (typeof size !== 'number') throw new Error('Argument must be a number')
  return new Buffer(size).fill(0)
}
Buffer.unsafe = function (size) {
  if (typeof size !== 'number') throw new Error('Argument must be a number')
  return new Buffer(size)
}
@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@mscdex That could be solved with backporting the two new methods and doing a slow deprecation, turning on the hard deprecation only after all the supported Node.js versions receive the new methods.

@mscdex
Contributor
mscdex commented Jan 14, 2016

@feross I'm thinking about v0.10 and v0.12 specifically.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@mscdex 0.10 and 0.12 will be unsupported by the end of this year.

@mscdex
Contributor
mscdex commented Jan 14, 2016

@ChALkeR unsupported !== unused

@feross
Member
feross commented Jan 14, 2016

@mscdex Why not backport to those too then?

Edit: Nevermind, I don't think this is a good idea. See my next comment.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@mscdex Hard deprecation would be delayed until at least v7, and probably v8 anyways. Any objections against soft documentation-only deprecation at this point?

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

unsupported !== unused

@mscdex That means that they receive no support, including even the security fixes. You can't care about people who use old versions and leave those versions unsupported (namely without security fixes) at the same time.

It would be better for them to update to the new versions, and if new Buffer methods will give them one more reason to update — it would actually be good for them.

@feross
Member
feross commented Jan 14, 2016

Since 0.10 and 0.12 will be unsupported by end of year, I don't expect much new code to target those versions anyway. So it's safe for most to start using Buffer.safe and Buffer.unsafe right away.

Folks who need 0.10 and 0.12 support can pay the cost of writing the extra conditional check.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

0.10 is in maitance mode, 0.12 is in active LTS.
@mscdex Does this mean that 0.12 might receive the new API?

Edit: Ah, I see, adding new features is against the LTS definition.

@jasnell
Member
jasnell commented Jan 14, 2016

Given the v0.10 and v0.12 are essentially in maintenance, backporting the Buffer.safe() and Buffer.unsafe() to those likely would not be done. Our stance towards landing semver-minors in v4 is less strict but still on a case by case basis. I would not count on either Buffer.safe() or Buffer.unsafe() ever landing in v0.10, v0.12 or v.4. Likewise, deprecating Buffer(number) is a semver-major that would never be backported to v0.10, v0.12 or v4.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@jasnell

Likewise, deprecating Buffer(number) is a semver-major that would never be backported to v0.10, v0.12 or v4.

I guess that wasn't proposed by anyone =).

@mscdex
Contributor
mscdex commented Jan 14, 2016

FWIW here are benchmark results for allocating a 1mb Buffer:

 v5.4.0v4.2.3v0.12.9v0.10.41
new Buffer41,515 ops/sec ±3.00%
(164 runs sampled)
43,670 ops/sec ±1.86%
(166 runs sampled)
53,969 ops/sec ±1.41%
(162 runs sampled)
147,350 ops/sec ±1.82%
(166 runs sampled)
new Buffer (zero-filled)5,041 ops/sec ±2.00%
(174 runs sampled)
4,293 ops/sec ±1.79%
(173 runs sampled)
7,953 ops/sec ±0.55%
(192 runs sampled)
8,167 ops/sec ±2.38%
(184 runs sampled)
@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@mscdex The current proposal does not deal with replacing non-zero-filled Buffers with zero-filled Buffers. You could keep the performance using the new API, so the performance argument does not have anything to do with it.

@alfiepates

Reading through this thread as a node.js user, I'm rather disappointed it's taken this long to come to a conclusion of "hey this shit's broke, gotta be fixed yo".

This bug could be considered more damaging than Heartbleed and the fact that it isn't being taken as seriously as Heartbleed astonishes me - Do I need to give it a name and a pretty logo?

IMO, this is not a difficult issue to solve:

  • Deprecate the existing buffer, right now. I don't particularly care if it breaks unsupported code, I just want it gone because it's dangerous. A soft deprecation would also be acceptable.
  • Implement buffer.safe and promote buffer.safe as a direct replacement for buffer.
  • Implement buffer.unsafe for people who absolutely know their code will be writing to the buffer immediately and they are working on performance-critical code. Document it, but don't encourage it's use, and clearly state the tradeoffs.
  • Backport these new methods to existing supported versions, as with any other critical security fix.

This is a bug, and it's a huge bug, and as a node.js user it's a bug I want squashed as soon as possible.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@mscdex @feross Pointing people from Buffer(number) to Buffer.safe(number) instead of Buffer.unsafe(number) is needed only because this way people who do not care or are not sure what they are doing will get the correct method. People who are experienced enough to work with unitialized Buffers and who care about performance would be able to actually read the docs and find Buffer.unsafe(number). Anyways, it's unsafe to work with unitialized buffers without reading the docs on those first.

@joepie91
Contributor

@jasnell What's the point of a maintenance period if security patches are not backported?

@alfiepates

What's the point of a maintenance period if security patches are not backported?

+1

@jasnell
Member
jasnell commented Jan 14, 2016

Adding Buffer.safe() and Buffer.unsafe() would not qualify as a security patch. It is new API. As has been pointed out repeatedly, the existing behavior is documented and has been around for a very long time.

@jasnell
Member
jasnell commented Jan 14, 2016

For v0.10, v0.12 and v4.x, polyfills for Buffer.safe() and Buffer.unsafe() are drop dead simple, any module that wants to use the new APIs but be backwards compatible could easily do something like:

Buffer.safe = Buffer.safe || function(n) { return new Buffer(n).fill(0); };
Buffer.unsafe = Buffer.unsafe || function(n) { return new Buffer(n); }
@jasnell
Member
jasnell commented Jan 14, 2016

I have opened a PR (#4682) that provides an initial (somewhat sloppy) implementation for Buffer.safe() and Buffer.unsafe(). It could be improved but it's a start. I'm opening it now only for discussion purposes so the new APIs can be experimented with and benchmarked. We can also iterate on the implementation of Buffer.safe() to try to make it more efficient. The implementation of Buffer.unsafe() is the same as Buffer(num) and should have identical performance characteristics.

@joepie91
Contributor

@jasnell The addition of Buffer.safe() and Buffer.unsafe() in older versions is a prerequisite for the security patch in later versions to have effect - if the change is not backported, module authors will just hang on to the deprecated behaviour for as long as they possibly can, prolonging the issue, to avoid having to feature-detect the availability of safe/unsafe or think about polyfills. Whether people can do something is irrelevant if they have no motivation to do it.

Are there any actual technical concerns regarding backporting safe and unsafe to older, maintained versions? It would not introduce any backwards compatibility issues whatsoever as far as I can tell (as it is purely an API addition, not a deprecation), and arguing against this seems to me like following the letter of the guidelines, rather than the spirit (or its goals).

Whereas this is admittedly a bit of a strange case (compared to the average security patch), it is required for correct functioning of the patch in newer versions - which can only happen with ecosystem support - and I can't see any reason to treat it differently from any other security patch in that regard. Backporting non-breaking functionality to older versions will allow the ecosystem to take care of the rest.

How long it's been "around" for doesn't really matter - it only makes the issue worse. I have yet to speak to a developer who actually expected this issue to exist, so it is reasonable to assume that all usage of new Buffer in existing code is wrong.

@jasnell
Member
jasnell commented Jan 14, 2016

Yes, there are technical concerns. Additive changes can break (and have
broken) things. The argument "what's the harm in landing this in LTS" can
be applied to each and every semver-minor we get. There are always
reasons we could land but those rarely, if ever, justify whether we
should. In this case, the existing behavior of Buffer(num) is well
established and there is an existing workaround already available to
developers (e.g. calling .fill(0) as already recommended by the
documentation).

At most I can see an argument for backporting the --safe-buffers
command line argument but even that would need to be decided by ctc
consensus.
On Jan 13, 2016 8:39 PM, "Sven Slootweg" notifications@github.com wrote:

@jasnell https://github.com/jasnell The addition of Buffer.safe() and
Buffer.unsafe() is a prerequisite for the security patch in later
versions to have effect - if the change is not backported, module authors
will just hang on to the deprecated behaviour for as long as they possibly
can, prolonging the issue, to avoid having to feature-detect the
availability of safe/unsafe or think about polyfills. Whether people can
do something is irrelevant if they have no motivation to do it.

Are there any actual technical concerns regarding backporting safe and
unsafe to older, maintained versions? It would not introduce any
backwards compatibility issues whatsoever as far as I can tell (as it is
purely an API addition, not a deprecation), and arguing against this seems
to me like following the letter of the guidelines, rather than the spirit
(or its goals).

Whereas this is admittedly a bit of a strange case (compared to the
average security patch), it is required for correct functioning of the
patch in newer versions - which can only happen with ecosystem support -
and I can't see any reason to treat it differently from any other security
patch in that regard. Backporting non-breaking functionality to older
versions will allow the ecosystem to take care of the rest.

How long it's been "around" for doesn't really matter - it only makes the
issue worse. I have yet to speak to a developer who actually expected this
issue to exist, so it is reasonable to assume that all usage of new Buffer
in existing code is wrong.


Reply to this email directly or view it on GitHub
#4660 (comment).

@alfiepates

Adding Buffer.safe() and Buffer.unsafe() would not qualify as a security patch.

I'm afraid we're now drifting away from the core of this issue; There's a bug, and it's gotta be fixed.

We can sit here all day talking about how a fix for this bug do or do not qualify as a "security patch", but to be completely frank I don't care: There's a glaring security issue here that really needs fixing, and the longer we continue to argue about what this issue is, the more time someone has to exploit it.

@jasnell I also think you're missing @joepie91's point, somewhat: Yeah, there is a workaround, but that's useless for unmaintained code. If there is a bug in a piece of unmaintained code which somebody is using in production which allows this to be exploited, then claiming "There's a workaround!" isn't gonna do anything about the people who just got their private keys lifted.

Hell, even for maintained code it's an issue: We know developers are lazy, if someone can take security over performance they most likely will if they don't know any better.

Backport something, even if it does break old code. Add an --unsafe-buffers flag if you want, but don't refuse to patch a vulnerability that you know exists in supposedly supported code, because IMHO that is irresponsible.

@joepie91
Contributor

Yes, there are technical concerns. Additive changes can break (and have broken) things.

What can you see this addition breaking?

The argument "what's the harm in landing this in LTS" can be applied to each and every semver-minor we get. There are always reasons we could land but those rarely, if ever, justify whether we should.

This isn't just "yet another semver-minor". This is a security issue. It is not properly solvable without at the very least backporting the API addition. That's the justification.

In this case, the existing behavior of Buffer(num) is well established and there is an existing workaround already available to developers (e.g. calling .fill(0) as already recommended by the documentation).

Which makes the entire patch moot, and ignores the reason the patch proposal exists to begin with - because the current "recommend that people zero-fill themselves" approach doesn't work, in part because of constructor overloading. This is not a solution, and does not address the "motivation" concern I raised above at all.

@jasnell
Member
jasnell commented Jan 14, 2016

Adding Buffer.safe and Buffer.unsafe to LTS versions does nothing to
help unmaintained code either given that none of that code will be modified
to use the new APIs. Back porting those APIs does not solve the problem.

As I said, there is an argument to be made for backporting
--safe-buffers. There is no argument that I can see for backporting
Buffer.safe and Buffer.unsafe. Modules that want to use the new APIs on
older versions can polyfill, developers that want to make sure older
unmaintained modules can't be exploited could use --safe-buffers. Even
still, there's a cost when introducing new command line options in older
versions of Node. The decision cannot be made lightly.

I have opened a PR that begins to address the problem moving forward. The
next step is to implement --safe-buffers and get the idea of backporting
that in front of the LTS WG and the CTC for discussion.
On Jan 13, 2016 9:17 PM, "Alfie Pates" notifications@github.com wrote:

Adding Buffer.safe() and Buffer.unsafe() would not qualify as a security
patch.

I'm afraid we're now drifting away from the core of this issue; There's a
bug, and it's gotta be fixed.

We can sit here all day talking about how a fix for this bug do or do
not qualify as a "security patch", but to be completely frank I don't care:
There's a glaring security issue here that really needs fixing, and the
longer we continue to argue about what this issue is, the more time someone
has to exploit it.

@jasnell https://github.com/jasnell I also think you're missing
@joepie91 https://github.com/joepie91's point, somewhat: Yeah, there is
a workaround, but that's useless for unmaintained code. If there is a
bug in a piece of unmaintained code which somebody is using in production
which allows this to be exploited, then claiming "There's a workaround!"
isn't gonna do anything about the people who just got their private keys
lifted.

Hell, even for maintained code it's an issue: We know developers are lazy,
if someone can take security over performance they most likely will if they
don't know any better.

Backport something, even if it does break old code. Add an
--unsafe-buffers flag if you want, but don't refuse to patch a
vulnerability that you know exists in supposedly supported code, because
IMHO that is irresponsible.


Reply to this email directly or view it on GitHub
#4660 (comment).

@alfiepates

Adding Buffer.safe and Buffer.unsafe to LTS versions does nothing to
help unmaintained code either given that none of that code will be modified
to use the new APIs. Back porting those APIs does not solve the problem.

Then you make buffer behave like buffer.safe and you add a flag, say --unsafe-buffers to enable the unsafe functionality.

Don't implement a fail-deadly.

@jasnell
Member
jasnell commented Jan 14, 2016

-1. As already indicated several times the existing behavior of
Buffer(num) is not changing. We can deprecate that constructor in the
next major version, but (based on the opinions of the core contributors
expressed in this thread) we are not changing that default behavior. The
option on the table is to replace it moving forward with Buffer.safe
and Buffer.unsafe, to introduce the --safe-buffers flag, and to
possibly backport --safe-buffers to LTS.

I'll make sure this topic is on next week's @nodejs/ctc call agenda to get
a final resolution if necessary.
On Jan 13, 2016 9:50 PM, "Alfie Pates" notifications@github.com wrote:

Adding Buffer.safe and Buffer.unsafe to LTS versions does nothing to
help unmaintained code either given that none of that code will be modified
to use the new APIs. Back porting those APIs does not solve the problem.

Then you make buffer behave like buffer.safe and you add a flag, say
--unsafe-buffers to enable the unsafe functionality.

Don't implement a fail-deadly.


Reply to this email directly or view it on GitHub
#4660 (comment).

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@alfiepates

Then you make buffer behave like buffer.safe

Once again: that will only make things worse, even from the security point of view.

@rvagg
Member
rvagg commented Jan 14, 2016

@jasnell I think you might be jumping a bit early on calling "consensus", many of us are hovering and watching this evolve. This discussion has actually been going on significantly longer than this issue and the reason nothing has changed to date, over the many years Node has had a Buffer and this behaviour was intentionally introduced, is exactly because there has been no consensus that something should change.

To all those calling this a bug: this is not a bug, it is intended behaviour and is documented, it is not a surprise to anyone who works in core and is common amongst non-web programming environments. That's not to say that what we have is ideal and shouldn't be changed or improved. Just please stop interpreting this discussion as some newly discovered heartbleed-like situation, that's unhelpful hyperbole that will be dismissed when folks are making considered decisions here. Any association with heartbleed goes directly to the two packages that saw fixes this week for misusing the Buffer API, not core. I think it's the fact that very seasoned developers could find such misuse so easy that is the focus of this for core, that's the real problem that core has and needs to address in some way.

This is a very long discussion with a considerable amount of historical context that can't be boiled down to a meme.

And no, I'm not picking a side on this, don't read that into my words. I would just like to ask the heat and pace be taken out of this so we can take a more considered approach. I think we all appreciate the input being provided here, just please don't expect anything to happen very quickly due to the complexities and differences of opinions involved.

@feross
Member
feross commented Jan 14, 2016

@ChALkeR I think that @mafintosh's point is relevant.

In the current proposal, developers are still forced to keep using Buffer(variable) to convert to a string, leaving them vulnerable to the original issues that affected bittorrent-dht and ws.

With these:

  • Buffer.from(value) - convert from any type to a buffer
  • Buffer.alloc(size) - create an uninitialized buffer with given size

the developer intent is clearly conveyed, and potentially unsafe functionality is put into it's own API.

I think if the API were being designed today, this how it would be done. Let's deprecate Buffer(value) entirely.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@mafintosh Hm. That's a valid point. Perhaps that justifies adding a new API endpoint to create Buffers from values.

@trevnorris
Contributor

I think if the API were being designed today, this how it would be done. Let's deprecate Buffer(value) entirely.

The assertiveness of your statement is a little off putting. At minimum this needs to be voted on by the @nodejs/ctc and a change this drastic may warrant a proper EPS.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@mafintosh, @feross I just gave you the access to the post about the current proposal, it's about 70% done, so that you can view it and comment here or somewhere else.

@ChALkeR
Member
ChALkeR commented Jan 14, 2016

@trevnorris «What the API would look like if it was designed today» is the first most important question that we should study and think about. The second one is «how do we get there from where we are now in a best possible way»?

#4660 (comment)

@mafintosh
Member

@ChALkeR thanks. i'll have a look.

@trevnorris
Contributor

@ChALkeR "What should it look like today" is always going to be subjective. I understand the merit in asking that question, but for example designing the fs module today from scratch would bring on quite the argument. Instead I would suggest "what feels most natural with the existing API". Maybe it's not "the best", but at least it will work the way the community expects it to.

@jasnell
Member
jasnell commented Jan 14, 2016

Quick update on #4682 :

  • The --zero-fill-buffers command line option is implemented.
  • The new APIs are now named Buffer.alloc() and Buffer.zalloc().
    • Buffer.zalloc() == 'zeroed-allocation'. It essentially just creates and returns a new Uint8Array that is always zero-filled.
    • Buffer.alloc() is equivalent to the existing new Buffer(num).
  • Equivalent SlowBuffer.alloc() and SlowBuffer.zalloc() methods are added as well
  • The existing Buffer(num) and SlowBuffer(num) are hard deprecated.
  • Documentation is updated as well.

As a reminder, this is a work in progress PR that is not ready to land as is. It is intended to give us something concrete to work with as opposed to going around in circles over whether or not there's really a bug in Buffer or not.

Using the --zero-fill-buffers command line flag would force Buffer.alloc() and Buffer(num) to fallback to Buffer.zalloc() (and SlowBuffer.alloc()/SlowBuffer(num) to fall back to SlowBuffer.zalloc()).

This approach gives us two distinct ways of addressing the problem:

  • Policy: Node can be run with the --zero-fill-buffers flag to enforce that all buffers be zeroed
  • Dev: Developers can choose either Buffer.alloc() or Buffer.zalloc() based on their specific needs, with the --zero-fill-buffers command line flag taking precedence.

For LTS, I would propose that only the --zero-fill-buffers flag would be backported to v4, v0.12 and v0.10. Note, however, that due to the differences in the underlying Buffer implementation in v0.12/v0.10, the actual implementation would be different than v4 and above. The differences in the implementation are likely inconsequential here.

This all said, I'm still personally unconvinced that the new Buffer.alloc() and Buffer.zalloc() methods are even required (similar to @mikeal's objection here). I think the new methods simply add confusion and make the API even muddier. However, I won't let my personal objections block the action on this.

@jsha
Contributor
jsha commented Jan 14, 2016

Changing new Buffer(number) in a way so that it will zero-fill the memory will cause more security issues and make this matter even worse. Please read the above discussion or wait for my post.

I read the entire thread in detail before replying. I see two arguments that de-fanging the present Buffer(num) API might cause security issues:

  1. Users could be depending on uninitialized memory to provide entropy to a security-critical CSPRNG.

Such code is already critically, dangerously broken, and hopefully very rare. I don't see this as a good reason to hold up a very real security fix affecting a broad range of modules.

  1. Module authors might notice the new zero-filling behavior (without reading the docs) and start to depend on it in new code that intentionally calls Buffer with a number argument. Users running old versions of Node would import those modules and get incorrect behavior.

I have been taking it for granted that de-fanging the current API would come along with deprecating it, which would avoid this problem.

Are there arguments I'm missing about why this would cause additional security issues?

To give a little more perspective, think of all the small devices that are now running Node.js on incredibly limited resources.

I understand this concern, but can you give some examples of apps and/or devices? I would be very surprised if even the smallest devices can't afford to zero their memory. It would also be useful to have examples of popular npm modules that intentionally use Buffer(num) for performance reasons. It would then be possible to benchmark those modules under realistic conditions.

Even if we assume there is a device and app that would become unusable with a de-fanged API, the maintainer can just switch to some new Buffer.unsafe_alloc() method, if that's really what they intend to do.

@feross
Member
feross commented Jan 14, 2016

Another argument for Buffer.from and Buffer.alloc as a replacement for the current overloaded Buffer(value) constructor is that it's familiar because of Array.from and TypedArray.from.

Those already work in the way that Buffer.from would work.

The Array.from() method creates a new Array instance from an array-like or iterable object.

Almost all the types that Buffer(value) takes today (except for number, of course, and ArrayBuffer) match what Array.from and TypedArray.from do today.

@rmhrisk rmhrisk referenced this issue in PeculiarVentures/node-webcrypto-ossl Jan 14, 2016
Open

look at key handling #12

@trevnorris
Contributor

Almost all the types that Buffer(value) takes today (except for number, of course, and ArrayBuffer) match what Array.from and TypedArray.from do today.

So except for type number, ArrayBuffer, string and JSON type (leaving TypedArray and Array) almost all the types match?

@feross
Member
feross commented Jan 14, 2016

Actually, Array.from works on string.

@feross
Member
feross commented Jan 14, 2016

The whole point of this issue is that number shouldn't be included in the same constructor.

So, yes. Array.from works on TypedArray, Array, and string. Not on ArrayBuffer or JSON.

@trevnorris
Contributor

Doesn't work the same way.
Uint8Array.from('123') -> [1, 2, 3]
Buffer('123') -> [31, 32, 33]

@feross
Member
feross commented Jan 14, 2016

Fair point. So it's not exactly the same. I just suggested that it would be familiar.

@alfiepates

@ChALkeR

Once again: that will only make things worse, even from the security point of view.

No, It won't. The only "reasons" I've seen mentioned for that sound a lot more like excuses than legitimate concerns, and certainly not worth holding up what is a pretty important security fix for.

I'm not satisfied with adding a --zero-fill-buffers flag as the only backported fix for this issue. I'm a big believer in secure defaults, and this is not one.

I'll say it again: Do not implement a fail-deadly. Stop giving your users footguns.

@pornel
pornel commented Jan 14, 2016

@mscdex

Benchmarks comparing Buffer(n) vs Buffer(n).fill(0) are not realistic, because they compare not initializing the buffer at all, ever (which you shouldn't do - that the vulnerability) with initializing the buffer once, while the valid real-world case is initializing the buffer once (in your code by overwriting unitialized memory) vs initializing the buffer twice (by overwriting zeroed memory). Therefore, in real world worst-case slowdown can't be worse than 2x.

In OS X malloc is only ~15% slower than calloc if you overwrite the buffer with some data. For large allocations OS can use zero page copy-on-write memory, which makes zeroing almost free (and when you read pages that were left unchanged it's even slightly faster due to improved cache locality).

The difference in performance is tiny - anything done with the buffer in JS is likely to completely eclipse cost of zeroing that can be done very efficiently in low-level code.

@pornel
pornel commented Jan 14, 2016

I'm strongly in favour of security over IMHO inconsequential performance cost, and would prefer Buffer to be secure and deterministic.

For me uninitialized Buffer was also a source of bugs: I forgot to fully overwrite a buffer, but my program seemed to work, because I was lucky to get uninitialized values that didn't trigger an error. Of course it broke later. If Buffer was deterministic, it'd either always fail or always work, which I prefer.

Even new systems programming languages (D, Rust, Swift) don't allocate uninitalized memory by default any more.

I'd prefer zeroing to be the default behavior, treated as a security issue, and backported to old versions. If someone doesn't update - that's the same problem as if they don't apply any other security patch.

@alexjeffburke
Contributor

I'm going to sidestep comment on the proposals as I can see it's getting heated and I think almost all the views are being represented - but I would like to add another voice representing a long term node developer where this thread is the first I knew of this behaviour and being surprised. That may also be unimportant, there are many areas we don't touch as developers - but as the lead on a node codebase in light of this I'd want to go back and make sure we got this right.

@trevnorris
Contributor

@pornel Zero-filling memory or not has nothing to do with how deterministic the API works. The fact that uninitialized memory is returned is also well documented. So the most you can say is that it was a surprise because you weren't fully aware of how Buffer behaved.

Myself and several other CTC members have agreed that we are open to discussion of how to improve the "surprise" of the Buffer API, along with how to best address this in a forward looking and backwards compatible way. But this isn't going to happen in a couple days, and probably not a couple weeks. This is a substantial API change and will be treated as such. It's likewise a major version change, and taking the release schedule into account there's ample time to sit down and work out what's best going forward.

@ChALkeR
Member
ChALkeR commented Jan 15, 2016

@trevnorris @feross @mafintosh

Note that I split the proposal into two parts — the old one that was previosly discussed and the Buffer.from(value) one. Those could be discussed separately and merged with different schedules.

@alexjeffburke
Contributor

@trevnorris Thanks for the update - think that's all many of us could have asked for.

@feross
Member
feross commented Jan 15, 2016

@trevnorris

we are open to discussion of how to improve the "surprise" of the Buffer API

Thanks for making that point clear. That's all I intended by opening this issue. 👍

@jasnell
Member
jasnell commented Jan 15, 2016

As far as the "inconsequential performance cost"... here are some current benchmarks comparing buffer creation using #4682's zalloc() vs. alloc() (hint: larger numbers are better, zalloc == zero-filled allocated, vs alloc == the current uninitialized behavior):

methodsizeop/s
zalloc101218.96236
alloc103354.52777
methodsizeop/s
zalloc10241076.14519
alloc10242101.91937
methodsizeop/s
zalloc2048648.60563
alloc2048991.04239
methodsizeop/s
zalloc4096441.87567
alloc4096516.88185
methodsizeop/s
zalloc8192298.96463
alloc8192412.11662

These clear demonstrate that the performance difference is not inconsequential by any stretch of the imagination. As @trevnorris and I and the other CTC members here have indicated, we are certainly open to making improvements here (as demonstrated by the work in #4682) but it will take a while and there is quite a bit to consider before making any firm decisions.

@paragonie-scott

Is performance really more important than security?

Let me put it another way: If a developer introduces a vulnerability in their application because of how Buffer is implemented, who do you blame?

  • Them, for being stupid
  • Yourself, for designing an API that made it easy to shoot oneself in the foot

Seriously. What are your priorities? I made a similar argument against PHP last year:

There was serious resistance to throwing an Exception in the event of "our system is totally fucked and no CSPRNG is available". They'd rather just return false because it would be more consistent.

Then there were the "No it should be a critical E_ERROR that terminates the script." Which would make developers nervous about adopting the new interfaces ("this can make my app crash randomly with no recourse? No, I'll stick with an insecure random number generator").

I argued that an Exception would be a good balance of secure-by-default but allowing users to handle failures gracefully.

The relevance here: As developers, we're often tasked with juggling multiple seemingly-opposed requirements at once. There were many seemingly valid arguments (consistency! no, we need to always fail!).

The path PHP chose, which I argue is the path Node.js should consider, is to be secure by default but allow developers who really want to trade away security for some other metric (performance, etc.) to opt out of this default.

try {
    $str .= chr(random_int(0, 255));
} catch (Exception $e) {
    // Look ma, I'm living dangerously!
    $str .= chr(mt_rand(0, 255));
}

My suggestion as a security researcher

Make the constructor sane by default, but let developers who need an unsafe but more performant API endpoint opt into that by changing the way their code is written.

Is it a BC break? Yes! But secure code is necessarily incompatible with insecure code, lest it open the door to downgrade attacks.

@paragonie-scott

More concise recommendation: Migrate the way Buffer is currently implemented to UnsafeBuffer and then fix Buffer.

@ChALkeR
Member
ChALkeR commented Jan 15, 2016

@paragonie-scott That will make things worse even from the security point of view, and moreover will not solve an important part of this issue. Please wait for my post regarding the proposal, it explains my statements.

@paragonie-scott

That will make things worse even from the security point of view,

Explain how migrating the insecure API to an explicitly labelled UnsafeBuffer and then fixing Buffer to behave more securely will make things worse from a security point of view.

@alfiepates

@ChALkeR

That will make things worse even from the security point of view

Can you please clarify exactly how it'll make things worse?

@kumavis
kumavis commented Jan 15, 2016

Make the constructor sane by default, but let developers who need an unsafe but more performant API endpoint opt into that by changing the way their code is written.

This seems like a healthy best-of-both-worlds approach

@mikeal
Member
mikeal commented Jan 15, 2016

This is getting a bit too heated. It would help tremendously if we did a few things:

  • Stop imagining things like "if we built it today." In order to move towards a resolution we have to move towards a shared understanding of the reality of the project and the constraints it has in terms of stability with such a massive ecosystem and userbase.
  • Stop referring to "deprecation" or "migration" when talking about any kind of breaking change. This is one of the most depended on APIs in core, altering it is such an enormous breaking change that any solution that doesn't break it is favourable to breaking it. The world we live in today is one where, regardless of the reasoning behind it, a breaking change of this magnitude is likely to delay adoption of a major release that incorporates for a year or more.
  • Please acknowledge the breadth of use cases Node.js is under. Phrasing things in terms of "performance vs. security" assumes a specific set of use cases that are not representative of all Node.js use cases and we need to continue to support all of them. The view you have is built from the experiences you've had in your own environment but Node.js is used in many environments totally different than yours and many of the tradeoffs you many be advocating do not apply.
@wbl
wbl commented Jan 15, 2016

@mikeal, do you have actual code with actual measurements where Buffer(number) is the rate-limiting step? Of course, there are several other approaches code that does this can take to regain the performance, such as using the opt-in methods that will be provided. By contrast code which is written with Buffer(object) will result in a security failure whenever object becomes a number. There is no correctness issue with clearing new buffers, just a minor speed issue

@mikeal
Member
mikeal commented Jan 15, 2016

@wbl you have to look at the performance cost in aggregate. For instance: not too long ago I was talking with someone who needs to run 5 docker instances (one per network device + 1 extra) for a networking project he's running on that is based on Raspberry Pi. He wasn't going to be able to use Node.js because of the memory constraints but when our idle memory usage dropped significantly in v4 he was able to use Node.js. Cutting in half the number of operations we can do per second here is likely to throw a new wrench in that project and this is just one example I happen to know about because I started chatting with him at a conference. Node.js is now the most widely supported runtime in IoT platforms and is running on-chip in nearly all next generation platforms. These environments are highly constrained and something we have to fight hard to keep in mind as we have historically been server focused.

@joepie91
Contributor

@jasnell: Adding Buffer.safe and Buffer.unsafe to LTS versions does nothing to help unmaintained code either given that none of that code will be modified to use the new APIs. Back porting those APIs does not solve the problem.

My proposal is unrelated to 'unmaintained code'. The unmaintained code will be broken anyway, that is a given - there's no point optimizing for that usecase beyond the --safe-buffers flag that was already discussed, as the code itself will never ever get updated.

The usecase I am proposing this for is for maintained, third-party code that needs to run on older LTS/maintenance releases. When faced with this memory issue, a module developer (not application developer!) has several possible ways to approach this in their module:

  1. Rely on Buffer.safe and/or Buffer.unsafe. This will break their module in anything but the newest versions, affecting their userbase. They will almost certainly not pick this option for that reason alone.
  2. Keep using Buffer(num) for as long as they can get away with it - ie. until it gets hard-deprecated. This is an attractive option - they can file it away as "look at this later", and from a functional point of view, their code will still keep working, on both old and new versions. It is also very dangerous.
  3. Use a polyfill, or manual feature detection. This will look complicated, people will worry about 'polyfill performance', and be concerned about the amount of moving parts that are involved. They're adding another potential point of failure, so this option is also unlikely to be chosen.

In practice, the only reasonable option for a module developer is to keep using the unsafe behaviour. Only the most security-conscious people who actually understand the issue (it is not simple to understand!) will spend more than a few seconds assessing the impact of using a polyfill and how they can make it a viable option. The rest of the ecosystem remains vulnerable.

If you backport the safe and unsafe methods, however, the tradeoffs change. You can now safely use the new methods in your module code, and expect users to have them in their environment - it is much more viable to tell your users to upgrade to the latest minor release on their branch, than to tell them to upgrade to a new, major, breaking version. This is a tradeoff between version compatibility and security that many module developers will consider reasonable.

In short: my proposal is not about patching a security issue directly in older versions. It's about providing the right tools in the right versions, to drive 'secure behaviour' from module developers, and make it worry-free for them to migrate to the new API. This is a necessity for this to truly get fixed. You need that ecosystem adoption.

@rvagg: To all those calling this a bug: this is not a bug, it is intended behaviour and is documented, it is not a surprise to anyone who works in core and is common amongst non-web programming environments.

Yes, it is absolutely a bug. The Buffer constructor is overloaded, where some methods of invocation cause safe behaviour, and others cause unsafe behaviour. That is a design flaw, and thus a bug. The actual memory-related vulnerabilities in eg. ws are just a consequence of that bug.

@rvagg: Just please stop interpreting this discussion as some newly discovered heartbleed-like situation, that's unhelpful hyperbole that will be dismissed when folks are making considered decisions here.

It is a new Heartbleed-like situation. This is not an edge case. It affects literally almost everything that takes typed user input (eg. anything with a JSON API). If anything, it's worse - because now you can freely define exactly how much memory you want to get back.

@seishun: Personally, I use new Buffer(number) a lot in my code. I wouldn't mind replacing those calls with Buffer.alloc(number), but I would mind having the word "unsafe" all over my code that doesn't even send data anywhere. Not every Node.js program is a web server, and the concept of "safety" depends heavily on the use case.

It doesn't. In this context, "safe" and "unsafe" refers to memory safety. A non-zeroed alloc call is memory-unsafe. Whether that has implications for your application is a separate discussion entirely.

The 'unsafe' naming is appropriate and necessary. If nothing else, it is a warning flag to people working on your code, indicating that they need to be careful what they're doing. Which is the case in any case where Buffer is used, because your Buffer-filling code may well stop working when the code is changed.

That having been said, I don't particularly care whether it's called safe/unsafe or alloc/allocUnsafe. As long as the unsafe option is clearly marked as such, and does not look like the obvious option. Similarly, new Buffer.fill(0) is not an acceptable option, as it is not the obvious option.

@mikael: This would mean that individual modules decide whether they want to be faster or safer and would leave users of modules with a combination of slower and less safe modules when they want to choose a specific characteristic.

No such trade-off exists here. Your module being "safe" is not optional. The only thing this change does, is make you explicitly decide whether you want to let the Buffer module handle the safety, or whether you want to roll your own (possibly more performant) implementation for it. In either case, using uninitialized memory in an application-unsafe manner is never acceptable, regardless of the environment.

To give an example: If you are using a Buffer to store a bitmap, and you know and ensure that you won't access segments before they have been written to, then that's fine. Your code is still safe - as long as it enforces that - but it is more performant, as you don't have the now-unnecessary zerofill.

So, again, this explicitly isn't about a trade-off of security vs. performance. It's about making a user to decide whether to take care of the security aspect themselves, and defaulting to doing it for them if the user has no idea. This has no effect on embedded/low-powered platforms.

@jasnell: The new APIs are now named Buffer.alloc() and Buffer.zalloc().

Those names do not sufficiently convey the safety characteristics. I guarantee you that any new developer to Node is going to be confused about this, and pick alloc because it's shorter and looks "less special".

@trevnorris: The fact that uninitialized memory is returned is also well documented.

Whether this is 'documented' is irrelevant. No amount of documentation will sufficiently change user expectations. When desiging a potentially dangerous API, you must design it so that even if the user doesn't read the documentation, the worst thing that'll happen is that performance or availability is affected, but never security.

For a practical example of this, look at nearly every bit of cryptography code written in the past two decades. Almost without exception, it's broken because it uses cryptographic primitives incorrectly, despite being documented just fine. On the other hand you have libsodium/NaCL, which is designed to be secure-by-default (without removing access to lower-level functionality), and I can count on none fingers the amount of incorrect implementations I have seen for that.

@mikael: Stop imagining things like "if we built it today." In order to move towards a resolution we have to move towards a shared understanding of the reality of the project and the constraints it has in terms of stability with such a massive ecosystem and userbase.

While I understand your sentiment, I partially disagree. Yes, what the ecosystem looks like right now is important - but that is step two. Step one is to determine what it should look like if we started from a blank canvas, and after that, we need to start considering how close we can get to that without breaking the ecosystem. The second step does not and cannot replace the first - it will just result in poorly-implemented patches and bikeshedding.

@ariscop
ariscop commented Jan 15, 2016

From @jasnell 's measurements it's a 15% to 60% cost depending on size. Is avoiding that cost really worth the subtle trap? @feross has demonstrated that it exists in real-world code, it's exploitable on the open web right now.

@wbl
wbl commented Jan 15, 2016

@mikeal, Getting pwnd on IoT devices also matters. Furthermore, we aren't cutting performance in half. Even if we are reducing the performance of new Buffer(number) by 50%, the overall application's performance will be affected much less, as it doesn't spend 100% of its time in new Buffer(number). Again, do you actually have numbers to back up your assertions that new Buffer(number) is extremely critical?

@reqshark

@trevnorris: we are open to discussion of how to improve the "surprise" of the Buffer API, along with how to best address this in a forward looking and backwards compatible way

well this is good to hear! my 2 cents: can we start documenting node::Buffer C++ API?

I think that would help and the C++ ought be documented in general (different issue)

@ChALkeR
Member
ChALkeR commented Jan 15, 2016

@feross @mafintosh @alfiepates @paragonie-scott @karissa @wbl

Everyone, https://github.com/ChALkeR/notes/blob/master/Lets-fix-Buffer-API.md — this is my current proposal. That article covers many opinions and questions on the issue, please make sure that you read it before asking further questions and proposing anything else (it might be already covered).

@jasnell, @trevnorris, @mikeal, ptal.

@feross Do I get it right that your proposal on the start of this page does not represent your current point of view anymore? If so, could you update it?
Another variant: move to a separate thread and discuss the current proposal from the beginning.

I am going to get some sleep now, it's 5:50 here. Will be back in ~10 hours. Please read my note.

@feross
Member
feross commented Jan 15, 2016

I think this discussion has gone off the rails a bit.

To limit scope, I think we should stop considering zero-filling or changing the Buffer constructor in any way for now. Instead, let's add two new functions:

  • Buffer.from(value) - convert from many types to a buffer
  • Buffer.alloc(size) - create an uninitialized buffer with given size

Going forward, we can all use Buffer.from(val) the way we currently use Buffer(val), and Buffer.alloc(size) when we need uninitialized memory.

This would have prevented the issues that affected ws and bittorrent-dht because Buffer.from(number) throws instead of revealing uninitialized memory.

That's it.

@joepie91
Contributor

@feross This would still require eventual deprecation of the Buffer constructor. There's a bit of an obsession with constructor functions in the Node.js community, in my experience - especially amongst new developers (coming from other languages), who are the most affected by this issue.

If new Buffer is not deprecated, then people will just keep using the constructor (also in part because of the compatibility reasons I explained earlier).

@wbl
wbl commented Jan 15, 2016

Not fixing the constructor means all existing code will be exposed to issue, plus new code written based on old code. Why isn't backporting to 5.x a valid answer? It seems like an easy fix is being avoided for spurious reasons.

@thejefflarson thejefflarson added a commit to thejefflarson/node that referenced this issue Jan 15, 2016
@thejefflarson thejefflarson Fix uninitialized memory in buffers
This commit fixes an uninitialized memory bug #4660, which will
cause sensitive data to be exposed by array buffers. The current
behavior is not only insecure, but contrary to what the v8 api
expects of ArrayBuffer::Allocator.

The v8 documentation [1], clearly states that ArrayBuffer::Allocator
implementations must return zero-initialized memory from a call to
Allocate.
f563aa8
@feross
Member
feross commented Jan 15, 2016

@joepie91 I think any solution that involves changing Buffer will have to happen slowly. We can't change this overnight because of the impact on the ecosystem.

The best outcome, in my opinion: We add Buffer.from and Buffer.alloc to node v4, v5, and future versions, and update the docs to recommend using these new constructors.

A long, long time from now, when most of the ecosystem has migrated, we can consider zeroing out the memory from Buffer(number), or removing Buffer, or whatever seems right at that time.

At least with Buffer.from and Buffer.alloc we have a clean way forward, where developer intent is expressed clearly and new security bugs of this type won't be written.

@joepie91
Contributor

I think any solution that involves changing Buffer will have to happen slowly. We can't change this overnight because of the impact on the ecosystem.

I understand that. I am definitely not advocating breaking changes to older versions, for example.

The best outcome, in my opinion: We add Buffer.from and Buffer.alloc to node v4, v5, and future versions, and update the docs to recommend using these new constructors.

I'd say it should probably do more than 'recommend' - deprecation does not necessarily mean that the functionality has to be removed straight away, but it can definitely help to encourage people to move away from it. The deprecation notice in the PHP manual played an important role in getting people to transition from mysql_* to PDO, for example, even though it took years for it to actually get removed.

Aditionally, the (non-breaking) backporting would still have to occur for all maintained versions, including 0.10 and 0.12 branches, for the reasons I explained before. In reality, many module authors still support 0.10 and 0.12, and will not use the new methods unless they work across the board.

A long, long time from now, when most of the ecosystem has migrated, we can consider zeroing out the memory from Buffer(number), or removing Buffer, or whatever seems right at that time.

I would now argue against automatic zeroing, for the reason @ChALkeR described. This would remain a possible issue even in the long run.

At least with Buffer.from and Buffer.alloc we have a clean way forward, where developer intent is expressed clearly and new security bugs of this type won't be written.

There would have to be a reason for people to migrate to the new methods, though. Especially for those who come from classical languages and are biased towards the use of constructors (as somebody who helps out people in #Node.js on a daily basis - this is a very serious and very common problem), the old, insecure approach would still look more attractive. Even if there are safer alternatives.

Either a lot of emphasis needs to be put onto the safety consequences of the constructor, or it must be made very clear that using the constructor is not a viable longer-term approach because of eg. deprecation.

@joepie91
Contributor

Further, @thejefflarson raises a concern regarding what V8 expects in the message for this commit (documentation reference is provided at the bottom of that page). I'm not qualified to comment on the validity of this, but it seems plausible.

@mikeal
Member
mikeal commented Jan 15, 2016

@feross has a good point. Because of the type differentials programmers don't
have a "best practice" to avoid using Buffer(number) when they want
Buffer(anything but number). Separating this from the whole zero-fill
issue this would be good to put into people's hands.

On Thu, Jan 14, 2016 at 7:50 PM, Sven Slootweg notifications@github.com
wrote:

Further, @thejefflarson https://github.com/thejefflarson raises a
concern regarding what V8 expects in the message for this commit
thejefflarson@f563aa8
(documentation reference is provided at the bottom of that page).


Reply to this email directly or view it on GitHub
#4660 (comment).

@Qard
Member
Qard commented Jan 15, 2016

This is a security issue and as such, I believe a fix absolutely should be devised for all LTS releases. FWIW, the original proposal should actually be safe to backport to LTS. Any situation that is relying on that memory to not be zeroed is a security issue in itself and should not be supported anyway.

I think we actually should be zeroing the memory in new Buffer(number) and use a flag to mark it as zeroed until a write happens, so double zeroing would not have a performance impact, then change core to use the unsafe Buffer.allocate(number) where appropriate.

Frankly, expecting users to upgrade to newer major releases just to get a security fix is unacceptable. Many will not upgrade due to inability to quickly make the appropriate code changes, lack of resources to audit the major release at short notice, or even lack of awareness a security issue exists due to only tracking status of the particular LTS they are on.

Consider this: a security patch should, by definition, be a breaking change. It is the removal of a behaviour that has dangerous consequences. A removal is always a breaking change because someone could be relying on that wrongness. Rather than letting them continue in their blissful ignorance of how terrible the thing are doing is, they should be forced to fix it.

I would however suggest getting @chrisdickinson to run his magical npm module finding thing to search for any modules using the single argument buffer initializer and have them evaluated to actually see if it breaks anything in userland, rather than making baseless assumptions about it.

As for the performance argument: yes, performance is great, speed is a good thing to strive for. You know what's even better than a fast server? A server that hasn't been hacked.

@kurtseifried

@Qard "a security patch should, by definition, be a breaking change" is so incredibly wrong it beggars belief. The vast majority of security fixes break unwanted/unused behaviours (e.g. Buffer overflows). Speaking with my Red Hat on we go out of our way not to break ABI/API compatibility (because doing so will for sure result in a DoS of all the software that suddenly breaks).

Now in this specific case we have a generally well understood security issue: giving random chunks of memory data to someone without zeroing them out when they request a buffer is known to be a problem/bad idea because most apps don't sanitize memory when they are done (they simply release it).

A perfect and timely example of this is from today's CVE-2016-0777 (the OpenSSH client bug):

If the OpenSSH client connects to an SSH server that offers the key
exchange algorithm "resume@appgate.com", it sends the global request
"roaming@appgate.com" to the server, after successful authentication. If
this request is accepted, the client allocates a roaming buffer out_buf,
by calling malloc() (and not calloc()) with an out_buf_size that is
arbitrarily chosen by the server:

Had calloc been used...

@Qard
Member
Qard commented Jan 15, 2016

@kurtseifried Zeroing the memory in the same API everyone already uses does not break API compatibility. It only removes the ability to do a thing you should not have been allowed to do in the first place. It's a "breaking" change in that behaviour differs, but it differs in a way that it should.

@Blub
Blub commented Jan 15, 2016

But if a program frees up memory containing private data without zeroing it out first then THAT program is the one with a security bug, not the next one allocating and reading the memory...

@glenjamin
Contributor

The discussions and benchmarks around zero-filling by default have all explored an implementation which does .fill(0). This has been shown to be not insignificant slower.

There has also been some mentions of using new Buffer(new TypedArray), or the low level calloc syscall. Is anyone able to explore the perf impacts of these approaches? If it's possible to get zero-filled within an order of magnitude, do most of the objections go away?

On the docs issue: if I have a string and want to create a buffer, I can read the docs: https://nodejs.org/api/buffer.html#buffer_new_buffer_str_encoding
Each constructor is documented separately, and I only think I need to handle strings. It's entirely possible for me to have read the docs for the functionality I'm actually using, but miss the fact there is an overload for number. For this reason I support the idea of introducing from/allow or similar, deprecating and removing the existing constructor from the docs.

This issue reminds me of the Ruby YAML issue from a while back: http://tenderlovemaking.com/2013/02/06/yaml-f7u12.html

Similarities:

  • input types were overloaded to handle in a documented but not necessarily obvious way (XML parse could trigger YAML)
  • YAML parser had a .load method that allowed RCE with untrusted input (I think this was documented), it later gained a .safe_load.
@alfiepates

@Blub True, but that doesn't mean that we can just fob the responsibility off to whomever to avoid doing the things we are capable of doing to improve security.

Security is everyone's responsibility.

@Qard
Member
Qard commented Jan 15, 2016

Zeroing on destruction is super unpredictable in a GC'd environment. For native buffers, the memory is not necessarily completely unreachable at the point of collection, so it needs to be the responsibility of the destructing code to clean itself up. This is just an unfortunate trait of C/C++ that leaks into the implementation of the JS side of buffers. A buffer can't safely free the memory it points to. It's an unsafe construct by design, really. But we have the opportunity to make it a bit safer here.

@mcollina
Member

Let me contribute some little thoughts on this:

  1. The problematic feature of Node we are talking about is that new Buffer(number) and new Buffer(string) have the same signature. new Buffer(string) is very safe to use, but new Buffer(number) might cause a memory disclosure.
  2. Developers might pass that value from a client request, without validating it before passing it to Buffer. The problem relies on this missed check, because calling new Buffer(string).fill(0) does not make any sense. This is the problem ws had, and the most critical one.
  3. It is as a security vulnerability as using malloc in C. We can argue that a decision was made in the past, and that decision was wrong. Unfortunately we have to live with malloc().
  4. This change is going to break backward compatibility just as bad as updating all the C++ modules to the latest V8.
  5. Overloaded constructors are very safe in statically typed languages, but not for dynamic languages. I guess some other security vulnerabilities in JS-land can be found for this very same problem. I would recommend security researchers go around and check.

Completely alternative option: remove new Buffer(string) in favor of new Buffer(string, enc), which is now safe (thanks to #4514). Bonus points: this requires no new APIs.

All of this to note that the solution to this problem might not be zero-filling all buffers, and other solutions might be possible as well.

@jasnell
Member
jasnell commented Jan 15, 2016

@glenjamin ... the PR #4682 does not use the fill(0) any more to implement the zero-filled buffer and it includes benchmarks. The approach take by #4682 is to essentially fallback to using Uint8Array when a zero-filled buffer is requested. This uses the calloc syscall under the covers. This approach also has the disadvantage of not being pool-allocated. The combination of the no-pooling + calloc makes the zero-filled buffer anywhere from about 20-60% slower than the non-zero-filled alternative, depending on the size of buffer being allocated.

@glenjamin
Contributor

@jasnell Thanks for the clarification, is it feasible to try going a level deeper, and putting some malloc/calloc logic in node_buffer.cc directly?

@ChALkeR
Member
ChALkeR commented Jan 15, 2016

@mcollina Your solution will not solve anything right now, because any removal would require a very long time to actually happen. Giving the users a new API will allow them to fix their code without waiting for the removal of new Buffer(value). Thus, the solution could not consist only of the removal, and has to introduce new, safer API.

@seishun
Member
seishun commented Jan 15, 2016

@mcollina

Completely alternative option: remove new Buffer(string) in favor of new Buffer(string, enc), which is now safe (thanks to #4514). Bonus points: this requires no new APIs.

That's what I suggested way back in #4660 (comment). This would cover all the both vulnerabilities presented so far.

@ChALkeR

Giving the users a new API will allow them to fix their code without waiting for the removal of new Buffer(value).

They can already use new Buffer(string, enc), and in the next major version it's guaranteed to throw if they accidentally pass a number.

@mcollina
Member

@seishun too many comments, sorry I've missed that. 👍 for this solution, even though it makes the code a little bit more wordy, from new Buffer('hello') to new Buffer('hello', 'utf8').

@ChALkeR the only thing that would be needed in v4 (I would not touch anything else) is a deprecation warning.

@Blub
Blub commented Jan 15, 2016

@alfiepates perhaps, but zeroing isn't free and you're basically cleaning up after a mess left by a tiny fraction of programs. Most of the time the allocated buffer will be worthless. I can understand your point of view, though.

@thejefflarson

So, I went ahead and benchmarked my change in #4706 using Benchmark.js. There is a slight performance hit, but it may be negliglible: most likely around 3.09% but as little as 0.15%.

https://gist.github.com/thejefflarson/bb6a7c651cddd305a18f

At this point I don't see the harm in switching to calloc. The current situation makes both Buffer and ArrayBuffer instances unsafe.

(edited to fix the math)

@mikeal
Member
mikeal commented Jan 15, 2016

So, I went ahead and benchmarked my change in #4706 using Benchmark.js

Node.js has a series of benchmarks for Buffer, you should run those in order to quantify the difference. You'll find them in the benchmarks directory under buffers :)

@thejefflarson

@mikeal thanks, updated the gist with those numbers. Sometimes it's faster, sometimes slower.

@Fishrock123
Member

The current situation makes both Buffer and ArrayBuffer instances unsafe.

If by ArrayBuffer, you mean TypedArray(s), this isn't true. We reset the flag so TypedArrays are zero-filled.

@thejefflarson

@Fishrock123 oh, yes I see.

@ChALkeR
Member
ChALkeR commented Jan 15, 2016

Another vuln, now fixed and public: Mongoose vulnerability — assigning a number to the property that is Buffer-typed saves unitialized memory block to the DB. POC.

@ChALkeR
Member
ChALkeR commented Jan 15, 2016

Another issue, now fixed and public, in node-floody by @soldair: soldair/node-floody@6c44722. POC: var f = require('floody')(process.stdout); f.write(1000); f.stop();.

@jsha
Contributor
jsha commented Jan 15, 2016

Sequelize also had this issue, now fixed and public: sequelize/sequelize@cbfaa4f

@rwaldron

Ping @ those who are not watching @jasnell's proposal

@rvagg rvagg removed the ctc-agenda label Feb 24, 2016
@jasnell
Member
jasnell commented Mar 16, 2016

#4682 has landed adding safe constructors for v6.

@jasnell jasnell closed this Mar 16, 2016
@mhart
Contributor
mhart commented Mar 16, 2016

Wow, epic 🚀 – amazing job!

@joepie91
Contributor

Excellent!

@ChALkeR
Member
ChALkeR commented Mar 19, 2016

Related: #5799.

@feross
Member
feross commented May 30, 2016 edited

Hey folks!

If you followed this issue, then you're probably like me and want to start using Buffer.from and Buffer.alloc in the new code you write. But if you're also trying to support Node.js v4.x and earlier versions, then you need to check if these functions exist before using them, since old Node.js versions lack support.

There's a better way! You can use safe-buffer.

safe-buffer implements the new Buffer APIs for Node.js versions back to v0.10. In newer Node.js versions, the default Buffer implementation is used.

Hope someone finds this useful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment