Change default decoder limits #295

Closed · methane opened this issue Mar 19, 2018 · 15 comments

methane commented Mar 19, 2018

The current default limits seem too large; they can enable a DoS attack.
Change the defaults to safer values.

The current plan is "about 1 MiB on an amd64 system":

  • max_bin_len: 1024*1024
  • max_str_len: 1024*1024
  • max_array_len: 1024*1024/8 (each pointer is 8 bytes)
  • max_map_len: 1024*1024/32 (8-byte key, hash, and value, plus some extra space)
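
For example, these limits can already be pinned explicitly with the existing keyword arguments (a sketch; the values mirror the proposed defaults and should be tuned to what your application actually trusts):

```python
import msgpack

# Proposed "about 1 MiB" caps, spelled out as explicit keyword arguments.
SAFE_LIMITS = dict(
    max_bin_len=1024 * 1024,         # 1 MiB of raw bytes
    max_str_len=1024 * 1024,         # 1 MiB of string data
    max_array_len=1024 * 1024 // 8,  # each list slot is an 8-byte pointer
    max_map_len=1024 * 1024 // 32,   # key, hash, value, plus extra space
)

unpacker = msgpack.Unpacker(raw=False, **SAFE_LIMITS)
obj = msgpack.unpackb(msgpack.packb([1, 2, 3]), raw=False, **SAFE_LIMITS)
```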
@codypiersall

Hmmm, is there any chance this can be reverted? I think this is breaking a lot of code that has legitimate uses for messages larger than 1 MiB. It certainly caused surprising crashes for me, and I guess I'm not alone. I do appreciate the DoS issue, though.

Maybe it would be possible to not allocate the memory for the string/bin/ext until all the data is present? Like store all the chunks of data in a boring old Python list until the lengths of all the components of the list add up to the length that was promised? I haven't looked at the implementation at all to know if that even makes sense or would be possible. Or maybe add a special case for when all the data to pack/unpack is already present?

These ideas are based on the understanding that the attack vector is something like: someone sends you data (maybe over a socket, maybe from a file) that says they are giving you a 2 GiB EXT message, but then they give you nothing else. So they can do minimal work and make you allocate tons of memory. If the data actually is present, that seems like less of an issue.
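
Something like this rough sketch, say (entirely hypothetical; read_exact and read_chunk are made up for illustration and are not part of msgpack):

```python
# Buffer chunks in a plain list and join them only once the promised
# length has actually arrived, so a bogus 2 GiB header alone can't
# force a huge up-front allocation.
def read_exact(read_chunk, promised_len):
    chunks, received = [], 0
    while received < promised_len:
        chunk = read_chunk()  # e.g. lambda: sock.recv(65536)
        if not chunk:
            raise EOFError("stream ended before the promised data arrived")
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)  # single allocation, after all bytes exist
```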

On a related note, it would be really nice if the docs could be updated to the latest version that incorporates these changes; when reading the docs (which are currently for 0.5 on readthedocs if I'm looking at the right thing) it still shows 2**32 - 1 as the max size. It took me a bit to realize I was looking at the wrong version.

Thanks for the great library!


codypiersall commented Jan 22, 2019

I guess my idea about waiting for all the data doesn't work for array and map types though, since the data needs to be parsed to figure out when it's all present for those types.
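
For example, an array header only promises an element count, not a byte count, so the total size is unknown until the elements themselves are parsed:

```python
import msgpack

packed = msgpack.packb([b"x" * 10, b"y" * 1000])
# The first byte says "array of 2 elements" but nothing about total size;
# only parsing the elements reveals how many bytes the array occupies.
print(packed[:1], len(packed))
```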


jfolz commented Jan 22, 2019

A migration guide would certainly go a long way to help devs update their code for upcoming versions.

@codypiersall

One more idea: we could have a kwarg trusted_source which, if True, ignores the limits on max_bin_len and the rest. It would default to False. This would solve the issue where you have to pass a lot of keyword arguments; currently, if you want to indicate that you trust the other side, you have to set the bin/str/array/map/ext limits separately.
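
For illustration, the wrapper could look something like this (purely hypothetical; trusted_source does not exist in msgpack):

```python
import msgpack

# Hypothetical convenience wrapper around the real msgpack.unpackb.
def unpackb(packed, trusted_source=False, **kwargs):
    if trusted_source:
        # Lift every per-type cap at once instead of five separate kwargs.
        for name in ("max_bin_len", "max_str_len", "max_array_len",
                     "max_map_len", "max_ext_len"):
            kwargs.setdefault(name, 2**31 - 1)
    return msgpack.unpackb(packed, **kwargs)
```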

If you are interested in this, I would be willing to attempt a PR in the next couple weeks.


methane commented Jan 22, 2019

It certainly caused surprising crashes for me, and I guess I'm not alone.

It means many people aren't aware of the max_xxx_len options, even though they are
very important for avoiding DoS attacks.
That's why I think I must change the defaults to safe values.

Of course, I know there are many applications where DoS attacks are not a concern.
But the library can't know that; the application should specify it.

Maybe it would be possible to ...

It is not practical because it would kill performance, or introduce a lot of code (and bugs) I don't want to maintain.

On a related note, it would be really nice if the docs could be updated to the latest version that incorporates these changes;

I'm really sorry. I didn't notice the docs weren't being updated automatically.
I started a build manually and configured the webhook too; the docs should be updated soon.


methane commented Jan 22, 2019

A migration guide would certainly go a long way to help devs update their code for upcoming versions.

Sorry, I can't understand this English.
The docstrings and ChangeLog are already explicit; please don't ask me to write more English.
Writing English is harder for me than you might think, and it leads me toward burnout.


methane commented Jan 22, 2019

One more idea: we could have a kwarg trusted_source which, if True, ignores the limits on max_bin_len and the rest.

I dislike the idea. I don't want to add any more options unless strictly required,
and two knobs for one thing seems ugly too.


methane commented Jan 23, 2019

I came up with an idea to mitigate the pain.

  • Change all max_xxx_len defaults in unpackb to 0, meaning "auto".
  • When a limit is 0, unpackb uses len(packed_data) for max_xxx_len. In the case of max_map_len, it uses len(packed_data)/2.

For Unpacker, max_buffer_size can be used for "auto", but max_buffer_size is currently 0 (unlimited). Changing it would break backward compatibility again. Maybe I can change it in 1.0.
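
In plain Python, the derivation would be something like this sketch (the real implementation lives in the C/Cython code):

```python
def auto_limits(packed):
    # 0 means "auto": derive every cap from the input size itself, since a
    # container can never hold more elements than the input has bytes.
    n = len(packed)
    return dict(
        max_bin_len=n,
        max_str_len=n,
        max_array_len=n,
        max_map_len=n // 2,  # each map entry needs a key and a value
    )
```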

@codypiersall

Thanks for the feedback, @methane.

It is not practical because it would kill performance, or introduce a lot of code (and bugs) I don't want to maintain.

That's fair.

I don't want to add any more options unless strictly required. And two knob for one thing seems ugly too.

This is the same problem I was trying to solve. From my perspective, the various max_xxx_len options are five knobs for one thing: I wanted a way for msgpack to attempt to decode whatever it was given. The solution you came up with just above would work fine for me.

Ping me back if you don't have the time or desire to implement it and I will give it a try.

Side note: does it even make sense to have the max_xxx_len args on unpackb? As I understand it, it's not really susceptible to the same DoS, since you can always verify whether the needed number of bytes is present.


methane commented Jan 23, 2019

Side note: does it even make sense to have the max_xxx_len args on unpackb? As I understand it, it's not really susceptible to the same DoS, since you can always verify whether the needed number of bytes is present.

After "auto" is implemented, the removal can be considered.
But it should be in 1.0, not 0.6.1.

@codypiersall

Makes sense to me.


fake-name commented Jan 28, 2019

Has anyone ever had a DoS attack that this even helps mitigate? Is anyone exposing messagepack interfaces to untrusted inputs? This sounds like a solution in search of a problem.

If you're going to add an interface for handling untrusted inputs, don't just overwrite the current API with one that will no longer work in a lot of cases. Adding a separate unpacker like UnpackUntrusted or similar would seem much clearer to me.

And two knob for one thing seems ugly too.

Right now, you have four knobs I have to set. And I'm in a context where both ends are my code, the transport is SSL-wrapped, and a DoS is not a concern at all. The whole "unpack limit" thing is, in general, silly for my use case (and probably most others).

Honestly, I'd be most in support of just getting rid of it entirely. If people need to worry about DoS attacks, they can just look at the size of the string they're shoving into the unpacker.

@methane
Copy link
Member Author

methane commented Jan 28, 2019

Has anyone ever had a DoS attack that this even helps mitigate? Is anyone exposing messagepack interfaces to untrusted inputs? This sounds like a solution in search of a problem.

RPC is one of the use cases msgpack is designed for (see this article).
And Fluentd is the most successful project with a msgpack API.

Right now, you have four knobs I have to set.

Four knobs for four things; they're orthogonal.
And, again, you don't "have to": I added "auto limit" in #342. You can now use just one option for Unpacker, and zero options for unpackb.
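
For example (a sketch; exact defaults may differ between versions):

```python
import msgpack

packed = msgpack.packb({"key": [1, 2, 3]})

# unpackb: no knobs needed; limits are derived from len(packed).
obj = msgpack.unpackb(packed, raw=False)

# Unpacker: a single knob bounds the internal buffer and, through it,
# how much one message can make the decoder allocate.
unpacker = msgpack.Unpacker(max_buffer_size=1024 * 1024, raw=False)
unpacker.feed(packed)
for item in unpacker:
    print(item)
```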


methane commented Jan 28, 2019

If people need to worry about DoS attacks, they can just look at the size of the string they're shoving into the unpacker.

Only five bytes of input (e.g. b'\xdd\xff\xff\xff\xff') can consume 32 GB of RAM.
Once CPython provides a public API for preallocated dicts, five bytes (e.g. b'\xdf\xff\xff\xff\xff') would consume 96 GB of RAM.
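
To make the arithmetic concrete:

```python
# 0xdd is the msgpack array32 tag; the next four bytes are a big-endian
# element count. A decoder that trusts the header would allocate the
# whole list before a single element has arrived.
payload = b"\xdd\xff\xff\xff\xff"
count = int.from_bytes(payload[1:5], "big")  # 4,294,967,295 elements
print(count * 8 / 2**30)                     # ~32 GiB of 8-byte pointers
```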

@fake-name

Only five bytes of input (e.g. b'\xdd\xff\xff\xff\xff') can consume 32 GB of RAM.

Well, huh. Never mind, then.
