Protecting against trouble on invalid limit values #70
Forgot to mention: the way I noticed this is that I saw this code (lines 265 to 268 in c3bad9b).
@springmeyer taking a look at this now. Good catch! I think we can bump the validation to make sure the limit value is > 0 and set a maximum of 100,000 results to avoid huge result sets.
I'm not sure how to go about doing this. Right now we need to be able to iterate through results in order to sort and dedupe, so the container they exist in must be created. Perhaps we can only pre-allocate up to a certain number?
100,000 feels too large. It might be totally fine and perhaps I'm being overly cautious, but I worry about how much memory that would take. Can you assuage my fears here by calculating the exact memory requirement for that case?
Ideally we don't allocate more than 5 MB here. Anything more could really add up, since we'd be allocating 5 MB for every thread. To estimate the size of a result object, add: std::clog << "Size of ResultObject: " << sizeof(ResultObject) << "\n"; and capture that value. That is the size, in bytes, that each instance of that class takes in memory. It does not capture all the memory a class instance might need, since heap allocations made by its members are not counted by sizeof. So, what I'm getting at is that 100,000 might be DDOS potential. And if not (and I'm being overly scared), it might still cause a lot of memory to be reserved contiguously when not actually needed, which would be wasteful, e.g. if the caller only needs a handful of results.
Neither am I. While I don't like this pre-allocation, I'm not following the code well enough to know how to remove it. Was hoping you could find a way.
What would happen if you only pre-allocated 1 ResultObject? Would the logic fall apart and the results be incorrect?
Nope, the reason we were pre-allocating is to save ourselves from on-the-fly allocation every time we add to the vector. Perhaps it's worth benchmarking what "no preallocation" looks like; perhaps the difference is trivial.
Ah, that is great news. I'd assumed wrongly then that the pre-allocation was fundamental to the algorithm.
I think it is. The rule book is to pre-allocate when: (a) you know how many elements you will need, and (b) you expect to actually fill most of that capacity. If (a) is not true, don't pre-allocate. If (b) is not true, then pre-allocation might help you or might hurt you, so don't pre-allocate. With a std::vector, reserve() makes the up-front allocation so that appends don't trigger repeated reallocations.
@springmeyer I lied to you: the reservation is important to the sorting algorithm, in that it pops and pushes objects into the vector and sorts against the empty result objects, which have a distance set to an unrealistically huge number to compare against. Looking at the size of a ResultObject, it comes out to 88 bytes, so for every 1000 items we pre-allocate, that uses 0.088 MB of memory per thread.
@mapsam, okay, how about we clamp the max at 1000 then?
Setting the maximum results limit to 1000 - ready to rock over in #74 |
Thanks for opening this @springmeyer, closing now, and we can reopen if 1000 appears too tight of a limitation. |
I've done some poking at what happens when invalid limit values are passed. I've found that:

limit=0 segfaults the process. Replicate by setting the limit to 0 and running:
node bench/vtquery.bench.js --iterations 50 --concurrency 1

limit=1000000000 hangs the process, likely due to a large allocation. Replicate by setting the limit accordingly and running:
node bench/vtquery.bench.js --iterations 50 --concurrency 1

limit=1000000000000 overflows. The error is Error: 'limit' must be a positive number, which is better than a crash, but it indicates that the number has overflowed to a negative value, which should be protected against. Instead we should have some reasonable maximum for the value and throw if it is exceeded. Replicate by setting the limit accordingly and running:
node bench/vtquery.bench.js --iterations 50 --concurrency 1