-
Notifications
You must be signed in to change notification settings - Fork 5.7k
SERVER-26751 $indexOfArray can optimize for constant array #1229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Mostly stylistic comments and a couple suggestions that might help simplify the code a bit.
* it in the license file. | ||
*/ | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid stray whitespace changes. It makes it harder to review and complicates the git blame unnecessarily. Not generally a big deal, but please avoid it going forward.
src/mongo/db/pipeline/expression.h
Outdated
: ExpressionRangedArity<ExpressionIndexOfArray, 2, 4>(expCtx) {} | ||
|
||
Value evaluate(const Document& root) const final; | ||
Value evaluate(const Document& root) const ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space after 'const'.
src/mongo/db/pipeline/expression.h
Outdated
|
||
Value evaluate(const Document& root) const final; | ||
Value evaluate(const Document& root) const ; | ||
std::vector<Value> parseDeps(const Document& root, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment here describing what this does. Also - parseDeps doesn't really give me a clue what this method does, so a more descriptive name like 'evaluateAndValidateArguments' might be more appropriate. Finally, if this is only used within IndexOfArray and IndexOfArray::Optimized, can you move it to the private or protected section?
}); | ||
} | ||
|
||
TEST(ExpressionArrayTest, ExpressionArrayWithALlConstantValuesShouldOptimizeToExpressionConstant) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra capital in WithALlConstant
src/mongo/db/pipeline/expression.cpp
Outdated
* This class handles the case where IndexOfArray is given an ExpressionConstant | ||
* instead of using a vector and searching through it we can use a unordered_map | ||
* for O(1) lookup time. | ||
**/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra *
in close, just one here.
src/mongo/db/pipeline/expression.cpp
Outdated
}; | ||
|
||
intrusive_ptr<Expression> ExpressionIndexOfArray::optimize() { | ||
ExpressionNary::optimize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here indicating that this will optimize the arguments to this expression. In particular, if all arguments are constant, this should immediately evaluate to a constant. Can you add tests for that like you did for ExpressionArray?
src/mongo/db/pipeline/expression.cpp
Outdated
intrusive_ptr<Expression> ExpressionIndexOfArray::optimize() { | ||
ExpressionNary::optimize(); | ||
|
||
// If the input arr is an ExpressionConstant we can optimize using a unordered_map instead of an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't abbreviate - it just makes it harder to understand, which is counter to the point of the comment. arr -> array.
src/mongo/db/pipeline/expression.cpp
Outdated
// If the input arr is an ExpressionConstant we can optimize using a unordered_map instead of an | ||
// Array | ||
if (dynamic_cast<ExpressionConstant*>(vpOperand[0].get())) { | ||
ExpressionConstant* ec = dynamic_cast<ExpressionConstant*>(vpOperand[0].get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid short variable names like ec
. Also, you can combine this with the if statement above:
if (auto constantArray = dynamic_cast<ExpressionConstant*>(vpOperand[0].get())) {
src/mongo/db/pipeline/expression.cpp
Outdated
ExpressionConstant* ec = dynamic_cast<ExpressionConstant*>(vpOperand[0].get()); | ||
const Value valueArray = ec->getValue(); | ||
if (valueArray.nullish()) { | ||
return this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know what the answer is in this case, just return ExpressionConstant::create(getExpressionContext(), Value(BSONNULL))
to save time. Also add a test case for this please :)
src/mongo/db/pipeline/expression.cpp
Outdated
getExpressionContext()->getValueComparator().makeUnorderedValueMap<int>(); | ||
|
||
for (int i = 0; i < int(arr.size()); i++) { | ||
auto pair = std::make_pair(arr[i], i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional, re-write this as
indexMap.emplace(arr[i], i);
or if that doesn't work, indexMap[arr[i]] = i;
? I'm curious why you made a pair here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more points of feedback.
src/mongo/db/pipeline/expression.cpp
Outdated
for (int i = 0; i < int(arr.size()); i++) { | ||
auto pair = std::make_pair(arr[i], i); | ||
indexMap.insert(pair); | ||
// auto pair = std::make_pair(arr[i], i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove dead code.
src/mongo/db/pipeline/expression.cpp
Outdated
ExpressionConstant* ec = dynamic_cast<ExpressionConstant*>(vpOperand[0].get()); | ||
const Value valueArray = ec->getValue(); | ||
// If the input array is an ExpressionConstant we can optimize using a unordered_map instead of an | ||
// array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End comments in periods - also lines 2801 and in the header.
src/mongo/db/pipeline/expression.h
Outdated
protected: | ||
// Evaluates and validates the value that is being searched for and if given the start index and | ||
// the end index passed to ExpressionIndexOfArray returns arguments in the form of a | ||
// vector<Value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/**
* Use this syntax for comments documenting the behavior of a function or method.
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - can we re-word this comment to make it more clear what the vector of values will be? How about something like
When given 'operands' which correspond to the arguments to $indexOfArray, evaluates and validates the target value, starting index, and ending index arguments and returns their values in that order. The starting index and ending index are optional, so the returned vector will have a length between 1 and 3. Throws a UserException if the values are found to be invalid in some way, e.g. if the indexes are not numbers.
src/mongo/db/pipeline/expression.cpp
Outdated
vector<Value> ExpressionIndexOfArray::parseDeps(const Document& root, | ||
vector<Value> ExpressionIndexOfArray::evaluateAndValidateArguments(const Document& root, | ||
const ExpressionVector& operands, | ||
size_t arrayLength) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrayLength
is unused?
src/mongo/db/pipeline/expression.cpp
Outdated
vector<Value> ExpressionIndexOfArray::evaluateAndValidateArguments(const Document& root, | ||
const ExpressionVector& operands, | ||
size_t arrayLength) const { | ||
std::vector<Value> deps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deps
is a strange name for this - how about validatedArguments
?
auto expIndexOfArray = Expression::parseExpression( | ||
expCtx, | ||
BSON("$indexOfArray" << BSON_ARRAY(BSON_ARRAY(BSON("$add" << BSON_ARRAY(1 << 1)) << 1 << 1) | ||
// Start index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the start index, I think this is the value you're searching for?
ASSERT_TRUE(shouldBeNary); | ||
|
||
auto optimizedArgs = shouldBeNary->getOperandList(); | ||
// All arguments should be ExpressionConstants. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting - I would have expected it to fully optimize to a constant in this case... I wonder why it didn't? Can you make sure it does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is because we are optimizing an ExpresssionIndexOfArray not evaluating so it should still be an ExpressionIndexOfArray.
Im casting it to an ExpressionNary because I need to be able to call getOperandList() to determine if the arguments optimized. I think this is where the confusion is to clearify I will cast it to an ExpressionIndexOfArray
// Nullish input array should become a Value(BSONNULL). | ||
ASSERT_VALUE_EQ(Value(BSONNULL), shouldBeExpressionConstant->getValue()); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we aren't adding any coverage of actually evaluating $indexOfArray when it can be optimized, can you either add some tests here which exercise the new code (create an IndexOfArray
with a constant array and various other arguments, making sure to call optimize()
before evaluate()
), or find some that are already written to prove that it will be exercised?
src/mongo/db/pipeline/expression.cpp
Outdated
}; | ||
|
||
intrusive_ptr<Expression> ExpressionIndexOfArray::optimize() { | ||
// This is optimize all arguments to this expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will optimize ...
src/mongo/db/pipeline/expression.cpp
Outdated
getExpressionContext()->getValueComparator().makeUnorderedValueMap<int>(); | ||
|
||
for (int i = 0; i < int(arr.size()); i++) { | ||
indexMap.emplace(arr[i], i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, we should not add a value if it already occurred so that we only ever return the first occurrence.
src/mongo/db/pipeline/expression.cpp
Outdated
auto comparator = getExpressionContext()->getValueComparator(); | ||
Value vIndex = Value(index->second); | ||
if (index != _indexMap.end() && comparator.evaluate(Value(index->second) >= operands[1]) && | ||
comparator.evaluate(Value(index->second) < operands[2])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is pretty awkward. You have to keep using comparator.evaluate(...)
because operands[1]
and operands[2]
are of type Value
. In reality, we know they will be of type int
, right? Or even size_t
?
Can we change the signature of evaluateAndValidateArguments
to return a struct, something like
struct Arguments {
Arguments(Value targetOfSearch, int startIndex, boost::optional<int> endIndex) :
targetOfSearch(targetOfSearch), startIndex(startIndex), endIndex(endIndex) {}
Value targetOfSearch;
int startIndex;
boost::optional<int> endIndex;
};
It might be more complicated in the parsing code, but I think it would make this much easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how do you know that operands[2] is valid? Couldn't the user have left off the end index?
intrusive_ptr<ExpressionContextForTest> expCtx(new ExpressionContextForTest()); | ||
VariablesParseState vps = expCtx->variablesParseState; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete extra newline.
ASSERT_TRUE(shouldBeIndexOfArray); | ||
|
||
auto optimizedArgs = shouldBeIndexOfArray->getOperandList(); | ||
// All arguments should be ExpressionConstants. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you change the optimize implementation to return what ExpressionNary::optimize()
returns then the whole expression should evaluate to a constant.
} | ||
|
||
TEST(ExpressionIndexOfArray, | ||
OptimizedExpressionIndexOfArrayWithConstantArgumentsShouldEvaluateProperly) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good test, but because you're using all constant values, I think you will hit the case where all arguments are constant instead of when just the array is constant. If you change the search values (e.g. 2
and 10
) to be something like $x
, then use Document{{"x", 2}}
as the argument to evaluate(), you'll be testing what you want to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing for the test below.
// Should evaluate to -1 if not found in range. | ||
ASSERT_VALUE_EQ(Value(-1), optimizedIndexNotFoundInRange->evaluate(Document())); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test where there are duplicate values in the array? Such a test should have caught the bug you mentioned with inserting into the unordered_map twice.
src/mongo/db/pipeline/expression.cpp
Outdated
index->second < args.endIndex) { | ||
return Value(index->second); | ||
} else { | ||
// If the item we are searching is not found or if not found in given range return -1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I actually think this won't work. Consider the following:
{$indexOfArray: [[0, 1, "a", 0, 1, 1, "a"], "a", 3, 8]}
I think that the map here will be populated with 0-> 0, 1-> 1, "a" -> 2
, but then it has lost the information about where the second or third occurrences of those values are. I think we need to map a value to a vector<int>
of all the positions so that we can rule out those which are out of range.
… unorderedmap<vector<int> and added additional comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good! I think all the changes I proposed are more cosmetic, they should do the same thing, but with less code :)
src/mongo/db/pipeline/expression.cpp
Outdated
Value startIndexArg = operands[2]->evaluate(root); | ||
uassertIfNotIntegralAndNonNegative(startIndexArg, getOpName(), "starting index"); | ||
startIndex = static_cast<size_t>(startIndexArg.coerceToInt()); | ||
startIndex = static_cast<int>(startIndexArg.coerceToInt()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need a static_cast here.
src/mongo/db/pipeline/expression.cpp
Outdated
startIndex = static_cast<size_t>(startIndexArg.coerceToInt()); | ||
startIndex = static_cast<int>(startIndexArg.coerceToInt()); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra newline.
src/mongo/db/pipeline/expression.cpp
Outdated
uassertIfNotIntegralAndNonNegative(endIndexArg, getOpName(), "ending index"); | ||
// Don't let 'endIndex' exceed the length of the array. | ||
endIndex = std::min(array.size(), static_cast<size_t>(endIndexArg.coerceToInt())); | ||
endIndex = std::min(arrayLength, static_cast<size_t>(endIndexArg.coerceToInt())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need a static_cast here.
src/mongo/db/pipeline/expression.cpp
Outdated
for (size_t i = startIndex; i < endIndex; i++) { | ||
if (getExpressionContext()->getValueComparator().evaluate(array[i] == searchItem)) { | ||
return Value(static_cast<int>(i)); | ||
return ExpressionIndexOfArray::Arguments(vpOperand[1]->evaluate(root), startIndex, endIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the complier knows the return types, so you could just say
return {vpOperand[1]->evaluate(root), startIndex, endIndex}
Or, if you prefer to keep the name, this method is defined on ExpressionIndexOfArray, so you don't need to include the ExpressionIndexOfArray::
prefix here.
src/mongo/db/pipeline/expression.cpp
Outdated
virtual Value evaluate(const Document& root) const { | ||
auto args = evaluateAndValidateArguments(root, vpOperand, _indexMap.size()); | ||
auto indexVec = _indexMap.find(args.targetOfSearch); | ||
if (indexVec != _indexMap.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be easier to read as
if (indexVec == _indexMap.end()) {
return Value(-1);
}
// Search through the vector of indices for the first index in our range.
... // Un-indented code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could just leave off the else, and move the return Value(-1);
to always be the last thing that happens, whether you get into the if statement or not.
fromjson("{ $indexOfArray : [ [0, 1, 2, 3, 4, 5] , '$x'] }"), | ||
expCtx->variablesParseState); | ||
auto optimizedIndexOfArray = expIndexOfArray->optimize(); | ||
ASSERT_VALUE_EQ(Value(1), optimizedIndexOfArray->evaluate(Document{{"x", 1}})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well add a bunch of these:
ASSERT_VALUE_EQ(Value(0), optimizedIndexOfArray->evaluate(Document{{"x", 0}}));
ASSERT_VALUE_EQ(Value(1), optimizedIndexOfArray->evaluate(Document{{"x", 1}}));
ASSERT_VALUE_EQ(Value(2), optimizedIndexOfArray->evaluate(Document{{"x", 2}}));
ASSERT_VALUE_EQ(Value(3), optimizedIndexOfArray->evaluate(Document{{"x", 3}}));
ASSERT_VALUE_EQ(Value(4), optimizedIndexOfArray->evaluate(Document{{"x", 4}}));
ASSERT_VALUE_EQ(Value(5), optimizedIndexOfArray->evaluate(Document{{"x", 5}}));
expCtx->variablesParseState); | ||
auto optimizedIndexOfArray = expIndexOfArray->optimize(); | ||
ASSERT_VALUE_EQ(Value(1), optimizedIndexOfArray->evaluate(Document{{"x", 1}})); | ||
auto IndexNotFound = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to re-parse or optimize, you can use the same one.
auto optimizedIndexOfArray = expIndexOfArray->optimize(); | ||
ASSERT_VALUE_EQ(Value(4), optimizedIndexOfArray->evaluate(Document{{"x", 4}})); | ||
|
||
auto IndexNotFoundInRange = Expression::parseExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can re-use optimizedIndexOfArray
ASSERT_VALUE_EQ(Value(2), | ||
optimizedIndexOfArrayWithDuplicateValues->evaluate(Document{{"x", 2}})); | ||
|
||
auto expIndexInRangeWithhDuplicateValues = Expression::parseExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think you can re-use the expression from above.
optimizedIndexInRangeWithDuplcateValues->evaluate(Document{{"x", 2}})); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra newline.
No description provided.