[WIP][Draft] Implement mergewith for bytesRange and bytesValue #233

atanu1991 · 2021-09-16T23:19:03Z

No description provided.

mbasmanova

@atanu1991 Overall looks good. Use make format-fix to fix formatting issues.

Consider editing PR title for clarity: Implement Filter::mergeWith for string filters

mbasmanova · 2021-09-21T10:42:18Z

velox/type/Filter.h

+    return filters_;
+  }
+
+  const bool nanAllowed() const {


nit: the return type can be just "bool"; is there any particular advantage of returning "const bool"?

mbasmanova · 2021-09-21T10:44:00Z

velox/type/Filter.cpp

+    case FilterKind::kIsNotNull:
+      return std::make_unique<BytesValues>(*this, false);
+    case FilterKind::kBytesValues: {
+      auto otherBytesValues = dynamic_cast<const BytesValues*>(other);


use static_cast when you know the type; it is faster than dynamic_cast
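
A minimal sketch of the pattern this comment suggests, using simplified stand-in classes rather than the real Velox ones (the class layout and the `countValues` helper are assumptions made for illustration). Once `kind()` has identified the concrete type, the downcast needs no runtime type check:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the filter hierarchy; not the real Velox classes.
enum class FilterKind { kBytesValues, kBytesRange };

class Filter {
 public:
  explicit Filter(FilterKind kind) : kind_(kind) {}
  virtual ~Filter() = default;
  FilterKind kind() const { return kind_; }

 private:
  const FilterKind kind_;
};

class BytesValues : public Filter {
 public:
  explicit BytesValues(std::vector<std::string> values)
      : Filter(FilterKind::kBytesValues), values_(std::move(values)) {}
  const std::vector<std::string>& values() const { return values_; }

 private:
  std::vector<std::string> values_;
};

// Hypothetical helper: returns the number of values if 'other' is a
// BytesValues filter, and 0 for any other kind.
std::size_t countValues(const Filter* other) {
  assert(other != nullptr);
  switch (other->kind()) {
    case FilterKind::kBytesValues: {
      // kind() already tells us the dynamic type, so static_cast is safe here
      // and avoids the runtime lookup that dynamic_cast performs.
      auto otherBytesValues = static_cast<const BytesValues*>(other);
      return otherBytesValues->values().size();
    }
    default:
      return 0;
  }
}
```

The cast is only safe because every subclass reports its own kind truthfully; if that invariant could be violated, `dynamic_cast` with a null check would be the defensive choice.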

mbasmanova · 2021-09-21T10:46:13Z

velox/type/Filter.cpp

+      std::vector<std::string> newValues;
+      newValues.reserve(smallerFilter->values().size());
+
+      for (const auto& value: smallerFilter->values()) {


consider comparing the [lower, upper] ranges for the two filters before checking individual values; if there is no overlap, the loop over values can be skipped
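
One way the range pre-check could look, sketched on plain sorted vectors instead of the real `BytesValues` class (the `intersectValues` function and the assumption that values are kept sorted are illustrative, not taken from the PR). For sorted input, the first and last elements act as the [lower, upper] bounds:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: intersects two sorted lists of distinct strings.
std::vector<std::string> intersectValues(
    const std::vector<std::string>& a,
    const std::vector<std::string>& b) {
  assert(std::is_sorted(a.begin(), a.end()));
  assert(std::is_sorted(b.begin(), b.end()));
  if (a.empty() || b.empty()) {
    return {};
  }
  // Cheap pre-check: if the [lower, upper] ranges do not overlap, no value
  // can be shared and the per-value loop is skipped entirely.
  if (a.back() < b.front() || b.back() < a.front()) {
    return {};
  }
  // Iterate over the smaller list and probe the larger one.
  const auto& smaller = a.size() <= b.size() ? a : b;
  const auto& larger = a.size() <= b.size() ? b : a;
  std::vector<std::string> result;
  result.reserve(smaller.size());
  for (const auto& value : smaller) {
    if (std::binary_search(larger.begin(), larger.end(), value)) {
      result.push_back(value);
    }
  }
  return result;
}
```

The pre-check costs two string comparisons, so it pays off whenever disjoint filters are common.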

mbasmanova · 2021-09-21T10:46:24Z

velox/type/Filter.cpp

+      return std::make_unique<BytesValues>(std::move(newValues), bothNullAllowed);
+    }
+    case FilterKind::kBytesRange: {
+      auto otherBytesRange = dynamic_cast<const BytesRange*>(other);


static_cast

mbasmanova · 2021-09-21T10:48:02Z

velox/type/Filter.cpp

+      return std::make_unique<BytesValues>(std::move(newValues), bothNullAllowed);
+    }
+    case FilterKind::kMultiRange: {
+      return mergeByteMultiRange(this, dynamic_cast<const MultiRange*>(other));


static_cast

mbasmanova · 2021-09-21T10:58:31Z

velox/type/Filter.cpp

+    assert(filter->kind() == FilterKind::kBytesRange ||
+                filter->kind() == FilterKind::kBytesValues);
+    auto merged = current->mergeWith(filter.get());
+    newValues.emplace_back(merged.release());


We need to consider the type of "merged". If it is AllFalse, we can simply skip it. If we have no filters at the end of the loop, we can return AllFalse. If we have just one filter, we can return it as is, without wrapping it in a MultiRange. If all filters are kBytesValues, we want to make a new kBytesValues filter with the combined set of values. If we have a mix of kBytesValues and kBytesRange, we'd want to combine all kBytesValues into one, then make a MultiRange from the single kBytesValues and the multiple kBytesRange filters. This ensures that the result of the merge is the filter that is most efficient to evaluate. MultiRange is the least efficient, as it needs to loop over the individual filters and evaluate each.
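
The rules in this comment can be sketched on a simplified model. Everything here (`SimpleFilter`, `Kind`, `combine`, and the `make*` helpers) is hypothetical and only mirrors the decision logic; it is not the Velox `Filter` API:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class Kind { kAllFalse, kValues, kRange, kMultiRange };

// Simplified stand-in for a string filter.
struct SimpleFilter {
  Kind kind = Kind::kAllFalse;
  std::vector<std::string> values;     // used by kValues
  std::string lower;                   // used by kRange
  std::string upper;                   // used by kRange
  std::vector<SimpleFilter> children;  // used by kMultiRange
};

SimpleFilter makeAllFalse() { return SimpleFilter(); }

SimpleFilter makeValues(std::vector<std::string> values) {
  SimpleFilter f;
  f.kind = Kind::kValues;
  f.values = std::move(values);
  return f;
}

SimpleFilter makeRange(std::string lower, std::string upper) {
  SimpleFilter f;
  f.kind = Kind::kRange;
  f.lower = std::move(lower);
  f.upper = std::move(upper);
  return f;
}

// Collapses per-child merge results into the cheapest equivalent filter.
SimpleFilter combine(std::vector<SimpleFilter> merged) {
  std::vector<std::string> allValues;
  std::vector<SimpleFilter> ranges;
  for (auto& filter : merged) {
    switch (filter.kind) {
      case Kind::kAllFalse:
        break;  // Matches nothing, so it can be dropped.
      case Kind::kValues:
        allValues.insert(
            allValues.end(), filter.values.begin(), filter.values.end());
        break;
      case Kind::kRange:
        ranges.push_back(std::move(filter));
        break;
      case Kind::kMultiRange:
        assert(false && "nested MultiRange is not expected in this sketch");
        break;
    }
  }
  // Fold all value lists into one sorted, de-duplicated list.
  std::sort(allValues.begin(), allValues.end());
  allValues.erase(
      std::unique(allValues.begin(), allValues.end()), allValues.end());

  std::vector<SimpleFilter> parts;
  if (!allValues.empty()) {
    parts.push_back(makeValues(std::move(allValues)));
  }
  for (auto& range : ranges) {
    parts.push_back(std::move(range));
  }

  if (parts.empty()) {
    return makeAllFalse();  // Nothing survived the merge.
  }
  if (parts.size() == 1) {
    return std::move(parts.front());  // No MultiRange wrapper needed.
  }
  SimpleFilter multi;
  multi.kind = Kind::kMultiRange;
  multi.children = std::move(parts);
  return multi;
}
```

With this shape, a MultiRange is produced only when at least two distinct sub-filters remain, which keeps the common cases on the cheaper single-filter path.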

mbasmanova · 2021-09-21T11:04:03Z

velox/type/tests/FilterTest.cpp

+                                         "h", "i", "j", "k", "l", "m", "n",
+                                         "o", "p", "q", "r", "s", "t", "u", "v",
+                                         "w", "x", "y", "z",
+                                         "abca", "abcb", "abcc", "abcd"};


nit: Would you add some longer strings here (> 12 characters) as well as strings with upper case letters, numbers and other special characters?

mbasmanova · 2021-09-21T11:06:30Z

velox/type/tests/FilterTest.cpp

+  filters.push_back(between("p", "t"));
+  filters.push_back(between("p", "t", true));
+
+  filters.push_back(lessThanOrEqual("k"));


Would you also add > and < filters?

mbasmanova · 2021-09-21T11:07:05Z

velox/type/tests/FilterTest.cpp

+  std::vector<std::unique_ptr<Filter>> filters;
+  std::vector<std::unique_ptr<Filter>> filtersMultiRange;
+
+  // addUntypedFilters(filters);


Why commented out?

mbasmanova · 2021-09-21T11:07:49Z

velox/type/tests/FilterTest.cpp

+  filters.push_back(in({"e", "f", "g", "h"}, true));
+
+  filtersMultiRange.push_back(orFilter(
+      between("b", "f"), greaterThanOrEqual("e")));


MultiRange filter is expected to contain non-overlapping filters only.
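
To illustrate why the test input above violates this expectation, here is a small sketch of an overlap check on inclusive string ranges. The `BytesRangeSpec` struct and its helper functions are hypothetical and exist only for this example; the real filters are built with the test helpers shown in the diff:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical description of an inclusive string range. An unbounded side
// is marked with a flag and its string is ignored.
struct BytesRangeSpec {
  std::string lower;
  bool lowerUnbounded;
  std::string upper;
  bool upperUnbounded;
};

// Models between(lower, upper).
BytesRangeSpec betweenSpec(std::string lower, std::string upper) {
  assert(lower <= upper);
  return BytesRangeSpec{std::move(lower), false, std::move(upper), false};
}

// Models greaterThanOrEqual(lower).
BytesRangeSpec greaterThanOrEqualSpec(std::string lower) {
  return BytesRangeSpec{std::move(lower), false, std::string(), true};
}

// Two ranges are disjoint only if one ends strictly before the other begins.
bool overlaps(const BytesRangeSpec& a, const BytesRangeSpec& b) {
  const bool aBelowB =
      !a.upperUnbounded && !b.lowerUnbounded && a.upper < b.lower;
  const bool bBelowA =
      !b.upperUnbounded && !a.lowerUnbounded && b.upper < a.lower;
  return !aBelowB && !bBelowA;
}
```

Under this model, `between("b", "f")` and `greaterThanOrEqual("e")` overlap on ["e", "f"], while `between("b", "d")` and `greaterThanOrEqual("e")` would be a valid pair of MultiRange children.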

atanu1991 · 2021-09-24T18:05:19Z

Created a new PR now that Velox is open-sourced:
https://github.com/facebookincubator/velox/pull/297/commits


facebook-github-bot added the CLA Signed label Sep 16, 2021

atanu1991 force-pushed the filterbytes branch 5 times, most recently from 99fa504 to 22aa839 Compare September 21, 2021 06:43

[WIP][Draft] Implement mergewith for bytesRange and bytesValue

a7db1ad

atanu1991 force-pushed the filterbytes branch from 22aa839 to a7db1ad Compare September 21, 2021 06:48

mbasmanova requested changes Sep 21, 2021


mbasmanova mentioned this pull request Sep 24, 2021

Implement Filter::mergeWith for MultiRange #154

Closed

atanu1991 closed this Sep 24, 2021

rui-mo pushed a commit to rui-mo/velox that referenced this pull request Mar 17, 2023

Move the DWRF Write Extension initialize to OtherExtensionOverrides (f…

dbf490d

…acebookincubator#233)
