[DYOD] Partial Hash Index #2386

Closed. 49 commits; this view shows changes from 43 commits.
299c428
introduce additional abstraction layer AbstractRangeIndex and start i…
bengelhaupt Jun 13, 2021
d28f883
add PartialHashIndexTest and make AbstractIndex generic by its positi…
bengelhaupt Jun 17, 2021
49fd944
add iterator test in PartialHashIndexTest
bengelhaupt Jun 17, 2021
6ca40a4
implement equals() on PartialHashIndex and rename AbstractRangeIndex …
bengelhaupt Jun 22, 2021
24a3f45
add ToDos
bengelhaupt Jun 22, 2021
feb52cd
introduce AbstractTableIndex and implement create_table_index get_tab…
bengelhaupt Jun 24, 2021
708ba91
change PartialHashIndex parameters and start temporarily modifying jo…
bengelhaupt Jun 28, 2021
ca0a16e
implement table index joining
bengelhaupt Jun 29, 2021
e734418
Enable remaining index join types
vxrahn Jul 2, 2021
8cd4702
implement fallback NLJ for index join + adapt JoinTestRunner
bengelhaupt Jul 2, 2021
cb91524
fix JoinTestRunner
bengelhaupt Jul 2, 2021
af3bfc9
Integrate into TPCH benchmarks
vxrahn Jul 14, 2021
a728b0e
Add IndexJoin to LQP Translator
vxrahn Jul 14, 2021
edc5d34
Implement templated PartialHashIndexImpl
vxrahn Jul 14, 2021
67aabc7
Fix tests and Segfault
bengelhaupt Jul 14, 2021
0e21ef4
Fix architecture
bengelhaupt Jul 15, 2021
7b3f535
Add copy constructor and assignment on IteratorWrapper and add not_eq…
bengelhaupt Jul 15, 2021
7d6a373
implement adding and removing chunks from PartialHashIndex
bengelhaupt Jul 17, 2021
d70eb74
integrate notequals predicate and revert renaming of AbstractIndex
bengelhaupt Jul 19, 2021
62639db
Resolve some ToDos
bengelhaupt Jul 19, 2021
9e85c53
Fix return type of equals
vxrahn Jul 19, 2021
c677687
Parametrize JoinIndexTest, revert configuration changes in JoinTestRu…
vxrahn Jul 20, 2021
d99cdc5
Add PHI memory consumption tests
vxrahn Jul 20, 2021
2caebc2
Fix OperatorsJoinIndexTest parametrization
vxrahn Jul 20, 2021
0f7408b
Write tests for PHI memory consumption
vxrahn Jul 20, 2021
8334fb6
Fix minor issues
vxrahn Jul 20, 2021
4b0bbad
Extend documentation strings
vxrahn Jul 20, 2021
d67fa24
Remove PartialIndexStatistics
vxrahn Jul 20, 2021
b487cfe
Restructure table-based joining in IndexJoin
vxrahn Jul 20, 2021
9fde53f
Support multiple table indexes per column, fix chunks_joined_with_ind…
vxrahn Jul 20, 2021
02e48d0
Delete AbstractTableIndex constructor
vxrahn Jul 20, 2021
2574e78
Formatting
bengelhaupt Jul 20, 2021
78ffa5a
Start documentation
bengelhaupt Jul 20, 2021
00a5966
Add documentation
vxrahn Jul 20, 2021
94e9a9d
Change get_table_indexes return type to pmr_vector
vxrahn Jul 20, 2021
cd7cb0d
Fix linter issues
vxrahn Jul 20, 2021
406922f
[empty] re-trigger CI
bengelhaupt Aug 2, 2021
24cb24a
fix compilation on clang
bengelhaupt Aug 4, 2021
8ac14a6
fix compilation on clang-9
bengelhaupt Aug 4, 2021
0e027b1
fix compilation on clang-9 #2
bengelhaupt Aug 4, 2021
c0c8f11
fix CI errors (missing import, invalid variable name, const reference…
bengelhaupt Aug 4, 2021
e582e2f
fix linter error (missing whitespace)
bengelhaupt Aug 4, 2021
45fd544
fix CI errors (rename in anonymous namespace, redundant include)
bengelhaupt Aug 4, 2021
8d6d813
Refactor AbstractIndex to AbstractChunkIndex, apply first feedback on…
vxrahn Aug 24, 2021
1a8af0f
Re-trigger CI
vxrahn Aug 24, 2021
64e4fa8
Re-trigger CI
bengelhaupt Aug 24, 2021
3e1d4c6
fix CI errors (missing include, wrong TPCH argument)
bengelhaupt Aug 24, 2021
3bd7787
fix CI errors (wrong TPCC argument)
bengelhaupt Aug 24, 2021
dc2b6d9
Re-trigger CI
bengelhaupt Aug 24, 2021
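
For orientation, the idea behind the commits above (a hash index that covers only a chosen subset of a table's chunks, with `equals()` lookups and chunks added over time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `ToyPartialHashIndex`, `RowPosition`, `ChunkId`, `add_chunk`, and `covers` are hypothetical names, the column is fixed to `int`, and `std::unordered_map` stands in for whatever hash map the PR actually uses.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for a chunk identifier and a row position (not Hyrise's types).
using ChunkId = std::uint32_t;

struct RowPosition {
  ChunkId chunk_id;
  std::uint32_t chunk_offset;

  bool operator==(const RowPosition& other) const {
    return chunk_id == other.chunk_id && chunk_offset == other.chunk_offset;
  }
};

// Toy "partial" hash index over one int column: it only knows the chunks that were added to it.
class ToyPartialHashIndex {
 public:
  // Indexes all values of one chunk. Returns false if the chunk was already indexed.
  bool add_chunk(ChunkId chunk_id, const std::vector<int>& values) {
    if (!_indexed_chunks.insert(chunk_id).second) {
      return false;
    }
    for (std::uint32_t offset = 0; offset < values.size(); ++offset) {
      _positions[values[offset]].push_back(RowPosition{chunk_id, offset});
    }
    return true;
  }

  // Returns the positions of all indexed rows whose value equals `value` (empty if none).
  std::vector<RowPosition> equals(int value) const {
    const auto it = _positions.find(value);
    return it == _positions.end() ? std::vector<RowPosition>{} : it->second;
  }

  // Tells a caller whether it may rely on the index for this chunk or has to scan it instead.
  bool covers(ChunkId chunk_id) const { return _indexed_chunks.count(chunk_id) > 0; }

 private:
  std::unordered_map<int, std::vector<RowPosition>> _positions;
  std::unordered_set<ChunkId> _indexed_chunks;
};
```

A caller would typically check `covers(chunk_id)` first and fall back to another access path for chunks the index does not cover, which is presumably why the PR also adds a nested-loop fallback to the index join.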
1 change: 1 addition & 0 deletions src/CMakeLists.txt
@@ -84,6 +84,7 @@ include_directories(
${PROJECT_SOURCE_DIR}/third_party/flat_hash_map
${PROJECT_SOURCE_DIR}/third_party/json
${PROJECT_SOURCE_DIR}/third_party/lz4
${PROJECT_SOURCE_DIR}/third_party/robin-map/include
${PROJECT_SOURCE_DIR}/third_party/zstd
${PROJECT_SOURCE_DIR}/third_party/tpcds-result-reproduction
)
43 changes: 42 additions & 1 deletion src/benchmarklib/abstract_table_generator.cpp
@@ -9,6 +9,7 @@
#include "operators/table_wrapper.hpp"
#include "storage/index/group_key/composite_group_key_index.hpp"
#include "storage/index/group_key/group_key_index.hpp"
#include "storage/index/partial_hash/partial_hash_index.hpp"
#include "storage/segment_iterate.hpp"
#include "utils/format_duration.hpp"
#include "utils/list_directory.hpp"
@@ -22,7 +23,8 @@ void to_json(nlohmann::json& json, const TableGenerationMetrics& metrics) {
{"binary_caching_duration", metrics.binary_caching_duration.count()},
{"sort_duration", metrics.sort_duration.count()},
{"store_duration", metrics.store_duration.count()},
- {"index_duration", metrics.index_duration.count()}};
+ {"index_duration", metrics.index_duration.count()},
+ {"table_index_duration", metrics.table_index_duration.count()}};
}

BenchmarkTableInfo::BenchmarkTableInfo(const std::shared_ptr<Table>& init_table) : table(init_table) {}
@@ -323,6 +325,45 @@ void AbstractTableGenerator::generate_and_store() {
} else {
std::cout << "- No indexes created as --indexes was not specified or set to false" << std::endl;
}

/**
* Create table indexes if requested by the user
*/
if (_benchmark_config->table_indexes) {
std::cout << "- Creating table indexes" << std::endl;
const auto& indexes_by_table = _indexes_by_table();
if (indexes_by_table.empty()) {
std::cout << "- No indexes defined by benchmark" << std::endl;
}
for (const auto& [table_name, indexes] : indexes_by_table) {
const auto& table = table_info_by_name[table_name].table;

auto chunk_ids = std::vector<ChunkID>{};
for (auto chunk_id = ChunkID{0}; chunk_id < table->chunk_count(); ++chunk_id) {
chunk_ids.emplace_back(chunk_id);
}
for (const auto& index_columns : indexes) {
for (const auto& index_column : index_columns) {
std::cout << "- Creating table index on " << table_name << " [ ";
std::cout << index_column << " ";
std::cout << "with chunk count"
<< " ";
std::cout << chunk_ids.size() << " ";
std::cout << "] " << std::flush;

Timer per_table_index_timer;
table->create_table_index<PartialHashIndex>(table->column_id_by_name(index_column), chunk_ids);

std::cout << "(" << per_table_index_timer.lap_formatted() << ")" << std::endl;
std::cout << "(" << table->column_id_by_name(index_column) << ")" << std::endl;
}
}
}
metrics.table_index_duration = timer.lap();
std::cout << "- Creating table indexes done (" << format_duration(metrics.table_index_duration) << ")" << std::endl;
} else {
std::cout << "- No table indexes created as --table_indexes was not specified or set to false" << std::endl;
}
}

std::shared_ptr<BenchmarkConfig> AbstractTableGenerator::create_benchmark_config_with_chunk_size(
1 change: 1 addition & 0 deletions src/benchmarklib/abstract_table_generator.hpp
@@ -43,6 +43,7 @@ struct TableGenerationMetrics {
std::chrono::nanoseconds sort_duration{};
std::chrono::nanoseconds store_duration{};
std::chrono::nanoseconds index_duration{};
std::chrono::nanoseconds table_index_duration{};
};

void to_json(nlohmann::json& json, const TableGenerationMetrics& metrics);
5 changes: 3 additions & 2 deletions src/benchmarklib/benchmark_config.cpp
@@ -4,8 +4,8 @@ namespace opossum {

BenchmarkConfig::BenchmarkConfig(const BenchmarkMode init_benchmark_mode, const ChunkOffset init_chunk_size,
const EncodingConfig& init_encoding_config, const bool init_indexes,
- const int64_t init_max_runs, const Duration& init_max_duration,
- const Duration& init_warmup_duration,
+ const bool init_table_indexes, const int64_t init_max_runs,
+ const Duration& init_max_duration, const Duration& init_warmup_duration,
const std::optional<std::string>& init_output_file_path,
const bool init_enable_scheduler, const uint32_t init_cores,
const uint32_t init_clients, const bool init_enable_visualization,
@@ -14,6 +14,7 @@ BenchmarkConfig::BenchmarkConfig(const BenchmarkMode init_benchmark_mode, const
chunk_size(init_chunk_size),
encoding_config(init_encoding_config),
indexes(init_indexes),
table_indexes(init_table_indexes),
max_runs(init_max_runs),
max_duration(init_max_duration),
warmup_duration(init_warmup_duration),
5 changes: 3 additions & 2 deletions src/benchmarklib/benchmark_config.hpp
@@ -19,8 +19,8 @@ using TimePoint = std::chrono::high_resolution_clock::time_point;
class BenchmarkConfig {
public:
BenchmarkConfig(const BenchmarkMode init_benchmark_mode, const ChunkOffset init_chunk_size,
- const EncodingConfig& init_encoding_config, const bool init_indexes, const int64_t init_max_runs,
- const Duration& init_max_duration, const Duration& init_warmup_duration,
+ const EncodingConfig& init_encoding_config, const bool init_indexes, const bool init_table_indexes,
+ const int64_t init_max_runs, const Duration& init_max_duration, const Duration& init_warmup_duration,
const std::optional<std::string>& init_output_file_path, const bool init_enable_scheduler,
const uint32_t init_cores, const uint32_t init_clients, const bool init_enable_visualization,
const bool init_verify, const bool init_cache_binary_tables, const bool init_metrics);
@@ -31,6 +31,7 @@ class BenchmarkConfig {
ChunkOffset chunk_size = Chunk::DEFAULT_SIZE;
EncodingConfig encoding_config = EncodingConfig{};
bool indexes = false;
bool table_indexes = false;
int64_t max_runs = -1;
Duration max_duration = std::chrono::seconds(60);
Duration warmup_duration = std::chrono::seconds(0);
2 changes: 2 additions & 0 deletions src/benchmarklib/benchmark_runner.cpp
@@ -494,6 +494,7 @@ cxxopts::Options BenchmarkRunner::get_basic_cli_options(const std::string& bench
("e,encoding", "Specify Chunk encoding as a string or as a JSON config file (for more detailed configuration, see --full_help). String options: " + encoding_strings_option, cxxopts::value<std::string>()->default_value("Dictionary")) // NOLINT
("compression", "Specify vector compression as a string. Options: " + compression_strings_option, cxxopts::value<std::string>()->default_value("")) // NOLINT
("indexes", "Create indexes (where defined by benchmark)", cxxopts::value<bool>()->default_value("false")) // NOLINT
("table_indexes", "Create table indexes (where defined by benchmark)", cxxopts::value<bool>()->default_value("false")) // NOLINT
("scheduler", "Enable or disable the scheduler", cxxopts::value<bool>()->default_value("false")) // NOLINT
("cores", "Specify the number of cores used by the scheduler (if active). 0 means all available cores", cxxopts::value<uint32_t>()->default_value("0")) // NOLINT
("clients", "Specify how many items should run in parallel if the scheduler is active", cxxopts::value<uint32_t>()->default_value("1")) // NOLINT
@@ -531,6 +532,7 @@ nlohmann::json BenchmarkRunner::create_context(const BenchmarkConfig& config) {
{"build_type", HYRISE_DEBUG ? "debug" : "release"},
{"encoding", config.encoding_config.to_json()},
{"indexes", config.indexes},
{"table_indexes", config.table_indexes},
{"benchmark_mode", magic_enum::enum_name(config.benchmark_mode)},
{"max_runs", config.max_runs},
{"max_duration", std::chrono::duration_cast<std::chrono::nanoseconds>(config.max_duration).count()},
13 changes: 9 additions & 4 deletions src/benchmarklib/cli_config_parser.cpp
@@ -88,6 +88,11 @@ BenchmarkConfig CLIConfigParser::parse_cli_options(const cxxopts::ParseResult& p
std::cout << "- Creating indexes (as defined by the benchmark)" << std::endl;
}

const auto table_indexes = parse_result["table_indexes"].as<bool>();
if (table_indexes) {
std::cout << "- Creating table indexes (as defined by the benchmark)" << std::endl;
}

// Get all other variables
const auto chunk_size = parse_result["chunk_size"].as<ChunkOffset>();
std::cout << "- Chunk size is " << chunk_size << std::endl;
@@ -132,10 +137,10 @@ BenchmarkConfig CLIConfigParser::parse_cli_options(const cxxopts::ParseResult& p
std::cout << "- Not tracking SQL metrics" << std::endl;
}

- return BenchmarkConfig{
- benchmark_mode, chunk_size, *encoding_config, indexes, max_runs, timeout_duration,
- warmup_duration, output_file_path, enable_scheduler, cores, clients, enable_visualization,
- verify, cache_binary_tables, metrics};
+ return BenchmarkConfig{benchmark_mode, chunk_size, *encoding_config, indexes, table_indexes,
+ max_runs, timeout_duration, warmup_duration, output_file_path, enable_scheduler,
+ cores, clients, enable_visualization, verify, cache_binary_tables,
+ metrics};
}

EncodingConfig CLIConfigParser::parse_encoding_config(const std::string& encoding_file_str) {
6 changes: 6 additions & 0 deletions src/lib/CMakeLists.txt
@@ -454,6 +454,8 @@ set(
storage/frame_of_reference_segment/frame_of_reference_segment_iterable.hpp
storage/index/abstract_index.cpp
storage/index/abstract_index.hpp
storage/index/abstract_table_index.cpp
storage/index/abstract_table_index.hpp
storage/index/adaptive_radix_tree/adaptive_radix_tree_index.cpp
storage/index/adaptive_radix_tree/adaptive_radix_tree_index.hpp
storage/index/adaptive_radix_tree/adaptive_radix_tree_nodes.cpp
@@ -476,6 +478,10 @@ set(
storage/index/group_key/variable_length_key_store.hpp
storage/index/index_statistics.cpp
storage/index/index_statistics.hpp
storage/index/partial_hash/partial_hash_index.cpp
storage/index/partial_hash/partial_hash_index.hpp
storage/index/partial_hash/partial_hash_index_impl.cpp
storage/index/partial_hash/partial_hash_index_impl.hpp
storage/index/segment_index_type.hpp
storage/lqp_view.cpp
storage/lqp_view.hpp
31 changes: 31 additions & 0 deletions src/lib/logical_query_plan/lqp_translator.cpp
@@ -43,6 +43,7 @@
#include "operators/index_scan.hpp"
#include "operators/insert.hpp"
#include "operators/join_hash.hpp"
// #include "operators/join_index.hpp"
#include "operators/join_nested_loop.hpp"
#include "operators/join_sort_merge.hpp"
#include "operators/limit.hpp"
@@ -353,6 +354,36 @@ std::shared_ptr<AbstractOperator> LQPTranslator::_translate_join_node(
const auto left_data_type = join_node->join_predicates().front()->arguments[0]->data_type();
const auto right_data_type = join_node->join_predicates().front()->arguments[1]->data_type();

// comment in if JoinIndex should be tested in benchmarks
/*
const auto& left_input_type = join_node->left_input()->type;
const auto& right_input_type = join_node->right_input()->type;

auto left_table_type = TableType::References;
auto right_table_type = TableType::References;

auto index_side = std::optional<IndexSide>{};
if (left_input_type == LQPNodeType::StoredTable) {
left_table_type = TableType::Data;
index_side = IndexSide::Left;
}

if (right_input_type == LQPNodeType::StoredTable) {
right_table_type = TableType::Data;
index_side = IndexSide::Right;
}

Assert(left_table_type != TableType::Data || right_table_type != TableType::Data,
"Both input tables are data tables. Add additional logic to make a decision about the IndexSide.");

if ((primary_join_predicate.predicate_condition == PredicateCondition::Equals || primary_join_predicate.predicate_condition == PredicateCondition::NotEquals) && index_side &&
JoinIndex::supports({join_node->join_mode, primary_join_predicate.predicate_condition, left_data_type,
right_data_type, !secondary_join_predicates.empty(), left_table_type, right_table_type,
index_side})) {
return std::make_shared<JoinIndex>(left_input_operator, right_input_operator, join_node->join_mode,
primary_join_predicate, std::move(secondary_join_predicates), *index_side);
}*/

// Lacking a proper cost model, we assume JoinHash is always faster than JoinSortMerge, which is faster than
// JoinNestedLoop and thus check for an operator compatible with the JoinNode in that order
constexpr auto JOIN_OPERATOR_PREFERENCE_ORDER =
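
The index-side decision in the commented-out block of this hunk can be read in isolation as the following sketch. It is a simplified illustration, not code from the PR: `InputKind` and `choose_index_side` are hypothetical names, `IndexSide` is redeclared locally, and an exception stands in for the `Assert` call so the example compiles on its own.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical, simplified stand-ins for LQPNodeType and IndexSide.
enum class InputKind { StoredTable, Other };
enum class IndexSide { Left, Right };

// Picks the join input that can act as the indexed side: only an input that reads
// directly from a stored table (a data table) qualifies. Returns std::nullopt if
// neither input qualifies, in which case an index join is not an option.
std::optional<IndexSide> choose_index_side(InputKind left, InputKind right) {
  if (left == InputKind::StoredTable && right == InputKind::StoredTable) {
    // Mirrors the Assert in the hunk: this case needs extra logic that is not defined yet.
    throw std::logic_error("Both inputs are stored tables; extra logic is needed to pick an index side.");
  }
  if (left == InputKind::StoredTable) {
    return IndexSide::Left;
  }
  if (right == InputKind::StoredTable) {
    return IndexSide::Right;
  }
  return std::nullopt;
}
```

In the hunk, a chosen side is still only a candidate: the join is translated to `JoinIndex` only if the predicate condition is `Equals` or `NotEquals` and `JoinIndex::supports(...)` accepts the configuration.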
3 changes: 2 additions & 1 deletion src/lib/operators/get_table.cpp
@@ -232,8 +232,9 @@ std::shared_ptr<const Table> GetTable::_on_execute() {
++output_chunks_iter;
}

+ // Also forward table indexes here.
  return std::make_shared<Table>(pruned_column_definitions, TableType::Data, std::move(output_chunks),
- stored_table->uses_mvcc());
+ stored_table->uses_mvcc(), stored_table->get_table_indexes());
}

} // namespace opossum