Where can I find the source code of XPaxos? #3
Comments
Not yet released; please watch for subsequent releases.
xiewajueji pushed a commit that referenced this issue on May 5, 2024
TABLESPACE STATE DOES NOT CHANGE THE SPACE TO EMPTY

After the commit for Bug#31991688, it was found that an idle system may never get around to truncating an undo tablespace when it is SET INACTIVE. In fact, it takes about 128 seconds before the undo tablespace is finally truncated.

There are three main tasks for the function trx_purge():
1) Process the undo logs and apply changes to the data files. (May be multiple threads.)
2) Clean up the history list by freeing old undo logs and rollback segments.
3) Truncate undo tablespaces that have grown too big or are SET INACTIVE explicitly.

Bug#31991688 made sure that steps 2 and 3 are not done too often. Concentrating this effort keeps the purge lag from growing too large. By default, trx_purge() does step #1 128 times before attempting steps #2 and #3, which are called the 'truncate' steps. This is controlled by the setting innodb_purge_rseg_truncate_frequency.

On an idle system, trx_purge() is called once per second if it has nothing to do in step 1. After 128 seconds, it will finally do step 2 (truncating the undo logs and rollback segments, which reduces the history list to zero) and step 3 (truncating any undo tablespaces that need it).

The function that the purge coordinator thread uses to make these repeated calls to trx_purge() is called srv_do_purge(). When trx_purge() returns having done nothing, srv_do_purge() returns to srv_purge_coordinator_thread(), which puts the purge thread to sleep. It is woken up again once per second by the master thread in srv_master_do_idle_tasks(), if not sooner by any of several other threads and activities. This is how an idle system can wait 128 seconds before the truncate steps are done and an undo tablespace that was SET INACTIVE can finally become 'empty'.
The solution in this patch is to modify srv_do_purge() so that if trx_purge() did nothing and there is an undo space that was explicitly set to inactive, it immediately calls trx_purge() again with do_truncate=true so that steps #2 and #3 are done. This does not affect the effort by Bug#31991688 to keep the purge lag from growing too big on sysbench UPDATE NO_KEY: with this change, the purge lag must be zero and there must be a pending explicit undo space truncate before this extra call to trx_purge() is made. Approved by Sunny in RB#25311
xiewajueji pushed a commit that referenced this issue on May 5, 2024
…TH VS 2019 [#3] [noclose] storage\ndb\src\common\portlib\NdbThread.cpp(1240,3): warning C4805: '==': unsafe mix of type 'int' and type 'bool' in operation Change-Id: I33e3ff9845f3d3e496f64401d30eaa9b992da594
xiewajueji pushed a commit that referenced this issue on May 5, 2024
Move misplaced comment verbatim Change-Id: Iec47fcbdc5f145e8131dd24b3c34a66258ef74fd
xiewajueji pushed a commit that referenced this issue on May 5, 2024
…close] Make the range optimizer return AccessPaths instead of TABLE_READ_PLAN. This is the first step of getting rid of TABLE_READ_PLAN and moving everything into AccessPath; currently, it's just a very thin shell:
1. TRPs are still used internally, and the AccessPath is created at the very end.
2. Child TRPs are still child TRPs (i.e., there are no child AccessPaths).
3. All returned AccessPaths are still of the type INDEX_RANGE_SCAN, wrapping a TRP.
4. Some callers still reach directly into the TRP, assuming #3.
Most callers (save for the aforementioned #4) use a set of simple wrapper functions to access TRP-derived properties from AccessPaths; as we continue the transformation, this is the main place we'll change the interaction (i.e., most of the calling code will remain unchanged). Change-Id: I3d9dc9e33c53d1e5124ea9c47b7d6d9270cd1906
xiewajueji pushed a commit that referenced this issue on May 5, 2024
Patch #3: Multi-value indexes are not used for MEMBER OF combined with OR.

If we have a MEMBER OF predicate that is combined with other predicates by an OR operator, the range optimizer will not consider MEMBER OF as a candidate for using an index. Fixed by adding a case for Item_func::MEMBER_OF_FUNC in get_func_mm_tree(). json_overlaps() and json_contains() already have this type of support.

This is a contribution by Yubao Liu. Change-Id: I1fd7a78091998437310973b3c24099ad554a58a6
xiewajueji pushed a commit that referenced this issue on May 5, 2024
This error happens for queries such as:

SELECT ( SELECT 1 FROM t1 ) AS a,
( SELECT a FROM ( SELECT x FROM t1 ORDER BY a ) AS d1 );

Query_block::prepare() for query block #4 (corresponding to the 4th SELECT in the query above) calls setup_order(), which again calls find_order_in_list(). That function replaces an Item_ident for 'a' in Query_block.order_list with an Item_ref pointing to query block #2. Then Query_block::merge_derived() merges query block #4 into query block #3. The Item_ref mentioned above is then moved to the order_list of query block #3.

In the next step, find_order_in_list() is called for query block #3. At this point, 'a' in the select list has been resolved to another Item_ref, also pointing to query block #2. find_order_in_list() detects that the Item_ref in the order_list is equivalent to the Item_ref in the select list, and therefore decides to replace the former with the latter. Then find_order_in_list() calls Item::clean_up_after_removal() recursively (via Item::walk()) for the order_list Item_ref (since it is no longer needed).

When calling clean_up_after_removal(), no Cleanup_after_removal_context object is passed. This is the actual error, as there should be a context pointing to query block #3 that ensures that clean_up_after_removal() only purges Item_subselect.unit if both of the following conditions hold:
1) The Item_subselect is not in any of the Item trees in the select list of query block #3.
2) Item_subselect.unit is a descendant of query block #3.

These conditions ensure that we only purge Item_subselect.unit if we are sure that it is not needed elsewhere. But without the right context, query block #2 gets purged even though it is used in the select lists of query blocks #1 and #3. The fix is to pass a context (for query block #3) to clean_up_after_removal(). Both of the above conditions then become false, and Item_subselect.unit is not purged.
As an additional shortcut, find_order_in_list() will not call clean_up_after_removal() if real_item() of the order item and the select list item are identical. In addition, this commit changes clean_up_after_removal() so that it requires the context to be non-null, to prevent similar errors. It also simplifies Item_sum::clean_up_after_removal() by removing window functions unconditionally (and adds a corresponding test case). Change-Id: I449be15d369dba97b23900d1a9742e9f6bad4355
xiewajueji pushed a commit that referenced this issue on May 5, 2024
…ILER WARNINGS Remove stringop-truncation warning in ndb_config.cpp by refactoring. Change-Id: I1eea7fe190926a85502e73ca7ebf07d984af9a09
xiewajueji pushed a commit that referenced this issue on May 5, 2024
Remove duplicated NdbEventOperationImpl::m_eventId which is only used in some printouts. Change-Id: Id494e17e3a483a8d049e9aaeb9f41bd6d4ccd847
xiewajueji pushed a commit that referenced this issue on May 5, 2024
-- Patch #1: Persist secondary load information --

Problem: We need a way of knowing which tables were loaded to HeatWave after MySQL restarts due to a crash or a planned shutdown.

Solution: Add a new "secondary_load" flag to the `options` column of mysql.tables. This flag is toggled after a successful secondary load or unload. The information about this flag is also reflected in INFORMATION_SCHEMA.TABLES.CREATE_OPTIONS.

-- Patch #2 --

The second patch in this worklog triggers the table reload from InnoDB after MySQL restart. The recovery framework recognizes that the system restarted by checking whether tables are present in the Global State. If there are no tables present, the framework will access the Data Dictionary and find which tables were loaded before the restart.

This patch introduces the "Data Dictionary Worker" - a MySQL service recovery worker whose task is to query the INFORMATION_SCHEMA.TABLES table from a separate thread and find all tables whose secondary_load flag is set to 1. All tables that were found in the Data Dictionary will be appended to the list of tables that have to be reloaded by the framework from InnoDB.

If an error occurs during restart recovery we will not mark the recovery as failed. This is done because the types of failures that can occur when the tables are reloaded after a restart are less critical compared to previously existing recovery situations. Additionally, this code will soon have to be adapted for the next worklog in this area, so we are proceeding with the simplest solution that makes sense.

A Global Context variable m_globalStateEmpty is added which indicates whether the Global State should be recovered from an external source.

-- Patch #3 --

This patch adds the "rapid_reload_on_restart" system variable. This variable is used to control whether tables should be reloaded after a restart of mysqld or the HeatWave plugin. This variable is persistable (i.e., SET PERSIST RAPID_RELOAD_ON_RESTART = TRUE/FALSE).
The default value of this variable is set to false. The variable can be modified in the OFF, IDLE, and SUSPENDED states.

-- Patch #4 --

This patch refactors the recovery code by removing all recovery-related code from ha_rpd.cc and moving it to separate files:

- ha_rpd_session_factory.h/cc: These files contain the MySQLAdminSessionFactory class, which is used to create admin sessions in separate threads that can be used to issue SQL queries.
- ha_rpd_recovery.h/cc: These files contain the MySQLServiceRecoveryWorker, MySQLServiceRecoveryJob and ObjectStoreRecoveryJob classes which were previously defined in ha_rpd.cc. This file also contains a function that creates the RecoveryWorkerFactory object. This object is passed to the constructor of the Recovery Framework and is used to communicate with the other section of the code located in rpdrecoveryfwk.h/cc.

This patch also renames rpdrecvryfwk to rpdrecoveryfwk for better readability. The include relationships between the files are: rpdrecoveryfwk.cc includes rpdrecoveryfwk.h; ha_rpd_recovery.cc includes ha_rpd_recovery.h and rpdrecoveryfwk.h; ha_rpd.cc includes ha_rpd.h, ha_rpd_recovery.h and ha_rpd_session_factory.h; ha_rpd_session_factory.cc includes ha_rpd_session_factory.h.

Other changes:

- In agreement with Control Plane, the external Global State is now invalidated during recovery framework startup if:
  1) the recovery framework recognizes that it should load the Global State from an external source, AND
  2) rapid_reload_on_restart is set to OFF.
- Addressed review comments for Patch #3: rapid_reload_on_restart is now also settable while the plugin is ON.
- Provide a single entry point for processing the external Global State before starting the recovery framework loop.
- Change when the Data Dictionary is read. Now we will no longer wait for the HeatWave nodes to connect before querying the Data Dictionary.
We will query it when the recovery framework starts, before accepting any actions in the recovery loop.
- Change the reload flow by inserting fake global state entries for tables that need to be reloaded instead of manually adding them to a list of tables scheduled for reload. This method will be used for the next phase, where we will recover from Object Storage, so both recovery methods will now follow the same flow.
- Update secondary_load_dd_flag added in Patch #1.
- Increase the timeout in wait_for_server_bootup to 300s to account for long MySQL version upgrades.
- Add reload_on_restart and reload_on_restart_dbg tests to the rapid suite.
- Add the PLUGIN_VAR_PERSIST_AS_READ_ONLY flag to the "rapid_net_orma_port" and "rapid_reload_on_restart" definitions, enabling their initialization from persisted values along with "rapid_bootstrap" when it is persisted as ON.
- Fix numerous clang-tidy warnings in the recovery code.
- Prevent the suspended_basic and secondary_load_dd_flag tests from running on ASAN builds due to an existing issue when reinstalling the RAPID plugin.

-- Bug#33752387 --

Problem: A shutdown of MySQL causes a crash in queries fired by the DD worker.
Solution: Prevent MySQL from killing the DD worker's queries by instantiating a DD_kill_immunizer before the queries are fired.

-- Patch #5 --

Problem: A table can be loaded before the DD Worker queries the Data Dictionary. This means that the table will be wrongly processed as part of the external global state.
Solution: If the table is present in the current in-memory global state, we will not consider it part of the external global state and will not process it in the recovery framework.

-- Bug#34197659 --

Problem: If a table reload after restart causes OOM, the cluster will go into the RECOVERYFAILED state.
Solution: Recognize when tables are being reloaded after restart and do not move the cluster into RECOVERYFAILED. In that case, only the current reload will fail and the reload of the other tables will be attempted.
Change-Id: Ic0c2a763bc338ea1ae6a7121ff3d55b456271bf0
xiewajueji pushed a commit that referenced this issue on May 5, 2024
Bug#34486254 - WL#14449: Mysqld crash - sig11 at rpd::ConstructQkrnExprCond
Bug#34381109 - Hypergraph offload Issue : LEFT JOIN test cases failing in i_subquery tests
Bug#34432230: Enabling testcase along with BM failures in order_by_limit_extended_mixed_varlen and scgts
Bug#34408615 - Assertion failure `args->ctx->m_lirid != kInvalidRelId' in ha_rpd_qkrn_expr.cc
Bug#34471424 - HYPERGRAPH HeatWave Visible fields not populated
Bug#34472083 - WL#14449: Mysqld crash - Assertion `extract_ctx.cmn_expr_or == nullptr' failed
Bug#34395166 - HYPERGRAPH BUG: QUERIES not offloading on heatwave in MTR SUITE
Bug#34056849 - WL#14449: Offload issue with dictionary encoding
Bug#34381126 - Hypergraph offload Issue : QCOMP test file non offload bug
Bug#34450394 - Hypergraph result mismatch
Bug#34472373 - WL#14449: Mysqld crash - Assertion `args->ctx->m_lirid != kInvalidRelId' failed
Bug#34472354 - WL#14449: Mysqld crash - sig11 at rpdrqce_check_const_cols_rpdopn
Bug#34472069 - WL#14449: Mysqld crash - Assertion `n < size()' failed
Bug#34472058 - WL#14449: Mysqld crash - sig11 at rpdrqc_construct_phyopt_bvfltr
Bug#34143535 - WL#14449: task formation error
Bug#34356273 - HYPERGRAPH BUG: CAST binary having issues with DICTIONARY ENCODING
Bug#34381303 - Hypergraph offload Issue : LIMIT DUAL not offloading
Bug#34356238 - HYPERGRAPH BUG: CAST DATE WITH DOUBLE_PRECISION
Bug#34448736 - Hypergraph Result mismatch: Result mismatch with user variables
BUG#34388727: Enabling testcases
Bug#34413698 - Hypergraph Union issues
Bug#34432241 - Hypergraph out of stack memory issue in rapid.qcomp_bugs_debug_notubsan_notasan
Bug#34369934 - Hypergraph Performance : TPCDS q93 qcomp issue -2
Bug#34399991 - HYPERGRAPH BUG: crash in cp_i_subquery_dict MTR file
Bug#34057893 - Fixing MTR timeout by reducing the partial JOIN search space
Bug#33321588 - Hypergraph Result Mismatch : Cannot process QEXEC JSON document expected for each HeatWave node in query [no-close]
Bug#34395166 - HYPERGRAPH BUG: QUERIES not offloading on
heatwave in MTR SUITE
Bug#34086457 - Hypergraph offload Issue : constant not marked correctly
Bug#34380519 BUG#33294870 BUG#34114292: Enabling testcases after these bug fixes.
BUG#34079278 : Partially enabling testcases for fixed cases.
BUG#33321588 : Fixing 'Cannot process QEXEC JSON document expected for each HeatWave node in query' Error
Bug#34360222 - HYPERGRAPH BUG: QUERIES WITH RANGE LIKE 1+1 NOT OFFLOADING WITH DICT ENCODING
Bug#34412319 - HyperGraph: sig 11 on bm mtr cp_blob_dict
Bug#34403562 - HyperGraph: Signal 6 while running rapid.cp_blob_dict testcase
Bug#34360341 - HYPERGRAPH BUG: QUERIES not offloading on heatwave with VARLEN ENCODING
Bug#34012291 - Hypergraph Offload Issue : Subquery is OOM instead of error ER_SUBQUERY_NO_1_ROW
Bug#34399868 - HYPERGRAPH BUG: Output mismatch in cp_i_index_merge
Bug#34289251 - WL#14449: post-join filters set as inner joins' extra predicates are not handled
Bug#34381354 - Hypergraph offload Issue : DATE COUNT not offloading
Bug#34119506 - Hypergraph Result Mismatch : Decimal precision issue
Bug#34399722 - HYPERGRAPH BUG: Output mismatch with mysql
Bug#34360278 - HYPERGRAPH BUG: QUERIES not offloading on heatwave
Bug#34369223 - HyperGraph: Offload failure when hypergraph is ON
Bug#34361863 - Impossible Condition Cases Failing with HyperGraph
Bug#34289797 - Hypergraph Optimizer: query projecting expression from inner side of outer join does not offload
Bug#34128728 - Hypergraph Crash : Due to ZERO ROWS
Bug#34066930 - Hypergraph Result Mismatch : Wrong result with Zero Rows
Bug#34078549 - Hypergraph Result Mismatch : Wrong result with ZERO ROWS Select_varlen test
Bug#33426211 - Hypergraph Offload issue : Due to the absence of few corner case optimizations
Bug#34086457 - Hypergraph offload Issue : constant not marked correctly
Bug#34299494 - Hypergraph : Disable commutative INNER JOIN
Bug#34299823 - Hypergraph Optimizer: Issue with projection set for partial plan
Bug#33380501 - WL#14449: expected error ER_SUBQUERY_NO_1_ROW
but query succeeds
Bug#33811377 - WL#14449: SUBQUERY item in JOIN's extra predicates is not detected in partial plan
Bug#33410257 - Hypergraph Offload Issue : Rapidside ENUM issue

* Add new file ha_rpd_estimate_ap.cc for costing AccessPath trees using the new Hypergraph Optimizer.
* Rework function CollectAllBaseTables() to not return any size information -- instead it can simply be computed by iterating over the map passed to it.
* Add a member to Qkrn_context to store a pointer to the Hypergraph Optimizer object. When that is set we have a partial plan, otherwise it's the final plan.
* Add a couple of new timers for Hypergraph-based costing.
* Replace all occurrences of JOIN::having_for_explain with JOIN::having_cond, because the former is not always populated correctly anymore.
* Ignore the SQL SELECT_BIG_RESULT hint as it does not have any meaning for RAPID.
* Set flags for handlerton::secondary_engine_flags.
* Add new function Rapid_execution_context::IsQueryPushable() for partial plans.
* Currently, the patch contains a fix in the costing code which enables costing of any temporary table. This is ported forward from the change for bug 34162247.
* Allow dumping partial plans by appending the overall partial plan ID for item dumps and QKRN dumps.
* Some const-ness fixes / improvements.
* Add code in ha_rpd_qkrn_ap.cc to extract the projection list for the root translate state of a partial plan.
* More fixes to partial plan projection set computation: In function ExtractGroupByHypergraph(), reuse function ExtractProjectionItemHypergraph() to extract sub-expressions correctly such that they match the current state_map. In function ExtractWindowHypergraph(), take into account the current state_map, which was missing before; without it, for a base TABLE_SCAN we could pick up expressions from another base TABLE_SCAN, which led to offload errors.
* In HashJoinPullJoinConditionFromExtraPredicate(), remove a superfluous check whether Item_func::type() == Item::FUNC_ITEM.
* In TranslateHashJoin(), where we check the extra predicates also for inner joins (since those represent post-join filters), we initially used UpdateItemListHashJoin(), which would project whole expressions from the post-join filter, leading to larger projection lists for the join and its children. For instance, for a query like SELECT t1.a, t2.b FROM t1 JOIN t2 ON ... WHERE t1.a > 5 OR t2.b > 10, the WHERE condition ends up as a post-join filter. With the previous approach, t1 would project "t1.a" and "t1.a > 5", t2 would project "t2.b" and "t2.b > 10", and the post-join filter would degrade into "qkrcol(t1.a > 5) != 0 OR qkrcol(t2.b > 10) != 0". Changed to use UpdateItemList(), which extracts only Item_fields, i.e. base columns, to match the behavior of the old optimizer.
* In ExtractProjectionItemHypergraph(), project all expressions for the final partial plan of a query block. This is necessary when e.g. a child query block is of the form SELECT 1, 2, 3, t1.a FROM t1 LIMIT 5. If we always ignored all constants in the TABLE_SCAN, then when creating the TOPK node for the LIMIT we would try to project the constants from there, which is not supported.
* Add TPC-H/DS perf job queries without STRAIGHT JOIN hints.
* Dump hypergraph performance profile timers to the QEXEC_pp JSON dump.
* In Mtoq_table::GetCell(), an earlier patch introduced a small optimization which was intended to skip an mrow_item in the case of aggregation Items (ITEM_SUM_FUNCs). The idea is that when one is <aggregate>(DISTINCT) and the other is just <aggregate>(), they cannot be the same. Also, when the number of arguments differs, or when they are simply different aggregates, we don't need to call AreItemsEqual() after the special ITEM_SUM_FUNC code path. However, the "optimization" was wrong and skipped too many Items, such that some were not found at all in the MtoQ table anymore.
* Add more state to the hypergraph optimizer to remove code from the HeatWave side.
In particular:
* To decide whether ORDER BY LIMIT OFFSET can be handled, we only need to check a new flag.
* Whether there are multiple base tables in a query block is also tracked through a hypergraph optimizer flag.
* Add handling for hypergraph filters on top of derived tables and joins.

Improve speed of Mtoq_table::GetCell
====================================
Before, we called Qkrn_context::GetQueryBlock() in each loop iteration over the cells for the given query block. That is, however, already done once before the loop, and we can re-use that retrieved query block. For DISTINCT aggregation Items (SUM_FUNC_ITEM) we have a special code path to compare them as equal.

Also, improve the code of AreDistinctItemListEqual():
* Use only the prefix incrementor.
* Use the copy operator to populate vector "it", which will reserve space to avoid multiple re-allocations.
* Rename "it" to "first_copy" and "it1" to "iter".

Bug 34289251
============
Since the Hypergraph optimizer, post-join filters can be encoded as join extra predicates, which was not yet considered in HeatWave's AccessPath translation. This may result in wrong results, as in the bug's case, where the query is quite simple:

SELECT * FROM t1 NATURAL JOIN t2 WHERE t1.b > t2.b;

Here, the post-join filter "t1.b > t2.b" is encoded as a join extra predicate. The fix is to "simply" also consider those for inner joins, not only for outer joins and semi joins, and to ensure that an intermediate qkrnts is created for the post-join filter. Care had to be taken to also ensure that all of the predicates' sub-expressions are added to both projected and required items for the join itself. Additionally added a few comments to function calls for constants and added checks for the return value of calls to UpdateItemListHashJoin().
Bugs 33380501, 33811377:
========================
* Reset Qkrn_context::m_leaf_table_count before counting the leaf tables.
* In CountLeafTables(): The hypergraph optimizer counts derived tables and common table expressions as leaf tables in their parent query blocks, so do not traverse any further.
* Bug 33380501: patch by Kajori Banerjee. Adds function rpdrqcj_rejects_multiple_rows(), which recursively checks JOIN key expressions for a qkrcol which has reject_multiple_rows_qkrcol == true.
* Bug 33811377: Root cause: through IsQueryPushable() we don't identify all cases where a subquery item is involved. In this case it is a join's extra predicate whose string representation is <in_optimizer>(8,<exists>(select #3)). When we try to translate the expression, it is inside an Item_cond_and, and since it is const_for_execution(), we call Item::is_null(), which then executes the subquery and leads to a mysqld crash. The fix is to also check evaluate_during_optimization(), which identifies such a case. This is added in several places where we call a function which potentially evaluates the Item (like is_null()). By that, we avoid interpreting the subquery item as a constant, and we will try to fully translate this subquery item, which will then hit the default (unsupported) case in ConstructQkrnExprTreeSwitch(), by which we bail out of query translation.
* Avoid assertions in rapidjson due to calling IsQueryPushable() for every subplan. All calls to PrintOffloadTrace passed abort_query = true, which is however wrong for partial plans, because we may get a supported one later.
* Split cp_join_prepart into dict and varlen tests.
* Make the Extract***Hypergraph functions for partial plans only return bool (false=success, true=error).
* Fix extraction of required items of GROUP BY keys and aggregation expressions for partial plans. This must be done similarly to AccessPath-style extraction using CompileItem.
Therefore, refactored function ExtractRequiredItemsForGroupBy() to use the same functionality for partial plan extraction, too.
* In RapidEstimateQueryCostsAP(), when we catch a RapidException, for now always print an offload trace and dump the type and string message to the screen.
* Pick up more test changes from the original branch.
* Extract projection expressions from all WINDOWs' partition-by and order-by lists.
* Re-introduce handling of aggregate items in UpdateItemList, which was removed during code-coverage-related clean-up for worklog 14344 (AP uptake).
* We were not aware that for semi joins we need to always project an expression from the inner side due to some QCOMP issue. Filed bug 34252350 to track this issue.
* Fix an issue with correlated semi joins where an OR condition was attached to the semi join as an additional condition (extra predicate) but was ignored, because any non-Item::FUNC_ITEM was ignored when inspecting the extra predicates. The fix is to add Item::COND_ITEMs, too.
* In the AP cost estimation code, print the QCOMP exception, if one occurs, to the console (for debugging; will be removed later).
* In TranslateDerivedTable(): When a derived table MATERIALIZE AccessPath is the root node and we're translating a partial plan, then when re-initializing the Qkrn_context for the parent query block (which contains the derived table's MATERIALIZE) we need to call InitializeQkrnContextPartialPlan() instead of InitializeQkrnContext().
* Fix issues when the WITH ROLLUP modifier is present and when extracting expressions for the root AccessPath for partial plans.
* The Qkrn_context flags m_has_gby, m_has_ftag, m_has_windowfunc, and m_has_oby were used wrongly: they only indicate whether the current query block has one of those, but they don't indicate whether corresponding qkrn nodes were created.
* One issue with ENUM columns and WITH ROLLUP: when the ENUM column is inside an Item_rollup_group_item.
* Remove the argument "is_outer_query_block" from TransformQkrn().
It is not needed, as we can be sure that transformation is only done once per partial plan or query fragment.

Bug 33410257 - Hypergraph Offload Issue : Rapidside ENUM issue
==============================================================
The algorithm for detecting whether there are string operations on an enum column in ContainsEnumOpn() was incomplete. It needs to keep track of the parent operations, similar to how ContainsUnsupportedCastEnumToChar() works.

Bug 34471424 - HYPERGRAPH HeatWave Visible fields not populated
===============================================================
For a few partial plans in the bug's query, which contain a common table expression, the hidden bitmap is not populated properly, which leads to an empty initial projection set for the first AccessPath inside the CTE. The fix is to ensure for partial plans that the projection items of the CTE are all marked as not hidden in the hidden bitmap. Note that there *may* be a different root cause which is specific to some rewrite where we have a CTE inside another derived table.

Bug 34472083:
=============
The code for extracting potential join predicates from parent FILTER AccessPaths was very strict, asserting that OR conditionals are not nested. This can, however, be the case, and we should be more graceful. Especially for directly-nested ORs we can simply pretend that those ORs had already been merged by the resolver/optimizer and proceed. For more complex nested OR/AND filter or join predicates, now just completely skip trying to extract any predicates. Added test wl14449_bugs with dict and varlen variants.

Bug 34395166:
=============
* scgts: The query had multiple partial plans failing because the hidden_map for the HeatWaveVisibleFields adapter was not correct for some Common Table Expressions. As a quick fix, this is now corrected on the HeatWave side; in the meantime it will be discussed with the MySQL optimizer team whether this is actually a bug on their side.
The issue seems to be the following: The affected SCGTS query has quite deep nesting of CTEs, derived tables, and UNIONs, and one of the CTEs is actually merged into another one. In the query, CTE snapshot_inventory_item_sub3 is merged into CTE snapshot_inventory_item_sub4. While more CTEs may have been merged, we use only this example for illustration.

Then, for computing the hidden_map, we check for each CTE instance its parent query block and search for all occurrences of Item_fields from that CTE. For that search, we use, amongst others, Query_block::top_join_list. In this case, however, the top_join_list only contains the tables from the query block BEFORE the other CTE (and its tables and joins) was merged into that query block. Here, the list only contains the tables
* po_octg_line
* sub3b
while it should contain the tables
* po_octg_line
* snapshot_inventory_item_sub2 AS sub2
* state_prov
* invtry_item
* pt_bol_date
Those tables can be found in Query_block::leaf_tables, but not in Query_block::top_join_list. The (temporary?) fix is to check both Query_block::leaf_tables and Query_block::top_join_list.

* astro: The query is successfully offloaded again; re-enabled the test case and re-recorded the result file.

Other Changes:
==============
* ha_rpd_estimate_ap.cc: Correctly set qcomp_ret upon errors during QCOMP.
* cloud_variables.result: Update due to latest bm pre-merge job.
* qcomp_bugs_debug_notubsan_notasan.result: Update due to latest bm pre-merge job.

Bug 34056849 - WL#14449: Offload issue with dictionary encoding
===============================================================
1. The test cases in bug_poc_dict are rightfully not offloaded given MySQL's type input. On mysql-8.0 there are implicit casts (e.g. string to double) which enable these test cases to offload on that branch, but MySQL behavior is not consistent across all comparison functions and BETWEEN.
Limitations on the HyperGraph branch are consistent (see the newly added test cases) and are in agreement with known HeatWave limitations with dictionary encoding.
2. Marked test cases in bushy_replaylog_dict are offloaded on the latest branch.
3. Test cases in join_outer_dict are rightfully not offloaded given the AP plan: the join condition requires comparison of different dictionary-encoded columns (coming from a different base column). With a natural join the plan is different: the join condition is pushed into both tables and the join is a cartesian product; hence there's no issue with comparing dict-encoded columns. On 8.0 both plans contain a cartesian product with two pushed-down filters; hence the offload issue does not exist.

Bug 34381126:
=============
Recording the testcases since the bug is not reproducible anymore.

Bug 34450394:
=============
The mismatch was due to INSERT INTO + SELECT. Changing the test case to use ORDER BY in the SELECT ensures the output is deterministic.

Bug#34472373
============
Partial plan 4:
-> Inner hash join prefix over( SUBQUERY2_t1_SUBQUERY2_t2):
   * Projection set = []
-> FILTER prefix:
   * table_map = 0x0001
   * Original filter = (unhex('') = concat_ws('0',SUBQUERY2_t1.col_varchar_key
   * Projection set = [unhex('')]
   * Required items = [unhex(''), concat_ws('0',SUBQUERY2_t1.col_varchar_key)
   * DummyFields = [(none)(=0), (none)(=0)]
-> TABLE_SCAN prefix on SUBQUERY2_t1:
   * Actual filter = (unhex('') = concat_ws('0',SUBQUERY2_t1.col_varchar_key)
   * Projection set = [unhex(''), SUBQUERY2_t1.col_int_key]

Problem:
========
ExtractRequiredItemsForFilter adds the constant to the required items.

Bug#34472354
============
The same fix as above resolved the issue.
Bug#34472069 and Bug#34472058 : fixed in the latest branch.

sum_distinct-big is timing out: Job 1094368.

Bug#34143535 - WL#14449: task formation error
==============================================
1) The DISTINCT inside an inner query block of a partial plan was not
   applied. Updated the function ReadyToHandleDistinctAndOrderBy.
2) Problem : Earlier, all the partial plans were dumped irrespective
   of any system variable or flag.
   Solution : Dump the hypergraph partial plan only when the system
   variable qkrn_print_translate_states is set to true.

Bug 34356238
==============
Closed since the diff was because we only match MySQL up to 5 decimal
places.

Bug 34448736 :
===============
Closed since the diff matches the MySQL output.

Bug 34413698 - Hypergraph Union issues
======================================
Problem: Union cases with hypergraph failed due to improper projection
in the special case of derived tables.
Solution: Update the projection population such that with this special
case of derived table + union, we appropriately populate the
projection set.

Bug#34432241:
==============
An unnecessary call to ItemToString in IsConstValueItem was causing
the issue. Removed the redundant call to ItemToString.

Bug#34369934
============
The join pattern changed due to the change in the order of the join
keys with the hypergraph optimizer.

Bug 34399991 - HYPERGRAPH BUG: crash in cp_i_subquery_dict MTR file
===================================================================
Ensure filter items do not have a subquery in them.

Bug#34395166 - P6
==================
Query : SELECT * FROM t1 WHERE c_id=@min_cid OR c_id=@max_cid;
has ZERO rows, but it was not getting offloaded from IsQueryPushable.
Solution : Do not check IsQueryPushable for ZERO_ROWS, FAKE_SINGLE_ROW
and ZERO_ROWS_AGGREGATED.
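The Bug#34395166 (P6) gating above amounts to a predicate over the
plan's root access-path type. A minimal sketch; the names
ZERO_ROW_PATHS and should_run_pushability_check are illustrative, not
actual HeatWave identifiers:

```python
# Access-path shapes that already yield a trivially correct result and
# therefore should skip the IsQueryPushable offload check (per the fix
# above).  All names in this sketch are illustrative.
ZERO_ROW_PATHS = {"ZERO_ROWS", "FAKE_SINGLE_ROW", "ZERO_ROWS_AGGREGATED"}

def should_run_pushability_check(root_path_type: str) -> bool:
    """Return True only for plans that actually read rows."""
    return root_path_type not in ZERO_ROW_PATHS

print(should_run_pushability_check("ZERO_ROWS"))   # False
print(should_run_pushability_check("TABLE_SCAN"))  # True
```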
Bug#34395166 - P5
===================
Query :
SELECT * FROM t3 WHERE col_varchar_10_latin1_key IN (
  SELECT alias1.col_varchar_10_latin1_key FROM t1 AS alias1
  LEFT JOIN t1 AS alias2 JOIN t2 AS alias3
  ON alias2.col_varchar_10_latin1_key ON alias1.col_varchar_1024_utf8_key
  WHERE alias1.pk AND alias1.pk < 3 OR alias1.pk AND alias3.pk);

The BV filter has the condition alias1.pk <> 0 AND alias1.pk < 3. The
hash join node projects the expression (alias1.pk < 3) instead of the
individual columns. This later creates a problem during qkrn
transformation of the BV filter, as one child of the condition (AND)
is a column instead of an expression.

Bug#34395166 - P2
====================
Query: SELECT RANK() OVER (ORDER BY AVG(id)) FROM t1;
The item_map of (RANK() OVER (ORDER BY AVG(id))) = 0. Hence it was not
projected from the window function of partial plan 4. As a result the
window node projected the dummy column, leading to an offload issue.

BUG#33321588 :
==============
- The root cause for this issue is that the ordering of the query
  result is not in the expected format.
- HeatWave doesn't ensure ordering without an explicit ORDER BY
  clause.
- An ORDER BY clause is added at the required place.

Bug 34086457 - Hypergraph offload issue : constant not marked correctly
=======================================================================
Problem: coalesce(col1, col2) is not marked as Item type ITEM_CACHE,
which leads to a false value returned from isConstValue.
Solution: check for functions with string constants in isConstValue.

Bug#34399868 - HYPERGRAPH BUG: Output mismatch in cp_i_index_merge
==================================================================
This is actually *NOT* a HyperGraph bug! There are 2 test cases which
result in a mismatch on mysql-8.0 as well. A bug has been opened for
them. This did not show up before because cp_i_index_merge.test relies
only on randomized data. That also risks an unstable mtr.
A temporary solution here is to just use EXPLAIN for all test cases
that might be unstable, which is what this patch is doing. In the
meantime the underlying bug 34423734 needs to be fixed on 8.0, and
those concrete test cases (without randomness) will be appended to
this test file.

Bug#34399722 - HYPERGRAPH BUG: Output mismatch with mysql
==============================================================
ROW_NUMBER() is a window (analytic) function that assigns a sequential
number to each row in the result set, beginning with one. Since there
is no ordering mentioned in the window function, the row numbers might
be assigned in a different order.

Bug#34361863 - Impossible Condition Cases Failing with HyperGraph
==================================================================
For hypergraph, Rapid_execution_context fields like m_subquery_type
and m_best_outer_plan_num are not updated during join order
enumeration. Hence checking them in HasOnlyInvalidPlans is not
relevant.

BUG#34289797 :
======================
Not reproducible, thus enabling the relevant test case and
re-recording the test file. Additionally, more queries offload with
hypergraph; changing offload correspondingly for the bushy_bugs_varlen
and bushy_bugs_dict test files.

Bug 34128728, Bug 34066930, Bug 34078549
Fix for zero-rows-induced problems with the Hypergraph Optimizer
==================================================================
Problem: The Hypergraph Optimizer may sometimes propose JOIN plans
with Zero Rows in the inner/outer child. In this case, we handle it by
inserting a Filter and ZeroRowAggregate replacing the ZeroRow AP.
However, PopulateProjectionSet does not pick up the correct state map
of tables under the new Filter AP.
TranslateHashJoin now does special handling for outer join and zero
rows, but this was being done for inner join as well, which is
incorrect.
Solution: We introduce the m_zero_row_state_map member in the
Translate_State::filter struct, which is used to resolve the state map
when we replace the Zero Row AP with a Filter AP. In subsequent calls
to GetUsedTables() from PopulateProjectionSet() for the new Filter AP,
we are able to resolve the correct state_map, thus handling the
projection correctly.
Secondly, in TranslateHashJoin(), the special processing of zero rows
in the outer join case has an additional check to ensure it is invoked
only when an outer join is present.

Bug#34299494 - Hypergraph : Disable commutative INNER JOIN
============================================================
Say the query is SELECT * FROM t1, t2 WHERE t1.a=t2.a;
At present the hypergraph optimizer explores both join orders (t1 t2)
and (t2 t1). Since both are the same from the HeatWave perspective, it
does not make sense to explore (t2 t1) when (t1 t2) is already
explored.

Bug#34299823 - Hypergraph Optimizer: Issue with projection set for partial plan
==============================================================
* In ExtractProjectionItemHypergraph(), also project expressions which
  do not reference any tables. This fixes queries where partial plans
  would otherwise have no SELECT list at all, for instance
  SELECT isnull(gen_rnd_ssn()) FROM ...
* In TranslateHashJoin(), for extra predicates which are used as
  post-join filters, use UpdateItemListHashJoin() instead of
  UpdateItemList(), because it updates the local variables
  "has_from_inner" and "has_from_outer", which may be important for
  dummy projection. An example query is from data_masking.test:267:
  SELECT isnull(gen_rnd_ssn()), gen_rnd_pan(gen_range(1,1))
  FROM t1 JOIN t1 AS t2 ON t1.e > t2.e GROUP BY gen_rnd_ssn();
  For that query the GROUP BY key and the SELECT list do not contain
  any expressions over base table columns.
  Hence, when reaching a plan with the AGGREGATE AccessPath, we don't
  have any required items for it, but it needs at least one required
  item for dummy projection (see TranslateGroupBy() for step case 1).

Further fixes:
* Correctly handle the post-join filter retrieved through
  GetHyperGraphFilterCond(). First, all sub-expressions must be
  projected from the join node itself, and then the post-join filter
  must be added to the intermediate qkrnts' filter, too.
* Add function DumpItemToString(), which simply prints the result of
  ItemToString() for the given Item to the console. This is just a
  small debug utility function because my gdb does not print that
  output anymore.

Bug#34490127 - Always enforce Hypergraph Optimizer for HeatWave
Bug#34303379 - mysqld got signal 6 in rpd::ExtractCommonExpression at ha_rpd_qkrn_ap.cc
Bug#34242831 - [Sonny] Cost estimation: AttachToCommonParent: Expected a join node
Bug#34151831 - Query offload fails at ExtractItemFromDerivedTable
Bug#33659812 - LEFT JOIN OFFLOAD ISSUE: AttachToCommonParent: Expected a join node

Bug 34490127:
=============
* Set the optimizer switch for using the hypergraph optimizer in the
  constructor of Rapid_execution_context, to always use it for
  HeatWave going forward.
* Added an offload check in RapidOptimize() that the Hypergraph
  Optimizer is indeed used.
* To make this take effect, we had to move the code in sql_select.cc
  (function Sql_cmd_dml::prepare) which updates
  lex->using_hypergraph_optimizer to after the call to
  open_tables_for_query(), such that the secondary engine's prepare
  hook is called before checking the optimizer switch; otherwise the
  first change above would stay ineffective.
* Removed adding the cmake flag WITH_HYPERGRAPH_OPTIMIZER from file
  _helper.sh again, such that we can test the current behavior that
  the hypergraph optimizer is disabled by default for engines other
  than HeatWave in PERF builds. We circumvent this behavior for
  HeatWave by setting the optimizer switch programmatically.
* Remove the `--hypergraph` flag again from r_mtr.json.
* Update test files to no longer source the files not_hypergraph.inc
  or have_hypergraph.inc, because they test for hypergraph being
  enabled or not by setting the optimizer_switch, which is now not
  possible on PERF builds anymore, and we don't need that behavior
  anymore with the above changes.
* partition_dict.test and partition_varlen.test : Added missing
  enforcement of the secondary engine.
* Revert the result for test autodb_hcs_console_queries.
* Re-enable tests from disabled.def :
  rapid.autodb_hcs_console_queries and
  rapid.qcomp_cost_model_evaluation_tpch. Re-record the result file
  for qcomp_cost_model_evaluation_tpch.result.
* Re-record affected test results.

Bugs 34303379, 34242831, 34151831, 33659812:
============================================
* Added test cases.
* Added a separate sonnys test file for all bugs which use the
  respective schema.

WL#14449 - Query_term support
* Implement remaining changes for proper Query_term support.
* In the qkropn dumping code which tracks the use of tables and base
  table columns, checks were missing for whether RapidSchema is
  nullptr. This led to crashes in some DEBUG MTR tests.
* Stabilize and improve several tests.

WL#14449 - Handle FILTER AccessPaths with FALSE predicate
Until Bug 34494609 is properly fixed, add functionality to replace
FILTER AccessPaths with only a FALSE predicate by ZERO_ROWS or
ZERO_ROWS_AGGREGATED as appropriate. The latter is used when ZERO_ROWS
can be propagated all the way up to an aggregation AccessPath in the
current Query_block. The ZERO_ROWS will be propagated as far up the
current query block as possible. When this bug is fixed on the MySQL
side, this functionality can be removed again.

Change-Id: I77e6b7a75bb9071d4ad4fbc22b445c4bd51a82c7
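The FILTER-with-FALSE replacement described in the last WL#14449 item
can be sketched on a toy access-path tree. This is a simplified,
assumption-laden model (dicts instead of real AccessPath structs, and
ZERO_ROWS is propagated through every unary node, which the real code
does only where the semantics allow):

```python
def rewrite_false_filters(path):
    """Replace constant-FALSE FILTER nodes and propagate ZERO_ROWS up."""
    # Bottom-up: rewrite the child first so ZERO_ROWS can bubble up.
    if "child" in path:
        path["child"] = rewrite_false_filters(path["child"])
    if path["type"] == "FILTER" and path.get("predicate") is False:
        return {"type": "ZERO_ROWS"}
    if path.get("child", {}).get("type") == "ZERO_ROWS":
        # An aggregation over an empty input still emits one row, so
        # it becomes ZERO_ROWS_AGGREGATED; other unary nodes (in this
        # simplified model) stay empty and collapse to ZERO_ROWS.
        if path["type"] == "AGGREGATE":
            return {"type": "ZERO_ROWS_AGGREGATED"}
        return {"type": "ZERO_ROWS"}
    return path

plan = {"type": "AGGREGATE",
        "child": {"type": "SORT",
                  "child": {"type": "FILTER", "predicate": False,
                            "child": {"type": "TABLE_SCAN"}}}}
print(rewrite_false_filters(plan))  # {'type': 'ZERO_ROWS_AGGREGATED'}
```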