diff --git a/docs/blogs/tech/DDL-Execution-Efficient.md b/docs/blogs/tech/DDL-Execution-Efficient.md new file mode 100644 index 000000000..037d0f453 --- /dev/null +++ b/docs/blogs/tech/DDL-Execution-Efficient.md @@ -0,0 +1,304 @@ +--- +slug: DDL-Execution-Efficient +title: 'Insights into OceanBase Database 4.0: How to Make DDL Execution Efficient and Transparent in a Distributed Database' +--- + +# Insights into OceanBase Database 4.0: How to Make DDL Execution Efficient and Transparent in a Distributed Database + +> About the author: Xie Zhenjiang, Senior Technical Expert of OceanBase, joined the company in 2015. He specializes in storage engine development and currently works on index, data definition language (DDL), and I/O resource scheduling in the index and DDL team. + +Looking back at the development of relational databases since their wide application, we can conclude that the shift from standalone to distributed architecture is undoubtedly a key transition, which is driven by emerging business needs and the explosive growth of data volume. + +On the one hand, a larger data volume nurtures more possibilities for socioeconomic development. On the other hand, it requires better database performance to curb the O&M costs and possible new faults coming along with the increase of storage nodes. Therefore, making operations on a distributed database transparent as on a standalone database becomes one of the keys to improving user experience. As frequently performed operations in database O&M, DDL operations should be transparent to both business developers and O&M engineers. + +Frontline O&M engineers often say: "You can only perform DDL operations late at night." or "It takes a great deal of time, sometimes weeks, to execute DDL statements." These are also challenges faced by database vendors. We believe that the solution lies in efficient and transparent DDL operations. In other words, a database should ensure that the execution of DDL statements is fast and does not disrupt other business development or O&M tasks. + +In OceanBase Database V4.0, we have made innovations based on the existing native online DDL capabilities. Firstly, we have implemented data synchronization through direct load to increase the availability of DDL operations. Secondly, we have improved the standalone execution performance and distributed execution scalability of DDL operations to speed up responses. Thirdly, we have supported more features, such as primary key change, partitioning rule modification, column type modification, and character set modification, to further enhance the native online DDL framework. + +We hope these updates can help users easily handle complex business scenarios. In this article, we will describe how [OceanBase Database V4.0](https://github.com/oceanbase/oceanbase) has achieved efficient and transparent DDL operations, and introduce the features and the design ideas of new DDL operations in OceanBase Database V4.0 from the following perspectives: + +* What DDL operations are more user-friendly? +* How do we achieve high-performance DDL operations in OceanBase Database? +* What's new about the DDL operations in OceanBase Database V4.0? +* Hands-on testing of new DDL operations in OceanBase Database V4.0 + +What DDL Operations Are More User-friendly? +=============== + + + +To answer this question, we need to understand the concept of DDL operations. 
In addition to data manipulation language (DML) statements that directly manipulate data, such as the SELECT, INSERT, UPDATE, and DELETE statements, database O&M also involves other statements, like the CREATE, ALTER, DROP, and TRUNCATE statements, which are intended to change table schemas or other database objects and are related to data definition. Statements of the latter type are referred to as DDL statements. Adding a column to a table and adding an index to a column, for example, are two everyday DDL operations. + +In the early days of database development, the execution of DDL statements was considered one of the most expensive database operations, because DDL operations rendered tables unreadable and blocked ongoing tasks at that time. It would hold back database services for a long time to execute DDL statements on a table containing a large amount of data, which was unacceptable for critical businesses that must stay online all the time. Online DDL was therefore introduced to keep user requests alive while executing DDL statements. So far, most online DDL-based databases on the market have not made DDL operations fully transparent to users. + +* Most standalone databases apply transient locks during online DDL operations. For example, DDL operations in large transactions in a MySQL database may block user requests. +* Online DDL operations in many distributed databases also disturb user requests in some business scenarios due to the limitations of the distributed architecture. +* Developed for standalone databases, online DDL focuses on addressing the impact of DDL operations on normal user requests. It does not consider the response to exceptions of a node, such as a server crash, in a distributed database. + +In this era of data explosion, the execution time of DDL statements is also a limiting factor in speeding up the business upgrade. In standalone databases, parallel sorting is usually used to maximize the execution speed of DDL statements. However, the speed is limited by the performance bottleneck of the standalone architecture. In distributed databases, an industry-wide practice is to complete data by simulating the insert operation, which cannot make full use of the performance of every single server and ignores the benefits of scalability. + +Arguably, the original online DDL feature alone can no longer catch up with the business needs today. + +We believe that the modern DDL feature should provide at least the following two benefits to better meet users' business needs. First, the execution of DDL statements does not affect DML or data query language (DQL) operations on the business side and succeeds despite exceptions such as server crashes in a distributed system. Second, DDL statements can be executed in parallel in both standalone and distributed systems to help users with rapid business innovation. + +How Do We Achieve High-performance DDL Operations in OceanBase Database? +================== + +We hope to build OceanBase Database into a database product that is highly efficient and transparent enough to users. + +When it comes to transparency, unlike their standalone cousins, distributed databases need to overcome node status inconsistency during DDL operations. To address this issue, most peer database vendors follow the "DDL first" principle in their product design, which cannot avoid the impact on user requests in some business scenarios. In contrast, we prioritize business requests in designing OceanBase Database. 
We have also tried our best to shield users from the impact of distributed execution so that they can execute DDL statements in a distributed database as in a standalone one. + +As for execution efficiency, we have accelerated data completion, the most time-consuming DDL operation, by integrating design ideas of a standalone database, rather than using the widely adopted insertion simulation method, and achieved scalable performance of data completion in a distributed database. This makes DDL operations sufficiently efficient in OceanBase Database. + +### Distributed online DDL: putting business requests first + +Before walking you through OceanBase online DDL, we have to mention the online asynchronous schema change protocol of Google F1, which was introduced in the paper _Online, Asynchronous Schema Change in F1_ and has been applied in many distributed databases, such as CockroachDB, to support online DDL operations. This protocol is complicated. Simply put, since table writes are not supposed to be disabled during the execution of DDL statements, it is likely that the schema version varies with database nodes. This protocol ensures data consistency by introducing multiple schemas in intermediate states. + +Further, Google F1 does not have a global list of members. It forces periodical increment of the schema version without taking into account the server or transaction status during DDL execution. Also, it ensures that no more than two schema versions are used at the same time in a cluster. Therefore, Google F1 puts a limit on the execution time of transactions. A node will kill itself and quit if it cannot get the latest schema version, thus affecting the execution of all transactions on it. In a word, Google F1 gives priority to the execution of DDL statements regardless of the impact on transactions. We call it a "DDL first" design. + +Unlike Google F1, OceanBase Database has a global list of members and coordinates with the members related to tables to be changed by DDL operations. The schema version is pushed forward only when the transaction status of all nodes meets the requirements for data consistency after a DDL operation. This way, the execution of general transactions is not affected. When a node cannot be refreshed to the latest schema version, instead of killing it, OceanBase Database restricts the execution of transactions related to the table on which DDL statements are being executed on the node. The execution of DDL statements on other tables is not affected. Apparently, the priority is given to business requests in OceanBase Database. + +We tested the impact of DDL execution on business requests in Google F1 and OceanBase Database by creating indexes. The following table shows the results: + +![1677827526](/img/blogs/tech/DDL-Execution-Efficient/image/1677827526582.png) + +_Table 1 Impact of index creation in OceanBase Database V3.x and Google F1_ + +In addition to giving priority to business requests and supporting transparent DDL execution, OceanBase Database V4.0 also enhances the high availability of DDL operations, so that the DDL execution time may not be prolonged in the case of node exceptions. We will provide more details on this later. + +### Efficient data completion + +Some DDL operations, such as creating indexes and adding columns, require data completion. 
Since OceanBase Database V1.4, we have classified these DDL operations into two types: instant DDL operations, which modify the schema and complete data asynchronously, and real-time DDL operations, which complete data in real time. + +Most distributed databases achieve real-time data completion by simulating the insert operation. The strong point of this method is that it simply completes the data by reusing data write capabilities of DML operations and synchronizes the insert operation to the backups, such as replicas and standby clusters. The problem is, the data writes go through a complex process of SQL execution, transaction execution, and memory ordering, and, if the storage architecture is based on the log-structured merge-tree (LSM-tree), multiple data compactions are performed, leading to poor performance. Therefore, we have integrated data sorting and direct load, two typical features of standalone databases, into the data completion operation in OceanBase Database. However, unlike standalone databases, OceanBase Database performs distributed sorting and optimizes the LSM-tree-based storage architecture to get better performance. + +**1. Distributed sorting** + +OceanBase Database V3.x reuses distributed sorting capabilities of the old SQL execution framework in DDL execution, which feature performance scalability. However, the efficiency of DDL execution on a single server falls short of expectation. OceanBase Database V4.0 performs distributed sorting based on the new SQL execution framework. The execution performance is significantly improved. + +**2. Optimization of the LSM-tree-based storage architecture** + +Unlike the B-tree, an update-in-place storage model commonly adopted in conventional databases, an LSM-tree-based storage architecture updates incremental data to incremental MemTables and writes data to persistent SSTables only by performing data compactions. This feature makes it much easier for OceanBase Database to accelerate data completion in DDL operations. On the one hand, operations like adding columns are natural instant DDL operations and the data can be asynchronously completed during compactions. On the other hand, for real-time DDL operations such as creating indexes, OceanBase Database can coordinate DDL and DML operations to get a version number where data completion is finished, and the transaction data of earlier versions is all committed. This way, the completed data is written to SSTables, and the incremental data generated by DML operations is written to MemTables. The incremental data generated during index creation can be maintained in real time without synchronizing data as in update-in-place storage. + +After years of development, OceanBase Database now supports most online DDL operations on indexes, columns, generated columns, foreign keys, tables, and partitions. + +![1724655654](/img/blogs/tech/DDL-Execution-Efficient/image/1724655654379.png) + +_Table 2 Online DDL operations supported by OceanBase Database V3.x_ + + + +What's New about the DDL Operations in OceanBase Database V4.0? +======================== + +### New DDL operations + +Before OceanBase Database V4.0, we had learned that some users often needed to change the structure of database objects such as primary keys and partitions to support their new business needs. As such DDL operations rewrite the data of the original tables, we call them data-rewrite DDL operations. + +So, what exactly are the purposes of data-rewrite DDL operations? 
+
+* Modify the partitioning rules: If the data volume or workload of an originally small business has outgrown the capacity of a single server, and queries filter on certain table columns in the WHERE clause of SELECT or UPDATE statements, the user can perform data-rewrite DDL operations to partition the original table by these columns, distributing the data and workload across multiple nodes.
+* Modify the character set: If the collation of a column is mistakenly set to be case-insensitive, the user can perform data-rewrite DDL operations to make it case-sensitive.
+* Change the column type: If, for example, a column of the INT type can no longer meet the business requirements, the user can perform data-rewrite DDL operations to change the type to VARCHAR.
+* Change the primary key: If a self-defined ID column of a business table is used as the primary key, the user can perform data-rewrite DDL operations to replace it with an auto-increment primary key column.
+
+As a business grows, not only does its data volume increase, but more database features become deeply engaged in the business. This means that DDL capabilities must grow with the business to support its development in the long run. We found, however, that this is not the case for most distributed databases on the market today. Some lack essential features, such as changing primary keys or partitioning rules; others rewrite data by simulating reinsertion, where the existing data is exported and inserted anew, which is inefficient and may interrupt other user transactions.
+
+Earlier versions of OceanBase Database usually required users to perform data-rewrite DDL operations by manually migrating the data in four steps: creating an empty table with the target schema, exporting the data from the original table, writing the exported data into the new table, and renaming the original table and then renaming the new table to the original table's name. This method has many shortcomings. It involves multiple steps, and if one step fails, users must roll back the operations manually or with external tools; the migration efficiency is low; and a server crash makes it even harder to deal with idempotence issues when, for example, handling tables without primary keys.
+
+In OceanBase Database V3.x, data is not rewritten after column deletion or addition. In OceanBase Database V4.0, data is rewritten after column deletion, column addition, or the newly supported column repositioning, which enables immediate partition exchange. We also plan to offer an option that skips the data rewrite; it will not support immediate partition exchange but can be much faster.
+
+OceanBase Database V4.0 supports native data-rewrite DDL operations. Users can get the job done, such as modifying a partitioning rule or changing a primary key, character set, or column type, by executing a single DDL statement, without worrying about environmental exceptions during the operation. 
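+
+Each of these changes is issued as one statement. As a quick taste, here is what that looks like in MySQL-mode SQL (illustrative schemas; the hands-on section later in this article walks through each operation in full):
+
+```
+-- Repartition a table in a single statement; the data rewrite runs as an internal DDL task.
+alter table t1 partition by hash(c1) partitions 4;
+-- Change a column type in place.
+alter table t1 modify c2 varchar(64);
+-- Replace the primary key atomically.
+alter table t1 drop primary key, add primary key(c2);
+```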
+ +![1677827731](/img/blogs/tech/DDL-Execution-Efficient/image/1677827731388.png) + +_Table 3 New DDL operations supported in OceanBase Database V4.0_ + +To better support these new operations, we have enhanced the native online DDL feature by: + +* Supporting the atomic change of a table with multiple dependent objects +* Significantly improving the data-completion performance of the native online DDL +* Supporting the high-availability synchronization of the data generated by direct loads +* Performing data consistency checks on the data of both a table and the dependent objects of the table to ensure the data consistency of DDL operations. + + + +### Atomic change that ensures synchronized data updates + + + +The atomic change feature ensures that users see the updated table schema and data if the DDL operation succeeds, and the original table schema and data if the DDL operation fails. A data-rewrite DDL operation involves two jobs. First, the existing table data is modified based on the new table schema. Second, objects depending on the table, such as indexes, constraints, foreign keys, and triggers, are modified based on the new table schema. + +In a distributed database, the data of a table may be distributed on different nodes, which brings two challenges: + +* How to ensure the atomic change of the distributed data and dependent objects? +* How to ensure that users see only the updated table schema, given the fact that the latest update time of the table schema is different across nodes after the update is completed? + +In response, we have designed a table schema change process to ensure the atomic change of the data and multiple dependent objects in a distributed environment, and users can query and perform DML operations on the table based on the latest table schema after a DDL operation. + +The reason to ensure the atomic change is that unexpected database kernel exceptions may occur during a DDL operation. For example, we have a table with an INT-type column, which is used as a unique index. If we modify the column type to TINYINT and several values of the column exceed the range of the TINYINT type, all values of the rows where these invalid values reside will be truncated to the upper bound of the TINYINT type, resulting in duplicate values in the column, which does not meet the UNIQUE constraint. At this point, the data has been partially rewritten, and the DDL operation rolls back. The atomic change feature ensures that the user sees the original data rather than a messed-up table. + +### Parallel execution that improves data completion speed + + + +Most distributed databases migrate data by simulating the insert operation. This method has two drawbacks. First, the operation may contend with general business requests for row locks. Second, the performance is significantly lower in comparison with a conventional standalone database due to the control of transaction concurrency, the control of thread safety of in-memory indexes, and multiple data writes. + +To reduce the business impact of DDL operations and improve the DDL statement execution efficiency, OceanBase Database migrates data from the original table to the new table by using a method with distributed sorting and direct load, much like creating an index. Distributed sorting incurs less CPU overhead because fewer transactions are involved, the memory structure is maintained in order, and multiple compactions are avoided in the process. 
Direct load avoids data writes to MemTables and multiple compactions, which reduces the memory and I/O overhead. + +We have redesigned the distributed execution plan for data completion during DDL operations in OceanBase Database V4.0 based on the new parallel execution framework. The new plan has two parallel subplans. One consists of sampling and scanning operators, and the other consists of sorting and scanning operators. + +![1677827843](/img/blogs/tech/DDL-Execution-Efficient/image/1677827842984.png) + +_Figure 1 Distributed execution plan for data completion in OceanBase Database V4.0_ + +This plan makes full use of distributed and standalone parallel execution capabilities: + +The two parallel subplans may be scheduled at the same time based on the new framework and executed on multiple servers in a pipeline where the parallel subplan 1 returns rows to the parallel subplan 2 and then the rows are processed. + +To prevent data skew among partitions, we split each partition into multiple slices, which are processed by different SORT operators. Each SORT operator will process the data of multiple partitions. As a sample division algorithm is applied, the split partition slices are roughly equal in size, so that the sorting workload is balanced across operators. + +We have also adopted some techniques to improve the execution efficiency on a single server. For example, the vectorized engine is used for batch processing in the data completion process where possible; data writes to the local disk and data synchronization with other nodes are performed in parallel; more efficient sampling algorithms are applied; and a new framework is used for the static data engine to avoid the repeated copy of the row metadata. Those techniques help improve the performance of all operations involving data completion, such as index creation and data-rewrite DDL operations. + +### More stringent availability requirements + +Distributed sorting and direct load significantly improve the performance of data completion. The question is, how to synchronize data imported through direct load to follower replicas and standby clusters? OceanBase Database V4.0 supports the synchronization of data imported through direct load to SSTables to follower replicas and standby clusters over the Paxos protocol. During the data replay, only the data address and metadata of macroblocks in SSTables are replayed in the in-memory state machine. This solution has the following benefits: + +* Data imported through direct load is highly available and the DDL execution is not affected when a minority of nodes crash. +* Data imported through direct load is compressed by data encoding and general-purpose compression algorithms. The data size is much smaller than the original table, leading to fast data synchronization. +* The data synchronization to the follower replicas and standby clusters is based on the same logic. No special coding is required. + + + +### Enhanced data consistency check + +During a data-rewrite DDL operation, the data is migrated from the original table to a new table. The user data is expected to be consistent after the operation. OceanBase Database V4.0 performs consistency checks after a successful DDL operation to ensure data consistency and rolls back the DDL operation when an unexpected error occurs. Specifically, OceanBase Database V4.0 checks not only the new table, but also all of its dependent objects such as indexes, constraints, and foreign keys. 
A DDL operation succeeds only if the data of both the table and its dependent objects is consistent.
+
+Hands-on Testing of New DDL Operations in OceanBase Database V4.0
+=====================
+
+### Testing of new features
+
+The examples below use minimal illustrative schemas: each scenario first creates a table and then executes the DDL statement under test.
+
+**1. Perform primary key operations**
+
+(1) Add a primary key.
+```
+ OceanBase(admin@test)>create table t1(c1 int);
+ OceanBase(admin@test)>alter table t1 add primary key(c1);
+```
+
+(2) Drop a primary key.
+```
+ OceanBase(admin@test)>create table t1(c1 int primary key);
+ OceanBase(admin@test)>alter table t1 drop primary key;
+```
+
+(3) Change a primary key.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 int primary key);
+ OceanBase(admin@test)>alter table t1 drop primary key, add primary key(c1);
+```
+
+**2. Modify partitioning rules**
+
+(1) Convert a non-partitioned table into a partitioned table.
+```
+ OceanBase(admin@test)>alter table t1 partition by hash(c1) partitions 4;
+ Query OK, 0 rows affected (1.51 sec)
+```
+
+(2) Convert a non-partitioned table into a subpartitioned table.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 datetime);
+ OceanBase(admin@test)>alter table t1 partition by range(c1) subpartition by key(c2) subpartitions 5 (partition p0 values less than(0), partition p1 values less than(100));
+```
+
+(3) Change the partitioning rule of a partitioned table.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 datetime, primary key(c1, c2)) partition by hash(c1) partitions 4;
+ OceanBase(admin@test)>alter table t1 partition by key(c2) partitions 8;
+```
+
+(4) Convert a partitioned table into a subpartitioned table.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 datetime, primary key(c1, c2)) partition by hash(c1) partitions 4;
+ OceanBase(admin@test)>alter table t1 partition by range(c1) subpartition by key(c2) subpartitions 5 (partition p0 values less than(0), partition p1 values less than(100));
+```
+
+(5) Convert a subpartitioned table into a partitioned table.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 datetime, primary key(c1, c2)) partition by range(c1) subpartition by key(c2) subpartitions 5 (partition p0 values less than(0), partition p1 values less than(100));
+ OceanBase(admin@test)>alter table t1 partition by hash(c1) partitions 4;
+```
+
+(6) Change the subpartitioning rule of a subpartitioned table.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 datetime, primary key(c1, c2)) partition by range(c1) subpartition by key(c2) subpartitions 5 (partition p0 values less than(0), partition p1 values less than(100));
+ OceanBase(admin@test)>alter table t1 partition by range(c1) subpartition by hash(c2) subpartitions 3 (partition p0 values less than(0), partition p1 values less than(100));
+```
+
+**3. Change the type of a column**
+
+Users can change the data length and data type of a column, change a normal column to an auto-increment column, and change the character set of a column.
+
+(1) Shorten the data length of a column.
+```
+ OceanBase(admin@test)>create table t1(c1 varchar(32), c2 int, primary key(c1,c2));
+ OceanBase(admin@test)>alter table t1 modify c1 varchar(16);
+```
+
+(2) Increase the data length of a column.
+```
+ OceanBase(admin@test)>create table t1(c1 varchar(32), c2 int, primary key(c1,c2));
+ OceanBase(admin@test)>alter table t1 modify c1 varchar(64);
+```
+
+(3) Change the data type of a column.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 int, primary key(c1,c2));
+ OceanBase(admin@test)>alter table t1 modify c1 varchar(20);
+```
+
+(4) Change a normal column to an auto-increment column.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 int, primary key(c1,c2));
+ OceanBase(admin@test)>alter table t1 modify c1 int auto_increment;
+```
+
+(5) Change the character set of a column.
+```
+ OceanBase(admin@test)>create table t1(c1 int, c2 varchar(32), primary key(c1));
+ OceanBase(admin@test)>alter table t1 modify c2 varchar(32) charset gbk;
+```
+
+**4. Change character sets**
+
+(1) Change the character set of existing table data.
+```
+ OceanBase(admin@test)>create table t1 (c1 int, c2 varchar(32), c3 varchar(32), primary key (c1), unique key idx_test_collation_c2(c2));
+ OceanBase(admin@test)>alter table t1 convert to character set utf8mb4 collate utf8mb4_bin;
+```
+
+### Performance testing
+
+We tested the performance of DDL execution by creating indexes. In the test, a number of data rows are imported into a table and the time consumed by index creation is measured. Because most of this time is spent on data completion, the test reflects the data completion performance of each database.
+
+**Configuration**
+
+1. Table schema: create table t1(c1 int, c2 varchar(755)) partition by hash(c1) partitions 10
+2. Data volume: 10 million rows
+3. Resource configuration: one server, with the degree of parallelism set to 10 and the memory for sorting to 128 MB
+4. 
Test scenarios: create index i1 on t1(c1) global; create index i1 on t1(c2) global; create index i1 on t1(c1,c2) global; create index i1 on t1(c2,c1) global; +5. Test metric: time consumption of index creation, in seconds +6. Tested databases: a standalone MySQL database, a distributed database A, and OceanBase Database V4.0 + +**Test results** +As shown in the following figure, OceanBase Database creates the index 10–20 times faster than database A, and 3–4 times faster than MySQL. Note that data completion is performed by simulating the insert operation in database A. Apparently, data completion by sorting and direct load significantly improves the performance of index creation. On the other hand, we have optimized the single-server performance of OceanBase Database V4.0, which therefore finishes data completion much faster than MySQL. + +![1677828576](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/pord/blog/2023-04/1677828576757.png) + +_Figure 2 Performance comparison_ + +Afterword +==== + +OceanBase Database V4.0 supports common data-rewrite DDL operations, such as changing primary keys, column types, and character sets, and modifying partitioning rules. We hope that the atomic change feature, enhanced data consistency check, and high-availability data synchronization can help users complete the required change by simply executing a DDL statement, without worrying about exceptions in the distributed environment. We have also improved the distributed and standalone parallel execution capabilities to speed up data completion during DDL operations. + +Hopefully, the DDL optimizations of OceanBase Database V4.0 can help users cope with changing business challenges with ease. \ No newline at end of file diff --git a/docs/blogs/tech/Real-time-AP.md b/docs/blogs/tech/Real-time-AP.md new file mode 100644 index 000000000..f0997cdf1 --- /dev/null +++ b/docs/blogs/tech/Real-time-AP.md @@ -0,0 +1,231 @@ +--- +slug: Real-time-AP +title: 'OceanBase Database 4.3 - Milestone Release for Real-time AP' +--- + + +In early 2023, OceanBase Database V4.1 was released. It is the first milestone version of the V4.x series and supports an integrated architecture for standalone and distributed modes. Such integrated architecture reduces the recovery time objective (RTO), a database reliability indicator, to less than 8 seconds, ensuring rapid database recovery from an unexpected failure. Unlike the V3.x series, the new version does not limit the number of partitions, providing higher capacity for processing large transactions. Core features such as the arbitration replica are supported to cut costs. + +In September 2023, OceanBase Database V4.2.1 was released. As the first Long-Term Support (LTS) version of the V4.x series, it augments all core features of the V3.x series, and demonstrates improved performance in many aspects such as stability, scalability, support for small specifications, and ease of diagnostics. Six months after its release, hundreds of customers have deployed this LTS version in their production environments for stable operations. + +To meet higher expectations on ease-of-use and capabilities of tackling miscellaneous workloads, we have released OceanBase Database V4.3.0, which is rigorously implemented on top of open design after thorough research. + +OceanBase Database V4.3.0 sets a significant milestone on our roadmap to achieve real-time analytical processing (AP). 
This version provides a columnar engine based on the log-structured merge-tree (LSM-tree) architecture, which implements hybrid columnar and row-based storage. The database also introduces a new vectorized engine based on column data format descriptions and a cost model based on columnar storage. This way, wide tables can be effectively processed and the query performance in AP scenarios is significantly improved without affecting transactional processing (TP) business scenarios. Overall, the new OceanBase Database version is well-suited for mixed workload scenarios involving complex analytics, real-time reporting, real-time data warehousing, and online transactions. The materialized view feature is provided. Query results are pre-calculated and stored in materialized views to improve real-time query performance, and support rapid report generation and data analysis. The kernel in the new version also extends online DDL and adds support for tenant cloning. It has optimized performance and system resource usage, and provides better system usability. In a test with the same hardware configurations, the performance of OceanBase Database V4.3.0 in wide-table queries is comparable with mainstream columnstore databases in the industry. + +Now, let's take a closer look at key updates of OceanBase Database V4.3.0: + +-   **TP and AP integration** + +-   **High-performance kernel** + +-   **Higher computing performance** + +-   **Ease-of-use enhancements** + + + +**1. TP and AP Integration** +--------------------- + +In addition to features of V4.2, such as highly concurrent real-time row updates, and point queries of the primary key indexes, OceanBase Database V4.3.0 introduces more AP services. Its scalable distributed architecture also supports high availability, strong consistency, and geo-disaster recovery. The new version provides a columnar engine and enhances vectorized execution, parallel computing, and distributed plan optimization. This way, the database supports both TP and AP business. + +### **(1) Integrated columnar and row-based storage** + +Columnar storage is one of the key capabilities of AP databases in complex large-scale data analysis and ad hoc queries of massive data. Columnar storage is a way to organize data files. Different from row-based storage, columnar storage physically arranges data in a table by column. When data is stored by column, the system can scan only the columns involved in the query and calculation, instead of scanning the entire row. This way, the consumption of resources such as I/O and memory is reduced and the calculation is accelerated. Moreover, columnar storage naturally provides better data compression conditions, making it easier to achieve higher compression ratios, thereby reducing the usage of storage space and network transmission bandwidth. + +However, columnar engines generally assume limited random updates and attempt to ensure that data in columnar storage is static. When a large amount of data is updated randomly, system performance will inevitably degrade. The LSM-tree architecture of OceanBase Database can process baseline data and incremental data separately, and therefore can solve the performance issue. Therefore, OceanBase Database V4.3.0 supports the columnar engine based on the current architecture, implementing integrated columnar and row-based data storage on an OBServer node with only one set of code and one architecture, and ensuring the performance of both TP and AP queries. 
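+
+As a sketch of how individual tables opt into columnar storage at creation time (assuming the `with column group` syntax introduced in V4.3.0; table names are illustrative):
+
+```
+-- Pure columnstore table: baseline data is organized column by column.
+create table tt_column_store (c1 int, c2 int) with column group (each column);
+
+-- Hybrid rowstore-columnstore table: baseline data is kept in both row and
+-- column form, and the optimizer picks row or column scans by estimated cost.
+create table tt_hybrid_store (c1 int, c2 int) with column group (all columns, each column);
+
+-- Columnstore index on an otherwise rowstore table.
+create table tt_row_store (c1 int, c2 int);
+create index idx_c2 on tt_row_store (c2) with column group (each column);
+```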
+
+To help users with AP requirements smoothly use the new version, OceanBase Database has adapted and optimized several modules, including the optimizer, executor, DDL, and transaction processing, for the columnar engine. These optimizations introduce a new cost model and vectorized engine based on columnar storage, enhancements to the query pushdown feature, and features like skip index, a new column-based encoding algorithm, and adaptive compactions.
+
+To make AP queries easy, you can run the following command in a MySQL or Oracle tenant of OceanBase Database so that newly created tables are columnstore tables by default:
+```
+ alter system set default_table_store_format = "column"
+```
+
+You can flexibly create a business table as a rowstore table, columnstore table, or hybrid rowstore-columnstore table based on the load type. You can also create a columnstore index for a rowstore table.
+
+![1713849286](/img/blogs/tech/Real-time-AP/image/1713849285226.png)
+
+![1713849297](/img/blogs/tech/Real-time-AP/image/1713849296536.png)
+
+The optimizer determines, based on estimated costs, whether to scan a hybrid rowstore-columnstore table by row or by column.
+
+![1713849311](/img/blogs/tech/Real-time-AP/image/1713849310310.png)
+
+### **(2) New vectorized engine**
+
+Earlier versions of OceanBase Database implemented a vectorized engine based on uniform data format descriptions, offering performance significantly better than that of non-vectorized engines. However, that engine still fell short in deep AP scenarios. OceanBase Database V4.3.0 implements the vectorized engine 2.0, which is based on column data format descriptions, avoiding the memory usage, serialization, and read/write access overhead caused by ObDatum maintenance. On top of the reworked data format descriptions, the new vectorized engine also reimplements more than 10 commonly used operators such as HashJoin, AGGR, HashGroupBy, and Exchange (DTL Shuffle), as well as over 20 MySQL expressions including relational, logical, and arithmetic operations. Subsequent V4.3.x versions will implement the remaining operators and expressions on the new vectorized engine to achieve better performance.
+
+### **(3) Materialized views**
+
+OceanBase Database V4.3.0 introduces materialized views, a key feature for AP business scenarios. By precomputing and storing the query results of views, materialized views reduce real-time calculation, improve query performance, and simplify complex query logic. They are commonly used for rapid report generation and data analysis.
+
+Materialized views store query result sets to optimize query performance. Because of the data dependency between a materialized view and its base tables, data in the materialized view must be refreshed when data in any base table changes. Therefore, the new version also introduces the materialized view refresh mechanism, with complete refresh and incremental refresh strategies. Complete refresh is the more direct method: each time a refresh is performed, the system re-executes the query statement of the materialized view to recalculate and overwrite the original result set, which suits scenarios with small data volumes. Incremental refresh, by contrast, only deals with data that has been changed since the last refresh. 
To achieve accurate incremental refresh, OceanBase Database implements a materialized view log feature that is similar to Oracle Materialized View Log (MLOG). The feature tracks incremental data updates in base tables and records the updates in logs. This ensures that materialized views can be refreshed incrementally in a short period. Incremental refresh is particularly useful in business scenarios with large data volumes and frequent data changes. + + + +**2. High-performance Kernel** +--------------- + +The kernel in the new version has enhanced the cost model, added support for tenant cloning, extended online DDL, added Amazon Simple Storage Service (S3) as the backup and restore media, restructured the session management module, and optimized the log stream state machine and system resource usage, to improve database performance and stability in handling key business workloads. + +### **(1) Enhanced row estimation system** + +As the OceanBase Database version evolves, more cost estimation methods are available for the optimizer. For row estimation of each operator, a variety of algorithms, such as row estimation based on the storage layer, row estimation based on statistics, dynamic sampling, and default statistics, are supported. However, there are no clear strategies and complete control methods for using row estimation. OceanBase Database V4.3.0 reconstructs the row estimation system. Specifically, it prioritizes row estimation strategies based on scenarios and provides methods such as hints and system variables for you to manually intervene in the selection of a row estimation strategy. This version also enhances the predicate selectivity and number of distinct values (NDV) calculation framework to improve the accuracy of cost estimation by the optimizer. + +### **(2) Enhanced statistics** + +OceanBase Database V4.3.0 improves the statistics feature, statistics collection performance, and the compatibility and usability of statistics. Specifically, this version reconstructs the offline statistics collection process to improve the collection efficiency, optimizes the statistics collection strategies to automatically collect information about index histograms by default and collect statistics in a deductive manner, and ensures transaction consistency for online statistics collection. It is compatible with the `DBMS_STATS.COPY_TABLE_STATS` procedure of Oracle for statistics copying, and is also compatible with the `ANALYZE TABLE` statement of MySQL. It provides a command to cancel statistics collection, enriches the monitoring on the statistics collection progress, and enhances maintenance usability. It also supports the parallel deletion of statistics. + +### **(3) Adaptive cost model** + +In earlier versions of OceanBase Database, the cost model uses constant parameters measured by internal machines to represent hardware system statistics, and describes the execution overhead of each operator by using a series of formulas and constant parameters. However, in real business scenarios, different hardware environments may have different CPU clock frequencies, sequential or random read speeds, and NIC bandwidths, thereby resulting in cost estimation deviations. The optimizer cannot always generate optimal plans in different business environments because of these deviations. The new version implements the cost model in an optimized way to support the `DBMS_STATS` package for collecting or setting system statistics coefficients, thus adapting the cost model to hardware. 
It also provides the `DBA_OB_AUX_STATISTICS` view to display the system statistics coefficients of the current tenant. + +### **(4) Fixed session variables for function-based indexes** + +When a function-based index is created on a table, a hidden virtual generated column is added to the table and defined as the index key of the function-based index. The values of the virtual generated column are stored in the index table. The results of some built-in system functions are affected by session variables. The calculation result of a function varies based on the values of session variables, even if the input arguments are the same. When a function-based index or generated column is created in this version, session variables on which the function-based index or generated column depends are fixed in the column schema to improve stability. When values of the indexed column or generated column are calculated, fixed session variable values are used. Therefore, the calculation result is not affected by variable values in the current session. OceanBase Database V4.3.0 supports fixed values of the system variables `timezone_info`, `nls_format`, `nls_collation`, and `sql_mode`. + +### **(5) Online DDL expansion in MySQL mode** + +OceanBase Database V4.3.0 supports more online DDL scenarios for column type changes, including: + +-   **Conversion of integer types:** Online DDL operations, instead of offline DDL operations, are performed to change the data type of a primary key column, indexed column, generated column, column on which a generated column depends, or column with a `UNIQUE` or `CHECK` constraint to an integer type with a larger value range. + +-   **Conversion of the DECIMAL data type:** For columns that support the DECIMAL data type, online DDL operations are performed to increase the precision within any of the \[1, 9\], \[10, 18\], \[19, 38\], and \[39, 76\] ranges without changing the scale. + +-   **Conversion of the BIT or CHAR data type:** For columns that support the BIT or CHAR data type, online DDL operations are performed to increase the width. + +-   **Conversion of the VARCHAR or VARBINARY data type:** For columns that support the VARCHAR or VARBINARY data type, online DDL operations are performed to increase the width. + +-   **Conversion of the LOB data type:** To change the data type of a column that supports LOB data types to a LOB data type with a larger value range, offline DDL operations are performed for columns of the TINYTEXT or TINYBLOB data type, and online DDL operations are performed for columns of other data types. + +-   **Conversion between the TINYTEXT and VARCHAR data types:** For columns that support the TINYTEXT data type, online DDL operations are performed to change the VARCHAR(x) data type to the TINYTEXT data type if `x <= 255`, and offline DDL operations are performed if otherwise. For columns that support the VARCHAR data type, online DDL operations are performed to change the TINYTEXT data type to the VARCHAR(x) data type if `x >= 255`, and offline DDL operations are performed if otherwise. + +-   **Conversion between the TINYBLOB and VARBINARY data types:** For columns that support the TINYBLOB data type, online DDL operations are performed to change the VARBINARY(x) data type to the TINYBLOB data type if `x <= 255`, and offline DDL operations are performed if otherwise. 
For columns that support the VARBINARY data type, online DDL operations are performed to change the TINYBLOB data type to the VARBINARY(x) data type if `x >= 255`, and offline DDL operations are performed if otherwise. + +### **(6) Globally unique client session ID** + +Prior to OceanBase Database V4.3.0 and OceanBase Database Proxy (ODP) V4.2.3, when the client executes `SHOW PROCESSLIST` through ODP, the client session ID in ODP is returned. However, when the client queries the session ID by using an expression such as `connection_id` or from a system view, the session ID on the server is returned. A client session ID corresponds to multiple server session IDs. This causes confusion in session information queries and makes user session management difficult. In the new version, the client session ID generation and maintenance process is reconstructed. When the version of OceanBase Database is not earlier than V4.3.0 and the version of ODP is not earlier than V4.2.3, the session IDs returned by various channels, such as the `SHOW PROCESSLIST` command, the `information_schema.PROCESSLIST` and `GV$OB_PROCESSLIST` views, and the `connection_id`, `userenv('sid')`, `userenv('sessionid')`, `sys_context('userenv','sid')`, and `sys_context('userenv','sessionid')` expressions, are all client session IDs. You can specify a client session ID in the SQL or PL command `KILL` to terminate the corresponding session. If the preceding version requirements for OceanBase Database and ODP are not met, the handling method in earlier versions is used. + +### **(7) Improvement of the log stream state machine** + +In OceanBase Database V4.3.0, the log stream status is split into the in-memory status and persistent status. The persistent status indicates the life cycle of a log stream. After the OBServer node where a log stream resides breaks down and then restarts, the system determines whether the log stream should exist and what the in-memory status of the log stream should be based on the persistent status of the log stream. The in-memory status indicates the runtime status of a log stream, representing the overall status of the log stream and the status of key submodules. Based on the explicit status and status sequence of the log stream, underlying modules can determine which operations are safe to the log stream and whether the log stream has gone through a status change of the ABA type. For backup and restore or migration processes, the working status of a log stream is optimized after the OBServer node where the log stream resides restarts. This feature improves the stability of log stream-related features and enhances the concurrency control on log streams. + +### **(8) Tenant cloning** + +OceanBase Database V4.3.0 supports tenant cloning. You can quickly clone a specified tenant by executing an SQL statement in the sys tenant. After a tenant cloning job is completed, the created new tenant is a standby tenant. You can convert the standby tenant into the primary tenant to provide services. The new tenant and the source tenant share physical macroblocks in the initial state, but new data changes and resource usage are isolated between the tenants. You can clone an online tenant for temporary data analysis with high resource consumption or other high-risk operations to avoid risking the online tenant. In addition, you can also clone a tenant for disaster recovery. When irrecoverable misoperations are performed in the source tenant, you can use the new tenant for data rollback. 
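+
+A minimal sketch of that flow, assuming the sys-tenant `CREATE TENANT ... FROM` clone syntax; the tenant names and unit configuration are illustrative:
+
+```
+-- In the sys tenant. The unit config must already exist;
+-- the resource pool is created by the clone job itself.
+create resource unit clone_unit max_cpu 4, memory_size '8G';
+
+-- Clone tenant_a into tenant_a_clone; the new tenant starts as a standby tenant.
+create tenant tenant_a_clone from tenant_a with resource_pool = clone_pool, unit = clone_unit;
+
+-- Convert the standby clone into a primary tenant so that it can serve requests.
+alter system activate standby tenant tenant_a_clone;
+```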
+ +### **(9) Support for S3 as the backup and restore media** + +Earlier versions of OceanBase Database support two types of storage media for backup and restore: file storage (NFS) and object storage such as Alibaba Cloud Object Storage Service (OSS) and Tencent Cloud Object Storage (COS). The new version supports Amazon S3 and S3-compatible object storage like Huawei Cloud Object Storage Service (OBS) and Google Cloud Storage (GCS) as the log archive and data backup destination. You can also use backup data on S3 and S3-compatible object storage for physical restore. + +### **(10) Proactive broadcast/refresh of tablet locations** + +In earlier versions, OceanBase Database provides the periodic location cache refresh mechanism to ensure that the location information of log streams is updated in real time and consistent. However, tablet location information can only be passively refreshed. Changes in the mappings between tablets and log streams can trigger SQL retries and read/write errors with a certain probability. OceanBase Database V4.3.0 supports proactive broadcast of tablet locations to reduce SQL retries and read/write errors caused by changes in mappings after transfer. It also supports proactive refresh to avoid unrecoverable read/write errors. + +### **(11) Migration of active transactions during tablet transfer** + +In the design of standalone log streams, data is in the unit of tablets, while logs are in the unit of log streams. Multiple tablets are aggregated into one log stream, saving the high cost of two-phase commit of transactions within a single log stream. To balance data and traffic among different log streams, tablets can be flexibly transferred between log streams. However, during the tablet transfer process, active transactions may still be handling the data, and even a simple operation may damage the atomicity, consistency, isolation, and durability (ACID) of the transactions. For example, if active transaction data on the transfer source cannot be completely migrated to the transfer destination during concurrent transaction execution, the atomicity of the transactions cannot be guaranteed. In earlier versions, active transactions were killed during the transfer to avoid transaction problems. This mechanism affects the normal execution of transactions to some extent. To resolve this problem, the new version supports the migration of active transactions during tablet transfer, which enables concurrent execution of active transactions and ensures that no abnormal rollbacks or consistency issues occur in concurrent transactions due to the transfer. + +### **(12) Memory throttling mechanism** + +Prior to OceanBase Database V4.x, only a few modules release memory based on freezes and minor compactions, and the MemTable is the largest part among them. Therefore, in earlier versions, an upper limit is set for memory usage of the MemTable, enabling it to run as smoothly as possible within the memory usage limit and avoiding writing failures caused by sudden memory exhaustion. In OceanBase Database V4.x, more modules that release memory based on freezes and minor compactions are introduced, such as the transaction data module. The new version provides more refined means to control the memory usage of various modules and supports the memory upper limit control of TxData and metadata service (MDS) modules. The two modules share memory space with the MemTable. 
When the sum of the memory usage of the three modules reaches `Tenant memory × _tx_share_memory_limit_percentage% × writing_throttling_trigger_percentage%`, overall memory throttling is triggered for the three modules. The new version also supports freezes and minor compactions of the transaction data table by time to reduce the memory usage of the transaction data module. By default, the transaction data table is frozen once every 1,800 seconds. + +### **(13) Optimization of DDL temporary result space** + +During DDL operations, many processes may store temporary results in materialized structures. Here are two typical scenarios: 1) During index creation, the system scans data in the base data table and sorts and inserts the obtained data to the index table. If the memory is insufficient during the sorting process, current data in the memory space will be temporarily stored in materialized structures to release the memory space for subsequent scanning. Data in the materialized structures is then merged and sorted. 2) In the columnar storage direct load scenario, the system first temporarily stores the data to be inserted into each column group in materialized structures, and then obtains data from the materialized structures for insertion. These materialized structures can be used in the `SORT` operator to store intermediate data required for external sorting. When the system inserts data into column groups, the data can be cached in materialized structures, avoiding additional overhead caused by repeated table scanning. As a result, the temporary files occupy considerable disk space. The new version eliminates unnecessary redundant structures to simplify the data flow, and supports encoding and compression of temporary results for storage on disks. This greatly reduces the disk space occupied by temporary files. + + + +**3. Higher Computing Performance** +--------------- + +The online analytical processing (OLAP) capabilities are significantly enhanced in the new version, achieving a performance boost in TPC-H 1TB and TPC-DS 1TB tests. The new version also optimizes PDML, read and write operations in OBKV, direct load performance of LOB data, and node restart performance. + + + +### **(1) Increased performance in the TPC-H 1TB test** + +The following figure shows the performance of a tenant with 80 CPU cores and 500 GB of memory of different OceanBase Database versions in the TPC-H 1TB test. Overall, the performance of V4.3.0 is about 25% higher than that of V4.2.0. + +![1713849772](/img/blogs/tech/Real-time-AP/image/1713849771931.png) + +Figure 1: Performance of V4.3.0 and V4.2.0 in the TPC-H 1TB test + + + +### **(2) Increased performance in TPC-DS 1TB test** + +The following figure shows the performance of a tenant with 80 CPU cores and 500 GB of memory of different OceanBase Database versions in the TPC-DS 1TB test. Overall, the performance of V4.3.0 is about 111% higher than that of V4.2.0. + +![1713849829](/img/blogs/tech/Real-time-AP/image/1713849828408.png) + +Figure 2: Performance of V4.3.0 and V4.2.0 in the TPC-DS 1TB test + + + +### **(3) OBKV performance optimization** + +Compared with those in V4.2.1, the OBKV single-row read-write performance is improved by about 70%, and the batch read-write performance is improved by 80% to 220%. + + + +### **(4) PDML transaction optimization** + +The new version implements optimizations at the transaction layer by supporting parallel commit, log replay, and partition-level rollbacks within transaction participants. 
Compared with earlier V4.x versions, the new version significantly improves the PDML execution performance and scalability in high concurrency scenarios. + + + +### **(5) I/O usage optimization for loading tablet metadata** + +OceanBase Database V4.x supports millions of partitions on a single machine. As the memory may fail to hold the metadata of millions of tablets, OceanBase Database V4.x supports on-demand loading of tablet metadata. OceanBase Database supports on-demand loading of metadata at the partition level and the subclass level within partitions. In a partition, metadata is split into multiple subclasses for hierarchical storage. In scenarios where background tasks require deeper metadata, the data read consumes more I/O resources. These I/O overheads are not a problem for local SSDs, but may affect system performance when HDD disks or cloud disks are used. OceanBase Database V4.3.0 aggregates frequently accessed metadata in storage, and only one I/O operation is required to access the metadata. This greatly reduces the I/O overhead in zero load scenarios and avoids the impact on foreground query performance caused by background task I/O overhead. In addition, the metadata loading process during the restart of an OBServer node is optimized. Tablet metadata is loaded in batches at the granularity of macroblocks, greatly reducing discrete I/O reads and speeding up the restart by several or even dozens of times. + + + +**4. Ease-of-use Enhancements** +---------------- + +The new version provides the index usage monitoring feature to help you identify and delete invalid indexes, and allows you to import a small amount of local data from the client. Features such as LOB INROW threshold configuration, remote procedure call (RPC) authentication certificate management, and parameter resetting are also provided to improve system usability. + + + +### **(1) Index usage monitoring** + +We usually create indexes to improve the query performance of the database. However, more and more indexes are created as data tables are used in more business scenarios by more operators. Unused indexes are a waste of storage space and increase the overhead of DML operations. In this case, you need to drop useless indexes to alleviate the burden on the system. However, you can hardly identify all useless indexes by manual efforts. Therefore, OceanBase Database V4.3.0 provides the index usage monitoring feature. After you enable this feature and set the sampling method, the index usage information that meets the rules is recorded in the memory of a user tenant and refreshed to the internal table once every 15 minutes. You can then query the `DBA_INDEX_USAGE` view to find out whether an index is referenced and drop useless indexes to release space. + +### **(2) Local import from the client** + +OceanBase Database V4.3.0 supports the `LOAD DATA LOCAL INFILE` statement for local import from the client. You can use the feature to import local files through streaming file processing. Based on this feature, developers can import local files for testing without uploading files to the server or object storage, improving the efficiency of importing a small amount of data. + +Note: To import local data from the client, make sure that: + + a. The version of OceanBase Command-Line Client (OBClient) is V2.2.4 or later. + + b. The version of ODP is V3.2.4 or later. If you directly connect to an OBServer node, ignore this requirement. + + c. 
The version of OceanBase Connector/J is V2.4.8 or later if you use Java and OceanBase Connector/J.
+
+You can directly use a MySQL client or a native MariaDB client of any version.
+
+The `SECURE_FILE_PRIV` variable is used to specify the server paths that the OBServer node can access. This variable does not affect local import from a client, and therefore does not need to be set for local import.
+
+### **(3) LOB INROW threshold configuration**
+
+By default, LOB data of a size less than or equal to 4 KB is stored in INROW mode, and LOB data of a size greater than 4 KB is stored in the LOB auxiliary table. In some scenarios, INROW storage provides higher performance than auxiliary table-based storage. Therefore, this version supports dynamic configuration of the LOB storage mode. You can adjust the INROW threshold based on your business needs, provided that the threshold does not exceed the limit for INROW storage.
+
+### **(4) RPC authentication certificate management**
+
+When RPC authentication is enabled for a cluster, before a client, such as the arbitration service, a primary/standby database, or OceanBase Change Data Capture (CDC), can access the cluster, you need to place the root CA certificate of the client in the deployment directory of each OBServer node in the cluster and then perform the related configurations. This whole process is complicated. OceanBase Database V4.3.0 supports the internal certificate management feature. You can use the `DBMS_TRUSTED_CERTIFICATE_MANAGER` system package provided in the sys tenant to add, delete, and modify root CA certificates trusted by an OceanBase cluster. The `DBA_OB_TRUSTED_ROOT_CERTIFICATE` view is also provided in the sys tenant to display the list of client root CA certificates added to OBServer nodes in the cluster and the certificate expiration time.
+
+### **(5) Parameter resetting**
+
+In earlier versions, to reset a parameter to its default value, you had to query the default value first and then manually set the parameter to that value. The new version provides the `ALTER SYSTEM [RESET] parameter_name [SCOPE = {MEMORY | SPFILE | BOTH}] {TENANT [=] 'tenant_name'}` syntax for you to reset a parameter to the default value. The default value is obtained from the node that executes the statement. You can reset cluster-level parameters or parameters of a specified tenant in the sys tenant. You can also reset parameters for the current user tenant. On OBServer nodes, whether the `SCOPE` option is specified does not affect the implementation logic. For a parameter that takes effect statically, the default value is only stored on the disk but not updated to the memory. For a parameter that takes effect dynamically, the default value is stored on the disk and updated to the memory.
+
+**5. Afterword**
+----------
+
+OceanBase Database V4.3.0 sets a significant milestone on our roadmap to achieve real-time AP. We will keep enhancing AP features in subsequent versions to overcome challenges in real-world business scenarios.
+
+We would like to thank all our users and developers for their contributions to OceanBase Database V4.3.0. Their valuable suggestions are a powerful driving force that pushes OceanBase forward. We look forward to working with every user and developer in tackling critical workloads, developing modern data architectures, and building better and more user-friendly distributed databases.
+
+You can visit [**Release Notes**](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001971697) to learn more about the new OceanBase Database V4.3.0.
\ No newline at end of file
diff --git a/docs/blogs/tech/analysis-column.md b/docs/blogs/tech/analysis-column.md
new file mode 100644
index 000000000..00bd59c8f
--- /dev/null
+++ b/docs/blogs/tech/analysis-column.md
@@ -0,0 +1,234 @@
+---
+slug: analysis-column
+title: 'OceanBase Database V4.3 Feature Breakdown: In-depth Analysis of Columnar Storage'
+---
+
+In scenarios involving large-scale data analytics or extensive ad-hoc queries, columnar storage is a crucial feature for business workloads. Unlike row-based storage, columnar storage physically arranges the data in a table by column. When data is stored by column, the system can scan only the columns involved in the query and calculation, instead of scanning entire rows. This reduces the consumption of resources such as I/O and memory and accelerates calculation. Moreover, columnar storage naturally lends itself to data compression, making it easier to achieve higher compression ratios and thereby reducing storage space and network transmission bandwidth.
+
+However, columnar engines generally assume limited random updates and attempt to ensure that data in columnar storage is static. When a large amount of data is updated randomly, system performance inevitably degrades. The log-structured merge-tree (LSM-tree) architecture of OceanBase Database processes baseline data and incremental data separately, and can therefore avoid this performance issue. OceanBase Database V4.3.0 builds the columnar engine on this architecture, implementing integrated columnar and row-based storage within a single database architecture and ensuring the performance of both transaction processing (TP) and analytical processing (AP) queries.
+
+To help users with AP requirements smoothly use the new version, OceanBase Database has adapted and optimized several modules, including the optimizer, executor, DDL, and transaction processing, for the columnar engine. These optimizations introduce a new cost model and a vectorized engine based on columnar storage, enhancements to the query pushdown feature, and features like skip index, a new column-based encoding algorithm, and adaptive compactions. This post will dive into the columnar storage feature provided by OceanBase Database V4.3, the application scenarios of this feature, as well as its development planning.
+
+**1. Overall Columnar Storage Architecture**
+------------
+
+As a native distributed database, OceanBase Database stores user data in multiple replicas by default. OceanBase Database optimizes its self-developed LSM-tree-based storage engine to make full use of the multi-replica deployment mode, offering strong data verification and reuse of migrated data.
+
+-  Baseline data: Unlike other databases with LSM-tree-based storage engines, OceanBase Database introduces the concept of daily major compaction. A global version is selected on a regular basis or upon a user's operation, and a major compaction is initiated for all replicas of tenant data based on that version to generate baseline data of the version. The baseline data of the same version is physically consistent across all replicas.
+
+-  Incremental data: All data written after the latest version of baseline data is incremental data.
Incremental data can be memory data written into MemTables or disk data compacted into SSTables. Incremental data contains multi-version data, and its replicas are maintained independently and are not necessarily consistent.
+
+Random updates in columnar storage scenarios are controllable. On this basis, OceanBase Database V4.3 provides a columnar storage implementation that is transparent to upper-layer business, based on the characteristics of baseline data and incremental data: baseline data is stored by column, and incremental data is stored by row. Users' DML operations are not affected, and upstream and downstream data are seamlessly synchronized. Users can perform transaction operations on columnstore tables in the same way as on rowstore tables. In columnar storage mode, the data of each column is stored as an independent SSTable, and the SSTables of all columns are combined into a virtual SSTable as baseline data for columnar storage. Users can flexibly select a storage mode as needed when they create a table: baseline data can be stored by row, by column, or by both row and column (with redundancy).
+
+![1716796000](/img/blogs/tech/analysis-column/image/1716795999198.png)
+
+OceanBase Database V4.3 makes optimizations in multiple dimensions, such as the optimizer and the executor, to adapt to the columnar storage mode in the storage engine. After switching to the columnar storage mode, users perceive no business changes and still enjoy performance comparable to that of the row-based storage mode. By optimizing the columnar storage engine in all aspects, OceanBase Database integrates TP and AP and supports different types of business with one engine and one set of code, honing its capability of hybrid transaction/analytical processing (HTAP).
+
+
+
+**2. Native Advantages of OceanBase Database in Columnar Storage**
+----------------------------
+
+### **2.1 Fully-fledged LSM-tree engine**
+
+Compared with conventional databases, OceanBase Database has an inherent delta store, which perfectly suits the columnar storage strategy. Relying on the LSM-tree-based storage engine, the columnar storage mode in OceanBase Database provides full transactional support while delivering basic-operator performance comparable to that of conventional TP databases. Full transactional support also makes the switch painless: all transaction semantics and management are transparent to the business, allowing users to switch to the columnar storage mode without application rewrites. This way, users can use a columnstore database in the same way as a rowstore database.
+
+### **2.2 Honed execution engine**
+
+OceanBase Database boasts a superb execution engine and a general-purpose optimizer. In row-based storage mode, OceanBase Database integrates seamlessly with the vectorized engine to support vectorized execution without application rewrites. What's more, the OceanBase Database optimizer estimates the costs of the row-based and columnar storage modes with only one set of code, allowing the SQL engine to automatically select a storage mode.
+
+### **2.3 Flexible native distributed architecture**
+
+OceanBase Database supports a native distributed parallel query engine, and its application can be easily extended to heterogeneous columnstore replicas.
Heterogeneous columnstore replicas stand out in scenarios where complete physical isolation is required, and will be supported in later versions of OceanBase Database.
+
+In a word, OceanBase Database's inherent advantages foster the introduction of columnar storage in V4.3. OceanBase Database supports the following three columnar storage modes without substantially changing the overall architecture:
+
+-  Columnstore baseline data + rowstore incremental data: Baseline data is stored by column, whereas incremental data is stored by row.
+
+-  Flexible rowstore/columnstore indexes: Users can create columnstore indexes on rowstore tables or the other way around. The two types of indexes can also be flexibly combined. All columnstore tables and indexes share the same underlying storage architecture. Therefore, OceanBase Database naturally supports both rowstore and columnstore indexes.
+
+-  Columnstore replicas: This feature is under development. Based on the native distributed architecture of OceanBase Database, this feature, once supported, will allow users to store incremental read-only replicas in columnar mode by performing compactions, with few modifications to the original storage mode or the corresponding table.
+
+
+
+**3. How to Store Data by Column**
+------------
+
+### **3.1 Create a columnstore table by default**
+
+For online analytical processing (OLAP) scenarios, we recommend that users configure the system to create columnstore tables by default. This can be achieved by setting the following parameter:
+```
+ alter system set default_table_store_format = "column";
+```
+
+Once the setting takes effect, a columnstore table is created by default if no column group is specified for the created table.
+```
+ OceanBase(root@test)>create table t1 (c1 int primary key, c2 int, c3 int);
+ Query OK, 0 rows affected (0.301 sec)
+
+ OceanBase(root@test)>show create table t1;
+
+ CREATE TABLE `t1` (
+ `c1` int(11) NOT NULL,
+ `c2` int(11) DEFAULT NULL,
+ `c3` int(11) DEFAULT NULL,
+ PRIMARY KEY (`c1`)
+ ) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0
+ WITH COLUMN GROUP(each column)
+
+ 1 row in set (0.101 sec)
+```
+
+### **3.2 Specify to create a columnstore table**
+
+To facilitate columnstore table creation, the `with column group` syntax is introduced in the table creation statement. If you specify `with column group (each column)` at the end of a `CREATE TABLE` statement, a columnstore table will be created.
+```
+ OceanBase(root@test)>create table tt_column_store (c1 int primary key, c2 int, c3 int) with column group (each column);
+ Query OK, 0 rows affected (0.308 sec)
+
+ OceanBase(root@test)>show create table tt_column_store;
+
+ CREATE TABLE `tt_column_store` (
+ `c1` int(11) NOT NULL,
+ `c2` int(11) DEFAULT NULL,
+ `c3` int(11) DEFAULT NULL,
+ PRIMARY KEY (`c1`)
+ ) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 WITH COLUMN GROUP(each column)
+
+ 1 row in set (0.108 sec)
+```
+
+### **3.3 Specify to create a hybrid rowstore-columnstore table**
+
+If users want to balance AP business and TP business and can accept a certain degree of data redundancy, they can add `all columns` in the `with column group` syntax to enable rowstore redundancy.
+
+```
+ create table tt_column_row (c1 int primary key, c2 int, c3 int) with column group (all columns, each column);
+ Query OK, 0 rows affected (0.252 sec)
+
+ OceanBase(root@test)>show create table tt_column_row;
+ CREATE TABLE `tt_column_row` (
+ `c1` int(11) NOT NULL,
+ `c2` int(11) DEFAULT NULL,
+ `c3` int(11) DEFAULT NULL,
+ PRIMARY KEY (`c1`)
+ ) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 1 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 WITH COLUMN GROUP(all columns, each column)
+
+ 1 row in set (0.075 sec)
+```
+
+### **3.4 Columnstore scan**
+
+In an execution plan, `COLUMN TABLE FULL SCAN` indicates a range scan on a columnstore table.
+```
+ OceanBase(root@test)>explain select * from tt_column_store;
+ +--------------------------------------------------------------------------------------------------------+
+ | Query Plan |
+ +--------------------------------------------------------------------------------------------------------+
+ | ================================================================= |
+ | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
+ | ----------------------------------------------------------------- |
+ | |0 |COLUMN TABLE FULL SCAN|tt_column_store|1 |7 | |
+ | ================================================================= |
+ | Outputs & filters: |
+ | ------------------------------------- |
+ | 0 - output([tt_column_store.c1], [tt_column_store.c2], [tt_column_store.c3]), filter(nil), rowset=16 |
+ | access([tt_column_store.c1], [tt_column_store.c2], [tt_column_store.c3]), partitions(p0) |
+ | is_index_back=false, is_global_index=false, |
+ | range_key([tt_column_store.c1]), range(MIN ; MAX)always true |
+ +--------------------------------------------------------------------------------------------------------+
+```
+
+`COLUMN TABLE GET` in the plan indicates a get operation with a specified primary key on a columnstore table.
+```
+ OceanBase(root@test)>explain select * from tt_column_store where c1 = 1;
+ +--------------------------------------------------------------------------------------------------------+
+ | Query Plan |
+ +--------------------------------------------------------------------------------------------------------+
+ | =========================================================== |
+ | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
+ | ----------------------------------------------------------- |
+ | |0 |COLUMN TABLE GET|tt_column_store|1 |14 | |
+ | =========================================================== |
+ | Outputs & filters: |
+ | ------------------------------------- |
+ | 0 - output([tt_column_store.c1], [tt_column_store.c2], [tt_column_store.c3]), filter(nil), rowset=16 |
+ | access([tt_column_store.c1], [tt_column_store.c2], [tt_column_store.c3]), partitions(p0) |
+ | is_index_back=false, is_global_index=false, |
+ | range_key([tt_column_store.c1]), range[1 ; 1], |
+ | range_cond([tt_column_store.c1 = 1]) |
+ +--------------------------------------------------------------------------------------------------------+
+ 12 rows in set (0.051 sec)
+```
+
+For a hybrid rowstore-columnstore table, the optimizer determines whether to perform a rowstore scan or a columnstore scan based on costs, and users can also control the choice with hints. For example, for full table scans in a simple scenario, the system generates a rowstore-based plan by default.
+
+```
+ OceanBase(root@test)>explain select * from tt_column_row;
+ +--------------------------------------------------------------------------------------------------+
+ | Query Plan |
+ +--------------------------------------------------------------------------------------------------+
+ | ======================================================== |
+ | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
+ | -------------------------------------------------------- |
+ | |0 |TABLE FULL SCAN|tt_column_row|1 |3 | |
+ | ======================================================== |
+ | Outputs & filters: |
+ | ------------------------------------- |
+ | 0 - output([tt_column_row.c1], [tt_column_row.c2], [tt_column_row.c3]), filter(nil), rowset=16 |
+ | access([tt_column_row.c1], [tt_column_row.c2], [tt_column_row.c3]), partitions(p0) |
+ | is_index_back=false, is_global_index=false, |
+ | range_key([tt_column_row.c1]), range(MIN ; MAX)always true |
+ +--------------------------------------------------------------------------------------------------+
+```
+
+Users can also force a columnstore scan on the tt_column_row table by specifying the USE\_COLUMN\_TABLE hint.
+```
+ OceanBase(root@test)>explain select /*+ USE_COLUMN_TABLE(tt_column_row) */ * from tt_column_row;
+ +--------------------------------------------------------------------------------------------------+
+ | Query Plan |
+ +--------------------------------------------------------------------------------------------------+
+ | =============================================================== |
+ | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
+ | --------------------------------------------------------------- |
+ | |0 |COLUMN TABLE FULL SCAN|tt_column_row|1 |7 | |
+ | =============================================================== |
+ | Outputs & filters: |
+ | ------------------------------------- |
+ | 0 - output([tt_column_row.c1], [tt_column_row.c2], [tt_column_row.c3]), filter(nil), rowset=16 |
+ | access([tt_column_row.c1], [tt_column_row.c2], [tt_column_row.c3]), partitions(p0) |
+ | is_index_back=false, is_global_index=false, |
+ | range_key([tt_column_row.c1]), range(MIN ; MAX)always true |
+ +--------------------------------------------------------------------------------------------------+
+```
+
+Similarly, users can specify the NO_USE_COLUMN_TABLE hint to force the optimizer not to use a columnstore scan on the table.
+```
+ OceanBase(root@test)>explain select /*+ NO_USE_COLUMN_TABLE(tt_column_row) */ c2 from tt_column_row;
+ +------------------------------------------------------------------+
+ | Query Plan |
+ +------------------------------------------------------------------+
+ | ======================================================== |
+ | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| |
+ | -------------------------------------------------------- |
+ | |0 |TABLE FULL SCAN|tt_column_row|1 |3 | |
+ | ======================================================== |
+ | Outputs & filters: |
+ | ------------------------------------- |
+ | 0 - output([tt_column_row.c2]), filter(nil), rowset=16 |
+ | access([tt_column_row.c2]), partitions(p0) |
+ | is_index_back=false, is_global_index=false, |
+ | range_key([tt_column_row.c1]), range(MIN ; MAX)always true |
+ +------------------------------------------------------------------+
+ 11 rows in set (0.053 sec)
+```
+
+
+**4.
Vision for the Future**
+----------
+
+The introduction of the columnar storage feature in OceanBase Database V4.3 provides new storage solutions for users in data analysis and real-time analysis scenarios. OceanBase Database will keep optimizing this feature to bring the following benefits to users:
+
+**First, enriched experiences.** OceanBase Database supports a pure columnar storage engine for the time being, and will support user-defined column groups in the future to meet various analysis needs. Moreover, we are going to strengthen the direct load feature for incremental data to help users efficiently import data, thus shortening the preparation time for data analysis.
+
+**Second, enhanced performance.** We aim to enhance the skip index feature to better satisfy users' query requirements. We plan to unify the standard for storage formats and associate them with the vectorized engine. This way, the system will be able to identify different storage formats during SQL execution, helping users save overheads in data format conversion.
+
+**Third, more flexible deployment modes.** In later versions, OceanBase Database will support the heterogeneous replicas required by users in OLAP scenarios. We are also considering a cost-effective storage/computing splitting solution that is applicable to AP databases.
\ No newline at end of file
diff --git a/docs/blogs/tech/binlog-service.md b/docs/blogs/tech/binlog-service.md
new file mode 100644
index 000000000..c7ac03115
--- /dev/null
+++ b/docs/blogs/tech/binlog-service.md
@@ -0,0 +1,174 @@
+---
+slug: binlog-service
+title: 'OceanBase Binlog Service'
+---
+
+Foreword
+--
+
+MySQL is a globally renowned open source relational database that boasts high stability, reliability, and ease of use. Its popularity is largely credited to a feature released in its early days: the binary log (binlog).
+
+MySQL binlogs are a set of log files that record all changes made to a MySQL database. The feature has won the favor of developers and enterprises since it was introduced in MySQL. MySQL binlog files store all SQL statements that change the database status in a binary format that is easy to parse, enabling data integration and replication in databases. Over the years, mature incremental parsing systems built on this logical binlog replication capability, such as [Canal](https://github.com/alibaba/canal) and [Debezium](https://github.com/debezium/debezium), have emerged and are widely applied to data integration.
+
+OceanBase Database has been supporting the MySQL mode since a very early version, allowing MySQL users to switch to OceanBase Database at low costs. Considering that these mature MySQL systems for parsing incremental binlogs have been widely used, **OceanBase Database integrated existing systems** and released its own binlog service shortly thereafter.
+
+The binlog service supported in OceanBase Database provides features similar to MySQL's, such as logging database changes. This is critical to ensuring the success of data migration and synchronization and guaranteeing data consistency. This service allows MySQL users to leverage familiar binlog-based solutions for monitoring, backup, and real-time data replication in OceanBase Database.
+
+Logical Binlog Replication Architecture in MySQL
+-------------------
+
+![1719456500](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-06/1719456499955.png)
+
+Data replication in MySQL is implemented based on binlogs. Specifically, MySQL uses binlogs to record the logical changes of data in the SQL engine layer, instead of using redo logs to record physical changes in the transaction engine layer. This design has a specific purpose.
+
+MySQL aims to support multiple storage engines such as InnoDB and MyISAM, each with its own characteristics and advantages. This mechanism allows seamless data replication and heterogeneous data synchronization between different storage engines.
+
+What's more, it ensures data consistency and demonstrates the superiority of MySQL in replication flexibility and system compatibility. By recording logical rather than physical data changes, MySQL successfully integrates diversified storage engine features and is widely adopted in replication architectures that require high compatibility and flexibility.
+
+The MySQL binlog service also plays a key role in the big data field, serving as the foundation of Change Data Capture (CDC) tools that process and analyze data.
+
+**The advantages of the binlog service in the big data environment are as follows:**
+
+1. **Data synchronization**: The binlog service monitors and records data changes, including insert, update, and delete operations, in real time. Real-time data streams are captured by big data tools and synchronized to data lakes, data warehouses, or other big data processing systems for further analysis.
+2. **Real-time analysis**: By leveraging CDC tools based on the binlog service, enterprises can push captured data changes to a stream processing engine such as Apache Kafka, Apache Flink, or Apache Storm for further analysis to support real-time business insights and decisions.
+3. **Cross-platform data integration**: Maintaining data consistency across different databases and storage engines is a big challenge for database systems in the big data environment. The binlog service provides CDC tools such as Canal and Debezium for users to seamlessly synchronize data to Hadoop, Apache Hive, Elasticsearch, or any other target database, meeting their needs for data integration.
+4. **System decoupling**: The binlog service allows event-based data architecture design. This means that data producers (databases) and consumers (for example, a big data processing service) can be loosely coupled and independently extended, improving system flexibility and stability.
+5. **Audit and tracking of historical data**: The binlog service supports the audit of historical data, which is necessary for analyzing trends and patterns in historical data and tracking audit records.
+
+Characteristics of the MySQL Binlog Service
+---------------
+
+The binlog service behaves differently with transactional and non-transactional storage engines. **This post covers only the binlog behaviors of the transactional storage engine InnoDB**. On the one hand, InnoDB is the default and most widely used of the multiple storage engines supported by MySQL. On the other hand, OceanBase Database natively supports transactions and, in MySQL mode, needs to follow the MySQL binlog behaviors of the transactional storage engine InnoDB.
+
+MySQL binlog files support two formats: Statement-Based Replication (SBR) and Row-Based Replication (RBR).
Binlogs in the SBR format record the content of SQL statements and their context information and occupy little disk space. However, data synchronization issues may occur with SBR under certain circumstances. In contrast, binlogs in the RBR format record the specific values of data before and after changes, without session context information attached. This avoids the issues faced by SBR and ensures data consistency.
+
+Binlog parsing systems such as Canal and Debezium support only the RBR format. Considering the compatibility of these parsing systems and their wide application in data synchronization and integration, this post focuses on binlogs in the RBR format, which are compatible with all the preceding parsing systems.
+
+MySQL binlogs consist of two groups of files:
+
+* Binlog files, which are numbered consecutively from 1. For example, MySQL binlogs are numbered 000001, 000002, and 000003, as shown in the preceding figure.
+* Binlog index files, for example, mysql-bin.index in the preceding figure. The file is a text file and records the names of all existing binlog files. Return results of the `SHOW BINARY LOGS` statement are read from this index file.
+
+MySQL binlogs consist of different types of events. In MySQL 5.7, about 40 event types are supported. From the perspective of CDC-based incremental parsing, the following types of events need to be noticed: Format\_desc, Previous\_gtids, Gtid, Rotate, Query, Xid, Table\_map, Write\_rows, Update\_rows, and Delete\_rows. From the perspective of transactions, all binlog events related to a transaction are consecutive in a binlog, which means that events of different transactions do not overlap and binlog events of a transaction are not stored across binlogs. Here is a sample return result of the `SHOW BINLOG EVENTS` statement:
+```
+ mysql> SHOW BINLOG EVENTS IN 'mysql-bin.000009';
+ +------------------+------+----------------+------------+-------------+--------------------------------------------------------------------+
+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+ +------------------+------+----------------+------------+-------------+--------------------------------------------------------------------+
+ | mysql-bin.000009 | 4 | Format_desc | 1147473732 | 123 | Server ver: 5.7.35-log, Binlog ver: 4 |
+ | mysql-bin.000009 | 123 | Previous_gtids | 1147473732 | 194 | ebd2d3b0-6399-11ec-86ea-0242ac110004:1-38 |
+ | mysql-bin.000009 | 194 | Gtid | 1147473732 | 259 | SET @@SESSION.GTID_NEXT= 'ebd2d3b0-6399-11ec-86ea-0242ac110004:39' |
+ | mysql-bin.000009 | 259 | Query | 1147473732 | 353 | create database test |
+ | mysql-bin.000009 | 353 | Gtid | 1147473732 | 418 | SET @@SESSION.GTID_NEXT= 'ebd2d3b0-6399-11ec-86ea-0242ac110004:40' |
+ | mysql-bin.000009 | 418 | Query | 1147473732 | 543 | use `test`; CREATE TABLE t1(id int primary key, v varchar(30)) |
+ | mysql-bin.000009 | 543 | Gtid | 1147473732 | 608 | SET @@SESSION.GTID_NEXT= 'ebd2d3b0-6399-11ec-86ea-0242ac110004:41' |
+ | mysql-bin.000009 | 608 | Query | 1147473732 | 733 | use `test`; CREATE TABLE t2(id int primary key, v varchar(30)) |
+ | mysql-bin.000009 | 733 | Gtid | 1147473732 | 798 | SET @@SESSION.GTID_NEXT= 'ebd2d3b0-6399-11ec-86ea-0242ac110004:42' |
+ | mysql-bin.000009 | 798 | Query | 1147473732 | 870 | BEGIN |
+ | mysql-bin.000009 | 870 | Table_map | 1147473732 | 918 | table_id: 114 (test.t1) |
+ | mysql-bin.000009 | 918 | Write_rows | 1147473732 | 963 | table_id: 114 flags: STMT_END_F |
+ | mysql-bin.000009 | 963 | Table_map | 1147473732 | 1011 | 
table_id: 115 (test.t2) | + | mysql-bin.000009 | 1011 | Write_rows | 1147473732 | 1056 | table_id: 115 flags: STMT_END_F | + | mysql-bin.000009 | 1056 | Xid | 1147473732 | 1087 | COMMIT /* xid=57 */ | + | mysql-bin.000009 | 1087 | Gtid | 1147473732 | 1152 | SET @@SESSION.GTID_NEXT= 'ebd2d3b0-6399-11ec-86ea-0242ac110004:43' | + | mysql-bin.000009 | 1152 | Query | 1147473732 | 1224 | BEGIN | + | mysql-bin.000009 | 1224 | Table_map | 1147473732 | 1272 | table_id: 114 (test.t1) | + | mysql-bin.000009 | 1272 | Update_rows | 1147473732 | 1328 | table_id: 114 flags: STMT_END_F | + | mysql-bin.000009 | 1328 | Table_map | 1147473732 | 1376 | table_id: 115 (test.t2) | + | mysql-bin.000009 | 1376 | Update_rows | 1147473732 | 1432 | table_id: 115 flags: STMT_END_F | + | mysql-bin.000009 | 1432 | Xid | 1147473732 | 1463 | COMMIT /* xid=61 */ | + | mysql-bin.000009 | 1463 | Gtid | 1147473732 | 1528 | SET @@SESSION.GTID_NEXT= 'ebd2d3b0-6399-11ec-86ea-0242ac110004:44' | + | mysql-bin.000009 | 1528 | Query | 1147473732 | 1600 | BEGIN | + | mysql-bin.000009 | 1600 | Table_map | 1147473732 | 1648 | table_id: 114 (test.t1) | + | mysql-bin.000009 | 1648 | Delete_rows | 1147473732 | 1693 | table_id: 114 flags: STMT_END_F | + | mysql-bin.000009 | 1693 | Xid | 1147473732 | 1724 | COMMIT /* xid=67 */ | + | mysql-bin.000009 | 1724 | Rotate | 1147473732 | 1771 | mysql-bin.000010;pos=4 | + +------------------+------+----------------+------------+-------------+--------------------------------------------------------------------+ + 28 rows in set (0.00 sec) +``` +Technical Mechanism of Parsing Systems Canal and Debezium +-------------------- + +Both Canal and Debezium are CDC tools, which are mainly used to monitor changes made to a database and broadcast these changes. Although Canal mainly serves MySQL while Debezium supports multiple database systems, they share similar technical mechanisms to parse and transfer data changes. + +Imitating the communication process of a slave MySQL server, Canal and Debezium parse MySQL binlogs, extract changes such as data addition, deletion, and update, and convert them into a unified format. + +1. **Establish a connection**: Canal or Debezium connects to the master MySQL server to imitate the behavior of the server. +2. **Request binlogs**: Canal or Debezium requests MySQL to send binlogs from a specific position. +3. **Parse binlogs**: Canal or Debezium reads binlog streams from the connection and parses them into identifiable data change events. +4. **Convert data**: Parsed events are converted into a general message format for subsequent systems to consume. +5. **Broadcast changes**: Data changes can be sent to various types of middleware or directly used by other systems such as the Kafka message queue, monitoring systems, caches, and search engines. + +Canal or Debezium disguises itself as a slave server, sets the server ID and UUID for itself, and locates and pulls binlog events by using the `COM_BINLOG_DUMP` or `COM_BINLOG_DUMP_GTID` protocol instruction. In addition, to support binlog parsing, the system variables `binlog_format` and `binlog_row_image` must be set to `ROW` and `FULL`, respectively. + +During binlog parsing, Canal or Debezium focuses on the DML and DDL operations that are involved in transactions. The main binlog events to be parsed are listed as follows. Some control binlog events also need to be parsed, including Format\_desc, Rotate, Previous\_gtids, and Gtid. 
+
+| Event type | Description |
+| ---------- | ----------- |
+| Query | It indicates a BEGIN operation or a DDL operation of a transaction. |
+| Xid | It indicates a COMMIT operation of a transaction. |
+| Table\_map | In a transaction, a DML statement changes data in one or more tables. A Table\_map event is generated for each table involving data changes and is written to binlogs to record the internal IDs and names of these tables. Within the binlog events of a transaction, Table\_map events precede all events generated by DML operations, including Write\_rows, Update\_rows, and Delete\_rows. |
+| Write\_rows | An event of this type is generated by an INSERT statement. Each event can contain the insert records of multiple rows that correspond to the same table ID. If an INSERT statement inserts multiple rows of data, consecutive events of this type are generated. |
+| Update\_rows | An event of this type is generated by an UPDATE statement. Each event can contain the update records of multiple rows that correspond to the same table ID. If an UPDATE statement updates multiple rows of data, consecutive events of this type are generated. |
+| Delete\_rows | An event of this type is generated by a DELETE statement. Each event can contain the delete records of multiple rows that correspond to the same table ID. If a DELETE statement deletes multiple rows of data, consecutive events of this type are generated. |
+
+Although Write\_rows, Update\_rows, and Delete\_rows events record the data values before and after a change, the recorded metadata does not contain the column names of the changed tables. Therefore, Canal or Debezium needs to query MySQL metadata during initialization to obtain the schema definitions of these tables before parsing binlogs. Moreover, MySQL uses the same IP address and port (port 3306 by default) to provide SQL and binlog services.
+
+MySQL-compatible Binlog Service in OceanBase Database
+---------------------------------------------
+
+The binlog mode of OBLogProxy is designed for compatibility with MySQL binlogs. It allows you to synchronize OceanBase Database logs by using MySQL binlog tools, so that you can smoothly use these tools with OceanBase Database.
+
+### Key technical points
+
+The binlog service in OceanBase Database consists of three parts: the OceanBase Database kernel, OceanBase Database Proxy (ODP), and oblogproxy. oblogproxy is the core of the binlog service.
+
+The key technical points of the entire binlog service system are as follows:
+
+* OceanBase Database adopts the multitenancy design, where each tenant corresponds to a MySQL instance. Therefore, binlogs are created at the tenant level.
+* Clogs of OceanBase Database are obtained by using obcdc and need to be converted into the FULL row image format and stored as binlogs. These binlogs can be parsed by multiple downstream MySQL systems and analyzed by MySQL binlog tools, and support pullback as well.
+* The MySQL communication protocol is supported.
+
+* `COM_BINLOG_DUMP` or `COM_BINLOG_DUMP_GTID` used for binlog dump is also a part of the MySQL communication protocol. ODP and oblogproxy need to identify and handle the two protocol instructions.
+* When parsing binlogs, Canal or Debezium needs to query the metadata of MySQL to obtain the schema definitions of involved tables and check binlog-related system variables to confirm whether binlogs are in the expected format.
In addition, Debezium allows you to export snapshots of full baseline data before it dumps binlogs. `SELECT` and `SHOW` statements are involved in the preceding scenarios and are forwarded by ODP.
+* MySQL uses the same IP address and port (port 3306 by default) to provide the SQL and binlog services. As for OceanBase Database, ODP uses the default port 2883 to access the SQL service, supports the binlog dump protocol, and forwards requests to the corresponding oblogproxy instance to enable compatibility with the MySQL binlog and SQL services.
+
+### Terms
+
+Related terms and their definitions are provided for you to better understand this post.
+
+* OceanBase database: refers to an OceanBase cluster.
+* ODP: the OceanBase Database access proxy that provides unified access to SQL and binlog protocols and commands.
+* oblogproxy: the core of the binlog service in OceanBase Database.
+* MySQL binlog tools: tools for parsing incremental MySQL binlogs, such as Canal and Debezium.
+* BC: the binlog converter module of oblogproxy. This module pulls and parses clog files and converts them into files in the binlog format.
+* BD: the binlog dumper module of oblogproxy. This module provides binlog subscription services for subscription requests of downstream services (MySQL binlog systems).
+* BCM: the BC management module of oblogproxy.
+* BDM: the BD management module of oblogproxy.
+
+### System architecture
+
+![1719456552](/img/blogs/tech/binlog-service/image/1719456552842.png)
+
+The preceding figure shows the entire technical architecture of the OceanBase binlog service that is compatible with the MySQL binlog system. The interaction process is as follows:
+
+* Create binlogs for the specified OceanBase tenant. Compared with the MySQL binlog service, this is an additional step that OceanBase Database users perform when needed.
+
+* Connect the MySQL client to oblogproxy and execute the `CREATE BINLOG` statement.
+* After oblogproxy receives the binlog creation request, it uses BCM to create a BC submodule.
+
+* After the BC submodule finishes initialization, use ODP to execute the following binlog-related statements in the MySQL client to check the binlog status. ODP needs to identify these statements and forward them to the corresponding oblogproxy instances. BCM returns the result set.
+
+* `SHOW MASTER STATUS`
+* `SHOW BINARY LOGS`
+* `SHOW BINLOG EVENTS`
+
+* Use ODP to execute queries irrelevant to binlogs in the MySQL client, during which ODP directly queries the corresponding OceanBase tenant.
+* Use MySQL binlog tools such as Canal and Debezium to send the `COM_BINLOG_DUMP` or `COM_BINLOG_DUMP_GTID` instruction to ODP. After receiving the instruction, ODP forwards the request to the corresponding oblogproxy instance, which then creates a BD submodule by using BDM to provide the binlog dump service.
+
+Summary
+--
+
+The emergence of the OceanBase binlog service is a great leap in the compatibility between modern database technologies and the MySQL ecosystem. The OceanBase binlog service benefits from the development of database technologies and meets user needs for real-time data processing and analysis. This service extends the MySQL-compatible features of OceanBase Database and creates a flexible solution that applies to a distributed multi-tenant database environment.
By seamlessly integrating with existing MySQL binlog tools, OceanBase Database lets users enjoy high performance, high scalability, and high availability while preserving the habits of original MySQL binlog service users.
+
+In terms of technologies, the OceanBase binlog service simplifies CDC and data replication, making data migration, synchronization, integration, auditing, and disaster recovery smoother.
\ No newline at end of file
diff --git a/docs/blogs/tech/column-store.md b/docs/blogs/tech/column-store.md
new file mode 100644
index 000000000..f95804620
--- /dev/null
+++ b/docs/blogs/tech/column-store.md
@@ -0,0 +1,111 @@
+---
+slug: column-store
+title: 'The Present and Future of Columnar Storage in OceanBase Database'
+---
+
+OceanBase Database V4.3 provides the columnar storage feature to support real-time analysis business. As an extension of [**In-depth Interpretation of Columnar Storage**](https://open.oceanbase.com/blog/11685131568), this article further explores the application and evolution of columnar storage in the OceanBase Database architecture and its development trend.
+
+**1. Background**
+--------
+
+In 1970, Edgar F. Codd invented the relational model for database management, ushering in a new era in the database field. In 1979, Oracle released the first commercial database edition. After that, database technologies were widely used in various industries. At that time, the data size of users was moderate, and data queries were simple. Therefore, a standalone database system could fully meet users' needs.
+
+As time went by, users' data size increased dramatically and queries grew complex. A standalone database could no longer handle users' requirements for transaction processing (TP) and analytical processing (AP). Considering this, Codd proposed the concept of online analytical processing (OLAP) and its 12 principles in 1993. Since then, online transaction processing (OLTP) has been separated from OLAP, and many database products have been launched in the respective domains. About a decade later, in 2005, Michael Stonebraker developed C-Store, the first column-oriented database system. It proved the great potential of columnar storage in the AP field, and columnar storage henceforth became a necessity in an OLAP database. Despite the dominant position of OLTP products in the market, representative OLAP products, such as Greenplum (2006), Snowflake (2014), Databricks (2014), and ClickHouse (2016), emerged one after another around the world.
+
+Although OLTP and OLAP products are independent of each other, users may require both capabilities at the same time. To support business, users often need to deploy two separate databases, one for OLTP and the other for OLAP, and use data synchronization tools to synchronize data from the OLTP database to the OLAP database. Such a deployment method brings a series of issues.
+
+Firstly, a redundant database and data replica must be maintained. Even though low-priced storage devices can be used to deploy the OLAP database, extra CPU and memory resources are still consumed. What's worse, O&M of the additional database increases costs.
+
+Secondly, latency occurs in synchronization between the OLAP and OLTP databases, and the latency duration is hard to control. Once an exception occurs in the OLAP database or the synchronization tool is faulty, it may take several days to recover the data, during which OLAP business is unavailable.
+
+Lastly, as the Internet develops, higher requirements are imposed on the real-time performance of OLAP. Take online shopping as an example. In this typical OLAP scenario, the database system is expected to automatically recommend items to shoppers based on their historical order records and relevant information for a higher turnover. If data must first be synchronized to the OLAP database, the recommendations may fail to keep pace with shoppers browsing apps or web pages.
+
+Therefore, database users want a single database system to handle both OLAP and OLTP tasks while ensuring the performance of processing massive data. To solve this problem, Gartner proposed the concept of hybrid transaction and analytical processing (HTAP) to cover OLAP and OLTP capabilities in only one database system.
+
+Relatively speaking, row-based storage is more suitable for OLTP workloads, while columnar storage is more suitable for OLAP workloads. A real-time HTAP database therefore often needs to support both row-based storage and columnar storage. Compared with deploying two separate databases, using a single HTAP database can eliminate synchronization latency and improve real-time data performance. However, the data redundancy issue is still not resolved. In fact, it is possible to use only one copy of data for HTAP, which depends on how we treat and use columnar storage.
+
+
+
+**2. Columnstore as a Replica**
+-----------
+
+The columnstore replica solution implements HTAP most directly. Specifically, it sets up two independent engines within a single database system: one is row-oriented for OLTP and the other is column-oriented for OLAP. This solution makes data synchronization details imperceptible to users and provides zero-latency OLAP data access. Many top database products in the industry, such as Google F1 Lightning and PingCAP TiDB, adopt similar solutions.
+
+![1717380551](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-06/1717380551360.png)
+
+As shown in the preceding figure, the three replicas in node 1, node 2, and the primary node are supported by the rowstore engine to provide OLTP capabilities, and the replica in node 4 is supported by the columnstore engine to provide OLAP capabilities. Raft and change data capture (CDC) are used to synchronize data between the two engines. A strength of this solution is good isolation: access to the OLAP engine does not affect the OLTP engine.
+
+However, the biggest disadvantage of this solution is the high costs, especially in scenarios involving massive data. This solution requires not only a redundant data replica, but also extra CPU and memory resources to support the columnstore engine, with no contribution to O&M cost reduction. In addition, dedicated engineers need to be assigned to handle exceptions in the independent columnstore engine.
+
+
+
+**3. Columnstore as an Index**
+-----------
+
+A typical approach to implementing HTAP is through columnstore indexes, with SQL Server being a notable example. SQL Server introduced the columnstore index feature in 2012. However, columnstore indexes were read-only and could not be updated at that time. In 2016, SQL Server began to support updatable columnstore indexes for a more user-friendly experience.
+
+![1717385844](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-06/1717385843903.png)
+
+SQL Server also developed an internal columnstore engine to work with the existing rowstore engine, as shown in the preceding figure. The SQL layer interacts with the underlying engines in a unified manner. It uses the rowstore engine to store data in rowstore tables and the columnstore engine to store columnstore indexes created on these tables. SQL Server allows rowstore and columnstore data to coexist, and multiple columnstore indexes can be created. This solution enables users to flexibly create indexes only for specific columns as needed, with lower data redundancy than the columnstore replica solution. In addition, SQL Server can leverage both rowstore and columnstore engines for SQL execution, significantly improving execution efficiency.
+
+In terms of implementation, the columnstore engine in SQL Server arranges a fixed number of rows to form a row group in a way similar to how a heap table is organized, instead of based on the order of the primary key. In a row group, columns are separately stored in different segments. A row group cannot be modified once it is generated. Rows are deleted by marking them in the Delete Bitmap and updated through a DELETE followed by an INSERT. INSERT operations are stored in the Delta Store. The final query result combines data in the columnstore engine, Delete Bitmap, Delete Buffer, and Delta Store.
+
+![1717385867](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-06/1717385866859.png)
+
+The columnar storage solution of SQL Server resolves issues regarding latency, real-time performance, and costs. However, for index-organized tables (IOTs), columnstore indexes heavily rely on rowstores, and PRIMARY KEY and UNIQUE constraints also need to be maintained by using rowstores. Moreover, the maintenance of the Delta Store and Delete Bitmap incurs costs, and the introduction of columnstore indexes affects the performance of row-oriented OLTP workloads.
+
+
+
+**4. Columnstore as a Cache**
+-----------
+
+In 2013, Oracle 12c introduced the In-Memory Column Store (IM column store) feature to store data in a cache using a columnar format.
+
+![1717385910](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-06/1717385909938.png)
+
+Strictly speaking, IM column store is more like an accelerated columnstore cache based on the rowstore architecture, rather than an independent columnstore engine. Oracle allows users to enable IM column store at different levels, including columns, partitions, tables, and tablespaces, featuring high flexibility. If IM column store is enabled for specific columns in specific tables, Oracle loads data of these columns from rowstores into memory and stores the data in a columnar format. Note that the data on the disk is still stored in rowstores; the columnar copy is never directly stored on the disk. Create, update, and delete operations on these columns are propagated to the columnstores through an internal refresh mechanism. The buffer cache in Oracle's System Global Area (SGA) handles most create, read, update, and delete (CRUD) operations of transactions. To enable IM column store, users need to allocate a separate memory area outside the buffer cache for columnar storage.
+
+This solution avoids costs incurred by disk data redundancy, provides real-time zero-latency OLAP capabilities, and allows users to flexibly configure the columnstores as needed.
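+
+As a rough illustration, here is a minimal sketch based on standard Oracle syntax, with hypothetical table names and memory size:
+```
+-- Reserve a separate in-memory area for columnar data (takes effect after a restart)
+ALTER SYSTEM SET inmemory_size = 2G SCOPE=SPFILE;
+-- Opt a hot table into the IM column store, and keep a write-heavy table out of it
+ALTER TABLE sales INMEMORY PRIORITY HIGH;
+ALTER TABLE audit_log NO INMEMORY;
+```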
However, its disadvantages are also obvious. On the one hand, memory costs are not reduced, and memory resources, which are more valuable than disk resources, are consumed to support OLAP workloads. On the other hand, the data involved in OLAP workloads is massive, so it is not feasible to store all of it in memory. Once users access the disk, Oracle needs to read the requested data from rowstores and convert it into a columnar format for in-memory storage. In this scenario, columnar storage loses its superiority over row-based storage in reducing I/O costs.
+
+
+
+**5. Columnstore as Data**
+-----------
+
+The underlying storage engines of both SQL Server and Oracle depend on the B-tree structure. If we broaden our view and take the log-structured merge-tree (LSM-tree) structure into consideration, we will find that the LSM-tree perfectly suits columnar storage. In an LSM-tree architecture, data is stored in MemTables and SSTables. MemTables are stored in memory and can be dynamically modified, which makes them naturally suitable for row-based storage. SSTables are stored on the disk and cannot be modified, which makes them more suitable for columnar storage. In OceanBase Database, SSTables are further divided into minor compaction SSTables and baseline SSTables. Generally, minor compaction SSTables store recently modified data, while baseline SSTables store old data.
+
+OLTP workloads are often short transactions such as insertions, small-range updates and deletions, and reads of recent data. Data involved in such workloads is usually stored in MemTables and minor compaction SSTables. Therefore, to ensure the performance of OLTP workloads, OceanBase Database uses the row-based storage strategy for MemTables and minor compaction SSTables, provides Bloom filters for baseline SSTables to block empty queries, and caches partial hotspot columnstore data to accelerate hot data queries.
+
+OLAP workloads are usually large queries, mainly involving data stored in baseline SSTables. OceanBase Database adopts the columnar storage strategy for baseline SSTables. Unlike SQL Server, OceanBase Database stores columns based on the order of the primary key. This way, OceanBase Database can quickly locate the row of the target data by using binary search when processing a small number of OLTP requests in columnstores. In proof of concept (POC) tests, many users have confirmed that the columnar storage solution in OceanBase Database can truly support OLTP workloads.
+
+By doing this, OceanBase Database can support both OLTP and OLAP capabilities with only one copy of data. In most cases, baseline SSTables store the majority of data records, and columnar storage features a higher data compression ratio than row-based storage. Therefore, the OceanBase Database architecture can minimize costs. However, this architecture also faces challenges.
+
+One challenge is to isolate the resources allocated for OLTP and OLAP workloads. Under an ideal scheduling mechanism, OLAP workloads can flexibly borrow resources from OLTP workloads and utilize most system resources when no OLTP workloads exist. This is theoretically achievable, just as most databases can be deployed in a Docker container while few users worry about Docker's resource isolation capabilities. However, the most demanding isolation requirements still cannot be met this way.
+
+The other challenge is that the performance of some queries may be better when baseline SSTables adopt row-based storage.
Alternatively, combining multiple columns for mixed storage may improve query performance.
+
+
+
+**6. Columnstore Is Everything**
+-----------
+
+Based on the LSM-tree architecture, OceanBase Database can store data in a columnar format to minimize costs. However, this does not mean that columnar storage is the only choice for OceanBase Database. Instead, it allows users to flexibly treat a columnstore as a cache, an index, or a replica to explore more possibilities.
+
+-  First, treat a columnstore as a cache. OceanBase Database can store data in the memory in a columnar format and cache partial columnstore data to speed up hotspot data queries.
+
+-  Second, treat a columnstore as an index. OceanBase Database can store both rowstore and columnstore data in baseline SSTables, or aggregate partial columns to remove redundant data. The system queries a columnstore, rowstore, or row group based on actual needs.
+
+-  Third, treat a columnstore as a replica. OceanBase Database can apply row-based storage to the leader replica and columnar storage to read-only replicas to improve resource isolation.
+
+-  In the near future, OceanBase Database may get rid of the limitations of these underlying storage methods, eliminate the barriers between OLTP and OLAP, and return to the origin of a database. If this comes true, OceanBase Database will always organize data in the format that is most suitable for the workload type and return query results to users at the quickest speed. Users will only need to add resources if they feel that the query response is slow but do not want to take optimization measures.
+
+
+
+**7. Summary**
+----------
+
+OceanBase Database already supports the columnar storage feature in version 4.3, and more related features are under development. We hope that these features will be unveiled as soon as possible to allow more users to experience the convenience brought by columnar storage in real-time AP scenarios.
+
+
+
+![1717386085](/img/blogs/tech/column-store/image/1717386085767.png)
\ No newline at end of file
diff --git a/docs/blogs/tech/core-tech-ob.md b/docs/blogs/tech/core-tech-ob.md
new file mode 100644
index 000000000..cdad0033b
--- /dev/null
+++ b/docs/blogs/tech/core-tech-ob.md
@@ -0,0 +1,94 @@
+---
+slug: core-tech-ob
+title: 'Decoding the Core Technologies of OceanBase Community Edition 4.x'
+---
+
+In the digital age, data is growing exponentially across industries. In the midst of increasing computational complexity, unlocking the value of massive data is a driving force for innovation. Against this backdrop, databases, the foundation of data processing, are confronting shifts in the market. On one hand, conventional solutions such as centralized databases and sharding struggle to rise to the challenges of massive data, with issues such as performance bottlenecks, insufficient analytical capabilities, and high costs becoming increasingly evident. On the other hand, distributed databases, with their inherent advantages in automatically distributing data across multiple nodes and allowing read/write operations on the cluster data from any node, along with features such as maintaining strong transactional consistency, are becoming the next-generation data management solutions.
+
+This article summarizes a presentation from the database forum of the Global Internet Architecture Conference (GIAC), delivered by Zheng Xiaofeng, a technical expert who leads the open source initiatives of OceanBase Database in South China.
Focusing on OceanBase Database, a native distributed database, this article discusses a data management solution from its architecture to its features, community editions, ecosystem tools, future versions, and roadmap.
+
+![1691646145](/img/blogs/tech/core-tech-ob/image/1691646145286.png)
+
+When it comes to distributed databases, we naturally think of large-scale application scenarios. OceanBase Database was born in such a scenario. In 2010, when conventional relational databases struggled to support Taobao Favorites, OceanBase Database V0.1 was rolled out as a distributed storage solution. At the time, NoSQL was popular. We tried separating the storage layer but found that relational databases with a loosely coupled design incurred significant efficiency overheads and thus were unsuitable for transaction processing (TP) scenarios where low latency was required.
+
+To resolve this issue, we implemented an integrated architecture in OceanBase Database V1.0, which has also been used in versions 2.0, 3.0, and 4.0. The architecture was simple, with the storage, SQL, and transaction engines integrated into a single OBServer node. As OceanBase Database gained more external users, we found that they usually used more than one database. For example, a user might use MySQL and Oracle at the same time. Therefore, we added compatibility with Oracle on top of multitenancy during the commercialization of OceanBase Database V2.0. In OceanBase Database V3.0, we further enhanced its compatibility and performance.
+
+In June 2021, we open-sourced OceanBase Database V3.1 when the latest enterprise edition was OceanBase Database V3.2. For OceanBase Database V3.x, community editions differ from enterprise editions in performance. For OceanBase Database V4.0, which has an integrated architecture for standalone and distributed modes, the kernel features of the community edition in MySQL mode are the same as those of the enterprise edition in MySQL mode.
+
+![1691646186](/img/blogs/tech/core-tech-ob/image/1691646186227.png)
+
+![1691646209](/img/blogs/tech/core-tech-ob/image/1691646209339.png)
+
+The name of the integrated architecture for standalone and distributed modes in OceanBase Database V4.x has two meanings. On one hand, it indicates that the architecture supports both standalone and distributed deployments. On the other hand, it signifies that a tenant using standalone resources is in standalone mode even if the tenant is in a distributed OceanBase cluster. In either case, OceanBase Database can seamlessly switch between standalone and distributed modes.
+
+In addition to the standalone and distributed modes, OceanBase Database can also be deployed in primary/standby mode. In OceanBase Database V4.1, the primary and standby OceanBase clusters, which transmit archive logs through Alibaba Cloud Object Storage Service (OSS) or Network File System (NFS) for synchronization, are similar to primary and standby MySQL databases. Compared to the distributed mode, the primary/standby mode carries a risk of data loss because it does not guarantee a recovery point objective (RPO) of zero. If zero data loss is required, a three-replica deployment can meet your needs for high availability and scalability. Multitenancy in OceanBase Database facilitates cluster management in large-scale business scenarios. For example, if a company has tens of thousands of servers but a handful of database administrators (DBAs), each DBA has to manage thousands of database instances.
With multitenancy, many MySQL instances can be integrated into an OceanBase cluster, greatly reducing the number of instances that need to be managed.
+
+In all cases, the deployment modes of OceanBase Database are tailored to business needs so that companies only need to pay for the capabilities they use.
+
+![1691646239](/img/blogs/tech/core-tech-ob/image/1691646238911.png)
+
+
+![1691646264](/img/blogs/tech/core-tech-ob/image/1691646264872.png)
+
+With its integrated architecture for standalone and distributed modes, OceanBase Database is compatible with the classic mode and supports both TP and analytical processing (AP). Additionally, OceanBase Database delivers strong consistency, zero data loss, high availability, and seamless scalability, backed by full data verification.
+
+In terms of the architecture, instead of using Storage Area Network (SAN) storage devices that require a dedicated network, OceanBase Database can be deployed in a shared-nothing cluster based on general servers and IDC networks. The cluster automatically manages the allocation and dynamic balancing of computing and storage resources, and supports auto scaling, allowing read and write performance to increase linearly. SQL execution and data storage are supported on all service nodes, and partition replicas are autonomously managed by each node. The cluster runs only one type of database service process, without depending on external services, which facilitates O&M management. OceanBase Database provides unified database services to applications, supports global indexes, and ensures the atomicity, consistency, isolation, and durability (ACID) properties of transactions. This means that applications can be developed against it as if it were a standalone system. OceanBase Database adapts to various infrastructures, supporting flexible deployment modes such as three IDCs in the same city, three IDCs across two regions, and five IDCs across three regions.
+
+The technical principles behind features such as native distributed capabilities, scalability, single-node performance, hybrid transaction/analytical processing (HTAP), low costs, and multitenancy are as follows:
+
+* **Native distributed capabilities**: As we all know, a distributed database features high scalability and availability. In terms of high scalability, OceanBase Database leverages the Paxos consensus protocol and native capabilities for seamless horizontal and vertical scaling, improving resource utilization and reducing costs. In terms of high availability, in addition to strong consistency among multiple replicas, OceanBase Database also verifies consistency between replicas and at transaction commit, and checks disk data integrity, to ensure high data reliability.
+* **Seamless scaling**: After you add OBServer nodes, the cluster automatically migrates data from old nodes to new ones. The entire process is transparent to applications. In Ant Group, the largest archive database stores petabytes of data. With multi-node copying at 500 MB/s, TB-level migration takes just a few hours. Take the Double 11 shopping festival as an example. Each year, Ant Group applies for a set of cloud servers half a month before the Double 11 shopping festival to distribute user data to IDCs in more zones to handle the sharp traffic spike. During the migration, read-only replicas are copied first, followed by a leader switch in seconds. After the Double 11 peak, resources are reclaimed. This is how OceanBase Database ensures highly flexible scalability.
+* **Single-node performance comparable to the performance of a standalone database**: Distributed databases typically trade single-node performance for horizontal scalability. However, in online transaction processing (OLTP) business, increased latency in processing individual transactions is often unacceptable. In many scenarios, replacing a standalone database with a distributed one requires multiple distributed nodes to maintain the same business scale even without increasing performance, ultimately driving up costs instead of reducing them. + +OceanBase Database adopts an integrated architecture for standalone and distributed modes. When deployed in standalone mode, it is comparable to a standalone database and even outperforms some popular open source ones. + +* When deployed with three replicas in three IDCs, OceanBase Database provides higher availability than a conventional database that delivers the same performance in primary/standby mode. +* OceanBase Database supports linear vertical scaling by allowing you to upgrade the node specifications. +* OceanBase Database supports linear horizontal scaling by allowing you to deploy more nodes in each zone. + +In the following three cases, OceanBase Database incurs no multi-node access overhead for queries and transactions: + +* When an SQL statement involves only partitions on a single node, no network is required for data reads and writes. +* When a transaction involves only partitions on a single node, no distributed protocol overhead is incurred by transaction commits. +* When a transaction involves only partitions on a single node, no remote access to Global Timestamp Service (GTS) is required for reading consistent snapshots based on multiversion concurrency control (MVCC). + +Such an integrated architecture allows OceanBase Database to grow with the business and be adaptable to the needs of different customers, including small personal sites, core banking systems, and giant e-commerce platforms. + +For a distributed database in OLTP scenarios, 80% of reads and writes are single-node while 20% of reads and writes are cross-node. Our goal with the integrated architecture is to ensure strong performance for the 80% single-node transactions while improving efficiency for the remaining 20% cross-node transactions. OceanBase Database V3.x pre-partitions data, with each partition having its own log stream. The larger the number of log streams, the higher the CPU and memory consumption. In a distributed scenario that involves the two-phase commit (2PC) protocol or the Paxos consensus protocol for distributed transactions, transaction atomicity and durability rely on multiple log streams. This incurs more system overhead than a single log stream in standalone mode. + +In OceanBase Database V4.x, multiple log streams are merged into one, which greatly reduces the system load. This does not prevent a cluster from migrating data to another log stream during scaling for even data distribution. The migration process is fully automated and transparent to applications. + +In addition to merging log streams, we have also made other optimizations, such as reducing the overhead of the sys tenant, offering parallel capabilities on a single node, and enabling on-demand metadata loading in memory, to boost the single-node performance of OceanBase Database. In our comparison of OceanBase Database and MySQL on a 32-core server, OceanBase Database significantly outperforms MySQL. 
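+
+The chart below summarizes that comparison. For readers who want to reproduce a similar test, a typical sysbench invocation looks like the following sketch; the connection parameters, table counts, and duration are placeholders to adapt to your own environment:
+
+```
+# Prepare test tables, then run a mixed read/write workload against the tenant.
+sysbench oltp_read_write --mysql-host=127.0.0.1 --mysql-port=2881 \
+  --mysql-user=root@test --mysql-db=test --tables=30 --table-size=1000000 \
+  --threads=64 prepare
+sysbench oltp_read_write --mysql-host=127.0.0.1 --mysql-port=2881 \
+  --mysql-user=root@test --mysql-db=test --tables=30 --table-size=1000000 \
+  --threads=64 --time=300 run
+```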
+
+![1691646364](/img/blogs/tech/core-tech-ob/image/1691646364695.png)
+
+Another test demonstrates the strong vertical scalability of OceanBase Database. In a sysbench stress test where CPU resources were doubled, the single-node performance of OceanBase Database in queries per second (QPS) increased proportionally, fully utilizing the added hardware.
+
+![1691646408](/img/blogs/tech/core-tech-ob/image/1691646408458.png)
+
+* **HTAP**: Business scenarios of enterprise applications can be roughly classified into two types: OLTP and online analytical processing (OLAP). Large enterprises tend to deploy multiple database products to support OLTP and OLAP scenarios separately. This solution requires data to flow between different systems, causing latency and the risk of data inconsistency during data synchronization and leading to data redundancy among various systems. This inevitably drives up costs and hinders fast business adjustments in a competitive market.
+
+Therefore, we want OceanBase Database to support both TP and AP workloads in lightweight real-time analysis scenarios. An OceanBase cluster typically has three replicas. By default, the leader processes strong-consistency reads and writes, ensuring that TP and AP tasks are performed on the same set of data. The integrated architecture for standalone and distributed modes also allows you to configure flexible settings such as read/write splitting to suit different business needs. After migrating from the sharding solution of MySQL to an OceanBase cluster, one of our customers has reduced their total cost of ownership (TCO) by 35% and increased their AP capability by 30%. This fully demonstrates the excellence of OceanBase Database in HTAP.
+
+* **Low costs and high compression ratio**: Data compression is key to reducing the storage footprint of massive data. OceanBase Database implements a distributed storage engine featuring a high compression ratio. Thanks to adaptive compression technologies, this LSM-tree-based storage engine balances system performance and compression ratio in a creative manner, which is impossible in a conventional database that stores data in fixed-size chunks. Moreover, data and logs are stored separately, and the storage engine further cuts costs by using encoding and compression technologies.
+
+How does OceanBase Database help reduce costs? It starts with saving server and storage resources. Based on our experience, in large-scale scenarios, migrating from MySQL to OceanBase Database reduces the number of required servers with the same specifications. As the storage engine of OceanBase Database organizes data in hybrid row-column storage, a single replica of OceanBase Database V4.x can hold approximately five times the data volume that MySQL stores in the same space, although the exact ratio may vary depending on user data characteristics.
+
+* **Multitenancy and resource isolation**: Resource pooling is a key approach to fine-grained database resource management in the cloud era. OceanBase Database adopts a native multitenant architecture, which supports multiple tenants in one cluster. Data and resources of a tenant are isolated from those of other tenants, and tenant resources are uniformly scheduled within the cluster. Users can create MySQL or Oracle tenants and configure the number, type, and storage location of data replicas as well as computing resources for each tenant, as sketched below.
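+
+Tenant creation and resource configuration are expressed in plain SQL executed from the sys tenant. The following is a minimal sketch: the unit, pool, and tenant names are made up, and the exact clauses should be checked against the version you run:
+
+```
+-- Define a resource specification, pool it across zones, then bind a tenant to it.
+CREATE RESOURCE UNIT uc_4c16g MAX_CPU 4, MEMORY_SIZE '16G';
+CREATE RESOURCE POOL pool_app UNIT = 'uc_4c16g', UNIT_NUM = 1, ZONE_LIST = ('zone1','zone2','zone3');
+CREATE TENANT t_app RESOURCE_POOL_LIST = ('pool_app')
+    SET ob_compatibility_mode = 'mysql', ob_tcp_invited_nodes = '%';
+```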
+
+Database cluster management features are consolidated on a single platform, application databases are consolidated into tenants, and online scaling and configuration adjustment are supported, facilitating automated O&M.
+
+OceanBase Database V4.1, released two months ago, introduces the following new features:
+
+* **Direct load is supported to speed up batch imports of massive data**, OBKV and multi-model data types such as GIS are available, and compatibility with MySQL 8.0 is enhanced.
+* **Primary/standby clusters are supported for stability**. Primary/standby clusters in OceanBase Database V4.1 are based on Alibaba Cloud OSS and NFS. We plan to implement primary/standby clusters based on network transmission in the future.
+* **A GUI installation tool is integrated for ease of use**. This allows you to deploy lightweight OCP Express and install an OceanBase cluster with a few clicks.
+* **Performance is improved**. Compared to OceanBase Database V4.0, OceanBase Database V4.1 improves TP performance by 40% and AP performance by 15% in scenarios where 4-core servers are used.
+
+![1691646610](/img/blogs/tech/core-tech-ob/image/1691646610454.png)
+
+![1691646625](/img/blogs/tech/core-tech-ob/image/1691646625403.png)
+
+Additionally, the OceanBase team will collaborate with more ecosystem partners to offer comprehensive and seamless services before, during, and after database migration. We have provided several tools in the OceanBase ecosystem, and the most popular ones are OceanBase Developer Center (ODC), OceanBase Migration Service (OMS), and OceanBase Cloud Platform (OCP). We also released and open-sourced lightweight OCP Express Community Edition with open APIs earlier this year. After deploying the service, you can access existing data management platforms through the open APIs to improve O&M efficiency and support business growth.
+
+In MySQL mode, OceanBase Database Community Edition 4.x offers the same features and capabilities as OceanBase Database Enterprise Edition 4.x. Even though OceanBase Database V4.x shows a clear performance boost over OceanBase Database V3.x, we are still focused on optimizing the kernel. In TP scenarios, we are improving the performance of OceanBase Database on small-specification servers, in an attempt to enable a single-node OceanBase Database instance to run faster than a single-node MySQL instance, so that OceanBase Database can meet user demands regardless of whether a small-specification server or a large cluster is used. In AP scenarios, we strive to implement more features, such as separation of hot and cold data and support for read-only external tables, to suit diverse business needs.
+
+We hope that OceanBase Database can help more enterprises overcome their business bottlenecks and that more users can build the OceanBase community with us.
\ No newline at end of file
diff --git a/docs/blogs/tech/end-to-end-tracing.md b/docs/blogs/tech/end-to-end-tracing.md
new file mode 100644
index 000000000..526eac7b5
--- /dev/null
+++ b/docs/blogs/tech/end-to-end-tracing.md
@@ -0,0 +1,197 @@
+---
+slug: end-to-end-tracing
+title: 'Insights into OceanBase Database 4.0: Issues Addressed by End-to-end Tracing, Starting with a Slow SQL Query'
+---
+
+# Insights into OceanBase Database 4.0: Issues Addressed by End-to-end Tracing, Starting with a Slow SQL Query
+
+> About the author: **Xiao Yi, Senior Technical Expert at OceanBase**, has supported Ant Group's Double 11 Shopping Festival multiple times.
He is a key member of the TPC-C and TPC-H performance team and specializes in designing and developing SQL engine components, including link protocols, execution plans, and execution engines.
+
+
+
+In the previous article, we discussed what DDL challenges a database faces when it transitions from a standalone architecture to a distributed one and what solutions and design ideas OceanBase Database V4.0 has adopted to make DDL operations more efficient and transparent for a better O&M experience. In this article, we delve into fault tracing and diagnosis, another important capability in database O&M.
+
+First, let's read this conversation:
+```
+A business manager complained: "The database requests are going to take a million years to finish. Could you have a look?"
+
+Scrolling through the real-time monitoring records of the database node, the database administrator (DBA) was puzzled: "I don't see any slow SQL statements here."
+
+
+
+The business manager asked: "What's going on then?"
+
+DBA: "Maybe there is something wrong with the connection between the client and the database node. Let me check the logs of the proxy server."
+
+One hour later... "The time consumption shown in the logs of the proxy looks good," the DBA frowned. "Maybe it's a network problem between the client and the proxy?"
+```
+
+
+Well, this is a short story about troubleshooting a slow SQL statement in a distributed database. Such an issue, if not solved soon, will greatly affect the user experience, or even lead to service unavailability. That's why we made it a priority to offer simple, efficient diagnostics. Compared with a standalone database, a distributed database typically runs as a cluster of dozens or hundreds of servers, with multiple interlinked components working together to process user requests, which makes fast and efficient fault diagnosis and location much more challenging.
+
+
+
+OceanBase Database V4.0 has significantly improved its diagnostic capabilities by supporting visual end-to-end tracing of SQL requests. This feature helps users quickly locate the specific execution stage, machine, and module of a fault, and provides detailed execution information. It makes O&M simple and efficient. In this article, we will share our thoughts on efficient database diagnosis and introduce the benefits and design ideas of the end-to-end tracing feature in [OceanBase Database](https://github.com/oceanbase/oceanbase) from the following perspectives:
+
+* **Purpose of end-to-end tracing**
+* **Benefits of end-to-end tracing**
+* **Design of end-to-end tracing**
+* **Performance of end-to-end tracing in OceanBase Database V4.0**
+
+## Purpose of End-to-end Tracing
+
+In OceanBase Database, a user request is first sent to OBProxy, a SQL request proxy service, which routes the request to one of the OBServer nodes of the OceanBase cluster. Then, the request is processed by many modules in different engines, such as the SQL engine, storage engine, and transaction engine, depending on the request type. The request may also access data on multiple OBServer nodes through remote procedure call (RPC) tasks. Finally, the result is returned to the client.
+
+![1678085682](/img/blogs/tech/end-to-end-tracing/image/1678085682712.png)
+
+_Figure 1 SQL request execution processes in OceanBase Database_
+
+If a user request returns an error or is executed slowly, it may be caused by an execution fault in one component or by connection problems between components.
Earlier versions of OceanBase Database provided users with a range of monitoring and diagnostic capabilities, such as SysStat, SQL Audit, Trans Stat, Tenant Dump, Slow Trans, and Slow Query, and OceanBase Cloud Platform (OCP), the database management platform of OceanBase, has supported visual diagnostic operations such as transaction, TopSQL, and SlowSQL diagnostics based on the output of those monitoring capabilities. However, these capabilities cannot provide enough information for the O&M team to quickly check for issues and efficiently restore a faulty service from an end-to-end perspective. It often takes a long time, sometimes with the help of component experts, merely to locate the execution stage, machine, or module where the issue occurs.
+
+To further improve the diagnostic efficiency of user request exceptions in a distributed system, OceanBase Database V4.0 supports end-to-end tracing. This feature traces the information of user SQL requests executed by different components at different stages of the entire data processing link, and presents the information to users in a visual way, allowing users to quickly hunt down the target.
+
+## Benefits of End-to-end Tracing
+
+
+### End-to-end tracing of transactions and SQL statements
+
+OceanBase Database V4.0 supports user-facing fine-grained end-to-end tracing of transactions and SQL statements. For a business department, the total time consumed by a business service is often of greater concern. In an online transaction processing (OLTP) system, for example, a business service usually consists of one or more transactions. Therefore, it is more practical to take a transaction as the elementary tracing unit. The end-to-end tracing feature creates a trace for each transaction, and records the execution information of each SQL statement in the OBClient > OBProxy > OBServer link in the transaction. By combing through a trace, users are able to quickly find the SQL statements executed in the transaction and get the execution information at OBClient.
+
+In a real business system that utilizes end-to-end tracing, once users find a slow SQL request or transaction, they can quickly locate the very execution stage that drags down the progress of the whole execution link. Or, if users notice a long gap between the end of one SQL request and the initiation of the next within a transaction, they can consult the business department to figure out the possible problems in the business logic.
+
+![1678085788](/img/blogs/tech/end-to-end-tracing/image/1678085789075.png)
+
+_Figure 2 SQL requests in a transaction_
+
+### End-to-end tracing in a distributed system
+
+OceanBase Database V4.0 supports end-to-end tracing in a distributed system. In the distributed architecture of OceanBase Database, OBProxy may route a received user request to any one of the OBServer nodes in the cluster, and the requested data may be distributed across multiple OBServer nodes. Moreover, the execution engine will assign SQL execution tasks to different OBServer nodes. If a cluster has many OBServer nodes, questions arise. Which OBServer nodes handle these SQL requests and tasks? How much time does each module on an OBServer node take? These are common concerns for O&M personnel.
+
+The end-to-end tracing feature allows users to trace the entire execution link of SQL requests in a distributed scenario that involves multiple OBServer nodes.
Users can find details such as the OBServer nodes that received requests, the OBServer nodes that executed remote tasks, and the scheduling status and execution time of each task. + +![1678085839](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/pord/blog/2023-04/1678085839669.png) + +_Figure 3 Execution process of a distributed request_ + +### Convenient association between diagnostics and the business system + +Many users have built their own monitoring and diagnostic systems. When a request gets slow or an error is reported in the database, users may need to quickly associate the event with the corresponding SQL diagnostics in the system to get troubleshooting done faster. End-to-end tracing allows users to easily associate diagnostics with the business system. They can set an app trace ID for a request from the business system to the database by using the Java Database Connectivity (JDBC) or SQL API. The app trace ID is recorded in the end-to-end tracing information. + +When an error is reported for a request or database call, users can use the corresponding app trace ID to quickly search for the associated database trace in the end-to-end diagnostic system, and then view the time consumption of the request or database call at each execution stage of the database link and the point where the error is reported, so as to identify the component that triggers the error in a short period of time. + +### Multiple end-to-end information display modes + +OCP allows users to quickly find faulty requests by using different metrics, such as the time consumption, trace ID, and SQL statement ID. Also, OCP clearly displays the execution information of the entire execution link from the client to each component of the database, which helps users locate the problematic stage in no time. + +![1678085878](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/pord/blog/2023-04/1678085878241.png) + +_Figure 4 A link details page on OCP_ + +OceanBase Database V4.0 also supports interactive end-to-end tracing operations. For example, if users manually execute an SQL statement in the command line, and want to inspect the execution link of the statement to get the time consumption of each stage for performance analysis or optimization, they can use the Show Trace feature to easily spot the performance bottlenecks. The following sample code shows the execution process of two distributed parallel tasks (px\_task) by running the `Show Trace` command. Users can specify keywords and options in the command to see more details. 
+``` + OceanBase(admin@test)>select/*+parallel(2)*/ count(*) from t1; + +----------+ + | count(*) | + +----------+ + | 5 | + +----------+ + 1 row in set (0.005 sec) + + OceanBase(admin@test)>show trace; + +-------------------------------------------+----------------------------+------------+ + | Operation | StartTime | ElapseTime | + +-------------------------------------------+----------------------------+------------+ + | obclient | 2023-03-01 17:51:30.143537 | 4.667 ms | + | └─ ob_proxy | 2023-03-01 17:51:30.143716 | 4.301 ms | + | └─ com_query_process | 2023-03-01 17:51:30.145119 | 2.527 ms | + | └─ mpquery_single_stmt | 2023-03-01 17:51:30.145123 | 2.513 ms | + | ├─ sql_compile | 2023-03-01 17:51:30.145133 | 0.107 ms | + | │ └─ pc_get_plan | 2023-03-01 17:51:30.145135 | 0.061 ms | + | └─ sql_execute | 2023-03-01 17:51:30.145252 | 2.350 ms | + | ├─ open | 2023-03-01 17:51:30.145252 | 0.073 ms | + | ├─ response_result | 2023-03-01 17:51:30.145339 | 2.186 ms | + | │ ├─ px_schedule | 2023-03-01 17:51:30.145342 | 1.245 ms | + | │ │ ├─ px_task | 2023-03-01 17:51:30.146391 | 1.113 ms | + | │ │ │ ├─ get_das_id | 2023-03-01 17:51:30.146979 | 0.001 ms | + | │ │ │ ├─ do_local_das_task | 2023-03-01 17:51:30.147012 | 0.050 ms | + | │ │ │ └─ close_das_task | 2023-03-01 17:51:30.147237 | 0.014 ms | + | │ │ └─ px_task | 2023-03-01 17:51:30.146399 | 0.868 ms | + | │ │ ├─ get_das_id | 2023-03-01 17:51:30.147002 | 0.001 ms | + | │ │ ├─ do_local_das_task | 2023-03-01 17:51:30.147036 | 0.041 ms | + | │ │ └─ close_das_task | 2023-03-01 17:51:30.147183 | 0.011 ms | + | │ └─ px_schedule | 2023-03-01 17:51:30.147437 | 0.001 ms | + | └─ close | 2023-03-01 17:51:30.147536 | 0.049 ms | + | └─ end_transaction | 2023-03-01 17:51:30.147571 | 0.002 ms | + +-------------------------------------------+----------------------------+------------+ +``` + +### Integration with other diagnostic features + +Now we know that users can quickly locate the faulty component or module with the help of the end-to-end tracing feature. What if users want to dig deeper and get more execution details? No worries. The end-to-end tracing feature is integrated with other diagnostic features designed for different modules, which helps users get diagnostic insights. + +For example, if users, with the help of the end-to-end tracing feature, confirm that the SQL execution engine is the culprit of a slow SQL request, they can launch the SQL Plan Monitor feature based on the sql\_trace\_id parameter of the SQL request to check out the execution information of operators and threads of the associated execution plan. As shown in Figure 5, we can see the details of each operator, such as the CPU time (the green bar in the DBTime column), the waiting time (the red bar in the DBTime column), and the number of rows returned. + +![1678085987](/img/blogs/tech/end-to-end-tracing/image/1678085987788.png) + +_Figure 5 Execution information displayed by SQL Plan Monitor_ + +## Design of End-to-end Tracing + + +The figure below shows the key OceanBase components that make the end-to-end tracing feature possible. In this section, we will describe in detail the OpenTracing data model, which we use to record the trace information, generation of trace data, and integrated data analysis and display on OCP. 
+
+![1678086049](/img/blogs/tech/end-to-end-tracing/image/1678086049739.png)
+
+_Figure 6 OceanBase components that enable the end-to-end tracing feature_
+
+### Data model
+
+The end-to-end tracing feature of OceanBase Database uses the OpenTracing model to record data. This model is widely used in a large number of distributed tracing systems. In the figure below, the left part shows the OpenTracing model, and the right part shows the corresponding end-to-end tracing data model of OceanBase Database. Each trace corresponds to one database transaction and multiple spans. An SQL request corresponds to a span, which records the information about an execution process. In addition, each span records a log that is persisted to the trace file.
+
+![1678086081](/img/blogs/tech/end-to-end-tracing/image/1678086081133.png)
+
+_Figure 7 End-to-end tracing data models_
+
+### Generation of trace data
+
+One of the key capabilities of the end-to-end tracing feature is to generate complete and valid trace data. On the one hand, we have studied each component in the entire request execution link, making careful decisions on the stages and information to be recorded in specific spans, to ensure that the end-to-end tracing data is accurate and useful. On the other hand, we have also taken into account the performance impact of trace data generation, which comes mainly from the overhead of recording trace data in memory and of writing it to the trace file. To minimize the impact on performance while providing users with more useful information for end-to-end diagnostics, OceanBase Database supports various control strategies. Users can set different sampling frequencies to control how many traces record complete information, and OceanBase Database always writes the full trace information of slow and faulty SQL statements, which users are more concerned about, into the trace file.
+
+![1678086110](/img/blogs/tech/end-to-end-tracing/image/1678086110289.png)
+
+_Figure 8 Generation of trace logs for each component_
+
+The trace files are independently stored on the machines that host the obproxy and observer processes. Because the database client runs alongside the business application, the end-to-end tracing information of OBClient is not recorded on the business server; instead, it is transferred to OBProxy for persistence.
+
+### Integrated data analysis and display on OCP
+
+OCP allows users to search for the trace information of a request by specified conditions and view the details of the execution link in a GUI. The trace information comes from the trace logs of the obproxy and observer processes on different servers. OCP provides special backend collectors to collect and parse the trace logs, and then store them in Elasticsearch. The collected data is raw span data, and the data of the same trace may be scattered in different spans on different servers, which makes it hard to search by span tags. Therefore, the OCP server regularly merges key span data of a trace, such as the time consumption at each stage and important tags, into one data record to construct a profile of the trace. This way, users can efficiently query the trace information by different combined conditions and the results can be neatly presented on pages.
+
+
+
+## Performance of End-to-end Tracing in OceanBase Database V4.0
+
+
+You must remember the slow SQL story at the beginning of this article. So, what changes can the end-to-end tracing feature bring to the O&M work?
+
+With this feature, if users notice that business requests are slow, they can simply navigate to the end-to-end trace search page on OCP, sort SQL requests by time consumption, find the most time-consuming requests in a certain time period, and check for requests with unexpectedly long execution time. If a time-consuming SQL request is confirmed with the help of TopSQL diagnostics, or the app trace ID of the related user is obtained, the trace ID can be used as a filter to narrow down the scope of search.
+
+![1678086185](/img/blogs/tech/end-to-end-tracing/image/1678086185409.png)
+
+_Figure 9 Request searching on the end-to-end trace search page on OCP_
+
+Once the time-consuming SQL request is identified, we can diagnose what exactly went wrong. At this point, users can click the trace ID on the OCP page to expand the trace information of the request, as shown in Figure 10. The measured execution time is 4.47 ms at OBClient, 4.366 ms at OBProxy, and 3.246 ms at the OBServer node. Comparing these figures against the normal time consumption of each component, we can conclude that the OBServer node accounted for most of the execution time. Going deeper, we can see that a large part of the time was consumed at the SQL compile stage, at which the SQL execution plan was generated. We can now come to a preliminary conclusion that the execution of this SQL request was slow because it missed the plan cache and the execution plan had to be regenerated.
+
+![1678086219](/img/blogs/tech/end-to-end-tracing/image/1678086219476.png)
+
+_Figure 10 End-to-end tracing details presented on OCP_
+
+On the end-to-end tracing details page on OCP, we can see the path by which the SQL request calls each module, and the time consumption of each stage. For example, users can tell simply by checking the timeline that the transport of the request from OBClient to OBProxy did not take too much time. However, if users want to know the time consumed from OBClient initiating the request to OBProxy receiving the request, they can click the span of OBClient and that of OBProxy respectively. As shown in Figure 11, we can quickly figure out that the difference between the start times of the two stages is 187 μs. In other words, users are able to analyze an issue in a more detailed way.
+
+
+![1678086322](/img/blogs/tech/end-to-end-tracing/image/1678086322794.png)
+
+_Figure 11 Details of end-to-end tracing stages_
+
+## Afterword
+
+The end-to-end tracing feature of OceanBase Database V4.0 achieves the observability of each transaction and SQL request, allowing users to efficiently diagnose and locate a fault. We believe this new feature will further speed up the troubleshooting process and make database O&M easier and more efficient. As an important part of enhancing the usability of OceanBase Database, we will also focus on providing a better O&M experience by integrating more features into OceanBase Database V4.x, such as Active Session History (ASH), Realtime SQL Plan Monitor, and Logical Plan Manager. Feel free to leave your comments below and share your ideas on database diagnostics.
\ No newline at end of file
diff --git a/docs/blogs/tech/flashback-query.md b/docs/blogs/tech/flashback-query.md
new file mode 100644
index 000000000..ea2d72af5
--- /dev/null
+++ b/docs/blogs/tech/flashback-query.md
@@ -0,0 +1,338 @@
+---
+slug: flashback-query
+title: 'Practice of Flashback Queries in OceanBase Database'
+---
+
+ Misoperations such as deleting data by mistake are common in the daily work of database administrators (DBAs).
To help them address these issues, we need to know how to restore data.
+
+ OceanBase Database supports record-specific flashback queries, which allow you to obtain data of a specific historical version. Let's learn how to use this feature ahead of time, so that we are prepared when it is unexpectedly needed.
+
+ In a flashback query, `undo_retention` is used to specify the time range of data versions to be retained in minor compactions. When `undo_retention` is set to `0`, multi-version minor compaction is disabled, which indicates that only the latest version of row data is retained in the minor compaction file. When `undo_retention` is set to a value greater than 0, multi-version minor compaction is enabled, and multiple versions of row data within the specified period in seconds are retained in the minor compaction file. To recover accidentally deleted data, you can first increase the value of `undo_retention` and then set it back to the default value after the data is restored.
+
+Default value:
+
+1800, in seconds
+
+Value range:
+
+\[0, 4294967295\]
+
+## 1. Preparations
+
+Change the value of `undo_retention` and enable the recycle bin.
+
+```
+ # Change the value of undo_retention.
+ obclient [test]> ALTER SYSTEM SET undo_retention=1800;
+ Query OK, 0 rows affected (0.004 sec)
+
+ obclient [test]> SHOW PARAMETERS LIKE 'undo_retention';
+ +-------+----------+-----------------+----------+----------------+-----------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------+---------+-------------------+---------------+-----------+
+ | zone | svr_type | svr_ip | svr_port | name | data_type | value | info | section | scope | source | edit_level | default_value | isdefault |
+ +-------+----------+-----------------+----------+----------------+-----------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------+---------+-------------------+---------------+-----------+
+ | zone1 | observer | 192.168.150.116 | 2882 | undo_retention | INT | 1800 | the low threshold value of undo retention. The system retains undo for at least the time specified in this config when active txn protection is banned. Range: [0, 4294967295] | TENANT | TENANT | DEFAULT | DYNAMIC_EFFECTIVE | 1800 | 1 |
+ +-------+----------+-----------------+----------+----------------+-----------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------+---------+-------------------+---------------+-----------+
+ 1 row in set (0.004 sec)
+
+ # Enable the recycle bin and log in again to reconnect to the database.
+ obclient [test]> SET GLOBAL recyclebin = on;
+ Query OK, 0 rows affected (0.002 sec)
+
+ obclient [test]> SHOW VARIABLES LIKE 'recyclebin';
+ +---------------+-------+
+ | Variable_name | Value |
+ +---------------+-------+
+ | recyclebin    | ON    |
+ +---------------+-------+
+ 1 row in set (0.001 sec)
+```
+
+## 2.
Flash Back to the State Before a DML Operation
+
+### 2.1 Create a table and prepare the required data
+```
+ obclient [test]> create table banjin_flash (id int ,name varchar(10),dizhi varchar(10),primary key (id));
+ insert into banjin_flash values (1,'zhangsan','Beijing');
+ insert into banjin_flash values (2,'lisi','Shanghai');
+ insert into banjin_flash values (3,'wangwu','Tianjin');
+ Query OK, 0 rows affected (0.050 sec)
+
+ obclient [test]> insert into banjin_flash values (1,'zhangsan','Beijing');
+ Query OK, 1 row affected (0.008 sec)
+
+ obclient [test]> insert into banjin_flash values (2,'lisi','Shanghai');
+ Query OK, 1 row affected (0.001 sec)
+
+ obclient [test]> insert into banjin_flash values (3,'wangwu','Tianjin');
+ Query OK, 1 row affected (0.001 sec)
+
+ obclient [test]> insert into banjin_flash values (4,'zhaoliu','Hebei');
+ Query OK, 1 row affected (0.001 sec)
+```
+
+### 2.2 Modify the table
+
+Modify the table and record the current date and time returned by the NOW() function to facilitate subsequent data restoration.
+```
+ obclient [test]> select now();
+ +---------------------+
+ | now()               |
+ +---------------------+
+ | 2024-10-20 17:31:13 |
+ +---------------------+
+ 1 row in set (0.000 sec)
+
+ obclient [test]> update banjin_flash set dizhi = 'Hunan' where name='lisi';
+ Query OK, 1 row affected (0.003 sec)
+ Rows matched: 1 Changed: 1 Warnings: 0
+
+ obclient [test]> select now();
+ +---------------------+
+ | now()               |
+ +---------------------+
+ | 2024-10-20 17:31:30 |
+ +---------------------+
+ 1 row in set (0.000 sec)
+
+ obclient [test]> delete from banjin_flash;
+ Query OK, 4 rows affected (0.002 sec)
+
+ obclient [test]> select now();
+ +---------------------+
+ | now()               |
+ +---------------------+
+ | 2024-10-20 17:31:52 |
+ +---------------------+
+ 1 row in set (0.000 sec)
+```
+
+### 2.3 Flash back data
+
+```
+ obclient [test]> select * from banjin_flash;
+ Empty set (0.001 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:31:30') * 1000;
+ +----+----------+----------+
+ | id | name     | dizhi    |
+ +----+----------+----------+
+ |  1 | zhangsan | Beijing  |
+ |  2 | lisi     | Hunan    |
+ |  3 | wangwu   | Tianjin  |
+ |  4 | zhaoliu  | Hebei    |
+ +----+----------+----------+
+ 4 rows in set (0.000 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:31:13') * 1000;
+ +----+----------+----------+
+ | id | name     | dizhi    |
+ +----+----------+----------+
+ |  1 | zhangsan | Beijing  |
+ |  2 | lisi     | Shanghai |
+ |  3 | wangwu   | Tianjin  |
+ |  4 | zhaoliu  | Hebei    |
+ +----+----------+----------+
+ 4 rows in set (0.000 sec)
+```
+
+In the results of the preceding two flashback queries, data at different points in time is returned: the data before the deletion is returned for the first query, and the data before the update is returned for the second query.
+
+You can insert the restored data into a backup table for subsequent operations, for example, with a statement like `INSERT INTO banjin_flash_bak SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:31:13') * 1000;`, where `banjin_flash_bak` is a table created in advance with the same schema.
+
+## 3.
Flash Back to the State Before a DDL Operation
+
+### 3.1 Flash back the table to the state before an ADD COLUMN operation
+```
+ obclient [test]> alter table banjin_flash add column dianhua decimal(11) default 1;
+ Query OK, 0 rows affected (0.038 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:44:43') * 1000;
+ +----+----------+----------+---------+
+ | id | name     | dizhi    | dianhua |
+ +----+----------+----------+---------+
+ |  1 | zhangsan | Beijing  |       1 |
+ |  2 | lisi     | Hunan    |       1 |
+ |  3 | wangwu   | Tianjin  |       1 |
+ |  4 | zhaoliu  | Hebei    |       1 |
+ +----+----------+----------+---------+
+ 4 rows in set (0.002 sec)
+
+ obclient [test]> alter table banjin_flash add column dianhua1 decimal(11) ;
+ Query OK, 0 rows affected (0.034 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:44:43') * 1000;
+ +----+----------+----------+---------+----------+
+ | id | name     | dizhi    | dianhua | dianhua1 |
+ +----+----------+----------+---------+----------+
+ |  1 | zhangsan | Beijing  |       1 |     NULL |
+ |  2 | lisi     | Hunan    |       1 |     NULL |
+ |  3 | wangwu   | Tianjin  |       1 |     NULL |
+ |  4 | zhaoliu  | Hebei    |       1 |     NULL |
+ +----+----------+----------+---------+----------+
+ 4 rows in set (0.001 sec)
+```
+
+In the preceding flashback query result, the default value is used in the added columns. If no default value is available, NULL is used.
+
+### 3.2 Flash back the table to the state before a DROP COLUMN operation
+
+```
+ obclient [test]> alter table banjin_flash drop column dianhua1;
+ Query OK, 0 rows affected (0.251 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:44:43') * 1000;
+ ERROR 1412 (HY000): Unable to read data -- Table definition has changed
+```
+
+After a column is dropped, the table cannot be flashed back to a snapshot taken before the drop. The following error is returned in this case: ERROR 1412 (HY000): Unable to read data -- Table definition has changed.
+
+### 3.3 Flash back the table to the state before a DROP TABLE operation
+```
+ obclient [test]> drop table banjin_flash;
+ Query OK, 0 rows affected (0.022 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:51:30') * 1000;
+ ERROR 1146 (42S02): Table 'test.banjin_flash' doesn't exist
+```
+
+If the table has been dropped, it cannot be queried directly in a flashback query. An error is returned, indicating that the table does not exist.
+
+In this case, restore the table from the recycle bin and perform a flashback query again.
+```
+ obclient [test]> select * from banjin_flash;
+ Empty set (0.001 sec)
+
+ obclient [test]> show recyclebin;
+ +--------------------------------+---------------+-------+----------------------------+
+ | OBJECT_NAME                    | ORIGINAL_NAME | TYPE  | CREATETIME                 |
+ +--------------------------------+---------------+-------+----------------------------+
+ | __recycle_$_1_1729417897979024 | banjin_flash  | TABLE | 2024-10-20 17:51:37.978166 |
+ +--------------------------------+---------------+-------+----------------------------+
+ 1 row in set (0.002 sec)
+
+ obclient [test]> FLASHBACK TABLE __recycle_$_1_1729417897979024 TO BEFORE DROP;
+ Query OK, 0 rows affected (0.033 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:51:30') * 1000;
+ +----+----------+----------+
+ | id | name     | dizhi    |
+ +----+----------+----------+
+ |  1 | zhangsan | Beijing  |
+ |  2 | lisi     | Shanghai |
+ |  3 | wangwu   | Tianjin  |
+ |  4 | zhaoliu  | Hebei    |
+ +----+----------+----------+
+ 4 rows in set (0.008 sec)
+```
+
+## 4. Flash Back to the State Before a TRUNCATE Operation
+
+TRUNCATE is a special DDL operation, so its flashback behavior needs to be described separately.
+```
+ obclient [test]> insert into banjin_flash values (1,'zhangsan','Beijing');
+ Query OK, 1 row affected (0.007 sec)
+
+ obclient [test]> insert into banjin_flash values (2,'lisi','Shanghai');
+ Query OK, 1 row affected (0.001 sec)
+
+ obclient [test]> insert into banjin_flash values (3,'wangwu','Tianjin');
+ Query OK, 1 row affected (0.001 sec)
+
+ obclient [test]> insert into banjin_flash values (4,'zhaoliu','Hebei');
+ Query OK, 1 row affected (0.001 sec)
+
+ obclient [test]>
+ obclient [test]> select now();
+ +---------------------+
+ | now()               |
+ +---------------------+
+ | 2024-10-20 18:42:47 |
+ +---------------------+
+ 1 row in set (0.000 sec)
+
+ obclient [test]> update banjin_flash set dizhi = 'Hunan' where name='lisi';
+ Query OK, 1 row affected (0.002 sec)
+ Rows matched: 1 Changed: 1 Warnings: 0
+
+ obclient [test]> select now();
+ +---------------------+
+ | now()               |
+ +---------------------+
+ | 2024-10-20 18:42:48 |
+ +---------------------+
+ 1 row in set (0.000 sec)
+
+ obclient [test]> truncate table banjin_flash;
+ Query OK, 0 rows affected (0.040 sec)
+
+ obclient [test]> SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 18:42:47') * 1000;
+ ERROR 1412 (HY000): Unable to read data -- Table definition has changed
+```
+
+"ERROR 1412 (HY000): Unable to read data -- Table definition has changed" is reported when you try to flash back the table to its state before it was truncated.
+
+According to the official OceanBase Database documentation, a TRUNCATE operation is defined and behaves as follows:
+
+To execute the `TRUNCATE TABLE` statement, you must have the `DROP` privilege on the table. It is a DDL statement.
+
+`TRUNCATE TABLE` and `DELETE FROM` have the following differences:
+
+* **A TRUNCATE operation drops a table and creates it again.** It is much faster than deleting data row by row, especially for large tables.
+* The output of `TRUNCATE TABLE` always indicates that 0 rows were affected.
+* When you use `TRUNCATE TABLE`, the table management program does not record the last `AUTO_INCREMENT` value, but resets it to zero.
+* The `TRUNCATE TABLE` statement cannot be executed during transactions or when the table is locked. Otherwise, an error is returned.
+* If the table definition file is valid, you can use the `TRUNCATE TABLE` statement to recreate the table as an empty table, even if the data or indexes are corrupted.
+
+Although the table is dropped before recreation, the dropped table is not moved to the recycle bin. Proceed with caution.
+```
+ obclient [test]> TRUNCATE TABLE BANJIN_FLASH;
+ Query OK, 0 rows affected (0.042 sec)
+
+ obclient [test]> show recyclebin;
+ Empty set (0.002 sec)
+```
+
+## 5. Command Summary
+```
+ # Modify the value of undo_retention.
+ ALTER SYSTEM SET undo_retention=1800;
+
+ # View the undo_retention parameter.
+ SHOW PARAMETERS LIKE 'undo_retention';
+
+ # Enable the recycle bin.
+ SET GLOBAL recyclebin = on;
+
+ # View the recycle bin status.
+ SHOW VARIABLES LIKE 'recyclebin';
+
+ # Perform a flashback query.
+ SELECT * FROM banjin_flash AS OF SNAPSHOT time_to_usec('2024-10-20 17:31:13') * 1000;
+ # Alternatively, use the following two statements together.
+ SELECT time_to_usec('2024-10-20 06:42:40') * 1000;
+ SELECT * FROM banjin_flash AS OF SNAPSHOT 1729377760000000000;
+
+ # View the recycle bin.
+ show recyclebin;
+
+ # Restore a dropped table from the recycle bin.
+ FLASHBACK TABLE __recycle_$_1_1729417897979024 TO BEFORE DROP;
+
+ # Table and data
+ create table banjin_flash (id int ,name varchar(10),dizhi varchar(10),primary key (id));
+ insert into banjin_flash values (1,'zhangsan','Beijing');
+ insert into banjin_flash values (2,'lisi','Shanghai');
+ insert into banjin_flash values (3,'wangwu','Tianjin');
+ insert into banjin_flash values (4,'zhaoliu','Hebei');
+
+ # Data operations
+ select now();
+ update banjin_flash set dizhi = 'Hunan' where name='lisi';
+ select now();
+ delete from banjin_flash;
+ select now();
+
+ alter table banjin_flash add column dianhua decimal(11) default 1;
+
+ alter table banjin_flash drop column dianhua;
+```
\ No newline at end of file
diff --git a/docs/blogs/tech/image-search-vector-search.md b/docs/blogs/tech/image-search-vector-search.md
new file mode 100644
index 000000000..3a31efaa6
--- /dev/null
+++ b/docs/blogs/tech/image-search-vector-search.md
@@ -0,0 +1,230 @@
+---
+slug: image-search-vector-search
+title: '[OceanBase Practices] Building an Image Search Application Based on the Vector Search Technology of OceanBase Database'
+---
+
+# [OceanBase Practices] Building an Image Search Application Based on the Vector Search Technology of OceanBase Database
+
+> This article is a contest entry of [「Making Technology Visible | The OceanBase Preacher Program 2024」](https://open.oceanbase.com/blog/essay-competition), a technical writing contest. If you are a tech enthusiast, participate in this contest to bring code to life with your words while getting a chance to win a ¥10,000 prize!
+
+## 1. Introduction to Vector Search
+
+OceanBase Database offers powerful vector search capabilities, allowing you to use dense floating-point vectors with up to 16,000 dimensions and to calculate various distance metrics such as Manhattan distance, Euclidean distance, inner product, and cosine similarity. Its vector indexes are based on Hierarchical Navigable Small World (HNSW), a technology that supports incremental updates and deletes without affecting the recall rate. OceanBase Database also supports fusion queries with scalar filtering and provides flexible access methods. You can execute SQL queries based on the MySQL protocol by using a client written in any programming language or by using the Python SDK.
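+
+As a taste of that SQL surface, the sketch below creates a small vector table and runs an exact nearest-neighbor query. The table name and data are made up, and the statements follow my reading of the V4.3.x vector search documentation, so verify them against your version before relying on them:
+
+```
+ -- A 3-dimensional vector column; real embeddings use far more dimensions.
+ CREATE TABLE items (
+   id INT PRIMARY KEY,
+   embedding VECTOR(3)
+ );
+ INSERT INTO items VALUES (1, '[0.1, 0.2, 0.3]'), (2, '[0.9, 0.8, 0.7]');
+
+ -- Optional HNSW index to accelerate approximate nearest-neighbor search.
+ CREATE VECTOR INDEX idx_items_embedding ON items (embedding) WITH (distance=L2, type=HNSW);
+
+ -- Closest row by Euclidean distance.
+ SELECT id, l2_distance(embedding, '[0.1, 0.2, 0.35]') AS dist
+ FROM items ORDER BY dist LIMIT 1;
+```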
+
+In addition, OceanBase Database has adapted to artificial intelligence (AI) application development frameworks, such as LlamaIndex and DB-GPT, as well as the AI application development platform Dify, to expand its support for AI application development.
+
+### 1.1 Key terms
+
+#### (1) Unstructured data
+
+Unstructured data refers to data without a defined format or structure, including text, images, audio, videos, social media content, emails, and log files. Due to the complexity and variety of unstructured data, processing it requires specific tools and technologies such as natural language processing, image recognition, and machine learning.
+
+#### (2) Vector
+
+A vector is the projection of an object in a high-dimensional space. Mathematically, a vector is a floating-point array with the following characteristics:
+
+Each element in the array is a floating-point number that represents one dimension of the vector.
+
+The size of the array, namely the number of elements, indicates the dimensionality of the entire vector space.
+
+#### (3) Embedding
+
+Embedding is the process of extracting content and semantics from unstructured data such as images and videos through neural-network-based deep learning, converting the unstructured data into feature vectors. The embedding technology maps raw data from a high-dimensional (sparse) space to a low-dimensional (dense) space and converts multimodal data with abundant features into a multidimensional array (vector).
+
+#### (4) Vector similarity search
+
+In an era of exponential data growth, users often need to quickly retrieve required information from massive amounts of data. For example, for an online literature database, the product catalog of an e-commerce platform, or a constantly growing multimedia content library, an efficient search system is required to quickly locate content of interest for users. Given the increasing amounts of data, conventional keyword-based search methods can no longer meet the requirements of users on search accuracy and speed. Vector search technology emerged to fill this gap. Vector similarity search converts unstructured data such as text, images, and audio into vectors through feature extraction and vectorization techniques, and then measures their similarity to capture deep semantic information of the data, thereby providing more accurate and efficient search results.
+
+### 1.2 Scenarios
+
+* **RAG**
+  Retrieval-augmented generation (RAG) is an AI framework that retrieves facts from external knowledge bases to provide the most accurate and latest information for large language models (LLMs). It not only boosts the quality of model-generated content but also deepens users' understanding of the generation process. The RAG technology is often used in conjunction with retrieval and generation techniques in intelligent Q&A systems and knowledge bases to improve information retrieval and processing efficiency.
+* **Personalized recommendation**
+  The recommendation system can recommend content that users may be interested in based on their historical behavior and preferences. After receiving a recommendation request, the system calculates the similarity based on the characteristics of the user, and then returns items that the user may be interested in as the recommendation results. This technology is commonly used in recommendations for restaurants and tourist attractions to precisely meet user needs.
+* **Image/Text search** + An image/text search task aims to find results in a large-scale image/text database that are most similar to the specified image or text. By storing image or text characteristics in a vector database and using efficient indexing techniques for similarity calculation, the system can quickly return the matching results. This technology applies to scenarios such as facial recognition, delivering an accurate and efficient search experience. + +## 2. Core Features of Vector Search + +OceanBase Database can store, index, and retrieve vector data. The following table describes the core features. + +| **Core feature** | **Description** | +|---------------------|--------------------------------------------------------------------| +| Vector data type | You can store floating-point vectors with up to 16,000 dimensions. | +| Vector indexes | Exact search and approximate nearest neighbor search (ANNS) are supported. You can calculate the L2 distance, inner product distance, and cosine distance. HNSW indexes are supported. An index column can contain up to 2,000 dimensions. | +| Operators for vector search | Basic operators, such as addition, subtraction, multiplication, comparison, and aggregation operators, are supported.| + +Observe the following limitations: + +* By default, OceanBase Database uses the nullsFirst comparison mode, in which NULL values are placed first during sorting. We recommend that you add a `NOT NULL` condition in queries. +* You cannot define both a vector index and a full-text index on the same table. + +## 3. Architecture of an Image Search Application + +An image search application stores images as vectors in a database. When you upload an image through the corresponding user interface (UI), the application converts the image into a vector, searches the database for similar vectors, and displays the similar vectors as images on the UI. + +![image-20241130234246523](https://cnlog-img-xybdiy.oss-cn-shanghai.aliyuncs.com/img/202412032205573.png) + +## 4. Procedure + +### 4.1 Use Docker to deploy OceanBase Database + +Install and start Docker. +``` + root@oceanbase:~# apt-get install docker-ce + Reading package lists... Done + Building dependency tree... Done + Reading state information... Done + docker-ce is already the newest version (5:27.3.1-1~ubuntu.24.04~noble). + 0 upgraded, 0 newly installed, 0 to remove and 26 not upgraded. + + root@oceanbase:~# systemctl start docker && systemctl enable docker + Synchronizing state of docker.service with SysV service script with /usr/lib/systemd/systemd-sysv-install. + Executing: /usr/lib/systemd/systemd-sysv-install enable docker + root@oceanbase:~# systemctl status docker +``` + +Run the following command to start the Docker container of OceanBase Database for installation: +``` + docker run --name=ob433 -e MODE=mini -e OB_MEMORY_LIMIT=8G -e OB_DATAFILE_SIZE=10G -e OB_CLUSTER_NAME=ailab2024 -e OB_SERVER_IP=127.0.0.1 -p 127.0.0.1:2881:2881 -d quay.io/oceanbase/oceanbase-ce:4.3.3.1-101000012024102216 +``` + +Run the following command to check whether the boot process of OceanBase Database is completed: +``` + docker logs -f ob433 +``` + +> It takes 2–3 minutes for initialization. Once "boot success!" appears, the boot process is completed and you can press **Ctrl+C** to exit the log view. 
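+
+Before moving on, you can also confirm from another shell that the container is up, for example:
+
+```
+ docker ps --filter "name=ob433"
+```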

![image-20241202233404049](https://cnlog-img-xybdiy.oss-cn-shanghai.aliyuncs.com/img/202412032205630.png)

### 4.2 Test the connectivity of OceanBase Database

After installing OceanBase Database by using a Docker container, run the following command to test the database connectivity:
```
 root@oceanbase:~/image-search# mysql -h127.0.0.1 -P2881 -uroot@test -A -p
 Enter password:
 Welcome to the MySQL monitor.  Commands end with ; or \g.
 Your MySQL connection id is 3221487647
 Server version: 5.7.25 OceanBase_CE 4.3.3.1 (r101000012024102216-2df04a2a7a203b498f23e1904d4b7a000457ce43) (Built Oct 22 2024 17:42:50)

 Copyright (c) 2000, 2024, Oracle and/or its affiliates.

 Oracle is a registered trademark of Oracle Corporation and/or its
 affiliates. Other names may be trademarks of their respective
 owners.

 Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

 mysql>
```

### 4.3 Enable vector search in OceanBase Database

> Before using vector indexes, estimate the memory usage based on the index data of a tenant and configure an upper limit. For example, the following command sets the maximum vector index memory to 30% of the tenant's total memory:
```
 mysql> ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
 Query OK, 0 rows affected (0.01 sec)
```

The default value of `ob_vector_memory_limit_percentage` is `0`, which means no memory is allocated for vector indexes. In this case, an error occurs when you create a vector index.

### 4.4 Clone the project code repository to your local server

```
 git clone https://gitee.com/oceanbase-devhub/image-search.git
 cd image-search
```

![image-20241202233532784](https://cnlog-img-xybdiy.oss-cn-shanghai.aliyuncs.com/img/202412032205401.png)

### 4.5 Install dependencies
```
 poetry install
```

If output similar to the following is displayed, all dependencies have been installed:
```
 root@oceanbase:~/image-search# poetry install --no-root
 Installing dependencies from lock file

 No dependencies to install or update
```

### 4.6 Set environment variables

```
 # Run the following commands in the /image-search path.
 cp .env.example .env
 # Update the database information in the .env file.
 vim .env
```

Update the following parameters in the `.env` file as needed. For other parameters, keep their default values.
```
 HF_ENDPOINT=https://hf-mirror.com

 DB_HOST="127.0.0.1" ## The IP address of the tenant
 DB_PORT="2881" ## The port number
 DB_USER="root@test" ## The username of the tenant
 DB_NAME="test" ## The database name
 DB_PASSWORD="" ## The password corresponding to the username of the tenant
```

### 4.7 Upload your image dataset to the server

> Upload your image dataset to a specific server directory and take note of its absolute path. In this case, the absolute path is `/home/ubuntu/zebra/`.

![image-20241203211702633](/img/blogs/tech/image-search-vector-search/image/202412032205613.png)

> You can also use the command-line window to check whether the image dataset is successfully uploaded.

![image-20241203212152736](https://cnlog-img-xybdiy.oss-cn-shanghai.aliyuncs.com/img/202412032205552.png)
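
Behind the scenes, the application stores one embedding per image in a vector table. The following is a minimal sketch of what such storage looks like in SQL. The table, column, and index names are hypothetical (the demo application creates and manages its own schema), and the syntax follows the vector search features described in section 2:

```sql
-- Hypothetical table: one row per image, with its embedding vector.
CREATE TABLE image_vectors (
  id        BIGINT AUTO_INCREMENT PRIMARY KEY,
  file_path VARCHAR(512),
  embedding VECTOR(3)  -- real image embeddings have hundreds of dimensions
);

-- HNSW index for approximate nearest neighbor search (ANNS).
CREATE VECTOR INDEX idx_embedding ON image_vectors(embedding)
  WITH (distance=L2, type=hnsw);

-- Top-5 images closest to a query vector; the NOT NULL filter follows
-- the recommendation in section 2.
SELECT file_path
FROM image_vectors
WHERE embedding IS NOT NULL
ORDER BY l2_distance(embedding, '[0.12, 0.30, 0.56]') APPROXIMATE
LIMIT 5;
```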

### 4.8 Start the image search application

> Run the following command to start the image search application:

    poetry run streamlit run --server.runOnSave false image_search_ui.py

![image-20241203000401007](https://cnlog-img-xybdiy.oss-cn-shanghai.aliyuncs.com/img/202412032205658.png)

> If the application is successfully started, the output shown in the following figure is displayed.

![image-20241203213937547](https://cnlog-img-xybdiy.oss-cn-shanghai.aliyuncs.com/img/202412032205556.png)

> Use one of the URLs to access the UI of the image search application.

![image-20241203214220422](/img/blogs/tech/image-search-vector-search/image/202412032205459.png)

### 4.9 Open the UI of the image search application

> In **Image Base** under **Loading Setting**, enter the absolute path of the directory where the images are stored on the server. Click **Load Images**. Once the images are loaded, you can perform image search operations.

![image-20241203212247459](/img/blogs/tech/image-search-vector-search/image/202412032205465.png)

> Wait for the images to be loaded.

![image-20241203151029796](/img/blogs/tech/image-search-vector-search/image/202412032205078.png)

> If the content shown in the following figure appears, all images have been loaded successfully.

![image-20241203154156967](/img/blogs/tech/image-search-vector-search/image/202412032205524.png)

## 5. Test Image Search

> Click **Browse files** and choose the image of a zebra that you have prepared in advance.

![image-20241203212546208](/img/blogs/tech/image-search-vector-search/image/202412032206402.png)

> After you upload the image, all similar images in the image dataset are displayed, including their distances and paths.

![image-20241203212712128](/img/blogs/tech/image-search-vector-search/image/202412032206075.png)

Now we have built an image search application based on the vector search capability of OceanBase Database.

## 6. References

[Build an image search application with OceanBase Database](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001970972) \ No newline at end of file diff --git a/docs/blogs/tech/legacy-monitoring-system.md b/docs/blogs/tech/legacy-monitoring-system.md new file mode 100644 index 000000000..e24a58764 --- /dev/null +++ b/docs/blogs/tech/legacy-monitoring-system.md @@ -0,0 +1,263 @@ +--- +slug: legacy-monitoring-system +title: 'Using OceanBase Database to Store Data of a Traditional Monitoring System' +--- +

I work for a leading new energy company headquartered in Ningbo, specializing in the development, manufacturing, and sales of photovoltaic products.

As a company with billions in revenue, our monitoring system is one of our most critical IT management tools, playing a vital role in ensuring business continuity and risk forecasting. In 2022, we chose Zabbix as our monitoring system to track metrics for servers, operating systems, middleware, databases, and network equipment deployed around the globe. It provides early warnings for our business systems, ensuring accurate alerting for anomalies, and also supplies metrics for IT facility inspections and event sourcing, enabling IT managers to quickly access historical data for various system components.

Architecture of Our Monitoring System
--------

We chose Zabbix because it's open-source, and has a stable architecture and the capabilities to "monitor everything". It's a great fit for our company, which primarily relies on conventional architectures with minimal cloud-native infrastructure. Plus, our IT team already had experience with Zabbix, so the learning curve was low.

When I joined the company, I was tasked with continuously optimizing and improving the newly implemented Zabbix monitoring system to enhance its timeliness and accuracy. However, since the monitoring system worked with a MySQL 8.0 database, it soon ran into issues due to limitations of the MySQL architecture.

Pain Points of Our MySQL-Based Monitoring Architecture
----------------

First of all, we hit high availability bottlenecks in every architecture we tried.

* Master-slave architecture:

  * Without read-write splitting, the performance was no better than a standalone system with only one node. If the master node failed, we had to shut it down for a failover and validate data consistency between the master and slave nodes. I don't recommend this solution.
  * We could implement read-write splitting in two ways: modifying the DAL code (not recommended because it would hinder feature iterations), or introducing middleware like ProxySQL (which added complexity and reduced reliability).

* Dual-master architecture:

  * We adopted a single-write approach with Keepalived to provide virtual IP addresses for easy failover.
  * A dual-write approach would require controlling row IDs to avoid primary key conflicts and data redundancy. It also required code modifications. I don't recommend this solution.

* MySQL Group Replication (MGR)-based architecture:

  * While it offered strong consistency, it required middleware like ProxySQL for read-write splitting. In our tests, MGR was prone to an "avalanche effect," where the failure of one node could crash the entire cluster.
  * The biggest vulnerability of a replication-based architecture is that it cannot retain the disk-consuming binlogs for long due to the high write volume of Zabbix. If replication breaks for too long, it becomes impossible to resynchronize the master and slave nodes.

Secondly, read-write conflicts. Zabbix wrote a massive amount of data, which often led to read-write conflicts (optimistic locks) when O&M and business teams performed operations such as querying monitoring data, converting historical data to trend data, and comparing alerts during peak hours. Additionally, Zabbix's housekeeper service would periodically clear expired monitoring data, which could cause pessimistic locks, drastically reducing database performance. As the data volume grew, these conflicts became more pronounced.

Thirdly, capacity issues. Despite extensive optimizations to reduce the number of monitoring items and the data retention period, we still needed to store a huge amount of data. After just over a year, Zabbix's database exceeded 1 TB, with the largest table holding over 700 million records. In addition to business data, binlogs also occupied significant storage space.

Here are my thoughts on architecture optimization based on our specific business scenarios.

Optimizing MySQL: A Temporary Fix
---------------

Among the many optimization cases, I'll talk about a typical one.

Zabbix provided monitoring templates. The Linux monitoring template alone contained over 100 monitoring items.
Given our 2,000-plus production servers, that meant about 200,000 monitoring items. If we collected data once every 5 minutes, the database would write 2.4 million records per hour to the `history` table.

A 5-minute collection interval was just theoretical, as longer intervals would reduce data accuracy. The values of some monitoring items, like metrics for traffic, and CPU, memory, and I/O usage, were often collected once every minute or so for higher accuracy. This resulted in even more monitoring data.

Additionally, Zabbix would extract a full hour of data of a monitoring item from the `history` and `history_uint` tables on an hourly basis, performing calculations to get its minimum, average, and maximum values, and then insert them into the `trends` and `trends_uint` tables.

![1727347610](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1727347612203.png)

This process involved large result sets, a heavy calculation workload, and significant caching, often leading to issues like rapid creation of temporary tables, excessive disk I/O load, high swap usage, and large transactions.

As the pessimistic locks mentioned earlier would cause historical data cleanup tasks to fail, more and more data would build up. Table locks also made it impossible to use the dump method for backups, forcing the use of physical backups. However, since the `DELETE` statement was used to clear historical data, it left fragmented space in primary business tables, requiring defragmentation before creating backups, which was quite cumbersome.

Highly frequent `INSERT` and `DELETE` operations also generated massive amounts of binlogs, taking up large storage space. Shortening the binlog retention period meant the synchronization point could be overwritten quickly. If the master node and a slave node were disconnected, you might need to restore the slave node from scratch.

To address these issues, we optimized the following tables.

| Table name | Description | Data type |
|-------------|----------------------------------|-----------------------|
| history | Stores the raw historical data. | Floating-point numbers|
| history_uint| Stores the raw historical data. | Unsigned numbers |
| history_str | Stores the raw short strings. | Characters |
| history_text| Stores the raw long strings. | Text |
| history_log | Stores the raw log strings. | Logs |
| trends | Stores the hourly statistics. | Floating-point numbers|
| trends_uint | Stores the hourly statistics. | Unsigned numbers |
| auditlog | Stores the audit logs. | Logs |

Zabbix stores historical data in tables whose names start with `history`, and trend data in those whose names start with `trends`.

* History tables store different types of raw data of all monitoring items collected from clients, and have similar schemas. If the data of a monitoring item is collected once a second, 86,400 records are generated for this monitoring item on a daily basis.

* The `history` table stores numeric data in the following columns: `itemid` of the BIGINT(20) type, `clock` of the INT(11) type, `value` of the DOUBLE type, and `ns` of the INT(11) type. Each record consumes approximately 24 bytes (8 + 4 + 8 + 4), so such an item generates about 2 MB of data per day.
* The `history_str` and `history_text` tables store string and text data in a `value` column of the VARCHAR(255) type and of the TEXT type, respectively, both with the utf8mb4_bin collation.
A VARCHAR(255) column in utf8mb4 can store up to 1,020 bytes, while a TEXT column can store up to 65,535 bytes. If, in extreme cases, the data of a text- or string-type item is collected once a second, approximately 85 MB or 5 GB of data can be generated each day.
* The size of historical data depends on factors like the number of monitoring items, retention periods, data types, and collection intervals.

* Trends tables store aggregated hourly data, including the minimum, average, and maximum values for each monitoring item. Essentially, trends tables are compressed versions of history tables, reducing resource demands.

* Trends tables are populated only from the numeric history tables. The `history_str`, `history_log`, and `history_text` tables do not have corresponding trends tables.
* The data of trends tables is converted from history tables by Zabbix's housekeeper service, which works like a scheduled task. The conversion process inevitably weighs on database performance.

Starting with Indexes
-----

The `history` table stores the whole-second part of the timestamp of a monitoring item in the `clock` column (in seconds), and the sub-second part in the `ns` column (in nanoseconds).

The sub-second part is usually not a concern, but Zabbix often uses both columns in its internal statistical queries. Because the `ns` column was not covered by an index, such queries led to full table scans. To address this issue, we optimized the `history` table like this:

    -- Before: only a secondary index on (itemid, clock); queries that
    -- also touch ns fall back to full table scans.
    CREATE TABLE `history_old` (
      `itemid` bigint(20) unsigned NOT NULL,
      `clock` int(11) NOT NULL DEFAULT '0',
      `value` double NOT NULL DEFAULT '0',
      `ns` int(11) NOT NULL DEFAULT '0',
      KEY `history_1` (`itemid`, `clock`) BLOCK_SIZE 16384 LOCAL
    ) DEFAULT CHARSET = utf8mb4;

    -- After: a composite primary key that also covers ns.
    CREATE TABLE `history` (
      `itemid` bigint(20) unsigned NOT NULL,
      `clock` int(11) NOT NULL DEFAULT '0',
      `value` double NOT NULL DEFAULT '0',
      `ns` int(11) NOT NULL DEFAULT '0',
      PRIMARY KEY (`itemid`, `clock`, `ns`)
    ) DEFAULT CHARSET = utf8mb4;

> Other history tables were optimized similarly.

Table partitioning
---

We partitioned history and trends tables, creating and dropping partitions periodically (see the sketch after this list). This approach posed some challenges:

First, given the huge table size, directly operating on tables would be risky and inefficient. So, we would create a new table and insert data from the source table using an `INSERT /*+ ENABLE_PARALLEL_DML PARALLEL(2) */ … SELECT …` statement, which was made possible by the [parallel DML](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001973796) feature of OceanBase Database. While this worked for smaller tables, it failed for larger, busier tables.

Second, using the dump method for backups required significant downtime.

Third, when using DataX for data synchronization, it was difficult to determine the last synchronization point if the process was interrupted, making it hard to resume synchronization.
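
To show the pattern concretely, here is a hedged sketch of the partition maintenance involved. The `history_part` table name, partition names, and boundaries are illustrative only; in practice the partitions were created and dropped by scheduled jobs, as described later:

    -- A new RANGE-partitioned history table; the partition key must be
    -- part of the primary key, which (itemid, clock, ns) satisfies.
    CREATE TABLE `history_part` (
      `itemid` bigint(20) unsigned NOT NULL,
      `clock` int(11) NOT NULL DEFAULT '0',
      `value` double NOT NULL DEFAULT '0',
      `ns` int(11) NOT NULL DEFAULT '0',
      PRIMARY KEY (`itemid`, `clock`, `ns`)
    ) DEFAULT CHARSET = utf8mb4
    PARTITION BY RANGE (clock) (
      PARTITION p202408 VALUES LESS THAN (UNIX_TIMESTAMP('2024-09-01')),
      PARTITION p202409 VALUES LESS THAN (UNIX_TIMESTAMP('2024-10-01'))
    );

    -- One-off backfill from the source table using parallel DML.
    INSERT /*+ ENABLE_PARALLEL_DML PARALLEL(2) */ INTO history_part
    SELECT * FROM history;

    -- A scheduled job adds next month's partition ahead of time...
    ALTER TABLE history_part ADD PARTITION (
      PARTITION p202410 VALUES LESS THAN (UNIX_TIMESTAMP('2024-11-01'))
    );

    -- ...and drops expired partitions instead of running huge DELETEs.
    ALTER TABLE history_part DROP PARTITION p202408;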

MySQL Parameter Optimization
---------

The most frequent operations in Zabbix were writing new data to the database, converting history data to trends, deleting expired data, and querying monitoring data or generating reports (using Grafana). These operations, especially queries involving sorting, aggregation, and calculations, as well as large INSERT and DELETE transactions, put immense pressure on MySQL and prolonged processing times. The simplest way to speed up processing was to adjust the following MySQL parameters to keep as much data in memory as possible: `innodb_buffer_pool_size`, `query_cache_size`, `tmp_table_size`, `innodb_log_buffer_size`, `sort_buffer_size`, `read_buffer_size`, `join_buffer_size`, and `binlog_cache_size`.

However, we couldn't endlessly add more memory once we hit the physical limits, and large transactions would excessively consume disk I/O resources when, for example, dirty pages were flushed to disk. Besides, scaling up hardware would involve downtime, causing additional costs.

My point is that performance bottlenecks are inevitable as long as we stick to these MySQL and OS optimization and scaling methods. We needed something new.

Seeking a New Solution
------

We needed a new data storage solution that could not only address our architectural bottlenecks and business pain points, but also meet three requirements:

1. The solution must support the syntax, functions, expressions, data types, table partitioning methods, character sets, and collations of MySQL.

2. It must provide hybrid transactional/analytical processing (HTAP) capabilities, with a data engine that efficiently handles both transaction processing (TP) and analytical processing (AP) workloads in a relational architecture.

3. It must provide high availability and multi-active capabilities without relying on additional technologies.

Zabbix Server supports both **MySQL** and **PostgreSQL**. However, on the one hand, our team lacked expertise in **PostgreSQL**, and on the other hand, we had already developed numerous applications and reports based on MySQL databases. Switching the database system to **PostgreSQL** would require redevelopment. Therefore, we ruled out **PostgreSQL** and narrowed our selection down to MySQL-compatible databases.

I mentioned earlier that we used the InnoDB engine. So, we first considered TokuDB, a storage engine positioned as a write-optimized alternative to InnoDB.

We were also interested in emerging databases like **TiDB, Huawei GaussDB, openGauss, OceanBase Database, and Dameng Database**.

* **TiDB** was ruled out due to its lack of support for MySQL stored procedures and foreign keys.
* **openGauss**, based on **PostgreSQL**, was a secondary option.
* **Huawei GaussDB** required paid licenses.
* **Dameng Database** also required paid licenses.

OceanBase Database passed our first screening for its MySQL compatibility, distributed architecture, and open-source availability.

Based on compatibility considerations, **TokuDB** and **OceanBase Database** were shortlisted for our new solution.

| Comparison item | TokuDB | OceanBase Database |
|-----------------|--------|--------------------|
| Deployment | Easy | Easy |
| Disaster recovery architecture | Master-slave | Distributed |
| Data compression ratio | Medium | High |
| Support for domestic products | No | Yes |
| Performance | Write performance is optimized; there is little evidence for read performance. | To be tested |

The data read performance of TokuDB didn't meet our expectations. OceanBase Database, on the other hand, stood out with its support for HTAP, multi-active high availability, and a vibrant community with plenty of resources to help us get started. So, we decided on OceanBase Database.

Deploying OceanBase Database
-----------

Following the official documentation, we began the deployment process.
Along the way, we encountered a few minor hiccups, which were resolved with the help of the official documentation and some advice from the community. I'll share the details in a future post for your reference.

Overall, the migration from MySQL to OceanBase Database was surprisingly smooth. Since we deployed OceanBase Database Community Edition, we couldn't leverage the OceanBase Migration Assessment (OMA) tool to evaluate the migration process beforehand. Instead, we relied on experience and some helpful tips from the community. We performed the following checks before the migration:

* Character set check. OceanBase Database doesn't support all MySQL character sets or collations, so a pre-migration evaluation is crucial.
* OceanBase Database doesn't have an event scheduler. If your original database uses events, you'll need to find alternative solutions (more on this later).
* Unlike some other MySQL-compatible databases, OceanBase Database is case-insensitive by default. Make sure to check the `lower_case_table_names` setting in your original database.
* If you plan to enable reverse synchronization, ensure that the necessary accounts are created and authorized in the original database, and that the `omstxndb` database is ready.
* If your database has foreign key constraints, remember to disable them before migration using the **SET FOREIGN_KEY_CHECKS=0;** statement.

![1727348341](/img/blogs/tech/legacy-monitoring-system/image/1727348343480.png)

For more details on these steps, refer to the [official documentation](https://en.oceanbase.com/docs/community-oms-en-10000000001836496). One particular challenge we faced was with Zabbix's history and trends tables, which needed to be partitioned, with the partitions managed by scheduled tasks; otherwise, the workload would overwhelm administrators. In MySQL, we automated this using events and stored procedures. However, as I said earlier, OceanBase Database doesn't support events. Was this a dead end? No. A window opened in the form of OceanBase Developer Center (ODC), an open-source, enterprise-grade database collaboration platform. ODC's partitioning plan module served as a more advanced alternative to the combination of events and stored procedures in MySQL. I've posted [an article on this](https://open.oceanbase.com/blog/12521093139) if you're interested in the details.

A New Chapter with OceanBase Database
-----

Six months into running OceanBase Database with Zabbix, we've achieved the same performance with significantly fewer hardware resources compared to our previous MySQL-based architecture.

![1727348390](/img/blogs/tech/legacy-monitoring-system/image/1727348392419.png)

Our developers and system administrators didn't resist the migration, given that OceanBase Database is MySQL-compatible. The high reliability and HTAP capabilities of OceanBase Database also made it easier to communicate the migration plan to the operations team. After the migration, the performance improvements and an 80% reduction in storage space left the business team thoroughly impressed.

The performance boost is immediately noticeable. Queries that used to take at least 4 seconds to render historical data (spanning weeks or months) now load almost instantly. Additionally, the frequency of performance alerts from Zabbix has dropped significantly since the migration.

In short, OceanBase Database has met our expectations, and we're continually exploring its new features.
Here's a quick summary of how OceanBase Database aligns with our business needs:

* Parallel processing: For operations involving a large amount of data, we can enable parallel processing to significantly boost efficiency.
* Hot scaling: Thanks to OceanBase Cloud Platform (OCP), we can scale CPU, memory, and other database resources, and modify zones and parameters while the database is running, which minimizes downtime.
* High compression ratio: When importing data to our Zabbix database, we noticed that the size shrank from 1.2 TB in MySQL to just 260 GB in OceanBase Database, an 80% reduction in storage space.
* Inherent scalability and distributed capabilities: OceanBase Database supports both horizontal scaling by adding zones and vertical scaling by increasing resource specifications, with data synchronization handled automatically in the backend. This provides robust data redundancy. We tested an MGR-based architecture, which, while ensuring consistency, was a nightmare to maintain: any node failure risked bringing down the entire cluster.
* Slow query isolation: Slow queries are placed in a separate queue, so they don't interfere with smaller, faster queries or cause congestion.

OceanBase Database provides a suite of tools that has made database management more automated and convenient.

OCP has been a lifesaver, offering a range of practical features:

* Cloud-native design and multitenancy: OCP allows unified management of databases, simplifying their creation and O&M. The multitenancy feature ensures resource isolation between tenants.
* SQL diagnostics: Integrated SQL diagnostics make it easy to monitor top, slow, and parallel SQL statements. We can quickly diagnose and optimize SQL execution on a GUI-based interface.

![1727348486](/img/blogs/tech/legacy-monitoring-system/image/1727348487990.png)

* Backup: OCP natively supports physical and log backups.

![1727348516](/img/blogs/tech/legacy-monitoring-system/image/1727348518029.png)

OceanBase Migration Service (OMS) Community Edition supports real-time data migration between OceanBase Database Community Edition and different types of data sources like MySQL, PostgreSQL, TiDB, Kafka, and RocketMQ. It also supports data migration between MySQL tenants of OceanBase Database Community Edition. Key features of OMS are as follows:

* Streamlines the migration process with automatic operations.
* Supports schema synchronization, data synchronization, incremental synchronization, and reverse synchronization in one task, reducing the workload of O&M engineers.
* Allows synchronization of specific tables and table data based on rules.

ODC is packed with handy features. For example:

* Provides an integrated SQL development environment, which supports SQL auditing and double confirmation for high-risk operations.
* Supports workflows and allows data source access authorization, enhancing collaboration efficiency.
* Supports integration with your enterprise's Active Directory for secure account management. ODC also doubles as a database access management platform, much like a bastion host. The best part? It's free!
* Works with OceanBase Database in MySQL mode, MySQL databases, and Oracle databases, making it a cost-effective tool for database development.
* Supports scheduled tasks:
  * SQL scheduling: Executes SQL statements as scheduled, similar to the MySQL event scheduler.
  * Partitioning plans: Supports partition creation and dropping.
    For more information, see [this article](https://open.oceanbase.com/blog/12521093139).
  * Data archiving: Migrates historical data from one table or database to another based on predefined rules.
  * Data cleanup: Schedules data cleanup tasks based on specific rules.
* Tracks the status of all tasks, facilitating O&M.
* Provides all these features out of the box, saving time and effort.

Looking Ahead
----

Given the features of OceanBase Database and our business scenarios, we're planning to migrate more systems to OceanBase Database. Currently, we're working on migrating our production device data collection system and reporting system.

Similar to Zabbix, the data collection system has the following characteristics:

* Large data volumes. Production device data is collected once every few seconds, leading to high concurrency and a tremendous data size.
* Time-series data. We designed some wide tables with time-series columns, requiring complex queries to transform them into long tables. We've used window functions to compare collected results within the time series. In later versions of OceanBase Database, we can do that by making use of columnar storage.
* Regular archiving and cleanup. This can be handled using the data archiving and cleanup features of ODC. You can enable parallel execution of cleanup tasks to speed things up.
* Table partitioning. Large tables are partitioned for easier management.

The reporting system has the following characteristics:

* It requires HTAP capabilities. OceanBase Database puts slow queries in a separate queue, minimizing the impact of large transactions and queries on the execution of smaller, faster ones.

While OceanBase Database has been a game-changer for us, it's not flawless. Execution plans can be difficult to read, and they sometimes jitter or deteriorate, leading to unstable SQL performance. System logs can be hard to understand. Additionally, the support for the Chinese language could be better: in our obdumper test, the backup of tables with Chinese names failed. The community is working on a fix, and we temporarily resolved the issue by rolling the database back to an older version.

Summary
--

OceanBase Database is a powerful database platform. From trial to production, we've encountered both challenges and surprises. When issues arise, we first try to figure out a solution on our own based on the documentation. If that doesn't work, we can always reach out to the OceanBase community, which has been incredibly responsive, offering timely and accurate support, much better than other open-source communities.

As OceanBase Database continues to evolve, we're excited to explore new features like columnar storage and materialized views in V4.3. The journey may be long, but with OceanBase, we'll eventually reach our destination. \ No newline at end of file diff --git a/docs/blogs/tech/native-distributed.md b/docs/blogs/tech/native-distributed.md new file mode 100644 index 000000000..7a850a979 --- /dev/null +++ b/docs/blogs/tech/native-distributed.md @@ -0,0 +1,181 @@ +--- +slug: native-distributed +title: 'Treat the Symptoms and the Root Cause: OceanBase Native Distributed and Integrated Standalone-Distributed Architecture Solves Sharding at the Root' +--- +

As enterprise data volumes grow, traditional standalone centralized databases struggle to meet increasing demands. Many enterprises turn to MySQL sharding as a solution for scaling data storage and processing.
However, while sharding addresses immediate capacity needs for processing massive data, it introduces significant complexities.

Despite decades of database advancements and the rapid rise of distributed database technologies, the debate between sharding and distributed systems persists, with the latter not yet fully replacing the former. Many enterprises believe that sharding their existing MySQL databases allows them to leverage their current expertise and address immediate scaling needs caused by business growth. Given that sharding can handle current workloads, these enterprises question the necessity of migrating to a distributed database architecture.

**1. Sharding: A Short-term Fix for Growing Pains**
-----------------------

"Premature optimization is the root of all evil." This adage, coined by computer scientist Donald Knuth in his 1974 paper "Structured Programming with go to Statements," is a widely held principle among programmers. Knuth argued that focusing excessively on efficiency and performance optimization too early in the development process is often counterproductive, leading to wasted resources. He advocated prioritizing the actual business requirements during programming and delaying optimization efforts until necessary. Database sharding might appear to embody this principle, offering a seemingly natural progression by incrementally adapting existing systems to accommodate growing data volumes, thereby avoiding a complete architectural overhaul.

Indeed, when sharding adequately addresses immediate scaling needs, a full-fledged distributed system might seem excessive.

However, this approach can ultimately prove detrimental. **While sharding ostensibly arises from the need for distributed management of massive data as business grows, its underlying motivation lies in the rapidly evolving demands of a growing business. Ultimately, the core requirement is a flexible infrastructure capable of supporting continuous business development and iteration.**

Sharding, while designed to address rapid business growth, often hinders the flexibility of upper-layer business iterations. This inflexibility presents two key challenges:

-  Sharding introduces tight coupling with the business logic. Changes to the underlying database schema, such as scaling or restructuring, necessitate corresponding modifications to the business layer. This shifts the burden of database management onto the business layer, preventing enterprises from focusing on business development, and hindering rapid iteration.

-  Sharded architectures introduce significant complexity, increasing O&M overhead. Evolving business requirements frequently demand re-evaluation and redesign of the sharding strategy to maintain performance. However, business requirements vary in different growth stages of an enterprise. The constant adaptation becomes unsustainable in the long run, preventing a consistent and scalable approach to data management throughout the business lifecycle.

In the initial stages of an enterprise, when data volume is low, a single, modestly-specced MySQL server suffices. As the business grows, scaling up the server or sharding the database becomes necessary. However, further expansion brings new challenges: demands for greater scalability, the need for high-availability solutions like primary/standby disaster recovery for critical services, and the ability to efficiently scale down resources as business needs contract.
Each growth phase necessitates significant system overhauls, making sharding a mere stopgap. **Sharding neither addresses the fundamental need for high-performance distributed transactions nor provides the flexibility required to adapt to the evolving demands of a dynamic business environment.** + +In a blog post, Randall Hyde, author of "The Art of Assembly Language" and a seasoned software expert, clarifies that "no premature optimization" shouldn't be misinterpreted as "no optimization." He emphasizes that Donald Knuth's original intent was to prioritize addressing systemic, macro-level performance bottlenecks over getting bogged down in micro-optimizations. Hyde further contends that performance considerations should be integral to the initial design phase. Experienced developers instinctively anticipate potential performance pitfalls, while less experienced developers often overlook this crucial aspect, mistakenly believing that minor adjustments later on can remedy any performance deficiencies. + +So, what kind of database solution can fundamentally address the challenges of sharding at the outset of system design? + +OceanBase Database provides a solution by combining native distributed capabilities with an integrated architecture that supports both standalone and distributed modes. By encapsulating all distributed functionality within the database itself, OceanBase Database allows enterprises to rapidly iterate without needing to manage the underlying complexity. This approach also caters to the evolving needs of enterprises throughout their lifecycle, eliminating the need for frequent and disruptive system overhauls. OceanBase Database empowers enterprises to focus on expanding their core business, offering a database solution that scales seamlessly alongside their growth. + + + +**2. Addressing the Symptoms: A Native Distributed Architecture Simplifies Underlying Complexity and Enables Elastic Scaling for Rapid Business Iteration** +------------------------------------- + +Sharding solutions present several drawbacks. First, managing cross-shard transactions is complex, often compromising data integrity and consistency. Second, database performance is highly dependent on the chosen sharding strategy, leading to increased architectural complexity and O&M overhead. Poorly designed sharding strategies can create hotspots, concentrating read/write operations on specific shards and significantly impacting overall performance. + +Third, cross-shard queries, data aggregation, and report generation introduce significant complexity and performance overhead, often necessitating specialized optimization techniques and middleware. Capacity planning, scaling, and data migration require careful redesign and pose substantial challenges to system stability and data integrity. Finally, sharding often requires application code to directly or indirectly be aware of the underlying sharding logic, increasing coupling between the application and the database, and hindering maintainability and scalability. + +These shortcomings stem from the fact that sharded databases, even with middleware, are fundamentally centralized systems adapted for distributed use. Their core architecture isn't designed for distributed transactions and struggles to meet the demands of a truly distributed environment. + +**In contrast, OceanBase Database's native distributed architecture avoids the middleware-based approach to distributed transactions common in sharded systems. 
Instead, it incorporates distributed principles at its core, from system architecture design to distributed transaction implementation, creating a database truly built for distributed environments.**

OceanBase Database utilizes a partitioned table architecture to achieve horizontal scalability and data management within its native distributed architecture. This fundamentally addresses the complexities associated with traditional sharding approaches, which heavily rely on middleware and intricate partitioning strategies. OceanBase Database's underlying architecture enables data to be distributed across multiple compute nodes (OBServer nodes). Replicas of a partition can reside in different zones, with the Paxos protocol ensuring cross-node data consistency and guaranteeing atomicity, consistency, isolation, and durability (ACID) properties for distributed transactions among replicas. This architecture also supports multi-replica and cross-IDC disaster recovery, providing financial-grade high availability. Furthermore, OceanBase Database simplifies horizontal scaling and capacity adjustments through its internal partitioning mechanism, eliminating the need for users to design and maintain complex sharding strategies, thereby reducing the overall complexity of distributed database design and O&M.

OceanBase Database employs a routing mechanism that shields applications from the underlying logic. Applications interact with the distributed data as if it were a standalone database, without needing to know the physical location of the data. When a client initiates a SQL query, OceanBase Database Proxy (ODP) uses the partitioning key to determine the partition where the requested data resides, and routes the query to an appropriate OBServer node for execution. OceanBase Database transparently handles distributed query execution and transactions, shielding applications from the complexities of sharding. This transparent routing enables applications to remain agnostic to the underlying data distribution. Furthermore, OceanBase Database supports online scaling and data migration, providing flexibility for evolving business needs.

![1726743792](/img/blogs/tech/native-distributed/image/1726743793097.png)

Figure 1: Comparison of a sharding solution with OceanBase Database's native distributed architecture

OceanBase Database's native distributed architecture offers several advantages over sharding solutions, resulting in lower costs and improved performance. Its shared-nothing architecture utilizes commodity hardware, eliminating the need for expensive, high-end storage and proprietary licensing fees often associated with sharded deployments. This significantly reduces both hardware and software costs while maximizing resource utilization. Furthermore, OceanBase Database's distributed query optimizer intelligently schedules and executes plans in distributed environments, minimizing cross-node data transfer and processing, thus ensuring high query performance. Features like table partitioning and local indexes further enhance query efficiency and reduce latency.
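
To make this concrete, here is a minimal sketch in OceanBase Database's MySQL mode of the application-transparent partitioning described above. The table, column, and index names are illustrative assumptions, not taken from any real deployment:

```sql
-- A hash-partitioned table: OceanBase Database distributes the 8
-- partitions (and their Paxos-replicated copies) across OBServer nodes
-- and rebalances them automatically.
CREATE TABLE orders (
  order_id   BIGINT NOT NULL,
  user_id    BIGINT NOT NULL,
  amount     DECIMAL(12, 2),
  created_at DATETIME,
  PRIMARY KEY (user_id, order_id)
) PARTITION BY HASH(user_id) PARTITIONS 8;

-- A local index is built and maintained within each partition.
CREATE INDEX idx_orders_created ON orders (created_at) LOCAL;

-- The application issues ordinary SQL; ODP uses the partitioning key
-- in the WHERE clause to route the query to the right OBServer node.
SELECT order_id, amount
FROM orders
WHERE user_id = 42 AND created_at >= '2024-01-01';
```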

**3. Addressing the Root Cause: An Integrated Architecture for Standalone and Distributed Modes Supports the Entire Lifecycle of Enterprises**
------------------------------

A typical enterprise's database evolution often follows a path driven by increasing scale: starting with small-footprint MySQL deployments for small-scale business, progressing to larger-scale Oracle instances for medium-scale business, then to Oracle RAC for improved scalability in large-scale business, and finally potentially to DB2 on midrange servers for core business.

![1726743852](/img/blogs/tech/native-distributed/image/1726743852693.png)

Figure 2: Database selection strategies based on business scale

This traditional path suffers from several key limitations. First, scalability is inherently constrained, preventing seamless adaptation to fluctuating business demands. Second, each upgrade introduces significant costs and complexity due to extensive hardware and software replacements. Third, this upgrade path is largely irreversible, making it difficult to adapt to unforeseen business changes and potentially leading to substantial sunk costs.

With the release of OceanBase Database V4.0 in 2022 and the introduction of its "integrated architecture for standalone and distributed modes," OceanBase Database began offering a single solution to address the evolving database needs of enterprises across different growth stages and scales. This allows the database to scale seamlessly alongside the business, fulfilling the enterprise need for a database solution that supports their entire lifecycle.

OceanBase Database's integrated architecture for standalone and distributed modes offers two key deployment advantages. First, a single OBServer node can be deployed in either standalone or distributed mode. Second, within a distributed OceanBase cluster, individual tenants can also be deployed in standalone mode. Furthermore, both tenants and the overall cluster can flexibly switch between standalone and distributed deployments as needed.

![1726743883](/img/blogs/tech/native-distributed/image/1726743883873.png)

Figure 3: OceanBase Database's integrated architecture for standalone and distributed modes

As shown in the preceding figure, OceanBase Database's integrated architecture for standalone and distributed modes allows enterprises to flexibly adjust their database deployment at any stage of growth, choosing the model that best suits their current needs.

Initially, enterprises can deploy OceanBase Database on a smaller server. As data grows, they can scale vertically by migrating to a larger server. OceanBase Database supports high availability and disaster recovery through primary/standby and three-replica deployments. For massive data growth, enterprises can seamlessly scale horizontally by expanding the cluster.

While an "integrated architecture for standalone and distributed modes" sounds simple, and a system that handles complex distributed transactions might seem trivially capable of running standalone, the reality is more nuanced. OceanBase Database's extensive experience in distributed transactional processing (TP), culminating in breaking the TPC-C world record, demonstrates its prowess in distributed transactions.
The real challenge lies in seamlessly integrating the standalone and distributed modes:

**(1) How can database performance in small-scale and standalone deployments be optimized to match that of a dedicated standalone database?**

Distributed systems often underperform standalone centralized databases when handling standalone or small-scale transactions, because their log stream design carries a large inherent overhead to ensure the atomicity and durability of distributed transactions. This performance gap can deter adoption, leading some enterprises to opt for costly vertical scaling of existing hardware rather than migrating to a distributed architecture.

Specifically, log streams are fundamental to ensuring atomicity and durability in database transactions. In distributed databases, protocols like two-phase commit (2PC) are employed to achieve atomicity based on log streams, while consensus algorithms like Paxos guarantee durability. These mechanisms introduce larger overhead compared to standalone databases. Furthermore, multi-point writing in a distributed database results in the generation of multiple log streams. The number of log streams affects the number of participants in 2PC and Paxos consensus, thereby impacting the overhead of distributed transactions.

In typical distributed systems, the number of log streams corresponds to the number of data shards. Larger datasets require more shards, leading to increased overhead and performance degradation for distributed transactions. To achieve standalone performance comparable to traditional standalone databases, reducing the number of log streams is crucial.

OceanBase Database addresses this by tying the number of log streams to the number of nodes. In a standalone deployment, OceanBase Database functions much like a traditional standalone database with a single log stream. During distributed scaling, the number of log streams scales with the number of nodes, significantly mitigating the performance penalty of distribution.

This approach allows OceanBase Database to achieve performance comparable to standalone databases in both standalone and small-scale distributed deployments.

![1726744014](/img/blogs/tech/native-distributed/image/1726744014446.png)

Figure 4: Sysbench performance comparison (4C16G):

OceanBase Database Community Edition V4.0 vs. RDS for MySQL 8.0

As shown in the preceding figure, in a 4C16G environment, Sysbench benchmarks show OceanBase Database V4.0 achieving twice the insert and update performance of MySQL 8.0. Performance in other tested operations is comparable. Furthermore, OceanBase Database demonstrates lower storage costs in standalone deployments. In a TPC-H 100GB benchmark, OceanBase Database V4.0's storage cost was only one-quarter that of MySQL.

**(2) How can we seamlessly switch between standalone and distributed deployments without compromising performance?**

Having addressed performance concerns in both standalone and distributed modes, the next challenge for an integrated architecture for standalone and distributed modes is ensuring seamless horizontal scalability without sacrificing performance.

**Beyond the previously mentioned log stream optimization, which reduces overhead during horizontal scaling by adjusting the number of streams based on the number of nodes, OceanBase Database implements several methods for enhancing scaling performance, both manually and automatically.**

To support scalability and multi-point writes, OceanBase Database provides table partitioning, dividing a table's data across multiple partitions. Tables using the same partitioning method can be grouped into a table group. During load balancing, table groups ensure related data resides on the same server, as all tables within a table group are bound to the same log stream. This minimizes cross-server operations, significantly reducing data transfer overhead and improving performance in join-intensive scenarios. In the ideal case, if all tables involved in a transaction belong to the same table group, the transaction becomes a standalone transaction, eliminating distributed transaction overhead.

OceanBase Database also provides automated scheduling to enhance scaling performance. For instance, it automatically aggregates remote procedure calls (RPCs) and, through automatic load balancing, co-locates partitions involved in a distributed transaction to minimize distributed overhead.

![1726744125](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1726744125429.png)

Figure 5: tpmC values of OceanBase Database with different numbers of nodes in a TPC-C benchmark

The preceding figure shows the tpmC value changes of an OceanBase cluster ranging from 3 to 1,500 nodes in a TPC-C benchmark. The results show that OceanBase cluster performance scales linearly with cluster size even with a 10% distributed transaction workload.

**4. Combining Native Distributed Capabilities with an Integrated Architecture for Standalone and Distributed Modes Simplifies Modern Data Architecture Upgrades**
----------------------------------

OceanBase Database's native distributed capabilities and integrated architecture for standalone and distributed modes have been widely adopted to replace sharding solutions and empower enterprises to modernize their data infrastructure.

**💡 Kwai: OceanBase Database Replaces MySQL Sharding to Handle Peak Traffic Loads on a 100+TB Cluster**

Kwai is a short-video mobile application developed by Beijing Kuaishou Technology Co., Ltd. Launched in 2011 as "GIF Kuaishou," a GIF creation and sharing app, it transitioned to a short-video community in 2012 and rebranded as Kwai in 2014. Kuaishou Technology went public on the main board of the Hong Kong Stock Exchange in 2021. By the end of that year, Kwai boasted 308 million daily active users and 544 million monthly active users on average, making it a top short-video platform in China.

Kwai initially relied on MySQL for its database solution. However, as order volumes and business data surged, the performance of its standalone centralized deployment became a bottleneck. Sharding was implemented as a temporary solution to address storage and performance challenges. As the business continued to grow, the number of its MySQL database shards proliferated, eventually exceeding 300. This significantly increased O&M costs and complexity, requiring continuous application modifications to adapt to the ever-increasing number of shards. Kwai recognized that sharding was merely a stopgap measure, not a long-term solution.
They needed a database solution that could deliver the required performance while simplifying O&M.

After evaluating various distributed databases, Kwai ultimately selected OceanBase Database and deployed it in core business scenarios.

Take the transaction verification scenario as an example. E-commerce platforms typically experience a stable daily traffic volume of 80,000 to 90,000 queries per second (QPS). However, during large-scale live streaming events, user traffic surges dramatically, increasing QPS by a factor of ten or even a hundred, reaching millions. Even with compression, the data volume can exceed 100 terabytes. Furthermore, during these live streams, the business is extremely sensitive to latency and system stability. Prior to implementing OceanBase Database, transaction verification relied on a MySQL sharded architecture. This involved splitting large tables into smaller shards and distributing read/write traffic across multiple MySQL instances. The inability of this sharded solution to guarantee cross-shard data consistency and transaction atomicity led to potential data inconsistencies, particularly in complex scenarios or during error conditions. This resulted in inaccurate transaction verification results, including missing refunds, incorrect deduction amounts, and ultimately, financial losses.

With OceanBase Database implemented, upstream services continue to write directly to the MySQL cluster. Simultaneously, each write to an upstream MySQL shard is replicated in real time to OceanBase Database via binlog streaming. During transaction verification, queries against the upstream MySQL cluster trigger identical queries against OceanBase Database. The results from both databases are then compared to ensure the accuracy and consistency of order status across the entire financial system. (A simplified sketch of this kind of cross-check appears at the end of this section.)

The figure below illustrates the performance of online transaction verification. The upper-left chart shows the daily QPS, averaging around 90,000. The upper-right chart shows the query response time, generally remaining below 10 ms. The peak of 10,000 ms (10 seconds) occurs nightly during a full compaction, when a dedicated thread is started to delete a substantial volume of historical data. This elevated latency is acceptable to the business. The lower-left chart shows the daily transaction volume, averaging around 10,000 TPS. The lower-right chart shows the transaction response time, which ranges from 5 ms to 10 ms. OceanBase Database delivers the required response time, satisfying latency requirements, and maintains system stability.

![1726744218](/img/blogs/tech/native-distributed/image/1726744218420.png)

Figure 6: OceanBase Database performance for transaction verification at Kwai

**Now, Kwai has deployed eight OceanBase clusters, managing over 800 TB of data across more than 200 servers, with the largest cluster exceeding 400 TB.** Leveraging OceanBase Database, Kwai has achieved flexible resource scaling, reduced data synchronization latency by 75%, significantly lowered storage costs (equivalent to the hardware costs of 50 servers), and achieved disaster recovery with a recovery point objective (RPO) of 0 and a recovery time objective (RTO) of under 8 seconds. A single OceanBase cluster can replace over 300 MySQL instances, dramatically reducing O&M costs.
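
Kwai's actual verification schema and queries are not disclosed. Purely as an illustrative sketch, with hypothetical table and column names, a cross-check job might run the same aggregation on the sharded MySQL side and on OceanBase Database, then diff the two result sets:

```sql
-- Hypothetical cross-check query, executed against both databases;
-- any mismatch between the two result sets flags inconsistent orders.
SELECT order_status,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS total_amount
FROM orders
WHERE created_at >= '2024-09-01 00:00:00'
GROUP BY order_status;
```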

**💡 iFLYTEK: OceanBase Database's Flexible Scaling and Native Distributed Architecture Empower Rapid Business Iteration**

iFLYTEK (SHE: 002230) is a renowned provider of intelligent speech and artificial intelligence solutions in the Asia-Pacific region. Since its inception, iFLYTEK has maintained a leading position in core technologies such as intelligent speech, natural language understanding, and computer vision.

iFLYTEK previously relied on MySQL for its business database. In 2023, a critical new business application launched with initially low volume but subsequently experienced explosive growth, leading to a rapid increase in data volume and disk usage. This application also required multi-dimensional, real-time report analytics for business decision-making, quickly exceeding MySQL's capacity and highlighting the need for greater scalability.

When evaluating database upgrade options, iFLYTEK compared sharding their existing MySQL deployment with adopting a native distributed database. While iFLYTEK had extensive experience with MySQL and a mature O&M infrastructure, sharding would require significant code changes to their applications and increase O&M overhead. Given the rapid iteration and frequent updates of the new business, including the creation and modification of large tables, coupled with the criticality of this phase, minimizing disruption was paramount. Continuing with MySQL and implementing sharding would have necessitated extensive modifications, adding considerable effort.

After extensive evaluation, iFLYTEK selected OceanBase Database to upgrade its existing database infrastructure. Leveraging OceanBase Database's native distributed architecture, iFLYTEK benefited from its scalability, maintainability, hybrid transactional/analytical processing (HTAP) capabilities, high protocol compatibility, and rapid migration capabilities. This enabled iFLYTEK to effectively support the rapid iteration and deployment of its new business systems.

iFLYTEK conducted tpmC performance tests in a production environment, comparing a three-node OceanBase cluster against both a standalone MySQL database and a sharded MySQL database. The test environment utilized SSDs, 96 CPU cores, and 384 GB of memory. Results showed that MySQL slightly outperformed the OceanBase cluster at concurrency levels below 64. However, the OceanBase cluster demonstrated a significant performance advantage beyond 128 concurrent connections. As concurrency increased further, the performance of the OceanBase cluster continued to scale, while MySQL performance peaked at a concurrency of 256.

![1726744283](/img/blogs/tech/native-distributed/image/1726744284269.png)

Figure 7: tpmC performance comparison in stress tests (96C384G):

OceanBase Database vs. MySQL vs. sharded MySQL database

Furthermore, the most time-consuming queries in the system were identified and compared between MySQL and OceanBase Database. The results showed that OceanBase Database outperformed MySQL by a factor of 7 to 40, depending on the complexity of the SQL queries.

**Deployed at iFLYTEK in 2023, OceanBase Database has ensured stable operations while providing flexible scalability and HTAP capabilities. It has also reduced iFLYTEK's storage costs by 50%.**
Summary** +---------- + +While database sharding has been a popular solution for meeting enterprises' requirements for massive data storage and processing in the short term, it often introduces challenges such as maintaining distributed transaction consistency, decreased query performance, increased complexity, tight coupling with application logic, and limited scalability. Sharding addresses the immediate need for increased capacity but fails to provide a long-term solution for achieving high performance in distributed transactions or adapting to evolving business requirements in different development stages of enterprises. + +OceanBase Database addresses these challenges through its native distributed architecture, guaranteeing ACID properties for distributed transactions while providing seamless scalability and supporting rapid upgrades without business disruption. Its integrated architecture, combining the strengths of standalone and distributed modes, enables it to adapt to evolving business needs throughout an enterprise's lifecycle, eliminating the need for sharding and its associated complexities. \ No newline at end of file diff --git a/docs/blogs/tech/ob-db-transform.md b/docs/blogs/tech/ob-db-transform.md new file mode 100644 index 000000000..d0bc8de52 --- /dev/null +++ b/docs/blogs/tech/ob-db-transform.md @@ -0,0 +1,201 @@ +--- +slug: ob-db-transform +title: 'OceanBase Helps Enterprises Cope with the Challenges of Database Transformation in the Deep Waters' +--- + +On November 16, 2023, OceanBase Database held its 2023 annual product launch in Beijing and officially announced its commitment to an integrated database product strategy for critical business workloads. At the event, Yang Zhifeng, general manager of the product department at OceanBase Database, delivered a keynote speech on helping enterprises navigate the challenges in a critical phase of database transformation. + + +![1701397534](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2023-12/1701397513558.png) + + +**Here is the complete speech.** + +Hello, everyone. My presentation today sits at a crucial juncture. I'll be covering three main areas: the challenges enterprises face as digital transformation enters a critical phase, how OceanBase Database is helping enterprises address these challenges, and how our products are solving practical problems for our customers. I hope to share my perspective, from a product standpoint, on OceanBase Database's journey over the past few years in facilitating core system upgrades, and the work we've done to help our customers navigate this complex landscape and achieve greater success. + + + +**1. Challenges Faced by Enterprises in a Critical Phase of Digital Transformation** +--------------------- + +Let's begin by exploring the challenges enterprises face as core system upgrades enter a critical phase. I'll start by sharing some insights from the CCID Consulting report, "Core Database Upgrade and Selection Guide 2023". + +In the first half of 2023, CCID Consulting conducted in-depth interviews with 160 enterprises. The research revealed several key findings: Firstly, among domestically-produced database products deployed by Chinese enterprises, OceanBase Database ranked first. Secondly, regarding core system upgrades, the surveyed enterprises prioritized stability, compatibility, and code reliability. Based on these criteria, CCID Consulting assessed several database products. 
OceanBase Database achieved the highest matching score of 36.5, positioning it as the top choice for future core system upgrades. Finally, when asked about their future database selection preferences, the surveyed enterprises ranked OceanBase Database as their preferred option for future database deployments. + +![1701397692](/img/blogs/tech/ob-db-transform/image/1701397672185.png) + +Why is OceanBase Database becoming the preferred database for core system upgrades? Its roots in core systems date back to 2014 when it was first deployed to power Alipay's core transaction processing. This high-volume financial environment demanded high concurrency and availability. Forged in this crucible of extreme requirements, OceanBase Database is now expanding to serve a broader range of customers. + +Today, enterprises across diverse sectors are choosing OceanBase Database to underpin their core systems. We've identified several key technical routes driving these adoptions, particularly in core system upgrades: + +### **Route 1: Smooth Migration from Oracle to OceanBase Database, with Zero Application Changes** + +This route epitomizes smooth database upgrades. A large insurance group, for example, migrated over 300 Oracle databases to OceanBase Database in just a year, with new systems going live almost weekly. This rapid migration was a remarkable achievement made possible by close collaboration with the customer's exceptional technical team. OceanBase Database's seamless migration and Oracle compatibility preserved the company's existing software investment, requiring virtually no changes to tens of millions of lines of application code. This approach, which avoids the complexity and risk of simultaneous application and database modifications, proved far more efficient, reliable, and manageable. + +![1701398009](/img/blogs/tech/ob-db-transform/image/1701397989150.png) + +### **Route 2: Migration from DB2 to an Oracle Tenant of OceanBase Database, with Minimal Application Changes to Implement a Single-core Heterogeneous Architecture** + +This route targets customers using DB2, particularly small- and medium-sized banks. Few commercially available products offer seamless migration from DB2. Oracle, offering comparable functionality to DB2 and superior capabilities to many open source alternatives, has been a common replacement. With OceanBase Database's Oracle compatibility mode, customers like Changshu Rural Commercial Bank can migrate their DB2-based midrange server applications to OceanBase Database with minimal code changes, replicating almost all of their existing functionality. Enabling Oracle compatibility features within DB2 can further streamline this process. + +This migration approach allows for long-term two-way synchronization between OceanBase Database and DB2, enabling DB2 to serve as a standby database in the long run. OceanBase Database also supports diverse CPU architectures and in-house hardware within a single database, a capability we call "heterogeneous-hardware support." + +![1701398126](/img/blogs/tech/ob-db-transform/image/1701398106532.png) + +### **Route 3: Cloud + OceanBase Database + LDC-based Architecture** + +This route is suitable for enterprises that want to significantly refactor their applications by using a logical data center (LDC)-based architecture while upgrading their core systems. 
If an enterprise is willing to invest in refactoring its applications, the LDC-based architecture can significantly reduce application failure rates and substantially improve overall application availability, delivering comprehensive benefits beyond the database layer.

The debit card system of a large state-owned bank provides a compelling case study. Originally running on a DB2 mainframe, the bank migrated its application to OceanBase Database and created over 100 isolated tenants in OceanBase Database. Each tenant housed a discrete application unit, mirroring the application's decomposition. This significantly reduced the overall failure rate. While a seamless upgrade was precluded by the legacy DB2 mainframe environment, the bank realized the benefits of this LDC-based approach. Throughout the migration, OceanBase Database delivered the necessary database capabilities, empowering the bank's modernization efforts.

![1701398230](/img/blogs/tech/ob-db-transform/image/1701398210021.png)

### **Route 4: Smooth Migration from MySQL to OceanBase Database**

This route is ideal for enterprises in the Internet and new retail industries. These enterprises, drawn to OceanBase Database's significant cost reduction capabilities, choose it for their core system migrations. For example, GCash consolidated over 240 MySQL databases onto just 16 OceanBase clusters, drastically simplifying the O&M workload of database administrators (DBAs). Furthermore, its data footprint shrank from 5 TB to 500 GB, a 90% reduction, resulting in a 35% overall cost savings. Our analysis indicates that enterprises with MySQL deployments exceeding 32 cores can typically achieve cost savings of over 30% by migrating to OceanBase Database.

![1701398315](/img/blogs/tech/ob-db-transform/image/1701398294996.png)

**2. Enterprises of All Sizes Wade into a Critical Phase of Digital Transformation**
------------------------

In today's market, enterprises of all sizes, including the small- and medium-sized banks we're discussing today, are wading into a critical phase of digital transformation. Based on conversations with numerous customers, we've observed this critical phase generally consists of three stages:

![1701398389](/img/blogs/tech/ob-db-transform/image/1701398369362.png)

Begin with a pilot program focusing on non-critical business modules. Next, implement the upgrade in a core, but not business-critical, module, and expand the scope from simpler peripheral business modules to those of greater complexity. Finally, assuming success in the second phase, proceed with widespread deployment. Throughout this process, organizations of all sizes, from small enterprises to large corporations, are likely to face several common challenges:

(1) After expanding from peripheral to core systems, it becomes clear that database requirements differ. Core systems often demand robust analytical processing (AP) capabilities beyond simple key-value (KV) stores.

(2) As deployments scale, cost becomes a primary concern for small- and medium-sized enterprises. While large organizations may be less cost-sensitive, they prioritize efficient replication, guaranteed performance and stability, and demonstrable return on investment (ROI).

Over the past year, many OceanBase Database customers have progressed to the third phase. To address their evolving needs, we've focused on two key areas of product iteration, driving the evolution of OceanBase Database along the following two trends.

### **Trend 1: Enhanced Compatibility**

- #### **Continuously Improve Compatibility and Minimize Application Migration Costs**

OceanBase Database uniquely supports both MySQL and Oracle compatibility within a single cluster. Over the past three years, OceanBase Database has significantly matured its compatibility features, going beyond basic functionality and addressing nuanced details often overlooked. For instance, OceanBase Database supports the GB 18030-2022 character set, the latest mandatory national standard, in both MySQL and Oracle modes. Notably, OceanBase Database delivered Oracle-compatible support for GB 18030-2022 even before Oracle itself.

![1701398592](/img/blogs/tech/ob-db-transform/image/1701398571836.png)

OceanBase Database V4.0 introduces significantly enhanced DDL support. Leveraging our offline DDL framework, OceanBase Database now supports the full spectrum of DDL operations. Previously, if a table attribute such as the partitioning key was designed incorrectly, DBAs had to export and re-import the data to change it. Now, they can easily modify even complex table schema attributes like partitioning keys with a single DDL statement.

OceanBase Database will continue enhancing its MySQL compatibility, aiming to become a superior MySQL alternative. This provides customers with flexibility in choosing their preferred compatibility mode. For those who prefer MySQL mode, we are committed to bringing features from both our Oracle mode and years of OceanBase Database development into the MySQL experience. This effectively creates an "enhanced MySQL." For example, we've introduced DBLink functionality in MySQL mode, a feature familiar to Oracle users, further bridging the gap between the two ecosystems.

Other features, such as SQL Plan Management (SPM), are now also available in MySQL mode. Previously OceanBase Database-specific performance views, along with other capabilities, are now provided in both MySQL and Oracle modes. This is a natural progression, aiming to help our customers complete smooth application migration as they enter a critical phase of digital transformation. By offering feature parity, we simplify application migration and reduce the need for extensive adaptation and rewriting, which becomes increasingly challenging as the number of migrated applications grows.

- #### **Core Systems Require Databases to Handle Both Complex Transactions and Analytics**

This requirement for hybrid transactional/analytical processing (HTAP) often presents a compatibility challenge. While achieving syntactic compatibility with open source databases can be relatively straightforward, migrating core systems to these alternatives often reveals performance limitations, despite functional equivalence. Legacy databases like Oracle and DB2, initially designed without a strict separation between transactional processing (TP) and AP workloads, have fostered applications that rely on this unified HTAP capability at the core system level.

OceanBase Database V4.0 significantly enhances AP capabilities compared to version 3.0. This is evidenced by a 3.4x performance improvement in the TPC-DS benchmark, representing complex analytical workloads, and a 6x improvement in the TPC-H benchmark. Data import performance for both AP scenarios and core system migrations also sees a 6x increase.

Furthermore, OceanBase Database V4.0 provides robust data integration capabilities in both MySQL and Oracle modes.
Data can be integrated via DBLink, enabling queries against other databases directly within the OceanBase Database engine, or through external tables, facilitating access to data stored in files. Both features are fully supported in the latest release of OceanBase Database.

![1701398809](/img/blogs/tech/ob-db-transform/image/1701398788515.png)

OceanBase Database has significantly enhanced its resource isolation capabilities for mixed HTAP workloads. The latest version offers improved resource isolation through cgroups, managing CPU, memory, and IOPS, and introduces fine-grained resource isolation at the SQL statement level.

Within a database, OceanBase Database can distinguish between batch processing applications and interactive applications. Dedicated resource groups can be assigned to batch processing applications, providing enhanced control. OceanBase Database supports resource grouping based on users, and its latest version allows binding specific SQL statements to resource groups, limiting their CPU and IOPS consumption. This granular control effectively addresses the challenges of managing mixed workloads in core systems.

- #### **Continuously Refine the Migration Strategy to Evolve from Data Migration to Architecture Integration**

When we enhance kernel functionality, roughly 50% of the development effort goes into ensuring seamless migration during database upgrades. OceanBase Database, refined over a decade of real-world deployment in core systems, incorporates this expertise directly into its product and services. Through extensive customer engagements and practical experience, OceanBase Database has developed a comprehensive methodology for this critical process.

![1701398892](/img/blogs/tech/ob-db-transform/image/1701398871970.png)

Customers can use OceanBase Migration Assessment (OMA) in their existing environments to generate a comprehensive assessment report before migrating data, even without deploying OceanBase Database. This report details compatibility, identifies SQL statements requiring rewriting, and provides intelligent diagnostic recommendations, including table partitioning strategies for optimal performance.

Once ready to migrate, OceanBase Migration Service (OMS) facilitates seamless data transfer from Oracle, MySQL, PostgreSQL, DB2, or HBase to OceanBase Database. For example, a large insurance group, as mentioned above, leveraged OMS to migrate hundreds of terabytes of data from over 300 systems, showcasing OMS's proven migration capabilities.

After applications are switched over to the new OceanBase Database, OMS establishes a reverse synchronization link to the source database, enabling parallel operation for an extended period. Core systems, which often serve as intermediary data hubs, also need to synchronize data to downstream systems. To this end, OMS also supports data subscription.

Currently, OceanBase Database supports data synchronization with various cloud services like AnalyticDB for MySQL (ADB), Hologres, and MaxCompute through built-in integrations. Users can also write a program that subscribes to messages through Kafka for downstream data warehouse synchronization. Unlike Oracle, MySQL benefits from a rich ecosystem of tools. Therefore, rather than building bespoke integrations for each downstream target, OceanBase Database natively supports the MySQL binlog protocol.
This allows existing MySQL data replication and synchronization tools to seamlessly integrate with OceanBase Database, fostering compatibility within the MySQL ecosystem. + +### **Trend 2: Enhanced Stability** + +Database users always prioritize stability and reliability. Over the past year, OceanBase Database has dedicated significant effort to enhancing stability. For example, instead of solely focusing on recovery time objective (RTO) and recovery point objective (RPO), we've prioritized ensuring the continuous operation of databases and applications, even in the most demanding scenarios. While this sounds straightforward, achieving such resilience requires meticulous attention to every technical detail. + +![1701399063](/img/blogs/tech/ob-db-transform/image/1701399043319.png) + +- #### **Enhanced Stability Ensures Business Continuity under More Demanding Conditions** + +Firstly, OceanBase Database V4.0 redefines high availability with an RPO of 0 and an RTO of under 8 seconds. This significant improvement from a 30-second RTO was achieved through meticulous refinements, such as replacing the previous polling mechanism with cluster-wide broadcasts upon node failure. This ensures prompt notification of primary database switchovers to frontend applications. + +Secondly, cross-region deployment significantly reduces network bandwidth consumption. While network bandwidth wasn't a major concern for us internally during Alipay's demanding database upgrades, we recognize that it can be a significant constraint for many external customers, particularly those with cross-region deployment. Over the past two to three years, OceanBase Database has prioritized addressing this challenge. OceanBase Database V4.0 reduces cross-replica network bandwidth consumption by 30% to 40%, with TPC-C workloads demonstrating a 30% reduction in storage bandwidth. + +Thirdly, OceanBase Database offers a variety of flexible disaster recovery modes and comprehensive security enhancements. Regarding stability and reliability, I want to revisit Paxos, a fundamental yet crucial topic. While Paxos's three-replica architecture provides high availability, it also offers a significant, often overlooked benefit: tolerance to network jitter. Discussions about high availability must consider failure types. Clean failures, such as IDC outages, network disconnections, or fiber cuts, are relatively straightforward for distributed systems to handle. However, real-world scenarios often involve transient network issues like jitter. OceanBase Database, with its Paxos-based multi-replica architecture, is uniquely positioned to tolerate such disruptions. + + + +- #### **Arbitration Service: Automatic Leader Election Improves Automatic Zone-disaster Recovery** + +With the release of OceanBase Database V4.0, we've been highlighting a key new feature: the arbitration service. I'd like to take this opportunity to share how this feature can be utilized to enhance system stability. We've identified two key scenarios where the arbitration service can significantly improve the robustness of OceanBase Database deployments. + +![1701399233](/img/blogs/tech/ob-db-transform/image/1701399212565.png) + +The leftmost deployment in the preceding figure illustrates the traditional primary/standby database configuration. In this mode, the primary and secondary databases store two copies of data and require two sets of computing resources. Therefore, two sets of server resources are deployed. 
In addition, one bandwidth plan is required between the primary and standby databases, and all transactions write to a single copy of the data. In this scenario, a failure can lead to data loss; that is, the RPO is greater than 0.

OceanBase Database enhances the disaster recovery capabilities of this deployment with a three-node high availability solution. Previous versions of OceanBase Database referred to this as a "log replica." By employing a majority-vote mechanism across the three nodes, data loss is prevented even if a minority of nodes fail. This "data loss" doesn't refer to the loss of uncommitted transactions, but rather the potential loss of committed data that would occur in a traditional primary/standby failover. OceanBase Database introduced a unique design in earlier versions where, despite maintaining three log copies, only two copies of the data were stored, as shown in the middle of the preceding figure.

This approach, however, presents a bandwidth challenge. Data effectively occupies twice the network capacity, which can impact stability, especially in bandwidth-constrained environments.

To address this, we've introduced an arbitration service in OceanBase Database. Think of it as an enhanced version of the log replica, marked in green in the figure. When a transaction is committed, OceanBase Database does not write the transaction log content to the arbitration service, meaning user data is never synchronized to the arbitration replica. However, Paxos protocol messages are still exchanged with the arbitration replica, allowing the arbitration service to participate in distributed leader elections based on Paxos. This ensures strong consistency (RPO = 0) and high availability at a lower cost by requiring only one bandwidth plan between the leader and followers. This reduction in bandwidth not only improves cost-efficiency but, crucially, enhances stability.

- #### **Arbitration Service: Reduce Cross-region Bandwidth Consumption to Enhance the Stability of Three IDCs across Two Regions**

The arbitration service also enhances the stability of three IDCs across two regions by reducing cross-region bandwidth consumption.

![1701399577](/img/blogs/tech/ob-db-transform/image/1701399556978.png)

Consider a traditional two-region, three-IDC deployment, as illustrated in the leftmost part of the preceding figure. If the primary region fails, switching to a standby database in the standby region inevitably results in data loss, the extent of which is unpredictable. This poses a significant challenge.

With OceanBase Database's two-region, three-IDC, five-replica deployment, a standby IDC automatically takes over if the primary IDC fails, ensuring no data loss and eliminating the need for data correction. However, while this maintains business continuity, it can lead to performance degradation. The three remaining replicas, including the remote replica, must form a majority, so every write has to cross regions, impacting overall stability.

Introducing the arbitration service addresses this issue. OceanBase Database converts the remote replicas into arbitration nodes. Upon primary IDC failure, the secondary IDC, with the participation of the arbitration replicas, transitions from a five-replica to a three-replica configuration. This facilitates rapid recovery without performance degradation.

**3. Empower Customers with Practical Solutions**
-----------------------

### **(1) Comprehensive Management Tools for Uninterrupted Core System Availability**

Beyond the features discussed, we'll now address a key concern for DBAs: the management tools OceanBase Database provides to help customers tackle real-world challenges.

![1701399780](/img/blogs/tech/ob-db-transform/image/1701399759969.png)

First, planned O&M operations. These include scaling and rolling upgrades. For instance, if a server malfunctions, OceanBase Database can automatically replace it using its management platform, OceanBase Cloud Platform (OCP). These capabilities have been integral to OceanBase Database for a considerable time.

Second, automated failure handling. Before adopting OceanBase Database's distributed architecture, DBAs often had to handle failures, such as primary or standby database outages, during off-hours, sometimes with response times as tight as 10 minutes. OceanBase Database's automated failover capabilities have significantly reduced this burden. In the vast majority of failure scenarios, DBA intervention is unnecessary. Even in less common outages, the system typically recovers automatically within 8 seconds. This demonstrates how distributed technology drastically improves disaster recovery capabilities.

How does OceanBase Database, as a native distributed database, differ from databases that implement distributed transactions on top of sharding? Consider the classic two-phase commit problem: if the coordinator fails, distributed transactions can become blocked, as dictated by the protocol itself. OceanBase Database addresses this issue fundamentally. Because OceanBase Database participants are highly available, maintained by a three-replica architecture, they do not lose their state. This inherent high availability allows OceanBase Database to effectively eliminate a phase from the traditional two-phase commit process. This not only improves performance but also prevents suspended transactions, significantly reducing the operational burden on DBAs.

Third, emergency response. OceanBase Database provides built-in emergency measures that can be manually triggered at the database kernel level. These include operations like follower-to-leader switchover and SQL throttling, such as limiting QPS per statement. While the effective use of these features relies on DBA experience, OceanBase Database offers a new product, OceanBase Autonomy Service (OAS), to simplify this process. OAS encapsulates the best practices gleaned from numerous core system upgrades, automating and streamlining many emergency procedures.

### **(2) OAS: Ensure Core System Stability**

OAS is an autonomous service provided by OceanBase Database. It leverages data collection and analysis, combined with expert knowledge. Honed through extensive experience with numerous customer core systems, OAS incorporates best practices and solutions derived from real-world DBA operational management and customer scenarios. These accumulated experiences are then formalized and integrated into the OAS product. OceanBase Database currently offers two ways to access these features.

First, users of the latest OCP version can find them within the new Diagnostics Center, which includes updated functionality for resource management, monitoring and alerting, backup and restore, and session diagnostics.
Second, for users of earlier OCP versions, OceanBase Database provides a standalone package containing these features, eliminating the need to upgrade to the latest OCP version. + +Internally, OCP monitors for anomalous events and rule violations. When detected, these trigger automated operational responses and self-healing actions. OceanBase Database has also incorporated new features for capacity planning and real-time SQL diagnostics. + +![1701399870](/img/blogs/tech/ob-db-transform/image/1701399849562.png) + +"Passion makes the years fly by" is a favorite motto of our founder, Yang Zhenkun, and one that resonates deeply with me. Thirteen years ago, we began as a small internal project with the simple goal of using technology to simplify the management and use of massive datasets. With the help and support of thousands of customers, OceanBase Database has evolved from version 1.x to 4.x, progressing from a native distributed architecture to an integrated architecture supporting both standalone and distributed deployments. Looking ahead, we are committed to partnering with even more customers to build truly practical and user-friendly solutions for critical business workloads, providing a robust database foundation for upgrading core systems. + +Let me share a brief story. During the development of OceanBase Database V1.0, even with numerous ongoing tasks, our founder, Yang, insisted we pause everything and spend over a month refactoring our entire codebase to meticulously check every C++ return value. This commitment to quality is reflected in our open source code today. We encourage you to explore the OceanBase Database code and report any unchecked return values as bugs. This dedication to building a robust and reliable product is a responsibility we take seriously, both for the long-term success of OceanBase Database and for our customers. That's all my presentation. Thanks for your attention. \ No newline at end of file diff --git a/docs/blogs/tech/pushdown-tech.md b/docs/blogs/tech/pushdown-tech.md new file mode 100644 index 000000000..f4dc1ecfc --- /dev/null +++ b/docs/blogs/tech/pushdown-tech.md @@ -0,0 +1,288 @@ +--- +slug: pushdown-tech +title: 'Distributed Pushdown Techniques in OceanBase Database' +--- + +# Distributed Pushdown Techniques in OceanBase Database + +> I have been studying the book "An Interpretation of OceanBase Database Source Code" and noticed that it contains very little content about the SQL executor. Therefore, I want to write some blog posts about the SQL executor as a supplement to this book. In my previous post [Adaptive Techniques in the OceanBase Database Execution Engine](https://open.oceanbase.com/blog/5250647552), I introduced some representative adaptive techniques in the executor, based on the assumption that you have a basic understanding of the two-phase pushdown technique for HASH GROUP BY. If you are unfamiliar with the multi-phase pushdown technique of the executor, you are welcome to read this post to learn about common adaptive distributed pushdown techniques in OceanBase Database. + +What Is Distributed Pushdown? +======== + +To better utilize parallel execution capabilities and reduce CPU and network overheads during distributed execution, the optimizer often pushes down some operators to lower-layer compute nodes when it generates execution plans. This is to make full use of the computing resources of the cluster to improve the execution efficiency. 
Next, I'm going to introduce the most common distributed pushdown techniques in OceanBase Database. + +LIMIT Pushdown +======== + +Let me first talk about the pushdown of the LIMIT operator. The following are two SQL statements for creating a table named orders and reading 100 rows from the orders table, respectively: +``` + CREATE TABLE `orders` ( + `o_orderkey` bigint(20) NOT NULL, + `o_custkey` bigint(20) NOT NULL, + `o_orderdate` date NOT NULL, + PRIMARY KEY (`o_orderkey`, `o_orderdate`, `o_custkey`), + KEY `o_orderkey` (`o_orderkey`) LOCAL BLOCK_SIZE 16384 + ) partition by range columns(o_orderdate) + subpartition by hash(o_custkey) subpartitions 64 + (partition ord1 values less than ('1992-01-01'), + partition ord2 values less than ('1992-02-01'), + partition ord3 values less than ('1992-03-01'), + partition ord77 values less than ('1998-05-01'), + partition ord78 values less than ('1998-06-01'), + partition ord79 values less than ('1998-07-01'), + partition ord80 values less than ('1998-08-01'), + partition ord81 values less than (MAXVALUE)); + + select * from orders limit 100; +``` + + +The following plan shows a very common scenario of distributed pushdown: +``` + explain select * from orders limit 100; + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Query Plan | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ================================================================= | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | ----------------------------------------------------------------- | + | |0 |LIMIT | |1 |2794 | | + | |1 |└─PX COORDINATOR | |1 |2794 | | + | |2 | └─EXCHANGE OUT DISTR |:EX10000|1 |2793 | | + | |3 | └─LIMIT | |1 |2792 | | + | |4 | └─PX PARTITION ITERATOR| |1 |2792 | | + | |5 | └─TABLE FULL SCAN |orders |1 |2792 | | + | ================================================================= | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | limit(100), offset(nil) | + | 1 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | 2 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | dop=1 | + | 3 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | limit(100), offset(nil) | + | 4 - output([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), filter(nil) | + | force partition granule | + | 5 - output([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), filter(nil) | + | access([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), partitions(p0sp[0-63], p1sp[0-63], p2sp[0-63], p3sp[0-63], p4sp[0-63], p5sp[0-63], | + | p6sp[0-63], p7sp[0-63]) | + | limit(100), offset(nil), is_index_back=false, is_global_index=false, | + | range_key([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), range(MIN,MIN,MIN ; MAX,MAX,MAX)always true | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +You can see that Operators 0 and 3 in the plan are both LIMIT operators. 
Operator 0 is pushed down to generate Operator 3, reducing the number of rows scanned by Operator 5, a TABLE SCAN operator, from each partition of the orders table. Each thread of the TABLE SCAN operator scans at most 100 rows. This reduces the overhead in data scan by the TABLE SCAN operator and the network overhead in sending data to Operator 1 for aggregation. At present, in OceanBase Database, an EXCHANGE operator will send a packet after it receives 64 KB data from a lower-layer operator. If a LIMIT operator is not pushed down, massive data may be scanned, leading to a high network overhead. + + + +In actual business scenarios, a LIMIT operator is usually used in combination with the ORDER BY keyword. If the ORDER BY keyword is used in the preceding example, a TOP-N SORT operator, which has much higher performance than a SORT operator, will be generated in the plan. + +``` + explain select * from orders order by o_orderdate limit 100; + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Query Plan | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ================================================================= | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | ----------------------------------------------------------------- | + | |0 |LIMIT | |1 |2794 | | + | |1 |└─PX COORDINATOR MERGE SORT | |1 |2794 | | + | |2 | └─EXCHANGE OUT DISTR |:EX10000|1 |2793 | | + | |3 | └─TOP-N SORT | |1 |2792 | | + | |4 | └─PX PARTITION ITERATOR| |1 |2792 | | + | |5 | └─TABLE FULL SCAN |orders |1 |2792 | | + | ================================================================= | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | limit(100), offset(nil) | + | 1 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | sort_keys([orders.o_orderdate, ASC]) | + | 2 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | dop=1 | + | 3 - output([orders.o_orderkey], [orders.o_custkey], [orders.o_orderdate]), filter(nil) | + | sort_keys([orders.o_orderdate, ASC]), topn(100) | + | 4 - output([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), filter(nil) | + | force partition granule | + | 5 - output([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), filter(nil) | + | access([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), partitions(p0sp[0-63], p1sp[0-63], p2sp[0-63], p3sp[0-63], p4sp[0-63], p5sp[0-63], | + | p6sp[0-63], p7sp[0-63]) | + | is_index_back=false, is_global_index=false, | + | range_key([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), range(MIN,MIN,MIN ; MAX,MAX,MAX)always true | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +If the LIMIT operator is not pushed down, Operator 3 will be a SORT operator. In this case, each thread needs to sort and send all the scanned data to the upper-layer data flow operation (DFO). A DFO is a sub-plan. Adjacent DFOs are separated with an EXCHANGE operator. 
For more information, see [Schedule distributed execution plans](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001973795). + +The purpose of pushing down the LIMIT operator is to end execution in advance to reduce calculation and network overheads. + + + +AGGREGATION Pushdown +============== + +Let me take the following statement that contains the GROUP BY clause as an example to describe distributed pushdown in aggregation. + + select count(o_totalprice), sum(o_totalprice) from orders group by o_orderdate; + + + +This SQL statement queries the daily order count and sales amount. If you want to execute the statement in parallel, the most straightforward approach would be to distribute data in the table based on the hash values of the GROUP BY column (`o_orderdate`). This way, all rows with the same `o_orderdate` value are sent to the same thread. The threads can aggregate received data in parallel. + +However, this plan requires a shuffle of all data in the table, which may lead to a very high network overhead. Moreover, if data skew occurs in the table, for example, a large number of orders were placed on a specific day, the workload of the thread responsible for processing orders of this day will be much heavier than that of other threads. This long-tail task may directly lead to a long execution time for the query. + +To address these issues, the GROUP BY operator is pushed down to generate the following plan: +``` + explain select count(o_totalprice), sum(o_totalprice) from orders group by o_orderdate; + +-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Query Plan | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ===================================================================== | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | --------------------------------------------------------------------- | + | |0 |PX COORDINATOR | |1 |2796 | | + | |1 |└─EXCHANGE OUT DISTR |:EX10001|1 |2795 | | + | |2 | └─HASH GROUP BY | |1 |2795 | | + | |3 | └─EXCHANGE IN DISTR | |1 |2794 | | + | |4 | └─EXCHANGE OUT DISTR (HASH)|:EX10000|1 |2794 | | + | |5 | └─HASH GROUP BY | |1 |2793 | | + | |6 | └─PX PARTITION ITERATOR| |1 |2792 | | + | |7 | └─TABLE FULL SCAN |orders |1 |2792 | | + | ===================================================================== | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([INTERNAL_FUNCTION(T_FUN_COUNT_SUM(T_FUN_COUNT(orders.o_totalprice)), T_FUN_SUM(T_FUN_SUM(orders.o_totalprice)))]), filter(nil) | + | 1 - output([INTERNAL_FUNCTION(T_FUN_COUNT_SUM(T_FUN_COUNT(orders.o_totalprice)), T_FUN_SUM(T_FUN_SUM(orders.o_totalprice)))]), filter(nil) | + | dop=1 | + | 2 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(orders.o_totalprice))], [T_FUN_SUM(T_FUN_SUM(orders.o_totalprice))]), filter(nil) | + | group([orders.o_orderdate]), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(orders.o_totalprice))], [T_FUN_SUM(T_FUN_SUM(orders.o_totalprice))]) | + | 3 - output([orders.o_orderdate], [T_FUN_COUNT(orders.o_totalprice)], [T_FUN_SUM(orders.o_totalprice)]), filter(nil) | + | 4 - output([orders.o_orderdate], [T_FUN_COUNT(orders.o_totalprice)], [T_FUN_SUM(orders.o_totalprice)]), filter(nil) | + | (#keys=1, [orders.o_orderdate]), dop=1 | + | 5 - output([orders.o_orderdate], [T_FUN_COUNT(orders.o_totalprice)], 
[T_FUN_SUM(orders.o_totalprice)]), filter(nil) |
  |       group([orders.o_orderdate]), agg_func([T_FUN_COUNT(orders.o_totalprice)], [T_FUN_SUM(orders.o_totalprice)]) |
  |   6 - output([orders.o_orderdate], [orders.o_totalprice]), filter(nil) |
  |       force partition granule |
  |   7 - output([orders.o_orderdate], [orders.o_totalprice]), filter(nil) |
  |       access([orders.o_orderdate], [orders.o_totalprice]), partitions(p0sp[0-63], p1sp[0-63], p2sp[0-63], p3sp[0-63], p4sp[0-63], p5sp[0-63], p6sp[0-63], |
  |       p7sp[0-63]) |
  |       is_index_back=false, is_global_index=false, |
  |       range_key([orders.o_orderkey], [orders.o_orderdate], [orders.o_custkey]), range(MIN,MIN,MIN ; MAX,MAX,MAX)always true |
  +-----------------------------------------------------------------------------------------------------------------------------------------------------------+
```

In this plan, each thread will pre-aggregate the data it reads before distributing the data. The pre-aggregation job is done by Operator 5, a GROUP BY operator. Then, Operator 5 will send the aggregation results to its upper-layer operator. Operator 2, another GROUP BY operator, will aggregate the received data again. After Operator 5 pre-aggregates the data, the data amount decreases remarkably. This reduces the network overhead caused by data shuffle and lessens the impact of data skew on the execution time.

Then, let me demonstrate the execution process of the preceding SQL statement.
```
 select count(o_totalprice), sum(o_totalprice) from orders group by o_orderdate;
```

The original data comprises seven rows. The amount of each order is CNY 10. The orders were placed on June 1, June 2, and June 3.

![](https://intranetproxy.alipay.com/skylark/lark/0/2023/png/132317/1692171521743-ab26c6af-8238-494e-8490-ffa519e1eaa6.png)

The following figure shows the execution process, where the degree of parallelism (DOP) is set to 2.

![](https://intranetproxy.alipay.com/skylark/lark/0/2023/png/132317/1692171612573-508f9122-0df1-49d2-8ded-8f74f07284a2.png)

The first thread in the upper-left corner scans three rows, and the second thread in the lower-left corner scans four rows. Data with the same date, that is, data in the same group, is marked with the same color.

The first thread aggregates the three rows it scans, which are distributed in two groups. The dates of two rows are June 1. Therefore, for June 1, the order count is 2 and the sales amount is 20. The date of one row is June 3. Therefore, for June 3, the order count is 1 and the sales amount is 10. The four rows scanned by the second thread are also distributed in two groups. Two rows are generated after aggregation. This part of the job is completed by Operator 5 in the plan.

Then, the two threads distribute the data based on the hash values of the `o_orderdate` column. Data with the same date is sent to the same thread. This part of the job is completed by Operators 3 and 4 in the plan.

Each thread on the right side will aggregate the received data again. The two rows of June 3 scanned by the two threads on the left side, which are marked in red, are sent to the thread in the lower-right corner. The two rows are aggregated again by the operator on the right side. After aggregation, the order count is 2 and the sales amount is 20. The two rows are finally aggregated into one row. This part of the job is completed by Operator 2 in the plan.

Then, all data is sent to the coordinator, which will summarize the data and send the final calculation results to the client.
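
If you want to reproduce this walkthrough yourself, the following is a minimal sketch with hypothetical demo rows (the row values and the column type are my assumptions, not from the original benchmark). Note that the `orders` table as created earlier in this post omits the `o_totalprice` column that the aggregation examples reference, so the sketch adds it first:

```
 -- The CREATE TABLE above omits o_totalprice, so add it first (assumed type).
 alter table orders add column o_totalprice decimal(10,2);

 -- Hypothetical demo data matching the figures: 2 orders on June 1,
 -- 3 on June 2, and 2 on June 3, each worth CNY 10.
 insert into orders (o_orderkey, o_custkey, o_orderdate, o_totalprice) values
   (1, 101, '1995-06-01', 10), (2, 102, '1995-06-01', 10),
   (3, 103, '1995-06-02', 10), (4, 104, '1995-06-02', 10),
   (5, 105, '1995-06-02', 10),
   (6, 106, '1995-06-03', 10), (7, 107, '1995-06-03', 10);

 -- With DOP = 2, each scan thread pre-aggregates locally (Operator 5), and
 -- the re-aggregation after the shuffle (Operator 2) yields, per date:
 -- June 1 -> (2, 20), June 2 -> (3, 30), June 3 -> (2, 20).
 -- o_orderdate is added to the select list only to label the groups.
 select /*+ parallel(2) */ o_orderdate, count(o_totalprice), sum(o_totalprice)
 from orders
 group by o_orderdate;
```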

JOIN FILTER Pushdown
==============

In a join, a join filter built from the left-side table can be pushed down to the right-side table to perform pre-filtering and partition pruning on the right-side data.

Pre-filtering
----

When a hash join is executed, the data in the left-side table is always read first to build a hash table. Then, the data in the right-side table is used to probe the hash table, and rows that hit the hash table are sent to the upper-layer operator. If a reshuffle is performed on the data in the right-side table of the hash join, the network overhead may be high, depending on the data volume of the right-side table. In this case, join filters can be used to reduce the network overhead caused by data shuffle.

Let me take the plan shown in the following figure as an example.

![](https://intranetproxy.alipay.com/skylark/lark/0/2023/png/132317/1692168493602-7a9b2897-7671-40a9-b307-fa163a1ac803.png)

In the plan, Operator 2, a HASH JOIN operator, reads data from the left-side table. During the read, it will use the t1.c1 join key to create a join filter, which is done by Operator 3, a JOIN FILTER CREATE operator. The most common form of a join filter is a Bloom filter. After the join filter is created, it is sent to the right-side DFO, which contains Operator 6 and other lower-layer operators.

Operator 10, a TABLE SCAN operator, has a filter sys\_op\_bloom\_filter(t2.c1), which tests each t2.c1 value in the right-side table against the Bloom filter. If a t2.c1 value cannot match any t1.c1 value, the corresponding row in the t2 table is filtered out in advance and does not need to be sent to the HASH JOIN operator.

Partition pruning
----

Join filters can be used not only for row filtering but also for partition pruning (or filtering). Assume that t1 is a partitioned table and the join key is also its partitioning key. A plan shown in the following figure can be generated.

![](https://intranetproxy.alipay.com/skylark/lark/0/2023/png/132317/1692168758156-8ebf1d39-ab7e-462a-a97d-14eb7ef1671e.png)

In this plan, Operator 3 is a PARTITION JOIN FILTER CREATE operator. It will detect the partitioning method of the right-side t1 table of the hash join. When it obtains a row in the left-side table from the lower-layer operator, it will use the c1 value to calculate the partition to which this row belongs in the right-side t1 table, and record the partition ID in the join filter. The join filter containing the partition IDs will then be used by Operator 8 to prune partitions of the right-side table of the hash join. When the table scan operator scans each partition in the right-side table, it will verify whether the partition ID exists in the join filter. If not, it can skip the entire partition.

A join filter can be used for data pre-filtering and partition pruning, thereby reducing the overheads in data scan, network transmission, and hash table probes. OceanBase Database supports only Bloom filters in versions earlier than V4.2. Starting from V4.2, it also supports two new types of join filters: the In filter and the Range filter. The new join filters can significantly improve performance in some scenarios, especially when the left-side table contains only a few distinct values or contains continuous values.
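
If you want to observe a join filter yourself, the following is a rough sketch under assumed table definitions (the t1 and t2 tables here are hypothetical, loosely mirroring the names used in the figures above):

```
 create table t1 (c1 bigint primary key, c2 bigint) partition by hash(c1) partitions 4;
 create table t2 (c1 bigint primary key, c2 bigint) partition by hash(c1) partitions 4;

 -- With parallel execution enabled, the optimizer may build a join filter on
 -- t1.c1 and push it down to the scan of t2. In the EXPLAIN output, look for
 -- a JOIN FILTER CREATE operator (or PARTITION JOIN FILTER CREATE when
 -- partition pruning applies) and a sys_op_bloom_filter(t2.c1) filter on the
 -- right-side table scan.
 explain select /*+ parallel(4) */ count(*) from t1, t2 where t1.c1 = t2.c1;
```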
+ + + +Other Distributed Pushdown Techniques +======== + +Apart from the preceding common distributed pushdown techniques that are easy to understand, OceanBase Database also supports more adaptive distributed pushdown techniques, such as adaptive two-phase pushdown for window functions and three-phase pushdown for aggregate functions. + +This post will not provide a detailed introduction to the more complex distributed pushdown techniques used by OceanBase Database. The following are sample execution plans of the two distributed pushdown techniques for those who are interested in conducting further research. + +Adaptive two-phase pushdown for window functions: +``` + select /*+parallel(3) */ + c1, sum(c2) over (partition by c1) from t1 order by c1; + Query Plan + =================================================== + |ID|OPERATOR |NAME | + --------------------------------------------------- + |0 |PX COORDINATOR MERGE SORT | | + |1 | EXCHANGE OUT DISTR |:EX10001| + |2 | MATERIAL | | + |3 | WINDOW FUNCTION CONSOLIDATOR | | + |4 | EXCHANGE IN MERGE SORT DISTR | | + |5 | EXCHANGE OUT DISTR (HASH HYBRID)|:EX10000| + |6 | WINDOW FUNCTION | | + |7 | SORT | | + |8 | PX BLOCK ITERATOR | | + |9 | TABLE SCAN |t1 | + =================================================== +``` + +Three-phase pushdown for aggregate functions: +``` + select /*+ parallel(2) */ + c1, sum(distinct c2),count(distinct c3), sum(c4) from t group by c1; + Query Plan + =========================================================================== + |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| + --------------------------------------------------------------------------- + |0 |PX COORDINATOR | |1 |8 | + |1 |└─EXCHANGE OUT DISTR |:EX10002|1 |7 | + |2 | └─HASH GROUP BY | |1 |6 | + |3 | └─EXCHANGE IN DISTR | |2 |6 | + |4 | └─EXCHANGE OUT DISTR (HASH) |:EX10001|2 |6 | + |5 | └─HASH GROUP BY | |2 |4 | + |6 | └─EXCHANGE IN DISTR | |2 |4 | + |7 | └─EXCHANGE OUT DISTR (HASH)|:EX10000|2 |3 | + |8 | └─HASH GROUP BY | |2 |2 | + |9 | └─PX BLOCK ITERATOR | |1 |1 | + |10| └─TABLE FULL SCAN |t |1 |1 | + =========================================================================== +``` + + +Preview of the Next Post +==== + +This post introduces several typical distributed pushdown techniques in the executor of OceanBase Database, based on the assumption that you have a basic understanding of the distributed execution of the database. If you are unfamiliar with the parallel execution techniques of the executor, please look forward to the next post [Parallel Execution Techniques of OceanBase Database](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-I). \ No newline at end of file diff --git a/docs/blogs/tech/query-perf.md b/docs/blogs/tech/query-perf.md new file mode 100644 index 000000000..9c7ca45a0 --- /dev/null +++ b/docs/blogs/tech/query-perf.md @@ -0,0 +1,445 @@ +--- +slug: query-perf +title: 'Insights into OceanBase Database 4.0: Distributed Query Performance Optimization in OceanBase Database' +--- + + +> **Wang Guoping | Senior Technical Expert of OceanBase** + +> Wang is the technical director of the OceanBase Database SQL engine team. He joined OceanBase in 2016 and is responsible for the R&D of the SQL engine. He graduated from Harbin Institute of Technology in 2008 and received his PhD from the National University of Singapore in 2014. His main research direction during his PhD was multi-query optimization and processing in the database field. Before joining OceanBase, he was responsible for database R&D in Huawei. 

Performance is one of the most important metrics for measuring a database system and a major concern in the database field. OceanBase Database V3.x provides a relatively sound optimizer engine, standalone execution engine, parallel execution engine, and vectorized execution engine. In May 2021, OceanBase Database V3.x ran the TPC-H benchmark and ranked first in the 30,000 GB Results list. It achieved a result of 15.26 million QphH@30,000 GB, which showcases its core performance. OceanBase Database has proved its distributed query performance and linear scalability by running this benchmark.

During the large-scale adoption of OceanBase Database V3.x, performance issues still occurred in some business scenarios. For example, non-optimal execution plans are generated in specific distributed scenarios, the execution engine has no tolerance for non-optimal execution plans, and parallel execution threads cannot be fully used to speed up queries in specific scenarios. To address these issues, when we started to design OceanBase Database V4.0, we thought about how to optimize the SQL engine to improve the distributed query performance. **The distributed query optimization and distributed execution engine** fundamentally determine the distributed query performance of the SQL engine. Let's talk about our thoughts from these two aspects.

**How does OceanBase Database V4.0 perform distributed query optimization?**

As we all know, query optimization is both a focus and a difficulty in database kernel development, and it is a key factor that determines database query performance. Query optimization aims to select the optimal execution plan for each SQL statement. Generally, an SQL statement has many equivalent execution plans whose performance may vary by orders of magnitude. Therefore, query optimization fundamentally determines the query performance. OceanBase Database is a distributed relational database system, which means it inherently needs to perform distributed query optimization. Query optimization has always been hard to get right in a relational database system, and distributed query optimization raises the difficulty to a new level. Next, let's talk about the challenges in distributed query optimization compared with standalone query optimization.

## 1. Challenges in distributed query optimization

**Significantly expanded plan enumeration space**

In query optimization, the optimizer needs to select an implementation method for each operator in an execution plan. In a standalone scenario, the optimizer only needs to consider the implementation of the operator on a single server. In a distributed scenario, the optimizer also needs to consider the distributed implementation of the operator. For example, in a standalone scenario, the implementation methods for a join operator include hash join, merge join, and nested loop join. In a distributed scenario, the implementation methods include partition-wise join, partial partition-wise join, hash-hash distribution join, and broadcast distribution join. When these distributed implementation methods are combined with standalone implementation methods, the plan enumeration space for distributed query optimization will be significantly expanded, posing challenges for the optimization.

**More physical attributes to be maintained**

In standalone query optimization, operator order is a very important physical attribute because it may be used to speed up subsequent operators. The order of an operator describes whether its output tuples follow a specific sort order after the operator is executed. For example, tuples are output in the order of (a,b,c) after the index (a,b,c) is scanned, because OceanBase Database preserves the order during an index scan.
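
As a quick illustration of why order matters (the table and index below are hypothetical, not from the original text), an index whose columns match the requested sort order lets the optimizer satisfy an ORDER BY without an explicit sort:

```
 create table t (a int, b int, c int, v int, index idx_abc(a, b, c));

 -- Scanning idx_abc already yields rows sorted by (a, b, c), so the plan for
 -- this query can rely on the index scan's order instead of adding a SORT
 -- operator; check the EXPLAIN output to confirm that no SORT appears.
 explain select a, b, c from t order by a, b, c;
```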
The order an operator produces is tied to the implementation of that specific operator, and it may even affect the cost of subsequent operators. Therefore, after each operator is executed, query optimization will maintain the physical attribute "order", and execution plans with a useful order will be retained during plan pruning.

In distributed query optimization, another physical attribute is partition information. Partition information mainly includes the data partitioning method and the physical location of each partition. Partition information fundamentally determines the distributed algorithm selected for an operator. For example, whether a join can be implemented as a partition-wise join depends on the join key and the table partition information. As partition information can also affect the cost of subsequent operators, the physical attribute "partition information" also needs to be maintained during distributed query optimization. Maintaining partition information ultimately affects plan pruning and selection, and increases the complexity of distributed query optimization.

**More accurate distributed cost model needed**

In query optimization, cost is the standard to evaluate an execution plan. Generally, cost represents the execution time of an execution plan or the amount of database system resources, such as CPU, I/O, and network resources, occupied by the execution plan. In a standalone scenario, the cost model needs to consider only the CPU and I/O costs. In a distributed scenario, apart from CPU and I/O costs, the cost model also needs to consider the network transmission cost, the degree of parallelism (DOP) for queries, and the cost in specific distributed optimization scenarios, such as the cost calculation for a Bloom filter. These factors increase the complexity of designing and fitting a distributed cost model, and thus the complexity of distributed query optimization to some extent.

## 2. Two-phase distributed query optimization in OceanBase Database V3.x

To reduce the complexity of distributed query optimization, OceanBase Database V3.x adopts two-phase distributed query optimization, which is a common solution in the industry.

In the first phase, based on the assumption that all tables are stored on the local server, the optimizer selects a locally optimal execution plan by using the existing standalone query optimization capabilities.

In the second phase, based on the fixed join order and local algorithms, the optimizer selects a distributed algorithm for each operator by using a simple distributed cost model.

The following figure shows an example of two-phase distributed query optimization for query Q1. In the first phase, the optimizer selects a locally optimal execution plan, as shown on the left side of the figure. MJ represents merge join, HJ represents hash join, and HGBY represents hash group by, which are local algorithms. In the second phase, based on the fixed join order and local algorithms, the optimizer selects a distributed algorithm for each operator by using a simple distributed cost model.
In this example, the partition-wise join algorithm is selected for the MJ node, and the hash-hash distribution join algorithm is selected for the HJ node.
+
+
+```
+    create table R1(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
+    create table R2(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
+    create table R3(a int primary key, b int, c int, d int) partition by hash(b) partitions 5;
+    select R2.c, sum(R3.d) from R1, R2, R3 where R1.a = R2.a and R2.c = R3.c group by R2.c;
+```
+
+
+
+![](/img/blogs/tech/query-perf/image/330988c9-f643-4c4b-af39-50584f9b99e0.png)
+
+
+
+Two-phase distributed query optimization significantly decreases the optimization complexity. However, during the massive commercial use of OceanBase Database V3.x, the results of two-phase distributed query optimization sometimes fell short of expectations for the following reasons:
+
+
+
+**A non-optimal local algorithm is selected when partition information is ignored**
+
+
+
+During two-phase distributed query optimization, if partition information is ignored in the first phase, a non-optimal local algorithm will often be selected. The following example shows a query Q2 and its execution plan in the first phase. During local optimization in the first phase, if the selectivity of the predicate R1.c = 100 is low, only a few rows in the R1 table meet this condition. In this case, the optimizer will select a nested loop join for this query: for each row in the R1 table that meets the condition, matching data is quickly fetched from the R2 table based on the index `idx`. In actual execution, however, the nested loop join takes much longer than the optimizer estimated. This is because R2 is a partitioned table with 100 partitions, and during the nested loop join the index lookup driven by each qualified row in R1 must be performed in every one of the 100 partitions of R2, which increases the execution time by a factor of 100. In this case, the optimal plan may be a hash join rather than a nested loop join. Because partition information is not considered in the first phase, the standalone costs of operators are estimated incorrectly, and a non-optimal local algorithm is selected for the query.
+
+
+```
+    create table R1(a int primary key, b int, c int);
+    create table R2(a int primary key, b int, c int, index idx(b)) partition by hash(a) partitions 100;
+    Q2: select * from R1, R2 where R2.b = R1.b and R1.c = 100;
+    /*Execution plan for the first phase*/
+    =============================================
+    |ID|OPERATOR        |NAME   |EST. ROWS|COST |
+    ---------------------------------------------
+    |0 |NESTED-LOOP JOIN|       |970299   |85622|
+    |1 | TABLE SCAN     |r1     |990      |40790|
+    |2 | TABLE SCAN     |r2(idx)|1        |44   |
+    =============================================
+
+    Outputs & filters:
+    -------------------------------------
+      0 - output([r1.a], [r1.b], [r1.c], [r2.a], [r2.b], [r2.c]), filter(nil),
+          conds(nil), nl_params_([r1.b])
+      1 - output([r1.b], [r1.c], [r1.a]), filter([r1.c = 100]),
+          access([r1.b], [r1.c], [r1.a]), partitions(p0)
+      2 - output([r2.b], [r2.a], [r2.c]), filter(nil),
+          access([r2.b], [r2.a], [r2.c]), partitions(p0)
+```
+
+
+
+
+
+**A non-optimal join order is selected when partition information is ignored**
+
+
+
+During two-phase distributed query optimization, if partition information is ignored in the first phase, a non-optimal join order will often be selected. 
The following figure shows a query Q3 and two groups of local plans and distributed plans generated for it. In the first group, the join order is ((R2, R3), R1). In the second group, the join order is ((R1, R2), R3). If partition information is not considered, the optimizer may select the ((R2, R3), R1) join order in the first phase. However, this join order may incur higher network transmission costs in the second phase: as shown in the figure, the tables R1, R2, and R3, as well as the join result of R2 and R3, all need to be transmitted over the network. ((R1, R2), R3) may be a better join order, because in the second phase only R3 and the join result of R1 and R2 need to be transmitted. Since a partition-wise join can be performed on R1 and R2, the two tables do not need to be transmitted over the network. In business scenarios, it is common that an inappropriate join order is selected because partition information is ignored.
+
+
+```
+    create table R1(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
+    create table R2(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
+    create table R3(a int primary key, b int, c int, d int) partition by hash(b) partitions 5;
+    Q3: select R2.c, sum(R3.d) from R1, R2, R3 where R1.a = R2.a and R2.b = R3.b;
+```
+
+
+
+![](/img/blogs/tech/query-perf/image/2c02e366-d102-488a-b179-42ea1f8c3779.png)
+
+
+
+In the foregoing two scenarios, a non-optimal local algorithm and a non-optimal join order are selected because partition information is not considered during optimization in the first phase. These two scenarios make the drawbacks of two-phase distributed query optimization obvious. Next, let's talk about how OceanBase Database V4.0 performs distributed query optimization to resolve these issues.
+
+
+
+
+
+## 3. Distributed query optimization in OceanBase Database V4.0
+
+
+
+OceanBase Database V4.0 restructures distributed query optimization from the two-phase method into a one-phase method: the optimizer enumerates both local and distributed algorithms in the same phase and estimates their costs by using a distributed cost model.
+
+
+
+To facilitate understanding of the one-phase distributed query optimization method, I first want to introduce the bottom-up dynamic programming method of System-R. Given an SQL statement, System-R uses the bottom-up dynamic programming method to enumerate join orders and select join algorithms. For a join that involves N tables, this method enumerates execution plans for each table subset in ascending order of subset size. For each enumerated subset, the method selects the plans to keep as follows:
+
+
+
+* Enumerate all standalone join algorithms, maintain the physical attribute "order", and calculate the costs based on a standalone cost model.
+* Retain the plan with the lowest cost and those with a useful order. The order in a plan is useful if and only if it is useful for the allocation of subsequent operators.
+
+
+
+The following figure shows an example of plan enumeration for a join that involves four tables. The method first enumerates plans for all base tables, the subsets of size 1. For each base table, the method enumerates all indexes and retains the plan with the lowest cost and those with a useful order. Then, the method enumerates plans for each subset of size 2. 
For example, to enumerate all execution plans for the join of `{R1,R2}`, the method considers all standalone join algorithms and combines them with all plans retained for R1 and R2. The method continues enumeration until execution plans are enumerated for the subset of size 4.
+
+
+
+![](/img/blogs/tech/query-perf/image/87ee8003-c9b5-46bb-83c2-52abd3a3eb8c.png)
+
+
+
+Based on the standalone query optimization method of System-R, OceanBase Database V4.0 implements distributed query optimization as follows:
+
+
+
+1\. For each enumerated subset, OceanBase Database enumerates the distributed algorithms of all operators, uses a distributed cost model to calculate the cost of each distributed algorithm, and maintains two physical attributes: order and partition information.
+
+
+
+2\. For each enumerated subset, OceanBase Database retains the plan with the lowest cost, plans with a useful order, and plans with useful partition information. Partition information is useful if and only if it is useful for subsequent operators. In the scenario shown in the following figure, plan P1 uses a hash-hash distribution join, and plan P2 uses a broadcast distribution join for the R2 table. Though P2 has a higher cost than P1, P2 inherits the partition information of the R1 table, which will be useful for the subsequent GROUP BY operator. Therefore, P2 is also retained.
+
+
+```
+    create table R1(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
+    create table R2(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
+    select R1.a, SUM(R2.c) from R1, R2 where R1.b = R2.b group by R1.a;
+```
+
+
+
+![](/img/blogs/tech/query-perf/image/0a41f314-f6a5-41f1-913d-c6b230dd0f25.png)
+
+
+
+OceanBase Database V4.0 uses the one-phase distributed query optimization method, which involves a much larger plan space than standalone query optimization. To keep this plan space manageable, OceanBase Database V4.0 provides a variety of methods for quick plan pruning, as well as new join enumeration algorithms that support distributed plan enumeration for joins of very many tables. Thanks to these techniques, OceanBase Database V4.0 effectively reduces the distributed plan space and improves distributed query optimization performance. Our experimental results show that OceanBase Database V4.0 can enumerate distributed plans for 50 tables within seconds.
+
+
+**How does OceanBase Database V4.0 improve the performance of the distributed execution engine?**
+
+
+
+Compared with OceanBase Database V3.x, OceanBase Database V4.0 has made many improvements in the execution engine. It has implemented new distributed and standalone algorithms, such as null-aware hash anti-join, shared broadcast hash join, hash-based window function, and partitioned Bloom filter. It has also improved the implementation of the entire vectorized engine, developed ultimate parallel pushdown techniques, and initiated the development of adaptive techniques. These efforts have greatly improved the performance of both distributed and standalone queries. Here I want to introduce the adaptive techniques and parallel pushdown techniques of OceanBase Database V4.0.
+
+
+
+## 4. Development towards an adaptive execution engine
+
+
+
+In business scenarios of OceanBase Database, we found that the execution engine has no tolerance for non-optimal execution plans generated by the optimizer. 
When the optimizer generates non-optimal execution plans, the execution engine cannot adjust the plans to improve the execution performance. Although the optimizer is designed to choose the optimal execution plans for database queries, the optimizer itself is not perfect. For example, it cannot always accurately estimate the number of rows, so it may pick a less optimal execution plan, or even a lousy one.
+
+
+
+To resolve this issue, OceanBase Database V4.0 has started to develop an adaptive execution engine. An adaptive execution engine identifies some non-optimal execution plans based on the real-time execution status and adjusts them accordingly to improve the execution performance. We believe that once an execution engine reaches a certain stage of development, it must use adaptive techniques to address the issue of non-optimal execution plans generated by the optimizer. That said, we do not believe that adaptive techniques can handle all scenarios of non-optimal plans.
+
+
+
+OceanBase Database V4.0 implements adaptive GROUP BY/DISTINCT parallel pushdown, which can prevent performance downgrades caused by non-optimal plans in GROUP BY/DISTINCT parallel pushdown scenarios. Before we dive into the adaptive technique, let me briefly introduce the GROUP BY/DISTINCT parallel pushdown technique itself. As a general technique in distributed execution, GROUP BY/DISTINCT parallel pushdown pushes down the GROUP BY operator in advance to pre-aggregate some data. This reduces the workload of network transmission, thus improving the performance. The following example shows an execution plan that pushes down the GROUP BY operator to Operator 5 for data pre-aggregation, so that the network transmission workload of Operator 4 is reduced to achieve higher performance. However, note that GROUP BY parallel pushdown does not necessarily improve the performance. It sometimes backfires, mainly because it consumes extra computing resources. GROUP BY parallel pushdown brings benefits only when the performance gain in network transmission surpasses the extra computing cost.
+
+
+```
+    create table R1(a int primary key, b int, c int) partition by hash(a) partitions 4;
+    explain select b, sum(c) from R1 group by b;
+    ==========================================================
+    |ID|OPERATOR                     |NAME    |EST. ROWS|COST|
+    ----------------------------------------------------------
+    |0 |PX COORDINATOR               |        |1        |10  |
+    |1 | EXCHANGE OUT DISTR          |:EX10001|1        |10  |
+    |2 |  HASH GROUP BY              |        |1        |9   |
+    |3 |   EXCHANGE IN DISTR         |        |1        |9   |
+    |4 |    EXCHANGE OUT DISTR (HASH)|:EX10000|1        |8   |
+    |5 |     HASH GROUP BY           |        |1        |8   |
+    |6 |      PX PARTITION ITERATOR  |        |1        |7   |
+    |7 |       TABLE SCAN            |r1      |1        |7   |
+    ==========================================================
+
+    Outputs & filters:
+    -------------------------------------
+      0 - output([INTERNAL_FUNCTION(r1.b, T_FUN_SUM(T_FUN_SUM(r1.c)))]), filter(nil), rowset=256
+      1 - output([INTERNAL_FUNCTION(r1.b, T_FUN_SUM(T_FUN_SUM(r1.c)))]), filter(nil), rowset=256, dop=1
+      2 - output([r1.b], [T_FUN_SUM(T_FUN_SUM(r1.c))]), filter(nil), rowset=256,
+          group([r1.b]), agg_func([T_FUN_SUM(T_FUN_SUM(r1.c))])
+      3 - output([r1.b], [T_FUN_SUM(r1.c)]), filter(nil), rowset=256
+      4 - (#keys=1, [r1.b]), output([r1.b], [T_FUN_SUM(r1.c)]), filter(nil), rowset=256, dop=1
+      5 - output([r1.b], [T_FUN_SUM(r1.c)]), filter(nil), rowset=256,
+          group([r1.b]), agg_func([T_FUN_SUM(r1.c)])
+      6 - output([r1.b], [r1.c]), filter(nil), rowset=256
+      7 - output([r1.b], [r1.c]), filter(nil), rowset=256,
+          access([r1.b], [r1.c]), partitions(p[0-3])
+```
+
+
+
+In earlier versions of OceanBase Database, the optimizer determines whether to push down the GROUP BY operator based on cost estimation. However, the optimizer may sometimes incorrectly estimate the number of rows. As a result, the GROUP BY operator is not pushed down when it should be, or is pushed down when it should not be, compromising the execution performance. To resolve this issue, OceanBase Database V4.0 introduces adaptive GROUP BY/DISTINCT parallel pushdown. The optimizer always pushes down the GROUP BY/DISTINCT operator, and the execution engine determines whether to skip the pushed-down operator by sampling part of its data during execution. The challenge of this technique lies in how to determine whether the pushed-down operator achieves satisfactory pre-aggregation. The OceanBase solution is to control the performance of the pushed-down operator's hash table by keeping the table within the L3 cache, and to perform multiple rounds of sampling to prevent misjudgment caused by stretches of data that happen not to aggregate well. The key points of the solution are as follows:
+
+
+
+* The execution engine limits the hash table to the L2 cache (1 MB) and, if the pre-aggregation performance is unsatisfactory, marks the hash table as discarded. If the pre-aggregation performance is good, the execution engine expands the hash table to the L3 cache (10 MB) and, if more memory is needed during execution, marks the hash table as discarded.
+* If the hash table is discarded, the execution engine returns and releases all rows in the table, and then rebuilds the hash table to start the next round of sampling.
+* If pre-aggregation fails to achieve satisfactory performance in five consecutive rounds of sampling, the execution engine skips the pushed-down GROUP BY operator.
+
+
+
+Adaptive GROUP BY/DISTINCT parallel pushdown incurs extra overhead for the sampling and computation needed to decide whether to skip the pushed-down operator during execution. However, our tests on various data distributions indicate that this extra overhead can be kept within 10%, which is much lower than the performance gain. 
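+
+To make the trade-off concrete, consider the following two queries (an illustrative sketch; the table and column names are made up). Pushing down the GROUP BY pays off when the grouping column has few distinct values, so that pre-aggregation collapses many rows before network transmission; a near-unique grouping column leaves the row count almost unchanged, and the pushed-down operator only adds hashing cost:
+
+```
+    -- Assume ~10 million rows, where "city" has ~100 distinct values
+    -- and "user_id" is nearly unique.
+    create table orders(id int primary key, user_id int, city varchar(64), amount int)
+    partition by hash(id) partitions 4;
+
+    -- Pushdown helps: each partition pre-aggregates millions of rows
+    -- into ~100 groups, so very little data crosses the network.
+    select city, sum(amount) from orders group by city;
+
+    -- Pushdown backfires: ~10 million input rows still produce ~10 million
+    -- groups, so the pushed-down HASH GROUP BY consumes CPU without reducing
+    -- network traffic; the adaptive engine detects this by sampling and
+    -- skips the pushed-down operator.
+    select user_id, sum(amount) from orders group by user_id;
+```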
+
+
+
+We are also working on more adaptive techniques, such as the adaptive creation and detection of Bloom filters, adaptive tuning between nested loop joins and hash joins, and adaptive tuning between broadcast distribution joins and hash-hash distribution joins. We believe that these adaptive techniques can elevate the capabilities of the execution engine to a new level, making the execution engine more robust. This way, when the optimizer generates a non-optimal or lousy execution plan, the execution engine can adjust the plan to improve the query performance.
+
+
+
+
+## 5. Development towards ultimate parallel pushdown
+
+
+
+Parallel pushdown is a technique that pushes part of an operator's computation down in the distributed plan so that it runs closer to the data. Generally, this technique improves the performance of distributed queries by executing at the maximum degree of parallelism or by reducing network transmission, and in many cases the improvement reaches orders of magnitude. The GROUP BY/DISTINCT parallel pushdown technique described in the previous section is a typical example. Compared with OceanBase Database V3.x, OceanBase Database V4.0 provides well-developed parallel pushdown techniques, which cover almost all operators in analytical processing (AP) scenarios, such as GROUP BY, ROLLUP, DISTINCT, and window functions.
+
+
+
+The following table compares OceanBase Database V3.x and OceanBase Database V4.0 in terms of parallel pushdown.
+
+![](/img/blogs/tech/query-perf/image/8ae0d7fe-c66c-491d-ae63-c70899cad4b3.png)
+
+
+In OceanBase Database V4.0, the implementation of parallel pushdown varies by operator, and due to the complexity of parallel execution, each implementation faces different challenges. We won't introduce every implementation here. Instead, let's talk about the three-phase parallel pushdown technique for DISTINCT aggregate functions to illustrate the advantages of parallel pushdown. The following example shows a query Q1 that contains two DISTINCT aggregate functions. In OceanBase Database V3.x, parallel pushdown cannot be performed for Q1. The execution plan of Q1 shows that all deduplication and aggregation logic is calculated by Operator 0, which does not support parallel execution, leading to poor overall execution performance.
+
+
+```
+    create table R1(a int, b int, c int, d int, primary key(a,b)) partition by hash(b) partitions 4;
+    Q1: select sum(distinct c), sum(distinct d) from R1 where a = 5;
+    ======================================================
+    |ID|OPERATOR                |NAME    |EST. ROWS|COST|
+    ------------------------------------------------------
+    |0 |SCALAR GROUP BY         |        |1        |2365|
+    |1 | PX COORDINATOR         |        |3960     |2122|
+    |2 |  EXCHANGE OUT DISTR    |:EX10000|3960     |1532|
+    |3 |   PX PARTITION ITERATOR|        |3960     |1532|
+    |4 |    TABLE SCAN          |r1      |3960     |1532|
+    ======================================================
+
+    Outputs & filters:
+    -------------------------------------
+      0 - output([T_FUN_SUM(distinct r1.c)], [T_FUN_SUM(distinct r1.d)]), filter(nil),
+          group(nil), agg_func([T_FUN_SUM(distinct r1.c)], [T_FUN_SUM(distinct r1.d)])
+      1 - output([r1.c], [r1.d]), filter(nil)
+      2 - output([r1.c], [r1.d]), filter(nil), dop=1
+      3 - output([r1.c], [r1.d]), filter(nil)
+      4 - output([r1.c], [r1.d]), filter(nil),
+          access([r1.c], [r1.d]), partitions(p[0-3])
+```
+
+
+
+
+
+To improve the distributed execution performance of a query that uses a DISTINCT aggregate function, OceanBase Database V4.0 introduces the three-phase parallel pushdown logic. The following example shows the three-phase parallel pushdown logic for a query that uses one DISTINCT aggregate function. The details are as follows:
+
+
+
+**In the first phase**, the DISTINCT logic is pushed down for partial deduplication. In this example, the job in this phase is completed by Operator 6.
+
+
+
+**In the second phase**, data is repartitioned based on the deduplicated column, and then full deduplication and partial pre-aggregation are performed. In this example, the job in this phase is completed by Operators 3, 4, and 5.
+
+
+
+**In the third phase**, the results obtained in the second phase are aggregated. In this example, the job in this phase is completed by Operators 0, 1, and 2.
+
+
+
+Compared with execution without operator pushdown, the three-phase parallel pushdown technique has two performance benefits. First, it allows data deduplication and pre-aggregation at the maximum degree of parallelism. Second, deduplicating data through DISTINCT pushdown reduces the workload of network transmission.
+
+```
+    create table R1(a int, b int, c int, d int, primary key(a,b)) partition by hash(b) partitions 4;
+    select sum(distinct c) from R1 where a = 5;
+    ============================================================
+    |ID|OPERATOR                      |NAME    |EST. ROWS|COST|
+    ------------------------------------------------------------
+    |0 |SCALAR GROUP BY               |        |1        |1986|
+    |1 | PX COORDINATOR               |        |1        |1835|
+    |2 |  EXCHANGE OUT DISTR          |:EX10001|1        |1835|
+    |3 |   MERGE GROUP BY             |        |1        |1835|
+    |4 |    EXCHANGE IN DISTR         |        |1        |1683|
+    |5 |     EXCHANGE OUT DISTR (HASH)|:EX10000|1        |1683|
+    |6 |      HASH GROUP BY           |        |1        |1683|
+    |7 |       PX PARTITION ITERATOR  |        |3960     |1532|
+    |8 |        TABLE SCAN            |r1      |3960     |1532|
+    ============================================================
+
+    Outputs & filters:
+    -------------------------------------
+      0 - output([T_FUN_SUM(T_FUN_SUM(distinct r1.c))]), filter(nil),
+          group(nil), agg_func([T_FUN_SUM(T_FUN_SUM(distinct r1.c))])
+      1 - output([T_FUN_SUM(distinct r1.c)]), filter(nil)
+      2 - output([T_FUN_SUM(distinct r1.c)]), filter(nil), dop=1
+      3 - output([T_FUN_SUM(distinct r1.c)]), filter(nil),
+          group(nil), agg_func([T_FUN_SUM(distinct r1.c)])
+      4 - output([r1.c]), filter(nil)
+      5 - (#keys=1, [r1.c]), output([r1.c]), filter(nil), dop=1
+      6 - output([r1.c]), filter(nil),
+          group([r1.c]), agg_func(nil)
+      7 - output([r1.c]), filter(nil)
+      8 - output([r1.c]), filter(nil),
+          access([r1.c]), partitions(p[0-3])
+```
+
+
+
+The preceding example shows how the three-phase parallel pushdown technique works for a single DISTINCT aggregate function. The question is, is it still effective for queries with more DISTINCT aggregate functions? The answer is yes. The trick is that in Phase 1, for a query with N DISTINCT aggregate functions, we create a replica of the data set for each aggregate function and tag each replica to indicate which aggregate function it belongs to. Similar operations are performed in Phases 2 and 3, with some minor differences in implementation. The following example shows the three-phase pushdown logic for a query that uses two DISTINCT aggregate functions. AGGR_CODE is used to mark the redundant data generated for each DISTINCT aggregate function.
+
+
+```
+    create table R1(a int, b int, c int, d int, primary key(a,b)) partition by hash(b) partitions 4;
+    select sum(distinct c), sum(distinct d) from R1 where a = 5;
+    ============================================================
+    |ID|OPERATOR                      |NAME    |EST. ROWS|COST|
+    ------------------------------------------------------------
+    |0 |SCALAR GROUP BY               |        |1        |13  |
+    |1 | PX COORDINATOR               |        |2        |13  |
+    |2 |  EXCHANGE OUT DISTR          |:EX10001|2        |12  |
+    |3 |   HASH GROUP BY              |        |2        |11  |
+    |4 |    EXCHANGE IN DISTR         |        |2        |10  |
+    |5 |     EXCHANGE OUT DISTR (HASH)|:EX10000|2        |9   |
+    |6 |      HASH GROUP BY           |        |2        |8   |
+    |7 |       PX PARTITION ITERATOR  |        |1        |7   |
+    |8 |        TABLE SCAN            |r1      |1        |7   |
+    ============================================================
+
+    Outputs & filters:
+    -------------------------------------
+      0 - output([T_FUN_SUM(T_FUN_SUM(dup(r1.c)))], [T_FUN_SUM(T_FUN_SUM(dup(r1.d)))]), filter(nil), rowset=256,
+          group(nil), agg_func([T_FUN_SUM(T_FUN_SUM(dup(r1.c)))], [T_FUN_SUM(T_FUN_SUM(dup(r1.d)))])
+      1 - output([AGGR_CODE], [T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))]), filter(nil), rowset=256
+      2 - output([AGGR_CODE], [T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))]), filter(nil), rowset=256, dop=1
+      3 - output([AGGR_CODE], [T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))]), filter(nil), rowset=256,
+          group([AGGR_CODE]), agg_func([T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))])
+      4 - output([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), filter(nil), rowset=256
+      5 - (#keys=3, [AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), output([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), filter(nil), rowset=256, dop=1
+      6 - output([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), filter(nil), rowset=256,
+          group([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), agg_func(nil)
+      7 - output([r1.c], [r1.d]), filter(nil), rowset=256
+      8 - output([r1.c], [r1.d]), filter(nil), rowset=256,
+          access([r1.c], [r1.d]), partitions(p[0-3])
+```
+
+
+
+Parallel pushdown is common in distributed scenarios. In OceanBase Database V3.x, distributed query performance often deteriorated because the parallel pushdown feature was incomplete. OceanBase Database V4.0 resolves such issues to improve distributed query performance.
+
+
+## 6. Afterword
+
+
+
+To conclude, I want to share the actual improvements that OceanBase Database V4.0 brings in distributed query performance. Compared with OceanBase Database V3.x, OceanBase Database V4.0 implements a new distributed cost model, a new distributed query optimization framework, a set of well-developed parallel pushdown techniques, and adaptive techniques. The development of these techniques is driven by our understanding of customer requirements and distributed systems.
+
+
+
+We tested these techniques by running the TPC-DS 100 GB benchmark. The test results show that they significantly improve distributed query performance: the total execution duration of the 99 queries decreased from 918s to 270s. The following figure compares the query performance of OceanBase Database V3.x and OceanBase Database V4.0 in the TPC-DS 100 GB benchmark.
+
+
+
+![](/img/blogs/tech/query-perf/image/bb5d7618-9c57-45d6-9547-3fea3c97e651.png)
+
+Performance comparison between OceanBase Database V3.x and V4.0 in the TPC-DS 100 GB benchmark
+
+
+
+These are our thoughts on the value and technical evolution of distributed query optimization in OceanBase Database V4.0. Databases are foundational software in essence. We hope that later OceanBase Database V4.x versions can bring users a better experience and higher query performance through distributed query optimization and technical innovations in the execution engine.
+
+
+
+Follow us in the [OceanBase community](https://open.oceanbase.com/blog). We aspire to regularly contribute technical information so we can all move forward together. 
+
+
+
+Search 🔍 DingTalk group 33254054 or scan the QR code below to join the OceanBase technical Q&A group. You can find answers to all your technical questions there.
+
+
+
+![](https://gw.alipayobjects.com/zos/oceanbase/f4d95b17-3494-4004-8295-09ab4e649b68/image/2022-08-29/00ff7894-c260-446d-939d-f98aa6648760.png)
\ No newline at end of file
diff --git a/docs/blogs/tech/real-time-analytics.md b/docs/blogs/tech/real-time-analytics.md
new file mode 100644
index 000000000..cfb8cd879
--- /dev/null
+++ b/docs/blogs/tech/real-time-analytics.md
@@ -0,0 +1,198 @@
+---
+slug: real-time-analytics
+title: 'Release of OceanBase Database V4.3.3: The First GA Version for Real-time Analytics'
+---
+
+We're excited to announce that OceanBase Database V4.3.3, a General Availability (GA) version, has been officially released. We unveiled this new version at this year's product launch last week. As the first GA version targeted at real-time analytical processing (AP) scenarios, OceanBase Database V4.3.3 is significantly optimized and improved in many aspects. Its integrated capabilities can better meet users' needs in real-time analysis and diversified business scenarios.
+
+In early 2024, we released OceanBase Database V4.3.0, marking a critical step toward real-time analytics with its log-structured merge-tree (LSM-tree)-based columnar storage engine. Forged in the crucible of dozens of real-world business scenarios, **OceanBase Database V4.3.3 further advances AP performance and features. By providing hybrid transaction/analytical processing (HTAP) capabilities, it helps users shorten the response time and improve the throughput for complex workloads.**
+
+Multiple breakthroughs have been made in OceanBase Database V4.3.3. We have remarkably improved the system performance for complex workloads, especially AP workloads. We have also optimized the columnar storage engine and extended its application scenarios, covering columnstore tables, columnstore indexes, hybrid rowstore-columnstore tables, and columnstore replicas.
+
+In addition, with the introduction of vectorized engine 2.0, the system is more flexible in processing diversified schema objects and data types, such as materialized views, external tables, RoaringBitmaps, and arrays. With the newly introduced columnstore replicas, resources for transaction processing (TP) and AP workloads are physically isolated to avoid interference between the two types of workloads. This way, high performance and stability can be ensured in complex scenarios such as real-time data analytics and decision-making.
+
+The TPC-H benchmark results show that compared with the 99 seconds taken by V4.3.0, the GA version of OceanBase Database V4.3.3 spent only 60 seconds querying a dataset of 1 TB, improving the performance by 64%. OceanBase Database V4.3.3 meets diverse needs for data storage and analysis in different business scenarios, while ensuring faster response to massive data analysis requests at a higher throughput.
+
+**In terms of AI features, OceanBase Database V4.3.3 introduces vector retrieval to support vector data and indexes. 
By leveraging powerful multi-model integration and distributed storage, this version noticeably simplifies the AI application technology stack and helps enterprises construct AI-powered applications efficiently.**
+
+**The release of V4.3.3 marks significant progress for OceanBase Database in integrating real-time AP with an AI-powered vector engine.** Next, let's take a deeper dive into the main features and highlights of the GA version of OceanBase Database V4.3.3.
+
+-  Columnar storage
+
+-  Vectorized engine 2.0
+
+-  Materialized view
+
+-  External table
+
+-  Data import and export
+
+-  Complex data types
+
+-  Vector search
+
+-  Full-text index
+
+-  Reliability improvement
+
+
+
+**1. AP Features**
+-------------
+
+### **1.1 Columnar storage**
+
+In scenarios involving large-scale data analytics or extensive ad-hoc queries, columnar storage stands out as a crucial feature of an AP database. Columnar storage is a way to organize data files. Different from row-based storage, columnar storage physically arranges data in a table by column. When data is stored by column, the system can scan only the columns involved in the query and calculation instead of scanning entire rows. This reduces the consumption of resources such as I/O and memory and accelerates calculation. In addition, columnar storage naturally lends itself to data compression and usually offers a higher compression ratio, thereby reducing the required storage space and network transmission bandwidth.
+
+OceanBase Database supports the columnar engine on top of its LSM-tree-based architecture, implementing integrated columnar and row-based data storage on an OBServer node with only one set of code and one architecture, and ensuring the performance of both TP and AP requests. We provide several columnar storage solutions to meet users' needs in different business scenarios.
+
+**-  Columnstore table**: This solution targets pure AP business and delivers higher analytic performance. On this basis, if users also want to perform high-performance point queries on a table, they only need to create rowstore indexes on the table.
+
+**-  Columnstore index**: This solution mainly targets TP business. If a small number of analytic needs are involved, users can create columnstore indexes only on the columns to be analyzed in a rowstore table.
+
+**-  Hybrid rowstore-columnstore table**: This solution applies when the boundary between TP and AP businesses is not clear, namely when both online transaction processing (OLTP) and online analytical processing (OLAP) workloads exist in a business module. In this case, users can create hybrid rowstore-columnstore tables to store business data, and the optimizer decides, based on costs, whether to access the data by row or by column. Resource groups can be created to isolate resources at the user or SQL statement level.
+
+**-  Columnstore replica**: Users can configure dedicated zones to store read-only columnstore replicas based on their TP clusters if they want to physically isolate resources in HTAP scenarios. This way, TP business accesses only rowstore zones, while AP business accesses columnstore zones in weak-consistency read mode.
+
+
+
+### **1.2 Vectorized engine 2.0**
+
+Earlier versions of OceanBase Database have implemented a vectorized engine based on uniform data format descriptions, offering performance significantly better than that of non-vectorized engines. 
However, the engine still had some performance deficiencies in deep AP scenarios. The new version of OceanBase Database implements vectorized engine 2.0, which is based on column data format descriptions, avoiding the memory usage, serialization, and read/write access overhead caused by ObDatum maintenance. Based on the new column data format descriptions, OceanBase Database optimizes the implementation mechanisms of operators and expressions, remarkably increasing computing performance on large data volumes.
+
+
+
+### **1.3 Materialized views**
+
+A materialized view precomputes and stores query results, reducing real-time computation, improving query performance, and simplifying complex query logic. Materialized views are commonly used for rapid report generation and data analysis scenarios.
+
+Materialized views need to store query result sets to optimize the query performance. Due to data dependency between a materialized view and its base tables, data in the materialized view must be refreshed accordingly when data in any base table changes. Therefore, the new version also introduces a materialized view refresh mechanism, with both complete refresh and incremental refresh strategies. Complete refresh is the more direct approach: each time the refresh operation is executed, the system re-executes the query statement that defines the materialized view, recalculates the result in full, and overwrites the existing view data. This method is suitable for scenarios with relatively small data volumes. Incremental refresh, by contrast, only deals with data that has changed since the last refresh. To ensure precise incremental refreshes, OceanBase Database has implemented a materialized view log mechanism similar to that in Oracle databases, which tracks and records incremental updates of the base table in detail through logs, ensuring that the materialized view can be refreshed incrementally and quickly. Incremental refresh is suitable for business scenarios with substantial data volumes and frequent data changes.
+
+Non-real-time materialized views can be refreshed on a regular basis or manually to handle queries in most analysis scenarios. However, real-time materialized views are more suitable in business scenarios requiring high real-time performance. Therefore, OceanBase Database provides real-time computing capabilities based on materialized views and materialized view logs (mlogs), outperforming common views.
+
+The new version also allows users to rewrite a query based on a materialized view. When the system variable `QUERY_REWRITE_ENABLED` is set to `True`, users can enable automatic rewriting in the materialized view creation statement. After automatic rewriting is enabled, the system can rewrite table queries into materialized view-based queries without requiring users to specify the materialized view name in the SQL statement, thus reducing the rewrite costs.
+
+OceanBase Database also supports PRIMARY KEY constraints on materialized views: users can specify a primary key for a materialized view to optimize performance in scenarios such as primary key-based single-row queries, range queries, or joins.
+
+
+
+### **1.4 External table**
+
+OceanBase Database has supported external tables in the CSV format since a very early version, and the new version adds support for compressed files in formats including GZIP, DEFLATE, and ZSTD. 
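+
+As a rough illustration, creating and querying a CSV external table looks like the following sketch (the table name, columns, and file location are made up, and the exact option list may vary by version; see the external table documentation for details):
+
+```
+CREATE EXTERNAL TABLE ext_orders (
+  order_id INT,
+  user_id  INT,
+  amount   DECIMAL(10, 2)
+)
+LOCATION = 'oss://my-bucket/orders/'
+FORMAT = (
+  TYPE = 'CSV',
+  FIELD_DELIMITER = ',',
+  SKIP_HEADER = 1
+)
+PATTERN = '.*[.]csv';
+
+-- Joint query across the external table and an internal table
+SELECT u.name, SUM(o.amount)
+FROM ext_orders o JOIN users u ON o.user_id = u.user_id
+GROUP BY u.name;
+```
+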
As the AP business develops, the need for reading external data sources in the Parquet format is increasing in some data lake scenarios. Therefore, the new version of OceanBase Database supports external tables in the Parquet format. Users can import data into internal OceanBase tables through external tables or directly use external tables for cross-data source joint queries and analysis. + +The new version also supports external table partitioning, which is similar to the LIST partitioning for common tables, and provides syntax for both manual and automatic partitioning. In automatic partition creation mode, the system groups files by partition based on the definition of the partitioning key. In manual partition creation mode, users need to specify the path to the data file of each partition. In this case, the system implements partition pruning based on the partitioning conditions for an external table query, thereby reducing the number of files to scan and improving the query performance. + +Meanwhile, to ensure the timeliness of the file directories scanned by external tables, the new version introduces the automatic refresh feature. With this feature, users can use the `AUTO_REFRESH` option to specify the directory refresh method (manual, real-time, or periodic) during external table creation, and manage scheduled refresh tasks by using the DBMS\_EXTERNAL\_TABLE.REFRESH\_ALL\_TABLE(interval int) subprogram together with the preceding option. + + + +### **1.5 Data import and export** + +While TP business mainly involves data insert operations, batch data import and data processing that require high performance are more common in AP business. OceanBase Database supports the following import methods: direct load, external table import, partition exchange, overwriting, import from the client, and regular import. + +**-  Direct load**: This feature simplifies the data loading path and skips the SQL, transaction, MemTable, and other modules to directly persist data into SSTables, which significantly improves the data import efficiency. Direct load supports the import of both full data and incremental data. Data in tables needs to be rewritten during full direct load, which means it is better to import an empty table by using this method. If users need to import data into a table multiple times, they can use the incremental direct load feature. With this feature, the database writes only new data rather than repeatedly writing all existing data. This ensures high import performance. + +**-  External table import**: To achieve better analysis performance in the current stage, users can use the `INSERT INTO SELECT` statement to import an external table into the internal OceanBase database. Direct load can be used together with this method to improve the import performance. + +**-  Partition exchange**: This feature allows users to modify the partition and table definitions in the data dictionary to migrate data with minimal delay from one table to a partition in another table without physically replicating the data. This method applies to scenarios where cold data needs to be archived and distinguished from hot data. + +**-  Overwriting**: In data warehouses, overwriting is common in periodic data refresh, data conversion, data cleansing, and data correction. OceanBase Database supports table- and partition-level overwriting. Specifically, the database can empty old data and write new data in a table or partition in an atomic manner. 
Based on the full direct load capability, executing the `INSERT OVERWRITE` statement can improve the import performance. + +**-  Local import from the client using the `LOAD DATA LOCAL INFILE` statement**: This feature enables the import of local files through streaming file processing. Based on this feature, developers can import local files for testing without uploading files to the server or object storage media, improving the efficiency of importing a small amount of data. + +**-  Regular import**: Different from direct load, regular import needs to be optimized by the SQL engine, which applies to scenarios where multiple constraints exist. + +Real-time data import is supported in multiple methods, as described above. However, real-time data import requires the session to wait until the import is complete and cannot be interrupted during this process, which is inconvenient when a large amount of data is to be imported. To address this issue, OceanBase Database provides the asynchronous job scheduling capability. Users can use the `SUBMIT JOB`, `SHOW JOB STATUS`, and `CANCEL JOB` statements to respectively create an asynchronous import job, query the job status, and cancel a job. + +In terms of data export, the kernel of OceanBase Database enables users to execute the `SELECT INTO OUTFILE` statement to export text files, and supports parallel table data reading and external file writing. It also allows data export based on user-defined partitioning rules. We will strive to fully support the external table export feature, which is implemented by using the `INSERT OVERWRITE` statement, in later versions of OceanBase Database. + + + +### **1.6 Complex data types** + +In the era of big data, enterprises are increasingly keen on data mining and analysis. Featuring efficient computing with less storage space, RoaringBitmap plays a key role in business scenarios such as user profiling, personalized recommendations, and precise marketing. The MySQL mode of OceanBase Database supports the RoaringBitmap data type and improves performance in the calculation and deduplication of a large amount of data by storing and operating a group of unsigned integers. To meet multi-dimensional analysis needs, the new version of OceanBase Database supports more than 20 expressions for cardinality calculation, set calculation, bitmap judgment, bitmap construction, bitmap output, and aggregate operation. + +ARRAY is a common complex data type in AP business scenarios. An array can store multiple elements of the same type. If you need to manage and query multi-valued attributes that cannot be effectively represented by relational data, the ARRAY data type is an appropriate choice. OceanBase Database supports the ARRAY data type in MySQL mode. During table creation, you can define a column as an array of numeric or character values, which can also be an embedded array. You can also create an expression for querying or writing array objects. The `array_contains` expression and `ANY` operator can be used to verify whether an array contains a specific element. Moreover, you can also use operators, such as `+`, `-`, `=`, and `!=`, to calculate and judge the elements in an array. + +The multi-valued index feature applies to JSON documents and other collection data types, effectively facilitating element retrieval. OceanBase Database in MySQL mode supports the multi-valued index feature for JSON data. You can create an efficient secondary index on a JSON array of multiple elements. 
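+
+As a brief sketch in MySQL-compatible syntax (the table, column, and index names here are made up for illustration):
+
+```
+-- Multi-valued index on the "tags" array inside a JSON column
+CREATE TABLE articles (
+  id  INT PRIMARY KEY,
+  doc JSON,
+  INDEX idx_tags ((CAST(doc->'$.tags' AS CHAR(32) ARRAY)))
+);
+
+-- Element-membership predicates can use the multi-valued index
+SELECT id FROM articles WHERE 'database' MEMBER OF (doc->'$.tags');
+SELECT id FROM articles WHERE JSON_CONTAINS(doc->'$.tags', '["oceanbase", "htap"]');
+```
+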
Such indexes enhance the capability to query complex JSON data structures while preserving data model flexibility and query performance.
+
+
+
+**2. AP Performance Improvement**
+-------------
+
+### **2.1 Benchmark tests**
+
+**2.1.1 TPC-H (1 TB): execution time down from 99s to 60s**
+
+The TPC-H (1 TB) benchmark test results show a great performance increase in OceanBase Database V4.3.3 compared with V4.3.0. The total execution time of the queries, 99s in V4.3.0, is reduced to 60s in V4.3.3, a performance improvement of 64%. As shown in the following figure, V4.3.3 delivers markedly higher performance than V4.3.0 in multiple query tasks, further demonstrating the optimization effects of OceanBase Database in real-time analysis scenarios.
+
+![1730172379](/img/blogs/tech/real-time-analytics/image/b965ff90-0fe6-4dcf-baee-57d870a05eda.png)
+
+| Metric | 4.2.1 LTS | 4.3.0 | 4.3.3 |
+| -- | --------- | ---- | ---- |
+| Time consumed (s) | 126.32 | 99.14 | 60.41 |
+| Performance improvement (vs. the previous column) | | 27% | 64% |
+
+
+
+**2.1.2 Better performance in ClickBench tests**
+
+The ClickBench test results show that, in the cold run scenario, ClickHouse spent 139.57s executing the queries, while OceanBase Database V4.3.3 spent 90.91s, outperforming ClickHouse by 54%. In the two hot run scenarios, ClickHouse took 44.05s and 36.63s, respectively, while OceanBase Database improved on those times by 26% (34.92s) and 6% (34.08s). These figures indicate a clear performance advantage for OceanBase Database across both cold and hot runs.
+
+![1730172553](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-10/d6296341-cd71-4d29-bc42-ac72a1c76baf.png)
+
+
+
+### **2.2 Scenario-specific parameter initialization**
+
+As an integrated database, OceanBase Database supports multiple business types, including express OLTP, complex OLTP, OLAP, HTAP, and KV. Default system settings may not suit all of these scenarios; for example, the recommended I/O read method varies by business scenario. Therefore, OceanBase Database provides recommended settings of key parameters for different business types through cloud platforms and OceanBase Cloud Platform (OCP), in order to achieve optimal out-of-the-box performance.
+
+
+
+**3. Enhanced AP Stability**
+--------------
+
+When an SQL query involves a large amount of data, memory may be insufficient, and the temporary intermediate results of some operators must be materialized to disk. The execution of the SQL query fails if the disk space is fully occupied by the materialized data. OceanBase Database therefore supports compressing the temporary results of SQL queries. This feature effectively reduces the disk space occupied temporarily, enabling query tasks with heavier computing workloads.
+
+
+
+**4. Vector Search**
+----------
+
+The development and popularization of AI applications have triggered explosive growth in unstructured data such as images, videos, and texts. With embedding algorithms, unstructured data can be represented as high-dimensional vectors for analysis and processing. Vector databases emerged during this process. A vector database is a fully managed solution for processing unstructured data, which is used for storing, indexing, and retrieving embedding vectors. **Vector indexes are an essential capability of a vector database. 
They turn deterministic, keyword-based searches into vectorized similarity searches, meeting the requirements for retrieving large-scale, high-dimensional vectors.**
+
+OceanBase Database in MySQL mode supports vector type storage, vector indexes, and embedding vector retrieval. It supports the storage of float vectors with at most 16,000 dimensions; basic operations such as addition, subtraction, multiplication, comparison, and aggregation; and Approximate Nearest Neighbor Search (ANNS), along with Hierarchical Navigable Small World Network (HNSW) indexes for at most 2,000 dimensions. It can be used for Retrieval-Augmented Generation (RAG) and adapts to business scenarios such as image and video retrieval, behavior preference recommendation, security and fraud detection, and ChatGPT-like applications.
+
+Currently, OceanBase Database has integrated application frameworks like LlamaIndex and DB-GPT to allow quick construction of AI-powered applications. Adaptation to other frameworks is being planned as well.
+
+
+
+**5. Full-text Retrieval**
+----------
+
+In relational databases, indexes are often used to accelerate queries based on precise value matching. Common B-tree indexes cannot serve scenarios where a large amount of text data needs to be queried in fuzzy search mode. In this case, a full table scan can only examine data row by row, failing to meet performance requirements for large volumes of text. On top of this, SQL rewriting also fails to support queries in complex scenarios such as approximate matching and relevance ranking.
+
+**To address these issues, OceanBase Database supports the full-text index feature. This feature allows users to preprocess text content and create keyword-based indexes to effectively improve full-text retrieval efficiency.** MySQL-compatible full-text retrieval is now supported and will be extended with more complex retrieval logic and higher performance.
+
+
+
+**6. Reliability Improvement**
+-----------
+
+OceanBase Database V4.3.3 supports tenant cloning. Users can quickly clone a specified primary or standby tenant by executing an SQL statement in the sys tenant. After a tenant cloning job is completed, the created tenant is a standby tenant, which users can convert into a primary tenant to provide services. The new tenant and the source tenant share physical macroblocks in the initial state, but new data changes and resource usage are isolated between the tenants. Users can run resource-intensive temporary data analysis or other high-risk operations on a cloned tenant to avoid putting the online tenant at risk. In addition, users can also clone a tenant for disaster recovery: when irrecoverable misoperations are performed in the source tenant, they can use the new tenant for data rollback.
+
+**In addition, OceanBase Database V4.3.3 provides a quick restore feature.** In earlier versions of OceanBase Database, physical restore restores the full data. A physical restore is completed only after all the data (minor compaction data and baseline data) and logs are restored; only then can users log in to and use the restored tenant. If a large amount of data is to be restored to a tenant, the restore takes a long time, and users need to reserve sufficient disk space for the tenant from the very beginning to ensure a successful restore. 
In some scenarios, a tenant is restored only for query and verification purposes and will be destroyed later. If only a few tablets are involved in the queries, a full restore is too costly and wastes storage space, time, and network bandwidth.
+
+The quick restore feature in the new version allows users to provide read services by restoring only logs, rather than data, to the local server. In addition, the data backup feature allows users to build an intermediate-layer index for a backup SSTable based on the backup address. With this index, OBServer nodes can randomly read data from the backup SSTable as if reading local data.
+
+
+
+**7. Summary**
+----------
+
+The GA version V4.3.3 is an important breakthrough for OceanBase Database in real-time analysis and AP scenarios, signifying a stride toward a modern database architecture. **In later 4.3.x versions, we will keep optimizing and enhancing AP features and building integrated product capabilities to meet diversified needs in different business scenarios.**
+
+We'd like to extend thanks to every user and developer who has provided support and made contributions to OceanBase Database V4.3.3. Your feedback and suggestions give impetus to our product upgrades and help us thrive on challenges. We hope to join hands with users to build a more efficient and powerful distributed database product in the future.
+
+You can read [**What's New**](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001783557) to learn more about the general availability of OceanBase Database V4.3.3.
\ No newline at end of file
diff --git a/docs/blogs/tech/small-specification-deployment.md b/docs/blogs/tech/small-specification-deployment.md
new file mode 100644
index 000000000..4c1c89866
--- /dev/null
+++ b/docs/blogs/tech/small-specification-deployment.md
@@ -0,0 +1,212 @@
+---
+slug: small-specification-deployment
+title: 'Insights into OceanBase Database 4.0: Support for Small-Specification Deployment to Make Distributed Databases More Accessible'
+---
+
+
+> **Author | Zhao Yuzhong, a senior technical expert of OceanBase.** He joined Alipay in 2010 to help with the R&D of the distributed transaction framework, and has engaged in the R&D of storage engines as an OceanBaser since 2013.
+
+
+
+With the emergence of more scenarios and the growth of data volume in recent years, distributed databases have rapidly spread across a variety of sectors, providing great solutions for data-intensive and high-concurrency applications with technical capabilities such as data consistency, high availability, and elastic scaling. A distributed database is often deployed on multiple servers to ensure high availability and performance. Therefore, to handle the small-scale, simple scenarios of their early business, users tend to deploy a centralized database, which costs less and exhibits higher performance under small specifications. The problem is that, sooner or later, the centralized database will become a bottleneck as the business grows, and adjusting or restructuring the database architecture by then can be extremely challenging and costly.
+
+
+
+OceanBase Database Community Edition V4.0 was released at the Apsara Conference 2022. It is the industry's first MySQL-compatible integrated database that supports both standalone and distributed deployment modes. This version provides many much-expected capabilities, such as enhanced online analytical processing (OLAP) capabilities. 
Featuring an integrated architecture, it can be deployed in standalone mode with a few clicks and can stably run in a production system with small hardware specifications, such as 4 CPU cores and 16 GB of memory (4C16G). This reduces deployment costs and improves usability. We hope that the dual technical advantages of the integrated architecture can bring lasting benefits to database users.
+
+
+
+According to user feedback, people are highly interested in the integrated architecture of OceanBase Database Community Edition V4.0 and its support for small-specification deployment. We believe that small-specification deployment is not only about providing all necessary features in standalone mode; more importantly, it should deliver higher performance on the same hardware configuration. In this article, we will share our thoughts on small-sized distributed databases, and our innovative ideas and solutions for the integrated architecture that supports both standalone and distributed deployment, from the following three perspectives:
+
+
+
+* Reasons for choosing a small-specification distributed database
+* Key techniques for small-specification deployment
+* Performance of OceanBase Database with small specifications
+
+
+
+
+
+## 1. Reasons for Choosing a Small-specification Distributed Database
+----------------
+
+Over the past decade or so since its founding in 2010, OceanBase has broken the world records in the TPC-C and TPC-H tests, empowered the Double 11 shopping festival every year, and ensured that every transaction was executed safely and efficiently. Pushing through all kinds of challenges, OceanBase Database, as a fully self-developed native distributed database, has proved its scalability and stability. From OceanBase Database V2.2, which topped the TPC-C ranking for the first time with 203 Elastic Compute Service (ECS) servers, to a later version that took the crown again with 1,554 ECS servers, the performance of OceanBase Database rose linearly with the number of servers.
+
+
+
+On the other hand, as OceanBase Database caught the attention of industries other than the financial sector, we realized that not all users face data volumes comparable to those of Double 11. In fact, standalone databases are enough to tick all the boxes for many users in the early days of their business, when the data volume is rather small. Therefore, it is a great help to provide minimal database specifications for users to begin with. In this way, users can get started at a very low cost. Also, with the great scalability of OceanBase Database, users can flexibly scale out their database systems later to cope with increasing data volumes and performance requirements.
+
+
+
+### 1.1 From small to large: Basic database requirements in a business that grows
+
+
+
+The latest OceanBase Database V4.0 supports a minimum deployment specification of 4C8G. That is just the typical configuration of a nice laptop. In other words, OceanBase Database V4.0 can be deployed and stably run on a personal computer.
+
+
+
+As user business grows, OceanBase Database V4.0 can be scaled out to support changing needs over the entire lifecycle of the business, helping users find better solutions for cost reduction, efficiency improvement, and business innovation.
+
+
+
+* In its early days, a business handles small amounts of data and has few requirements for disaster recovery. 
The user can deploy and run OceanBase Database V4.0 on a single server and perform cold backups regularly to protect its data system from possible disasters.
* As its business grows, the user can vertically scale up the specifications of the existing server. To meet its requirements for disaster recovery, the user can add another server to build a primary/standby architecture, which provides an online disaster recovery capability. (Manual intervention is still required during disaster recovery due to the limits of the primary/standby architecture.)
* When its business expands to a certain size and data becomes more important, the user can simply upgrade to the three-replica architecture, which ensures high availability with three servers and supports automatic disaster recovery. When a server fails, the three-replica architecture of OceanBase Database V4.0 guarantees business recovery within 8s and zero data loss. In other words, the recovery time objective (RTO) is less than 8s and the recovery point objective (RPO) is 0.
* When user business experiences even greater growth and each server has been upgraded to the highest configuration, the user has to deal with the same "happy trouble" that Taobao and Alipay once faced. In this case, the transparent distributed scalability of OceanBase Database allows the user to scale its cluster out from three to six, nine, or even thousands of servers.

![](/img/blogs/tech/small-specification-deployment/image/295d4719-fc9d-4c62-a95e-9d0e23727a70.png)

_Figure 1 Deployment evolution: OceanBase Database vs conventional databases_

### 1.2 Smooth transitions that ensure linear performance improvement

The integrated architecture of OceanBase Database supports a smooth transition from standalone to distributed multi-cluster deployment while keeping performance growth linear.

Thanks to the good vertical scalability of OceanBase Database, upgrading the configuration of the server in standalone mode usually yields a linear performance improvement. When a user scales a distributed cluster from three to six servers, for example, distributed transactions are often introduced, which, in most cases, results in performance loss. However, OceanBase Database reduces the probability of distributed transactions through a variety of mechanisms, such as the TableGroup mechanism that binds multiple tables together and well-designed load balancing strategies.

The good distributed scalability of OceanBase Database also helps maintain linear performance improvement as the number of servers increases. For example, in the TPC-C test, in which about 10% of transactions are distributed, the performance improvement of OceanBase Database remained linear as more nodes were added to the cluster.

![](/img/blogs/tech/small-specification-deployment/image/d6979385-99d3-4ed9-a378-d2a8962e3342.png)

_Figure 2 Performance of OceanBase Database with different numbers of nodes in the TPC-C test_

More importantly, all operations performed in scaling from a standalone OceanBase database to an OceanBase cluster of thousands of nodes are transparent to the business. Users do not need to modify the code of their upper-level business applications or manually migrate their operational data. If you use OceanBase Cloud, you can perform backup, scaling, and O&M operations all on the same platform, which is quite convenient.
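To make the standalone entry point concrete, the sketch below shows what a minimal single-node deployment could look like with OceanBase Deployer (obd). It is illustrative only: the IP address, paths, and resource values are assumptions for a 4C16G host, and the parameter names should be verified against the sample configurations shipped with your obd version.

```bash
# A hypothetical single-node deployment on a 4C16G server via obd.
# All values below (IP, paths, memory sizes) are placeholders to adapt.
cat > mini.yaml <<'EOF'
oceanbase-ce:
  servers:
    - 192.168.1.10
  global:
    home_path: /home/admin/observer
    memory_limit: 12G     # leave headroom for the OS on a 16 GB host
    system_memory: 4G     # memory reserved for internal use
    datafile_size: 50G
EOF
obd cluster deploy mini -c mini.yaml   # distribute binaries and configuration
obd cluster start mini                 # bootstrap and start the cluster
```

Scaling out later uses the same tooling: more servers are added to the configuration, and data is rebalanced without any application changes.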
From the first day of the development of OceanBase Database V4.0, we have been thinking about how to run a distributed database on small-specification hardware while still delivering high performance, so that users benefit from cost-effective high availability in their respective scenarios. OceanBase Database V4.0 not only provides all necessary features in standalone mode but also delivers higher performance with the same hardware configuration.

## 2. Key Techniques for Small-specification Deployment
-------------

In the fundamental software sector, it is very hard to make a database system "large" because the system becomes increasingly vulnerable to failures as more nodes are added to it. In our second TPC-C test, for example, we built an OceanBase cluster of 1,554 ECS servers. In such a cluster, a single-server failure occurs about once every day or two. The point is that we have to make the product sufficiently stable and highly available to keep such a jumbo-sized cluster up and running.

It is equally hard to make a database system "small" because it requires getting down to every detail, much like arranging the usage of every slice of resources under a microscope. Not only that, designs or configurations that are perfectly reasonable in a large system may be totally unacceptable in a smaller one. What's more challenging is that we must make the system suitable for both large and small hardware specifications. This requires us to weigh large specifications against small ones when designing the database system, so as to minimize the additional overhead of the distributed architecture while allowing the database system to respond adaptively to hardware specifications in many scenarios. Now, let's talk about the technical solution of OceanBase Database V4.0 by taking the usage of CPU and memory, the two major challenges, as an example.

### 2.1 Reducing CPU utilization through dynamic control of log streams

To build a small database, OceanBase Database V4.0 needs to control CPU utilization in the first place. In versions earlier than V4.0, OceanBase Database would generate a Paxos log stream for each partition of a data table to ensure data consistency among multiple replicas based on the Paxos protocol. This is a very flexible design because Paxos groups are based on partitions, which means that partitions can be migrated between servers. However, this design puts a heavy workload on the CPU because each Paxos log stream incurs overhead for leader election, heartbeats, and log synchronization. Such additional overhead occupies a moderate percentage of the CPU resources when servers have large specifications or the number of partitions is small, but it becomes an unbearable burden on small-specification servers.

How do we solve that issue in OceanBase Database V4.0? The approach is straightforward: reduce the number of Paxos log streams. If we can reduce the number of Paxos log streams to match the number of servers, the overhead for Paxos log streams is roughly equal to that for logs in a conventional database in primary/standby mode.

![](/img/blogs/tech/small-specification-deployment/image/5830c47b-96eb-4303-9398-8f1b080610a4.png)

_Figure 3 Dynamic log streams of a cluster based on OceanBase Database V4.0_

OceanBase Database V4.0 generates one Paxos log stream for multiple data table partitions and dynamically controls the log streams.
As shown in the figure above, the database cluster consists of three zones, and each zone has two servers deployed. Assume that two resource units are configured for a tenant. In this case, two Paxos log streams are generated for the tenant, with one containing partitions P1, P2, P3, and P4 and the other containing partitions P5 and P6.

* When the load is unbalanced between the two servers, the load balancing module of OceanBase Database migrates partitions between the Paxos log streams.

* To scale out the cluster, a user can split one Paxos log stream into multiple Paxos log streams and migrate them as a whole.

* To scale in the cluster, the user can migrate multiple Paxos log streams and merge the streams.

With dynamic log stream control, OceanBase Database V4.0 greatly reduces the CPU overhead of the distributed architecture while guaranteeing high availability and flexible scaling.

### 2.2 Achieving high concurrency with a small memory space through dynamic metadata loading

The second challenge that OceanBase Database V4.0 needs to tackle in building a small database is optimizing memory usage. For the sake of performance, versions of OceanBase Database earlier than V4.0 stored some metadata in memory. The memory usage of this metadata was not high if the total memory size was large, but it was unacceptable for a small-specification server. To support ultimate performance at small specifications, we have implemented dynamic loading of all metadata in OceanBase Database V4.0.

![](/img/blogs/tech/small-specification-deployment/image/2573623f-47c0-4293-b209-1d02abac7360.png)

_Figure 4 SSTable hierarchical storage_

As shown in the figure above, we store an SSTable in a hierarchical structure. To be specific, we store the microblocks of the SSTable in partitions and maintain only the handle of the partitions in memory. The requested data is dynamically loaded through a KV cache only when the partitions need to be accessed. In this way, OceanBase Database V4.0 is capable of processing highly concurrent requests on massive amounts of data with a small memory size.

## 3. Performance of OceanBase Database with Small Specifications
------------

To test the actual performance of OceanBase Database with small specifications, we deployed OceanBase Database Community Edition V4.0 in 1:1:1 mode on three 4C16G servers and compared its performance with that of RDS for MySQL 8.0, which was also deployed on 4C16G servers. The comparison was performed using Sysbench, and the results show that OceanBase Database Community Edition V4.0 outperforms RDS for MySQL 8.0 in most data processing scenarios. In particular, under the same hardware specifications, OceanBase Database Community Edition V4.0 handles a throughput 1.9 times that of RDS for MySQL 8.0 in INSERT and UPDATE operations.

![](/img/blogs/tech/small-specification-deployment/image/74429483-aee7-4cb6-9cdf-0f5c43646d01.png)

_Figure 5 Throughput performance test results of OceanBase Database Community Edition V4.0 and RDS for MySQL 8.0 on Sysbench (4C16G)_

We also compared the two at the specifications of 8C32G, 16C64G, and 32C128G, which are most popular among users. As the server specifications increase, the performance gap between OceanBase Database Community Edition V4.0 and RDS for MySQL 8.0 widens. At 32C128G specifications, OceanBase Database Community Edition V4.0 achieves a throughput 4.3 times that of RDS for MySQL 8.0 with 75% less response time.
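For readers who want to reproduce this kind of comparison, the sketch below shows a typical Sysbench invocation against an OceanBase MySQL tenant. The flags are illustrative assumptions (host, default port 2881, tenant user `root@test`, table counts, and durations), not the exact parameters of the test above.

```bash
# Illustrative Sysbench INSERT workload against an OceanBase MySQL tenant.
# Adjust host, port, user, and sizes to your deployment before running.
sysbench oltp_insert \
  --mysql-host=127.0.0.1 --mysql-port=2881 \
  --mysql-user=root@test --mysql-db=sbtest \
  --tables=30 --table-size=1000000 \
  --threads=64 --time=300 \
  prepare   # create and load the test tables

sysbench oltp_insert \
  --mysql-host=127.0.0.1 --mysql-port=2881 \
  --mysql-user=root@test --mysql-db=sbtest \
  --tables=30 --table-size=1000000 \
  --threads=64 --time=300 \
  run       # report throughput and latency
```

Swapping `oltp_insert` for other built-in Sysbench scripts, such as `oltp_update_index` or `oltp_read_write`, covers the other scenarios compared in the figures below.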
+ + + +![](/img/blogs/tech/small-specification-deployment/image/b4324d1e-6bd1-48cb-9ccc-c100cca6aae6.png) + +_Figure 6 Throughput performance test results of OceanBase Database Community Edition V4.0 and RDS for MySQL 8.0 on Sysbench_ + + + +![](/img/blogs/tech/small-specification-deployment/image/ca4e4ba7-3312-470a-9176-080ae7005b77.png) + +_Table 1 Performance (throughput and response time) test results of OceanBase Database Community Edition V4.0 and RDS for MySQL 8.0 on Sysbench_ + + + + + +Afterword +---- + +OceanBase Database has achieved ultimate performance in the TPC-C test with a massive cluster of more than a thousand servers, and ultimate resource usage in standalone performance tests at small specifications, such as 4C16G. What's behind those achievements is our unshakable faith in our mission to make data management and use easier. Streamlining services for customers with every effort is the motto of every OceanBase engineer. Growing fast, OceanBase Database is not yet perfect. We still have a lot to do to optimize its performance with higher specifications and save more resources in a database with even smaller specifications. OceanBase Database Community Edition V4.0 is now available and we are looking forward to working with all users to build a general database system that is easier to use. + + + +Follow us in the [OceanBase community](https://open.oceanbase.com/blog). We aspire to regularly contribute technical information so we can all move forward together. + +Search 🔍 DingTalk group 33254054 or scan the QR code below to join the OceanBase technical Q&A group. You can find answers to all your technical questions there. + + + +![](https://gw.alipayobjects.com/zos/oceanbase/f4d95b17-3494-4004-8295-09ab4e649b68/image/2022-08-29/00ff7894-c260-446d-939d-f98aa6648760.png) \ No newline at end of file diff --git a/docs/blogs/tech/tablet.md b/docs/blogs/tech/tablet.md new file mode 100644 index 000000000..bd78496b7 --- /dev/null +++ b/docs/blogs/tech/tablet.md @@ -0,0 +1,26 @@ +--- +slug: tablet +title: 'What Is a Tablet in OceanBase Database V4.0? Why Is It Introduced?' +--- + +**OceanBase Database V4.0 introduces the concept of tablets. So, what is a tablet?** + +You may have heard about this concept if you are familiar with the field of storage and databases. This concept can be traced back to the age when Google launched Bigtable. It simply refers to a part of data rows in a table, and the definition is still in use to this day. Actually, tablets can be found in the open source code as far back as OceanBase Database V0.4. Similar to Bigtable, a table in OceanBase Database V0.4 can be automatically split into multiple tablets. However, tablets were replaced with user-defined partitions in later versions of OceanBase Database, such as OceanBase Database Community Edition V3.1, for compatibility with traditional databases. + + + +**Why does OceanBase Database reintroduce this concept in V4.0?** + +This is because we want to distinguish partitions from tablets to improve flexibility. In OceanBase Database, partitions are a logical concept visible to users. For example, users can decide whether to use the HASH or RANGE partitioning method and the number of partitions. Unlike partitions, tablets are a physical concept. Generally, a non-partitioned table without indexes has only one partition, which corresponds to a tablet. The partition is identified by a unique ID in the table, and the tablet is identified by a unique ID in the tenant. 
A mapping exists between the partition ID and tablet ID to facilitate SQL routing. The mapping may change in some situations, for example, when partitions are exchanged. (The partition exchange feature is not supported yet.)

**How do partitions and tablets map to each other?**

A partition in a table may contain one or more tablets, and the number of tablets is determined by the number of local indexes and whether the table contains large object (LOB) columns. A tablet is added to the partition each time a local index is created. If the table contains LOB columns, two more tablets are added.

**Summary**

A partition is a logical unit that is visible to users and is the minimum unit for load balancing in OceanBase Database. Partitions mainly interact with the SQL layer. A tablet is a physical unit that is invisible to users and is the minimum unit for compactions in OceanBase Database. Tablets mainly interact with the storage layer.
\ No newline at end of file
diff --git a/docs/blogs/tech/troubleshoot.md b/docs/blogs/tech/troubleshoot.md
new file mode 100644
index 000000000..0ab41ce92
--- /dev/null
+++ b/docs/blogs/tech/troubleshoot.md
@@ -0,0 +1,148 @@
---
slug: troubleshoot
title: 'Five Steps to Troubleshoot Process Crashes Based on Logs'
---

> About the author: Hu Chengqing, a database administrator (DBA) at Action Technology, specializes in fault analysis and performance optimization. For further discussion, subscribe to his blogs on [Jianshu](https://www.jianshu.com/u/a95ec11f67a8).
> This article is original content from the open source community of Action Technology. Unauthorized use is prohibited. For reposts, please contact the editor and cite the source.
> It will take you about 5 minutes to read the following content.

Background
----

Crashes of the observer process are hard to diagnose. They are typically caused by program bugs, corrupt files, bad disk sectors, or bad memory blocks.

Core dump generation is configured automatically during cluster deployment to capture memory information in the event of a process crash. A core dump file contains a snapshot of the program status at failure and the stack information of all threads, both of which are useful in debugging and crash analysis.

Sometimes, however, the core dump file fails to be generated. In such cases, we must obtain stack information from `observer.log` to pinpoint the crash location in the code and identify the cause. That method is the subject of this article.

_This method applies to all versions of OceanBase Database as of the article's publication._

Procedure
----

### 1\. Find the crash logs

Upon a crash, the observer process generates a log section similar to the following one. You only need to search for the **CRASH ERROR** keyword.

```
 CRASH ERROR!!! sig=11, sig_code=2, \
 sig_addr=7f3edd31dffb, timestamp=1725496052323606, \
 tid=57605, tname=TNT_L0_1002, \
 trace_id=20970917872454-1707004480400037, \
 extra_info=((null)), lbt=0x9baead8 \
 0x9b9f358 0x7f43d58e562f \
 0x7f43d52525fc 0x95eeda9 \
 0x95ec568 0x95e6c0c \
 0x95e4c33 0x9cbf4c7 \
 0x93be9ee 0x939e320 \
 0x93bd64e 0x939c105 \
 0x939c6e6 0x2cff1c1 \
 0x9918a74 0x9917461 0x9913f1e
```

### 2\. Obtain the stack information of the crashed threads

Parse the memory addresses to get the stack information; each memory address corresponds to one stack frame.
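As a convenience, the `lbt` addresses can also be pulled out of `observer.log` automatically. The one-liner below is a sketch: the log and binary paths are placeholders to adapt to your deployment.

```bash
# Extract the addresses from the first CRASH ERROR record in the log and
# symbolize them in one pass (both paths are placeholders).
grep -m 1 -A 20 'CRASH ERROR' /home/admin/oceanbase/log/observer.log \
  | grep -oE '0x[0-9a-f]+' \
  | xargs addr2line -pCfe /home/admin/oceanbase/bin/observer
```

Passing the addresses explicitly works just as well: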
+``` + addr2line -pCfe /home/admin/oceanbase/bin/observer \ + 0x9baead8 0x9b9f358 0x7f43d58e562f 0x7f43d52525fc \ + 0x95eeda9 0x95ec568 0x95e6c0c 0x95e4c33 0x9cbf4c7 \ + 0x93be9ee 0x939e320 0x93bd64e 0x939c105 0x939c6e6 \ + 0x2cff1c1 0x9918a74 0x9917461 0x9913f1e +``` + + +The output is as follows: + +* Check the stack information from the top down, and ignore the first four lines, which are the fixed stack for processing crashes. +* The crash occurs at line 5, in the `ObMPStmtExecute::copy_or_convert_str` function. + +``` + safe_backtrace at ??:? + oceanbase::common::coredump_cb(int, siginfo_t*) at ??:? + ?? ??:0 + ?? ??:0 + oceanbase::observer::ObMPStmtExecute::copy_or_convert_str(oceanbase::common::ObIAllocator&, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, oceanbase::common::ObString const&, oceanbase::common::ObString&, long) at ??:? + oceanbase::observer::ObMPStmtExecute::parse_basic_param_value(oceanbase::common::ObIAllocator&, unsigned int, oceanbase::common::ObCharsetType, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, char const*&, oceanbase::common::ObTimeZoneInfo const*, oceanbase::common::ObObj&) at ??:? + oceanbase::observer::ObMPStmtExecute::parse_param_value(oceanbase::common::ObIAllocator&, unsigned int, oceanbase::common::ObCharsetType, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, char const*&, oceanbase::common::ObTimeZoneInfo const*, oceanbase::sql::TypeInfo*, oceanbase::sql::TypeInfo*, oceanbase::common::ObObjParam&, short) at ??:? + oceanbase::observer::ObMPStmtExecute::before_process() at ??:? + oceanbase::rpc::frame::ObReqProcessor::run() at ??:? + oceanbase::omt::ObWorkerProcessor::process_one(oceanbase::rpc::ObRequest&, int&) at ??:? + oceanbase::omt::ObWorkerProcessor::process(oceanbase::rpc::ObRequest&) at ??:? + oceanbase::omt::ObThWorker::process_request(oceanbase::rpc::ObRequest&) at ??:? + oceanbase::omt::ObThWorker::worker(long&, long&, int&) at ??:? + non-virtual thunk to oceanbase::omt::ObThWorker::run(long) at ??:? + oceanbase::lib::CoKThreadTemp >::start()::{lambda()#1}::operator()() const at ??:? + oceanbase::lib::CoSetSched::Worker::run() at ??:? + oceanbase::lib::CoRoutine::__start(boost::context::detail::transfer_t) at ??:? + trampoline at safe_snprintf.c:? +``` + +### 3\. Locate the line of code where the crash occurs + +To locate the last line of code executed within the `ObMPStmtExecute::copy_or_convert_str` function, use GNU Debugger (GDB) 9.0 or later on the debug version to parse the memory addresses. + +``` + ## Download the debug package of the corresponding version. If you are using an enterprise version, contact OceanBase Technical Support. + https://mirrors.aliyun.com/oceanbase/community/stable/el/7/x86_64/ + + ## Install the debug package. + rpm2cpio oceanbase-ce-debuginfo-3.1.5-100010012023060910.el7.x86_64.rpm |cpio -div + + ## Use GDB to open the binary file. + gdb ./usr/lib/debug/home/admin/oceanbase/bin/observer.debug + + ## Parse the memory addresses. + (gdb) list *0x95eeda9 + 0x95eeda9 is in oceanbase::observer::ObMPStmtExecute::copy_or_convert_str(oceanbase::common::ObIAllocator&, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, oceanbase::common::ObString const&, oceanbase::common::ObString&, long) (./src/observer/mysql/obmp_stmt_execute.cpp:1428). 
+ (gdb) list *0x95ec568 + 0x95ec568 is in oceanbase::observer::ObMPStmtExecute::parse_basic_param_value(oceanbase::common::ObIAllocator&, unsigned int, oceanbase::common::ObCharsetType, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, char const*&, oceanbase::common::ObTimeZoneInfo const*, oceanbase::common::ObObj&) (./src/observer/mysql/obmp_stmt_execute.cpp:1237). + (gdb) list *0x95e6c0c + 0x95e6c0c is in oceanbase::observer::ObMPStmtExecute::parse_param_value(oceanbase::common::ObIAllocator&, unsigned int, oceanbase::common::ObCharsetType, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, char const*&, oceanbase::common::ObTimeZoneInfo const*, oceanbase::sql::TypeInfo*, oceanbase::sql::TypeInfo*, oceanbase::common::ObObjParam&, short) (./src/observer/mysql/obmp_stmt_execute.cpp:1372). + (gdb) list *0x95e4c33 + 0x95e4c33 is in oceanbase::observer::ObMPStmtExecute::before_process() (./src/observer/mysql/obmp_stmt_execute.cpp:512). + 507 in ./src/observer/mysql/obmp_stmt_execute.cpp +``` + + +Additional information: + +The call stack in this case is as follows: +``` + ... + ->ObMPStmtExecute::before_process() + -->ObMPStmtExecute::parse_param_value(oceanbase::common::ObIAllocator&, unsigned int, oceanbase::common::ObCharsetType, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, char const*&, oceanbase::common::ObTimeZoneInfo const*, oceanbase::sql::TypeInfo*, oceanbase::sql::TypeInfo*, oceanbase::common::ObObjParam&, short) + --->ObMPStmtExecute::parse_basic_param_value(oceanbase::common::ObIAllocator&, unsigned int, oceanbase::common::ObCharsetType, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, char const*&, oceanbase::common::ObTimeZoneInfo const*, oceanbase::common::ObObj&) + ---->ObMPStmtExecute::copy_or_convert_str(oceanbase::common::ObIAllocator&, oceanbase::common::ObCollationType, oceanbase::common::ObCollationType, oceanbase::common::ObString const&, oceanbase::common::ObString&, long) +``` + +### 4\. Analyze the code + +The crash occurs within the `ObMPStmtExecute::copy_or_convert_str` function at **obmp\_stmt\_execute.cpp:1428**. + +![Line 1428](http://action-weikai.oss-accelerate.aliyuncs.com/20241022/filename.png) + +#### Purpose of the function + +The `ObMPStmtExecute::copy_or_convert_str` function copies or converts the string specified by `src`, a request parameter from the statement protocol, based on the specified character set, and stores the result in `out`. `sig=11` in the crash information refers to signal 11, which indicates that the program accessed an invalid memory address. This is usually because a null pointer is used or the accessed memory is already released. + +The crash occurs at `MEMCPY(buf + extra_buf_len, src.ptr(), src.length());`, where the `MEMCPY` function copies the source string to the allocated memory. + +* **buf + extra\_buf\_len**: the target address, which is the offset of the buffer pointer `buf` plus `extra_buf_len` +* **src.ptr()**: the pointer to the source string +* **src.length()**: the length of the source string, which specifies the number of bytes to be copied + +Here, we can conclude that `src.ptr()` is a null pointer. If a core dump file is available, all you need to do for confirmation is to print the pointer variable by using GDB. + +### 5\. 
Search the knowledge base

Search the official knowledge base for the name of the crashed function, which is **copy\_or\_convert\_str** in this case, and find the corresponding [bug](https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000430545?back=kb).

_The code snippet where the crash occurs matches the bug description: When the `execute` protocol is processed, the `send long data` protocol has not finished handling `param_data`, causing the `execute` protocol to read a null pointer during the conversion of `param_data` and triggering a crash._

Conclusion
--

In most cases, you can analyze logs by following the preceding five steps to quickly identify the cause of an observer process crash. I hope you find this article useful.
\ No newline at end of file
diff --git a/docs/blogs/users/1st-financial.md b/docs/blogs/users/1st-financial.md
new file mode 100644
index 000000000..d07c9410f
--- /dev/null
+++ b/docs/blogs/users/1st-financial.md
@@ -0,0 +1,75 @@
---
slug: 1st-financial
title: 'OceanBase Database: The No.1 Distributed Database in the Financial Industry'
tags:
  - User Case
---

On December 31, the China Center for Information Industry Development (CCID) released its 2024 China Financial Industry Database Market Research Report. In the report, database vendors in the financial sector were rated in various dimensions, such as product mix, technological approaches, typical projects, and competencies. OceanBase was ranked as **the leader holding the largest share of the distributed database market**. In the **banking, insurance, and securities** submarkets, OceanBase also secured the top position with the highest overall score, making it the only distributed database vendor positioned in the "Leaders" quadrant in all three submarkets.

![1735626914](/img/blogs/users/1st-financial/image/62cd95e6-e65b-43ef-9e12-5ebc7697cea7.png)

![1735626924](/img/blogs/users/1st-financial/image/094b2a62-9360-41a5-8e12-ba6569872e98.png)

![1735626939](/img/blogs/users/1st-financial/image/a689f8d6-10ee-4e7b-b46f-825ca91a235b.png)

![1735626949](/img/blogs/users/1st-financial/image/88b37a87-e4b0-4328-b0e8-3f2f681ae2d9.png)

As a core part of foundational software, databases have always been a focal point in building information systems in the financial industry. In 2023, China's financial database management system market maintained its rapid growth, reaching a value of CNY 6.79 billion, with a growth rate of 17.3%. It is estimated that by 2026, the market size will expand to CNY 11.26 billion.

In recent years, the distributed database market has been growing at an accelerated pace, driven by factors such as maturing distributed database technology and increasingly stringent data security and compliance requirements. The report highlights that **many financial institutions are upgrading their databases to distributed architectures to enhance database processing capabilities and scalability for large-scale business systems**.

**1. Distributed Databases: A Must-Have for Core Financial Systems**
--------------------------

As the technology matures, distributed databases have become the go-to choice for financial institutions due to their advantages in flexibility, scalability, high availability, and security. Their wide adoption in the financial industry has proven their value.
Similarly, industries with high business complexity and heavy reliance on databases, such as telecommunications and transportation, are also accelerating their transformation, adopting distributed databases as a key technology for core system upgrades.

OceanBase Database, a native distributed database designed to handle Alipay's traffic peaks during the Double 11 shopping festival, has been honed over a decade in financial scenarios. Thanks to in-house technological innovations, OceanBase Database has achieved critical breakthroughs in auto-scaling, high availability, and multi-active disaster recovery, delivering exceptional performance to meet the digital needs of various financial institutions in different business scenarios.

In CCID's 2023 Core System Database Upgrade Selection Reference, OceanBase Database earned the highest score of 36.5, **ranking first for future core system selections**. Additionally, in its 2024 Practice Guide for Key Business System Database Upgrade, CCID recognized the native distributed architecture, represented by OceanBase Database, as the optimal path for key business system upgrades, and OceanBase Database gained a score of 4.15, **the highest in the comprehensive evaluation of upgrade paths**.

![1735627125](/img/blogs/users/1st-financial/image/05541494-edc7-46ec-88f4-4d791de0f609.png)

According to International Data Corporation (IDC), a globally authoritative IT market research and consulting firm, **China's distributed transactional database market is growing rapidly**. Distributed transactional databases are expected to account for one-fifth of the relational database market.

**2. Deep Expertise in Core Financial Scenarios: Safeguarding Key Business Systems**
------------------------

CCID identifies banking, insurance, and securities as the "three pillars" of the financial industry. They represent major application scenarios of databases. The CCID report notes that OceanBase Database already leads distributed database sales in China's financial industry. OceanBase has secured its leading position in the banking, insurance, and securities submarkets, outperforming other vendors, and is the only distributed database vendor positioned in the "Leaders" quadrant in all three submarkets.

Notably, an IDC report released in July 2024 indicates that, in the financial industry, the market size of distributed transactional databases approached USD 200 million in 2023. **OceanBase Database claimed a 17.1% market share, ranking first among independent database vendors. It also secured the top spot in local deployments in both the financial industry overall and the insurance and securities submarkets**.

![1735627272](/img/blogs/users/1st-financial/image/b07ba3cd-ae1a-48cc-bda7-31ad46394940.png)

![1735627281](/img/blogs/users/1st-financial/image/8cba816b-4cb5-40e3-99aa-0db95c8018df.png)

![1735627291](/img/blogs/users/1st-financial/image/ed0b3c6b-3f28-4fb6-a56b-7296ce3bb6e7.png)

To enhance the disaster recovery capabilities of financial institutions, OceanBase Database pioneered the first disaster recovery architecture that consists of five internet data centers across three regions, setting a new standard for automatic lossless disaster recovery in case of city-wide failures. It achieves a recovery point objective (RPO) of 0 seconds and a recovery time objective (RTO) of less than 8 seconds, ushering the industry into the era of second-level fault recovery.
Additionally, OceanBase Database ensures data security by adopting end-to-end encryption and supports Chinese ShangMi (SM) encryption algorithms. + +To handle key business workloads, OceanBase Database V4.2.5 was released in 2024. This long-term support (LTS) version further enhances its online transaction processing (OLTP) performance in terms of high read/write performance, compatibility, and stability, as required by core systems. The test results show a 26% improvement in TP performance compared to the previous version. With the release of OceanBase Database V4.3.3, a General Availability (GA) version, OceanBase Database expanded its analytical processing (AP) capabilities on top of TP to meet real-time analytical needs. + +So far, OceanBase Database serves all policy banks, five out of six state-owned major banks, over 20 banks with assets exceeding CNY 1 trillion, and nearly 100 banks with assets exceeding CNY 100 billion in China. In addition, 75% of top-tier securities firms, 70% of top-tier insurance companies, and 50% of top-tier fund companies have chosen OceanBase Database for their key business system upgrades, aiming to provide high-quality data services. Here, the top 20 in a sector are referred to as the top tier. + + + +●   **Bank of Communications** (BCM) partnered with OceanBase Database in 2022 to tackle key business system challenges and, being **the first in China**, succeeded in upgrading its core credit card system from a mainframe to a home-grown distributed architecture. Dozens of BCM's core business systems, such as its ECIF, debit card hosts, and accounting systems, were then migrated to OceanBase Database. After the migration, the system handles six times more transactions per second (TPS), with batch processing efficiency improved by seven times. The system now handles over 1 billion transactions every day. The distributed solution enables the bank to handle peak traffic with ease while improving O&M efficiency. Compared to the mainframe solution, the distributed solution saves over 15,000 MIPS resources, translating to about CNY 700 million. + +●   In the securities industry, OceanBase Database supported **Guotai Junan Securities** in upgrading over ten core business systems, including the institutional trading system, user system, account system, and clearing system. The company now runs dozens of systems on a 15-server cluster in an architecture comprising three internet data centers across two regions with greatly improved O&M efficiency. The resource utilization has more than doubled, and hardware costs have been substantially reduced. + +●   Based on reliability and cost-effectiveness considerations, **Sunshine Insurance Group** (SIG) chose OceanBase Database for its database upgrade, covering multiple core business systems, such as the property insurance, life insurance, and asset management systems. So far, SIG has built over 20 OceanBase clusters and replaced nearly 400 database instances for more than 200 business systems, slashing hardware resource costs by over 50% with the O&M workload being significantly relieved. + +●   In the second half of 2024, OceanBase Database helped **Ping An Fund** build a new-generation transfer agent (TA) system and has fully supported all its business systems, such as its self-built, listed open-ended fund (LOF), exchange-traded fund (ETF), and separately managed account systems. Running on the new database, the business systems have experienced higher operational efficiency. 
In particular, clearing performance has increased by four times, reducing the time required for daily business clearing from 2 hours to under 30 minutes.

Today, OceanBase serves **over 2,000 customers** and has maintained annual customer growth of **over 100%** for the last four years. It is rapidly expanding its service territory in the financial industry **from the core systems of top-tier companies to those of mid-tier ones**, aiming to help more market players tackle their key business workloads.

Looking ahead, OceanBase will continue to deepen its technological innovations, hone its product capabilities, and assist more financial institutions in building modern data architectures, laying a solid foundation for the high-quality development of the financial industry.
\ No newline at end of file
diff --git a/docs/blogs/users/Gartner.md b/docs/blogs/users/Gartner.md
new file mode 100644
index 000000000..cbf7d7391
--- /dev/null
+++ b/docs/blogs/users/Gartner.md
@@ -0,0 +1,83 @@
---
slug: Gartner
title: 'OceanBase Is Named an "Honorable Mention" Again in the Report *Magic Quadrant™ for Cloud Database Management Systems* Released by Gartner'
tags:
  - User Case
---

Recently, Gartner, a global IT market research and consulting company, released its latest report, *Magic Quadrant™ for Cloud Database Management Systems.* **OceanBase is one of the 10 companies worldwide named an "Honorable Mention" in the report for two consecutive years.** In another Gartner report, *Voice of the Customer for Cloud Database Management Systems*, released in 2024, OceanBase was recognized as a "Customers' Choice" in the Asia-Pacific region and ranked as a "Strong Performer" globally.

**1 OceanBase Cloud is Coming to the Fore, Serving More Than 700 Customers in Just Two Years**
--------------------------------

The global market share of cloud databases is growing. According to *Forecast Analysis: Database Management Systems, Worldwide*, research conducted by Gartner in August 2024, spending on cloud dbPaaS will increase from 61% of the entire database management system (DBMS) market in 2023 to 78% in 2028.

As early as 2022, OceanBase officially launched OceanBase Cloud, its cloud database product, taking a key step in its cloud strategy. With capabilities such as multi-level auto scaling, large-scale cost reduction, hybrid transaction/analytical processing (HTAP) real-time analysis, and multi-infrastructure support, OceanBase Cloud provides customers with integrated cloud database services.

**In 2024, OceanBase Cloud emerged as the second growth curve within OceanBase,** serving more than 700 customers with an annual customer growth rate of up to 130%. **At the same time, OceanBase Cloud has accelerated its global expansion**, covering more than 100 zones in more than 30 geographic regions across the Americas, Europe, and Asia and providing consistent cloud database services to customers worldwide. It also supports the infrastructure of mainstream public clouds, including Alibaba Cloud, Amazon Web Services (AWS), Google Cloud, Huawei Cloud, and Tencent Cloud.

OceanBase Cloud has achieved leapfrog development, expanding globally within two years of its launch. Among the existing clusters of OceanBase Cloud customers, the largest number of CPU cores in a single cluster exceeds 6,400 and the largest amount of data in a single cluster exceeds 1.2 PB. Service specifications ranging from 1C to 104C are available to meet the business requirements of developers and enterprises.
**2 OceanBase Cloud Provides Optimal Solutions to Five Core Scenarios**
-----------------------------

Based on rich industry practices, OceanBase Cloud has developed solutions for five typical enterprise scenarios: **conventional database migration to the cloud, high concurrency, HTAP real-time analysis, multi-model integration, and multi-cloud disaster recovery**. OceanBase Cloud aims to provide an optimal path for customers to migrate their databases to the cloud, simplify the technology stack, reduce costs, and increase efficiency.

![1735210734](/img/blogs/users/Gartner/image/f91087f7-ce8a-4e82-b101-30d29713874a.png)

- **Conventional database migration to the cloud:** Fully compatible with MySQL and Oracle, OceanBase Cloud provides an automatic upgrade solution that integrates data migration, real-time data synchronization, and incremental data subscription. This solution ensures smooth migration of applications in scenarios such as the migration of offline databases to the cloud, replacement of self-managed databases on the cloud, and hybrid database deployment on and off the cloud.

- **High concurrency:** OceanBase Cloud has developed a log-structured merge-tree (LSM-tree) architecture for storage and computing, based on more than ten years of Alipay's experience in the extreme scenarios of "Double 11." This architecture supports multi-level auto scaling to flexibly use cloud resources and provide nearly unlimited processing capabilities. During off-peak hours, quick scale-in significantly reduces resource usage costs.

- **HTAP real-time analysis:** OceanBase Cloud combines online transaction processing (OLTP) and online analytical processing (OLAP) workloads to meet the requirements of both transaction processing and real-time analysis. This year, OceanBase Database V4.3.3, the first General Availability (GA) version of OceanBase Database for real-time analytical processing (AP) scenarios, was released, using columnstore replicas to greatly improve data processing efficiency.

- **Multi-model integration:** In addition to traditional structured data, OceanBase Cloud supports a variety of data types, including JSON, XML, and GIS. It can store massive data in key-value pairs and is compatible with multi-model database systems such as HBase and Redis. OceanBase Database V4.2.5, the long-term support (LTS) version released this year, supports OBKV-Redis and OBKV-HBase to provide more efficient data processing capabilities for key business scenarios.

- **Multi-cloud unified technology stack:** As a native distributed database service, OceanBase Cloud does not rely on specific underlying hardware and is compatible with the infrastructure, such as load balancers and object storage, of major cloud vendors. This shields customers from the instability of single-cloud infrastructure and ensures business stability and continuity.

**3 OceanBase Cloud Builds a Cloud Foundation for Data from Various Industries**
-------------------------------

OceanBase Cloud is widely favored in the **retail, manufacturing, and Internet finance** industries. Many enterprises in these industries use OceanBase Cloud to build their modern data architecture, including leading domestic enterprises such as Ideal Auto, vivo, Haidilao, Didi, Ctrip, and XGIMI Technology, as well as overseas enterprises such as DANA, GCash, and PalmPay.
🚀 **In the retail industry, OceanBase Cloud has become the preferred multi-cloud database service for the top 100 retail enterprises and leading independent software vendors (ISVs) in China.**

In 2022, **Haidilao** adopted OceanBase Cloud for its membership system. After the system ran stably for one year, Haidilao continued to use OceanBase Cloud to upgrade its inventory management databases, achieving a 45% increase in real-time analysis computing power and a 50% reduction in overall database costs. OceanBase Cloud helps Haidilao significantly reduce costs, increase efficiency, and handle traffic peaks with ease.

**POP MART** also uses OceanBase Cloud to build a next-generation distributed system for blind box drawing. The new system reduces scaling time by 90% and ensures system continuity of up to 99.999% in high-concurrency scenarios such as product releases. This enables the system to flexibly handle hundredfold increases in traffic, providing users with a smoother box-drawing experience.

🚀 **In the Internet finance industry, OceanBase Cloud is serving more than 60% of payment customers whose transaction amounts reach CNY 100 billion.**

OceanBase Cloud powered **Haier Consumer Finance** through the upgrade of its core business systems, including the credit core system, accounting engine, active accounting system, passive accounting system, clearing platform, and messaging platform. The upgrade has enabled automated O&M for Haier Consumer Finance, shortened the response time of paging queries from 4-5 seconds to less than 1 second, and saved 85% of the storage space.

**Flyway**, a one-stop cross-border payment service platform, deploys clusters across the China (Hong Kong) and US (Silicon Valley) regions based on OceanBase Cloud to achieve real-time two-way data synchronization, covering multiple core systems such as the risk control and compliance system and the intelligent risk control system. For disaster recovery scenarios, the recovery point objective (RPO) reaches 0, the recovery time objective (RTO) is less than 8 seconds, and the response speed of complex query requests is significantly improved.

In 2023, African payment company **PalmPay** officially launched its core systems powered by OceanBase Cloud, which ran stably as its user base grew by tens of millions. The core accounting database achieved an 86% cost reduction, and the monthly database expenditure decreased by 80%.

🚀 **In the smart manufacturing industry, over 60% of consumer electronics enterprises in China whose transaction amounts reach CNY 100 billion are using OceanBase Cloud.**

Currently, **vivo** has partnered closely with OceanBase Cloud to intelligently upgrade 17 core business modules of the inventory center in its marketing system, improving SQL performance by more than 10 times, increasing the storage compression ratio by 5.7 to 15.3 times, and saving about 80% of storage resources.

In July 2023, **XGIMI Technology** worked with OceanBase Cloud to upgrade and transform more than 40 RDS for MySQL databases for key business fields such as procurement and media resource management. After the upgrade, the RDS for MySQL databases were merged into only two clusters, greatly reducing computing power costs, and the throughput of index-based single-table scans increased to 10 times that of MySQL.
+ +In the future, OceanBase Cloud will further enhance its integrated service capabilities, accelerate service adaptation with cloud vendors, and seamlessly connect more cloud technology stacks to help enterprises build modern data architectures. OceanBase Cloud will also continue exploring new technologies and application scenarios to deliver more innovative solutions to customers. + + + +* * * +``` +Gartner, Magic Quadrant for Cloud Database Management Systems, December 18, 2024 + +Gartner, Voice of the Customer for Cloud Database Management Systems, May 24, 2024 + +Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. + +GARTNER, MAGIC QUADRANT, and PEER INSIGHTS are trademarks of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved. +``` \ No newline at end of file diff --git a/docs/blogs/users/RAG-CUSRI.md b/docs/blogs/users/RAG-CUSRI.md new file mode 100644 index 000000000..4bbaf29ba --- /dev/null +++ b/docs/blogs/users/RAG-CUSRI.md @@ -0,0 +1,140 @@ +--- +slug: RAG-CUSRI +title: 'Implementing RAG: Application of OceanBase Database at CUSRI' +tags: + - User Case +--- + +> This article is part of [Making Technology Visible | The OceanBase Preacher Program 2024], a technical writing contest. If you are a tech enthusiast, join us in this contest to bring code to life with your words while getting a chance to win a prize of CNY 10,000! +> +> Author: Qiu Yonggang, OceanBase R&D Lead at China Unicom Software Research Institute (CUSRI). He is in charge of the development, support, and O&M of Distributed China Unicom Database (CUDB), the proprietary relational distributed database of China Unicom. + +Generative artificial intelligence (AI) technology has experienced rapid development in recent years, giving rise to large models such as ChatGPT by OpenAI, Qwen by Alibaba Cloud, and ERNIE Bot by Baidu. These models have garnered wide attention for their applications in natural language processing and conversational systems. However, despite their powerful reasoning capabilities, these models cannot directly handle enterprise-specific data and knowledge in real-world applications, limiting their scope of use. In this context, vector databases, as a core component of the Retrieval-Augmented Generation (RAG) architecture, have gradually demonstrated their indispensable capabilities. + +The RAG architecture lifts the limitations of large language models (LLMs) in handling enterprise-specific data by combining pre-trained LLMs with real-time internal data of enterprises. Leveraging the powerful search capabilities of vector databases, developers can do real-time, accurate generation tasks based on enterprise data without the need to retrain models. In this article, I will share how China Unicom successfully implemented RAG in our real business scenarios using the vector search capabilities of OceanBase Database to help developers and database administrators (DBAs) perform database infrastructure-related queries and management more efficiently, thereby improving business response speed and accuracy. 
+ +Background and Challenges: RAG Applications at CUSRI +------------------ + +The database platform of CUSRI serves thousands of internal users across various domains from application development to O&M management. Managing such a vast and complex database ecosystem presents several long-standing challenges: the diversity of database types, significant version differences, high stability requirements for production systems, and inefficiencies caused by discrepancies between testing and production environments. In addition, the heavy workloads of daily database O&M make it hard to improve the system response speed. + +Specifically, we needed to address the following major ones: + +1\. **Management of multiple databases and versions**: China Unicom uses many database products, which necessitates frequent version updates and maintenance activities. Ensuring consistency across different versions and quickly locating the causes of issues became a major challenge in O&M. + +2\. **Efficient management of the production environment and its discrepancies with the testing environment**: The stability of production systems is crucial. How to ensure their stability while quickly resolving their issues was a pressing concern. Additionally, discrepancies between the testing and production environments could lead to performance deviations or potential failures. Efficiently managing and balancing the two to enhance overall system reliability and response speed was key to improving database agility. + +3\. **Improving productivity and responsiveness**: In the face of changing business needs, quickly obtaining necessary information in a complex and dynamic database environment and responding promptly became the core issue in enhancing database O&M efficiency. + +To address these challenges, we developed an intelligent database expert ChatDBA based on the RAG architecture. By combining database expertise with our internal O&M data, ChatDBA allows developers and DBAs to query database status, troubleshoot issues, and obtain recommendations using natural language, thus substantially reducing repetitive tasks. This solution not only improves problem-solving efficiency but also allows the team to focus more on crucial tasks. The following figure illustrates the overall process of this solution. + +![1733391747](/img/blogs/users/RAG-CUSRI/image/529053c3-7d69-433e-9238-f1ca5f303e00.png) + +As shown in the figure, files about general and specialized database knowledge, both internal and external, are systematically organized and imported into a knowledge base. Then, files are sliced, converted into vectors by a vectorization embedding model, and stored in a vector database. This way, our LLM can use professional knowledge of DBAs to significantly improve the recall capability and accuracy when answering questions. On top of that, a RAG-based Q&A system is introduced to enhance the LLM's comprehension and communication capabilities for specific questions by retrieving data from external knowledge bases, thereby helping improve the text processing efficiency and quality to generate more accurate and richer text content. ChatDBA has access to extensive database knowledge and experience. It provides comprehensive, high-quality technical consultation services and solutions to database users and maintainers, making databases more accessible and improving database O&M efficiency. 
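To ground the retrieval step described above, the sketch below shows the kind of query a RAG backend such as ChatDBA issues against a vector-capable database. Everything in it is illustrative rather than our production setup: the table layout, the tiny 4-dimension embeddings (real embedding models output hundreds of dimensions), and the vector syntax, which is modeled on published OceanBase Database V4.3.x vector-search examples and should be verified against the documentation for your version.

```bash
# A hypothetical retrieval step of the RAG pipeline, issued through the
# standard MySQL client. Host, port, tenant user, schema, and the
# VECTOR/l2_distance syntax are assumptions to verify for your version.
mysql -h127.0.0.1 -P2881 -uroot@test -Dchatdba <<'SQL'
CREATE TABLE IF NOT EXISTS kb_chunk (
  id        BIGINT PRIMARY KEY,
  doc       TEXT,          -- a slice of a knowledge-base file
  embedding VECTOR(4)      -- output of the embedding model
);
INSERT INTO kb_chunk VALUES
  (1, 'How to add an OBServer node to a zone', '[0.12,0.80,0.33,0.05]'),
  (2, 'Tuning memstore limits for a tenant',   '[0.70,0.10,0.52,0.31]');
-- Return the chunks closest to the embedded user question:
SELECT id, doc
FROM kb_chunk
ORDER BY l2_distance(embedding, '[0.10,0.82,0.30,0.07]')
LIMIT 3;
SQL
```

The retrieved chunks are concatenated into the LLM prompt, which is what lets ChatDBA answer with CUSRI-specific operational knowledge instead of generic advice.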
Database Selection: Upgrade from a Dual-database Architecture to an Integrated Database
---------------------

Initially, we deployed a MySQL relational database for data storage alongside a Milvus vector database. As data volumes and business requirements grew, we soon ran into two problems: the existing databases could not scale horizontally to handle more data, and maintaining two database systems was complicated.

So, we began searching for a database that supports both relational and vector data. During the selection process, we noticed that a lab release of OceanBase Database V4.x offered powerful vector search and hybrid query capabilities, which motivated us to evaluate how dedicated vector databases, standalone databases, and distributed databases handle vector data. Here are the comparison results:
_Comparison of vector databases for RAG applications_

| Category | Feature | Dedicated vector database (Milvus) | Standalone database capable of vector processing (PostgreSQL) | Distributed database capable of vector processing (OceanBase Database) |
| --- | --- | --- | --- | --- |
| Vector processing capabilities and performance | Vector query performance | High. Optimized for large-scale vector data processing. | Medium. Performance depends on database scalability. | High. Optimized for massive data storage and queries; supports complex queries. |
| | Hybrid vector query | Does not support hybrid queries with conventional databases. | Supports basic vector queries but not complex hybrid queries. | Supports hybrid queries over vectors, scalars, and other conventional data; suitable for complex fusion queries. |
| | Interface flexibility | Supports SDKs, but not SQL. | Supports SQL and uses plugins to handle vector queries. | Supports both SQL and SDKs, offering more flexible interface options. |
| Scalability and integration | Scalability | High. Scales horizontally to handle more vector data. | Limited; depends on the performance of the database server. | High. Supports distributed architectures and handles massive amounts of data. |
| | Integration with conventional data | None. Processes vector data only. | Strong. Handles both relational and vector data. | Strong. Handles hybrid queries over relational and vector data. |
| | Operation and maintenance complexity | High. Both the vector database and other databases must be managed. | Average. Additional performance optimization is required, and hybrid queries over vector and structured data must be implemented manually; existing O&M systems can be reused. | Low. Transaction processing (TP), analytical processing (AP), and AI workloads are handled within one database, removing the operational complexity of running multiple databases. |
| High availability and disaster recovery | High availability | Supports disaster recovery and high availability, but must be deployed independently. | Supports high availability, but its disaster recovery capabilities are relatively weak. | Strong. Can be deployed in standalone or distributed mode; supports active-active/distributed disaster recovery strategies and automatic failover; suitable for scenarios with demanding business continuity requirements. |
| | Backup and recovery strategies | Supports periodic full and incremental backups. | Supports full and incremental backups. | Supports full and incremental backups and recovers services immediately after a fault occurs. |
+ + +After a thorough comparison, we leaned towards an integrated solution based on OceanBase Database. This choice not only simplifies the technology stack but also offers significant advantages in performance, scalability, and ease of management. The tested version of OceanBase Database can process dense vectors with over 16,000 dimensions and calculate multiple types of vector distance, such as Manhattan distance, Euclidean distance, dot product, and cosine distance. It also allows us to create Hierarchical Navigable Small World (HNSW) indexes, perform incremental updates, and delete vectors, and supports hybrid filtering based on vectors, scalars, and semi-structured data. Within its native distributed architecture, these features make OceanBase Database an efficient all-in-one platform that is scalable and simplifies management. + +![1733391787](/img/blogs/users/RAG-CUSRI/image/2bf08ce7-83af-42ac-80d0-0a479e65ca9a.png) + +The testing and verification results indicated that the vector search capabilities of OceanBase Database fully met our needs, particularly in supporting ChatDBA. More importantly, those vector search capabilities are backed by a full-fledged product ecosystem, which further enhances its feasibility in real-world production environments. Compared with the open source version of Milvus, OceanBase Database demonstrates clear advantages: + +1\. **Easy O&M**: The vector search capabilities of OceanBase Database are provided in OceanBase Cloud Platform (OCP), a dedicated O&M and management tool, which makes database O&M much easier. OCP also provides a suite of features, such as GUI-based fast deployment, hardware resource management, monitoring and alerting, and backup and recovery. Milvus, on the contrary, offers basic database features, lacks comprehensive O&M support, and has security vulnerabilities. + +2\. **High availability and auto-scaling**: The vector search feature of OceanBase Database inherits the high availability of its native distributed architecture, which supports distributed deployment, auto-scaling, and automatic rapid recovery based on the Paxos protocol when a single node fails. In contrast, Milvus can be deployed only on a single server and lacks high availability and horizontal scalability, which is unacceptable in production environments. + +3\. **Resource isolation based on multitenancy**: The vector search feature of OceanBase Database supports resource isolation between tenants. Combined with its high scalability, OceanBase Database provides us with a secure and flexible database-as-a-service (DBaaS) service. We can quickly create database instances using existing resource pools and adjust instance resources as needed. Milvus, on the other hand, lacks resource isolation capabilities, leading to a waste or a shortage of resources, especially when it is deployed on a physical server. + +4\. **SQL support**: The vector search feature of OceanBase Database supports standard SQL operations. Developers can interact with the database using familiar client tools like DBeaver and Navicat. This makes the database more accessible and improves development efficiency. Milvus, however, does not support SQL. Developers can operate data only through APIs and scripts, which is less user-friendly. + +5\. **Rapid migration**: We can use OceanBase Migration Service (OMS) to migrate data to a vector database based on OceanBase Database from a homogeneous or heterogeneous database, or the other way around. 
Using OMS, we successfully migrated test data from Milvus to OceanBase Database. Milvus itself does not support data migration, so a cross-environment move would have required rebuilding the data, which is time-consuming and seriously disruptive to business operations. + +In the performance test, we simulated actual production scenarios and created only one instance to cope with tasks that were previously handled by two database systems. Compared with the dual-database deployment, the test instance fully met our performance requirements while using approximately 30% fewer resources, which translates to at least a 30% reduction in hardware resource costs. The following figure compares the performance of mainstream vector databases. We can see that OceanBase Database, represented by the VSAG curve, outperforms the others. VSAG is a vector indexing algorithm jointly developed by OceanBase and Ant Group. + +![1733391805](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-12/2188bef1-212f-4dfd-bc2c-6b58926b4924.png) + + + +Results and Benefits: Building a RAG-based Modern Data Infrastructure +------------------ + +Given the testing and verification results, we decided to upgrade our MySQL + Milvus architecture to a modern solution and made the necessary adaptations, which took little effort. OceanBase Database is fully compatible with our existing SQL syntax, so apart from some configuration changes we made no major modifications. We did not even replace the driver package. As for the Milvus vector database, we updated its dependency packages and adjusted our database operation methods. Since OceanBase Database supports SQL operations on vector data, and our team was familiar with SQL syntax, the adaptation job was done quickly. We completed all adaptations in about one week and finished functionality verification in less than two weeks. + +In early October 2024, when OceanBase Database V4.3.3, a stable version supporting vector search, was released, we initiated the upgrade of our production databases. Using OMS, we efficiently and smoothly migrated data from Milvus to OceanBase Database. After the upgrade, our two databases were merged into an integrated architecture, which reduced hardware resource usage by about 30% while fully meeting our business performance requirements. The native distributed architecture of OceanBase Database not only significantly improves system stability and minimizes the risk of single points of failure (SPOFs), but also provides scalability for future business growth. This upgrade simplifies the technology stack, alleviates the workload of our O&M team, and lays a flexible, reliable, and scalable technical foundation for long-term business development. + +Summary +---- + +CUSRI upgraded the underlying architecture of ChatDBA to a modern solution based on RAG and OceanBase Database. Thanks to its extraordinary capabilities in handling relational and vector data, a single OceanBase cluster can meet our needs of processing multiple types of workloads and data. Hardware resource usage was reduced by about 30%, and with tools like OCP and OMS, O&M was greatly streamlined to improve team efficiency. + +This project proved the importance of RAG-based vector search capabilities for building an efficient Q&A system. OceanBase Database, as an integrated database, not only supports multimodal data processing and multi-scenario integration but also excels in performance and stability.
The new design simplifies the system architecture and provides robust technical support for more advanced business needs in the future, marking a key step in developing a unified, efficient, and intelligent database solution. Looking ahead, we plan to further expand our use of OceanBase Database, streamline the technology stack, and cut O&M costs through modern data architecture upgrades. \ No newline at end of file diff --git a/docs/blogs/users/SAIC-Volkswagen.md b/docs/blogs/users/SAIC-Volkswagen.md new file mode 100644 index 000000000..0c55f0b36 --- /dev/null +++ b/docs/blogs/users/SAIC-Volkswagen.md @@ -0,0 +1,38 @@ +--- +slug: SAIC-Volkswagen +title: 'OceanBase Has Successfully Partnered with SAIC Volkswagen' +tags: + - User Case +--- + + + +![1724923468](/img/blogs/users/SAIC-Volkswagen/image/1724923468990.png) + +Recently, SAIC Volkswagen Automotive Co., Ltd. (hereinafter referred to as "SAIC Volkswagen") has migrated its core business systems, such as the bonus point and coupon system, to OceanBase Cloud, a native distributed database service. **The leading capabilities of OceanBase Cloud have helped achieve an 85% reduction in storage costs, increase business continuity to 99.999%, and improve query performance by five times, enhancing the data management capabilities of SAIC Volkswagen to better meet various user requirements.** + +![1727344692](/img/blogs/users/SAIC-Volkswagen/image/1727344692079.png) + +SAIC Volkswagen is one of the oldest automobile joint ventures in China. It produces and sells more than 30 models under the Volkswagen, Audi, and Skoda brands, covering a wide range of market segments such as A0 class, A class, B class, C class, sport utility vehicle (SUV), and multi-purpose vehicle (MPV). In the first quarter of 2024, SAIC Volkswagen sold 265,000 vehicles, a year-on-year increase of 11.4%, including 28,000 new energy vehicles, up 171.3% year on year. The surge in data volume caused by rapid business growth posed the following challenges to the original open source databases used by the core business systems of SAIC Volkswagen: + +- **High sharding workloads:** The original databases could not meet performance requirements and needed to be sharded. However, due to the large number of rows in a single table and the high data growth rate, the sharding solution was cost-ineffective and involved high risks, so SAIC Volkswagen turned to native distributed databases. + +- **Scaling difficulties:** The CPU load of the original databases continued to rise, making it difficult to add resources for business activities or perform online auto scaling in high-concurrency scenarios. This affected the user experience. + +- **Query performance bottleneck:** As the data volume kept increasing, the original databases performed poorly on complex queries and, in some cases, failed to complete report queries, delaying feedback on business operations. + +To further enhance its operational capabilities, provide users with a better car buying and ownership experience, and thrive in the digital era, SAIC Volkswagen initiated a new round of database upgrades. After comprehensively evaluating database services based on their migration workloads, product capabilities, business flexibility, and best practices, SAIC Volkswagen finally chose OceanBase Cloud. + +OceanBase Cloud is a cloud database service launched by OceanBase for users of all sizes.
It provides multi-model, multi-tenant, and multi-workload capabilities, meeting 80% of data management needs with only one database. It also allows users in different regions to access high-quality enterprise-level database products and services, helping simplify the technology stack and build a modern data architecture. + +**OceanBase Cloud is fully compatible with nearly all MySQL syntaxes and data types used by SAIC Volkswagen. With its traffic replay feature, OceanBase Cloud enhanced the efficiency of full regression testing, ensuring a fast, smooth, and stable business migration from the original databases.** After the migration, the diverse requirements of core business systems were met, boosting efficiency at lower costs. + +- **85% savings on storage costs and 15% reduction in total cost of ownership (TCO):** SAIC Volkswagen replaced dozens of original databases with only four OceanBase clusters to simplify the architecture, and utilized the multitenancy capability of OceanBase Cloud to consume resources and manage O&M in an efficient manner. In addition, the advanced compression technology built on the log-structured merge-tree (LSM-tree) architecture significantly reduced storage costs by 85% and TCO, including maintenance and operational costs, by 15%. + +- **99.999% business continuity:** The cutting-edge automatic failover capability of OceanBase Cloud guarantees a recovery point objective (RPO) of 0 and a recovery time objective (RTO) of less than 8 seconds, ensuring business continuity when a server node, zone, or region fails and preventing costly, complex business failures and data loss. After the database upgrade, the bonus point and coupon system of SAIC Volkswagen achieved a business continuity of 99.999% to support 24/7 stable running of key business systems. + +- **Query performance improved by five times:** The hybrid transaction/analytical processing (HTAP) capabilities of OceanBase Cloud freed SAIC Volkswagen from complex extract, transform, and load (ETL) operations and redundant data. Transaction processing (TP) and real-time analytical processing (AP) workloads were performed on the same set of data, while their servers were isolated from each other to avoid business interference and additional costs. The new bonus point and coupon system handles large data volumes and complex business logic, and delivers a five-fold improvement in query performance. + +- **Support for auto scaling:** Native distributed databases support auto scaling and linear performance growth without stopping servers or modifying applications. This enables SAIC Volkswagen to add computing and storage resources as its business develops, without requiring sharding. Horizontal scaling supports business expansion with minimal transformation effort and easily copes with business requirements at all times. + +SAIC Volkswagen is committed to innovative and market-oriented development to better serve its users. It has embarked on a new journey with OceanBase Cloud, elevating its data management capabilities to support diverse user needs. In the future, the two parties will collaborate to tackle more key business system challenges and ensure every drive counts.
\ No newline at end of file diff --git a/docs/blogs/users/Sunshine-Insurance.md b/docs/blogs/users/Sunshine-Insurance.md index a561e514d..776eaef30 100644 --- a/docs/blogs/users/Sunshine-Insurance.md +++ b/docs/blogs/users/Sunshine-Insurance.md @@ -5,8 +5,7 @@ tags: - User Case --- - -![1732192973](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-11/043bb459-a79c-457d-afd1-f6d383c17aad.png) +![1732192973](/img/blogs/users/Sunshine-Insurance/image/043bb459-a79c-457d-afd1-f6d383c17aad.png) Sunshine Insurance was established in July 2005 and grew into a fully licensed insurance group in less than three years. The group made it to the list of China's top 500 enterprises within five years after its founding, and has been recognized as one of "China's 500 Most Valuable Brands" by the World Brand Lab for 14 years in a row. It is one of the fastest-growing medium-sized insurance companies in the industry. @@ -21,23 +20,23 @@ At the 2024 OceanBase Annual Conference, **Yang Qinghua, the head of Sunshine Di In its digital transformation journey, SIG has always focused on empowering business development with technology and promoting technological innovation in house. It has resolutely implemented regulatory requirements, quickly responded to industry trends, and driven continuous upgrades and reforms in its IT architecture. SIG's technological evolution has gone through four main stages. -![1732193245](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-11/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.png) +![1732193245](/img/blogs/users/Sunshine-Insurance/image/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.png) -**○   Early Stage (2004-2014): In this stage, SIG focused on online transformation**. A series of core business systems were deployed to free some staff from tedious form work. Most systems were externally sourced. +- **Early Stage (2004-2014): In this stage, SIG focused on online transformation**. A series of core business systems were deployed to free some staff from tedious form work. Most systems were externally sourced. -**○   Stage 1.0 (2015-2017): In this stage, SIG emphasized adaptation to internet-based business based on the distributed architecture and configurable design**. A typical move was the building of new-generation core business systems starting in 2015. Drawing on experiences of peer companies, SIG developed capabilities such as local data centers, process engines, and rule configuration, further expanding the application of internet technologies in its business systems. +- **Stage 1.0 (2015-2017): In this stage, SIG emphasized adaptation to internet-based business based on the distributed architecture and configurable design**. A typical move was the building of new-generation core business systems starting in 2015. Drawing on experiences of peer companies, SIG developed capabilities such as local data centers, process engines, and rule configuration, further expanding the application of internet technologies in its business systems. -**○   Stage 2.0 (2018-2022): In this stage, SIG took action to develop its mobile capabilities, data processing capabilities, and cloud-native technologies**. It developed a series of B2C and B2B apps to improve user experience, started building a data mid-end platform and, in response to trends in cloud-native transformation, implemented service governance, containerization, and DevOps technologies, starting with its core systems. 
+- **Stage 2.0 (2018-2022): In this stage, SIG took action to develop its mobile capabilities, data processing capabilities, and cloud-native technologies**. It developed a series of B2C and B2B apps to improve user experience, started building a data mid-end platform and, in response to trends in cloud-native transformation, implemented service governance, containerization, and DevOps technologies, starting with its core systems. -**○   Stage 3.0 (2023-Present): In this stage, SIG steered toward intelligentization.** With the rise of AI, SIG has been gradually moving to system intelligentization. Guided by the philosophy of "one server serving a group of customers," the group has developed in-house AI and large-model capabilities to achieve intelligent decision-making systems, early warning systems, and robot employees, while further strengthening its cloud-native infrastructure. +- **Stage 3.0 (2023-Present): In this stage, SIG steered toward intelligentization.** With the rise of AI, SIG has been gradually moving to system intelligentization. Guided by the philosophy of "one server serving a group of customers," the group has developed in-house AI and large-model capabilities to achieve intelligent decision-making systems, early warning systems, and robot employees, while further strengthening its cloud-native infrastructure. With the fast business growth and the unceasing technological upgrade, the complex application architecture and business requirements posed new challenges to data systems, including the following three challenges to databases: -**○   Autonomous control**: Smoothly upgrading to a domestic technology stack was a significant challenge in data management. +- **Autonomous control**: Smoothly upgrading to a domestic technology stack was a significant challenge in data management. -**○   Management of multiple data sources**: With the advancement of cloud-native and intelligent technologies, many of SIG's business systems were shifted to microservice-based architectures. Some large core systems even consisted of dozens or hundreds of microservices, leading to more database instances and more issues in database selection and data asset management. +- **Management of multiple data sources**: With the advancement of cloud-native and intelligent technologies, many of SIG's business systems were shifted to microservice-based architectures. Some large core systems even consisted of dozens or hundreds of microservices, leading to more database instances and more issues in database selection and data asset management. -**○   Performance, availability, and scalability**: Databases hit bottlenecks in terms of non-functional requirements such as the performance, availability, and scalability. The conventional technology stack made it costly to meet new business demands, especially for internet-based services, which were affected by insufficient availability and scalability of the data layer, leading to increased costs and risks. +- **Performance, availability, and scalability**: Databases hit bottlenecks in terms of non-functional requirements such as the performance, availability, and scalability. The conventional technology stack made it costly to meet new business demands, especially for internet-based services, which were affected by insufficient availability and scalability of the data layer, leading to increased costs and risks. 
@@ -46,15 +45,15 @@ With the fast business growth and the unceasing technological upgrade, the compl SIG determined its database upgrade strategy based on three principles: -**○** Exhaustive replacement. All new business systems, including core features, must be upgraded based on domestic databases. +- **Exhaustive replacement.** All new business systems, including core features, must be upgraded based on domestic databases. -**○** Layer replacement with expert support. Collectively replace key layers such as the databases and middleware layers, and establish an internal expert group to cope with upgrade challenges and ensure a smooth process. +- **Layer replacement with expert support.** Collectively replace key layers such as the databases and middleware layers, and establish an internal expert group to cope with upgrade challenges and ensure a smooth process. -**○** Hands-on use: Deploy and use domestic products in real-world business scenarios, and avoid long-term running of both legacy and new databases in parallel. This verifies whether core business systems are viable on domestic databases. +- **Hands-on use.** Deploy and use domestic products in real-world business scenarios, and avoid long-term running of both legacy and new databases in parallel. This verifies whether core business systems are viable on domestic databases. Based on this strategy, Yang explained why they chose OceanBase Database: **"Reliability and cost-effectiveness are the two decisive factors in our database selection. "** -![1732193417](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-11/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.png) +![1732193417](/img/blogs/users/Sunshine-Insurance/image/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.png) When it came to reliability, SIG placed a high value on three factors: **Autonomous control**: The whole stack of OceanBase Database is developed fully in house, which aligns perfectly with SIG's requirements. The vendor can quickly deal with any custom needs or issues during the upgrade process. **Technical reliability**: OceanBase Database has advantages in distributed architecture, high performance and availability, and scalability, and its reliability has been demonstrated through successful applications in peer companies in the industry. **Service reliability**: OceanBase Database is backed by a full-time professional technical support team, who can assist with database deployment and O&M. @@ -73,11 +72,11 @@ At the conference, Yang highlighted their experience in implementing the Ultra-S **Given this background, the database supporting the Ultra-Short-Term Insurance system must provide high concurrency, low latency, strong consistency, high availability, resource efficiency, and minimal post-migration changes**. -![1732193764](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-11/bf7723fa-71c7-4478-af25-a049acee4d72.png) +![1732193764](/img/blogs/users/Sunshine-Insurance/image/bf7723fa-71c7-4478-af25-a049acee4d72.png) Yang showed the four stages of migrating the database supporting the Ultra-Short-Term Insurance system. -![1732193777](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-11/0f4f8565-6427-498e-9bb8-d6b90664c5cb.png) +![1732193777](/img/blogs/users/Sunshine-Insurance/image/0f4f8565-6427-498e-9bb8-d6b90664c5cb.png) **Stage 1: Using OceanBase Migration Assessment (OMA), SIG conducted a thorough analysis of the original centralized database to identify incompatible functions and problematic SQL statements**. 
It encountered challenges, such as issues related to stored procedures and global unique IDs, in the assessment. SIG addressed the stored procedure issues by disabling stored procedures at the development level, and implemented a highly available distributed naming service based on Zookeeper for continuous ID generation. diff --git a/docs/blogs/users/Tansun.md b/docs/blogs/users/Tansun.md new file mode 100644 index 000000000..9cef709d8 --- /dev/null +++ b/docs/blogs/users/Tansun.md @@ -0,0 +1,154 @@ +--- +slug: Tansun +title: 'OceanBase & Tansun Technology Unveil a Joint Solution with a Next-gen Credit Card Core System to Provide Innovative Vitality and Data-driven Support for Steady Growth of Credit Card Business' +tags: + - User Case +--- + + +In 1985, Bank of China (BOC) issued the first RMB credit card in China. Over more than 30 years of development, credit cards have gone from being held by a few elites to being popular among the general public. According to data from the People's Bank of China (PBC), as of the end of the first quarter of 2022, each person in China held an average of 0.57 credit cards (including combined credit and debit cards). + + + +However, as the growth rate driven by traffic dividends slows down and several new regulations concerning the standardized development of the credit card industry have officially come into effect, the once booming credit card business is entering the "existing-customer-focused era." The financial reports of major banks for the first half of 2022 show that key credit card metrics, such as the number of issued cards, loan balances, and transaction amounts, all grew weakly. + + + +In the "existing-customer-focused era", the growth logic of credit card business is changing. As a financial product with both payment and loan features, credit cards are always in the vanguard of retail banking. To discover new opportunities in this era, credit card business needs to shift its focus from products to customers and upgrade from digitalization and data-based operations to intelligence. This transition requires strong support from the underlying system. + + + +OceanBase Database, a completely self-developed native distributed database service, has teamed up with Tansun Technology, a leading IT solution service provider in the domestic financial industry, to launch a "joint solution with a next-generation credit card core system." The joint solution uses the next-generation credit card core system CreditX of Tansun Technology at the business application layer, and an integrated architecture of microservices, container clouds, and the native distributed database service OceanBase Database at the platform support layer. This provides innovative vitality and data-driven support for steady growth of credit card business. + + + +![](/img/blogs/users/Tansun/image/0cd27c95-858d-4efb-b03e-e8af26996aa5.png) + +_Architecture of the "joint solution for credit cards"_ + + + +CreditX organically integrates retail financial services such as credit cards and consumer loans in the business model, supports real-time posting of financial transactions without delays, and processes data at the transaction level. It provides a "customer-centric" view to manage the entire lifecycle of customers, a customizable mesh credit limit management system, a flexible and powerful parameter system, and comprehensive pricing capabilities for various customers.
These features effectively support banks in managing and innovating next-generation credit card business. + + + +The integrated architecture at the platform support layer realizes unlimited elastic scalability of the banking system, supports hundreds of millions of accounts and daily processing of hundreds of millions of transactions, and supports multi-active deployment of applications and databases, such as three IDCs across two regions, five IDCs across three regions, local active-active disaster recovery, and active geo-redundancy. This greatly improves the disaster recovery capability of the credit card system. + + + +As shown in the figure below, the joint solution has eight major business innovations, including transaction-oriented interest calculation, real-time transaction posting, integration of multiple business models, and multi-dimensional accounting processing, as well as eight major technical innovations, including auto scaling, true data consistency, unitized deployment, and agile iteration. + +![](/img/blogs/users/Tansun/image/7834b4f1-580f-40c1-9841-a252e7141c09.png) + + +**Transaction-oriented interest calculation** + + + +Due to the unique nature of credit card business, interest processing is especially complex and critical, and it is involved in all stages of credit card transactions, including debit transactions, credit transactions, and full repayments. However, under the balance-based interest calculation mechanism of the traditional credit card system, the correlation between balances and transactions is missing. As a result, interest processing reflects only the results rather than the process, which poses a great challenge to customer service. At the same time, the balance-based mechanism calculates interest inaccurately, which also restricts banks' fine-grained accounting. + + + +To handle these pain points of the balance-based interest calculation mechanism, Tansun Technology pioneered an industry-first reform by adopting a transaction unit-based interest calculation mechanism. Under the new mechanism, a transaction unit is formed for each transaction, with interest calculated independently within the unit. The final interest is determined based on the relationships between different transaction units. Unlike the conventional balance-based mechanism, the transaction unit-based mechanism processes interest calculation at the transaction level, which accurately reflects the interest calculation of each transaction. It also supports dynamic interest calculation and determines the final interest based on the transaction behaviors of customers, preparing banks for the complex and fast-changing market competition ahead. + + + + + + + +**Real-time transaction posting** + + + +In the traditional dual-message processing mode of credit cards, a transaction is considered complete in the system backend only after it is authorized and posted. However, there is a natural time difference between the authorization and posting of a transaction, a difference measured in days. With the development of credit card business, more derivatives are involved in transaction posting, including more types of credit limits and more rights and interests. Cardholders also have higher requirements for the timeliness of transaction posting. + + + +The transaction posting of credit cards is complex.
Therefore, to authorize and post a transaction at the same time, the system must respond quickly and provide high processing performance. Tansun Technology is the first in the industry to innovatively adopt a real-time balance-based mechanism, in which single-message transactions of credit cards are truly posted in real time. This mechanism uses real-time balances as a bridge to connect the authorization and posting processes. When a transaction is authorized, the balance is updated in real time and the credit limit is accurately restored, and the subsequent complex accounting is processed asynchronously based on the posting result of the real-time balance. This way, real-time posting can be achieved with timely and high-performance transaction authorization. + + + + + + + +**Auto scaling** + + + +The upgrade and transformation of credit card business requires both agility and business continuity. To ensure business continuity, CreditX is deeply adapted to OceanBase Database, which allows the system to scale without limits. + + + +In the traditional IT architecture, databases often become a scalability bottleneck. However, by integrating with native distributed databases, the credit card system can use one set of code to support both sharding and partitioning database architectures. This provides a stable, reliable, cost-effective, and efficient database solution for application development. In terms of capacity, OceanBase Database supports the following three scaling methods, making capacity theoretically unlimited. + + + +**Scaling based on existing server resources, which allows databases to stagger resource utilization.** This is the most common method used by small and medium-sized banks, in which existing resources can be allocated based on business requirements. For example, the ratio of resources used for trading and analysis can be 9:1 during the day and 4:6 at night. Existing server resources are like a cake that can be dynamically divided among different databases and services to shift loads and improve resource utilization. + + + +![](/img/blogs/users/Tansun/image/7eaa09e1-7728-435e-8b13-307d5f77efdd.png) + + + +**Vertical scaling, which allows the dynamic increase or decrease of server resources in existing IDCs.** For large-scale promotional and marketing activities, you can estimate the required resource capacity of the system in advance and adjust the number of server resources accordingly. Distributed databases support smooth and seamless scaling. + + + +**Horizontal scaling, which allows the increase of IDCs to scale out resources.** To support horizontal scaling, a distributed database must be deployed as a cluster across multiple IDCs, where the number of replicas for a tenant or for the database can be specified. + + + + + + + + + +**True data consistency** + + + +CreditX uses advanced methods to manage data throughout the entire lifecycle, from data generation to the final state of the transaction process, and data in the lifecycle is fully preserved. OceanBase Database has the following five unique lines of defense to ensure data consistency, which prevent data loss and confusion by detecting and resisting data issues such as silent disk errors and hard disk firmware damage. + + + +![](/img/blogs/users/Tansun/image/af22497b-a464-40ed-baad-60788eac9e51.png) + + + +The Paxos consistency protocol serves as the first line of defense.
It ensures not only data consistency during transactions, but also seamless switching of database write points among multiple replicas. + + + +The verification of data consistency among multiple replicas within the cluster serves as the second line of defense. It supports hybrid deployment of replicas with heterogeneous chips within the cluster for rigorous canary releases and the long-term running of the replicas. + + + +The verification of data consistency between primary and standby clusters serves as the third line of defense. It supports hybrid deployment of primary and standby clusters with heterogeneous chips for rigorous canary releases of the chips and the operating system, as well as the long-term running of the clusters. + + + +The chained checksum technology serves as the fourth line of defense. It supports both binary and columnar checks on the checksums of data blocks, table partitions, and indexes. Any byte-level tampering can be detected. + + + +The periodic scanning of cold data serves as the fifth line of defense. It effectively detects silent disk errors to prevent data loss, showing how advanced technology keeps data reliable. + + + +Currently, this joint solution has been put into practice in many leading banks. For example, a large state-owned bank has used it to migrate its credit card core system from a mainframe to a distributed architecture. This reconstruction project reduced the yearly system O&M costs by 75% and cut the batch processing time by more than 50% for the bank. As builders of the system supporting credit card business, OceanBase and Tansun Technology will continue to go deep into the front line, improve technologies, products, and services, and work together to provide customers with more valuable overall solutions. + + + +Follow us in the [OceanBase community](https://open.oceanbase.com/blog). We aspire to regularly contribute technical information so that we can all move forward together. + + + +Search DingTalk group 33254054 or scan the QR code below to join the OceanBase technical Q&A group. You can find answers to all your technical questions there. + + + +![](https://gw.alipayobjects.com/zos/oceanbase/f4d95b17-3494-4004-8295-09ab4e649b68/image/2022-08-29/00ff7894-c260-446d-939d-f98aa6648760.png) \ No newline at end of file diff --git a/docs/blogs/users/baicizhan-cto.md b/docs/blogs/users/baicizhan-cto.md new file mode 100644 index 000000000..d93fe0063 --- /dev/null +++ b/docs/blogs/users/baicizhan-cto.md @@ -0,0 +1,104 @@ +--- +slug: baicizhan-cto +title: 'Core Learning Record Database of Baicizhan Migrated to the Cloud Within Five Months, Saving 80% of Storage Space' +tags: + - User Case +--- + +# Core Learning Record Database of Baicizhan Migrated to the Cloud Within Five Months, Saving 80% of Storage Space +This article is a summary of Episode 14 of the DB Gurus Talks series. + +Baicizhan is an illustrated English learning app designed to make building vocabulary fun for people of all ages and proficiency levels. In the 14th episode of DB Gurus Talks, Mr. Jing Mi, CTO of Chengdu Chaoyouai Technology Co., Ltd. ("Chaoyouai" for short), was invited to share his insights. + +Jing is a seasoned technical expert with extensive experience in distributed architectures and databases. He worked with big tech companies like Baidu and Xunlei before joining Chaoyouai. Years of complex software development have shaped his meticulous approach to testing and verification, which he has carried into his current job.
+ +In this episode, Jing talked about how careful selection, comprehensive testing, and thorough preparation ensured the smooth migration of a 30-node MySQL database to OceanBase Cloud. + +![1728469648](/img/blogs/users/baicizhan-cto/image/1728469646994.png) + +What's great about Baicizhan is that it uses image-based memorization techniques to make vocabulary building more engaging. This unique learning method has earned it the highest user activity among foreign language learning apps. + +Released in 2012, Baicizhan has amassed over 200 million users. For an app with such a huge user base, ensuring smooth user experience is no small feat. Baicizhan must not only guarantee 24/7 accessibility but also track every user interaction, from vocabulary book selections to learning progress and reviews. All such data is stored in the core learning record database. + +**As the user base grew, the learning record database was expanded to 30 nodes**. By early 2024, the company decided to upgrade the database from MySQL to OceanBase Cloud, shrinking it to 3 nodes. This move not only saved 20% to 30% in costs but also eliminated the need for manual scaling, leading to a remarkable O&M efficiency boost and paving the way for further business innovation. + + + +**1. Business Growth Putting Pressure on Scalability** --------------------- + +Hosting the learning progress data of all users, the learning record database is the largest and one of the most important databases of Baicizhan. With the continuous influx of new users and the growing data volume, the company had to deploy more database nodes. + +"Baicizhan was born in 2011. Given its internet-based services, we adopted the mainstream technology stack of the time. MySQL was the go-to choice for most internet companies, so we chose it for our learning record database," said Jing. + +Jing Mi is a tech veteran who has worked on software development and architecture design at companies like Baidu and Xunlei. He joined Baicizhan in 2021, leading its technical operations, from infrastructure setup to cloud service optimization, as well as the exploration of new technologies like AI and multi-infrastructure. + +"We had to add database nodes one after another as the data volume grew. In the past two years, with the rise of e-learning, our core business grew rapidly, requiring 3 or 4 new nodes each year," he added. + +**Baicizhan deployed its system on a public cloud, using the RDS for MySQL database service provided by the cloud vendor. The maximum storage capacity of a node was around 3 TB. When the data volume exceeded that capacity, the company had to scale out the system**, which was quite straightforward for data of new users—simply routing it to new nodes. However, scaling was complicated for data of existing users, as it required re-sharding the archive data, a task that was done manually by database administrators (DBAs) based on their experience. Over time, more nodes were deployed, making that approach increasingly challenging. + +First, ensuring the smooth operation of so many nodes without downtime was really hard. Despite robust cloud infrastructure, hardware or software issues could cause nodes to fail, disrupting business operations and putting immense pressure on system maintenance, disaster recovery, and backup. + +Second, manual sharding incurred high labor costs. For example, DBAs had to monitor which nodes were nearing their limits and quickly re-shard data before the nodes ran out of resources.
This increased not only O&M costs, but also development costs, because developers had to know which node stored the data required by a query. + +Third, from a technical perspective, a true distributed solution would not require manual data distribution. + +"Manual data distribution often leads to uneven node loads. For instance, new nodes typically handle data of new or highly active users, running under high loads. However, manual optimizations could hardly achieve dynamic load balancing, leading to uneven loads across nodes," explained Jing. + +Furthermore, the large number of standalone RDS instances required numerous data transmission service (DTS) connections for data synchronization with the big data platform, resulting in complex O&M and high costs. + + + +**2. Thorough Testing for a Stable Migration** --------------- + +In July 2023, overwhelmed by the O&M pressure, Baicizhan decided to make a change. Jing Mi believed that adopting a distributed database would be an effective way to move away from manual data distribution. During his time at Baidu, Jing participated in the development of a distributed database, now the open source Apache Doris. His deep expertise in distributed database technology made it clear that Baicizhan needed a reliable distributed database. + +Jing revealed that the company had considered replacing the database during the online education boom a couple of years earlier. However, the rapid business growth compelled them to focus on more pressing tasks. + +With many distributed database vendors in the market, choosing the right one was not easy. Baicizhan spent nearly two months on market research and product verification, engaging with several vendors before tentatively selecting OceanBase Cloud. + +**"We chose OceanBase Cloud because it perfectly fits our needs. For example, its high data compression ratio and computing capabilities are well-suited to our applications. Plus, we can easily find many success stories that prove its benefits," said Jing.** + +Another key factor in the decision was that the OceanBase team was highly cooperative and supportive during product verification, which made the communication smooth and efficient. + +Baicizhan did not immediately start the migration. Instead, they spent nearly three months testing OceanBase Cloud before finalizing the selection. Jing emphasized that the testing was comprehensive, with test data reaching 1/10 of their total data volume to simulate real-world scenarios. + +**"We set up a three-node cluster and tested its performance with terabytes of data in various scenarios, such as bulk writes, bulk reads, and fault recovery. For example, we simulated extreme conditions like shutting down a node or network interruptions to observe the results," said Jing.** + +The testing also covered the computing capabilities, APIs, and usability of the database. The thorough testing paid off, as the actual migration went smoothly. Starting in January 2024, Baicizhan migrated two nodes every two weeks, then gradually increased the pace, completing the migration by the end of June. + +"The migration process went smoothly, thanks to our exhaustive preparations and simulations. We also took the opportunity to streamline the codebase, which has been in use for over a decade," explained Jing. + + + +**3. Immediate Cost Savings and Efficiency Gains** ----------------- + +With the learning record database running on OceanBase Cloud, the benefits are remarkable. + +**Streamlined architecture**.
The new architecture consists of only 3 nodes instead of 30, and the data storage space required is reduced substantially. "OceanBase Cloud provides a high compression ratio. The data now occupies less than 1/5 of the original storage space," Jing noted. + +**Lower costs**. With fewer nodes, database costs are slashed by 20-30%, even with considerable redundancy of computing and storage resources. + +**Relieved workload**. DBAs no longer need to watch database metrics closely and hurry to scale the database as resource usage approaches thresholds. With the previous database system, scaling required DBAs to manually distribute data across nodes, which demanded a deep understanding of the business logic and extensive experience. Now, OceanBase Cloud automatically handles data sharding, leading to a significant improvement in O&M efficiency. + +**Higher scalability. Storage space is no longer bound to the computing resource specification. Unlike an RDS for MySQL database with 30 CPU cores, an OceanBase cluster with the same core count can store up to 200 TB of data.** + +Jing noted that the successful migration of the learning record database has deepened the team's understanding of OceanBase Database and provided valuable experience for future database migrations. In fact, the company is now evaluating the feasibility of migrating other business databases, and OceanBase Database has become their top choice for new business designs. + + + +**4. Summary** ---------- + +Currently, Baicizhan is exploring more features of OceanBase Cloud, such as AI and hybrid transaction and analytical processing (HTAP) capabilities, aiming to further empower its business. Freed from the laborious O&M of conventional databases, DBAs are able to invest more time and energy in these explorations. + +"Today, our business teams are expecting more from databases. DBAs should shift their focus toward business needs, and participate in data and system architecture design and data governance. By understanding the company’s strategies, DBAs can contribute to data planning, broaden their career prospects and enhance their competitiveness," Jing concluded. + +* * * + +**Mark Your Calendar!** ------------ + +The [2024 OceanBase Annual Conference](https://oceanbaseweb-pre.oceanbase.com/conference2024?activityCode=4923042&officerId=3881) will be held on October 23 at the Hyatt Regency Beijing Wangjing. Mr. Jing Mi, CTO of Baicizhan, will be there to share the best practices of OceanBase Cloud in Baicizhan's system. [Sign up now](https://oceanbaseweb-pre.oceanbase.com/conference2024?activityCode=4923042&officerId=3881) to join the event! \ No newline at end of file diff --git a/docs/blogs/users/financial-industry.md b/docs/blogs/users/financial-industry.md new file mode 100644 index 000000000..1cbd6e9d3 --- /dev/null +++ b/docs/blogs/users/financial-industry.md @@ -0,0 +1,119 @@ +--- +slug: financial-industry +title: 'Case Study of Selecting OceanBase Database in the Financial Industry' +tags: + - User Case +--- + +The application of OceanBase Database in the financial industry has yielded remarkable results. OceanBase Database, as a distributed database, has proven its value in meeting stringent requirements for high availability, low latency, high throughput, and data consistency. This article describes some real-world case studies that illustrate the selection and implementation outcomes of OceanBase Database in the financial industry. + + + +**1.
Background and Requirements** + +The unique characteristics of the financial industry dictate its rigorous requirements for database systems in the following aspects: + +1) High concurrency and throughput: Financial systems, especially in banking and securities markets, handle millions of transaction requests per day, and sometimes a hundred times more. Databases must be capable of handling extremely high concurrent loads. + +2) High availability: Financial operations cannot afford downtime. Any instance of database unavailability can lead to financial losses and customer attrition. Robust disaster recovery and fault tolerance mechanisms are critical. + +3) Data consistency: Financial transactions require strong data consistency. Any data inconsistency can jeopardize the accuracy of transactions and the flow of funds. + +4) Compliance and security: The financial industry is subject to strict regulations on data privacy and security, and databases must comply with national and industry standards. + +OceanBase Database intrinsically excels in these aspects, thanks to its powerful distributed architecture, which supports horizontal scaling, distributed transactions, and strong consistency. + + + +**2. Case Studies of OceanBase Database in the Financial Industry** + +**1) Migration of a core banking system** + +A large state-owned commercial bank previously relied on conventional relational databases like Oracle. However, as it served more customers, the transaction volume surged, leading to several critical issues: + +- Performance bottlenecks: The legacy database system struggled to handle highly concurrent requests and massive data volumes, leading to prolonged response time and decreased throughput, especially during peak transaction hours. + +- Availability concerns: The legacy database system lacked flexible disaster recovery and high availability mechanisms. In particular, it did not support disaster recovery across multiple IDCs. + +- High costs: The legacy database system incurred high hardware and software maintenance costs and could hardly be scaled, resulting in escalating O&M expenses. + +To address those issues, the bank decided to upgrade its core database system and ultimately chose OceanBase Database after rounds of assessment. + +![1735288135](/img/blogs/users/financial-industry/image/6d50ec91-84e5-4364-a330-a01a028894cb.png) + +**Key points of the upgrade process:** + +- Distributed architecture: OceanBase Database's distributed architecture is highly scalable. The bank can effortlessly add storage and computing nodes to horizontally scale OceanBase Database without disrupting operations, meeting its needs for high concurrency and throughput. + +- Distributed transactions: OceanBase Database ensures strong consistency of transaction data even under high concurrency, which is crucial for financial business. + +- High availability design: OceanBase Database supports multi-IDC deployment, which provides data backups across different regions. In the event of an IDC failure, the system swiftly switches services to another IDC to ensure uninterrupted business operations. + +- Data security and compliance: OceanBase Database provides built-in encryption, auditing, and authorization mechanisms, helping the bank meet regulatory requirements and safeguard customer data. + +**Outcomes:** + +- Performance boost: The bank's transaction processing capacity has been improved substantially.
OceanBase Database has demonstrated its robust throughput performance, handling millions of transactions stably during peak hours. + +- Cost reduction: Thanks to the distributed architecture of OceanBase Database, the bank can flexibly scale resources as needed, resulting in lower hardware and software procurement and maintenance costs compared to conventional standalone databases. + +- High availability assurance: The bank has achieved uninterrupted service availability. The new solution prevents transaction disruptions caused by database failures and enhances customer satisfaction and system stability. + +**2) Upgrade of a securities trading platform** + +A well-known securities company deployed a conventional relational database for its trading platform. As the market demand and transaction volume grew, the platform encountered the following issues: + +- High database latency: The platform handled highly frequent transactions, and the database latency became a bottleneck, leading to poor user experience. + +- Poor scalability: The conventional database provided limited horizontal scalability and struggled to keep up with the rapidly expanding trading business. + +- Unstable performance during peak hours: During periods of high market volatility, the conventional database struggled to process surging real-time transaction requests, leading to system overloads, performance degradation, and service interruptions. + +To enhance the performance and stability of its trading platform, the securities company decided to replace the conventional database with OceanBase Database. + +![1735288234](/img/blogs/users/financial-industry/image/0f00cc51-e929-44ea-94f7-e66d894f0e54.png) + +**Key points of the upgrade process:** + +- Advantages of a distributed architecture: OceanBase Database can be deployed in a distributed architecture, which allows the company to scale database resources horizontally to tackle millions of transaction requests per second and ensure stable operation even during peak hours. + +- High concurrency processing capacity: OceanBase Database is optimized to meet the requirements for high concurrency and low latency of financial applications. Its in-memory storage engine and distributed transaction management capability can significantly improve the processing efficiency of real-time transactions. + +- Multi-IDC deployment: OceanBase Database can be deployed across multiple IDCs. If one IDC fails, another IDC can take over the services. This guarantees the high availability of the trading platform. + +- Data consistency assurance: OceanBase Database provides distributed transaction management features to ensure data consistency across different database nodes, preventing issues like fund errors during transactions. + +**Outcomes:** + +- Enhanced system performance: The trading platform's response time has dropped dramatically. It processes order and transaction requests faster and improves user experience greatly. + +- High availability: OceanBase Database provides high availability by eliminating single points of failure (SPOFs). It ensures stable platform operation during peak hours and maintains transaction continuity. + +- Flexible scaling: The company can now scale resources flexibly in response to market demands without being restricted by the limitations of conventional databases. 
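Since OceanBase Database is MySQL-compatible in its MySQL mode, the strong transactional consistency described in the two cases above is consumed through ordinary SQL transactions. The following Python sketch is purely illustrative: the connection details, table, and account IDs are made up, and the `user@tenant` login format and port follow common OceanBase conventions.

```python
import pymysql

# Hypothetical connection to an OceanBase MySQL tenant
# (2881 is the default MySQL protocol port of an OBServer node).
conn = pymysql.connect(host="127.0.0.1", port=2881,
                       user="trader@test_tenant", password="******",
                       database="trading", autocommit=False)

try:
    with conn.cursor() as cur:
        # Debit one account and credit another in a single transaction.
        # Even if the two rows live on different nodes, the distributed
        # transaction manager keeps the transfer atomic.
        cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (100, 1))
        cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (100, 2))
    conn.commit()    # both changes become visible together
except Exception:
    conn.rollback()  # no partial fund movement on failure
    raise
finally:
    conn.close()
```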
+ + +**3) Migration of an insurance data platform** + +A large insurance company faced limitations of its conventional database architecture, which resulted in performance and scalability bottlenecks of its data platform, particularly when handling massive volumes of insurance policy and claims data. As the company's business kept growing, its database system incurred increasingly high maintenance and scaling costs. + +**Key points of the upgrade process:** + +- Data migration and compatibility: OceanBase Database is compatible with conventional databases like Oracle. It ensures smooth migration and minimizes risks and business interruptions during migration. + +- Horizontal scaling: Thanks to the distributed architecture of OceanBase Database, the insurance company can scale data storage and computing resources as needed to support the processing and queries of massive amounts of data. + +- Data security and compliance: Data privacy and security are paramount in the insurance industry. OceanBase Database provides robust encryption, auditing, and authorization features to help the insurance company meet regulatory requirements. + +![1735288370](/img/blogs/users/financial-industry/image/a0a8a543-ebd5-4645-95b7-a8535b0fb6ef.png) + +**Outcomes:** + +- Performance optimization: The data processing efficiency and query speed have been improved significantly, enabling the insurance company to handle claims and customer information more effectively. + +- Simplified O&M: OceanBase Database provides automatic O&M and auto-scaling capabilities, which greatly reduce the complexity and costs of database O&M. + +**3. Summary** + +These cases demonstrate that OceanBase Database is capable of handling massive amounts of data, highly concurrent requests, and complex transactions. It is particularly well-suited for financial systems that require high availability, low latency, high throughput, and strong data consistency. OceanBase Database effectively addresses the pain points of conventional databases in terms of performance, scalability, and O&M, helping financial institutions, whether banks, securities companies, or insurance companies, enhance the stability, performance, and security of their business systems, reduce costs, and achieve successful digital transformation. + +As more financial institutions seek digitization and intelligentization solutions, OceanBase Database, a fully self-developed and innovative database system, is poised to see even broader adoption in more financial scenarios. \ No newline at end of file diff --git a/docs/blogs/users/vivo.md b/docs/blogs/users/vivo.md new file mode 100644 index 000000000..af2d0bfff --- /dev/null +++ b/docs/blogs/users/vivo.md @@ -0,0 +1,189 @@ +--- +slug: vivo +title: 'Migrated from MySQL to OceanBase Database, vivo Built a Robust Data Foundation Without Standalone Performance Bottlenecks' +tags: + - User Case +--- + + +> This article, authored by Xu Shaohui from the vivo Internet and Database Team, was originally published by vivo Internet Technology on the WeChat Official Accounts Platform. It lists the major database challenges vivo faced and describes the solution provided by OceanBase, along with its implementation. + +vivo is a technology company providing smart devices and intelligent services to over 500 million users worldwide. As our expanding user base kept generating more data, our database team ran into challenges in O&M of our legacy database system.
+ +* **Necessity for sharding**: As the growing data volume of MySQL instances exceeded the capacity limits of a single server, we had to perform database and table sharding, which incurred high costs and risks and drove the need for a MySQL-compatible distributed database. +* **Cost pressure**: Our large user base caused significant annual data growth, and we had to keep buying new servers for data storage, leading to mounting cost pressure. + +To tackle those challenges, we chose OceanBase Database after evaluating distributed database products that are compatible with MySQL and provide proven features. + +1 Replace the Sharding Solution with OceanBase Database -------------------- + +We chose OceanBase Database in the expectation that its native distributed architecture and table partitioning feature could resolve the issues caused by the MySQL sharding solution. We also hoped that its exceptional data compression and tenant-level resource isolation could help cut our storage and O&M costs. + +**(1) Native distributed architecture and table partitioning** + +The native distributed architecture of OceanBase Database consists of an OBProxy layer for data routing and an OBServer layer that stores data and handles computing tasks. OBServer nodes are managed in zones to ensure the proper functioning of automatic disaster recovery mechanisms and optimization strategies within an OceanBase cluster. Depending on the business scenarios, we can deploy OceanBase Database in different high-availability architectures, such as three IDCs in the same region and five IDCs across three regions. By adding or removing OBServer nodes, we can horizontally scale out or in an OceanBase cluster to quickly increase or decrease resources, thus eliminating the capacity limits of a single server. + +![1733888919](/img/blogs/users/vivo/image/8d842ddd-3152-4a31-9562-85fa32b8dcd1.png) + +_Figure: Distributed architecture of OceanBase Database_ + +**(2) Data compression and tenant-level resource isolation** + +OceanBase Database supports table partitioning. Partitions are evenly distributed across different OBServer nodes. Each physical partition has a storage layer object, called a tablet, for storing data records. A tablet has multiple replicas distributed across different OBServer nodes. OceanBase Database uses log streams for data persistence and inter-replica synchronization. Under normal conditions, leader replicas are used to provide services. When a leader replica fails, the system automatically uses a follower replica instead to ensure data safety and service availability. + +In an OceanBase cluster, you can create multiple isolated database instances. Each such instance is called a tenant. In other words, a single cluster can serve multiple business lines with the data of one tenant isolated from that of others. This feature reduces deployment and O&M costs. + +Moreover, OceanBase Database provides a storage engine based on the log-structured merge-tree (LSM-tree) architecture, and thus boasts exceptional data compression capabilities. According to official documentation and case studies, it can slash storage costs by over 70%. + +In a nutshell, OceanBase Database's native table partitioning feature effectively addresses the issues caused by a sharding solution. Table partitioning is transparent to upper-layer applications, as the short sketch below illustrates. It not only greatly cuts the costs and time wasted on code modifications, but also lowers system risks and improves business availability.
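Here is a minimal, hypothetical Python sketch of that transparency. The table name, schema, and connection details are made up; the partitioning DDL is the standard MySQL syntax that OceanBase's MySQL mode accepts, and the application-side query carries no shard-routing logic at all.

```python
import pymysql

# Hypothetical connection to an OceanBase MySQL tenant.
conn = pymysql.connect(host="127.0.0.1", port=2881,
                       user="app@vivo_tenant", password="******",
                       database="userdata")

with conn.cursor() as cur:
    # Standard MySQL partitioning DDL: the database spreads the 64 hash
    # partitions across OBServer nodes; no manual sharding is involved.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS user_events (
            user_id    BIGINT NOT NULL,
            event_time DATETIME NOT NULL,
            payload    VARCHAR(1024),
            PRIMARY KEY (user_id, event_time)
        ) PARTITION BY HASH(user_id) PARTITIONS 64
    """)

    # The application queries one logical table as usual; partition
    # pruning happens inside the database, not in application code.
    cur.execute("SELECT COUNT(*) FROM user_events WHERE user_id = %s", (42,))
    print(cur.fetchone())

conn.close()
```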
Additionally, OceanBase Database provides data compression algorithms that substantially shrink the required storage space, while its performance, availability, security, and community support meet our expectations and business needs.

2 Deploy Tools to Prepare for Migration
-------------

To ensure a successful migration to OceanBase Database and smooth database O&M in the new architecture, we deployed OceanBase Cloud Platform (OCP), OceanBase LogProxy (oblogproxy), and OceanBase Migration Service (OMS) before migration. These tools helped us manage cluster deployment, handle monitoring alerts, perform backup and restore, collect logs, and migrate data. Combined with our internal database management platform, they enabled our database administrators to manage metadata and to query and modify data, making the system ready for production.

**(1) OCP deployment**

OCP is an enterprise-level database management platform tailored for OceanBase clusters. It provides full-lifecycle management of components such as OceanBase clusters and tenants, and manages OceanBase resources such as hosts, networks, and software packages. It enables us to manage OceanBase clusters more efficiently and reduces our IT O&M costs.

![1733889058](/img/blogs/users/vivo/image/1f92fd96-c804-4de0-bc43-57f6bb3d47ee.png)

_Figure: Architecture of OCP_

OCP consists of six modules working in coordination: Management Agent, Management Service, Metadata Repository, Monitor Repository, Management Console, and OBProxy. It can be deployed in high availability mode, where one primary and multiple standby OCP clusters are maintained to avoid single points of failure (SPOFs).

We deployed OCP on three nodes in different IDCs. In addition, since we already had an alerting platform, we created custom alerting channels in OCP to route alerts to it, integrating the OCP alerting service with our existing workflow.

Another crucial feature of OCP is backup and restore. Physical backups stored in OCP consist of baseline data and archived log data, and follower replicas are often used for backup tasks. When a user initiates a backup request, it is first forwarded to the node running RootService. RootService generates a data backup task based on the current tenant and the partition groups (PGs) of the tenant. The backup task is then distributed to OBServer nodes for parallel execution. Backup files are stored on online storage media.

![1733889089](/img/blogs/users/vivo/image/0539554b-07d3-4251-a88d-a5231e38b32d.png)

_Figure: OCP high-availability architecture_

OceanBase Database supports various storage media, such as Network File System (NFS), Alibaba Cloud Object Storage Service (OSS), Tencent Cloud Object Storage (COS), Amazon Simple Storage Service (S3), and object storage services compatible with the S3 protocol. Notably, the backup strategy of OCP requires S3 or S3-compatible storage media. If you launch a cluster backup task in OCP, you must store backup files in the specified S3 directory, as shown in the following figure.

![1733889113](/img/blogs/users/vivo/image/c2e5b5e4-2375-475c-a22e-e1d6aeb7f773.png)

**(2) oblogproxy deployment**

oblogproxy is the incremental log proxy service of OceanBase Database. It establishes connections with OceanBase Database to read incremental logs and provides downstream services with change data capture (CDC) capabilities. The binlog mode of oblogproxy is designed for compatibility with MySQL binlogs: it allows tools built around MySQL binlogs to consume incremental data from OceanBase Database without modification.
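
As a quick compatibility check, a downstream consumer can speak the plain MySQL protocol to the binlog service. The statements below are standard MySQL binlog commands; the host, port, credentials, and binlog file name are deployment-specific assumptions for illustration:

```sql
-- Connect with a stock MySQL client, e.g.:
--   mysql -h <obproxy_host> -P 2883 -u <user> -p
-- (endpoint and credentials are assumptions, not fixed values)

-- If binlog mode is working, these behave as they would on a MySQL server:
SHOW MASTER STATUS;
SHOW BINARY LOGS;
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 5;
```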
+ +![1733889133](/img/blogs/users/vivo/image/4200b9c5-c227-423a-bdf9-b6d0d10d801c.png) + +_Figure: Architecture of oblogproxy_ + +oblogproxy starts the binlog converter (BC) module to pull clogs from OceanBase Database and converts them into binlogs, which are then written to binlog files. A MySQL binlog tool, such as Canal or Flink-CDC, initiates binlog subscription requests to OBProxy, which forwards the requests to oblogproxy. Upon receiving a request, oblogproxy starts the binlog dumper (BD) module, which reads binlog files and provides subscription services by performing binlog dumps. We deployed oblogproxy across multiple nodes and stored the metadata in shared online storage to ensure high availability. + +**(3) OMS deployment** + +OMS supports data exchange between a homogeneous or heterogeneous data source and OceanBase Database. OMS provides the capabilities for online migration of existing data and real-time synchronization of incremental data. + +![1733889157](/img/blogs/users/vivo/image/25f4daab-2a49-45a8-80f7-f95ad8ba749b.png) + +_Figure: Architecture of OMS_ + +OMS has the following components: + +* DBCat: It collects and converts data objects. +* Store for pulling incremental data, Incr-Sync for synchronizing incremental data, Full-Import for importing full data, and Full-Verification for verifying full data. +* Basic service components for the management of clusters, resource pools, high availability mechanism, and metadata. These components ensure efficient scheduling and stable operations of the migration module. +* Console: It provides all-round migration scheduling capabilities. + +We also deployed OMS on three nodes in different IDCs to ensure its high availability. For monitoring and alerting during data migration and synchronization, OMS leverages OCP’s alerting channels instead of implementing redundant components. + +3 Smooth Migration to Break Capacity Limits of a Single Server +--------------- + + + +**(1) Migration from MySQL to OceanBase Database** + +To prevent issues during migration, we conducted a feasibility assessment, which included performance stress tests and compatibility tests on, for example, table schemas, SQL statements, and accounts. The test results met our requirements. During partition adaptability testing, we found that applications required table schemas and SQL statements be adapted to partitioned tables, which, considering the modification costs, was within our expectations. + +Then, we launched OMS to migrate all existing data and incremental data from MySQL to OceanBase Database. OMS ensured real-time synchronization and full data verification. Its reverse incremental synchronization feature enables instant rollback in case of migration failures, ensuring business availability. + +![1733889240](/img/blogs/users/vivo/image/d7d158cc-e683-4af5-b0f1-b87b87ad665d.png) + +_Figure: Process of a Data Migration Task in OMS_ + +The migration process consists of eight steps: + +* Pre-migration configuration verification. +* Verification of OceanBase Database tenants and accounts. +* Data consistency verification. +* Pausing DDL operations that could modify table schemas. +* Verification of synchronization latency. +* Configuring database switchover connections or modifying DNS parameters for applications. +* Terminating all connections to the source database and ensuring that applications are connected to OceanBase Database. +* Stopping forward synchronization and enabling reverse synchronization to get ready for rollback. 
+ +![1733889297](/img/blogs/users/vivo/image/5870cde8-ed04-438c-9139-582f38c2676d.png) + +_Figure: Migration process_ + +To ensure a successful switchover, minimize risks, and maximize business availability and security, we prepared a rollback plan. + +That time, we migrated nearly 20 TB of data from five MySQL clusters to OceanBase Database, which has brought us the following benefits: + +* With massive and rapidly growing data stored on cloud storage services, the MySQL sharding solution caused huge maintenance and management costs and serious availability risks. OceanBase Database not only provides table partitioning to diminish maintenance costs, and its high compression ratio also saves storage expenses. +* The high data write volume of the risk control cluster caused considerable master-slave latency, risking data loss. OceanBase Database fixes that issue by ensuring strong consistency, and shrinks the required storage space by 70%. +* The TokuDB-based archive database of the financial service suffered ineffective unique indexes and lacked technical support from TokuDB. OceanBase Database has resolved these problems. It not only improves query and DDL performance, but also eliminates capacity limits of a single server, thanks to its horizontally scalable distributed architecture. + +**(2) Migration of another distributed database** + + + +We deployed a distributed database of another vendor to support some peripheral applications, and decided to migrate these applications to OceanBase Database. Two migration methods were considered. One was based on TiCDC, Kafka, and OMS, and the other was based on CloudCanal. Their pros and cons are described in the following figure. + +![1733889386](/img/blogs/users/vivo/image/2c998925-2260-42b0-80e9-28a6f0fdaaa9.png) + +The CloudCanal-based method was simple, but it did not support reverse synchronization, and demonstrated unsatisfactory performance in incremental synchronization. The other, despite a more complex architecture, was more compatible with OceanBase Database, and supported reverse synchronization, showing better overall performance. So we chose the TiCDC + Kafka + OMS method for full migration, incremental synchronization, full verification, and reverse incremental synchronization. + +![1733889410](/img/blogs/users/vivo/image/88955026-f2cc-496e-a177-d64f336072aa.png) + +_Figure: Synchronization process_ + +As shown in the figure above, TiCDC parses incremental data from the business cluster into ordered row-level change data, and sends it to Kafka. OMS consumes this incremental data from Kafka and writes it to OceanBase Database. Kafka retains data for seven days by default, but you can adjust the retention period if the delay is considerable. You can also increase the concurrency of OMS to improve the synchronization speed. + +The full migration, which involved nearly 50 billion rows, was initially quite slow, running at only 6,000-8,000 rows per second (RPS), and was estimated to take weeks to complete. Analysis revealed that the source and target databases were not under pressure, and OMS host loads were normal. The issue was traced to widely spaced values of the primary key in the source tables, causing OMS to migrate small data chunks as it used the primary key for data slicing. + +We set the `source.sliceBatchSize` parameter to `12000` and increased memory, improving RPS to around 39,257, which still fell short of our expectations. 

By analyzing the `msg/metrics.log` file, we found that the value of `wait_dispatch_record_size` had reached `157690`, which was quite high, indicating an OMS bottleneck in partition calculation. So we disabled partition calculation by setting the `sink.enablePartitionBucket` parameter to `false` and set the `sink.workerNum` parameter to a larger value. After that, the RPS increased to 500,000-600,000.

Here, I would like to discuss three issues that occurred during the migration.

**Issue 1: A message reading "The response from the CM service is not success" was reported during the migration task.**

**Solution:** The `connector.log` file recorded `CM service is not success`, but the CM service was normal. So we checked the memory usage of the synchronization task and found a serious memory shortage, which led to highly frequent full garbage collection and, in turn, service anomalies. We logged in to the OMS container, opened the `/home/admin/conf/command/start_oms_cm.sh` file, and set the `jvm` parameter to `-server -Xmx16g -Xms16g -Xmn8g`.

**Issue 2: The RPS of incremental synchronization was quite low, around 8,000, despite high concurrency settings and normal loads on the databases and OMS.**

**Solution:** The `connector.log` file of the task indicated serious primary key conflicts when the incremental synchronization caught up with the full synchronization timestamp, while no data exceptions were found in the source or target databases. The issue was traced to TiCDC writing duplicate data, which in turn prevented OMS from batch writing. Back then, OMS had not been optimized for this specific scenario, so the only way to improve the RPS was to increase the write concurrency.

**Issue 3: Index space amplification. When an index was created, despite the cluster's disk usage being only around 50%, this error was reported: ERROR 4184 (53100): Server out of disk space.**

**Solution:** The OBServer log indicated that the index space usage was amplified by 5.5 times, requiring 5.41 TB of space, while the cluster only had 1.4 TB of space remaining.

Index space amplification was an issue in OceanBase Database versions earlier than V4.2.3. The causes were as follows:

* During sorting, intermediate results were written to disk, and metadata records were generated at the same time.
* External sorting involved two rounds of writing the data.
* During the sorting process, data was held in decompressed form.

In OceanBase Database V4.2.3 and later, intermediate results are compressed and stored in a compact format, and disk space is released incrementally during data writing. As a result, index space amplification has been reduced to 1.5 times. Therefore, we recommend OceanBase Database V4.2.3 or later for scenarios involving large datasets and heavy incremental writes.

4 Summary
----

Overall, OceanBase Database has addressed the weaknesses of vivo's previous MySQL solution, thanks to its excellent performance, strong data compression capabilities, and robust O&M tools. Next, we will continue exploring OceanBase Database's features, and we look forward to further enhancements in its O&M tools to address our challenges more effectively.
b/static/img/blogs/users/RAG-CUSRI/image/529053c3-7d69-433e-9238-f1ca5f303e00.psd new file mode 100644 index 000000000..35e26b8bf Binary files /dev/null and b/static/img/blogs/users/RAG-CUSRI/image/529053c3-7d69-433e-9238-f1ca5f303e00.psd differ diff --git a/static/img/blogs/users/SAIC-Volkswagen/image/1724923468990.png b/static/img/blogs/users/SAIC-Volkswagen/image/1724923468990.png new file mode 100644 index 000000000..6309b4546 Binary files /dev/null and b/static/img/blogs/users/SAIC-Volkswagen/image/1724923468990.png differ diff --git a/static/img/blogs/users/SAIC-Volkswagen/image/1724923468990.psd b/static/img/blogs/users/SAIC-Volkswagen/image/1724923468990.psd new file mode 100644 index 000000000..a07ce95a7 Binary files /dev/null and b/static/img/blogs/users/SAIC-Volkswagen/image/1724923468990.psd differ diff --git a/static/img/blogs/users/SAIC-Volkswagen/image/1727344692079.png b/static/img/blogs/users/SAIC-Volkswagen/image/1727344692079.png new file mode 100644 index 000000000..6c024dd61 Binary files /dev/null and b/static/img/blogs/users/SAIC-Volkswagen/image/1727344692079.png differ diff --git a/static/img/blogs/users/SAIC-Volkswagen/image/1727344692079.psd b/static/img/blogs/users/SAIC-Volkswagen/image/1727344692079.psd new file mode 100644 index 000000000..a6d456af5 Binary files /dev/null and b/static/img/blogs/users/SAIC-Volkswagen/image/1727344692079.psd differ diff --git a/static/img/blogs/users/SAIC-Volkswagen/image/Thumbs.db b/static/img/blogs/users/SAIC-Volkswagen/image/Thumbs.db new file mode 100644 index 000000000..d406fddd5 Binary files /dev/null and b/static/img/blogs/users/SAIC-Volkswagen/image/Thumbs.db differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/043bb459-a79c-457d-afd1-f6d383c17aad.png b/static/img/blogs/users/Sunshine-Insurance/image/043bb459-a79c-457d-afd1-f6d383c17aad.png new file mode 100644 index 000000000..d1eb8eb5c Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/043bb459-a79c-457d-afd1-f6d383c17aad.png differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/043bb459-a79c-457d-afd1-f6d383c17aad.psd b/static/img/blogs/users/Sunshine-Insurance/image/043bb459-a79c-457d-afd1-f6d383c17aad.psd new file mode 100644 index 000000000..7952166cb Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/043bb459-a79c-457d-afd1-f6d383c17aad.psd differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/0f4f8565-6427-498e-9bb8-d6b90664c5cb.png b/static/img/blogs/users/Sunshine-Insurance/image/0f4f8565-6427-498e-9bb8-d6b90664c5cb.png new file mode 100644 index 000000000..62abfacdd Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/0f4f8565-6427-498e-9bb8-d6b90664c5cb.png differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/0f4f8565-6427-498e-9bb8-d6b90664c5cb.psd b/static/img/blogs/users/Sunshine-Insurance/image/0f4f8565-6427-498e-9bb8-d6b90664c5cb.psd new file mode 100644 index 000000000..0ec210996 Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/0f4f8565-6427-498e-9bb8-d6b90664c5cb.psd differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/bf7723fa-71c7-4478-af25-a049acee4d72.png b/static/img/blogs/users/Sunshine-Insurance/image/bf7723fa-71c7-4478-af25-a049acee4d72.png new file mode 100644 index 000000000..17116ae89 Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/bf7723fa-71c7-4478-af25-a049acee4d72.png differ diff --git 
a/static/img/blogs/users/Sunshine-Insurance/image/bf7723fa-71c7-4478-af25-a049acee4d72.psd b/static/img/blogs/users/Sunshine-Insurance/image/bf7723fa-71c7-4478-af25-a049acee4d72.psd new file mode 100644 index 000000000..57d7292aa Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/bf7723fa-71c7-4478-af25-a049acee4d72.psd differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.png b/static/img/blogs/users/Sunshine-Insurance/image/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.png new file mode 100644 index 000000000..06135e0cf Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.png differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.psd b/static/img/blogs/users/Sunshine-Insurance/image/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.psd new file mode 100644 index 000000000..2b5243e99 Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/d72d29fc-34c7-4c74-bef0-a45a76cc99dc.psd differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.png b/static/img/blogs/users/Sunshine-Insurance/image/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.png new file mode 100644 index 000000000..d32a570c7 Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.png differ diff --git a/static/img/blogs/users/Sunshine-Insurance/image/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.psd b/static/img/blogs/users/Sunshine-Insurance/image/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.psd new file mode 100644 index 000000000..c2a68b60b Binary files /dev/null and b/static/img/blogs/users/Sunshine-Insurance/image/ed2f11eb-68c8-44ec-9fa4-ddcd17593c1d.psd differ diff --git a/static/img/blogs/users/Tansun/image/0cd27c95-858d-4efb-b03e-e8af26996aa5.png b/static/img/blogs/users/Tansun/image/0cd27c95-858d-4efb-b03e-e8af26996aa5.png new file mode 100644 index 000000000..5397a76b2 Binary files /dev/null and b/static/img/blogs/users/Tansun/image/0cd27c95-858d-4efb-b03e-e8af26996aa5.png differ diff --git a/static/img/blogs/users/Tansun/image/0cd27c95-858d-4efb-b03e-e8af26996aa5.psd b/static/img/blogs/users/Tansun/image/0cd27c95-858d-4efb-b03e-e8af26996aa5.psd new file mode 100644 index 000000000..b1a4db1ea Binary files /dev/null and b/static/img/blogs/users/Tansun/image/0cd27c95-858d-4efb-b03e-e8af26996aa5.psd differ diff --git a/static/img/blogs/users/Tansun/image/7834b4f1-580f-40c1-9841-a252e7141c09.png b/static/img/blogs/users/Tansun/image/7834b4f1-580f-40c1-9841-a252e7141c09.png new file mode 100644 index 000000000..633836540 Binary files /dev/null and b/static/img/blogs/users/Tansun/image/7834b4f1-580f-40c1-9841-a252e7141c09.png differ diff --git a/static/img/blogs/users/Tansun/image/7834b4f1-580f-40c1-9841-a252e7141c09.psd b/static/img/blogs/users/Tansun/image/7834b4f1-580f-40c1-9841-a252e7141c09.psd new file mode 100644 index 000000000..7dcc69f60 Binary files /dev/null and b/static/img/blogs/users/Tansun/image/7834b4f1-580f-40c1-9841-a252e7141c09.psd differ diff --git a/static/img/blogs/users/Tansun/image/7eaa09e1-7728-435e-8b13-307d5f77efdd.png b/static/img/blogs/users/Tansun/image/7eaa09e1-7728-435e-8b13-307d5f77efdd.png new file mode 100644 index 000000000..eb1edb33c Binary files /dev/null and b/static/img/blogs/users/Tansun/image/7eaa09e1-7728-435e-8b13-307d5f77efdd.png differ diff --git 
a/static/img/blogs/users/Tansun/image/7eaa09e1-7728-435e-8b13-307d5f77efdd.psd b/static/img/blogs/users/Tansun/image/7eaa09e1-7728-435e-8b13-307d5f77efdd.psd new file mode 100644 index 000000000..a9463d6d1 Binary files /dev/null and b/static/img/blogs/users/Tansun/image/7eaa09e1-7728-435e-8b13-307d5f77efdd.psd differ diff --git a/static/img/blogs/users/Tansun/image/Thumbs.db b/static/img/blogs/users/Tansun/image/Thumbs.db new file mode 100644 index 000000000..b105c2531 Binary files /dev/null and b/static/img/blogs/users/Tansun/image/Thumbs.db differ diff --git a/static/img/blogs/users/Tansun/image/af22497b-a464-40ed-baad-60788eac9e51.png b/static/img/blogs/users/Tansun/image/af22497b-a464-40ed-baad-60788eac9e51.png new file mode 100644 index 000000000..003d1b3ee Binary files /dev/null and b/static/img/blogs/users/Tansun/image/af22497b-a464-40ed-baad-60788eac9e51.png differ diff --git a/static/img/blogs/users/Tansun/image/af22497b-a464-40ed-baad-60788eac9e51.psd b/static/img/blogs/users/Tansun/image/af22497b-a464-40ed-baad-60788eac9e51.psd new file mode 100644 index 000000000..50113ab08 Binary files /dev/null and b/static/img/blogs/users/Tansun/image/af22497b-a464-40ed-baad-60788eac9e51.psd differ diff --git a/static/img/blogs/users/baicizhan-cto/image/1728469646994.png b/static/img/blogs/users/baicizhan-cto/image/1728469646994.png new file mode 100644 index 000000000..12314a499 Binary files /dev/null and b/static/img/blogs/users/baicizhan-cto/image/1728469646994.png differ diff --git a/static/img/blogs/users/baicizhan-cto/image/1728469646994.psd b/static/img/blogs/users/baicizhan-cto/image/1728469646994.psd new file mode 100644 index 000000000..accbf9f18 Binary files /dev/null and b/static/img/blogs/users/baicizhan-cto/image/1728469646994.psd differ diff --git a/static/img/blogs/users/financial-industry/image/0f00cc51-e929-44ea-94f7-e66d894f0e54.png b/static/img/blogs/users/financial-industry/image/0f00cc51-e929-44ea-94f7-e66d894f0e54.png new file mode 100644 index 000000000..e0481fd32 Binary files /dev/null and b/static/img/blogs/users/financial-industry/image/0f00cc51-e929-44ea-94f7-e66d894f0e54.png differ diff --git a/static/img/blogs/users/financial-industry/image/0f00cc51-e929-44ea-94f7-e66d894f0e54.psd b/static/img/blogs/users/financial-industry/image/0f00cc51-e929-44ea-94f7-e66d894f0e54.psd new file mode 100644 index 000000000..33b8f37c9 Binary files /dev/null and b/static/img/blogs/users/financial-industry/image/0f00cc51-e929-44ea-94f7-e66d894f0e54.psd differ diff --git a/static/img/blogs/users/financial-industry/image/6d50ec91-84e5-4364-a330-a01a028894cb.png b/static/img/blogs/users/financial-industry/image/6d50ec91-84e5-4364-a330-a01a028894cb.png new file mode 100644 index 000000000..f362be435 Binary files /dev/null and b/static/img/blogs/users/financial-industry/image/6d50ec91-84e5-4364-a330-a01a028894cb.png differ diff --git a/static/img/blogs/users/financial-industry/image/6d50ec91-84e5-4364-a330-a01a028894cb.psd b/static/img/blogs/users/financial-industry/image/6d50ec91-84e5-4364-a330-a01a028894cb.psd new file mode 100644 index 000000000..656f330b0 Binary files /dev/null and b/static/img/blogs/users/financial-industry/image/6d50ec91-84e5-4364-a330-a01a028894cb.psd differ diff --git a/static/img/blogs/users/financial-industry/image/a0a8a543-ebd5-4645-95b7-a8535b0fb6ef.png b/static/img/blogs/users/financial-industry/image/a0a8a543-ebd5-4645-95b7-a8535b0fb6ef.png new file mode 100644 index 000000000..32a404534 Binary files /dev/null and 
b/static/img/blogs/users/financial-industry/image/a0a8a543-ebd5-4645-95b7-a8535b0fb6ef.png differ diff --git a/static/img/blogs/users/financial-industry/image/a0a8a543-ebd5-4645-95b7-a8535b0fb6ef.psd b/static/img/blogs/users/financial-industry/image/a0a8a543-ebd5-4645-95b7-a8535b0fb6ef.psd new file mode 100644 index 000000000..ecb4202af Binary files /dev/null and b/static/img/blogs/users/financial-industry/image/a0a8a543-ebd5-4645-95b7-a8535b0fb6ef.psd differ diff --git a/static/img/blogs/users/vivo/image/0539554b-07d3-4251-a88d-a5231e38b32d.png b/static/img/blogs/users/vivo/image/0539554b-07d3-4251-a88d-a5231e38b32d.png new file mode 100644 index 000000000..dcc66f55e Binary files /dev/null and b/static/img/blogs/users/vivo/image/0539554b-07d3-4251-a88d-a5231e38b32d.png differ diff --git a/static/img/blogs/users/vivo/image/0539554b-07d3-4251-a88d-a5231e38b32d.psd b/static/img/blogs/users/vivo/image/0539554b-07d3-4251-a88d-a5231e38b32d.psd new file mode 100644 index 000000000..0d3986c0d Binary files /dev/null and b/static/img/blogs/users/vivo/image/0539554b-07d3-4251-a88d-a5231e38b32d.psd differ diff --git a/static/img/blogs/users/vivo/image/1f92fd96-c804-4de0-bc43-57f6bb3d47ee.png b/static/img/blogs/users/vivo/image/1f92fd96-c804-4de0-bc43-57f6bb3d47ee.png new file mode 100644 index 000000000..1db0aaaff Binary files /dev/null and b/static/img/blogs/users/vivo/image/1f92fd96-c804-4de0-bc43-57f6bb3d47ee.png differ diff --git a/static/img/blogs/users/vivo/image/1f92fd96-c804-4de0-bc43-57f6bb3d47ee.psd b/static/img/blogs/users/vivo/image/1f92fd96-c804-4de0-bc43-57f6bb3d47ee.psd new file mode 100644 index 000000000..c9af6fe87 Binary files /dev/null and b/static/img/blogs/users/vivo/image/1f92fd96-c804-4de0-bc43-57f6bb3d47ee.psd differ diff --git a/static/img/blogs/users/vivo/image/25f4daab-2a49-45a8-80f7-f95ad8ba749b.png b/static/img/blogs/users/vivo/image/25f4daab-2a49-45a8-80f7-f95ad8ba749b.png new file mode 100644 index 000000000..a4758511e Binary files /dev/null and b/static/img/blogs/users/vivo/image/25f4daab-2a49-45a8-80f7-f95ad8ba749b.png differ diff --git a/static/img/blogs/users/vivo/image/25f4daab-2a49-45a8-80f7-f95ad8ba749b.psd b/static/img/blogs/users/vivo/image/25f4daab-2a49-45a8-80f7-f95ad8ba749b.psd new file mode 100644 index 000000000..5cf13d0f5 Binary files /dev/null and b/static/img/blogs/users/vivo/image/25f4daab-2a49-45a8-80f7-f95ad8ba749b.psd differ diff --git a/static/img/blogs/users/vivo/image/2c998925-2260-42b0-80e9-28a6f0fdaaa9.png b/static/img/blogs/users/vivo/image/2c998925-2260-42b0-80e9-28a6f0fdaaa9.png new file mode 100644 index 000000000..b6d0e90a9 Binary files /dev/null and b/static/img/blogs/users/vivo/image/2c998925-2260-42b0-80e9-28a6f0fdaaa9.png differ diff --git a/static/img/blogs/users/vivo/image/2c998925-2260-42b0-80e9-28a6f0fdaaa9.psd b/static/img/blogs/users/vivo/image/2c998925-2260-42b0-80e9-28a6f0fdaaa9.psd new file mode 100644 index 000000000..489707181 Binary files /dev/null and b/static/img/blogs/users/vivo/image/2c998925-2260-42b0-80e9-28a6f0fdaaa9.psd differ diff --git a/static/img/blogs/users/vivo/image/4200b9c5-c227-423a-bdf9-b6d0d10d801c.png b/static/img/blogs/users/vivo/image/4200b9c5-c227-423a-bdf9-b6d0d10d801c.png new file mode 100644 index 000000000..990b60298 Binary files /dev/null and b/static/img/blogs/users/vivo/image/4200b9c5-c227-423a-bdf9-b6d0d10d801c.png differ diff --git a/static/img/blogs/users/vivo/image/4200b9c5-c227-423a-bdf9-b6d0d10d801c.psd b/static/img/blogs/users/vivo/image/4200b9c5-c227-423a-bdf9-b6d0d10d801c.psd new file 
mode 100644 index 000000000..4d7d31e53 Binary files /dev/null and b/static/img/blogs/users/vivo/image/4200b9c5-c227-423a-bdf9-b6d0d10d801c.psd differ diff --git a/static/img/blogs/users/vivo/image/5870cde8-ed04-438c-9139-582f38c2676d.png b/static/img/blogs/users/vivo/image/5870cde8-ed04-438c-9139-582f38c2676d.png new file mode 100644 index 000000000..82c99aa3e Binary files /dev/null and b/static/img/blogs/users/vivo/image/5870cde8-ed04-438c-9139-582f38c2676d.png differ diff --git a/static/img/blogs/users/vivo/image/5870cde8-ed04-438c-9139-582f38c2676d.psd b/static/img/blogs/users/vivo/image/5870cde8-ed04-438c-9139-582f38c2676d.psd new file mode 100644 index 000000000..4fdce2845 Binary files /dev/null and b/static/img/blogs/users/vivo/image/5870cde8-ed04-438c-9139-582f38c2676d.psd differ diff --git a/static/img/blogs/users/vivo/image/88955026-f2cc-496e-a177-d64f336072aa.png b/static/img/blogs/users/vivo/image/88955026-f2cc-496e-a177-d64f336072aa.png new file mode 100644 index 000000000..22140c5d9 Binary files /dev/null and b/static/img/blogs/users/vivo/image/88955026-f2cc-496e-a177-d64f336072aa.png differ diff --git a/static/img/blogs/users/vivo/image/88955026-f2cc-496e-a177-d64f336072aa.psd b/static/img/blogs/users/vivo/image/88955026-f2cc-496e-a177-d64f336072aa.psd new file mode 100644 index 000000000..8e9900149 Binary files /dev/null and b/static/img/blogs/users/vivo/image/88955026-f2cc-496e-a177-d64f336072aa.psd differ diff --git a/static/img/blogs/users/vivo/image/8d842ddd-3152-4a31-9562-85fa32b8dcd1.png b/static/img/blogs/users/vivo/image/8d842ddd-3152-4a31-9562-85fa32b8dcd1.png new file mode 100644 index 000000000..7dc131293 Binary files /dev/null and b/static/img/blogs/users/vivo/image/8d842ddd-3152-4a31-9562-85fa32b8dcd1.png differ diff --git a/static/img/blogs/users/vivo/image/8d842ddd-3152-4a31-9562-85fa32b8dcd1.psd b/static/img/blogs/users/vivo/image/8d842ddd-3152-4a31-9562-85fa32b8dcd1.psd new file mode 100644 index 000000000..de6dd3ccb Binary files /dev/null and b/static/img/blogs/users/vivo/image/8d842ddd-3152-4a31-9562-85fa32b8dcd1.psd differ diff --git a/static/img/blogs/users/vivo/image/c2e5b5e4-2375-475c-a22e-e1d6aeb7f773.png b/static/img/blogs/users/vivo/image/c2e5b5e4-2375-475c-a22e-e1d6aeb7f773.png new file mode 100644 index 000000000..eba1967b1 Binary files /dev/null and b/static/img/blogs/users/vivo/image/c2e5b5e4-2375-475c-a22e-e1d6aeb7f773.png differ diff --git a/static/img/blogs/users/vivo/image/c2e5b5e4-2375-475c-a22e-e1d6aeb7f773.psd b/static/img/blogs/users/vivo/image/c2e5b5e4-2375-475c-a22e-e1d6aeb7f773.psd new file mode 100644 index 000000000..a5d210950 Binary files /dev/null and b/static/img/blogs/users/vivo/image/c2e5b5e4-2375-475c-a22e-e1d6aeb7f773.psd differ diff --git a/static/img/blogs/users/vivo/image/d7d158cc-e683-4af5-b0f1-b87b87ad665d.png b/static/img/blogs/users/vivo/image/d7d158cc-e683-4af5-b0f1-b87b87ad665d.png new file mode 100644 index 000000000..9a2e26702 Binary files /dev/null and b/static/img/blogs/users/vivo/image/d7d158cc-e683-4af5-b0f1-b87b87ad665d.png differ diff --git a/static/img/blogs/users/vivo/image/d7d158cc-e683-4af5-b0f1-b87b87ad665d.psd b/static/img/blogs/users/vivo/image/d7d158cc-e683-4af5-b0f1-b87b87ad665d.psd new file mode 100644 index 000000000..5e965515f Binary files /dev/null and b/static/img/blogs/users/vivo/image/d7d158cc-e683-4af5-b0f1-b87b87ad665d.psd differ