diff --git a/best-practices/grafana-monitor-best-practices.md b/best-practices/grafana-monitor-best-practices.md index 073df9a161e4a..233683af537fc 100644 --- a/best-practices/grafana-monitor-best-practices.md +++ b/best-practices/grafana-monitor-best-practices.md @@ -15,7 +15,7 @@ When you [deploy a TiDB cluster using TiDB Ansible](/online-deployment-using-ans ![The monitoring architecture in the TiDB cluster](/media/prometheus-in-tidb.png) -For TiDB 2.1.3 or later versions, TiDB monitoring uses the pull method instead of the push method which is used in previous versions. It is a good adjustment with the following benefits: +For TiDB 2.1.3 or later versions, TiDB monitoring supports the pull method. It is a good adjustment with the following benefits: - There is no need to restart the entire TiDB cluster if you need to migrate Prometheus. Before adjustment, migrating Prometheus requires restarting the entire cluster because the target address needs to be updated. - You can deploy 2 separate sets of Grafana + Prometheus monitoring platforms (not highly available) to prevent a single point of monitoring. To do this, execute the deployment command of TiDB ansible twice with different IP addresses. diff --git a/tispark-overview.md b/tispark-overview.md index edf5a5df86ca9..c0ff76774ab29 100644 --- a/tispark-overview.md +++ b/tispark-overview.md @@ -20,7 +20,8 @@ TiSpark is an OLAP solution that runs Spark SQL directly on TiKV, the distribute + TiSpark integrates with Spark Catalyst Engine deeply. It provides precise control of the computing, which allows Spark read data from TiKV efficiently. It also supports index seek, which improves the performance of the point query execution significantly. + It utilizes several strategies to push down the computing to reduce the size of dataset handling by Spark SQL, which accelerates the query execution. It also uses the TiDB built-in statistical information for the query plan optimization. + From the data integration point of view, TiSpark and TiDB serve as a solution for running both transaction and analysis directly on the same platform without building and maintaining any ETLs. It simplifies the system architecture and reduces the cost of maintenance. -+ also, you can deploy and utilize tools from the Spark ecosystem for further data processing and manipulation on TiDB. For example, using TiSpark for data analysis and ETL; retrieving data from TiKV as a machine learning data source; generating reports from the scheduling system and so on. ++ You can deploy and utilize tools from the Spark ecosystem for further data processing and manipulation on TiDB. For example, using TiSpark for data analysis and ETL; retrieving data from TiKV as a machine learning data source; generating reports from the scheduling system and so on. ++ Also, TiSpark supports distributed writes to TiKV. Compared to using Spark combined with JDBC to write to TiDB, distributed writes to TiKV can implement transactions (either all data are written successfully or all writes fail), and the writes are faster. ## Environment setup @@ -90,6 +91,8 @@ spark-shell --jars $TISPARK_FOLDER/tispark-${name_with_version}.jar If you do not have a Spark cluster, we recommend using the standalone mode. To use the Spark Standalone model, you can simply place a compiled version of Spark on each node of the cluster. If you encounter problems, see its [official website](https://spark.apache.org/docs/latest/spark-standalone.html). And you are welcome to [file an issue](https://github.com/pingcap/tispark/issues/new) on our GitHub. +If you are using TiDB Ansible to deploy a TiDB cluster, you can also use TiDB Ansible to deploy a Spark standalone cluster, and TiSpark is also deployed at the same time. + #### Download and install You can download [Apache Spark](https://spark.apache.org/downloads.html)