From 6d36a1c6e4f732bddcac97e51b7789640c1fe822 Mon Sep 17 00:00:00 2001 From: josh-wong Date: Fri, 26 Sep 2025 05:06:30 +0000 Subject: [PATCH] AUTO: Sync ScalarDB docs in English to docs site repo --- docs/requirements.mdx | 29 +++++++++++++++++-- docs/scalardb-analytics/_README.mdx | 13 ++++----- docs/scalardb-analytics/configurations.mdx | 6 ++-- .../create-scalardb-analytics-catalog.mdx | 10 +++++-- docs/scalardb-analytics/deployment.mdx | 9 +++--- docs/scalardb-analytics/design.mdx | 6 ++-- .../reference-cli-command.mdx | 4 +-- .../run-analytical-queries.mdx | 12 ++------ 8 files changed, 56 insertions(+), 33 deletions(-) diff --git a/docs/requirements.mdx b/docs/requirements.mdx index 03e2e48db..01bb43acb 100644 --- a/docs/requirements.mdx +++ b/docs/requirements.mdx @@ -534,7 +534,7 @@ ScalarDB Cluster is provided as a cluster consisting of one or more Pods on the #### Platform -- **[Kubernetes](https://kubernetes.io/):** 1.28 - 1.32 +- **[Kubernetes](https://kubernetes.io/):** 1.30 - 1.33 - **[Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/)** - **[Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/products/kubernetes-service)** - **[Red Hat OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift):** TBD @@ -657,6 +657,31 @@ ScalarDB Analytics can run analytical queries on the following NoSQL databases * +#### Analytical platforms + +ScalarDB Analytics can run analytical queries on the following analytical platforms **not** managed by ScalarDB Core and Cluster. 
+ + + + +| Version | Databricks | +| :-------------------------- | :--------- | +| **ScalarDB Analytics 3.16** | ✅ | +| **ScalarDB Analytics 3.15** | ❌ | +| **ScalarDB Analytics 3.14** | ❌ | + + + + +| Version | Snowflake | +| :-------------------------- | :-------- | +| **ScalarDB Analytics 3.16** | ✅ | +| **ScalarDB Analytics 3.15** | ❌ | +| **ScalarDB Analytics 3.14** | ❌ | + + + + ### Database permission requirements ScalarDB Analytics requires read permissions to perform its operations on the underlying databases. @@ -681,7 +706,7 @@ The server component of ScalarDB Analytics (ScalarDB Analytics server) is provid #### Platform -- **[Kubernetes](https://kubernetes.io/):** 1.28 - 1.32 +- **[Kubernetes](https://kubernetes.io/):** 1.30 - 1.33 - **[Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/)** - **[Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/products/kubernetes-service)** - **[Red Hat OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift):** TBD diff --git a/docs/scalardb-analytics/_README.mdx b/docs/scalardb-analytics/_README.mdx index fa475f1a3..8df88434a 100644 --- a/docs/scalardb-analytics/_README.mdx +++ b/docs/scalardb-analytics/_README.mdx @@ -24,13 +24,12 @@ This section provides links to various ScalarDB Analytics–related documentatio ### Key documentation -* [Overview](./overview.mdx) - Understand ScalarDB Analytics architecture and features -* [Deploy ScalarDB Analytics](./deployment.mdx) - Deploy on Amazon EMR, Databricks, and other platforms -* [Run Analytical Queries](./run-analytical-queries.mdx) - Execute queries across multiple databases -* [Administration Guide](./administration.mdx) - Manage catalogs and data sources -* [Configuration Reference](./configuration.mdx) - Configure Spark and data sources +* [Deploy ScalarDB Analytics in Public Cloud Environments](./deployment.mdx) - Deploy on Amazon EMR, Databricks, and other platforms +* [Create a ScalarDB Analytics 
Catalog](./create-scalardb-analytics-catalog.mdx) - Create catalogs and add data sources +* [Run Analytical Queries Through ScalarDB Analytics](./run-analytical-queries.mdx) - Execute queries across multiple databases +* [ScalarDB Analytics Configurations](./configurations.mdx) - Configure Spark and data sources ### Technical details -* [Design Document](./design.mdx) - Deep dive into the technical architecture -* [Version Compatibility](./run-analytical-queries.mdx#version-compatibility) - Supported Spark and Scala versions +* [ScalarDB Analytics Design](./design.mdx) - Deep dive into the technical architecture +* [Spark](../requirements.mdx#spark) - Supported Spark and Scala versions diff --git a/docs/scalardb-analytics/configurations.mdx b/docs/scalardb-analytics/configurations.mdx index 59bdb01fb..228815216 100644 --- a/docs/scalardb-analytics/configurations.mdx +++ b/docs/scalardb-analytics/configurations.mdx @@ -344,6 +344,6 @@ spark.sql.catalog.analytics.server.tls.ca_root_cert_path /path/to/cert.pem ## Next steps -- [Catalog management](catalog-management.mdx) - Learn how to manage catalogs and data sources -- [Run analytical queries](run-analytical-queries.mdx) - Start running queries with your configuration -- [Deployment guide](deployment.mdx) - Deploy ScalarDB Analytics in production +- [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx) - Learn how to create catalogs and add data sources +- [Run Analytical Queries Through ScalarDB Analytics](run-analytical-queries.mdx) - Start running queries with your configuration +- [Deploy ScalarDB Analytics in Public Cloud Environments](deployment.mdx) - Deploy ScalarDB Analytics in production diff --git a/docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx b/docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx index 35b9a6fdb..ecd2c6c73 100644 --- a/docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx +++ 
b/docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx @@ -18,7 +18,7 @@ Catalog information is managed by a component called a ScalarDB Analytics server ### Prerequisites -The ScalarDB Analytics server requires a database to store catalog information. We refer to this database as the **metadata database** throughout this documentation. ScalarDB Analytics supports the following databases for the metadata database: +The ScalarDB Analytics server requires a database to store catalog information. This database is referred to as the **metadata database** throughout this documentation. ScalarDB Analytics supports the following databases for the metadata database: - PostgreSQL - MySQL @@ -56,7 +56,9 @@ scalar.db.analytics.server.metering.storage.path=/var/scalardb-analytics/meterin ``` :::note -For production deployments, configure metering storage to use object storage (for example, Amazon S3, Google Cloud Storage, or Azure Blob Storage) instead of the local filesystem.For detailed configuration options, see the [Configuration reference](./configurations.mdx). + +For production deployments, configure metering storage to use object storage (for example, Amazon S3, Google Cloud Storage, or Azure Blob Storage) instead of the local filesystem. For detailed configuration options, see [ScalarDB Analytics Configurations](./configurations.mdx). + ::: ### Start the ScalarDB Analytics server @@ -108,6 +110,8 @@ docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051 -t ScalarDB Analytics CLI is a command-line tool that communicates with the ScalarDB Analytics server to manage catalogs, register data sources, and perform administrative tasks. 
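Editor's aside on the client properties file the CLI reads later in this hunk: it is plain `key=value` pairs, so it can be generated programmatically. A minimal sketch follows; the two TLS property keys appear verbatim in this patch, while the path and hostname values are the docs' own placeholder examples, not real settings.

```python
def render_properties(props: dict) -> str:
    """Render a Java-style .properties file: one key=value pair per line, sorted."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))

# Both keys below are taken verbatim from this patch; the values are the
# placeholder examples used in the docs, not real certificates or hosts.
client_props = render_properties({
    "scalar.db.analytics.client.server.tls.ca_root_cert_path": "/path/to/ca.crt",
    "scalar.db.analytics.client.server.tls.override_authority": "analytics.example.com",
})
```

This is a sketch under the stated assumptions, not the product's tooling; for the authoritative key list, see the configuration reference linked in the docs.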
+For details, see the [ScalarDB Analytics CLI Command Reference](./reference-cli-command.mdx). + ### Install the CLI The `scalardb-analytics-cli` tool is available as a container image: @@ -146,7 +150,7 @@ scalar.db.analytics.client.server.tls.ca_root_cert_path=/path/to/ca.crt scalar.db.analytics.client.server.tls.override_authority=analytics.example.com ``` -For detailed configuration options, see the [Configuration reference](./configurations.mdx). +For detailed configuration options, see [ScalarDB Analytics Configurations](./configurations.mdx). ### Set up an alias (optional) diff --git a/docs/scalardb-analytics/deployment.mdx b/docs/scalardb-analytics/deployment.mdx index d2211a252..86597e557 100644 --- a/docs/scalardb-analytics/deployment.mdx +++ b/docs/scalardb-analytics/deployment.mdx @@ -10,13 +10,14 @@ import TabItem from "@theme/TabItem"; # Deploy ScalarDB Analytics in Public Cloud Environments This guide explains how to deploy ScalarDB Analytics in a public cloud environment. ScalarDB Analytics consists of two main components: a ScalarDB Analytics server and Apache Spark. In this guide, you can choose either Amazon EMR or Databricks for the Spark environment. + For details about ScalarDB Analytics, refer to [ScalarDB Analytics Design](./design.mdx). -## Deploy ScalarDB Analytics catalog server +## Deploy ScalarDB Analytics server -ScalarDB Analytics requires a catalog server to manage metadata and data source connections. The catalog server should be deployed by using Helm charts on a Kubernetes cluster. +ScalarDB Analytics requires a catalog server to manage metadata and data source connections. The catalog server should be deployed by using Helm Charts on a Kubernetes cluster. -For detailed deployment instructions, see [TBD - Helm chart deployment guide]. +For detailed deployment instructions, see [How to install Scalar products through AWS Marketplace](../scalar-kubernetes/AwsMarketplaceGuide?products=scalardb-analytics-server). 
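Editor's aside: the deployment output that the Spark side needs can be captured in a small record. A sketch, with port numbers inferred from the examples in this patch (11051 is the port probed by `grpc-health-probe`, 11052 the metering port); treat both as assumptions for your own deployment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnalyticsServerInfo:
    """Connection details to record after deploying the ScalarDB Analytics server."""
    host: str
    catalog_port: int = 11051   # inferred from the grpc-health-probe example in this patch
    metering_port: int = 11052  # matches the metering port shown in the Spark configuration

server = AnalyticsServerInfo(host="analytics.example.com")
```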
After deploying the catalog server, note the following information for Spark configuration: @@ -156,7 +157,7 @@ spark.sql.catalog..server.metering.port 11052 Replace the placeholders: - ``: The name of the catalog. This must match a catalog created on the ScalarDB Analytics server. -- ``: The host address of your ScalarDB Analytics catalog server. +- ``: The host address of your ScalarDB Analytics server. 4. Add the library of ScalarDB Analytics to the launched cluster as a Maven dependency. For details on how to add the library, refer to the [Databricks cluster libraries documentation](https://docs.databricks.com/en/libraries/cluster-libraries.html). diff --git a/docs/scalardb-analytics/design.mdx b/docs/scalardb-analytics/design.mdx index 92523b5d5..8d333c78f 100644 --- a/docs/scalardb-analytics/design.mdx +++ b/docs/scalardb-analytics/design.mdx @@ -50,12 +50,12 @@ graph TD The following are definitions for those levels: - **Catalog** is a folder that contains all your data source information. For example, you might have one catalog called `analytics_catalog` for your analytics data and another called `operational_catalog` for your day-to-day operations. -- **Data source** represents each data source you connect to. For each data source, we store important information like: +- **Data source** represents each data source you connect to. For each data source, ScalarDB Analytics stores important information like: - What kind of data source it is (PostgreSQL, Cassandra, etc.) - How to connect to it (connection details and passwords) - Special features the data source supports (like transactions) - **Namespace** is like a subfolder within your data source that groups related tables together. In PostgreSQL these are called schemas, in Cassandra they're called keyspaces. You can have multiple levels of namespaces, similar to having folders within folders. -- **Table** is where your actual data lives. 
For each table, we keep track of: +- **Table** is where your actual data lives. For each table, ScalarDB Analytics keeps track of: - What columns it has - What type of data each column can store - Whether columns can be empty (null) @@ -95,7 +95,7 @@ When registering a data source to ScalarDB Analytics, two types of mappings occu 1. **Catalog structure mapping**: The data source's catalog information (namespaces, tables, and columns) is resolved and mapped to the universal data catalog structure 2. **Data type mapping**: Native data types from each data source are mapped to the universal data types listed above -These mappings ensure compatibility and consistency across different database systems. For detailed information about how specific databases are mapped, see [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source). +These mappings ensure compatibility and consistency across different database systems. For detailed information about how specific databases are mapped, see [Catalog structure mappings by data source](./reference-data-source.mdx#catalog-structure-mappings-by-data-source). ## Query engine diff --git a/docs/scalardb-analytics/reference-cli-command.mdx b/docs/scalardb-analytics/reference-cli-command.mdx index c569d73de..3cb73c8a1 100644 --- a/docs/scalardb-analytics/reference-cli-command.mdx +++ b/docs/scalardb-analytics/reference-cli-command.mdx @@ -4,7 +4,7 @@ tags: displayed_sidebar: docsEnglish --- -# ScalarDB Analytics CLI command reference +# ScalarDB Analytics CLI Command Reference The ScalarDB Analytics CLI uses a hierarchical command structure: @@ -87,7 +87,7 @@ scalardb-analytics-cli data-source register --data-source-json Please replace `` with the file path to your data source registration file. -The `register` command requires a data source registration file. The file format is described in the [Data source configuration](#data-source-configuration) section below. 
+The `register` command requires a data source registration file. The file format is described in [Data source registration file format](reference-data-source.mdx#data-source-registration-file-format). ### List all data sources diff --git a/docs/scalardb-analytics/run-analytical-queries.mdx b/docs/scalardb-analytics/run-analytical-queries.mdx index 6a84789d2..47d49316d 100644 --- a/docs/scalardb-analytics/run-analytical-queries.mdx +++ b/docs/scalardb-analytics/run-analytical-queries.mdx @@ -20,13 +20,7 @@ This section describes the prerequisites, setting up ScalarDB Analytics in the S ### Prerequisites - **ScalarDB Analytics server:** A running instance that manages catalog information and connects to your data sources. The server must be set up with at least one data source registered. For registering data sources, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx). -- **Apache Spark:** A compatible version of Apache Spark. For supported versions, see [Version compatibility](#version-compatibility). If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html). - -:::note - -Apache Spark is built with either Scala 2.12 or Scala 2.13. ScalarDB Analytics supports both versions. You need to be sure which version you are using so that you can select the correct version of ScalarDB Analytics later. For more details, see [Version compatibility](#version-compatibility). - -::: +- **Apache Spark:** A compatible version of Apache Spark. For supported versions, see [Spark](../requirements.mdx#spark). If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html). 
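Editor's aside on the Spark-side setup that follows: the catalog is wired up through `spark.sql.catalog.<catalog>`-prefixed properties. A sketch of assembling them; only the metering-port key and port appear verbatim in this patch, and the `.server.host` key name is an assumption for illustration.

```python
def spark_catalog_conf(catalog: str, host: str) -> dict:
    """Build spark-defaults-style settings for one ScalarDB Analytics catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        f"{prefix}.server.host": host,              # assumed key name, for illustration only
        f"{prefix}.server.metering.port": "11052",  # key and port shown in this patch
    }

conf = spark_catalog_conf("analytics", "analytics.example.com")
```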
### Set up ScalarDB Analytics in the Spark configuration @@ -116,7 +110,7 @@ Depending on your environment, you may not be able to use all the methods mentio ::: -With all these methods, you can refer to tables in ScalarDB Analytics by using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, see [Catalog information reference](./reference-data-source.mdx#catalog-information-reference). +With all these methods, you can refer to tables in ScalarDB Analytics by using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, see [Catalog structure mappings by data source](./reference-data-source.mdx#catalog-structure-mappings-by-data-source). @@ -129,7 +123,7 @@ First, you need to set up your Java project. For example, if you are using Gradl ```kotlin dependencies { - implementation("com.scalar-labs:scalardb-analytics-spark-_:") + implementation("com.scalar-labs:scalardb-analytics-spark-all-_:") } ```
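Editor's aside: the "same table identifier format" mentioned in the hunks above follows the catalog hierarchy from design.mdx (catalog, then data source, then one or more namespaces, then table). A sketch of composing such an identifier; the dotted form is an assumption based on that hierarchy, the example names are hypothetical, and real quoting rules may differ.

```python
def table_identifier(catalog: str, data_source: str, namespaces, table: str) -> str:
    """Compose a fully qualified table identifier from the catalog hierarchy.

    Namespaces may nest (like folders within folders), so they are passed
    as a sequence and joined in order.
    """
    parts = [catalog, data_source, *namespaces, table]
    if not all(parts):
        raise ValueError("every identifier component must be non-empty")
    return ".".join(parts)

# Hypothetical names for illustration; only `analytics_catalog` echoes the docs.
ident = table_identifier("analytics_catalog", "postgres_prod", ["public"], "orders")
```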