29 changes: 27 additions & 2 deletions docs/requirements.mdx
@@ -534,7 +534,7 @@ ScalarDB Cluster is provided as a cluster consisting of one or more Pods on the

#### Platform

- **[Kubernetes](https://kubernetes.io/):** 1.28 - 1.32
- **[Kubernetes](https://kubernetes.io/):** 1.30 - 1.33
- **[Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/)**
- **[Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/products/kubernetes-service)**
- **[Red Hat OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift):** TBD
@@ -657,6 +657,31 @@ ScalarDB Analytics can run analytical queries on the following NoSQL databases **not** managed by ScalarDB Core and Cluster.
</TabItem>
</Tabs>

#### Analytical platforms

ScalarDB Analytics can run analytical queries on the following analytical platforms **not** managed by ScalarDB Core and Cluster.

<Tabs groupId="analytical-platforms" queryString>
<TabItem value="databricks" label="Databricks" default>

| Version | Databricks |
| :-------------------------- | :--------- |
| **ScalarDB Analytics 3.16** | ✅ |
| **ScalarDB Analytics 3.15** | ❌ |
| **ScalarDB Analytics 3.14** | ❌ |

</TabItem>
<TabItem value="snowflake" label="Snowflake">

| Version | Snowflake |
| :-------------------------- | :-------- |
| **ScalarDB Analytics 3.16** | ✅ |
| **ScalarDB Analytics 3.15** | ❌ |
| **ScalarDB Analytics 3.14** | ❌ |

</TabItem>
</Tabs>

### Database permission requirements

ScalarDB Analytics requires read permissions to perform its operations on the underlying databases.
@@ -681,7 +706,7 @@ The server component of ScalarDB Analytics (ScalarDB Analytics server) is provided

#### Platform

- **[Kubernetes](https://kubernetes.io/):** 1.28 - 1.32
- **[Kubernetes](https://kubernetes.io/):** 1.30 - 1.33
- **[Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/)**
- **[Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/products/kubernetes-service)**
- **[Red Hat OpenShift](https://www.redhat.com/en/technologies/cloud-computing/openshift):** TBD
13 changes: 6 additions & 7 deletions docs/scalardb-analytics/_README.mdx
@@ -24,13 +24,12 @@ This section provides links to various ScalarDB Analytics–related documentation

### Key documentation

* [Overview](./overview.mdx) - Understand ScalarDB Analytics architecture and features
* [Deploy ScalarDB Analytics](./deployment.mdx) - Deploy on Amazon EMR, Databricks, and other platforms
* [Run Analytical Queries](./run-analytical-queries.mdx) - Execute queries across multiple databases
* [Administration Guide](./administration.mdx) - Manage catalogs and data sources
* [Configuration Reference](./configuration.mdx) - Configure Spark and data sources
* [Deploy ScalarDB Analytics in Public Cloud Environments](./deployment.mdx) - Deploy on Amazon EMR, Databricks, and other platforms
* [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx) - Create catalogs and add data sources
* [Run Analytical Queries Through ScalarDB Analytics](./run-analytical-queries.mdx) - Execute queries across multiple databases
* [ScalarDB Analytics Configurations](./configurations.mdx) - Configure Spark and data sources

### Technical details

* [Design Document](./design.mdx) - Deep dive into the technical architecture
* [Version Compatibility](./run-analytical-queries.mdx#version-compatibility) - Supported Spark and Scala versions
* [ScalarDB Analytics Design](./design.mdx) - Deep dive into the technical architecture
* [Spark](../requirements.mdx#spark) - Supported Spark and Scala versions
6 changes: 3 additions & 3 deletions docs/scalardb-analytics/configurations.mdx
@@ -344,6 +344,6 @@ spark.sql.catalog.analytics.server.tls.ca_root_cert_path /path/to/cert.pem

## Next steps

- [Catalog management](catalog-management.mdx) - Learn how to manage catalogs and data sources
- [Run analytical queries](run-analytical-queries.mdx) - Start running queries with your configuration
- [Deployment guide](deployment.mdx) - Deploy ScalarDB Analytics in production
- [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx) - Learn how to create catalogs and add data sources
- [Run Analytical Queries Through ScalarDB Analytics](run-analytical-queries.mdx) - Start running queries with your configuration
- [Deploy ScalarDB Analytics in Public Cloud Environments](deployment.mdx) - Deploy ScalarDB Analytics in production
10 changes: 7 additions & 3 deletions docs/scalardb-analytics/create-scalardb-analytics-catalog.mdx
@@ -18,7 +18,7 @@ Catalog information is managed by a component called a ScalarDB Analytics server

### Prerequisites

The ScalarDB Analytics server requires a database to store catalog information. We refer to this database as the **metadata database** throughout this documentation. ScalarDB Analytics supports the following databases for the metadata database:
The ScalarDB Analytics server requires a database to store catalog information. This database is referred to as the **metadata database** throughout this documentation. ScalarDB Analytics supports the following databases for the metadata database:

- PostgreSQL
- MySQL
@@ -56,7 +56,9 @@ scalar.db.analytics.server.metering.storage.path=/var/scalardb-analytics/metering
```

:::note
For production deployments, configure metering storage to use object storage (for example, Amazon S3, Google Cloud Storage, or Azure Blob Storage) instead of the local filesystem.For detailed configuration options, see the [Configuration reference](./configurations.mdx).

For production deployments, configure metering storage to use object storage (for example, Amazon S3, Google Cloud Storage, or Azure Blob Storage) instead of the local filesystem. For detailed configuration options, see [ScalarDB Analytics Configurations](./configurations.mdx).

:::

### Start the ScalarDB Analytics server
@@ -108,6 +110,8 @@ docker exec scalardb-analytics-server grpc-health-probe -addr=localhost:11051 -t

ScalarDB Analytics CLI is a command-line tool that communicates with the ScalarDB Analytics server to manage catalogs, register data sources, and perform administrative tasks.

For details, see the [ScalarDB Analytics CLI Command Reference](./reference-cli-command.mdx).
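
As a quick sketch of the command shape, the CLI pairs a resource noun with an action verb. The `data-source list` subcommand below is inferred from the "List all data sources" section of the CLI reference, the file path is hypothetical, and any required client-configuration options are omitted:

```console
# Register a data source described by a JSON registration file
scalardb-analytics-cli data-source register --data-source-json ./postgres-ds.json

# List the data sources registered in the catalog
scalardb-analytics-cli data-source list
```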

### Install the CLI

The `scalardb-analytics-cli` tool is available as a container image:
@@ -146,7 +150,7 @@ scalar.db.analytics.client.server.tls.ca_root_cert_path=/path/to/ca.crt
scalar.db.analytics.client.server.tls.override_authority=analytics.example.com
```

For detailed configuration options, see the [Configuration reference](./configurations.mdx).
For detailed configuration options, see [ScalarDB Analytics Configurations](./configurations.mdx).

### Set up an alias (optional)

9 changes: 5 additions & 4 deletions docs/scalardb-analytics/deployment.mdx
@@ -10,13 +10,14 @@ import TabItem from "@theme/TabItem";
# Deploy ScalarDB Analytics in Public Cloud Environments

This guide explains how to deploy ScalarDB Analytics in a public cloud environment. ScalarDB Analytics consists of two main components: a ScalarDB Analytics server and Apache Spark. In this guide, you can choose either Amazon EMR or Databricks for the Spark environment.

For details about ScalarDB Analytics, refer to [ScalarDB Analytics Design](./design.mdx).

## Deploy ScalarDB Analytics catalog server
## Deploy ScalarDB Analytics server

ScalarDB Analytics requires a catalog server to manage metadata and data source connections. The catalog server should be deployed by using Helm charts on a Kubernetes cluster.
ScalarDB Analytics requires a ScalarDB Analytics server to manage metadata and data source connections. The server should be deployed by using Helm Charts on a Kubernetes cluster.

For detailed deployment instructions, see [TBD - Helm chart deployment guide].
For detailed deployment instructions, see [How to install Scalar products through AWS Marketplace](../scalar-kubernetes/AwsMarketplaceGuide?products=scalardb-analytics-server).

After deploying the ScalarDB Analytics server, note the following information for Spark configuration:

@@ -156,7 +157,7 @@ spark.sql.catalog.<CATALOG_NAME>.server.metering.port 11052
Replace the placeholders:

- `<CATALOG_NAME>`: The name of the catalog. This must match a catalog created on the ScalarDB Analytics server.
- `<CATALOG_SERVER_HOST>`: The host address of your ScalarDB Analytics catalog server.
- `<CATALOG_SERVER_HOST>`: The host address of your ScalarDB Analytics server.
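
For example, with the placeholders filled in, the settings could look like the following sketch. The catalog name `analytics` and the host are hypothetical, and the `server.host` key is an assumption mirroring the `<CATALOG_SERVER_HOST>` placeholder:

```
spark.sql.catalog.analytics.server.host analytics-server.internal.example.com
spark.sql.catalog.analytics.server.metering.port 11052
```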

4. Add the library of ScalarDB Analytics to the launched cluster as a Maven dependency. For details on how to add the library, refer to the [Databricks cluster libraries documentation](https://docs.databricks.com/en/libraries/cluster-libraries.html).

6 changes: 3 additions & 3 deletions docs/scalardb-analytics/design.mdx
@@ -50,12 +50,12 @@ graph TD
The following are definitions for those levels:

- **Catalog** is a folder that contains all your data source information. For example, you might have one catalog called `analytics_catalog` for your analytics data and another called `operational_catalog` for your day-to-day operations.
- **Data source** represents each data source you connect to. For each data source, we store important information like:
- **Data source** represents each data source you connect to. For each data source, ScalarDB Analytics stores important information like:
- What kind of data source it is (PostgreSQL, Cassandra, etc.)
- How to connect to it (connection details and passwords)
- Special features the data source supports (like transactions)
- **Namespace** is like a subfolder within your data source that groups related tables together. In PostgreSQL these are called schemas, in Cassandra they're called keyspaces. You can have multiple levels of namespaces, similar to having folders within folders.
- **Table** is where your actual data lives. For each table, we keep track of:
- **Table** is where your actual data lives. For each table, ScalarDB Analytics keeps track of:
- What columns it has
- What type of data each column can store
- Whether columns can be empty (null)
@@ -95,7 +95,7 @@ When registering a data source to ScalarDB Analytics, two types of mappings occur:
1. **Catalog structure mapping**: The data source's catalog information (namespaces, tables, and columns) is resolved and mapped to the universal data catalog structure
2. **Data type mapping**: Native data types from each data source are mapped to the universal data types listed above

These mappings ensure compatibility and consistency across different database systems. For detailed information about how specific databases are mapped, see [Catalog information mappings by data source](./design.mdx#catalog-information-mappings-by-data-source).
These mappings ensure compatibility and consistency across different database systems. For detailed information about how specific databases are mapped, see [Catalog structure mappings by data source](./reference-data-source.mdx#catalog-structure-mappings-by-data-source).

## Query engine

4 changes: 2 additions & 2 deletions docs/scalardb-analytics/reference-cli-command.mdx
@@ -4,7 +4,7 @@ tags:
displayed_sidebar: docsEnglish
---

# ScalarDB Analytics CLI command reference
# ScalarDB Analytics CLI Command Reference

The ScalarDB Analytics CLI uses a hierarchical command structure:

@@ -87,7 +87,7 @@ scalardb-analytics-cli data-source register --data-source-json <path-to-json>

Please replace `<path-to-json>` with the file path to your data source registration file.

The `register` command requires a data source registration file. The file format is described in the [Data source configuration](#data-source-configuration) section below.
The `register` command requires a data source registration file. The file format is described in [Data source registration file format](reference-data-source.mdx#data-source-registration-file-format).
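
As an illustrative sketch, a registration file could take roughly the following shape. All field names and values here are hypothetical; the authoritative schema is in the linked reference:

```json
{
  "catalog": "analytics",
  "name": "postgres_orders",
  "type": "postgres",
  "provider": {
    "host": "postgres.internal.example.com",
    "port": 5432,
    "username": "readonly_user",
    "password": "<PASSWORD>",
    "database": "orders"
  }
}
```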

### List all data sources

12 changes: 3 additions & 9 deletions docs/scalardb-analytics/run-analytical-queries.mdx
@@ -20,13 +20,7 @@ This section describes the prerequisites, setting up ScalarDB Analytics in the Spark configuration
### Prerequisites

- **ScalarDB Analytics server:** A running instance that manages catalog information and connects to your data sources. The server must be set up with at least one data source registered. For registering data sources, see [Create a ScalarDB Analytics Catalog](./create-scalardb-analytics-catalog.mdx).
- **Apache Spark:** A compatible version of Apache Spark. For supported versions, see [Version compatibility](#version-compatibility). If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).

:::note

Apache Spark is built with either Scala 2.12 or Scala 2.13. ScalarDB Analytics supports both versions. You need to be sure which version you are using so that you can select the correct version of ScalarDB Analytics later. For more details, see [Version compatibility](#version-compatibility).

:::
- **Apache Spark:** A compatible version of Apache Spark. For supported versions, see [Spark](../requirements.mdx#spark). If you don't have Spark installed yet, please download the Spark distribution from [Apache's website](https://spark.apache.org/downloads.html).

### Set up ScalarDB Analytics in the Spark configuration

@@ -116,7 +110,7 @@ Depending on your environment, you may not be able to use all the methods mentioned

:::

With all these methods, you can refer to tables in ScalarDB Analytics by using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, see [Catalog information reference](./reference-data-source.mdx#catalog-information-reference).
With all these methods, you can refer to tables in ScalarDB Analytics by using the same table identifier format. For details about how ScalarDB Analytics maps catalog information from data sources, see [Catalog structure mappings by data source](./reference-data-source.mdx#catalog-structure-mappings-by-data-source).
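
As a minimal sketch, a fully qualified identifier follows the catalog, data source, namespace, and table hierarchy described in [ScalarDB Analytics Design](./design.mdx); the catalog, data source, namespace, and table names below are hypothetical:

```scala
// Assumes a SparkSession already configured with the ScalarDB Analytics catalog settings above
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Identifier format: <catalog>.<data_source>.<namespace>.<table>
spark.sql("SELECT * FROM analytics.postgres.public.orders LIMIT 10").show()
```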

<Tabs groupId="spark-application-type" queryString>
<TabItem value="spark-driver" label="Spark driver application">
@@ -129,7 +123,7 @@ First, you need to set up your Java project. For example, if you are using Gradle

```kotlin
dependencies {
implementation("com.scalar-labs:scalardb-analytics-spark-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
implementation("com.scalar-labs:scalardb-analytics-spark-all-<SPARK_VERSION>_<SCALA_VERSION>:<SCALARDB_ANALYTICS_VERSION>")
}
```
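
For example, with Spark 3.5 built for Scala 2.12 and ScalarDB Analytics 3.16.0 (all three version numbers are purely illustrative), the dependency would be:

```kotlin
dependencies {
    implementation("com.scalar-labs:scalardb-analytics-spark-all-3.5_2.12:3.16.0")
}
```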
