From bdc05b0efd1f9ff86d17dd1f854047e6f7e430e7 Mon Sep 17 00:00:00 2001
From: Muhammet Orazov
Date: Wed, 10 Feb 2021 12:26:45 +0100
Subject: [PATCH] #110: Added overview table with supported features (#124)

---
 README.md                      |   1 -
 doc/changes/changes_1.0.0.md   |   1 +
 doc/user_guide/delta_format.md |  55 -------------
 doc/user_guide/user_guide.md   | 146 ++++++++++++++++++++++++++++++++-
 4 files changed, 145 insertions(+), 58 deletions(-)
 delete mode 100644 doc/user_guide/delta_format.md

diff --git a/README.md b/README.md
index 5640d14f..cf7f9391 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,6 @@ Exasol Cloud Storage Extension provides [Exasol][exasol] user-defined functions
 For more information please check out the following guides.

 * [User Guide](doc/user_guide/user_guide.md)
-* [Delta Format Import](doc/user_guide/delta_format.md)
 * [Changelog](doc/changes/changelog.md)

 ## Information for Contributors
diff --git a/doc/changes/changes_1.0.0.md b/doc/changes/changes_1.0.0.md
index fcb2f5b6..2e2093b6 100644
--- a/doc/changes/changes_1.0.0.md
+++ b/doc/changes/changes_1.0.0.md
@@ -20,6 +20,7 @@
 ## Documentation

 * #89: Increased the default number of characters for file path (PR #105).
+* #110: Added overview table with supported features (PR #124).

 ## Dependency Updates

diff --git a/doc/user_guide/delta_format.md b/doc/user_guide/delta_format.md
deleted file mode 100644
index 87a258bb..00000000
--- a/doc/user_guide/delta_format.md
+++ /dev/null
@@ -1,55 +0,0 @@
-# Delta Format
-
-[Delta format][delta-io] is an open-source storage layer that brings ACID
-transaction properties to Apache Spark and other blob storage systems.
-
-Using the Exasol Cloud Storage Extension, it is now possible to import data from
-the Delta format.
-
-## Import Delta Formatted Data
-
-Like other cloud storage systems, you can run the Exasol IMPORT SQL statement to
-import the data from the Delta format.
-
-Here is an example of import delta formatted data from Amazon S3:
-
-```sql
-CREATE OR REPLACE CONNECTION S3_CONNECTION
-TO ''
-USER ''
-IDENTIFIED BY 'S3_ACCESS_KEY=;S3_SECRET_KEY=';
-
-IMPORT INTO .
-FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
-  BUCKET_PATH = 's3a:///import/delta/data/*'
-  DATA_FORMAT = 'DELTA'
-  S3_ENDPOINT = 's3..amazonaws.com'
-  CONNECTION_NAME = 'S3_CONNECTION'
-  PARALLELISM = 'nproc()*';
-```
-
-## Supported Cloud Storage Systems
-
-Currently, only Amazon S3, Azure Blob Storage and Azure Data Lake Storage Gen1
-storage systems are supported.
-
-You can read more about the supported storage requirements and configuration on
-the [delta.io/delta-storage.html][delta-storage] page.
-
-## Delta Snapshot
-
-When running the import SQL statement, we first query the [latest
-snapshot][delta-history] of the Delta format and only import the data from the
-latest snapshot version. Thus, each import from the Delta format will import the
-snapshot data to the Exasol table.
-
-## Schema Evolution
-
-Delta format supports schema evolution and the import statement queries the
-latest schema defined in the Delta format. Therefore, users should update the
-Exasol table schema manually before the import if the schema in the delta format
-changes.
-
-[delta-io]: https://delta.io/
-[delta-storage]: https://docs.delta.io/latest/delta-storage.html
-[delta-history]: https://docs.delta.io/latest/delta-utility.html#history
diff --git a/doc/user_guide/user_guide.md b/doc/user_guide/user_guide.md
index ad5af2fd..b18a03c5 100644
--- a/doc/user_guide/user_guide.md
+++ b/doc/user_guide/user_guide.md
@@ -19,15 +19,100 @@ Parquet, Avro or Orc.
 - [Azure Blob Storage](#azure-blob-storage)
 - [Azure DataLake Gen1 Storage](#azure-data-lake-gen1-storage)
 - [Azure DataLake Gen2 Storage](#azure-data-lake-gen2-storage)
+- [Delta Format](#delta-format)
+
+## Overview
+
+Here is an overview of the supported features.
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
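+| Import | Storage system           | Parquet | Avro | Orc | Delta |
+|--------|--------------------------|---------|------|-----|-------|
+| AWS    | S3                       | yes     | yes  | yes | yes   |
+| GCP    | Google Cloud Storage     | yes     | yes  | yes | no    |
+| Azure  | Blob Storage             | yes     | yes  | yes | yes   |
+| Azure  | Data Lake (Gen1) Storage | yes     | yes  | yes | yes   |
+| Azure  | Data Lake (Gen2) Storage | yes     | yes  | yes | yes   |
+
+| Export | Storage system           | Parquet | Avro | Orc | Delta |
+|--------|--------------------------|---------|------|-----|-------|
+| AWS    | S3                       | yes     | no   | no  | no    |
+| GCP    | Google Cloud Storage     | yes     | no   | no  | no    |
+| Azure  | Blob Storage             | yes     | no   | no  | no    |
+| Azure  | Data Lake (Gen1) Storage | yes     | no   | no  | no    |
+| Azure  | Data Lake (Gen2) Storage | yes     | no   | no  | no    |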
 ## Getting Started

-We assume you have an Exasol cluster running with a version `6.0` or above.
+The `cloud-storage-extension` works for all the supported Exasol versions.

 ### Supported Data Formats

 We support the Parquet, Avro and Orc formats when importing data from cloud
-storages into an Exasol table. However, we export Exasol tables only as Parquet
+storages into an Exasol table. However, we export Exasol tables only as Parquet
 data to storage systems.

 ### Supported Cloud Storage Systems
@@ -871,3 +956,60 @@ INTO SCRIPT CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
 ```

 The bucket path should start with an `abfs` or `abfss` URI scheme.
+
+## Delta Format
+
+[Delta format][delta-io] is an open-source storage layer that brings ACID
+transaction properties to Apache Spark and other blob storage systems.
+
+Using the Exasol Cloud Storage Extension, it is now possible to import data from
+the Delta format.
+
+### Import Delta Formatted Data
+
+Like other cloud storage systems, you can run the Exasol IMPORT SQL statement to
+import the data from the Delta format.
+
+Here is an example of importing Delta formatted data from Amazon S3:
+
+```sql
+CREATE OR REPLACE CONNECTION S3_CONNECTION
+TO ''
+USER ''
+IDENTIFIED BY 'S3_ACCESS_KEY=;S3_SECRET_KEY=';
+
+IMPORT INTO .
+FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
+  BUCKET_PATH = 's3a:///import/delta/data/*'
+  DATA_FORMAT = 'DELTA'
+  S3_ENDPOINT = 's3..amazonaws.com'
+  CONNECTION_NAME = 'S3_CONNECTION'
+  PARALLELISM = 'nproc()*';
+```
+
+### Supported Cloud Storage Systems
+
+Currently, cloud-storage-extension supports importing Delta formatted data from
+Amazon S3, Azure Blob Storage and Azure Data Lake Storage Gen1 and Gen2 storage
+systems.
+
+You can read more about the supported storage requirements and configuration on
+the [delta.io/delta-storage.html][delta-storage] page.
+
+### Delta Snapshot
+
+When running the import SQL statement, we first query the [latest
+snapshot][delta-history] of the Delta format and only import the data from the
+latest snapshot version. Thus, each import from the Delta format will import the
+snapshot data to the Exasol table.
+
+### Schema Evolution
+
+Delta format supports schema evolution and the import statement queries the
+latest schema defined in the Delta format. Therefore, users should update the
+Exasol table schema manually before the import if the schema in the delta format
+changes.
+
+[delta-io]: https://delta.io/
+[delta-storage]: https://docs.delta.io/latest/delta-storage.html
+[delta-history]: https://docs.delta.io/latest/delta-utility.html#history
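
The manual schema update described under Schema Evolution could look roughly like the sketch below. It is only an illustration: the table `RETAIL.SALES_POSITIONS`, the new `DISCOUNT` column and its `DECIMAL(9,2)` type are hypothetical names that do not appear in this patch. The `TRUNCATE` step is likewise an assumption about the desired behaviour, not something the guide prescribes.

```sql
-- Hypothetical case: the Delta table gained a DISCOUNT column since the last
-- import, so the Exasol table needs the matching column before importing.
-- Table name, column name and type are assumptions for illustration only.
ALTER TABLE RETAIL.SALES_POSITIONS ADD COLUMN DISCOUNT DECIMAL(9,2);

-- Optional, and an assumption: each import loads the latest snapshot, so
-- emptying the table first avoids keeping rows from an earlier snapshot
-- alongside the newly imported ones.
TRUNCATE TABLE RETAIL.SALES_POSITIONS;

-- Afterwards, run the IMPORT statement with DATA_FORMAT = 'DELTA' as shown
-- in the Import Delta Formatted Data section.
```

Whether to truncate depends on whether the Exasol table should mirror the latest snapshot exactly or accumulate rows across imports.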