#110: Added overview table with supported features (#124)
morazow committed Feb 10, 2021
1 parent 4651597 commit bdc05b0
Showing 4 changed files with 145 additions and 58 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -30,7 +30,6 @@ Exasol Cloud Storage Extension provides [Exasol][exasol] user-defined functions
For more information please check out the following guides.

* [User Guide](doc/user_guide/user_guide.md)
* [Delta Format Import](doc/user_guide/delta_format.md)
* [Changelog](doc/changes/changelog.md)

## Information for Contributors
1 change: 1 addition & 0 deletions doc/changes/changes_1.0.0.md
@@ -20,6 +20,7 @@
## Documentation

* #89: Increased the default number of characters for file path (PR #105).
* #110: Added overview table with supported features (PR #124).

## Dependency Updates

55 changes: 0 additions & 55 deletions doc/user_guide/delta_format.md

This file was deleted.

146 changes: 144 additions & 2 deletions doc/user_guide/user_guide.md
@@ -19,15 +19,100 @@ Parquet, Avro or Orc.
- [Azure Blob Storage](#azure-blob-storage)
- [Azure DataLake Gen1 Storage](#azure-data-lake-gen1-storage)
- [Azure DataLake Gen2 Storage](#azure-data-lake-gen2-storage)
- [Delta Format](#delta-format)

## Overview

Here is an overview of the supported features.

<table>
<tr>
<th rowspan="2">Import</th>
<th rowspan="2"></th>
<th colspan="4"></th>
</tr>
<tr>
<th>Parquet</th>
<th>Avro</th>
<th>Orc</th>
<th>Delta</th>
</tr>
<tr>
<td>AWS</td>
<td>S3</td>
<td rowspan="1" align="center">&#10003;</td>
<td rowspan="1" align="center">&#10003;</td>
<td rowspan="1" align="center">&#10003;</td>
<td rowspan="1" align="center">&#10003;</td>
</tr>
<tr>
<td>GCP</td>
<td>Google Cloud Storage</td>
<td rowspan="1" align="center">&#10003;</td>
<td rowspan="1" align="center">&#10003;</td>
<td rowspan="1" align="center">&#10003;</td>
<td rowspan="1" align="center"></td>
</tr>
<tr>
<td rowspan="3" align="center">Azure</td>
<td>Blob Storage</td>
<td rowspan="3" align="center">&#10003;</td>
<td rowspan="3" align="center">&#10003;</td>
<td rowspan="3" align="center">&#10003;</td>
<td rowspan="3" align="center">&#10003;</td>
</tr>
<tr>
<td>Data Lake (Gen1) Storage</td>
</tr>
<tr>
<td>Data Lake (Gen2) Storage</td>
</tr>
<tr></tr>
<tr>
<th rowspan="2">Export</th>
<th rowspan="2"></th>
<th colspan="4"></th>
</tr>
<tr>
<th>Parquet</th>
<th>Avro</th>
<th>Orc</th>
<th>Delta</th>
</tr>
<tr>
<td>AWS</td>
<td>S3</td>
<td rowspan="6" align="center">&#10003;</td>
<td rowspan="6" align="center"></td>
<td rowspan="6" align="center"></td>
<td rowspan="6" align="center"></td>
</tr>
<tr>
<td>GCP</td>
<td>Google Cloud Storage</td>
</tr>
<tr>
<td rowspan="4">Azure</td>
</tr>
<tr>
<td>Blob Storage</td>
</tr>
<tr>
<td>Data Lake (Gen1) Storage</td>
</tr>
<tr>
<td>Data Lake (Gen2) Storage</td>
</tr>
</table>

## Getting Started

We assume you have an Exasol cluster running version `6.0` or above.
The `cloud-storage-extension` works for all the supported Exasol versions.

### Supported Data Formats

We support the Parquet, Avro and Orc formats when importing data from cloud
storages into an Exasol table. However, we export Exasol tables only as Parquet
data to storage systems.

### Supported Cloud Storage Systems
@@ -871,3 +956,60 @@ INTO SCRIPT CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
```

The bucket path should start with an `abfs` or `abfss` URI scheme.

## Delta Format

[Delta format][delta-io] is an open-source storage layer that brings ACID
transaction properties to Apache Spark and blob storage systems.

Using the Exasol Cloud Storage Extension, it is now possible to import data from
the Delta format.

### Import Delta Formatted Data

As with the other data formats, you can run the Exasol `IMPORT` SQL statement to
import data stored in the Delta format.

Here is an example of importing Delta formatted data from Amazon S3:

```sql
CREATE OR REPLACE CONNECTION S3_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'S3_ACCESS_KEY=<AWS_ACCESS_KEY>;S3_SECRET_KEY=<AWS_SECRET_KEY>';

IMPORT INTO <schema>.<table>
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
BUCKET_PATH = 's3a://<S3_PATH>/import/delta/data/*'
DATA_FORMAT = 'DELTA'
S3_ENDPOINT = 's3.<REGION>.amazonaws.com'
CONNECTION_NAME = 'S3_CONNECTION'
PARALLELISM = 'nproc()*<MULTIPLIER>';
```

### Supported Cloud Storage Systems

Currently, cloud-storage-extension supports importing delta formatted data from
Amazon S3, Azure Blob Storage and Azure Data Lake Storage Gen1 and Gen2 storage
systems.

You can read more about the supported storage requirements and configuration on
the [delta.io/delta-storage.html][delta-storage] page.

### Delta Snapshot

When running the import SQL statement, we first query the [latest
snapshot][delta-history] of the Delta format and import only the data from that
snapshot version. Thus, each import from the Delta format loads the data of the
latest snapshot into the Exasol table.
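
Because the Exasol `IMPORT` statement appends rows, importing repeatedly into
the same table may leave rows from earlier snapshots in place. Here is a minimal
sketch of a refresh, assuming you want the Exasol table to contain only the
latest snapshot. The `TRUNCATE` step is our assumption and not something the
extension requires; the import parameters are the same as in the Amazon S3
example above.

```sql
-- Assumption: the table should mirror only the latest Delta snapshot,
-- so previously imported rows are removed first.
TRUNCATE TABLE <schema>.<table>;

IMPORT INTO <schema>.<table>
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://<S3_PATH>/import/delta/data/*'
  DATA_FORMAT     = 'DELTA'
  S3_ENDPOINT     = 's3.<REGION>.amazonaws.com'
  CONNECTION_NAME = 'S3_CONNECTION'
  PARALLELISM     = 'nproc()*<MULTIPLIER>';
```

If you prefer to keep the history of earlier imports, skip the `TRUNCATE`
statement.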

### Schema Evolution

Delta format supports schema evolution and the import statement queries the
latest schema defined in the Delta format. Therefore, users should update the
Exasol table schema manually before the import if the schema in the delta format
changes.
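
As a minimal sketch of such a manual update, assume the Delta schema gained a
single new column. The placeholders `<new_column>` and `<data_type>` are
hypothetical; replace them with the column name and an Exasol type that matches
the new Delta column.

```sql
-- Hypothetical example: add a column to the Exasol table so that it
-- matches the latest Delta schema before running the import again.
ALTER TABLE <schema>.<table> ADD COLUMN <new_column> <data_type>;
```

After the table schema matches the latest Delta schema, run the `IMPORT`
statement as usual.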

[delta-io]: https://delta.io/
[delta-storage]: https://docs.delta.io/latest/delta-storage.html
[delta-history]: https://docs.delta.io/latest/delta-utility.html#history
