1.1.0 #38

Merged: 8 commits, Oct 4, 2018
5 changes: 4 additions & 1 deletion .gitignore
@@ -15,4 +15,7 @@ dependency-reduced-pom.xml

# Others
.DS_Store
*.swp
**/local
Scripts
.dbeaver*
2 changes: 2 additions & 0 deletions .travis.yml
@@ -11,4 +11,6 @@ matrix:
include:
- jdk: "oraclejdk8"

before_script: ./jdbc-adapter/tools/version.sh verify

script: ./jdbc-adapter/integration-test-data/run_integration_tests.sh
6 changes: 2 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

[![Build Status](https://travis-ci.org/EXASOL/virtual-schemas.svg?branch=master)](https://travis-ci.org/EXASOL/virtual-schemas)

###### Please note that this is an open source project which is officially supported by Exasol. For any question, you can contact our support team.
<p style="border: 1px solid black;padding: 10px; background-color: #FFFFCC;"><span style="font-size:200%">&#9888;</span> Please note that this is an open source project which is officially supported by Exasol. For any question, you can contact our support team.</p>

Virtual schemas provide a powerful abstraction to conveniently access arbitrary data sources. Virtual schemas are a kind of read-only link to an external source and contain virtual tables which look like regular tables except that the actual data are not stored locally.

@@ -14,16 +14,14 @@ Please note that virtual schemas are part of the Advanced Edition of Exasol.

For further details about the concept, usage and examples, please see the corresponding chapter in our Exasol User Manual.


## API Specification

The subdirectory [doc](doc) contains the API specification for virtual schema adapters.


## JDBC Adapter

The subdirectory [jdbc-adapter](jdbc-adapter) contains the JDBC adapter, which allows you to integrate any kind of JDBC data source that provides a JDBC driver.

## Python Redis Demo Adapter

The subdirectory [python-redis-demo-adapter](python-redis-demo-adapter) contains a demo adapter for Redis written in Python. This adapter was created to easily demonstrate the key concepts in a real, but very simple implementation. If you want to write your own adapter, this might be the right code to get a first impression of what you'll have to develop.
81 changes: 56 additions & 25 deletions jdbc-adapter/README.md
@@ -1,41 +1,52 @@
# JDBC Adapter for Virtual Schemas

[![Build Status](https://travis-ci.org/EXASOL/virtual-schemas.svg?branch=master)](https://travis-ci.org/EXASOL/virtual-schemas)
[![Build Status](https://travis-ci.org/EXASOL/virtual-schemas.svg)](https://travis-ci.org/EXASOL/virtual-schemas)

## Supported Dialects

1. [EXASOL](doc/sql_dialects/exasol.md)
1. [Hive](doc/sql_dialects/hive.md)
1. [Impala](doc/sql_dialects/impala.md)
1. [DB2](doc/sql_dialects/db2.md)
1. [Oracle](doc/sql_dialects/oracle.md)
1. [Teradata](doc/sql_dialects/teradata.md)
1. [Redshift](doc/sql_dialects/redshift.md)
1. [SQL Server](doc/sql_dialects/sql_server.md)
1. [Sybase ASE](doc/sql_dialects/sybase.md)
1. [PostgreSQL](doc/sql_dialects/postgresql.md)
1. Generic

## Overview

The JDBC adapter for virtual schemas allows you to connect to JDBC data sources like Hive, Oracle, Teradata, Exasol or any other data source supporting JDBC. It uses the well-proven `IMPORT FROM JDBC` Exasol statement behind the scenes to obtain the requested data when running a query on a virtual table. The JDBC adapter also serves as the reference adapter for the Exasol virtual schema framework.

The JDBC adapter currently supports the following SQL dialects and data sources. This list will be continuously extended based on the feedback from our users:
* Exasol
* Hive
* Impala
* Oracle
* Teradata
* Redshift
* DB2
* SQL Server
* PostgreSQL
Check the [SQL dialect list](doc/supported_sql_dialects.md) to learn which SQL dialects the JDBC adapter currently supports.

This list will be continuously extended based on the feedback from our users.

Each such implementation of a dialect handles three major aspects:

* How to **map the tables** in the source systems to virtual tables in Exasol, including how to **map the data types** to Exasol data types.
* How is the **SQL syntax** of the data source, including identifier quoting, case-sensitivity, function names, or special syntax like `LIMIT`/`TOP`.
* What the **SQL syntax** of the data source looks like, including identifier quoting, case-sensitivity, function names, or special syntax like `LIMIT` / `TOP`.
* Which **capabilities** the data source supports, e.g. whether it can apply filters, evaluate select list expressions, run aggregations or scalar functions, or order and limit the result.

In addition to the aforementioned dialects there is the so-called `GENERIC` dialect, which is designed to work with any JDBC driver. It derives the SQL dialect from the JDBC driver metadata. However, it does not support any capabilities and might fail if the data source has special syntax or data types, so it should only be used for evaluation purposes.

If you are interested in an introduction to virtual schemas please refer to the Exasol user manual. You can find it in the [download area of the Exasol user portal](https://www.exasol.com/portal/display/DOWNLOAD/6.0).
If you are interested in an introduction to virtual schemas please refer to the Exasol user manual. You can find it in the [download area of the Exasol user portal](https://www.exasol.com/portal/display/DOC/Database+User+Manual).

## Before you Start

Please note that the syntax for creating adapter scripts is not recognized by all SQL clients, [DBeaver](https://dbeaver.io/) for example. If you encounter this problem, try a different client.

## Getting Started

Before you can start using the JDBC adapter for virtual schemas you have to deploy the adapter and the JDBC driver of your data source in your Exasol database.
Please follow the [step-by-step deployment guide](doc/deploy-adapter.md).

Please follow the [step-by-step deployment guide](doc/deploying_the_virtual_schema_adapter.md).

## Using the Adapter

The following statements demonstrate how you can use virtual schemas with the JDBC adapter to connect to a Hive system. Please scroll down to see a list of all properties supported by the JDBC adapter.

First we create a virtual schema using the JDBC adapter. The adapter will retrieve the metadata via JDBC and map them to virtual tables. The metadata (virtual tables, columns and data types) are then cached in Exasol.

```sql
CREATE CONNECTION hive_conn TO 'jdbc:hive2://localhost:10000/default' USER 'hive-usr' IDENTIFIED BY 'hive-pwd';

CREATE VIRTUAL SCHEMA hive USING adapter.jdbc_adapter WITH
```
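
The `WITH` clause is truncated in this diff view. A minimal completion might look like the following sketch; the property values are illustrative assumptions, not part of the diff:

```sql
-- Illustrative property values; adapt dialect, connection and schema to your setup.
CREATE VIRTUAL SCHEMA hive USING adapter.jdbc_adapter WITH
  SQL_DIALECT     = 'HIVE',
  CONNECTION_NAME = 'HIVE_CONN',
  SCHEMA_NAME     = 'default';
```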

We can now explore the tables in the virtual schema, just like for a regular schema:

```sql
OPEN SCHEMA hive;
SELECT * FROM cat;
DESCRIBE clicks;
```

And we can run arbitrary queries on the virtual tables:

```sql
SELECT count(*) FROM clicks;
SELECT DISTINCT USER_ID FROM clicks;
```

Behind the scenes the Exasol command `IMPORT FROM JDBC` will be executed to obtain the data needed from the data source to fulfil the query. The Exasol database interacts with the adapter to pushdown as much as possible to the data source (e.g. filters, aggregations or `ORDER BY`/`LIMIT`), while considering the capabilities of the data source.
Behind the scenes the Exasol command `IMPORT FROM JDBC` will be executed to obtain the data needed from the data source to fulfil the query. The Exasol database interacts with the adapter to push down as much as possible to the data source (e.g. filters, aggregations or `ORDER BY` / `LIMIT`), while considering the capabilities of the data source.
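
To inspect what actually gets pushed down, you can ask Exasol to show the generated query instead of executing it (a sketch, assuming your Exasol version supports `EXPLAIN VIRTUAL`):

```sql
-- Shows the rewritten pushdown query (e.g. in a PUSHDOWN_SQL column) instead of running it.
EXPLAIN VIRTUAL SELECT count(*) FROM clicks;
```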

Let's combine a virtual and a native table in a query:

```sql
SELECT * from clicks JOIN native_schema.users on clicks.userid = users.id;
```

You can refresh the schema's metadata, e.g. if tables were added in the remote system:

```sql
ALTER VIRTUAL SCHEMA hive REFRESH;
ALTER VIRTUAL SCHEMA hive REFRESH TABLES t1 t2; -- refresh only these tables
```

Or set properties. Depending on the adapter and the property you set this might update the metadata or not. In our example the metadata are affected, because afterwards the virtual schema will only expose two virtul tables.
Or set properties. Depending on the adapter and the property you set this might update the metadata or not. In our example the metadata are affected, because afterwards the virtual schema will only expose two virtual tables.

```sql
ALTER VIRTUAL SCHEMA hive SET TABLE_FILTER='CUSTOMERS, CLICKS';
```
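
To check which virtual tables remain exposed after changing the filter, you can query Exasol's system catalog (a sketch using the standard `EXA_ALL_TABLES` view):

```sql
-- Lists the virtual tables left visible after the TABLE_FILTER took effect.
SELECT table_name FROM exa_all_tables WHERE table_schema = 'HIVE';
```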

Finally you can unset properties:

```sql
ALTER VIRTUAL SCHEMA hive SET TABLE_FILTER=null;
```

Or drop the virtual schema:

```sql
DROP VIRTUAL SCHEMA hive CASCADE;
```


### Adapter Properties

The following properties can be used to control the behavior of the JDBC adapter. As shown above, these properties can be defined in `CREATE VIRTUAL SCHEMA` or changed afterwards via `ALTER VIRTUAL SCHEMA SET`. Note that properties are always strings, like `TABLE_FILTER='T1,T2'`.
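
Because properties are plain strings, even list-valued settings are passed as a single quoted value, as in this sketch (the `EXCLUDED_CAPABILITIES` property and the capability names are assumptions based on the property table below and may differ in your version):

```sql
-- Disables pushdown of ORDER BY and LIMIT by excluding the corresponding capabilities.
ALTER VIRTUAL SCHEMA hive SET EXCLUDED_CAPABILITIES = 'ORDER_BY_COLUMN, LIMIT';
```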

@@ -129,14 +146,17 @@ Property | Value


## Debugging

To see all communication between the database and the adapter you can use the Python script `udf_debug.py` located in the [tools](tools) directory.

First, start the `udf_debug.py` script, which will listen on the specified address and print all incoming text.

```sh
python tools/udf_debug.py -s myhost -p 3000
```

Then run the following SQL statement in your session to redirect all stdout and stderr from the adapter script to the `udf_debug.py` script we started before.

```sql
ALTER SESSION SET SCRIPT_OUTPUT_ADDRESS='host-where-udf-debug-script-runs:3000'
```
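
Alternatively, the adapter itself can be pointed at the same listener through a schema property (a sketch; `DEBUG_ADDRESS` is assumed to be among the adapter properties supported by this version):

```sql
-- Hypothetical host and port; must match where udf_debug.py is listening.
ALTER VIRTUAL SCHEMA hive SET DEBUG_ADDRESS = 'myhost:3000';
```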
@@ -145,12 +165,23 @@ You have to make sure that Exasol can connect to the host running the `udf_debug.py` script.


## Frequent Issues
* **Error: No suitable driver found for JDBC...**: The JDBC driver class was not discovered automatically. Either you have to add a `META-INF/services/java.sql.Driver` file with the classname to your jar, or you have to load the driver manually (see `JdbcMetadataReader.readRemoteMetadata()`).

### Error: No suitable driver found for JDBC...

The JDBC driver class was not discovered automatically. Either you have to add a `META-INF/services/java.sql.Driver` file with the class name to your JAR, or you have to load the driver manually (see `JdbcMetadataReader.readRemoteMetadata()`).

See https://docs.oracle.com/javase/7/docs/api/java/sql/DriverManager.html
* **Very slow execution of queries with SCRIPT_OUTPUT_ADDRESS**: If `SCRIPT_OUTPUT_ADDRESS` is set as explained in the [debugging section](#debugging), verify that a service is actually listening at that address. Otherwise, if Exasol can not establish a connection, repeated connection attempts can be the cause for slowdowns.
* **Very slow execution of queries**: Depending on which JDK version Exasol uses to execute Java user-defined functions, a blocking randomness source may be used by default. Especially cryptographic operations do not complete until the operating system has collected a sufficient amount of entropy. This problem seems to occur most often when Exasol is run in an isolated environment, e.g., a virtual machine or a container. A solution is to use a non-blocking randomness source.
To do so, log in to EXAOperation and shutdown the database. Append `-etlJdbcJavaEnv -Djava.security.egd=/dev/./urandom` to the "Extra Database Parameters" input field and power the database on again.

### Very Slow Execution of Queries With SCRIPT_OUTPUT_ADDRESS

If `SCRIPT_OUTPUT_ADDRESS` is set as explained in the [debugging section](#debugging), verify that a service is actually listening at that address. Otherwise, if Exasol cannot establish a connection, repeated connection attempts can cause slowdowns.

### Very Slow Execution of Queries

Depending on which JDK version Exasol uses to execute Java user-defined functions, a blocking random-number source may be used by default. Especially cryptographic operations do not complete until the operating system has collected a sufficient amount of entropy. This problem seems to occur most often when Exasol is run in an isolated environment, e.g., a virtual machine or a container. A solution is to use a non-blocking random-number source.

To do so, log in to EXAOperation and shut down the database. Append `-etlJdbcJavaEnv -Djava.security.egd=/dev/./urandom` to the "Extra Database Parameters" input field and power the database on again.

## Developing New Dialects

If you want to contribute a new dialect please visit the guide [how to develop and test a dialect](doc/develop-dialect.md).
If you want to contribute a new dialect please visit the guide [how to develop and test a dialect](doc/developing_an_sql_dialect.md).
@@ -1,56 +1,70 @@
## Deploying the Adapter Step By Step
# Deploying the Adapter Step By Step

Run the following steps to deploy your adapter:

### 1. Prerequisites:
* EXASOL >= 6.0
## Prerequisites

* Exasol Version 6.0 or later
* Advanced edition (which includes the ability to execute adapter scripts), or Free Small Business Edition
* EXASOL must be able to connect to the host and port specified in the JDBC connection string. In case of problems you can use a [UDF to test the connectivity](https://www.exasol.com/support/browse/SOL-307).
* If the JDBC driver requires Kerberos authentication (e.g. for Hive or Impala), the EXASOL database will authenticate using a keytab file. Each EXASOL node needs access to port 88 of the the Kerberos KDC (key distribution center).
* Exasol must be able to connect to the host and port specified in the JDBC connection string. In case of problems you can use a [UDF to test the connectivity](https://www.exasol.com/support/browse/SOL-307) (see the sketch after this list).
* If the JDBC driver requires Kerberos authentication (e.g. for Hive or Impala), the Exasol database will authenticate using a keytab file. Each Exasol node needs access to port 88 of the Kerberos KDC (key distribution center).
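
A connectivity test can look like the following minimal sketch (hypothetical script and host names; the article linked above describes the official approach):

```sql
-- Minimal TCP connectivity check executed inside the Exasol cluster.
CREATE OR REPLACE PYTHON SCALAR SCRIPT test_connection(host VARCHAR(200), port DECIMAL(5,0))
RETURNS VARCHAR(2000) AS
import socket

def run(ctx):
    try:
        # Try to open a TCP connection with a 5 second timeout.
        socket.create_connection((ctx.host, int(ctx.port)), 5).close()
        return 'OK'
    except Exception as e:
        return 'FAILED: ' + str(e)
/

SELECT test_connection('your.data.source.host', 10000);
```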

### 2. Obtain Jar:
## Obtaining JAR Archives

First you have to obtain the so called fat jar (including all dependencies).
First you have to obtain the so called fat JAR (including all dependencies).

The easiest way is to download the jar from the last [Release](https://github.com/EXASOL/virtual-schemas/releases).
The easiest way is to download the JAR from the last [Release](https://github.com/Exasol/virtual-schemas/releases).

Alternatively you can clone the repository and build the jar as follows:
```
git clone https://github.com/EXASOL/virtual-schemas.git
Alternatively you can clone the repository and build the JAR as follows:

```bash
git clone https://github.com/Exasol/virtual-schemas.git
cd virtual-schemas/jdbc-adapter/
mvn clean -DskipTests package
```

The resulting fat jar is stored in `virtualschema-jdbc-adapter-dist/target/virtualschema-jdbc-adapter-dist-1.0.2-SNAPSHOT.jar`.
The resulting fat JAR is stored in `virtualschema-jdbc-adapter-dist/target/virtualschema-jdbc-adapter-dist-1.1.0.jar`.

### 3. Upload Adapter Jar
## Uploading the Adapter JAR Archive

You have to upload the jar of the adapter to a bucket of your choice in the EXASOL bucket file system (BucketFS). This will allow using the jar in the adapter script.
You have to upload the JAR of the adapter to a bucket of your choice in the Exasol bucket file system (BucketFS). This will allow using the JAR in the adapter script.

Following steps are required to upload a file to a bucket:
* Make sure you have a bucket file system (BucketFS) and you know the port for either http or https. This can be done in EXAOperation under "EXABuckets". E.g. the id could be `bucketfs1` and the http port 2580.
* Check if you have a bucket in the BucketFS. Simply click on the name of the BucketFS in EXAOperation and add a bucket there, e.g. `bucket1`. Also make sure you know the write password. For simplicity we assume that the bucket is defined as a public bucket, i.e. it can be read by any script.
* Now upload the file into this bucket, e.g. using curl (adapt the hostname, BucketFS port, bucket name and bucket write password).
```
curl -X PUT -T virtualschema-jdbc-adapter-dist/target/virtualschema-jdbc-adapter-dist-1.0.2-SNAPSHOT.jar \
http://w:write-password@your.exasol.host.com:2580/bucket1/virtualschema-jdbc-adapter-dist-1.0.2-SNAPSHOT.jar

1. Make sure you have a bucket file system (BucketFS) and you know the port for either HTTP or HTTPS.

This can be done in EXAOperation under "EXABuckets". E.g. the id could be `bucketfs1` and the HTTP port 2580.

1. Check if you have a bucket in the BucketFS. Simply click on the name of the BucketFS in EXAOperation and add a bucket there, e.g. `bucket1`.

Also make sure you know the write password. For simplicity we assume that the bucket is defined as a public bucket, i.e. it can be read by any script.

1. Now upload the file into this bucket, e.g. using curl (adapt the hostname, BucketFS port, bucket name and bucket write password).

```bash
curl -X PUT -T virtualschema-jdbc-adapter-dist/target/virtualschema-jdbc-adapter-dist-1.1.0.jar \
http://w:write-password@your.exasol.host.com:2580/bucket1/virtualschema-jdbc-adapter-dist-1.1.0.jar
```

See chapter 3.6.4. "The synchronous cluster file system BucketFS" in the EXASolution User Manual for more details about BucketFS.
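
To verify that scripts can actually read the uploaded file, you can run a quick check from SQL (a sketch; the bucket path is a placeholder and assumes a public bucket):

```sql
-- Checks whether a file is visible under the /buckets mount inside UDFs.
CREATE OR REPLACE PYTHON SCALAR SCRIPT check_bucket_file(path VARCHAR(500))
RETURNS VARCHAR(100) AS
import os

def run(ctx):
    return 'found' if os.path.isfile(ctx.path) else 'missing'
/

SELECT check_bucket_file('/buckets/bucketfs1/bucket1/virtualschema-jdbc-adapter-dist-1.1.0.jar');
```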

### 4. Upload JDBC Driver Files
## Deploying JDBC Driver Files

You have to upload the JDBC driver files of your remote database **twice**:

You have to upload the JDBC driver files of your remote database **two times**:
* Upload all files of the JDBC driver into a bucket of your choice, so that they can be accessed from the adapter script. This happens the same way as described above for the adapter jar. You can use the same bucket.
* Upload all files of the JDBC driver into a bucket of your choice, so that they can be accessed from the adapter script.
This happens the same way as described above for the adapter JAR. You can use the same bucket.
* Upload all files of the JDBC driver as a JDBC driver in EXAOperation
- In EXAOperation go to Software -> JDBC Drivers
- Add the JDBC driver by specifying the jdbc main class and the prefix of the JDBC connection string
- Add the JDBC driver by specifying the JDBC main class and the prefix of the JDBC connection string
- Upload all files (one by one) to the newly added JDBC driver.

Note that some JDBC drivers consist of several files and that you have to upload all of them. To find out which jar you need, consult the [supported dialects page](supported-dialects.md).
Note that some JDBC drivers consist of several files and that you have to upload all of them. To find out which JAR you need, consult the [supported dialects page](supported_sql_dialects.md).

## Deploying the Adapter Script

### 5. Deploy Adapter Script
Then run the following SQL commands to deploy the adapter in the database:

```sql
-- The adapter is simply a script. It has to be stored in any regular schema.
CREATE SCHEMA adapter;
CREATE JAVA ADAPTER SCRIPT adapter.jdbc_adapter AS

// This will add the adapter jar to the classpath so that it can be used inside the adapter script
// Replace the names of the bucketfs and the bucket with the ones you used.
%jar /buckets/your-bucket-fs/your-bucket/virtualschema-jdbc-adapter-dist-1.0.2-SNAPSHOT.jar;
%jar /buckets/your-bucket-fs/your-bucket/virtualschema-jdbc-adapter-dist-1.1.0.jar;

// You have to add all files of the data source jdbc driver here (e.g. Hive JDBC driver files)
%jar /buckets/your-bucket-fs/your-bucket/name-of-data-source-jdbc-driver.jar;
```
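
Because the diff view truncates the script body, here is a minimal, self-contained sketch of the full statement (the `%scriptclass` value and bucket paths are assumptions based on the repository layout, not part of the diff):

```sql
CREATE JAVA ADAPTER SCRIPT adapter.jdbc_adapter AS
  -- Entry point of the JDBC adapter (assumed class name).
  %scriptclass com.exasol.adapter.jdbc.JdbcAdapter;
  -- The adapter fat JAR uploaded to BucketFS earlier; adapt the bucket path.
  %jar /buckets/your-bucket-fs/your-bucket/virtualschema-jdbc-adapter-dist-1.1.0.jar;
  -- All JAR files of your data source's JDBC driver.
  %jar /buckets/your-bucket-fs/your-bucket/name-of-data-source-jdbc-driver.jar;
/
```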