This utility wraps the Tableau SDK to parse CSV files and convert them into Tableau Data Extracts.
The Tableau Extract API is not thread-safe when inserting a row to the extract, but the work of parsing the CSV and generating a Tableau Row can be multi-threaded. This utility allows you to specify the number of threads to use when generating an extract. Inserting rows to the extract is synchronized, so there are diminishing returns to higher thread counts.
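The split between parallel parsing and serialized insertion can be illustrated with a minimal sketch. It does not use the Tableau SDK: `RowSink`, `load`, and the naive comma split are hypothetical stand-ins, chosen only to show why extra threads help the parsing step but not the insert step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelExtractSketch {

    /** Hypothetical stand-in for the extract table; NOT a Tableau SDK class. */
    public static class RowSink {
        private final List<String[]> rows = new ArrayList<>();

        // Only one thread at a time may insert, mirroring the non-thread-safe insert step.
        public synchronized void insert(String[] row) {
            rows.add(row);
        }

        public synchronized int size() {
            return rows.size();
        }
    }

    /** Parses each line on a worker thread, then funnels the row through the synchronized sink. */
    public static int load(List<String> csvLines, int threads) {
        RowSink sink = new RowSink();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : csvLines) {
            pool.execute(() -> {
                String[] row = line.split(",", -1); // parallel work: naive split, no quoting support
                sink.insert(row);                    // serialized work: one thread at a time
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Timed out waiting for workers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,1,true", "b,2,false", "c,3,true");
        System.out.println(load(lines, 2) + " rows inserted");
    }
}
```

In this sketch rows reach the sink in whatever order the workers finish, so insertion order is not guaranteed to match the order of the input.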
Included in the utility is a thin wrapper to publish an extract to Tableau.
The latest version is 1.2.1, which is available here.
On my small dual-core MacBook Pro, I see the following performance:
1 Thread: 28,286 rows/second
2 Threads: 40,072 rows/second
3 Threads: 44,624 rows/second
On my 4-core 2015 iMac, I see the following performance:
1 Thread: 37,034 rows/second
2 Threads: 53,494 rows/second
3 Threads: 60,181 rows/second
4 Threads: 66,868 rows/second
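To put the diminishing returns in numbers, the sketch below computes speedup and per-thread efficiency from the iMac figures above; four threads deliver roughly 1.8 times the single-thread throughput rather than 4 times. The class and method names are illustrative only and are not part of this utility.

```java
public class SpeedupSketch {

    /** Throughput with N threads divided by single-thread throughput. */
    public static double speedup(double singleThreadRate, double multiThreadRate) {
        return multiThreadRate / singleThreadRate;
    }

    /** Speedup divided by thread count; 1.0 would be perfect linear scaling. */
    public static double efficiency(double singleThreadRate, double multiThreadRate, int threads) {
        return speedup(singleThreadRate, multiThreadRate) / threads;
    }

    public static void main(String[] args) {
        // Rows per second on the 4-core iMac; index i holds the rate for i + 1 threads.
        double[] rates = {37034, 53494, 60181, 66868};
        for (int i = 0; i < rates.length; i++) {
            System.out.printf("%d thread(s): speedup %.2fx, efficiency %.0f%%%n",
                    i + 1,
                    speedup(rates[0], rates[i]),
                    100 * efficiency(rates[0], rates[i], i + 1));
        }
    }
}
```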
I have tested this on CentOS 7 and macOS High Sierra. The Tableau SDK supports Fedora 18 and later, CentOS 7 and later, and Ubuntu 12.04 and later. Support for CentOS 6 was removed from the Tableau SDK in version 10.2 of the SDK.
If you encounter an error such as java.lang.UnsatisfiedLinkError: Unable to load library TableauCommon: /lib64/libc.so.6: version 'GLIBC_2.14' not found, your OS is unfortunately not supported. Consider running in a Docker container (see below).
I have not tested this on Windows. Certainly the various shell scripts will not work, but you should be able to install the SDK for Windows and invoke Java directly. Any pull requests to add better support for Windows would be appreciated.
There are public images for this project on Docker Hub. You can simply mount a Docker volume and invoke this utility within the container. Note that the host path of a Docker volume must be absolute. Assuming you've cloned this project to a folder called "tableau" in your home directory, the command below will create a Tableau Extract using a sample CSV and schema:
# Use $HOME rather than ~ here: the tilde is not expanded inside double quotes.
PATH_TO_CLONED_PROJECT="$HOME/tableau"
docker run --rm -it -v "$PATH_TO_CLONED_PROJECT":/build jlmorton/tableau-sdk-wrapper:latest /opt/tableau-sdk-wrapper/bin/extract.sh \
-o /build/sample.tde \
-s /build/samples/sample-schema.json \
-f /build/samples/sample-extract.csv -t 4
This will download the latest image from Docker Hub, run a container, and attempt to build a TDE from samples/sample-extract.csv and samples/sample-schema.json within your ~/tableau folder.
This library uses the Tableau SDK to create and publish Tableau extracts. The SDK is not available in Maven Central. The SDK license allows distribution, but I've chosen to exclude it from this repository.
Instead, there is a small shell script included in this repository, bin/install_tableau_sdk.sh. This script downloads the SDK, extracts it to the lib folder within the top-level repository directory, and then uses mvn install to add the Java dependencies to your local Maven repository.
The utility requires Java 8.
Download the current release distribution and unzip it. On Linux, run install_tableau_sdk.sh to install the Tableau SDK. Alternatively, install the Tableau SDK for your platform manually.
After installing the Tableau SDK, simply run mvn install.
This utility expects a schema file which describes the data types in the CSV file. The schema is in JSON format. The utility handles strings (CHAR_STRING), booleans (BOOLEAN), dates (DATE), dates and times (DATETIME), integers (INTEGER) and doubles (DOUBLE).
Here is a sample schema file:
{
  "schemaName": "Sample",
  "schema": {
    "foo": "CHAR_STRING",
    "bar": "INTEGER",
    "baz": "BOOLEAN",
    "bax": "DOUBLE",
    "test": "DATE",
    "test_time": "DATETIME"
  }
}
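To show what the schema types imply for parsing, here is a minimal sketch that converts a raw CSV field into a Java value according to its declared type. The `ColumnType` enum and `convert` method are hypothetical and are not this utility's actual classes. The sketch also assumes ISO-8601 text for DATE and DATETIME (for example 2018-01-31 and 2018-01-31T12:30:00); the formats the utility itself accepts are not stated here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class SchemaTypeSketch {

    /** The six type names that may appear in a schema file. */
    public enum ColumnType { CHAR_STRING, BOOLEAN, DATE, DATETIME, INTEGER, DOUBLE }

    /** Converts one raw CSV field according to its declared column type. */
    public static Object convert(String raw, ColumnType type) {
        switch (type) {
            case BOOLEAN:
                // "true" in any letter case yields true; every other value yields false.
                return Boolean.parseBoolean(raw);
            case DATE:
                return LocalDate.parse(raw);      // assumes ISO-8601, e.g. 2018-01-31
            case DATETIME:
                return LocalDateTime.parse(raw);  // assumes ISO-8601, e.g. 2018-01-31T12:30:00
            case INTEGER:
                return Integer.parseInt(raw);
            case DOUBLE:
                return Double.parseDouble(raw);
            case CHAR_STRING:
            default:
                return raw;
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("42", ColumnType.INTEGER));
        System.out.println(convert("2018-01-31", ColumnType.DATE));
    }
}
```

Malformed input makes the parse methods throw (NumberFormatException or DateTimeParseException), so a real loader would need to decide whether to skip, default, or abort on bad rows.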
usage: java -jar tableau.jar
 -a,--append              Append to existing extract
 -c,--project <arg>       Project name to publish to
 -d,--datasource <arg>    Name of datasource to publish
 -e,--extract <arg>       Filename of extract to publish
 -f,--file <arg>          CSV file to import
 -h,--help
 -n,--username <arg>      Tableau Server username for publishing
 -o,--output <arg>        Output file name, or name of existing extract in
                          append mode
 -p,--publish             Publish an extract to Tableau (requires
                          --extract, --site, --project, --datasource,
                          --username, --password, and --url)
 -s,--site <arg>          Tableau site name to publish
 -t,--threads <arg>       Number of threads (default: 1)
 -u,--url <arg>           Tableau Server URL for publishing
 -x,--password <arg>      Tableau Server password for publishing
./bin/extract.sh -o MyExtract.tde -s samples/test.schema -f samples/test.csv -t 2
./bin/publish.sh -e MyExtract.tde -u https://my-tableau-server -n username -x password -s tableau-site-name -p project-name -d datasource-name
Note: If you need a proxy server to publish extracts, the Tableau SDK respects the standard http_proxy and https_proxy environment variables. The SDK also exposes hooks to set the proxy username and password, but this wrapper does not currently use them.
The Tableau Server user that publishes the extract must have permission to publish a datasource.
This software is licensed under the Apache 2.0 license.