
# 2. Source Specification


A source in Tightlock represents the location of the data that will be processed by a destination. The accompanying destination is responsible for defining the data schema that must be present in the corresponding source. For more information about the schemas required by each connection available in Tightlock, see the destination specification article.

Note that all sources require a unique_id field, which Tightlock uses to guarantee batch consistency. When creating a source, you can set unique_id to the column name of any field that is guaranteed to be distinct in your data.

unique_id defaults to the column name "id" for all available data sources, so if you don't use a custom field, make sure to add an "id" column and populate it with unique identifiers for your data.
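
For example, a minimal CSV source file that relies on that default could look like this (illustrative columns and values):

```
id,email,purchase_value
1,user1@example.com,10.50
2,user2@example.com,7.25
```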

The required fields for each available source type are listed below.

## BigQuery

### Config

| Name | Type | Description |
|------|------|-------------|
| dataset | str | The name of your BigQuery dataset. |
| table | str | The name of your BigQuery table. |
| credentials | str | The full service-account credentials JSON string. Not needed if your backend runs in the same GCP project as the BigQuery table. |
| unique_id | str | Unique id column name to be used by BigQuery. Defaults to 'id' when nothing is provided. |
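
For example, the source portion of a config pointing at a BigQuery table could look like the sketch below (placeholder values; any wrapper keys Tightlock expects around the source entry are omitted here):

```json
{
  "dataset": "my_dataset",
  "table": "customer_events",
  "unique_id": "user_id"
}
```

Here unique_id overrides the default "id" column, and credentials is omitted on the assumption that the backend runs in the same GCP project as the table.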

## Google Cloud Storage

NOTE: The GCS source supports typical Hadoop-supported file types (CSV, JSON, Avro, Parquet, etc.). If using the CSV format, use the .csvh extension so that headers are taken into account (see the sample_data directory for reference).

### Config

| Name | Type | Description |
|------|------|-------------|
| bucket_name | str | The name of the GCS bucket that contains your data. |
| location* | str | The name of the GCS bucket folder that contains your data. |
| unique_id | str | Unique id column name to be used by the GCS source engine. Defaults to 'id' when nothing is provided. |

*: All files in the location folder compose the "table". This means that you can partition a table across multiple files inside a folder, as long as they all have the same structure.
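
As a sketch, a GCS source entry built from the fields above might look like this (placeholder values only, wrapper keys omitted):

```json
{
  "bucket_name": "my-gcs-bucket",
  "location": "exports/customers",
  "unique_id": "user_id"
}
```

Every file under exports/customers (for CSV, files with the .csvh extension) would then be read as a single table.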

## AWS S3

NOTE: The S3 source supports typical Hadoop-supported file types (CSV, JSON, Avro, Parquet, etc.). If using the CSV format, use the .csvh extension so that headers are taken into account (see the sample_data directory for reference).

### Config

| Name | Type | Description |
|------|------|-------------|
| bucket_name | str | The name of the S3 bucket that contains your data. |
| location* | str | The name of the S3 bucket folder that contains your data. |
| secret_key | str | Optional AWS secret key (only needed when running Tightlock on a separate cloud environment). |
| access_key | str | Optional AWS access key (only needed when running Tightlock on a separate cloud environment). |
| unique_id | str | Unique id column name to be used by the S3 source engine. Defaults to 'id' when nothing is provided. |

*: All files in the location folder compose the "table". This means that you can partition a table across multiple files inside a folder, as long as they all have the same structure.
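
A corresponding S3 sketch, with the optional keys included for a Tightlock instance running outside AWS (placeholder values only, wrapper keys omitted):

```json
{
  "bucket_name": "my-s3-bucket",
  "location": "exports/customers",
  "access_key": "YOUR_ACCESS_KEY",
  "secret_key": "YOUR_SECRET_KEY",
  "unique_id": "user_id"
}
```

When Tightlock runs inside the same AWS environment as the bucket, access_key and secret_key can simply be dropped.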

## Local File

NOTE: The Local File source type is typically used for development or testing purposes. This source type only has access to files located in the sample_data directory, so only files deployed alongside the code can be referenced at configuration creation time.

### Config

| Name | Type | Description |
|------|------|-------------|
| location | str | The path to your local file, relative to the container's 'data' folder (which is mapped to the host's 'sample_data' folder). |
| unique_id | str | Unique id column name to be used by the local file engine. Defaults to 'id' when nothing is provided. |
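
For instance, to read a headered CSV shipped in sample_data, a Local File source sketch could be (hypothetical file name, wrapper keys omitted):

```json
{
  "location": "my_sample.csvh",
  "unique_id": "id"
}
```

Setting unique_id to "id" here is redundant, since that is already the default; it is shown only for completeness.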