diff --git a/docs/README.md b/docs/README.md
index 31a55f8d..f3285d8c 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -11,6 +11,7 @@ GRPC)
 * Log Sink
 * Bigquery Sink
 * Redis Sink
+* Bigtable Sink
 
 Depot is a sink connector, which acts as a bridge between data processing systems and real sink. The
 APIs in this library can be used to push data to various sinks. Common sinks implementations will be added in this repo.
diff --git a/docs/reference/configuration/README.md b/docs/reference/configuration/README.md
index 642d1f34..859a854b 100644
--- a/docs/reference/configuration/README.md
+++ b/docs/reference/configuration/README.md
@@ -7,4 +7,6 @@ This page contains reference for all the configurations for sink connectors.
 * [Generic](generic.md)
 * [Stencil Client](stencil-client.md)
 * [Bigquery Sink](bigquery-sink.md)
+* [Redis Sink](redis.md)
+* [Bigtable Sink](bigtable.md)
 
diff --git a/docs/reference/configuration/bigtable.md b/docs/reference/configuration/bigtable.md
new file mode 100644
index 00000000..d99066a4
--- /dev/null
+++ b/docs/reference/configuration/bigtable.md
@@ -0,0 +1,79 @@
+# Bigtable Sink
+
+A Bigtable sink requires the following variables to be set along with the generic ones.
+
+## `SINK_BIGTABLE_GOOGLE_CLOUD_PROJECT_ID`
+
+The Google Cloud project ID that owns the Bigtable table where the records need to be inserted/updated. Further
+documentation on Google Cloud [project IDs](https://cloud.google.com/resource-manager/docs/creating-managing-projects).
+
+* Example value: `gcp-project-id`
+* Type: `required`
+
+## `SINK_BIGTABLE_INSTANCE_ID`
+
+A Bigtable instance is a container for your data, which contains clusters that your applications can connect to. Each cluster contains nodes, the compute units that manage your data and perform maintenance tasks.
+
+A table belongs to an instance, not to a cluster or node. Here you provide the ID of the Bigtable instance your table belongs to. Further
+documentation on Bigtable [instances, clusters, and nodes](https://cloud.google.com/bigtable/docs/instances-clusters-nodes).
+
+* Example value: `cloud-bigtable-instance-id`
+* Type: `required`
+
+## `SINK_BIGTABLE_CREDENTIAL_PATH`
+
+Full path of the Google Cloud credentials file. Further documentation on Google Cloud authentication
+and [credentials](https://cloud.google.com/docs/authentication/getting-started).
+
+* Example value: `/.secret/google-cloud-credentials.json`
+* Type: `required`
+
+## `SINK_BIGTABLE_TABLE_ID`
+
+Bigtable stores data in massively scalable tables, each of which is a sorted key/value map.
+
+Here you provide the ID of the table where the records need to be inserted/updated. Further documentation on
+[Bigtable tables](https://cloud.google.com/bigtable/docs/managing-tables).
+
+* Example value: `depot-sample-table`
+* Type: `required`
+
+## `SINK_BIGTABLE_ROW_KEY_TEMPLATE`
+
+Bigtable tables are composed of rows, each of which typically describes a single entity. Each row is indexed by a single row key.
+
+Here you provide a string template that will be used to create row keys from one or more fields of your input data. Further documentation on the [Bigtable storage model](https://cloud.google.com/bigtable/docs/overview#storage-model).
+
+In the example below, if field_1 and field_2 are of `String` and `Integer` types with values `alpha` and `10` for a given record, the row key generated for that record will be `key-alpha-10`.
+
+* Example value: `key-%s-%d, field_1, field_2`
+* Type: `required`
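+
+The sketch below is purely illustrative of how such a template could be expanded. It assumes a printf-style
+substitution in which the listed fields are applied in order; the class and helper method here are hypothetical and not part of the Depot API.
+
+```java
+// Hypothetical sketch, not part of the Depot API: expands a printf-style
+// row key template with the given field values.
+class RowKeyTemplateSketch {
+    static String buildRowKey(String template, Object... fieldValues) {
+        return String.format(template, fieldValues);
+    }
+
+    public static void main(String[] args) {
+        // Prints "key-alpha-10", matching the example above.
+        System.out.println(buildRowKey("key-%s-%d", "alpha", 10));
+    }
+}
+```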
+
+## `SINK_BIGTABLE_COLUMN_FAMILY_MAPPING`
+
+Bigtable columns that are related to one another are typically grouped into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.
+
+Here you provide the mapping of the table's `column families` and `qualifiers` to the field names from the input data that you intend to insert into the table. Further documentation on the [Bigtable storage model](https://cloud.google.com/bigtable/docs/overview#storage-model).
+
+Please note that the column families provided in this configuration need to exist in the table beforehand, while column qualifiers will be created if they don't exist.
+
+* Example value: `{ "depot-sample-family" : { "depot-sample-qualifier-1" : "field_1", "depot-sample-qualifier-2" : "field_7", "depot-sample-qualifier-3" : "field_5"} }`
+* Type: `required`
diff --git a/docs/reference/metrics.md b/docs/reference/metrics.md
index 8b926388..2ee2bdb6 100644
--- a/docs/reference/metrics.md
+++ b/docs/reference/metrics.md
@@ -6,10 +6,11 @@ Sinks can have their own metrics, and they will be emmited while using sink conn
 
 ## Table of Contents
 
 * [Bigquery Sink](metrics.md#bigquery-sink)
+* [Bigtable Sink](metrics.md#bigtable-sink)
 
 ## Bigquery Sink
 
-### `Biquery Operation Total`
+### `Bigquery Operation Total`
 
 Total number of bigquery API operation performed
@@ -19,7 +20,21 @@ Time taken for bigquery API operation performed
 
 ### `Bigquery Errors Total`
 
-Total numbers of error occurred on bigquery insert operation.
+Total number of errors that occurred during BigQuery insert operations
+
+## Bigtable Sink
+
+### `Bigtable Operation Total`
+
+Total number of Bigtable insert/update operations performed
+
+### `Bigtable Operation Latency`
+
+Time taken for a Bigtable insert/update operation
+
+### `Bigtable Errors Total`
+
+Total number of errors that occurred during Bigtable insert/update operations
diff --git a/docs/reference/odpf_sink_response.md b/docs/reference/odpf_sink_response.md
index d2cf9880..1fe749f3 100644
--- a/docs/reference/odpf_sink_response.md
+++ b/docs/reference/odpf_sink_response.md
@@ -17,7 +17,8 @@ These errors are returned by sinks in the OdpfSinkResponse object. The error typ
 * UNKNOWN_FIELDS_ERROR
 * SINK_4XX_ERROR
 * SINK_5XX_ERROR
+* SINK_RETRYABLE_ERROR
 * SINK_UNKNOWN_ERROR
 * DEFAULT_ERROR
-  * If no error is specified
+  * If no error is specified (to be deprecated soon)
 
diff --git a/docs/sinks/bigtable.md b/docs/sinks/bigtable.md
new file mode 100644
index 00000000..2bcb7f52
--- /dev/null
+++ b/docs/sinks/bigtable.md
@@ -0,0 +1,78 @@
+# Bigtable Sink
+
+## Overview
+Depot Bigtable sink translates protobuf messages into Bigtable records and inserts them into a Bigtable table. Its other responsibilities include validating the provided [column-family-mapping](../reference/configuration/bigtable.md#sink_bigtable_column_family_mapping) and checking whether the configured table exists in the [Bigtable instance](../reference/configuration/bigtable.md#sink_bigtable_instance_id).
+
+Depot uses the [Java client library for the Cloud Bigtable API](https://cloud.google.com/bigtable/docs/reference/libraries) to perform all operations on Bigtable.
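+
+A minimal, illustrative set of the required settings, shown as environment-style key/value pairs and reusing
+the example values from the [configuration reference](../reference/configuration/bigtable.md):
+
+```properties
+SINK_BIGTABLE_GOOGLE_CLOUD_PROJECT_ID=gcp-project-id
+SINK_BIGTABLE_INSTANCE_ID=cloud-bigtable-instance-id
+SINK_BIGTABLE_CREDENTIAL_PATH=/.secret/google-cloud-credentials.json
+SINK_BIGTABLE_TABLE_ID=depot-sample-table
+SINK_BIGTABLE_ROW_KEY_TEMPLATE=key-%s-%d, field_1, field_2
+SINK_BIGTABLE_COLUMN_FAMILY_MAPPING={ "depot-sample-family" : { "depot-sample-qualifier-1" : "field_1", "depot-sample-qualifier-2" : "field_7", "depot-sample-qualifier-3" : "field_5"} }
+```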
+
+## Setup Required
+To be able to insert/update records in Bigtable, one must have the following setup in place:
+
+* A [Bigtable instance](../reference/configuration/bigtable.md#sink_bigtable_instance_id) belonging to the [GCP project](../reference/configuration/bigtable.md#sink_bigtable_google_cloud_project_id) provided in the configuration
+* A Bigtable [table](../reference/configuration/bigtable.md#sink_bigtable_table_id) where the records are supposed to be inserted/updated
+* Column families that are provided as part of the [column-family-mapping](../reference/configuration/bigtable.md#sink_bigtable_column_family_mapping)
+* Google Cloud [Bigtable IAM permissions](https://cloud.google.com/bigtable/docs/access-control) required to access and modify the configured Bigtable instance and table
+
+## Metrics
+
+Check out the list of [metrics](../reference/metrics.md#bigtable-sink) captured for the Bigtable sink.
+
+## Error Handling
+
+The [BigtableResponse](../../src/main/java/io/odpf/depot/bigtable/response/BigTableResponse.java) class holds the list of failed [mutations](https://cloud.google.com/bigtable/docs/writes#write-types). [BigtableResponseParser](../../src/main/java/io/odpf/depot/bigtable/parser/BigTableResponseParser.java) looks at the error from each failed mutation and creates [ErrorInfo](../../src/main/java/io/odpf/depot/error/ErrorInfo.java) objects based on the type/HTTP status code of the underlying error. This error info is then sent back to the application (an illustrative sketch of this classification is shown at the end of this page).
+
+| Error From Bigtable | Error Type Captured |
+| --------------- | -------------------- |
+| Retryable error | SINK_RETRYABLE_ERROR |
+| Error with status code in range 400-499 | SINK_4XX_ERROR |
+| Error with status code in range 500-599 | SINK_5XX_ERROR |
+| Any other error | SINK_UNKNOWN_ERROR |
+
+### Error Telemetry
+
+[BigtableResponseParser](../../src/main/java/io/odpf/depot/bigtable/parser/BigTableResponseParser.java) looks for specific error types sent by Bigtable and captures those under [BigtableTotalErrorMetrics](../reference/metrics.md#bigtable-sink) with suitable error tags.
+
+| Error Type | Error Tag Assigned |
+| --------------- | -------------------- |
+| Bad Request | BAD_REQUEST |
+| Quota Failure | QUOTA_FAILURE |
+| Precondition Failure | PRECONDITION_FAILURE |
+| Any other error | RPC_FAILURE |
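+
+The sketch below is only an approximation of the classification shown in the Error Handling table above; the
+`ErrorType` constants come from the [sink response reference](../reference/odpf_sink_response.md), while the class and helper method are simplified stand-ins rather than the actual Depot implementation.
+
+```java
+import com.google.api.gax.rpc.ApiException;
+
+// Simplified, illustrative stand-in for the error classification described above.
+class BigtableErrorClassificationSketch {
+    enum ErrorType { SINK_RETRYABLE_ERROR, SINK_4XX_ERROR, SINK_5XX_ERROR, SINK_UNKNOWN_ERROR }
+
+    static ErrorType classify(ApiException error) {
+        if (error.isRetryable()) {
+            return ErrorType.SINK_RETRYABLE_ERROR;   // transient failure, worth retrying
+        }
+        int httpStatus = error.getStatusCode().getCode().getHttpStatusCode();
+        if (httpStatus >= 400 && httpStatus < 500) {
+            return ErrorType.SINK_4XX_ERROR;         // client-side error
+        }
+        if (httpStatus >= 500 && httpStatus < 600) {
+            return ErrorType.SINK_5XX_ERROR;         // server-side error
+        }
+        return ErrorType.SINK_UNKNOWN_ERROR;         // anything else
+    }
+}
+```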