From 49b2ddae68a3c31e1b60b662f37510474c9b6c3b Mon Sep 17 00:00:00 2001 From: forstisabella <92472883+forstisabella@users.noreply.github.com> Date: Mon, 8 Nov 2021 12:04:59 -0500 Subject: [PATCH 1/4] DOC-361 Fixing referrer link in warehouses index file --- src/connections/storage/warehouses/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/connections/storage/warehouses/index.md b/src/connections/storage/warehouses/index.md index 2d189a4182..9864ea146a 100644 --- a/src/connections/storage/warehouses/index.md +++ b/src/connections/storage/warehouses/index.md @@ -24,7 +24,7 @@ Examples of data warehouses include Amazon Redshift, Google BigQuery, and Postgr > info "Looking for the Warehouse Schemas docs?" > They've moved! Check them out [here](schema/). -{% include components/reference-button.html href="https://segment.com/academy/intro/when-to-use-sql-for-analysis/&referrer=docs" icon="media/academy.svg" title="Analytics Academy: When to use SQL for analysis" description="When your existing analytics tools can't answer your questions, it's time to level-up and use SQL for analysis." %} +{% include components/reference-button.html href="https://segment.com/academy/intro/when-to-use-sql-for-analysis/?referrer=docs" icon="media/academy.svg" title="Analytics Academy: When to use SQL for analysis" description="When your existing analytics tools can't answer your questions, it's time to level-up and use SQL for analysis." 
%} ### More Help From 1d7face2ebd5b592a9f0661bc5346b5b829b1f99 Mon Sep 17 00:00:00 2001 From: forstisabella <92472883+forstisabella@users.noreply.github.com> Date: Thu, 18 Nov 2021 11:43:56 -0500 Subject: [PATCH 2/4] DOC-361 First pass of HTML table in the doc --- src/connections/storage/warehouses/schema.md | 113 ++++++++++++++++++- 1 file changed, 109 insertions(+), 4 deletions(-) diff --git a/src/connections/storage/warehouses/schema.md b/src/connections/storage/warehouses/schema.md index 288968c3d0..1daf3a615d 100644 --- a/src/connections/storage/warehouses/schema.md +++ b/src/connections/storage/warehouses/schema.md @@ -229,10 +229,115 @@ AND table_name = '' ORDER by column_name ``` -> info "Note" -> If you send us an array, we stringify it in Redshift. That way you don't end up having to pollute your events. It won't work if you have a lot of array elements but should work decently to store and query those. We also flatten nested objects.  +### How event tables handle nested objects and arrays + +In order to preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Value Type Field Type Transformation Schema (Example) Code (Example)
Object Context Flatten + +``` json +context: { + app: { + version: "1.0.0" + } +} +``` + + + Column Name:
+ context_app_version +

+ Value:
+ "1.0.0" +
Traits Flatten + +```json +traits: { + address: { + street: "6th Street" + } +} +``` + +Column Name:
+address_street
+
+Value:
+"6th Street" +
Properties Stringify + +```json +properties: { + product_id: { + sku: "G-32" + } +} +``` + + + Column Name:
+ product_id

+ Value:
+ "{sku.'G-32'}" +
+ Array +Any Stringify + +```json +products: { + product_id: [ + "507f1f77bcf86cd799439011", "505bd76785ebb509fc183733" + ] +} +``` + + Column Name:
+ product_id

+ Value: + "[507f1f77bcf86cd799439011, 505bd76785ebb509fc183733]" +
## Tracks vs. Events Tables @@ -303,7 +408,7 @@ New event properties and traits create columns. Segment processes the incoming d When Segment process a new batch and discover a new column to add, we take the most recent occurrence of a column and choose its datatype. -The datatypes that we support right now are:  +The data types that we currently support include:  - `timestamp` - `integer`  @@ -325,7 +430,7 @@ All four timestamps pass through to your Warehouse for every ETL'd event. In mos `timestamp` is the UTC-converted timestamp which is set by the Segment library. If you are importing historical events using a server-side library, this is the timestamp you'll want to reference in your queries. -`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabed as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column. +`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabeled as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column. `sent_at` is the UTC timestamp set by library when the Segment API call was sent. This timestamp can also be affected by device clock skew. 
From 8d42cf033176af9cc62db1b6c9969d4c712145ca Mon Sep 17 00:00:00 2001 From: forstisabella <92472883+forstisabella@users.noreply.github.com> Date: Thu, 18 Nov 2021 12:45:55 -0500 Subject: [PATCH 3/4] DOC-361 Fixed table to fit on the page --- src/connections/storage/warehouses/schema.md | 62 ++++++++------------ 1 file changed, 24 insertions(+), 38 deletions(-) diff --git a/src/connections/storage/warehouses/schema.md b/src/connections/storage/warehouses/schema.md index 1daf3a615d..f452e5ef51 100644 --- a/src/connections/storage/warehouses/schema.md +++ b/src/connections/storage/warehouses/schema.md @@ -233,44 +233,37 @@ ORDER by column_name In order to preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables: - +
- - - - + + - - - - - + + + - - - - + - - - - + - - - - +
Value Type Field Type Transformation Schema (Example) Field Code (Example) Schema (Example)
Object Context Flatten - -``` json -context: { - app: { - version: "1.0.0" - } -} -``` - - + Object (Context): Flatten + + ``` json + context: { + app: { + version: "1.0.0" + } + } + ``` + Column Name:
context_app_version

Value:
"1.0.0" -
Traits Flatten + Object (Traits): Flatten ```json traits: { @@ -291,10 +284,8 @@ address_street
Properties Stringify +Object (Properties): Stringify ```json properties: { @@ -303,7 +294,6 @@ properties: { } } ``` - Column Name:
@@ -314,17 +304,13 @@ properties: {
- Array -Any Stringify +Array (Any): Stringify ```json products: { product_id: [ - "507f1f77bcf86cd799439011", "505bd76785ebb509fc183733" + "507f1", "505bd" ] } ``` @@ -334,7 +320,7 @@ products: { Column Name:
product_id

Value: - "[507f1f77bcf86cd799439011, 505bd76785ebb509fc183733]" + "[507f1, 505bd]"
From 457fece6ae86977c2989a12381e26198f4889d60 Mon Sep 17 00:00:00 2001 From: forstisabella <92472883+forstisabella@users.noreply.github.com> Date: Thu, 18 Nov 2021 15:02:33 -0500 Subject: [PATCH 4/4] Making changes requested in code review! --- src/connections/storage/warehouses/schema.md | 34 +++++++++++--------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/src/connections/storage/warehouses/schema.md b/src/connections/storage/warehouses/schema.md index f452e5ef51..ef82468ad7 100644 --- a/src/connections/storage/warehouses/schema.md +++ b/src/connections/storage/warehouses/schema.md @@ -231,26 +231,28 @@ ORDER by column_name ### How event tables handle nested objects and arrays -In order to preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables: +To preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables: + + @@ -309,9 +311,9 @@ properties: { ```json products: { - product_id: [ - "507f1", "505bd" - ] + product_id: [ + "507f1", "505bd" + ] } ```
Field Code (Example) Schema (Example)
Object (Context): Flatten - ``` json - context: { - app: { - version: "1.0.0" - } +``` json +context: { + app: { + version: "1.0.0" } - ``` +} +``` Column Name:
@@ -267,9 +269,9 @@ In order to preserve the quality of your events data, Segment uses the following ```json traits: { - address: { - street: "6th Street" - } + address: { + street: "6th Street" + } } ``` @@ -289,9 +291,9 @@ address_street
```json properties: { - product_id: { - sku: "G-32" - } + product_id: { + sku: "G-32" + } } ```
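
As a rough illustration of the flatten and stringify rules this patch series documents, here is a minimal Python sketch. It is not Segment's implementation: the function names `flatten` and `to_columns` are hypothetical, and the stringified values use standard JSON via `json.dumps`, which may differ from the exact string format Segment writes to the warehouse. It only mirrors the table's rules: `context` objects flatten with a `context_` prefix, `traits` objects flatten without a prefix, and `properties` objects and all arrays are stringified.

```python
import json


def flatten(prefix, value):
    """Flatten a nested dict into {column_name: value}, joining keys with underscores."""
    columns = {}
    for key, child in value.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(child, dict):
            # Nested objects recurse, extending the column name.
            columns.update(flatten(name, child))
        elif isinstance(child, list):
            # Arrays are stringified regardless of the field they appear in.
            columns[name] = json.dumps(child)
        else:
            columns[name] = child
    return columns


def to_columns(event):
    """Map a (hypothetical) event dict to warehouse column names and values."""
    columns = {}
    # Context: flattened, keeping the "context" prefix (context_app_version).
    columns.update(flatten("context", event.get("context", {})))
    # Traits: flattened without a prefix (address_street).
    columns.update(flatten("", event.get("traits", {})))
    # Properties: nested objects and arrays are stringified, not flattened.
    for key, value in event.get("properties", {}).items():
        if isinstance(value, (dict, list)):
            columns[key] = json.dumps(value)
        else:
            columns[key] = value
    return columns


event = {
    "context": {"app": {"version": "1.0.0"}},
    "traits": {"address": {"street": "6th Street"}},
    "properties": {"product_id": {"sku": "G-32"}},
}
columns = to_columns(event)
```

With the sample event above, `columns` holds `context_app_version`, `address_street`, and `product_id`, where `product_id` is a single string column rather than a `product_id_sku` column.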