Added object/array table to Warehouse Schema doc #2162

Merged: 4 commits, Nov 23, 2021
2 changes: 1 addition & 1 deletion src/connections/storage/warehouses/index.md
@@ -24,7 +24,7 @@ Examples of data warehouses include Amazon Redshift, Google BigQuery, and Postgr
> info "Looking for the Warehouse Schemas docs?"
> They've moved! Check them out [here](schema/).

{% include components/reference-button.html href="https://segment.com/academy/intro/when-to-use-sql-for-analysis/&referrer=docs" icon="media/academy.svg" title="Analytics Academy: When to use SQL for analysis" description="When your existing analytics tools can't answer your questions, it's time to level-up and use SQL for analysis." %}
{% include components/reference-button.html href="https://segment.com/academy/intro/when-to-use-sql-for-analysis/?referrer=docs" icon="media/academy.svg" title="Analytics Academy: When to use SQL for analysis" description="When your existing analytics tools can't answer your questions, it's time to level-up and use SQL for analysis." %}

### More Help

101 changes: 97 additions & 4 deletions src/connections/storage/warehouses/schema.md
@@ -229,10 +229,103 @@ AND table_name = '<event>'
ORDER by column_name
```

> info "Note"
> If you send us an array, we stringify it in Redshift. That way you don't end up having to pollute your events. It won't work if you have a lot of array elements but should work decently to store and query those. We also flatten nested objects. 
### How event tables handle nested objects and arrays

To preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables:

<table>
<thead>
<tr>
<th> Field </th>
<th> Code (Example) </th>
<th> Schema (Example) </th>
</tr>
</thead>

<tr>
<td><b>Object (Context):</b> Flatten </td>
<td markdown="1">

```json
context: {
app: {
version: "1.0.0"
}
}
```
</td>
<td>
<b>Column Name:</b><br/>
context_app_version
<br/><br/>
<b>Value:</b><br/>
"1.0.0"
</td>
</tr>

<tr>
<td> <b>Object (Traits):</b> Flatten </td>
<td markdown="1">

```json
traits: {
address: {
street: "6th Street"
}
}
```

</td>
<td>
<b>Column Name:</b><br/>
address_street<br/>
<br/>
<b>Value:</b><br/>
"6th Street"
</td>
</tr>

<tr>
<td><b>Object (Properties):</b> Stringify</td>
<td markdown="1">

```json
properties: {
product_id: {
sku: "G-32"
}
}
```
</td>
<td>
<b>Column Name:</b><br/>
product_id<br/><br/>
<b>Value:</b><br/>
"{sku.'G-32'}"
</td>
</tr>

<tr>
<td><b>Array (Any):</b> Stringify</td>
<td markdown="1">

```json
products: {
product_id: [
"507f1", "505bd"
]
}
```

</td>
<td>
<b>Column Name:</b> <br/>
product_id <br/><br/>
<b>Value:</b>
"[507f1, 505bd]"
</td>
</tr>
</table>
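The flatten/stringify rules in the table above can be sketched as a short Python function. This is an illustrative sketch of the described behavior only, not Segment's actual implementation, and the function name `to_columns` is hypothetical:

```python
import json

def to_columns(event):
    """Sketch of the warehouse column rules above (illustrative only):
    context and traits objects are flattened into column names,
    while objects and arrays inside properties are stringified."""
    columns = {}

    def flatten(prefix, obj):
        for key, value in obj.items():
            name = f"{prefix}_{key}" if prefix else key
            if isinstance(value, dict):
                flatten(name, value)
            else:
                columns[name] = value

    # context objects flatten with a `context_` prefix: context_app_version
    flatten("context", event.get("context", {}))
    # traits objects flatten without a prefix: address_street
    flatten("", event.get("traits", {}))
    # property values that are objects or arrays are stringified
    for key, value in event.get("properties", {}).items():
        if isinstance(value, (dict, list)):
            columns[key] = json.dumps(value)
        else:
            columns[key] = value
    return columns
```

Running this against the examples in the table yields flat columns such as `context_app_version` and `address_street`, with the `product_id` array stored as a single JSON string.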

## Tracks vs. Events Tables

@@ -303,7 +396,7 @@ New event properties and traits create columns. Segment processes the incoming d

When Segment processes a new batch and discovers a new column to add, it takes the most recent occurrence of the column and chooses its datatype.
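The "most recent occurrence" rule above can be sketched as a simple type-inference function. This is an illustrative sketch only, not Segment's code; the function name and the exact type labels are assumptions based on the type list that follows:

```python
import datetime

def infer_column_type(value):
    """Sketch (not Segment's implementation) of picking a warehouse
    column type from the most recent value seen for a property."""
    # check bool before int: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, datetime.datetime):
        return "timestamp"
    # everything else falls back to a string column
    return "varchar"
```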

The datatypes that we support right now are
The data types that we currently support include

- `timestamp`
- `integer` 
@@ -325,7 +418,7 @@ All four timestamps pass through to your Warehouse for every ETL'd event. In mos

`timestamp` is the UTC-converted timestamp which is set by the Segment library. If you are importing historical events using a server-side library, this is the timestamp you'll want to reference in your queries.

`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabed as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column.
`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabeled as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column.

`sent_at` is the UTC timestamp set by the library when the Segment API call was sent. This timestamp can also be affected by device clock skew.
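Putting these timestamps together: Segment's common-fields spec describes deriving `timestamp` as `received_at - (sent_at - original_timestamp)`, which cancels out device clock skew. A minimal sketch, assuming that formula (the helper name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

def skew_corrected_timestamp(original_timestamp, sent_at, received_at):
    """Sketch of the clock-skew correction described in Segment's spec
    (assumed formula: timestamp = received_at - (sent_at - original_timestamp)).
    Illustrative only."""
    # time the event waited on-device, measured entirely on the device clock,
    # so the device's clock offset cancels out of the subtraction
    on_device_delay = sent_at - original_timestamp
    return received_at - on_device_delay
```

Because both `sent_at` and `original_timestamp` come from the same (possibly skewed) device clock, their difference is skew-free, which is why `timestamp` is the column to prefer in queries.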
