From 438994ef6548a86cf3ad21a6b8f92b015a1f27a2 Mon Sep 17 00:00:00 2001 From: stayseesong Date: Fri, 29 Oct 2021 12:15:41 -0700 Subject: [PATCH 1/8] [netlify-build] --- src/_data/sidenav/main.yml | 4 +- src/connections/storage/warehouses/faq.md | 31 ++---- .../storage/warehouses/selective-sync.md | 56 ---------- .../storage/warehouses/warehouse-syncs.md | 101 ++++++++++++++++++ src/guides/filtering-data.md | 17 +-- 5 files changed, 112 insertions(+), 97 deletions(-) delete mode 100644 src/connections/storage/warehouses/selective-sync.md create mode 100644 src/connections/storage/warehouses/warehouse-syncs.md diff --git a/src/_data/sidenav/main.yml b/src/_data/sidenav/main.yml index aa26d50db4..cd52ccc3f6 100644 --- a/src/_data/sidenav/main.yml +++ b/src/_data/sidenav/main.yml @@ -197,8 +197,8 @@ sections: title: Warehouse Overview - path: /connections/storage/warehouses/schema title: Warehouse Schemas - - path: /connections/storage/warehouses/selective-sync - title: Warehouse Selective Sync + - path: /connections/storage/warehouses/sync + title: Warehouse Syncs - path: /connections/storage/warehouses/health title: Warehouse Health Dashboards - path: /connections/storage/warehouses/choose-warehouse diff --git a/src/connections/storage/warehouses/faq.md b/src/connections/storage/warehouses/faq.md index 6383816f93..f3321ca348 100644 --- a/src/connections/storage/warehouses/faq.md +++ b/src/connections/storage/warehouses/faq.md @@ -5,14 +5,11 @@ redirect_from: '/connections/warehouses/faq/' ## Can I control what data is sent to my warehouse? -Yes! For those of you who are on our [Business plan](https://segment.com/pricing), you can choose which sources, collections, and properties sync to your data warehouse. +Yes! 
For those of you who are on Segment's [Business plan](https://segment.com/pricing), you can choose which sources, collections, and properties sync to your data warehouse using [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync). -Selective Sync will help manage what data is sent to each individual warehouse, allowing you to sync different sets of data from the same source to different warehouses. Check out more information on how to use Selective Sync [here](https://segment.com/docs/guides/filtering-data/#warehouse-selective-sync). - -Once a source, collection or property is disabled, we no longer sync data from that source. We will not, however, delete any historical data from your warehouse. When a source is re-enabled, we will sync all events since the last sync. Note: This does not apply when a collection or property is re-enabled - Only new data generated after re-enabling a collection or property will sync to your warehouse. - -For Self-Serve and free customers, we do not currently support the ability to select which collections or properties sync to your warehouse. +Selective Sync helps manage what data is sent to each individual warehouse, allowing you to sync different sets of data from the same source to different warehouses. +Once a source, collection or property is disabled, Segment no longer syncs data from that source. Segment won't delete any historical data from your warehouse. When a source is re-enabled, Segment syncs all events since the last sync. This doesn't apply when a collection or property is re-enabled. Only new data generated after re-enabling a collection or property will sync to your warehouse. ## Can we add, tweak, or delete some of the tables? @@ -47,27 +44,11 @@ Your warehouse id appears in the URL when you look at the [warehouse destination ## How fresh is the data in Segment Warehouses? -Your data will be available in Warehouses within 24-48 hours. 
The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience we need to balance all three of these. - -Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, our guarantee is that your data will be available in Redshift within 24 hours. - -As we improve and update our ETL processes and optimize for SQL query performance downstream, the actual load time will vary, but we'll ensure it's always within 24 hours. - -You can use the Sync History page to see the status and history of data updates in your warehouse. The Sync History page is available for every source connected to each warehouse. This page helps you answer questions like, "has the data from a specific source been updated recently?" "Did a sync completely fail, or only partially fail?" and "Why wasn't this sync successful?" - -The Sync History includes the following information: -- **Sync Status**: The possible statuses are: - - _Success_: Sync run completed without any notices and all rows synced, OR no rows synced because no data was found. - - _Partial_: Sync run completed with some notices and some rows synced. - - _Failure_: Sync run with some notices and no rows synced. -- **Start Time**: The time at which the sync began. Shown in your local timezone. -- **Duration**: Length of time this sync took. -- **Synced Rows**: Number of rows successfully synced from the sync run. -- **Notices**: A list of errors or warnings found, which could indicate problems with the sync run. Click a notice message to show details about the result, and any errors or warnings for each collection included in the sync run. +Your data will be available in Warehouses within 24-48 hours. 
The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these. -> info "" -> If a sync run shows a partial success or failure, the next sync attempts to syncing any data which was not successfully synced in the prior run. +Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in Redshift within 24 hours. +As Segment improves and updates the ETL processes and optimizes for SQL query performance downstream, the actual load time will vary, but Segment ensures it's always within 24 hours. ## What if I want to add custom data to my warehouse? diff --git a/src/connections/storage/warehouses/selective-sync.md b/src/connections/storage/warehouses/selective-sync.md deleted file mode 100644 index b55db2eec5..0000000000 --- a/src/connections/storage/warehouses/selective-sync.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -title: Warehouse Selective Sync -redirect_from: '/connections/warehouses/selective-sync/' ---- - -[Warehouse Selective Sync](/docs/connections/warehouses/faq/#can-i-control-what-data-is-sent-to-my-warehouse/) allows you to manage the data that you send to your warehouses. You can use this feature to stop syncing specific events (also known as collections) or properties that aren’t relevant, and could be slowing down your warehouse syncs. - -> info "" -> This feature is available to Business Tier customers only. - -With Selective Sync, you can customize which collections and properties from a source are sent to each warehouse. Previously, changes made using this feature were applied to all warehouses within a workspace. 
Now, instead of all changes affecting every warehouse in a workspace, you can use Selective Sync to decide which data should go to each individual warehouse. This allows you to send different data to each warehouse. - -This feature affects [warehouses](/docs/connections/storage/warehouses/), and does not prevent data from going to any other [destinations](/docs/connections/destinations/). - -> warning "" -> Note: For each warehouse only the first 5,000 collections per source and 5,000 properties per collection are visible in the Selective Sync user interface. Learn more about the limits [here](#selective-sync-user-interface-limits). - - -## When to use Selective Sync - -By default, all sources and their collections and properties are sent, and no data is prevented from reaching warehouses. - -When you disable sources, collections or properties using Selective Sync, Segment stops sending new data for these sources/collections/properties to your warehouse, however it doesn’t delete any existing data in the warehouse. If you later re-enable a source to begin syncing again, Segment loads all data that arrived since the last sync into the warehouse, but doesn’t backfill data that was omitted while these were disabled. Note: When a collection or property is re-enabled, data will only sync going forward, it will not be loaded from the last sync. - -## Enabling Selective Sync - -To use Selective Sync, go to the **Overview** page in the Segment App and select the warehouse you want to manage from the list of Destinations. - -From here, you can access the Selective Sync feature from two places within the app - from the warehouse level (which makes it quicker to manage multiple sources at once), or from the warehouse-to-source connection page, which is quicker if you only want to manage data from one source. - - -### Change sync settings to a single warehouse from multiple sources - -Click **Settings**, and click **Selective Sync** in the left menu. 
This may be valuable if you’re looking to make changes in bulk, such as when setting up a new warehouse. - -![](images/WH_SS_WH.png) - -### Change sync settings on a specific Warehouse to Source connection - -To manage data from one specific source to an individual warehouse, go to the Warehouse Overview page. Click the Schema (source) you want to manage, and click **Settings**. This can be valuable when are making smaller changes (for example, disabling all properties from one unnecessary collection). - -![](images/WH_SS_Source.png) - - -All changes made through Selective Sync only impact an individual warehouse - they do **not** propagate to multiple warehouses at once. To make changes to multiple warehouses, you need to enable/disable data for each individual warehouse. - -### Selective Sync User Interface Limits - -Regardless of schema size, for each warehouse only the first 5,000 collections per source and 5,000 properties per collection can be managed using the Selective Sync user interface. After you hit any of these limits, all future data is still tracked and sent to your warehouse. New collections created after hitting this limit is not displayed in the Selective Sync table. - -You will see a warning in the Selective Sync user interface when the warehouse schema has reached 80% of the limit for collections and/or properties. An error message will appear when you've reached the limit. - -Contact [Support](https://app.segment.com/help/contact/) to edit Selective Sync settings for any collections and/or properties which exceed the limit. - -> warning "" -> Note: Only Workspace Owners can change Selective Sync settings. 
diff --git a/src/connections/storage/warehouses/warehouse-syncs.md b/src/connections/storage/warehouses/warehouse-syncs.md new file mode 100644 index 0000000000..df8d0e7074 --- /dev/null +++ b/src/connections/storage/warehouses/warehouse-syncs.md @@ -0,0 +1,101 @@ +--- +title: Warehouse Syncs +redirect_from: '/connections/storage/warehouses/sync' +--- + +The Warehouse Sync process prepares the raw data coming from a source and loads it into a warehouse destination. There are two phases to the sync process: +1. **Preparation phase**: This is where Segment prepares data coming from a source. +2. **Loading phase**: This is where Segment loads data into the warehouse destination. + +Instead of constantly streaming data to the warehouse destination, Segment loads data to the warehouse in bulk at regular intervals. Before the data loads, Segment inserts and updates events and objects, and automatically adjusts the schema to make sure the data in the warehouse is in line with the data in Segment. + +Warehouses sync with all data coming from your source and your data is available in your warehouse within 24-48 hours. If you'd like to manage the data you send to your warehouse, use [Warehouse Selective Sync](#warehouse-selective-sync). + +## Sync History +You can use the Sync History page to see the status and history of data updates in your warehouse. The Sync History page is available for every source connected to each warehouse. This page helps you answer questions like, “Has the data from a specific source been updated recently?” “Did a sync completely fail, or only partially fail?” and “Why wasn’t this sync successful?” + +The Sync History includes the following information: + +* **Sync Status**: The possible statuses are: + * *Success*: Sync run completed without any notices and all rows synced, OR no rows synced because no data was found. + * *Partial*: Sync run completed with some notices and some rows synced. 
+ * *Failure*: Sync run with some notices and no rows synced. +* **Start Time**: The time at which the sync began. This is shown in your local timezone. +* **Duration**: The length of time the sync took. +* **Synced Rows**: Number of rows successfully synced from the sync run. +* **Notices**: A list of errors or warnings found, which could indicate problems with the sync run. Click a notice message to show details about the result, and any errors or warnings for each collection included in the sync run. + +> info "" +> If a sync run shows a partial success or failure, the next sync attempts to sync any data that was not successfully synced in the prior run. + +### View the Sync History + +To view the Sync History: +1. Go to **Connections > Destinations** and choose the warehouse destination you want to view the sync history for. +2. Click the source you want to view the sync history for. +3. *(Optional)* Click on any of the rows in the Sync History table to see additional details related to that sync. You can view: + * The **Results** of your sync which shows the number of rows synced for each collection. + * The **Sync Duration** which shows the **Preparation** and **Loading** times of your sync. + +## Warehouse Selective Sync + +Warehouse Selective Sync allows you to manage the data that you send to your warehouses. You can use this feature to stop syncing specific events (also known as collections) or properties that aren’t relevant, and could be slowing down your warehouse syncs. + +> info "" +> This feature is only available to Business Tier customers.

You must be a Workspace Owner to change Selective Sync settings. + +With Selective Sync, you can customize which collections and properties from a source are sent to each warehouse. This helps you manage what data is sent to each individual warehouse, allowing you to sync different sets of data from the same source to different warehouses. + +> note "" +> **NOTE:** This feature only affects [warehouses](/docs/connections/storage/warehouses/), and doesn't prevent data from going to any other [destinations](/docs/connections/destinations/). + +Once a source, collection, or property is disabled, Segment no longer syncs data from that source. Segment won't delete any historical data from your warehouse. When a source is re-enabled, Segment syncs all events since the last sync. This does not apply when a collection or property is re-enabled. Only new data generated after re-enabling a collection or property syncs to your warehouse. + +> warning "" +> For each warehouse only the first 5,000 collections per source and 5,000 properties per collection are visible in the Selective Sync user interface. [Learn more about the limits](#selective-sync-user-interface-limits). + +### When to use Selective Sync + +By default, all sources and their collections and properties are sent, and no data is prevented from reaching warehouses. + +When you disable sources, collections, or properties using Selective Sync, Segment stops sending new data for these sources, collections, or properties to your warehouse. It doesn’t delete any existing data in the warehouse. + +If you choose to re-enable a source to begin syncing again, Segment loads all data that arrived since the last sync into the warehouse, but doesn’t backfill data that was omitted while these were disabled. When a collection or property is re-enabled, data only syncs going forward. It will not be loaded from the last sync. + +### Enable Selective Sync + +To use Selective Sync: +1. 
Go to **Connections > Destinations** and select the warehouse you want to enable Selective Sync for. +2. Click the **Settings** tab and click **Selective Sync** in the left menu. +3. Select which sources, collections, and properties to sync. Anything you don't select won't sync to your warehouse. +4. Click **Save Changes**. + +### Change sync settings to a single warehouse from multiple sources + +To change the sync settings to a single warehouse from multiple sources, follow the same steps as [above](#enable-selective-sync). + +This may be valuable if you’re looking to make changes in bulk, such as when setting up a new warehouse. + + +### Change sync settings on a specific Warehouse to Source connection + +To manage data from one specific source to an individual warehouse: +1. Go to **Connections > Destinations** and select the warehouse you want to change the sync settings for. +2. On the **Warehouse Overview** page, click the **Schema** you want to change the sync settings for. +3. On the **Settings** tab of the **Sync History** page for that source, select the data you want synced to your warehouse, or deselect the data you don't want synced. + +This may be valuable when you're making smaller changes, for example, disabling all properties from one unnecessary collection. + +> info "" +> All changes made through Selective Sync only impact an individual warehouse. They don't impact multiple warehouses at once. To make changes to multiple warehouses, you need to enable/disable data for each individual warehouse. + +### Selective Sync User Interface Limits + +Regardless of schema size, for each warehouse only the first 5,000 collections per source and 5,000 properties per collection can be managed using the Selective Sync user interface. After you hit any of these limits, all future data is still tracked and sent to your warehouse. New collections created after hitting this limit are not displayed in the Selective Sync table. 
+ +You will see a warning in the Selective Sync user interface when the warehouse schema has reached 80% of the limit for collections and/or properties. An error message will appear when you've reached the limit. + +Contact [Support](https://app.segment.com/help/contact/) to edit Selective Sync settings for any collections and/or properties which exceed the limit. + +> warning "" +> Only Workspace Owners can change Selective Sync settings. diff --git a/src/guides/filtering-data.md b/src/guides/filtering-data.md index 29e8fa0755..02a18d0dca 100644 --- a/src/guides/filtering-data.md +++ b/src/guides/filtering-data.md @@ -100,25 +100,14 @@ If you have Protocols in your workspace, **and** have a tracking plan associated ## Warehouse Selective Sync -[Warehouse Selective Sync](/docs/connections/storage/warehouses/selective-sync/) allows you to stop sending specific data to specific warehouses. You can use this to stop syncing specific events or properties that aren’t relevant, and which could be slowing down your warehouse syncs. +Warehouse Selective Sync allows you to stop sending specific data to specific warehouses. You can use this to stop syncing specific events or properties that aren’t relevant, and could be slowing down your warehouse syncs. See the [Warehouse Selective Sync documentation](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync) to learn more. > info "" -> This feature is available to Business Tier customers only, and only Workspace Owners can change Selective Sync settings. - -By default, a warehouse receives all sources and their collections and properties. No data is prevented from reaching warehouses. With Selective Sync, you can configure which collections and properties from a source are sent to each warehouse. This allows you to send different sets of data to each warehouse. This also means that you need to enable or disable data for each individual warehouse. 
- -This feature only affects [warehouses](/docs/connections/storage/warehouses/), and does not prevent data from going to any other [destinations](/docs/connections/destinations/). - -When you use Selective Sync to prevent data from syncing to a specific warehouse, Segment stops sending new data that meets the selection criteria to that warehouse, however it doesn't delete any existing data in the warehouses. If you use Selective Sync to re-enable a source after disabling it, Segment loads all data that arrived since the last sync into the warehouse, but doesn't backfill data that was omitted while the source was not syncing. Re-enabling a collection or property does **not** backfill any historical data -- only new data generated after re-enabling will be synced to your warehouse. - -To enable selective sync, in the Segment app go to the Destinations page, select the warehouse, click **Settings**, and click **Selective sync** in the left menu. -See the documentation on [Warehouse Selective Sync](/docs/connections/storage/warehouses/selective-sync/) for more details. - -![](images/warehouse-selective-sync.png) +> This feature is only available to Business Tier customers, and you must be a Workspace Owner to change Selective Sync settings. ## Privacy Portal filtering -The [Privacy Portal](/docs/privacy/portal/) is available to all Segment customers, because we believe that data privacy is a right, and that anyone collecting data should have tools to help ensure their users' privacy. More enhancements are available to BT customers who may need tools for managing complex implementations. +The [Privacy Portal](/docs/privacy/portal/) is available to all Segment customers, because Segment believes that data privacy is a right, and that anyone collecting data should have tools to help ensure their users' privacy. More enhancements are available to BT customers who may need tools for managing complex implementations. 
The Privacy Portal tools allow you to inspect your incoming calls and their payloads, detect potential Personally Identifiable Information (PII) in properties using matchers, classify the information by different categories of risk, and use those categories to determine which Destinations may or may not receive the data. Learn more about these features in the [Privacy Portal documentation](/docs/privacy/portal/). From dac13db50445a225bbd073ec3070633f8abf77c5 Mon Sep 17 00:00:00 2001 From: stayseesong <83784848+stayseesong@users.noreply.github.com> Date: Fri, 29 Oct 2021 12:22:45 -0700 Subject: [PATCH 2/8] Update main.yml --- src/_data/sidenav/main.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/_data/sidenav/main.yml b/src/_data/sidenav/main.yml index cd52ccc3f6..85ff91dffc 100644 --- a/src/_data/sidenav/main.yml +++ b/src/_data/sidenav/main.yml @@ -197,7 +197,7 @@ sections: title: Warehouse Overview - path: /connections/storage/warehouses/schema title: Warehouse Schemas - - path: /connections/storage/warehouses/sync + - path: /connections/storage/warehouses/warehouse-syncs title: Warehouse Syncs - path: /connections/storage/warehouses/health title: Warehouse Health Dashboards From a2ea2c41844a740d6c1a21e4c165d2c241cdd9f5 Mon Sep 17 00:00:00 2001 From: stayseesong <83784848+stayseesong@users.noreply.github.com> Date: Fri, 29 Oct 2021 12:23:36 -0700 Subject: [PATCH 3/8] Update warehouse-syncs.md --- src/connections/storage/warehouses/warehouse-syncs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/connections/storage/warehouses/warehouse-syncs.md b/src/connections/storage/warehouses/warehouse-syncs.md index df8d0e7074..bf24005a48 100644 --- a/src/connections/storage/warehouses/warehouse-syncs.md +++ b/src/connections/storage/warehouses/warehouse-syncs.md @@ -1,6 +1,6 @@ --- title: Warehouse Syncs -redirect_from: '/connections/storage/warehouses/sync' +redirect_from: '/connections/storage/warehouses/warehouse-syncs' 
--- The Warehouse Sync process prepares the raw data coming from a source and loads it into a warehouse destination. There are two phases to the sync process: From e900e895b1bec562d9d0ae12bca263ae5faa79f1 Mon Sep 17 00:00:00 2001 From: stayseesong Date: Fri, 29 Oct 2021 12:34:01 -0700 Subject: [PATCH 4/8] [netlify-build] --- src/connections/storage/warehouses/warehouse-syncs.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/connections/storage/warehouses/warehouse-syncs.md b/src/connections/storage/warehouses/warehouse-syncs.md index bf24005a48..e1a6c2f685 100644 --- a/src/connections/storage/warehouses/warehouse-syncs.md +++ b/src/connections/storage/warehouses/warehouse-syncs.md @@ -17,9 +17,9 @@ You can use the Sync History page to see the status and history of data updates The Sync History includes the following information: * **Sync Status**: The possible statuses are: - * *Success*: Sync run completed without any notices and all rows synced, OR no rows synced because no data was found. - * *Partial*: Sync run completed with some notices and some rows synced. - * *Failure*: Sync run with some notices and no rows synced. + * *Success*: The sync run completed without any notices and all rows synced, OR no rows synced because no data was found. + * *Partial*: The sync run completed with some notices and some rows synced. + * *Failure*: The sync run completed with some notices and no rows synced. * **Start Time**: The time at which the sync began. This is shown in your local timezone. * **Duration**: The length of time the sync took. * **Synced Rows**: Number of rows successfully synced from the sync run. 
From e175f8f029fd87c325d7e4d26b6be50b675f6067 Mon Sep 17 00:00:00 2001 From: stayseesong Date: Fri, 29 Oct 2021 13:48:37 -0700 Subject: [PATCH 5/8] [netlify-build] --- src/connections/storage/warehouses/warehouse-syncs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/connections/storage/warehouses/warehouse-syncs.md b/src/connections/storage/warehouses/warehouse-syncs.md index e1a6c2f685..895de82c0b 100644 --- a/src/connections/storage/warehouses/warehouse-syncs.md +++ b/src/connections/storage/warehouses/warehouse-syncs.md @@ -1,6 +1,6 @@ --- title: Warehouse Syncs -redirect_from: '/connections/storage/warehouses/warehouse-syncs' +redirect_from: '/connections/warehouses/selective-sync/' --- The Warehouse Sync process prepares the raw data coming from a source and loads it into a warehouse destination. There are two phases to the sync process: From 73b0232edd170d86bf78ebf19ebcde98942430e1 Mon Sep 17 00:00:00 2001 From: stayseesong <83784848+stayseesong@users.noreply.github.com> Date: Mon, 1 Nov 2021 09:04:41 -0700 Subject: [PATCH 6/8] Apply suggestions from code review Co-authored-by: pwseg <86626706+pwseg@users.noreply.github.com> --- src/connections/storage/warehouses/faq.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/connections/storage/warehouses/faq.md b/src/connections/storage/warehouses/faq.md index f3321ca348..de715e0ac5 100644 --- a/src/connections/storage/warehouses/faq.md +++ b/src/connections/storage/warehouses/faq.md @@ -5,7 +5,7 @@ redirect_from: '/connections/warehouses/faq/' ## Can I control what data is sent to my warehouse? -Yes! For those of you who are on Segment's [Business plan](https://segment.com/pricing), you can choose which sources, collections, and properties sync to your data warehouse using [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync). +Yes. 
For those of you who are on Segment's [Business plan](https://segment.com/pricing), you can choose which sources, collections, and properties sync to your data warehouse using [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync). Selective Sync helps manage what data is sent to each individual warehouse, allowing you to sync different sets of data from the same source to different warehouses. @@ -46,7 +46,7 @@ Your warehouse id appears in the URL when you look at the [warehouse destination Your data will be available in Warehouses within 24-48 hours. The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these. -Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in Redshift within 24 hours. +Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in Redshift within 24 hours. As Segment improves and updates the ETL processes and optimizes for SQL query performance downstream, the actual load time will vary, but Segment ensures it's always within 24 hours. 
From 4b7de899753698d00708c9ce42f009b16997bbb8 Mon Sep 17 00:00:00 2001 From: stayseesong <83784848+stayseesong@users.noreply.github.com> Date: Mon, 1 Nov 2021 17:33:28 -0700 Subject: [PATCH 7/8] Apply suggestions from code review Co-authored-by: markzegarelli --- src/connections/storage/warehouses/faq.md | 8 ++++---- src/connections/storage/warehouses/warehouse-syncs.md | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/src/connections/storage/warehouses/faq.md b/src/connections/storage/warehouses/faq.md index de715e0ac5..a75db02d98 100644 --- a/src/connections/storage/warehouses/faq.md +++ b/src/connections/storage/warehouses/faq.md @@ -5,11 +5,11 @@ redirect_from: '/connections/warehouses/faq/' ## Can I control what data is sent to my warehouse? -Yes. For those of you who are on Segment's [Business plan](https://segment.com/pricing), you can choose which sources, collections, and properties sync to your data warehouse using [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync). +Yes. Customers on Segment's [Business plan](https://segment.com/pricing) can choose which sources, collections, and properties sync to your data warehouse using [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync). -Selective Sync helps manage what data is sent to each individual warehouse, allowing you to sync different sets of data from the same source to different warehouses. +Selective Sync helps manage the data Segment sends to each warehouse, allowing you to sync different sets of data from the same source to different warehouses. -Once a source, collection or property is disabled, Segment no longer syncs data from that source. Segment won't delete any historical data from your warehouse. When a source is re-enabled, Segment syncs all events since the last sync. This doesn't apply when a collection or property is re-enabled. 
Only new data generated after re-enabling a collection or property will sync to your warehouse. +When you disable a source, collection or property, Segment no longer syncs data from that source. Segment won't delete any historical data from your warehouse. When you re-enable a source, Segment syncs all events since the last sync. This doesn't apply when a collection or property is re-enabled. Only new data generated after re-enabling a collection or property will sync to your warehouse. ## Can we add, tweak, or delete some of the tables? @@ -44,7 +44,7 @@ Your warehouse id appears in the URL when you look at the [warehouse destination ## How fresh is the data in Segment Warehouses? -Your data will be available in Warehouses within 24-48 hours. The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these. +Data is available in Warehouses within 24-48 hours. The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these. Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in Redshift within 24 hours. diff --git a/src/connections/storage/warehouses/warehouse-syncs.md b/src/connections/storage/warehouses/warehouse-syncs.md index 895de82c0b..edee92e1b5 100644 --- a/src/connections/storage/warehouses/warehouse-syncs.md +++ b/src/connections/storage/warehouses/warehouse-syncs.md @@ -39,17 +39,17 @@ To view the Sync History: ## Warehouse Selective Sync -Warehouse Selective Sync allows you to manage the data that you send to your warehouses. 
You can use this feature to stop syncing specific events (also known as collections) or properties that aren’t relevant, and could be slowing down your warehouse syncs. +Warehouse Selective Sync allows you to manage the data that you send to your warehouses. You can use this feature to stop syncing specific events (also known as collections) or properties that aren’t relevant, and may slow down your warehouse syncs. > info "" > This feature is only available to Business Tier customers.

You must be a Workspace Owner to change Selective Sync settings. -With Selective Sync, you can customize which collections and properties from a source are sent to each warehouse. This helps you manage what data is sent to each individual warehouse, allowing you to sync different sets of data from the same source to different warehouses. +With Selective Sync, you can customize which collections and properties from a source are sent to each warehouse. This helps you control the data that is sent to each warehouse, allowing you to sync different sets of data from the same source to different warehouses. > note "" > **NOTE:** This feature only affects [warehouses](/docs/connections/storage/warehouses/), and doesn't prevent data from going to any other [destinations](/docs/connections/destinations/). -Once a source, collection, or property is disabled, Segment no longer syncs data from that source. Segment won't delete any historical data from your warehouse. When a source is re-enabled, Segment syncs all events since the last sync. This does not apply when a collection or property is re-enabled. Only new data generated after re-enabling a collection or property syncs to your warehouse. +When you disable a source, collection or property, Segment no longer syncs data from that source. Segment won't delete any historical data from your warehouse. When you re-enable a source, Segment syncs all events since the last sync. This doesn't apply when a collection or property is re-enabled. Only new data generated after re-enabling a collection or property will sync to your warehouse. > warning "" > For each warehouse only the first 5,000 collections per source and 5,000 properties per collection are visible in the Selective Sync user interface. [Learn more about the limits](#selective-sync-user-interface-limits). 
From 634e6391105d8faad9d1fdc524f09c6b4e74e983 Mon Sep 17 00:00:00 2001 From: stayseesong <83784848+stayseesong@users.noreply.github.com> Date: Wed, 3 Nov 2021 15:10:57 -0700 Subject: [PATCH 8/8] [netlify-build] --- src/connections/storage/warehouses/warehouse-syncs.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/connections/storage/warehouses/warehouse-syncs.md b/src/connections/storage/warehouses/warehouse-syncs.md index edee92e1b5..7107498667 100644 --- a/src/connections/storage/warehouses/warehouse-syncs.md +++ b/src/connections/storage/warehouses/warehouse-syncs.md @@ -4,8 +4,8 @@ redirect_from: '/connections/warehouses/selective-sync/' --- The Warehouse Sync process prepares the raw data coming from a source and loads it into a warehouse destination. There are two phases to the sync process: -1. **Preparation phase**: This is where Segment prepares data coming from a source. -2. **Loading phase**: This is where Segment loads data into the warehouse destination. +1. **Preparation phase**: This is where Segment prepares the data coming from a source so that it's in the right format for the loading phase. +2. **Loading phase**: This is where Segment deduplicates data and the data loads into the warehouse destination. Any sync issues that occur in this phase can be traced back to your warehouse. Instead of constantly streaming data to the warehouse destination, Segment loads data to the warehouse in bulk at regular intervals. Before the data loads, Segment inserts and updates events and objects, and automatically adjusts the schema to make sure the data in the warehouse is inline with the data in Segment.