# Using the Data Pool in SQL Server 2019 Big Data Clusters

## Step 1: Create the external data source for the data pool

In [1]:
USE sales;
GO
IF NOT EXISTS(SELECT * FROM sys.external_data_sources WHERE name = 'SqlDataPool')
    CREATE EXTERNAL DATA SOURCE SqlDataPool
    WITH (LOCATION = 'sqldatapool://controller-svc/default');
GO
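
As an optional check, the catalog view can confirm that the data source was registered. This is only a sketch; it assumes the statement above ran in the sales database and that the name SqlDataPool was kept.

```sql
-- List the data pool data source created in Step 1.
-- An empty result means the CREATE EXTERNAL DATA SOURCE statement did not run in this database.
SELECT name, location, type_desc
  FROM sys.external_data_sources
 WHERE name = 'SqlDataPool';
```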

## Step 2: Create an external table using the data pool data source
A table will be created in each pod of the data pool, matching this table schema. ROUND_ROBIN means rows are distributed across the pods of the pool in round-robin fashion.

In [2]:
IF NOT EXISTS(SELECT * FROM sys.external_tables WHERE name = 'web_clickstream_clicks_data_pool')
    CREATE EXTERNAL TABLE [web_clickstream_clicks_data_pool]
    ("wcs_user_sk" BIGINT , "i_category_id" BIGINT , "clicks" BIGINT)
    WITH
    (
        DATA_SOURCE = SqlDataPool,
        DISTRIBUTION = ROUND_ROBIN
    );
GO
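
To make the idea of round-robin placement concrete, the following sketch simulates it on a handful of made-up rows. It does not touch the data pool and does not show how the data pool itself assigns rows. The pod count of 2, the sample values, and the simulated_pod column are all assumptions chosen for illustration.

```sql
-- Hypothetical illustration: rotate six sample rows across two simulated pods.
DECLARE @pod_count INT = 2;   -- assumed pod count, not read from the cluster

SELECT v.wcs_user_sk,
       v.i_category_id,
       v.clicks,
       -- Row 1 goes to pod 0, row 2 to pod 1, row 3 back to pod 0, and so on.
       (ROW_NUMBER() OVER (ORDER BY v.wcs_user_sk) - 1) % @pod_count AS simulated_pod
  FROM (VALUES (101, 1, 120),
               (102, 2, 150),
               (103, 3, 110),
               (104, 1, 205),
               (105, 2, 130),
               (106, 3, 175)) AS v (wcs_user_sk, i_category_id, clicks);
```

With two simulated pods, each one receives three of the six rows. The placement ignores column values, so rows for the same user can land in different pods.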

## Step 3: Ingest data into the data pool from a query that joins HDFS data with a local SQL Server table

In [3]:
INSERT INTO web_clickstream_clicks_data_pool
SELECT wcs_user_sk, i_category_id, COUNT_BIG(*) as clicks
  FROM sales.dbo.web_clickstreams_hdfs_csv
 INNER JOIN sales.dbo.item it ON (wcs_item_sk = i_item_sk
                        AND wcs_user_sk IS NOT NULL)
 GROUP BY wcs_user_sk, i_category_id
HAVING COUNT_BIG(*) > 100;
GO
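
A quick sanity check can follow the ingest, assuming the INSERT above completed. Because of the HAVING filter, every stored row should have more than 100 clicks, so min_clicks should be greater than 100. The column aliases here are illustrative.

```sql
-- Summarize what landed in the data pool table.
SELECT COUNT_BIG(*) AS row_count,
       MIN(clicks)  AS min_clicks,
       MAX(clicks)  AS max_clicks
  FROM [dbo].[web_clickstream_clicks_data_pool];
```

Note that the IF NOT EXISTS guard in Step 2 only protects table creation. Running the INSERT in Step 3 a second time appends the same rows again.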

## Step 4: Query data from the external table in the data pool


In [4]:
SELECT count(*) FROM [dbo].[web_clickstream_clicks_data_pool];
GO
SELECT TOP 10 * FROM [dbo].[web_clickstream_clicks_data_pool];
GO

(No column name)
3864


wcs_user_sk,i_category_id,clicks
18716,3,119
7705,2,258
55982,3,145
41825,3,169
43560,3,143
37106,3,165
21862,2,215
30814,2,198
34284,2,216
38517,2,198


## Step 5: Join data from the data pool external table with a local SQL Server table

In [6]:
SELECT TOP (100)
    w.wcs_user_sk,
    SUM( CASE WHEN i.i_category = 'Books' THEN w.clicks ELSE 0 END) AS book_category_clicks,
    SUM( CASE WHEN w.i_category_id = 1 THEN w.clicks ELSE 0 END) AS [Home & Kitchen],
    SUM( CASE WHEN w.i_category_id = 2 THEN w.clicks ELSE 0 END) AS [Music],
    SUM( CASE WHEN w.i_category_id = 3 THEN w.clicks ELSE 0 END) AS [Books],
    SUM( CASE WHEN w.i_category_id = 4 THEN w.clicks ELSE 0 END) AS [Clothing & Accessories],
    SUM( CASE WHEN w.i_category_id = 5 THEN w.clicks ELSE 0 END) AS [Electronics],
    SUM( CASE WHEN w.i_category_id = 6 THEN w.clicks ELSE 0 END) AS [Tools & Home Improvement],
    SUM( CASE WHEN w.i_category_id = 7 THEN w.clicks ELSE 0 END) AS [Toys & Games],
    SUM( CASE WHEN w.i_category_id = 8 THEN w.clicks ELSE 0 END) AS [Movies & TV],
    SUM( CASE WHEN w.i_category_id = 9 THEN w.clicks ELSE 0 END) AS [Sports & Outdoors]
  FROM [dbo].[web_clickstream_clicks_data_pool] as w
  INNER JOIN (SELECT DISTINCT i_category_id, i_category FROM item) as i
    ON i.i_category_id = w.i_category_id
GROUP BY w.wcs_user_sk;
GO

wcs_user_sk,book_category_clicks,Home & Kitchen,Music,Books,Clothing & Accessories,Electronics,Tools & Home Improvement,Toys & Games,Movies & TV,Sports & Outdoors
41825,338,0,482,338,0,0,0,0,0,0
94335,300,0,424,300,0,0,0,0,0,0
4667,338,0,470,338,0,0,0,0,0,0
36019,244,0,448,244,0,0,0,0,0,0
66777,334,0,468,334,0,0,0,0,0,0
41663,214,0,380,214,0,0,0,0,0,0
24738,240,0,352,240,0,0,0,0,0,0
1683,368,0,532,368,0,0,0,0,0,0
58642,308,0,482,308,0,0,0,0,0,0
94173,0,0,284,0,0,0,0,0,0,0
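
The query above uses TOP (100) without an ORDER BY, so which rows come back is not guaranteed and can differ between runs. A minimal sketch of a deterministic variant follows. The total_clicks alias and the choice to rank by total clicks are assumptions, not part of the original notebook.

```sql
-- Return the ten users with the most clicks; ties are broken by user key.
SELECT TOP (10)
       w.wcs_user_sk,
       SUM(w.clicks) AS total_clicks
  FROM [dbo].[web_clickstream_clicks_data_pool] AS w
 GROUP BY w.wcs_user_sk
 ORDER BY total_clicks DESC, w.wcs_user_sk;
```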
