# Enable PolyBase
PolyBase is enabled at the SQL Server instance level with sp_configure. Once it is enabled, it can be used from any database on that instance.


In [1]:
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
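
To confirm that the setting took effect, the instance configuration can be read back from the sys.configurations catalog view. This is a minimal sketch: value is the configured setting and value_in_use is what the running instance is currently using.

```sql
-- Read back the PolyBase setting; value_in_use should be 1 after RECONFIGURE.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'polybase enabled';
```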

Check whether the PolyBase feature is installed on the instance. A value of 1 means it is installed.

In [2]:
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;  

IsPolyBaseInstalled
1


## Master Key  

The first step in setting up PolyBase is to create a database master key, which SQL Server uses to protect the secrets stored in database scoped credentials. The master key is created on a per-database basis. It is recommended that you use a different master key password for each database.

In [3]:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'abcdefg123456!@#';
GO
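
Because a database can hold only one master key, running the statement above a second time fails. A guarded version, sketched here, checks sys.symmetric_keys first. The password shown is a placeholder, not a recommended value.

```sql
-- The database master key always appears in sys.symmetric_keys under this fixed name.
IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
BEGIN
    -- Placeholder password: substitute a strong value of your own.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
END;
GO
```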

## Scoped Credentials

For PolyBase to access an external database that requires authentication, a user must be set up in that external database. Typically the user needs only read-level access. PolyBase can also write to some data sources, but in general it is used only for reading data.

Once the user is created in the external system, you will need to create a database scoped credential that stores that user's name and password.

In [4]:
CREATE DATABASE SCOPED CREDENTIAL AwUser WITH IDENTITY = 'aw_user', SECRET = 'Password123!';
GO

-- Azure Data Lake Store
--CREATE DATABASE SCOPED CREDENTIAL ADL_User WITH IDENTITY = '<client_id>@<OAuth_2.0_Token_EndPoint>', SECRET = '<key>';
--GO
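
To check which credentials exist in the current database, the sys.database_scoped_credentials catalog view can be queried, as sketched below. It shows the identity but never the secret.

```sql
-- List database scoped credentials; the SECRET value is not exposed by this view.
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials;
```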

## External Data Source

We will use our scoped credential to create an external data source. An external data source in SQL Server stores metadata about the remote system, such as what type of data source it is, where it is located, and which credential to use to connect to it, if one is required.

### SQL Server

#### Default Instance


In [None]:
CREATE EXTERNAL DATA SOURCE [AdvWrksDB]
WITH (
    LOCATION = N'sqlserver://localhost', 
    PUSHDOWN = ON, -- On by default
    CREDENTIAL = [AwUser]
);

Msg 10061, Level 16, State 1, Line 0
TCP Provider: No connection could be made because the target machine actively refused it.
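
Error 10061 is a refused TCP connection, which usually means nothing is accepting connections at the address and port in LOCATION. When troubleshooting, it can help to see which data sources already exist and where they point. The query below is a minimal sketch using the catalog views; it assumes nothing beyond the objects created earlier in this notebook.

```sql
-- Show each external data source, its location, and the credential it uses (if any).
SELECT ds.name,
       ds.location,
       ds.type_desc,
       c.name AS credential_name
FROM sys.external_data_sources AS ds
LEFT JOIN sys.database_scoped_credentials AS c
    ON c.credential_id = ds.credential_id;
```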


## External Tables

### SQL Server


In [None]:
CREATE EXTERNAL TABLE [dbo].[local_table]
(
    -- Placeholder column: the column list should match the remote table's column names and types.
    [table_id] INT
)
WITH (
    LOCATION = N'[AdventureWorks2019].[Sales].[SalesOrderDetail]', 
    DATA_SOURCE = [AdvWrksDB]
);

CREATE EXTERNAL TABLE [local_schema].[local_table]
(
    [table_id] INT
)
WITH (
    LOCATION = N'[AdventureWorks2019].[Sales].[SalesOrderDetail]', 
    DATA_SOURCE = [AdvWrksDB]
);
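
Once an external table exists, it can be queried like a local table, and its definition shows up in sys.external_tables. The following is a sketch that assumes the [dbo].[local_table] example above was created successfully.

```sql
-- List external tables with their remote location and data source.
SELECT s.name  AS schema_name,
       t.name  AS table_name,
       t.location,
       ds.name AS data_source_name
FROM sys.external_tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
JOIN sys.external_data_sources AS ds
    ON ds.data_source_id = t.data_source_id;

-- Query the external table; rows are fetched from the remote server at query time.
SELECT TOP (10) * FROM [dbo].[local_table];
```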

## Statistics

Statistics help SQL Server query remote systems more efficiently through PolyBase. In the simplest terms, statistics build a SQL Server-side catalog of information about the remote table. Usually you will create statistics on the indexed columns of the tables you are querying so that SQL Server can store information about those columns locally. I have seen some situations where this dramatically improved the performance of external table queries, and others where it had no noticeable effect.

When creating statistics, you have to use the WITH FULLSCAN option.

In [None]:
CREATE STATISTICS statistics_name ON [schema].[table] (field_name) WITH FULLSCAN;
GO
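
As a concrete instance of the template above, the sketch below creates statistics on the example external table from earlier and then lists them. The statistics name stat_local_table_table_id is hypothetical; any valid name works.

```sql
-- Hypothetical statistics name; table and column come from the external table example.
CREATE STATISTICS stat_local_table_table_id
ON [dbo].[local_table] ([table_id]) WITH FULLSCAN;
GO

-- List the statistics on the external table and when each was last updated.
SELECT st.name AS stats_name,
       c.name  AS column_name,
       STATS_DATE(st.object_id, st.stats_id) AS last_updated
FROM sys.stats AS st
JOIN sys.stats_columns AS sc
    ON sc.object_id = st.object_id AND sc.stats_id = st.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE st.object_id = OBJECT_ID(N'dbo.local_table');
```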