# SQL Server 2019 Data Virtualization - Using PolyBase to query Oracle
This notebook contains an example of how to use external tables to query data in Oracle without moving data. You may need to change identity, secret, connection, database, schema, and remote table names to work with your Oracle Database.

This notebook also assumes you are using SQL Server 2019 Release Candidate or later and that the PolyBase feature has been installed and enabled.
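
Before starting, it can help to confirm that PolyBase is actually installed and enabled. The following is a minimal sketch using the `IsPolyBaseInstalled` server property and the `polybase enabled` configuration option; depending on platform and version, a service restart may also be needed after enabling the option.

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 if not.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
GO
-- Enable PolyBase if it is not already enabled.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
GO
```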

## Step 0: Create a user, table, and data in Oracle, and create a database in SQL Server

This example uses an Oracle Express instance, which by default is called XEPDB1. Use the following script files provided with this example to create a user (schema), create a table, and populate it with data.

- **createuser.sql** - Create a new user and schema. Log in as SYSTEM to run this script.
- **createable.sql** - Create a new table. Log in as the new user to run this script.
- **insertdata.sql** - Populate the table with data. Log in as the new user to run this script.

Create a database in SQL Server called **TutorialDB** (use all the defaults).
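
If you prefer T-SQL to a graphical tool, the database can be created with a single statement. This is a minimal sketch that accepts every default and skips creation if the database already exists.

```sql
-- Create TutorialDB with default settings only if it does not already exist.
IF DB_ID(N'TutorialDB') IS NULL
    CREATE DATABASE [TutorialDB];
GO
```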

## Step 1: Create a master key
Create a master key to encrypt the database credential.

In [1]:
USE [TutorialDB];
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '!Sql2019isfast#';
GO
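
CREATE MASTER KEY fails if the database already has a master key, so it can be useful to check first. The sketch below looks for the key in the `sys.symmetric_keys` catalog view, where a database master key appears under the fixed name `##MS_DatabaseMasterKey##`.

```sql
USE [TutorialDB];
GO
-- A row is returned when the database already has a master key.
SELECT name, create_date
FROM sys.symmetric_keys
WHERE name = N'##MS_DatabaseMasterKey##';
GO
```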

## Step 2: Create a database credential
The database credential contains the IDENTITY (login) and SECRET (password) for the Oracle instance. Change these to the login and password created in Step 0.

In [2]:
CREATE DATABASE SCOPED CREDENTIAL OracleCredentials   
WITH IDENTITY = 'orauser', SECRET = 'orapwd';
GO
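
To confirm the credential was created, you can query the `sys.database_scoped_credentials` catalog view. This is a small sketch; it shows the identity but never the secret.

```sql
-- List the credential created above; the secret itself is not exposed.
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials
WHERE name = N'OracleCredentials';
GO
```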

## Step 3: Create an EXTERNAL DATA SOURCE
The EXTERNAL DATA SOURCE indicates the type of data source, the connection "string", whether PUSHDOWN of predicates should be used (if possible), and the name of the database credential.

The LOCATION syntax is `<datasourcetype>://<connection string>`.

The datasourcetype can be sqlserver, oracle, teradata, mongodb, or odbc (Windows only). The connection string depends on the datasourcetype.

For this example, use the IP address or hostname of the Oracle instance followed by the port number.

In [3]:
CREATE EXTERNAL DATA SOURCE OracleServer
WITH ( 
LOCATION = 'oracle://bworacle:1521',
PUSHDOWN = ON,
CREDENTIAL = OracleCredentials
);
GO
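
The settings stored for the data source can be inspected afterwards. The sketch below reads the `sys.external_data_sources` catalog view, which in SQL Server 2019 includes a `pushdown` column.

```sql
-- Show the location and pushdown setting recorded for the data source.
SELECT name, location, pushdown
FROM sys.external_data_sources
WHERE name = N'OracleServer';
GO
```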

## Step 4: Create a schema for the EXTERNAL TABLE
Schemas provide a convenient way to secure and organize objects.

In [5]:
CREATE SCHEMA oracle;
GO

## Step 5: Create an EXTERNAL TABLE
An external table provides metadata so SQL Server knows how to map columns to the remote table. You can choose any name for the external table, but the columns must be specified in the same order and with the same names as they are defined in the remote table. Furthermore, local data types must be compatible with those of the remote table.

The WITH clause specifies a LOCATION. This LOCATION is different from the LOCATION of the EXTERNAL DATA SOURCE. For Oracle, it identifies the [instance].[schema].[table] of the Oracle table. The DATA_SOURCE clause is the name of the EXTERNAL DATA SOURCE you created earlier.

For Oracle, the LOCATION must be UPPERCASE. The column names must match those of the target data source: they must be UPPERCASE in the table definition, but not when you reference them in queries.

In [6]:
CREATE EXTERNAL TABLE oracle.rental_data
(
YEAR int,
MONTH int,
DAY int,
RENTALCOUNT int,
WEEKDAY int,
HOLIDAY int,
SNOW int,
FHOLIDAY nvarchar(255) COLLATE Latin1_General_100_BIN2_UTF8,
FSNOW nvarchar(255) COLLATE Latin1_General_100_BIN2_UTF8,
FWEEKDAY nvarchar(255) COLLATE Latin1_General_100_BIN2_UTF8
)
 WITH (
 LOCATION='[XEPDB1].[ORAUSER].[RENTAL_DATA]',
 DATA_SOURCE=OracleServer
);
GO
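
Once the table exists, its remote LOCATION and local column definitions can be reviewed from the catalog views. This is a sketch that joins `sys.external_tables` to `sys.schemas` and `sys.columns`; comparing its output with the Oracle table definition is one way to spot a name or ordering mismatch.

```sql
-- List the external table's remote location and its columns in definition order.
SELECT s.name AS schema_name,
       t.name AS table_name,
       t.location,
       c.name AS column_name,
       TYPE_NAME(c.user_type_id) AS data_type
FROM sys.external_tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.columns AS c ON c.object_id = t.object_id
WHERE s.name = N'oracle' AND t.name = N'rental_data'
ORDER BY c.column_id;
GO
```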

## Step 6: Create statistics
SQL Server allows you to store local statistics about specific columns of the remote table. This can help the query processor make more efficient plan decisions.

In [7]:
CREATE STATISTICS rental_data_stats ON oracle.rental_data ([YEAR]) WITH FULLSCAN;
GO
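
To see which statistics exist on the external table and which columns they cover, you can query the statistics catalog views. The following sketch assumes the statistics object is listed in `sys.stats` in the same way as statistics on a local table.

```sql
-- List statistics objects on the external table and the columns they cover.
SELECT st.name AS stats_name,
       c.name AS column_name,
       st.user_created
FROM sys.stats AS st
JOIN sys.stats_columns AS sc
    ON sc.object_id = st.object_id AND sc.stats_id = st.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE st.object_id = OBJECT_ID(N'oracle.rental_data');
GO
```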

## Step 7: Try to scan the remote table
Run a simple query on the EXTERNAL TABLE to scan all rows.

In [8]:
SELECT * FROM oracle.rental_data;
GO

YEAR,MONTH,DAY,RENTALCOUNT,WEEKDAY,HOLIDAY,SNOW,FHOLIDAY,FSNOW,FWEEKDAY
2013,1,16,50,4,0,0,0,0,4
2013,2,3,499,1,0,0,0,0,1
2015,12,20,280,1,0,0,0,0,1
2014,3,14,41,6,0,1,0,1,6
2015,1,4,468,1,0,0,0,0,1
2015,2,19,63,5,0,1,0,1,5
2013,4,8,33,2,0,0,0,0,2
2014,2,7,35,6,0,0,0,0,6
2014,12,8,35,2,0,0,0,0,2
2013,12,12,22,5,0,0,0,0,5


## Step 8: Query the remote table with a WHERE clause
Even though the table may be small, SQL Server will "push" the WHERE clause filter to the remote table.

In [9]:
SELECT * FROM oracle.rental_data
WHERE year = 2015;
GO

YEAR,MONTH,DAY,RENTALCOUNT,WEEKDAY,HOLIDAY,SNOW,FHOLIDAY,FSNOW,FWEEKDAY
2015,1,16,42,6,0,1,0,1,6
2015,4,5,270,1,0,0,0,0,1
2015,4,28,32,3,0,0,0,0,3
2015,1,7,39,4,0,0,0,0,4
2015,1,30,53,6,0,1,0,1,6
2015,2,20,52,6,0,0,0,0,6
2015,12,15,38,3,0,1,0,1,3
2015,2,14,750,7,0,1,0,1,7
2015,2,6,44,6,0,0,0,0,6
2015,3,21,276,7,0,0,0,0,7
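
To compare behavior with and without pushdown, the same query can be run with the `FORCE EXTERNALPUSHDOWN` and `DISABLE EXTERNALPUSHDOWN` query hints. This is a sketch for experimentation; both queries should return the same rows, and the difference shows up in where the filter is evaluated and in the resulting query plan.

```sql
-- Ask the optimizer to push the computation to Oracle for this query.
SELECT * FROM oracle.rental_data
WHERE year = 2015
OPTION (FORCE EXTERNALPUSHDOWN);
GO
-- Disable pushdown for comparison, so the filter is applied in SQL Server.
SELECT * FROM oracle.rental_data
WHERE year = 2015
OPTION (DISABLE EXTERNALPUSHDOWN);
GO
```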
