# Load data to Delta Lake from S3 with COPY INTO

This notebook shows you how to load data from AWS S3 into a Delta Lake table by using `COPY INTO`.

The examples below use CSV as a file source; for more details and additional options, see [Load data with COPY INTO](https://docs.databricks.com/ingestion/copy-into/index.html).

## Create a target Delta table

`COPY INTO` requires a target table created with Delta Lake. On Databricks Runtime (DBR) 11.0 or above, you can create an empty Delta table without a schema by using the uncommented command in the cell below.

When using DBR below 11.0, you'll need to specify the schema for the table during creation.

Delta Lake is the default format for all tables created in DBR 8.0 and above. When using DBR below 8.0, you'll need to add a `USING DELTA` clause to your create table statement.

In [0]:
%sql
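-- DBR 11.0 and above: create an empty table with no schema
CREATE TABLE <database-name>.<table-name>;

-- DBR 8.0 and above: specify the schema
-- CREATE TABLE <database-name>.<table-name>
-- (col_1 TYPE, col_2 TYPE, ...);

-- DBR below 8.0: specify the schema and add USING DELTA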
-- CREATE TABLE <database-name>.<table-name>
-- (col_1 TYPE, col_2 TYPE, ...)
-- USING delta;
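
The three cases above can be summarized in a small helper. This is a minimal sketch in Python, not part of the original notebook: the function name `create_table_sql`, its parameters, and the representation of the DBR version as a `(major, minor)` tuple are all hypothetical choices made for illustration. It only builds the statement text and does not run anything.

```python
from typing import Dict, Optional, Tuple


def create_table_sql(
    table: str,
    dbr_version: Tuple[int, int],
    columns: Optional[Dict[str, str]] = None,
) -> str:
    """Build a CREATE TABLE statement that fits the given DBR version.

    Hypothetical helper: `dbr_version` is a (major, minor) tuple and
    `columns` maps column names to SQL types.
    """
    if columns is None:
        # Only DBR 11.0 and above can create an empty Delta table with no schema.
        if dbr_version < (11, 0):
            raise ValueError("DBR below 11.0 requires a schema for the table")
        return f"CREATE TABLE {table}"

    column_list = ", ".join(
        f"{name} {sql_type}" for name, sql_type in columns.items()
    )
    statement = f"CREATE TABLE {table}\n({column_list})"
    # Delta Lake is the default table format only in DBR 8.0 and above.
    if dbr_version < (8, 0):
        statement += "\nUSING DELTA"
    return statement
```

Calling `create_table_sql("<database-name>.<table-name>", (7, 3), {"col_1": "INT"})` returns a statement that ends in `USING DELTA`, while the same call with `(9, 1)` omits that clause.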

## Load data with an instance profile

Users with sufficient permissions can create instance profiles in the AWS console.

Databricks administrators can then add those instance profiles to the Databricks workspace.

Databricks recommends securing access to S3 buckets by attaching instance profiles to clusters.

* [Databricks docs: Secure access to S3 buckets using instance profiles](https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html)

In [0]:
%sql
COPY INTO <database-name>.<table-name>
FROM 's3://bucket-name/path/to/folder'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true')

## COPY INTO with temporary credentials

`COPY INTO` also supports using temporary credentials to access data from S3 buckets.

* [Databricks docs: Use temporary credentials to load data with COPY INTO](https://docs.databricks.com/ingestion/copy-into/temporary-credentials.html)

You can use the AWS CLI to generate the access key, secret key, and session token that you need to access the S3 bucket. This process only authenticates you to AWS: it lets Databricks access the S3 bucket with your user credentials and does not grant any new permissions. If you do not have permission to access the S3 bucket in AWS, talk to your cloud administrator.

* [AWS docs: Developing with Amazon S3 using the AWS CLI](https://docs.aws.amazon.com/AmazonS3/latest/userguide/setup-aws-cli.html)
* [AWS docs: AWS CLI sts get-session-token](https://docs.aws.amazon.com/cli/latest/reference/sts/get-session-token.html)

For more details on IAM user temporary credentials and a complete set of programmatic options, see:

* [AWS docs: Making requests using IAM user temporary credentials](https://docs.aws.amazon.com/AmazonS3/latest/userguide/AuthUsingTempSessionToken.html)

In [0]:
%sql
COPY INTO <database-name>.<table-name>
FROM 's3://bucket-name/path/to/folder' WITH (
  CREDENTIAL (AWS_ACCESS_KEY = '<access-key>', AWS_SECRET_KEY = '<secret-key>', AWS_SESSION_TOKEN = '<session-token>')
)
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true')
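
To show how the values from the AWS CLI fit into the statement above, here is a minimal sketch in Python. It assumes you have saved the JSON printed by `aws sts get-session-token`, which contains a `Credentials` object with `AccessKeyId`, `SecretAccessKey`, and `SessionToken` fields. The function names `credentials_from_sts` and `copy_into_sql` are hypothetical, and the sample JSON holds placeholders rather than real credentials.

```python
import json

# Placeholder JSON in the shape printed by `aws sts get-session-token`.
SAMPLE_STS_OUTPUT = """
{
  "Credentials": {
    "AccessKeyId": "<access-key>",
    "SecretAccessKey": "<secret-key>",
    "SessionToken": "<session-token>",
    "Expiration": "<expiration-timestamp>"
  }
}
"""


def credentials_from_sts(sts_json: str) -> dict:
    """Map the STS JSON fields to the names used in the CREDENTIAL clause."""
    creds = json.loads(sts_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY": creds["AccessKeyId"],
        "AWS_SECRET_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


def copy_into_sql(table: str, source: str, creds: dict) -> str:
    """Build the COPY INTO statement text; this does not execute it."""
    pairs = ", ".join(f"{key} = '{value}'" for key, value in creds.items())
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source}' WITH (\n"
        f"  CREDENTIAL ({pairs})\n"
        ")\n"
        "FILEFORMAT = CSV\n"
        "COPY_OPTIONS ('mergeSchema' = 'true')"
    )


statement = copy_into_sql(
    "<database-name>.<table-name>",
    "s3://bucket-name/path/to/folder",
    credentials_from_sts(SAMPLE_STS_OUTPUT),
)
print(statement)
```

Temporary credentials expire, so generate them shortly before the load and avoid saving real values in the notebook.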