<div id="singlestore-header" style="display: flex; background-color: rgba(210, 255, 153, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/chart-network.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Load files from S3 into Shared Tier</h1>
    </div>
</div>

This notebook guides you through ingesting CSV files from an AWS S3 bucket into your shared tier workspace.

## Create a Pipeline from CSV files in AWS S3

In this example, we create a pipeline that ingests a CSV file stored in an AWS S3 bucket.

### Create a table

We first create a table **{{S2_TABLE_NAME}}** in your shared tier database **{{S2_DATABASE_NAME}}**.

<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>Make sure to select the <tt>{{S2_DATABASE_NAME}}</tt> database from the drop-down menu at the top of this notebook. It updates the <tt>connection_url</tt> to connect to that database.</p>
    </div>
</div>

In [1]:
%%sql
CREATE TABLE IF NOT EXISTS {{S2_DATABASE_NAME}}.{{S2_TABLE_NAME}} (
`field1` int,
`field2` double,
`field3` text
);

Replace `field1`, `field2`, and `field3` with the columns in your CSV file, adding more fields as needed, and make sure each column has the correct data type.
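
As a rough aid for choosing column types, the sketch below drafts a `CREATE TABLE` statement from a small CSV sample. It is an illustration, not part of the SingleStore tooling: the helper names `infer_column_type` and `draft_create_table` are hypothetical, the sample is assumed to have a header row and rows of equal length, and only the three types used above (`int`, `double`, `text`) are considered. Review the result before running it, since values that parse as integers in Python may still exceed the range of a SQL `int`.

```python
import csv
import io


def infer_column_type(values):
    """Return the narrowest of int, double, or text that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the stricter type first; fall through to the next on any parse failure.
    for sql_type, cast in (("int", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return sql_type
        except ValueError:
            continue
    return "text"


def draft_create_table(csv_text, table_name):
    """Build a CREATE TABLE statement from CSV text whose first row is a header (assumed)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        sql_type = infer_column_type([row[i] for row in data])
        columns.append(f"`{name}` {sql_type}")
    body = ",\n".join(columns)
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n{body}\n);"


# Hypothetical sample data, for illustration only.
sample = "id,price,label\n1,9.5,apple\n2,3,pear\n"
print(draft_create_table(sample, "my_table"))
```

For the sample above, `id` is drafted as `int`, `price` as `double`, and `label` as `text`.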

### Create a pipeline

We then create a pipeline in your database by pointing the data source to your S3 bucket, and the destination to your table.

To create the pipeline, you will need the following information:

- The name of the bucket, such as: `<bucket_name>`
- The name of the bucket’s region, such as: `<us-west-1>`
- Your AWS account’s access keys, such as: `<aws_access_key_id>` and `<aws_secret_access_key>`

For details on how to retrieve this information, read [this doc](https://docs.singlestore.com/cloud/load-data/load-data-with-pipelines/how-to-load-data-using-pipelines/load-data-from-amazon-web-services-aws-s-3/).

In [2]:
%%sql
CREATE PIPELINE IF NOT EXISTS {{S2_DATABASE_NAME}}.{{S2_PIPELINE_NAME}}
    AS LOAD DATA S3 's3://<bucket_name>/<file_name>'
    CONFIG '{"region": "<us-west-1>"}'
    CREDENTIALS '{"aws_access_key_id": "<aws_access_key_id>",
                  "aws_secret_access_key": "<aws_secret_access_key>"}'
    BATCH_INTERVAL 45000
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE {{S2_TABLE_NAME}}
    FORMAT CSV
    FIELDS TERMINATED BY ',';
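
The `CONFIG` and `CREDENTIALS` clauses take JSON strings inside SQL string literals, which makes quoting mistakes easy. One way to avoid them is to generate both clauses with Python's `json.dumps`, as in the minimal sketch below. The helpers `sql_quote` and `pipeline_clauses` are hypothetical names introduced here, and the escaping assumes the usual MySQL-style rules for single-quoted strings (backslash is an escape character, and a single quote is written as two single quotes).

```python
import json


def sql_quote(text):
    """Wrap text in single quotes for SQL, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def pipeline_clauses(region, access_key_id, secret_access_key):
    """Return the CONFIG and CREDENTIALS clauses with JSON produced by json.dumps."""
    config = json.dumps({"region": region})
    credentials = json.dumps(
        {
            "aws_access_key_id": access_key_id,
            "aws_secret_access_key": secret_access_key,
        }
    )
    return f"CONFIG {sql_quote(config)}\nCREDENTIALS {sql_quote(credentials)}"


# Placeholder values; substitute your own region and keys.
print(pipeline_clauses("us-west-1", "<aws_access_key_id>", "<aws_secret_access_key>"))
```

The two printed lines can replace the handwritten `CONFIG` and `CREDENTIALS` clauses in the statement above. Avoid saving real keys in a shared notebook.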

### Start and monitor the pipeline

The `CREATE PIPELINE` statement creates a new pipeline but does not start it, so no data has been loaded yet. To start the pipeline in the background, run:

In [3]:
%%sql
START PIPELINE {{S2_DATABASE_NAME}}.{{S2_PIPELINE_NAME}};

To check for problems, query the `information_schema.pipelines_errors` table. If the pipeline is running without errors or warnings, the query returns no rows.

In [4]:
%%sql
SELECT * FROM information_schema.pipelines_errors
    WHERE pipeline_name = '{{S2_PIPELINE_NAME}}';

### Query the table

In [5]:
%%sql
SELECT * FROM {{S2_DATABASE_NAME}}.{{S2_TABLE_NAME}};

### Clean up resources

In [6]:
%%sql
DROP PIPELINE IF EXISTS {{S2_DATABASE_NAME}}.{{S2_PIPELINE_NAME}};
DROP TABLE IF EXISTS {{S2_DATABASE_NAME}}.{{S2_TABLE_NAME}};

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>