## File Upload & Copy (Snowpipe) using Python

Another way to get data into Snowflake is to use a service built specifically for this task: Snowpipe. Snowpipe uses serverless compute to ingest data from files that a client uploads. In this use case I will upload a file to an internal stage and call the Snowpipe service to ingest it.

This is not the only way to use Snowpipe. You can also load from external stages and use event notifications from those cloud storage services so that Snowflake automatically ingests files as they land. The Snowflake Connector for Kafka also uses Snowpipe internally, as you will see in later examples.

Create the table and the pipe that handles the ingest. If you changed the data generator for your use case, you will need to change this table to match your data.


```sql
USE ROLE INGEST;

USE DATABASE INGEST;
USE SCHEMA INGEST;

CREATE OR REPLACE TABLE LIFT_TICKETS_PY_SNOWPIPE (
    TXID varchar(255), RFID varchar(255), RESORT varchar(255),
    PURCHASE_TIME datetime, EXPIRATION_TIME date, DAYS number,
    NAME varchar(255), ADDRESS variant, PHONE varchar(255),
    EMAIL varchar(255), EMERGENCY_CONTACT variant
);

-- No FROM clause: the pipe loads files from the table stage @%LIFT_TICKETS_PY_SNOWPIPE.
-- OR REPLACE lets this cell be re-run without a "pipe already exists" error.
CREATE OR REPLACE PIPE LIFT_TICKETS_PIPE AS COPY INTO LIFT_TICKETS_PY_SNOWPIPE
FILE_FORMAT=(TYPE='PARQUET')
MATCH_BY_COLUMN_NAME=CASE_SENSITIVE;
```
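
Because the pipe uses `MATCH_BY_COLUMN_NAME=CASE_SENSITIVE`, each field name in the Parquet file has to match a table column exactly; a field named `txid`, for example, does not match the `TXID` column, so its value would not be loaded. If you change the data generator, a quick check like the one below can catch mismatches before you upload anything. `column_mismatches` is a hypothetical helper for illustration, not part of py_snowpipe.py, and it assumes each generated record is a Python dict keyed by field name.

```python
from typing import Dict, Iterable, List, Tuple

# Column names of LIFT_TICKETS_PY_SNOWPIPE, copied from the CREATE TABLE above.
TABLE_COLUMNS: List[str] = [
    "TXID", "RFID", "RESORT", "PURCHASE_TIME", "EXPIRATION_TIME", "DAYS",
    "NAME", "ADDRESS", "PHONE", "EMAIL", "EMERGENCY_CONTACT",
]


def column_mismatches(
    record: Dict[str, object], table_columns: Iterable[str] = TABLE_COLUMNS
) -> Tuple[List[str], List[str]]:
    """Compare a record's keys with the table's columns, case-sensitively.

    Returns (missing, unexpected):
      missing    - table columns with no exactly matching key (would load as NULL)
      unexpected - record keys that match no column exactly (would be ignored)
    """
    keys = set(record)
    columns = set(table_columns)
    return sorted(columns - keys), sorted(keys - columns)
```

For example, `column_mismatches({"TXID": "1", "txid": "x"})` reports `txid` as unexpected and every column other than `TXID` as missing.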

Take a look at py_snowpipe.py.

This script reads a batch of lines from standard input, writes them to a file in temporary storage, uploads (PUTs) that file to the table stage of LIFT_TICKETS_PY_SNOWPIPE, and calls the Snowpipe REST API so that LIFT_TICKETS_PIPE ingests the uploaded file. Snowpipe then runs the pipe's COPY INTO to load the file into the table LIFT_TICKETS_PY_SNOWPIPE.
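
The batching step can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the contents of py_snowpipe.py: the names `batch_lines` and `staged_file_name` are hypothetical, blank lines are skipped, and writing Parquet, the PUT, and the REST call are left out. Each file gets a unique name because Snowpipe keeps load metadata per pipe and will not load a file whose name it has already loaded, even if the file's contents changed.

```python
import itertools
import uuid
from typing import Iterable, Iterator, List


def batch_lines(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to batch_size non-empty lines; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Strip trailing newlines and drop blank lines lazily, so stdin is streamed.
    stripped = (line.rstrip("\n") for line in lines)
    non_empty = (line for line in stripped if line)
    while True:
        batch = list(itertools.islice(non_empty, batch_size))
        if not batch:
            return
        yield batch


def staged_file_name(prefix: str = "lift_tickets", ext: str = "parquet") -> str:
    """Return a unique file name so Snowpipe never skips a batch as already loaded."""
    return f"{prefix}_{uuid.uuid4().hex}.{ext}"
```

In a script you would iterate `batch_lines(sys.stdin, int(sys.argv[1]))`, write each batch to a temporary file named with `staged_file_name()`, PUT it to `@%LIFT_TICKETS_PY_SNOWPIPE`, and then request ingestion of that file name.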

Because this pattern batches many records into one file, uploads that file, and copies it in a single operation, it can load large numbers of records very efficiently. It also runs on serverless compute, so you pay for the compute Snowpipe actually uses rather than for a running warehouse.

To test this ingest path, run the following in your shell:

```
python ./data_generator.py 1 | python py_snowpipe.py 1
```

```sql
-- Query the table to verify the data was inserted. You will probably see 0 records for up to a minute while Snowpipe ingests the file.

SELECT count(*) FROM LIFT_TICKETS_PY_SNOWPIPE;
```

To send all of your test data, run the following in your shell:

```
cat data.json.gz | zcat | python py_snowpipe.py 10000
```

This command batches 10,000 records into each file. As the files grow larger, up to about 100 MB, Snowpipe uses fewer seconds of compute per record and you will see higher throughput.
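
The arithmetic behind this is simple: the batch size determines how many files, and therefore how many ingest requests and per-file charges, a load produces. The sketch below is illustrative; `avg_record_bytes` is an assumption you would measure yourself, for example by dividing the size of one generated Parquet file by its record count, and MB here means 10^6 bytes.

```python
import math


def files_needed(total_records: int, batch_size: int) -> int:
    """Number of files (and Snowpipe ingest requests) a load produces."""
    return math.ceil(total_records / batch_size)


def batch_size_for_target(target_mb: float, avg_record_bytes: float) -> int:
    """Records per file that would yield roughly target_mb of file data."""
    return max(1, int(target_mb * 1_000_000 // avg_record_bytes))
```

For 1,000,000 records, `files_needed(1_000_000, 1)` is 1,000,000 files while `files_needed(1_000_000, 10_000)` is 100. If a record averages 500 bytes in the output file, `batch_size_for_target(100, 500)` suggests about 200,000 records per file to reach 100 MB.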

Test this approach with more test data and larger batch sizes, then review the INFORMATION_SCHEMA.PIPE_USAGE_HISTORY table function to compare how efficient large batches are versus small ones.
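
One way to make that comparison is to pull the usage rows and aggregate them per test run. The query uses the table function's `DATE_RANGE_START` and `PIPE_NAME` arguments and its `CREDITS_USED`, `BYTES_INSERTED`, and `FILES_INSERTED` columns; the 12-hour window and the fully qualified pipe name are assumptions to adjust. `summarize_usage` is a hypothetical helper, not part of this project, that expects rows as dicts keyed by uppercase column name (the shape returned by snowflake-connector-python's `DictCursor`).

```python
from typing import Dict, Iterable, Mapping

# Run in Snowflake (for example in a notebook cell or through a DictCursor).
PIPE_USAGE_SQL = """
SELECT START_TIME, END_TIME, CREDITS_USED, BYTES_INSERTED, FILES_INSERTED
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
    DATE_RANGE_START => DATEADD('hour', -12, CURRENT_TIMESTAMP()),
    PIPE_NAME => 'INGEST.INGEST.LIFT_TICKETS_PIPE'))
ORDER BY START_TIME
"""


def summarize_usage(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Aggregate PIPE_USAGE_HISTORY rows into totals and per-file/per-GB figures."""
    rows = list(rows)  # allow any iterable, including a cursor
    credits = sum(float(r["CREDITS_USED"]) for r in rows)
    bytes_in = sum(int(r["BYTES_INSERTED"]) for r in rows)
    files = sum(int(r["FILES_INSERTED"]) for r in rows)
    return {
        "credits": credits,
        "files": files,
        # Average file size in MB (10^6 bytes); 0 if nothing was loaded.
        "avg_file_mb": bytes_in / files / 1e6 if files else 0.0,
        # Credits per GB (10^9 bytes) loaded; 0 if nothing was loaded.
        "credits_per_gb": credits / (bytes_in / 1e9) if bytes_in else 0.0,
    }
```

Running the summary once after a small-batch load and once after a large-batch load gives two numbers you can compare directly.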

```sql
-- Query the table to verify the data was inserted. You will probably see 0 records for up to a minute while Snowpipe ingests the file.

SELECT count(*) FROM LIFT_TICKETS_PY_SNOWPIPE;
```

### Tips
* Ingest is billed based on seconds of compute used by Snowpipe and number of files ingested.
* This is one of the most efficient and highest throughput ways to ingest data when batches are well sized.
* File size is a major factor in cost efficiency and throughput. If your files or batches are much smaller than 100 MB and you cannot change that, avoid this pattern.
* Expect a delay after Snowpipe has queued the ingest request, because the load runs asynchronously. In most cases this pattern delivers data in about a minute, including the time to batch, upload, and copy, but this varies with your use case.
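
Because the load is asynchronous, a script that verifies results (instead of a person re-running the `SELECT count(*)` cell) needs to poll with a timeout. The helper below is a hypothetical sketch: `fetch_count` stands in for whatever runs `SELECT count(*) FROM LIFT_TICKETS_PY_SNOWPIPE`, for example through snowflake-connector-python, and the sleep and clock functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_count(
    fetch_count: Callable[[], int],
    expected: int,
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll fetch_count() until it reaches expected or timeout_s elapses.

    Returns the last count observed; the caller decides what a shortfall means.
    """
    deadline = clock() + timeout_s
    count = fetch_count()
    while count < expected and clock() < deadline:
        sleep(interval_s)
        count = fetch_count()
    return count
```

A fixed polling interval keeps the sketch simple; for long loads, a growing interval would reduce the number of count queries sent to Snowflake.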