## File Upload & Copy (Serverless) from the Python Connector

A Serverless Task scheduled to run every minute can ingest the files that clients uploaded during the previous minute.

This has several advantages over using Snowpipe for the copy:

* Eliminates the per-file costs incurred by Snowpipe.
* Merges small files together more efficiently.

A Serverless Task is also billed per second of compute, so no warehouse planning or optimization is required.


In [None]:
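-- Create the table and task needed for this ingest pattern.
-- The task's COPY INTO has no FROM clause, so it loads from the table's own stage (@%LIFT_TICKETS_PY_SERVERLESS).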

USE ROLE ACCOUNTADMIN;
GRANT EXECUTE TASK ON ACCOUNT TO ROLE INGEST;
GRANT EXECUTE MANAGED TASK ON ACCOUNT TO ROLE INGEST;

USE ROLE INGEST;

USE DATABASE INGEST;
USE SCHEMA INGEST;

CREATE OR REPLACE TABLE LIFT_TICKETS_PY_SERVERLESS (
    TXID varchar(255),
    RFID varchar(255),
    RESORT varchar(255),
    PURCHASE_TIME datetime,
    EXPIRATION_TIME date,
    DAYS number,
    NAME varchar(255),
    ADDRESS variant,
    PHONE varchar(255),
    EMAIL varchar(255),
    EMERGENCY_CONTACT variant
);

CREATE OR REPLACE TASK LIFT_TICKETS_PY_SERVERLESS 
USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE='XSMALL' 
AS
COPY INTO LIFT_TICKETS_PY_SERVERLESS
FILE_FORMAT=(TYPE='PARQUET') 
MATCH_BY_COLUMN_NAME=CASE_SENSITIVE 
PURGE=TRUE;

We'll use py_serverless.py for this.

To test this insert, run the following:

```
python ./data_generator.py 1 | python py_serverless.py 1
```

In [None]:
-- Query the table to verify the data was inserted.

SELECT count(*) FROM LIFT_TICKETS_PY_SERVERLESS;

To send in all your test data, you can run the following in your shell:

```
cat data.json.gz | zcat | python py_serverless.py 10000
```

If you run multiple tests with different batch sizes (especially smaller sizes), you will see that this approach can consume fewer credits than the previous Snowpipe solution because it combines multiple files into each load.

The code calls EXECUTE TASK after each file is uploaded. While this may not seem optimal, the task does not actually run after every upload. Tasks do not allow an additional run to be enqueued while one is already enqueued, so the redundant calls are skipped.
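The upload-then-trigger step described above might look roughly like the following. This is a minimal sketch, not the contents of py_serverless.py: the function name `upload_and_trigger` is hypothetical, `cursor` is assumed to be any object with an `execute(sql)` method (such as a Snowflake Python connector cursor), and the file is assumed to go to the table stage of LIFT_TICKETS_PY_SERVERLESS.

```python
from pathlib import Path

# The table and the task share this name in the SQL cell earlier in this section.
NAME = "LIFT_TICKETS_PY_SERVERLESS"


def upload_and_trigger(cursor, parquet_path):
    """Stage one Parquet file, then ask Snowflake to run the load task.

    Hypothetical helper: `cursor` only needs an execute(sql) method.
    """
    local_path = Path(parquet_path).resolve()
    # Assumption: files are uploaded to the table stage (@%<table>),
    # which is where a COPY INTO without a FROM clause looks for files.
    cursor.execute(f"PUT 'file://{local_path}' @%{NAME}")
    # Request a task run after every upload; per the paragraph above,
    # the request is skipped when a run is already enqueued.
    cursor.execute(f"EXECUTE TASK {NAME}")
```

Because the second statement is a request rather than a guaranteed run, the client can call this helper once per batch file without tracking whether a load is already pending.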

It is also common to schedule the task to run every n minutes instead of calling from the clients.
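As a sketch of the scheduled variant, the helper below issues the two statements that give the task a schedule and then start it. The function name `schedule_task` is hypothetical, `cursor` is assumed to be any object with an `execute(sql)` method, and the task is assumed to still be in the suspended state it has right after CREATE TASK.

```python
TASK = "LIFT_TICKETS_PY_SERVERLESS"


def schedule_task(cursor, minutes):
    """Give the task an every-n-minutes schedule and resume it.

    Hypothetical helper; assumes the task is currently suspended.
    """
    if not isinstance(minutes, int) or minutes < 1:
        raise ValueError("minutes must be a positive integer")
    # Attach the schedule to the existing task definition.
    cursor.execute(f"ALTER TASK {TASK} SET SCHEDULE = '{minutes} MINUTE'")
    # A scheduled task does not run until it has been resumed.
    cursor.execute(f"ALTER TASK {TASK} RESUME")
```

With a schedule in place, clients only need to upload files; they no longer have to call the task themselves.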

In [None]:
-- Query the table to verify the data was inserted.

SELECT count(*) FROM LIFT_TICKETS_PY_SERVERLESS;

### Tips
* For the best efficiency, run the Task only when enough data (> 100 MB) has been loaded into the stage.
* Use Serverless Tasks to avoid per-file charges and resolve small-file inefficiencies.
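
One way a client could apply the first tip is to count the bytes it has staged and request a task run only once that total passes a threshold. The sketch below is illustrative only: the `LoadTrigger` class and its methods are hypothetical, and `run_task` is assumed to be a callable that issues EXECUTE TASK. It only sees this client's uploads, so with several clients the stage will hold more data than the counter suggests.

```python
# Roughly the 100 MB from the tip above (an illustrative default).
THRESHOLD_BYTES = 100 * 1024 * 1024


class LoadTrigger:
    """Hypothetical client-side counter that batches task requests."""

    def __init__(self, run_task, threshold=THRESHOLD_BYTES):
        self.run_task = run_task    # callable that requests a task run
        self.threshold = threshold  # bytes to accumulate before triggering
        self.pending = 0            # bytes staged since the last trigger

    def file_staged(self, size_bytes):
        """Record one uploaded file; trigger the task once past the threshold."""
        self.pending += size_bytes
        if self.pending > self.threshold:
            self.run_task()
            self.pending = 0
            return True
        return False

    def flush(self):
        """Trigger a final run for any leftover data, e.g. at shutdown."""
        if self.pending > 0:
            self.run_task()
            self.pending = 0
            return True
        return False
```

Calling `flush()` when the client exits keeps a final partial batch from waiting in the stage indefinitely.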