## File Upload & Copy (Warehouse) from the Python Connector

To get better-sized batches, the client can upload a file and have a warehouse copy the data into the destination table. The Python connector can execute the COPY after uploading the file.

Create the table that will be used for landing the data, adjusting it as needed for your use case.

USE ROLE INGEST;

USE DATABASE INGEST;
USE SCHEMA INGEST;

CREATE OR REPLACE TABLE LIFT_TICKETS_PY_COPY_INTO (TXID varchar(255), RFID varchar(255), RESORT varchar(255), PURCHASE_TIME datetime, EXPIRATION_TIME date, DAYS number, NAME varchar(255), ADDRESS variant, PHONE varchar(255), EMAIL varchar(255), EMERGENCY_CONTACT variant);

Take a look at py_copy_into.py.

This pattern closely resembles the previous one: the connection is the same, but instead of inserting records one at a time, the script batches a set of records together. Each batch is written to a Parquet file, which is PUT to the table stage and then loaded with COPY. The data is available as soon as the COPY call completes.
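The upload-then-copy step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the contents of py_copy_into.py: the function name `put_and_copy` is hypothetical, `cursor` is assumed to be a cursor from an open Snowflake Python connector connection, and the Parquet file at `local_path` is assumed to have been written already (for example with pyarrow). The COPY options shown are one plausible configuration, not necessarily the one the script uses.

```python
def put_and_copy(cursor, local_path, table="LIFT_TICKETS_PY_COPY_INTO"):
    """Upload one local Parquet file to the table stage, then COPY it in.

    `cursor` is assumed to be a Snowflake connector cursor; anything with an
    execute(sql) method works. Returns the statements that were executed.
    """
    statements = [
        # @%<table> addresses the table stage that belongs to <table>.
        f"PUT 'file://{local_path}' @%{table}",
        # Assumed options: match Parquet columns to table columns by name
        # (names must match the table's uppercase column names) and remove
        # staged files once they have loaded successfully.
        f"COPY INTO {table} FILE_FORMAT=(TYPE='PARQUET') "
        "MATCH_BY_COLUMN_NAME=CASE_SENSITIVE PURGE=TRUE",
    ]
    for sql in statements:
        cursor.execute(sql)
    return statements
```

Keeping PUT and COPY as two separate statements mirrors the description above: the file lands in the table stage first, and only the COPY consumes warehouse compute to load it.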

To test this insert, run the following in your shell:

```
python ./data_generator.py 1 | python py_copy_into.py 1
```


-- Query the table to verify the data was inserted.
SELECT count(*) FROM LIFT_TICKETS_PY_COPY_INTO;

To send in all your test data, you can run the following in your shell:
```
cat data.json.gz | zcat | python py_copy_into.py 10000
```

This last call batches 10,000 records into each file for processing. As the file size grows, up to about 100 MB, the load becomes more efficient in seconds of warehouse compute used and achieves higher throughput. Feel free to generate more test data and increase the batch size to better understand this relationship. Review the query performance in Query History in Snowflake.
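The batching controlled by the numeric argument can be sketched as below. This is an illustrative sketch, not the script's actual code: the helper name `batches` is hypothetical, and it assumes the input is one JSON record per line, as produced by piping the generator output into the script.

```python
import json
from itertools import islice


def batches(lines, batch_size):
    """Yield lists of up to `batch_size` parsed JSON records from `lines`.

    Blank lines are skipped. The final batch may be smaller than
    `batch_size`, so trailing records are not dropped.
    """
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch
```

A script could then loop with `for batch in batches(sys.stdin, int(sys.argv[1])):` and write one Parquet file per batch, so a larger argument yields fewer, larger files.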

### Tips
* Ingest is billed based on the warehouse credits consumed while the warehouse is online.
* It is very hard to fully utilize a warehouse with this pattern. Adding some concurrency will help only if the files are already well sized. Even with the best code, very few workloads have fixed data flow volumes that match a warehouse well. This is mostly wasted effort, as serverless ingestion and Snowpipe cover these use cases without such constraints.
* Try to get to 100 MB files for the best efficiency.
* The best warehouse sizes are almost always much smaller than expected, commonly XS.
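One way to pick a batch size that approaches the 100 MB target is to measure a sample file and scale from it. The helper below is a rough sketch: the name `records_per_file` is hypothetical, and it assumes file size grows roughly linearly with record count, which is only an approximation for compressed Parquet.

```python
def records_per_file(sample_file_bytes, sample_records,
                     target_bytes=100 * 1000 * 1000):
    """Estimate how many records per batch would give a ~target_bytes file.

    Assumes size scales linearly with record count (an approximation).
    """
    if sample_file_bytes <= 0 or sample_records <= 0:
        raise ValueError("sample sizes must be positive")
    bytes_per_record = sample_file_bytes / sample_records
    return max(1, int(target_bytes / bytes_per_record))
```

For example, if a hypothetical 10,000-record batch produced a 2,000,000-byte file, `records_per_file(2_000_000, 10_000)` returns 500000. Treat the result as a starting point and confirm it against actual staged file sizes.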