## SQL Inserts from the Python Connector

Snowflake has a [Python connector](https://docs.snowflake.com/en/developer-guide/python-connector/python-connector?_fsi=kVaBWUHz&_fsi=kVaBWUHz), which is an easy way to run SQL and upload files. One way to get data in is to run a SQL INSERT statement for each record. While this is convenient, it is not efficient: Snowflake is an OLAP engine and is optimized for writing large batches of data.

Create a table in Snowflake called LIFT_TICKETS_PY_INSERT to receive this data from the INGEST user.

```
USE ROLE INGEST;
USE DATABASE INGEST;
USE SCHEMA INGEST;

CREATE OR REPLACE TABLE LIFT_TICKETS_PY_INSERT (
    TXID varchar(255),
    RFID varchar(255),
    RESORT varchar(255),
    PURCHASE_TIME datetime,
    EXPIRATION_TIME date,
    DAYS number,
    NAME varchar(255),
    ADDRESS variant,
    PHONE varchar(255),
    EMAIL varchar(255),
    EMERGENCY_CONTACT variant
);
```

Now, we're going to go back to VS Code and take a look at py_insert.py.
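The script itself is not reproduced here, so the following is only a minimal sketch of the one-INSERT-per-record pattern described above, not the actual contents of py_insert.py. It assumes each input line is a JSON object whose keys are the lowercase column names (an assumption about the generator's output), and that `conn` is a connection opened with `snowflake.connector.connect(...)` after setting `snowflake.connector.paramstyle = "qmark"` so that `?` placeholders are accepted. The function names are hypothetical.

```python
import json

# Column order matches the CREATE TABLE statement for LIFT_TICKETS_PY_INSERT.
COLUMNS = ["TXID", "RFID", "RESORT", "PURCHASE_TIME", "EXPIRATION_TIME", "DAYS",
           "NAME", "ADDRESS", "PHONE", "EMAIL", "EMERGENCY_CONTACT"]
VARIANT_COLUMNS = {"ADDRESS", "EMERGENCY_CONTACT"}


def build_insert_sql(table="LIFT_TICKETS_PY_INSERT"):
    """Build a single-row INSERT with one placeholder per column."""
    # VARIANT values are sent as JSON text and parsed server-side.
    # INSERT ... SELECT is used here because PARSE_JSON is generally not
    # accepted inside a VALUES clause.
    placeholders = ", ".join(
        "PARSE_JSON(?)" if col in VARIANT_COLUMNS else "?" for col in COLUMNS
    )
    return f"INSERT INTO {table} ({', '.join(COLUMNS)}) SELECT {placeholders}"


def record_to_row(line):
    """Turn one JSON line into a tuple of bind values in column order."""
    record = json.loads(line)
    row = []
    for col in COLUMNS:
        value = record.get(col.lower())  # assumed key naming: lowercase column name
        if col in VARIANT_COLUMNS:
            value = json.dumps(value)  # serialize nested data for PARSE_JSON
        row.append(value)
    return tuple(row)


def insert_records(conn, lines):
    """Run one INSERT per non-empty input line; return the number of rows sent."""
    sql = build_insert_sql()
    cur = conn.cursor()
    count = 0
    for line in lines:
        if line.strip():
            cur.execute(sql, record_to_row(line))
            count += 1
    return count
```

With a real connection, something like `insert_records(conn, sys.stdin)` would consume the records piped in from the shell. Every record costs a full round trip and a separate statement, which is exactly why this approach is slow.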

We will then test a single insert from the shell:

```
python ./data_generator.py 1 | python py_insert.py
```

```
-- When the above statement is done running, we can check our table.
SELECT count(*) FROM LIFT_TICKETS_PY_INSERT;
```

Fingers crossed, but you should have one row in that table.

### HOWEVER

This is not a good way to load data and will take a long time. I don't really want you to have to wait for hours to load your example dataset, so let's just load 1000 records. Even that is going to take a very long time.

To send the first 1000 records of your test data, run the following in your shell:

```
cat data.json.gz | zcat | head -n 1000 | python py_insert.py
```


We could parallelize the work if we wanted, so feel free to run this in 10 separate terminals:

```
cat data.json.gz | zcat | head -n 100 | python py_insert.py
```

Spoiler alert: this will not help. This is not a good pattern for getting a high throughput of records.


### Tips
* Ingest is billed based on the warehouse credits consumed while the warehouse is online.
* The connectors support multi-inserts, but data containing a VARIANT field cannot be formatted into a multi-insert.
* Inserts and multi-inserts do not use warehouse resources efficiently (loads are optimal at 100MB or more with some concurrency). It is better to upload data and COPY it into the table.
* Connectors will switch to creating and uploading a file and doing a COPY INTO when large batches are sent. This is not configurable.
* Many assume that adding significant concurrency will support higher data throughput. In practice, the additional concurrent INSERTs will be blocked by other INSERTs, and more frequently so when small payloads are inserted. You need to move to bigger batches to get more throughput.
* Review query history to see what the connector is doing.
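For the last tip, query history can also be inspected from the connector itself. The sketch below is one possible way to do that, assuming `conn` is an open Snowflake connection and that the current role is allowed to read its own history through the `INFORMATION_SCHEMA.QUERY_HISTORY` table function. The helper name `recent_queries` is hypothetical.

```python
def recent_queries(conn, limit=20):
    """Return the most recent statements visible to the current session's role."""
    # RESULT_LIMIT caps how many history rows the table function returns.
    sql = (
        "SELECT query_text, execution_status, start_time, total_elapsed_time "
        "FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => %d)) "
        "ORDER BY start_time DESC" % int(limit)
    )
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()
```

Looking at the `query_text` column after a load shows whether the connector issued individual INSERT statements or staged a file and ran a COPY.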

In cases where the connector has enough data in the executemany call to create a well-sized file for COPY, and does so, this becomes as efficient as the methods that follow.

The example above could not use executemany because it had VARIANT data.
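For data without VARIANT columns, the batching idea can be sketched as follows. This is an illustration only: the table `LIFT_TICKETS_SCALAR` and its three columns are hypothetical, the helper names are made up for this example, and the `?` placeholders assume the connection was created with `snowflake.connector.paramstyle = "qmark"`.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of up to `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_in_batches(conn, rows, batch_size=10000):
    """Send rows with executemany, one call per batch; return rows sent."""
    # Hypothetical table with scalar columns only (no VARIANT).
    sql = "INSERT INTO LIFT_TICKETS_SCALAR (TXID, RESORT, DAYS) VALUES (?, ?, ?)"
    cur = conn.cursor()
    total = 0
    for batch in batched(rows, batch_size):
        cur.executemany(sql, batch)  # one statement covers the whole batch
        total += len(batch)
    return total
```

Larger values of `batch_size` give the connector a better chance of producing a well-sized load, at the cost of holding more rows in memory before each call.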

### The next methods will show how to batch into better sized blocks of work which will drive higher throughputs and higher efficiency on Snowflake.
