## From Kafka - in Snowpipe (Batch) mode

The connector automatically creates the table that the data is written to.

Configure and install the connector to load data. Run the following in your shell:

```
export KAFKA_TOPIC=LIFT_TICKETS_KAFKA_BATCH
eval $(cat .env)

URL="https://$SNOWFLAKE_ACCOUNT.snowflakecomputing.com"
NAME="LIFT_TICKETS_KAFKA_BATCH"

curl -i -X PUT -H "Content-Type:application/json" \
    "http://localhost:8083/connectors/$NAME/config" \
    -d '{
        "connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "errors.log.enable":"true",
        "snowflake.database.name":"INGEST",
        "snowflake.private.key":"'$PRIVATE_KEY'",
        "snowflake.schema.name":"INGEST",
        "snowflake.role.name":"INGEST",
        "snowflake.url.name":"'$URL'",
        "snowflake.user.name":"'$SNOWFLAKE_USER'",
        "topics":"'$KAFKA_TOPIC'",
        "name":"'$NAME'",
        "buffer.size.bytes":"250000000",
        "buffer.flush.time":"60",
        "buffer.count.records":"1000000",
        "snowflake.topic2table.map":"'$KAFKA_TOPIC:$NAME'"
    }'
```
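The `curl` command above splices `$SNOWFLAKE_ACCOUNT`, `$SNOWFLAKE_USER`, and `$PRIVATE_KEY` directly into the JSON body, so an unset variable would quietly produce an incomplete configuration. Below is a minimal pre-flight sketch, assuming those variables are supplied by `.env` as in the step above. `require_vars` is a hypothetical helper written for this example, not part of the connector tooling.

```shell
# Hypothetical helper: succeed only if every named variable is set and non-empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing or empty: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_vars SNOWFLAKE_ACCOUNT SNOWFLAKE_USER PRIVATE_KEY` returns a non-zero status and names the offending variable if anything is missing, which is a cue to fix `.env` and rerun the command.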

Verify in the [Redpanda console](http://localhost:8080/topics) that the connector was created and is running.
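The same check can also be made from the shell, since Kafka Connect exposes a `GET /connectors/<name>/status` endpoint on the REST port used above. This is a rough sketch: the helper names are made up for this example, and the JSON is matched with `grep` rather than parsed properly, which assumes the usual `"state":"RUNNING"` layout of the status response.

```shell
# Print every "state" value in a Kafka Connect status document read from
# stdin: one for the connector itself and one per task.
connector_states() {
    grep -oE '"state"[[:space:]]*:[[:space:]]*"[A-Z_]+"' | cut -d'"' -f4
}

# Succeed only if at least one state was reported and all of them are RUNNING.
all_running() {
    states=$(connector_states | sort -u)
    [ "$states" = "RUNNING" ]
}

# Fetch the status document for connector $1 from the local Connect worker.
connector_status() {
    curl -s "http://localhost:8083/connectors/$1/status"
}
```

With the connector created, `connector_status "$NAME" | all_running && echo "connector is running"` should print the message. A `FAILED` connector or task makes `all_running` return a non-zero status instead.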



To start, let's publish a single message to get the table created and verify that the connector is working.

Run the following in your shell:

```
export KAFKA_TOPIC=LIFT_TICKETS_KAFKA_BATCH
python ./data_generator.py 1 | python ./publish_data.py
```

A table named `LIFT_TICKETS_KAFKA_BATCH` should now exist in your account.

Run the following SQL:

-- The table should contain 1 row, created by the data_generator. Note: this can take a minute or so because of the flush times in the configuration.
USE ROLE INGEST;

USE DATABASE INGEST;
USE SCHEMA INGEST;

SELECT get_ddl('table', 'LIFT_TICKETS_KAFKA_BATCH');

Run the following SQL:

-- Once we verify that the table was created and it has a single row, we can import all our data
SELECT count(*) FROM LIFT_TICKETS_KAFKA_BATCH;

Run the following in your shell:

```
export KAFKA_TOPIC=LIFT_TICKETS_KAFKA_BATCH
cat data.json.gz | zcat | python ./publish_data.py
```

Run the following SQL:

SELECT count(*) FROM LIFT_TICKETS_KAFKA_BATCH;
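To know what number the final count should reach, the records in the input file can be counted locally. This is a sketch under the assumption that `data.json.gz` holds one JSON record per line and that each record becomes one row. `count_records` is a hypothetical helper for this example.

```shell
# Count newline-delimited records in a gzip-compressed file.
# Reading via stdin avoids platform differences in how zcat handles file names.
count_records() {
    gzip -dc < "$1" | wc -l | tr -d ' '
}
```

Running `count_records data.json.gz` prints the number of lines in the file. Under the assumptions above, the table should eventually hold that many rows plus the single test message published earlier, once the buffers have flushed.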

### Tips
* Each partition flushes to a file when its byte, time, or record threshold is reached. If these thresholds are configured poorly, this can create a LOT of tiny files, which is inefficient.
* Not all workloads can accommodate quick flush times. The more data is flowing, the sooner it can become visible while files stay efficiently sized.
* Reducing the number of partitions and increasing the byte, time, and record thresholds until files are well sized is valuable for efficiency.
* If you don't have the time or a use case that lets you reach well-sized files, move to streaming, which will match or outperform batch mode in all cases.
* Number of tasks, number of nodes in the Kafka Connect cluster, amount of CPU and memory on those nodes, and number of partitions will affect performance and credit consumption.
* The Kafka Connector for Snowflake is billed per second of the compute needed to ingest files (Snowpipe).
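
The flush behavior described in the tips above can be estimated with a little arithmetic: a partition flushes as soon as the first of the three thresholds is reached, so the amount buffered per flush is the minimum of the three. The sketch below uses the `buffer.*` values from the connector configuration in this section. The arrival rate and record size are hypothetical workload numbers, and the result is buffered bytes, not the size of the file after compression.

```shell
# Estimate how many bytes one partition buffers before a flush, and which
# threshold triggers it. Arguments (all integers):
#   $1 records per second arriving on the partition (hypothetical workload)
#   $2 average record size in bytes (hypothetical workload)
#   $3 buffer.size.bytes  $4 buffer.flush.time (seconds)  $5 buffer.count.records
estimate_flush() {
    by_time=$(( $1 * $2 * $4 ))   # bytes accumulated when the timer fires
    by_count=$(( $5 * $2 ))       # bytes accumulated when the record cap is hit
    by_size=$3                    # the byte cap itself

    # The smallest of the three is reached first and triggers the flush.
    trigger=time; bytes=$by_time
    if [ "$by_count" -lt "$bytes" ]; then trigger=records; bytes=$by_count; fi
    if [ "$by_size" -lt "$bytes" ]; then trigger=bytes; bytes=$by_size; fi
    echo "$trigger $bytes"
}

# Settings from the config above, with 1000 records/s of 500 bytes each.
estimate_flush 1000 500 250000000 60 1000000
```

For this workload the call prints `time 30000000`: the 60-second timer fires long before the byte or record caps are reached, so each flush carries about 30 MB per partition. Spreading the same traffic over more partitions shrinks that figure proportionally, which is why fewer partitions or longer flush times lead to better-sized files.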