Code to demonstrate benchmarking of Snowflake queries
This repo accompanies my Medium article.
I have generated some test data that can be loaded into Snowflake, and the benchmarking queries can then be run against it.
- Download the data from Google Drive.
- The data is a gzip-compressed Parquet file.
- You can run the script `file_to_tbl.py` to load the data into your Snowflake account.
- Please set up your Snowflake credentials in the file `snowflake_connection.py`.
- You can also generate your own data. The script `generate_data.py` is provided for this, and you can modify it to generate data of your choice.
  - Install the requirements with `pip install -r requirements.txt` in the repo root. Use Python 3.10.
  - Go to the folder `data_generation`.
  - Create an empty folder `pq`.
  - Run `python generate_data.py`.
  - This will generate a Parquet file `merge.parquet` in the same folder. Move it up one level.
  - Run the script `file_to_tbl.py` to load the data into your Snowflake account.
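
Both loading paths above depend on the credentials configured in `snowflake_connection.py`. As a minimal sketch of one way to keep those credentials out of source control, the snippet below reads them from environment variables. The variable names and the `load_connection_params` helper are hypothetical and are not taken from this repo. It also assumes the connection is made with `snowflake-connector-python`, whose `connect` function accepts these keyword arguments.

```python
import os

# Hypothetical environment variable names; the repo's snowflake_connection.py
# may use different names or a different mechanism entirely.
REQUIRED = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
}
OPTIONAL = {
    "warehouse": "SNOWFLAKE_WAREHOUSE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
}


def load_connection_params(env=None):
    """Build a dict of connection parameters from environment variables."""
    env = os.environ if env is None else env
    # Fail early with a clear message if a required value is absent or empty.
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    params = {key: env[var] for key, var in REQUIRED.items()}
    # Optional settings are included only when they are set.
    params.update({key: env[var] for key, var in OPTIONAL.items() if env.get(var)})
    return params


# Assumption: with snowflake-connector-python installed, the dict can be passed
# as keyword arguments: snowflake.connector.connect(**load_connection_params())
```

With this approach the credentials are exported in the shell before running `file_to_tbl.py`, and nothing sensitive needs to be committed.
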
- Install the requirements.
- Go to the folder `benchmarking`.
- Run the script `pivot_query.py` to obtain the benchmarking results for the pivot query.
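
To illustrate the kind of measurement a benchmarking script might take, here is a minimal timing sketch. It is not the code in `pivot_query.py`. The `benchmark` helper and its `run_query` argument are hypothetical: `run_query` stands for any zero-argument function that executes a query and fetches its result.

```python
import statistics
import time


def benchmark(run_query, repeats=5):
    """Call run_query() `repeats` times and summarise wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical callable that runs the query end to end
        timings.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "min_s": min(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }
```

For example, `benchmark(lambda: cursor.execute(sql).fetchall(), repeats=5)` would time five runs, assuming `cursor` is an open Snowflake cursor and `sql` holds the query text. Repeated runs of an identical query may be served from Snowflake's result cache, so it can be worth running `ALTER SESSION SET USE_CACHED_RESULT = FALSE` first when comparing query variants.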