Athena Query Speed Test

Code used to facilitate an investigation into how file size affects querying speed on AWS Athena.

Running instructions

Run inside a tmux session with the following command, substituting the number of rows each time.

rm -rf /data/rp1615 && mkdir /data/rp1615 && cd ~/Documents && ./main 100000000 && rm -rf /data/rp1615 && exit

Method

Every table contained 100000000 rows in total; only the number of rows per file varied between tables. The query tested was: SELECT count(ax) FROM row100 WHERE ax > 0
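For readers who want to reproduce the timing step, the following is a minimal Python sketch of how one query could be timed through the Athena API. It is not the repository's own measurement code (that lives in main.go and is not shown here). The function name `time_athena_query`, the database name and the S3 output location are hypothetical; the client is assumed to be a boto3 Athena client, and only its `start_query_execution` and `get_query_execution` calls are used.

```python
import time


def time_athena_query(client, sql, database, output_location, poll_seconds=0.5):
    """Run `sql` on Athena and return (wall_clock_seconds, engine_seconds).

    `client` is assumed to behave like boto3.client("athena").
    """
    started = time.monotonic()
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")

    wall_clock = time.monotonic() - started
    # Engine time as reported by Athena itself, converted from milliseconds.
    engine = execution["Statistics"]["EngineExecutionTimeInMillis"] / 1000.0
    return wall_clock, engine


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("athena")
#   time_athena_query(client, "SELECT count(ax) FROM row100 WHERE ax > 0",
#                     "my_database", "s3://my-bucket/athena-results/")
```

The sketch returns both the client-side wall-clock time and the engine time reported by Athena, since the text does not say which of the two the results table records.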

Results

Rows per file	Time taken (s)
100	201.15
1000	19.53
10000	4.07
100000	2.79
1000000	2.35
10000000	2.27
100000000	2.83
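To make the table easier to interpret, the short Python sketch below derives the number of files in each table and the slowdown relative to the fastest configuration. It assumes every file is full, so the file count is simply the total row count divided by the rows per file; the names `RESULTS`, `file_count` and `summarise` are illustrative only.

```python
TOTAL_ROWS = 100_000_000

# Rows per file -> time taken in seconds, copied from the table above.
RESULTS = {
    100: 201.15,
    1_000: 19.53,
    10_000: 4.07,
    100_000: 2.79,
    1_000_000: 2.35,
    10_000_000: 2.27,
    100_000_000: 2.83,
}


def file_count(rows_per_file, total_rows=TOTAL_ROWS):
    """Number of files, assuming every file holds exactly rows_per_file rows."""
    return total_rows // rows_per_file


def summarise(results=RESULTS):
    """Return (rows_per_file, files, seconds, slowdown_vs_fastest) per configuration."""
    best = min(results.values())
    return [
        (rows_per_file, file_count(rows_per_file), seconds, seconds / best)
        for rows_per_file, seconds in results.items()
    ]


if __name__ == "__main__":
    for rows_per_file, files, seconds, slowdown in summarise():
        print(f"{rows_per_file:>11} rows/file  {files:>9} files  "
              f"{seconds:>7.2f} s  {slowdown:6.1f}x")
```

Under that assumption the 100-rows-per-file table consists of one million files, while the fastest configuration (10000000 rows per file) has ten.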

Results analysis

As shown in the graph, query time is significantly longer when the data is split into many small files, because each file adds the overhead of opening a new connection to S3 and reading its metadata. The single-file case (100000000 rows per file) is also slower than the configurations with 100000 to 10000000 rows per file, because a query over a single file cannot be parallelised.
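As a rough check on the per-file overhead described above, the Python sketch below estimates the extra time per additional file between adjacent configurations in the results table. It is a crude estimate: it assumes every file is full and treats each table entry as a single exact measurement. The function name `marginal_seconds_per_file` is illustrative only.

```python
TOTAL_ROWS = 100_000_000

# Rows per file -> time taken in seconds, copied from the results table.
RESULTS = {
    100: 201.15,
    1_000: 19.53,
    10_000: 4.07,
    100_000: 2.79,
    1_000_000: 2.35,
    10_000_000: 2.27,
    100_000_000: 2.83,
}


def marginal_seconds_per_file(results=RESULTS, total_rows=TOTAL_ROWS):
    """Return (files_low, files_high, seconds_per_extra_file) for adjacent configurations."""
    # Convert rows per file to a file count and sort by increasing file count.
    points = sorted((total_rows // rows, seconds) for rows, seconds in results.items())
    return [
        (f0, f1, (t1 - t0) / (f1 - f0))
        for (f0, t0), (f1, t1) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    for files_low, files_high, slope in marginal_seconds_per_file():
        print(f"{files_low:>7} -> {files_high:>7} files: {slope * 1000:+.3f} ms per extra file")
```

A negative value for the first step (one file to ten) is consistent with the parallelism benefit, while the positive values at high file counts reflect the growing per-file overhead.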
