The benchmark tests were run manually. We want to automate the process, including visualization of the results and comparison graphs.
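As a rough illustration of what such automation could look like, the sketch below times a set of workloads repeatedly, reduces each to a median, and renders a plain-text comparison chart. Everything in it is an assumption made for illustration: the function names (`run_benchmark`, `summarize`, `render_comparison`), the choice of the median as the summary statistic, and the two sample workloads are hypothetical and do not come from the existing benchmarks. It uses only the Python standard library; a plotting library could replace the text chart.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_benchmark(func: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `func` `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(results: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce each benchmark's timings to its median (an assumed choice)."""
    return {name: statistics.median(times) for name, times in results.items()}


def render_comparison(medians: Dict[str, float], width: int = 40) -> str:
    """Render a plain-text bar chart scaled to the slowest benchmark."""
    if not medians:
        return ""
    slowest = max(medians.values())
    label_width = max(len(name) for name in medians)
    lines = []
    for name, value in medians.items():
        # The slowest entry gets a full-width bar; others are proportional.
        bar_len = round(width * value / slowest) if slowest > 0 else 0
        lines.append(f"{name.ljust(label_width)} | {'#' * bar_len} {value:.6f}s")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical workloads standing in for the real benchmark targets.
    candidates = {
        "list_comprehension": lambda: [i * i for i in range(10_000)],
        "map_with_lambda": lambda: list(map(lambda i: i * i, range(10_000))),
    }
    results = {name: run_benchmark(func) for name, func in candidates.items()}
    print(render_comparison(summarize(results)))
```

Keeping measurement, summarizing, and rendering in separate functions means the rendering step can later be swapped for real graphs without touching the timing code.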