We should decide on criteria for what makes a benchmark problem suitable, and on how such a problem can be implemented.