In the tutorial, we compute TFLOPS and return that instead of the runtime in ms. Upstream cross-reference: https://github.com/triton-lang/triton/issues/4749. This issue should be closed once we sync the changes with upstream.
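For reference, here is a minimal sketch of the ms-to-TFLOPS conversion, assuming the benchmarked kernel is an `M x K` by `K x N` matrix multiplication that performs `2 * M * N * K` floating-point operations. The helper name `ms_to_tflops` and its parameters are hypothetical and are not taken from the tutorial; a different kernel would need a different FLOP count.

```python
def ms_to_tflops(ms: float, M: int, N: int, K: int) -> float:
    """Convert a measured runtime in milliseconds to TFLOPS.

    Assumption: the workload is a single (M x K) @ (K x N) matmul,
    which costs 2 * M * N * K floating-point operations
    (one multiply and one add per inner-product term).
    """
    if ms <= 0:
        raise ValueError("runtime must be positive")
    flops = 2 * M * N * K          # total floating-point operations
    seconds = ms * 1e-3            # milliseconds -> seconds
    return flops * 1e-12 / seconds # FLOP/s -> TFLOP/s


if __name__ == "__main__":
    # Hypothetical measurement: a 4096^3 matmul that took 1.5 ms.
    print(f"{ms_to_tflops(1.5, 4096, 4096, 4096):.2f} TFLOPS")
```

Returning TFLOPS rather than ms makes results comparable across problem sizes, since a shorter runtime on a smaller shape does not necessarily mean higher throughput.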