
[Graph][Benchmark] Update benchmark function #363

Merged (11 commits) on Oct 12, 2023

Conversation

@Aalanli Aalanli (Collaborator) commented Oct 10, 2023

The old benchmarking function did not clear the L2 cache between runs, so repeated runs are biased toward kernels that benefit from a warm cache.
This is especially prevalent when tuning the parallel-k split, which always selects k_parts=1 due to L2 cache hits, even when it is not the fastest implementation.
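
For illustration, here is a minimal sketch of what an L2-flushing benchmark loop can look like; this is an assumption for exposition, not the exact code in this PR (the buffer size, iteration counts, and use of torch are illustrative):

```python
import time
import torch

def bench_latency(run, warmup=10, repeat=100):
    """Time run() with the L2 cache flushed before every measured call (sketch)."""
    # A buffer larger than the L2 cache on most datacenter GPUs (128 MiB here);
    # overwriting it between runs evicts previously cached operands.
    flush_buf = torch.empty(128 * 1024 * 1024, dtype=torch.int8, device='cuda')
    for _ in range(warmup):
        run()
    torch.cuda.synchronize()
    latencies = []
    for _ in range(repeat):
        flush_buf.zero_()                 # flush L2 by writing the large buffer
        torch.cuda.synchronize()
        t0 = time.time()
        run()
        torch.cuda.synchronize()
        latencies.append((time.time() - t0) * 1e3)   # milliseconds
    return sum(latencies) / len(latencies)
```

Without the flush, later runs can hit operands left in L2 by the previous run, which favors configurations such as k_parts=1 whose working set fits in cache.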

@yaoyaoding (Member) commented:

Hi Allan,

The PR looks good to me. Before merging, though, it would be good to have some demos showing how clearing the L2 cache improves the accuracy of the parallel-k selection.

@Aalanli Aalanli (Collaborator, Author) commented Oct 11, 2023

After some further investigation, it appears that clearing the L2 cache is not the largest contributor; the bigger difference comes from using torch.cuda.Event for timing, which I assume is more accurate than time.time().
Here is my benchmarking script for reference: https://gist.github.com/Aalanli/b81d1a751a78ea72b491d872aa993f9e

[Plots comparing orig-latency, new-latency, and orig-latency-with-event.]

@Aalanli Aalanli (Collaborator, Author) commented Oct 11, 2023

orig-latency is the original benchmark function.
new-latency is the benchmark function used by this PR.
orig-latency-with-event is a benchmark function that uses torch.cuda.Event but does not clear the L2 cache.
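
For reference, a hedged sketch of what event-based timing (as in orig-latency-with-event) boils down to; the function name and iteration count are illustrative, not taken from the PR:

```python
import torch

def bench_latency_with_event(run, repeat=100):
    """Time run() with CUDA events instead of host-side time.time() (sketch)."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    latencies = []
    for _ in range(repeat):
        start.record()                 # timestamp recorded on the CUDA stream
        run()
        end.record()
        torch.cuda.synchronize()       # wait until both events have completed
        latencies.append(start.elapsed_time(end))   # milliseconds
    return sum(latencies) / len(latencies)
```

Because the events are recorded on the CUDA stream itself, the measurement excludes host-side launch overhead and scheduling jitter that a time.time()-based measurement would include.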

@Aalanli Aalanli (Collaborator, Author) commented Oct 11, 2023

I just removed the torch dependencies.

Review thread on python/hidet/cuda/device.py (outdated, resolved)
@Aalanli Aalanli merged commit 82ddb8c into hidet-org:main Oct 12, 2023
2 checks passed