
Inconsistent heatmaps and the current timing script #51

Open
Schefflera-Arboricola opened this issue Mar 5, 2024 · 4 comments · May be fixed by #61

@Schefflera-Arboricola (Member)

Current heatmap in the repo (it says it was run on 2 cores):

[heatmap image]

I tried re-running these and got these results (using 8 cores):

[heatmap image]

Potential cause: other background processes, or the machine going into 'sleep mode' in between; either can increase or decrease the time a function appears to take.

Potential solution: running the heatmaps on a VM and using a timing function that reports the time the CPU core(s) actually spent running the process, instead of just time.time().
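
For reference, a minimal sketch of that difference (the sleep here is just a stand-in for any interruption of the process, not the heatmap code itself):

```python
import time

wall_start = time.time()
cpu_start = time.process_time()
time.sleep(1)                            # the process is idle here
print(time.time() - wall_start)          # ~1.0 s of wall-clock time
print(time.process_time() - cpu_start)   # ~0.0 s of CPU time
```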

@dschult (Member)

dschult commented Mar 13, 2024

Can you look into Python's standard-library timeit.Timer() tool? I believe that is what IPython uses for its %timeit and %%timeit magics (which are really nice because they run the code multiple times, report the standard deviation of the results, warn if one run is way off from the others, etc.). timeit.timeit can also work well for one-liners, but timeit.Timer is the more complete interface.

https://docs.python.org/3/library/timeit.html
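
For illustration, a minimal sketch of timeit.Timer used roughly the way %timeit works (the stmt here is a placeholder workload, not the actual heatmap code):

```python
import statistics
import timeit

# Repeat the measurement several times and report the spread.
timer = timeit.Timer(stmt="sorted(range(1000))")
raw = timer.repeat(repeat=5, number=1000)     # 5 samples of 1000 calls each
per_call = [t / 1000 for t in raw]
print(f"best: {min(per_call):.3e}s  stdev: {statistics.stdev(per_call):.1e}s")
```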

Schefflera-Arboricola linked a pull request on Apr 21, 2024 that will close this issue
@Schefflera-Arboricola (Member, Author)

I was looking into how to time processes that create other parallel processes.

time.process_time() only times the parent process and ignores the child processes' times. It also wasn't clear to me what exactly it means to time processes that run in parallel: do we take the process that took the maximum time, or an average? And we don't just have to time the parallel part of an algorithm, but the non-parallel part as well.
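
A small sketch demonstrating this (the busy() workload is just an illustrative stand-in):

```python
import multiprocessing as mp
import time

def busy():
    # Burn CPU in a child process for about one second.
    end = time.perf_counter() + 1.0
    while time.perf_counter() < end:
        pass

if __name__ == "__main__":
    cpu_start = time.process_time()
    p = mp.Process(target=busy)
    p.start()
    p.join()
    # Stays near zero: the child's CPU-second is invisible to the parent.
    print(time.process_time() - cpu_start)
```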

So I started looking into how ASV times its benchmarks. It uses timeit.default_timer(), which doesn't remove the inconsistencies caused by the laptop going into sleep mode, heavier processes running in the background, etc., but ASV runs each benchmark multiple times to minimize the effect of such inconsistencies, and I think we can do the same for the heatmaps. (ref. asv timing docs)

The timeit docs also suggest running the program multiple times:
"default_timer() measurements can be affected by other programs running on the same machine, so the best thing to do when accurate timing is necessary is to repeat the timing a few times and use the best time. The -r option is good for this; the default of 5 repetitions is probably enough in most cases. You can use time.process_time() to measure CPU time."

PS: Before implementing all this and re-generating all the heatmaps, I just want to dig a bit deeper and figure out how exactly timeit.default_timer() works internally, whether that aligns with what we are trying to do here, and whether there are any better alternatives. I also want to look into how running and timing parallel processes on my local machine differs from running and timing them on a VM over ssh.

@dschult (Member)

dschult commented May 15, 2024

It seems like you want some approximation of "wall clock time" -- that is, how long between starting and finishing the call. I don't think we want to somehow compute the total cpu-time of the call. So timeit.default_timer seems good. It looks like it calls time.perf_counter() which relies on the OS clock values.
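
That relationship can be checked directly; on CPython 3.3+ timeit.default_timer is literally time.perf_counter:

```python
import time
import timeit

# default_timer is assigned to perf_counter in the timeit module,
# a monotonic wall-clock timer backed by the OS clock.
assert timeit.default_timer is time.perf_counter
```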

Running it multiple times and taking the best time also seems reasonable.

Using your own machine and comparing to a VM is expected to give different values, but hopefully the factor of speedup is similar. Timing is difficult to get exactly right. But a consistent set of hardware without distractions from other processes will be a good start.

@eriknw

eriknw commented May 23, 2024

FYI, graphblas-algorithms uses timeit.Timer. It runs the specified number of times, or tries to run for a target number of seconds; in the latter case it estimates the number of runs from the first run. See:
https://github.com/python-graphblas/graphblas-algorithms/blob/35dbc90e808c6bf51b63d51d8a63f59238c02975/scripts/bench.py#L172
and
https://github.com/python-graphblas/graphblas-algorithms/blob/35dbc90e808c6bf51b63d51d8a63f59238c02975/scripts/bench.py#L202
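
For discussion, a rough sketch of that strategy (not the actual bench.py code; autorange_time and its parameters are made up here):

```python
import timeit

def autorange_time(func, target_seconds=1.0):
    """Hypothetical helper sketching the approach described above:
    time one run first, estimate how many runs fit in the target
    duration, then average over that many runs."""
    timer = timeit.Timer(func)
    first = timer.timeit(number=1)
    if first >= target_seconds:
        return first
    number = max(1, round(target_seconds / max(first, 1e-9)))
    return timer.timeit(number=number) / number
```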

Would there be any interest in making common utilities that make it easy to benchmark different backends? Maybe we could use what we have for graphblas-algorithms as a starting point. I'd be happy to talk about what it does and to discuss what y'all want to do for nx-parallel.

nx-cugraph uses pytest-benchmark.
