
[Peer Review] Add progress monitoring #5

Closed
cthoyt opened this issue Oct 2, 2020 · 10 comments
Labels
help wanted Extra attention is needed

Comments


cthoyt commented Oct 2, 2020

For small graphs like BioGRID, it might be reasonable to wait for 30 seconds or so in the different steps, but as the graphs start getting large and training time gets into the minutes/hours, it would be nice to communicate progress to the user.

I would normally suggest using tqdm, but after modifying the code myself, it doesn't look like it plays nicely with numba's parallelization. I found this thread, numba/numba#4267, which describes the issue in more detail, and the outlook is a bit grim.

It's not a dealbreaker, but it would make your package much more user friendly.


RemyLau commented Mar 8, 2021

Many attempts have been made to exchange progress information among threads during numba JIT parallelism, which is necessary for printing the overall progress, in particular for the simulate_walk function. One idea was to use atomic operations to sync progress among threads, but CPU atomics are not available yet: numba/numba#2988. Another idea was to use a reduction on the iteration count, but the reduction only completes after all threads are done.


RemyLau commented Mar 8, 2021

I also attempted to have just one of the threads print its own progress as an estimate of the overall progress, using numba.np.ufunc.parallel._get_thread_id(). However, to print a progress bar in a nice format, it is necessary to flush the previous stdout line before the current print, which Python supports via either print(flush=True) or sys.stdout.flush(), yet neither is supported in nopython mode: numba/numba#3475
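The estimation idea can be sketched in plain Python (a hypothetical helper, without the numba machinery — in the real code the counter would live inside the jitted loop and `_get_thread_id()` would select the printing thread): if static scheduling splits iterations roughly evenly, one thread's local fraction approximates the overall fraction.

```python
def estimate_overall_progress(local_done: int, local_total: int) -> float:
    """Estimate overall progress (in percent) from a single thread's local
    counter, assuming every thread got roughly the same number of
    iterations (static scheduling)."""
    if local_total == 0:
        return 100.0
    return 100.0 * local_done / local_total

# If the designated printing thread has finished 47 of its 100 iterations,
# we estimate the run as a whole to be ~47 % complete.
print(estimate_overall_progress(47, 100))
```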


songololo commented May 19, 2021

Interesting discussion, one I've been following for some time (across other threads).

In my case I've been using a manual progress bar (just hash symbols printed as strings from nopython mode), and in parallelised mode it scrambles the order, e.g.:

|                                 | 0.0 %
|############################     | 87.87 %
|########################         | 75.75 %
|################                 | 51.51 %
|####################             | 63.63 %

Though I suppose this still gives some indication that something is happening. One way to give some sense of progress might be to change the progress bar to read something like "x of N batches have been completed..." instead of an explicit percentage and progress bar.

Edit: Batches probably wouldn't work, since the iterations triggering the update are probably fairly arbitrary...?
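For what it's worth, hash-bar lines like the ones above can be produced by a small pure-Python formatter (a sketch with made-up names — the real prints happen from nopython mode, where string formatting is limited to basic print arguments):

```python
def render_bar(done: int, total: int, width: int = 33) -> str:
    """Format a manual hash-symbol progress bar, similar to the
    (scrambled) lines shown above."""
    filled = int(width * done / total) if total else width
    pct = round(100.0 * done / total, 2) if total else 100.0
    return "|" + "#" * filled + " " * (width - filled) + "| " + str(pct) + " %"

print(render_bar(29, 33))  # 29 of 33 batches done -> ~87.88 %
```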


RemyLau commented May 19, 2021

Hi @songololo, thanks for your comment! That's a good idea, and I've tried something similar along these lines. But as you've pointed out at the end, there's an issue with the scheduling being random. The main thing holding this kind of counting-based approach back is the inability to communicate progress information across threads during parallel execution in Numba, as briefly discussed above. So at the moment it is impossible to get an overall progress figure by counting the number of iterations done across all threads.

That being said, we could instead monitor the progress of each thread separately and output something like "x% of iterations completed in thread i / N", where i is the thread id and N is the total number of threads. However, this makes the monitoring very messy, so it might not be a good idea. One workaround is to flush out previous prints whenever the next monitoring message is printed, but flushing is not currently supported in Numba nopython mode (see above).

@songololo

@RemyLau I don't fully understand the numba.np.ufunc.parallel._get_thread_id() approach, but if it works in principle, then would this be worth pursuing?

i.e. just making peace with a not-so-nice progress bar that doesn't flush between prints?


RemyLau commented May 19, 2021

@songololo so the main idea of the _get_thread_id() approach is to keep track of the progress of each individual thread instead of the overall progress, since numba threads are not capable of communicating with each other at present. I tried to make it print the progress just now, and I realized that even if we only track progress within a single thread, there's a complication: the total number of jobs (or iterations) assigned to a thread is not known a priori, so we can't compute the percentage of iterations done by an individual thread either.

Update: after some more searching, it seems that prange performs static scheduling by default and assigns a (roughly) equal number of jobs to each thread; see this reply in an issue about numba scheduling. So perhaps I can still output the progress for each thread. Another complication I've encountered is that nopython mode does not support string formatting (see numba/numba#3250). In another thread, numba/numba#3475, this is linked to a StackOverflow answer that uses the objmode() context manager to resolve the issue.
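Under static scheduling, each of N threads should get a contiguous chunk of roughly n/N iterations, which makes the per-thread total knowable up front. Here's a pure-Python model of such a split (an assumption for illustration only — numba's actual partitioning may differ in detail):

```python
def static_chunks(n: int, n_threads: int):
    """Split n iterations into n_threads contiguous chunks whose sizes
    differ by at most one -- a rough model of prange's static scheduling.
    Returns a list of (start, stop) half-open ranges, one per thread."""
    base, extra = divmod(n, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

# e.g. 10 iterations over 4 threads:
print(static_chunks(10, 4))  # -> [(0, 3), (3, 6), (6, 8), (8, 10)]
```

With the chunk bounds known, each thread can compute its own completion percentage from its local loop index alone.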

Update on objmode: my attempt at using objmode() failed; I'm currently on numba 0.52.0. I'll check whether this works on the latest released version of numba:

numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: Handle with contexts)
Failed in nopython mode pipeline (step: nopython mode backend)
Failed in nopython mode pipeline (step: nopython mode backend)
ctypes objects containing pointers cannot be pickled

File "../src/pecanpy/node2vec.py", line 113:
        def node2vec_walks():
            <source elided>
                    print(np.float32(private_count / n * 100), "% walks generated by thread #", _get_thread_id())
                    with objmode():
                    ^

During: lowering "$148 = call $147(func=$147, args=[], kws=(), vararg=None)" at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (113)
During: lowering "id=2[LoopNest(index_variable = parfor_index.98, range = (0, $n.103, 1))]{290: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (96)>, 130: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (101)>, 228: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (112)>, 132: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (101)>, 194: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (107)>, 208: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (110)>, 86: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (96)>, 158: <ir.Block at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (104)>}Var(parfor_index.98, node2vec.py:96)" at /mnt/ufs18/home-026/liurenmi/repo/PecanPy/src/pecanpy/node2vec.py (96)


songololo commented May 19, 2021

@RemyLau thanks, this makes sense compared to what I've experienced with plotting progress from prange.

For example, I've now switched to really simple updates which report a non-sequential bundle number:

Completed non-sequential bundle 1 of 33
Completed non-sequential bundle 30 of 33
Completed non-sequential bundle 26 of 33
etc.

From a function which looks like this:

@njit(cache=False)
def progress_bar(current: int, total: int, steps: int = 20):
    '''
    Printing carries a performance penalty
    Cache has to be set to false per Numba issue:
    https://github.com/numba/numba/issues/3555
    TODO: set cache to True once resolved
    '''
    if steps == 0:
        return
    step_size = int(total / steps)
    if step_size == 0:
        return
    if current % step_size == 0:
        chunk = round(current / step_size) + 1
        print('Completed non-sequential bundle', chunk, 'of', steps)

and called like this:

  for idx in prange(n):
      # numba no object mode can only handle basic printing
      # note that progress bar adds a performance penalty
      if not suppress_progress:
          progress_bar(idx, n, steps)

Not very pretty but at least it gives a sense of progress! 🤣
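The trigger logic above can be replayed in plain Python (dropping the @njit decorator) to see which iterations actually print and which bundle number they report — e.g. with 100 iterations and 20 steps, iterations 0, 5, 10, … fire, reporting bundles 1 through 20:

```python
def bundles_printed(total: int, steps: int = 20):
    """Replay the trigger logic of progress_bar above (without @njit) and
    return the (iteration, bundle_number) pairs that would print."""
    out = []
    if steps == 0:
        return out
    step_size = int(total / steps)
    if step_size == 0:
        return out
    for current in range(total):
        if current % step_size == 0:
            out.append((current, round(current / step_size) + 1))
    return out

print(bundles_printed(100, 20)[:3])  # -> [(0, 1), (5, 2), (10, 3)]
```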


RemyLau commented May 19, 2021

Thanks for the suggestion @songololo! I've just pushed some changes (see commit 9c6ca6f) to implement the suggested progress monitoring. Here's an example of the output with the --verbose option set to enable progress monitoring:

$ pecanpy --input karate.edg --output karate.emd --dimensions 8 --workers 4 --verbose

Took 00:00:00.00 to load graph
Took 00:00:00.00 to pre-compute transition probabilities
Thread #  3 progress: |###                      | 9.41 %
Thread #  0 progress: |###                      | 9.41 %
Thread #  1 progress: |###                      | 9.41 %
Thread #  2 progress: |###                      | 9.41 %
Thread #  3 progress: |#####                    | 18.82 %
Thread #  1 progress: |#####                    | 18.82 %
Thread #  2 progress: |#####                    | 18.82 %
Thread #  0 progress: |#####                    | 18.82 %
Thread #  3 progress: |########                 | 28.23 %
Thread #  1 progress: |########                 | 28.23 %
Thread #  2 progress: |########                 | 28.23 %
Thread #  0 progress: |########                 | 28.23 %
Thread #  3 progress: |##########               | 37.64 %
Thread #  1 progress: |##########               | 37.64 %
Thread #  2 progress: |##########               | 37.64 %
Thread #  3 progress: |############             | 47.05 %
Thread #  1 progress: |############             | 47.05 %
Thread #  0 progress: |##########               | 37.64 %
Thread #  2 progress: |############             | 47.05 %
Thread #  3 progress: |###############          | 56.47 %
Thread #  1 progress: |###############          | 56.47 %
Thread #  2 progress: |###############          | 56.47 %
Thread #  0 progress: |############             | 47.05 %
Thread #  3 progress: |#################        | 65.88 %
Thread #  1 progress: |#################        | 65.88 %
Thread #  2 progress: |#################        | 65.88 %
Thread #  3 progress: |###################      | 75.29 %
Thread #  1 progress: |###################      | 75.29 %
Thread #  0 progress: |###############          | 56.47 %
Thread #  2 progress: |###################      | 75.29 %
Thread #  1 progress: |######################   | 84.7 %
Thread #  3 progress: |######################   | 84.7 %
Thread #  0 progress: |#################        | 65.88 %
Thread #  2 progress: |######################   | 84.7 %
Thread #  1 progress: |######################## | 94.11 %
Thread #  3 progress: |######################## | 94.11 %
Thread #  2 progress: |######################## | 94.11 %
Thread #  0 progress: |###################      | 75.29 %
Thread #  0 progress: |######################   | 84.7 %
Thread #  0 progress: |######################## | 94.11 %
Took 00:00:05.70 to generate walks
Took 00:00:00.03 to train embeddings

I guess the couple of other things I'll do right after are to add a progress bar to the preprocessing phase for PreComp mode, and to check whether the latest Numba release lets me use objmode to enable flushing and make the printing prettier.

@songololo

Interesting approach! Thanks for sharing.


RemyLau commented Nov 25, 2021

@cthoyt cthoyt changed the title Add progress monitoring [Peer Review] Add progress monitoring Aug 22, 2022