Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for progress bars #66

Merged
merged 7 commits into from
Mar 20, 2023

Conversation

mehtamohit013
Copy link
Contributor

@mehtamohit013 mehtamohit013 commented Mar 17, 2023

Describe your changes

Added support for flushing out stdout and stderr

Issue ticket number and link

Closes #61 #1080

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have added thorough tests (when necessary).
  • I have added the right documentation in the docstring and changelog (when needed)

📚 Documentation preview 📚: https://ploomber-engine--66.org.readthedocs.build/en/66/

@mehtamohit013 mehtamohit013 changed the title Dev/papermill diff Added support for progress bars Mar 17, 2023
@mehtamohit013
Copy link
Contributor Author

mehtamohit013 commented Mar 17, 2023

Code used to run:

print("Hello World")
for i in range(5):
    print(i)
from tqdm import tqdm
import time

my_list = list(range(100))

with tqdm(total=len(my_list)) as pbar:
    for x in my_list:
       time.sleep(0.1)
       pbar.update(1)
       if x%20==0:
           print(x)
print("Hello World")
print('Hello World 2')

Output
Screenshot_2023-03-17_11-27-26

Issues:

  • \r character when tqdm end
  • Whenever a something is written to stdout when tqdm is running, there is no \n as sys.stderr.write() doesn't include \n and if I include one, progress bar will not be updated in place

Any suggestions on how to resolve this issue?
Maybe we can add some type of cell tag when tqdm is running and resolve it separately from stderr one

@idomic
Copy link
Contributor

idomic commented Mar 18, 2023

Any suggestions on how to resolve this issue?

Why don't you just output it as is?
You can always parse and wait for those characters but I assume they've implemented a mechanism for that already in tqdm?
Also let's keep the discussion in one place (probably here is better).

@edublancas
Copy link
Contributor

I think for this PR, let's just make the output of ploomber-engine be as close as possible to papermill.

Based on the discussion, it sounds like we just need to add 1) cell separators 2) Displaying standard error 2) flushing standard output/error

Once that's in, we can tackle updating the progress bar inline.

@mehtamohit013
Copy link
Contributor Author

Based on the discussion, it sounds like we just need to add 1) cell separators 2) Displaying standard error 2) flushing standard output/error

  1. What do you mean by cell seperators?
  2. stderr is currently displayed
  3. Flushing messes up the output. So, I have not used flushing

@idomic
Copy link
Contributor

idomic commented Mar 18, 2023

@mehtamohit013 check if you can ask for review now? I've updated the permissions

@edublancas
Copy link
Contributor

cell separators: https://user-images.githubusercontent.com/43580047/225945963-93552b85-83ef-48ae-9993-fd9acf5ba1b6.png

Executing cell X
Ending cell X

ok so if stderr is displayed and we cannot flush output at the moment, I think what's missing is to fix the CI, right?

@idomic
Copy link
Contributor

idomic commented Mar 18, 2023

  1. Flushing messes up the output. So, I have not used flushing

So how are we going to show the output as it comes? I think that's essentially the main difference we have to solve it.

@mehtamohit013
Copy link
Contributor Author

mehtamohit013 commented Mar 18, 2023

Why don't you just output it as is? You can always parse and wait for those characters but I assume they've implemented a mechanism for that already in tqdm?

The problem lies with the fact that I cannot make a parser just only for tqdm as there may be other messages pushed to stderr. Currently, I am just simply redirecting the notebook stderr and stdout to the original stderr and stdout

ok so if stderr is displayed and we cannot flush output at the moment, I think what's missing is to fix the CI, right?

No, tqdm messes up with the output. You can see in the image attached above
Code used: self.default.write(s) where self.default is the original stdout and stderr, before it is switched with IO() class

So how are we going to show the output as it comes? I think that's essentially the main difference we have to solve it.

I think it is because of yield. I will look into flushing more

def patch_sys_std_out_err(display_output):
    """Path sys.{stout, sterr} to capture output"""
    # keep a reference to the system ones
    stdout, stderr = sys.stdout, sys.stderr

    # patch them
    stdout_stream = IO(default = stdout,std_type='out', display = display_output)
    stderr_stream = IO(default = stderr, std_type='err')
    sys.stdout, sys.stderr = stdout_stream, stderr_stream

    try:
        yield stdout_stream, stderr_stream
    finally:
        # revert
        sys.stdout, sys.stderr = stdout, stderr
class IO(StringIO):
    def __init__(self, default,std_type,display = True, newline="\n", initial_value="" ):
        super().__init__(initial_value=initial_value, newline=newline)
        self.default = default
        self.std_type = std_type
        self.display = display
        self._values = []

    def write(self, s):
        self._values.append(s)
        if self.display:
            self.default.write(s)
        super().write(s)

    def get_separated_values(self):
        return self._values[:]

@edublancas
Copy link
Contributor

ok, so then the only "easy" thing we can do at the moment are the cell separators right? I'd suggest opening a PR for that, we merge, make a release. and then we investigate this flushing issue

@mehtamohit013
Copy link
Contributor Author

ok, so then the only "easy" thing we can do at the moment are the cell separators right? I'd suggest opening a PR for that, we merge, make a release. and then we investigate this flushing issue

Yeah sure

Btw this is the output with flushing (Observe the '0', which messes with ploomber progress bar, because ploomber progress bar updates after cell is executed)
Screenshot_2023-03-18_11-09-21

@idomic
Copy link
Contributor

idomic commented Mar 18, 2023

I think it is because of yield. I will look into flushing more

Yeah I think you're right, I also assume it works fine: tqdm + papermill?

@mehtamohit013
Copy link
Contributor Author

mehtamohit013 commented Mar 18, 2023

Yeah I think you're right, I also assume it works fine: tqdm + papermill?

No, it doesn't 😅

Ours is messy but tqdm progress bar is updated inplace and we are registering \r also.
The robust solution is to make a custom wrapper with tqdm

Below is the output with papermill-2.4.0:
Screenshot_2023-03-18_11-19-45

@idomic
Copy link
Contributor

idomic commented Mar 18, 2023 via email

@mehtamohit013
Copy link
Contributor Author

@edublancas @idomic
If you want a quick fix, how about this: Just some extra \n but the progress bar is working
Screenshot_2023-03-18_11-34-42

@idomic
Copy link
Contributor

idomic commented Mar 18, 2023 via email

@mehtamohit013
Copy link
Contributor Author

@edublancas @idomic If you want a quick fix, how about this: Just some extra \n but the progress bar is working Screenshot_2023-03-18_11-34-42

Implemented this one

@mehtamohit013 mehtamohit013 marked this pull request as ready for review March 18, 2023 19:54
Copy link
Contributor

@idomic idomic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good besides the leading \n .

src/ploomber_engine/ipython.py Show resolved Hide resolved
@idomic
Copy link
Contributor

idomic commented Mar 20, 2023

It doesn't closes ploomber/ploomber#1080 since we're still using papermill there.

@idomic idomic merged commit f9fcf2f into ploomber:main Mar 20, 2023
@mehtamohit013 mehtamohit013 deleted the dev/papermill_diff branch April 5, 2023 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

engine and papermill differences
3 participants