
Still No Output From Pylint #2586

Closed
tbpassin opened this issue Apr 6, 2022 · 63 comments
@tbpassin
Contributor

tbpassin commented Apr 6, 2022

See PR #2589 and PR #2590.

With the new PR, the BPM waits (via proc.communicate) for all output if BPM.wait is True.

@tbpassin I expect more work is needed, but I wanted to merge PR #2590 right away. Please test devel with wait set to both True and False.

@tbpassin tbpassin added the Bug label Apr 6, 2022
@edreamleo edreamleo self-assigned this Apr 6, 2022
@edreamleo edreamleo added this to the 6.6.1 milestone Apr 6, 2022
@edreamleo
Member

@tbpassin Do you ever get output from pylint?

For example, do you get any pylint output from a small test file that should generate just a few pylint warnings?

@tbpassin
Contributor Author

tbpassin commented Apr 6, 2022

Yesterday, I think it was, I did get output on a tiny @file, just a few lines long. I immediately ran it on VR3 and got no output.

Today, I got no output on a file of about 10 or 15 lines. I also got no output on Freewin.

Leo does not block when I run pylint, and I do get the command name echoed to the console, but not to the log pane. I never see a message that pylint is finished.

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

The problem is definitely in the background process manager. It's getting to the step at the bottom of the start_process node where it invokes open_process(), which is defined this way:

    def open_process():
        return subprocess.Popen(
            command,
            shell=shell,
            stderr=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )

For this call, shell == False. But according to
How to use subprocess command with pipes:

"To use a pipe with the subprocess module, you have to pass shell=True.
However, this isn't really advisable for various reasons, not least of which is security".

I haven't checked the Python docs at this point.
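For what it's worth, that S.O. advice conflates two things: shell=True is only needed to use the shell's own pipeline syntax (cmd1 | cmd2) inside a command string, while capturing a child's output through subprocess.PIPE works fine with shell=False. A minimal sketch, using a Python one-liner as the child process:

```python
import subprocess
import sys

# Capture a child's stdout with shell=False: subprocess.PIPE does not
# require a shell; it creates an OS-level pipe directly.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    shell=False,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
out, err = proc.communicate()
print(out.strip())  # hello from child
```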

@edreamleo
Member

@tbpassin Looks like a breakthrough, Thomas.

Imo, there are no serious security implications to setting shell = True in the BPM. If someone has access enough to call LeoQtLog.put then they can do anything.

@edreamleo
Member

edreamleo commented Apr 8, 2022

More on security. From the subprocess discussion of security considerations:

"If the shell is invoked explicitly, via shell=True, it is the application’s responsibility to ensure that all whitespace and metacharacters are quoted appropriately to avoid shell injection vulnerabilities. On some platforms, it is possible to use shlex.quote() for this escaping."

Note that shlex.quote is only designed for POSIX shells; it does not handle Windows cmd quoting.

Imo, the security risks are small: we do not expect tools such as pylint, mypy, etc. to create error messages that contain shell injection attacks!
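For reference, here is what the shlex.quote() escaping mentioned in the quoted docs looks like. The filename is a made-up example, and the quoting it produces is aimed at POSIX shells:

```python
import shlex

# A hostile filename containing shell metacharacters.
fn = "my file; rm -rf ~.py"

# shlex.quote wraps the string in single quotes so the shell
# treats it as one literal argument, not as commands.
cmd = "pylint %s" % shlex.quote(fn)
print(cmd)  # pylint 'my file; rm -rf ~.py'
```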

edreamleo added a commit that referenced this issue Apr 8, 2022
@edreamleo
Member

Arrrgh. I have managed to completely screw everything up. My apologies to all.

@edreamleo edreamleo reopened this Apr 8, 2022
@edreamleo
Member

edreamleo commented Apr 8, 2022

PR #2590 sets the wait switch to True. This should ensure that all output is captured, at the cost of hanging Leo while pylint runs.

@tbpassin Let's test "wait mode" to see if it works. If so, we can then see if we can solve the hanging problem.

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

I agree about the security question. If attackers could mess with those things, there are more serious and direct bad things they could do without needing to mess with Leo.

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

What branch is that on?

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

I don't see where the output is ever supposed to get out of the pipe and back into Leo to be formatted and displayed. It should happen here in pylint.run:

    for fn, p in data:
        self.run_pylint(fn, p)
    # #1808: return the last data file.
    return data[-1] if data else False

It seems to me that either the call that runs pylint should be an async/await kind of operation, or if that's not possible it should be done on a separate thread. If not, how is control ever going to come back to the right place for further processing of pylint's messages? Does the Popen process stick the stdout pipe's output into data[-1]?

Supporting the notion that there's no way for the results to come back at the right time and place: I forced shell = True and still got nothing except a message that pylint had finished.

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

This S.O. thread may be very relevant:

Non Blocking Subprocess.call

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

Another useful discussion: PyQt: How to open a program with python without freezing

edreamleo added a commit that referenced this issue Apr 8, 2022
@edreamleo
Member

@tbpassin Please reopen as necessary.

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

With the latest changeset, I do not get any output from pylint unless I set self.wait = True in bpm.start_process(). So this changeset will not work for anyone running pylint who does not want to go in and change the value of self.wait.

@tbpassin tbpassin reopened this Apr 8, 2022
@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

Why wouldn't this be the Leo-esque way to do it?

pseudo-code:

proc = subprocess.Popen(...)
set idle time hook(callback)

def callback():
    check proc returncode
    if returncode is not None:
        cancel hook
        proc.kill()
        outs, errs = proc.communicate()
        compute_msg_string()
        log.put(msg_string)

I don't know the mechanics of setting the idle time hook on a callback, and I know you know them very well. Otherwise I'd have tried it before posting this message.
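A self-contained sketch of the pseudo-code above, with hypothetical log_put and cancel_hook callables standing in for Leo's real log pane and idle-time machinery (this is not the BPM's actual API):

```python
import subprocess
import sys

def make_poll_callback(proc, log_put, cancel_hook):
    """Build a callback an idle timer could invoke repeatedly.

    proc        -- a running subprocess.Popen.
    log_put     -- hypothetical stand-in for writing to the log pane.
    cancel_hook -- hypothetical stand-in for stopping the idle timer.
    """
    def callback():
        if proc.poll() is None:
            return  # Still running; check again on the next idle tick.
        cancel_hook()
        # The process has exited, so communicate() returns at once.
        outs, errs = proc.communicate()
        log_put(outs)
    return callback

# Usage sketch with a short-lived child.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('done')"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    universal_newlines=True,
)
results = []
cb = make_poll_callback(proc, results.append, lambda: None)
proc.wait()  # Stand-in for waiting idle ticks in this sketch.
cb()
print(results[0].strip())  # done
```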

@edreamleo
Member

@tbpassin Thanks, Thomas, for your interest and work on this issue. Everything works for me, so it's hard to say what is going on. I'm going to ask for more testers in devel.

@tbpassin
Contributor Author

tbpassin commented Apr 8, 2022

I just fetched the latest changeset in devel, and I still get no blocking, but no pylint output.

Leo 6.6.1-devel, devel branch, build 4bdce3ef48
2022-04-08 10:05:24 -0500
Python 3.9.9, PyQt version 6.2.4
Windows 10 AMD64 (build 10.0.19043) 

I see that wait gets set to False in bpm.init(). So it's a class variable, not even an instance variable. Unless bpm is a singleton class, isn't that asking for trouble (not that it's an issue for the pylint problem)?

On my system, bpm.wait = False doesn't give output; True does, with blocking.

@edreamleo
Member

@tbpassin The BPM is a singleton, so I don't think there is a problem.

@edreamleo
Member

I think I agree with most of what you are saying. I have a plan. We'll see how it works.

@edreamleo
Member

@tbpassin Please see PR #2592. The new code is dead simple. The Aha: we can call proc.communicate after the process has finished. This works for me. There should be no danger of hanging Leo.

The question is, can the process ever block? I'm not sure why it would, but we shall see.

Note: I really do not want to add a new user setting telling whether to call proc.communicate immediately, which would hang Leo.

@tbpassin
Contributor Author

tbpassin commented Apr 9, 2022

This can't work. The pipe will still fill up and stall the pylint process. It was one of the first things I tried yesterday, actually. I've just verified this running your new devel branch on vr3 [LATER:] and now the "hangs" branch too. I never get the output, and the new pylint process stalls and hangs around using no cpu, as shown by the Windows task manager. As soon as I killed the pylint process with task manager, (some of) the output appeared in the log pane. What appeared must have been how much was in the pipe. A lot was missing.

I think you need a test case that will produce lots of output. Can you change your pylint rc file to check for I1101, "c-extension-no-member"? That will produce enough output from vr3 to let you test things yourself.
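A small sketch of the stall described above, under the assumption that the OS pipe buffer is the limiting factor (commonly about 64 KiB). A child that writes much more than that blocks on write until the parent reads; communicate() avoids the deadlock because it drains the pipe while it waits:

```python
import subprocess
import sys

# A child that writes ~1.6 MB, far more than a typical pipe buffer.
child_code = "for i in range(20000): print('x' * 80)"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
# proc.wait() here would deadlock: the child blocks writing once the
# pipe buffer fills, so it can never exit.  communicate() reads the
# pipe while waiting, so all the output arrives.
out, _ = proc.communicate()
print(len(out.splitlines()))  # 20000
```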

@edreamleo
Member

As I said in the discussion group, I now have a proper test. Imo, there is no simple solution to blocked pipes, except to call proc.communicate once, just after creating the separate process.

So I have given up, for now, and set wait=True. This appears to work.

Aha: there is an entirely new approach: just run pylint (or mypy) outside of Leo, and then find some way to copy the output to the log pane so that the links become clickable! Clearly, this can be made to work!

@tbpassin
Contributor Author

Yes, I thought of that too, and it does seem workable; I was holding it in reserve in case no other solution panned out. It should probably be run using Popen or it will probably end up blocking too.

@tbpassin
Contributor Author

OK, I have a working, non-blocking threaded solution. The Popen process is checked in a separate thread. When this thread is done and the pylint output has been collected, it stores the collected lines in a bpm variable. On the timer thread, the called task checks the variable to see if it is non-empty; if it has contents, it puts those contents to the log pane. Writes to and reads from the variable are protected by a lock, so they are guaranteed to be atomic.

Here is the thread procedure:

def thrd_pipe_proc(self):
    """The threaded procedure to handle the Popen pipe.

    When the process has finished, its output data is
    collected and placed into bpm.process_return_data.
    A timer callback in bpm checks to see if this data has 
    arrived and if so, uses it.
    """
    lines = []
    while self.pid and not self.pid.poll():
        # Prevent buffer from filling, which would cause blocking
        lines.extend(self.pid.stdout.readlines(16))
        print('===', len(lines))  # Only for experimenting, remove later

    if self.pid:
        lines.extend(self.pid.stdout)
    while self.lock.locked():
        sleep(0.03)
    self.lock.acquire()
    self.process_return_data = lines
    self.lock.release()

Here is the timer callback:

def on_idle(self):
    """The idle-time callback for leo.commands.checkerCommands."""
    try:
        g.app.gui.qtApp.processEvents()  # #2528.
    except Exception:
        pass

    if self.process_return_data:
        # Wait for data to be fully written
        while self.lock.locked():
            sleep(0.03)
        for s in self.process_return_data:
            self.data.number_of_lines += 1
            self.put_log(s)

        self.end()  # End this process.
        self.start_next()  # Start the next process.

Sleeps in the thread cause blocking in the main GUI, so you want to keep any sleeps very short.

There's some setup required, of course. Here's part of it:

def start_process(self, c, command, kind,
    fn=None,
    link_pattern=None,  # None, string, or re.pattern.
    link_root=None,
):
    """
      def open_process():
        proc = subprocess.Popen(
            command,
            shell=False,  # #2586
            stderr=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        return proc

    def start_timer():  # #2528 & #2557.
        if self.wait:
            return
        if not self.timer.isActive():
            self.timer.start(100)
        thread.start_new_thread(self.thrd_pipe_proc, ())

    self.process_return_data = []
    # ... Carry on with existing code

There is some cleanup needed, and some code review to make sure I didn't drop out some functionality by mistake. And some more testing by me, of course.
Probably tomorrow I'll be able to put in a PR so others can try it out.

@edreamleo
Member

@tbpassin When I said "just run pylint (or mypy) outside of Leo" I was thinking "run pylint (or mypy) as usual, from a .cmd file." These never block.

@edreamleo
Member

@tbpassin Congratulations. You may have rescued the BPM. I'll look forward to your PR.

The acid test will be the responsiveness of Leo. If it's acceptable, I'll accept the PR. Otherwise, I'll deprecate the BPM.

I have just created #2593, which suggests that copying text into Leo's log window will automagically create clickable links. Devs could run (using scripts) pylint or mypy in separate consoles, with zero impact on Leo. If errors appear, devs would paste the links into Leo's log window. Leo would never wait!

Note: While considering whether to deprecate the BPM, searches revealed several functions that can (should!) be removed. So all this work is having knock-on benefits.

@tbpassin
Contributor Author

tbpassin commented Apr 10, 2022

Was this the intended work flow for the bpm?

  1. User starts pylint on one file;
  2. While that task is still running, user starts pylint on a second file;
  3. repeat for more files.

I'm not sure how helpful that would be, since the results for a previous run would scroll up out of sight when the later ones come in. In my experience, one needs to spend time - maybe a lot of time - checking out pylint results before moving on, and it's not very workable to try to scroll back to the right place in the log pane output to try to do that.

If the pylint command were blocking, at least that would remove that potential issue. OTOH, I always dislike blocking if it can be avoided. But if the command does not block, then the user might overlap pylint commands and that has to be provided for somehow.

@edreamleo
Member

@tbpassin Yes, that was one intended workflow. I might use this when not expecting any complaints from pylint. The queue-related code in the BPM is intended to honor various delayed requests. But your innovations are paramount.

@tbpassin
Contributor Author

I've implemented the changes and they work like a charm. I'd like you to test the branch on my fork, and also do some code review. This changeset has some diagnostic print statements which I'd remove before a PR.

I am able to run pylint on every file in the Qt-only plugins tree and continue editing and navigating in Leo while they are all getting checked (with lots of pylint messages), and Leo feels normal while I do it.

I found one anomaly that I took care of. If pylint finished with no warning messages, the checking loop never terminated. The current code monitors for this condition and breaks out of the loop if necessary.

I have tried to make minimal changes to the bpm code, but I did remove those lines in bpm.end() that duplicated the output to the log pane. I also suppressed printing the pylint output to the console. I really dislike that: it clutters up the console, it's nearly unreadable and very hard to do anything with, and it duplicates what displays in the log pane. I left the commented line of code in to make it easy to restore if you just can't live without it.

My fork is at tbp's github fork. The branch is tbp-pylint.

Here are the tests I've been doing.

  1. Run pylint against vr3.
  2. Run pylint against plugins/colorize_headlines.py (this is the file that produced no warning messages and stalled earlier versions of the code).
  3. Start pylint on vr3, and while it is running, start it (from the minibuffer) on vr.
  4. Select the Qt only plugins top node, and run pylint. All files in the tree should get pylinted successfully.

Once you agree, I'll remove the diagnostic print statements and issue a PR.

@tbpassin
Contributor Author

You may want to check my implementation with mypy. I have only ever checked it with pylint.

@tbpassin
Contributor Author

Turns out that I see stalled loops sometimes for files that do have pylint warnings. I don't really know why that is. But the loop breakout mechanism handles those cases perfectly, so I don't feel inclined to go digging into it any further.

@tbpassin
Contributor Author

I considered and discarded two design choices for the threaded process checker:

  1. Having the threaded process start the actual Popen process;
  2. Having the threaded process put the results to the log pane.

I rejected those ideas because I thought the thread should do only one thing, which is to check for completion. I knew how to pass the results back to the bpm, so there was no need to have the threaded process do the output. And anyway, the bpm is responsible for just what to do with the output, it's not the business of the process checker thread.

@tbpassin
Contributor Author

Ha! I used if not self.pid.poll(): but I should have used if self.pid.poll() is not None:. Making this change cleared up all those cases where we seemed to be missing the poll() change. I'll leave the protective loop-breakout code in place as a backup, but it doesn't seem to ever get hit any more.
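The distinction matters because poll() returns None while the process is still running and the exit code once it has finished, and a successful exit code is 0, which is falsy:

```python
import subprocess
import sys

proc = subprocess.Popen([sys.executable, "-c", "pass"])
proc.wait()  # Let the child exit cleanly (exit code 0).

# `not proc.poll()` is True both while running (None) and after a
# clean exit (0), so it cannot distinguish the two states.
print(proc.poll())          # 0
print(not proc.poll())      # True -- ambiguous
print(proc.poll() is None)  # False -- unambiguously finished
```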

@tbpassin
Contributor Author

Oops, that should have been if self.pid.poll() is None! This way we get the previous behavior, with the pylint output coming in during the loop. Apparently, with the test inadvertently reversed, the loop was skipped but we still waited for the task to finish in the line

lines.extend(self.pid.stdout)

This, however, didn't stall with a full pipe. Why not, I don't know, since it did before we split this code out into the separate thread.

Well, anyway, it's working as intended.

@tbpassin
Contributor Author

tbpassin commented Apr 10, 2022

Doh! This is the perfect place to use communicate(). It will block the thread but that's why we created the thread in the first place. So the whole thread function becomes only this:

def thrd_pipe_proc(self):
    """The threaded procedure to handle the Popen pipe.

    When the process has finished, its output data is
    collected and placed into bpm.process_return_data.
    An on_idle task in bpm checks to see if this data has 
    arrived and if so, uses it.
    """
    result, err = self.pid.communicate()
    while self.lock.locked():
        sleep(0.03)
    self.lock.acquire()
    self.process_return_data = result.split('\n')
    self.lock.release()

It works perfectly.

Please try this version in my github fork. The changeset is

5cd60c36a062a51301fc955a3ee586c849ff2b5c

There is still some blocking, but it only occurs while the pylint output is actually being put to the log pane. I'd say we can live with that!

@edreamleo
Member

@tbpassin Looks like great work. I'll test it myself today.

@edreamleo
Member

edreamleo commented Apr 10, 2022

@tbpassin Just tested the code. It seems to work. Please do create a PR.

A cleanup. The ekr-hang5 branch (it will never be used!) flattens check_process as follows:

check_count = 0

def check_process(self):
    """Check the running process, and switch if necessary."""
    self.check_count += 1
    # g.trace(self.check_count)
    if self.pid and self.pid.poll() is None:
        return
    if self.pid:
        self.end()  # End this process.
    self.start_next()  # Start the next process.

I would like you to use this as a "template". The advantages:

  • Simpler.
  • No need to guard the call to self.start_next(). There is a guard inside start_next.

I'm guessing that the result of applying this template to the PR's code would be:

check_count = 0

def check_process(self):
    """Check the running process, and switch if necessary."""
    self.check_count += 1
    # g.trace(self.check_count)
    if self.pid and self.pid.poll() is None:
        return
    if self.pid:
        for s in self.pid.stdout:
            self.put_log(s)
        self.end()  # End this process.
    self.start_next()  # Start the next process.

Another cleanup. I have deleted data.number_of_lines everywhere.

Finally, don't worry about mypy. I can always tweak that after merging the PR.

@tbpassin
Contributor Author

I actually don't use check_process at all and will remove it for the PR. The threaded equivalent is now so simple that I don't think anything can be taken away. But I'll look and see what I can apply elsewhere.

@edreamleo
Member

@tbpassin Even better!

@edreamleo
Member

Closed via PR #2595.

@edreamleo
Member

edreamleo commented Oct 11, 2022 via email

@tbpassin
Contributor Author

Wow, I'm really cold on this one so it will take me some time to get my mind around it all. I'm puzzled because what you are describing doesn't seem like what I (vaguely) remember. I thought we got rid of check_process, and in fact I don't see it in the code at all (I haven't updated from the devel branch for a week or so).

At a quick look, the only things I see to change (from my original version) would be the wait loops before lock.acquire(). I have since learned that lock.acquire() will wait until the lock is available, so those loops aren't needed.

"repeated calls to communicate reduce Leo's responsiveness" - in the code we've been using since April, there is only one call to communicate() per background process. It blocks the background thread (not Leo itself) until the background process is finished, but that's as designed and is the reason it's on a separate thread. Since (as I remember it) the BPM won't launch the next process from its queue until the first one returns, this should never cause any problem. The only activity that gets repeated by Leo is an on-idle check to see if the results are back from the background process. And that's just a check to see if the list self.process_return_data is non-empty or not. This check would not incur a noticeable delay.

I get the impression that your remarks may apply to a version of the BPM from before my April PR. Would you mind checking that, please? Could you perhaps have gone back to your old BPM branch from April but never updated it to the current code?

What branch are your current changes on? I hope it's not devel : ).
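On the lock point: acquire() does indeed block until the lock is free, so the wait loops are unnecessary; the idiomatic form is a with statement, as in this standalone sketch:

```python
import threading

lock = threading.Lock()
process_return_data = []

def writer(lines):
    # `with lock` blocks until the lock is free; no polling loop needed.
    with lock:
        process_return_data.extend(lines)

def reader():
    # Take and clear the shared list atomically.
    with lock:
        data = list(process_return_data)
        process_return_data.clear()
    return data

t = threading.Thread(target=writer, args=(["line 1", "line 2"],))
t.start()
t.join()
print(reader())  # ['line 1', 'line 2']
```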

@tbpassin
Contributor Author

As I remember it, we had a problem with communicate() because it would not always return all the data generated by the subprocess. It was as if the pipe had a very limited capacity. I see from the Python docs:

"Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited."

Also from the docs:

"If the process does not terminate after timeout seconds, a TimeoutExpired exception will be raised. Catching this exception and retrying communication will not lose any output.

The child process is not killed if the timeout expires, so in order to cleanup properly a well-behaved application should kill the child process and finish communication:

proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()

"

It seems to me that a change from my design - of calling communicate() just once on the process's thread, where it can block as long as it needs to - to some other that uses communicate() on the main thread repeatedly would entail a lot more complexity for no gain (and potentially reduce Leo's responsiveness).

@edreamleo
Member

@tbpassin Hi Thomas. Thanks for taking a look. It would be nice to get this resolved.

@tbpassin
Contributor Author

I don't know where to go from here since the code you describe seems totally different from the code that we've been using since the April PR. Could you attach an outline that causes problems with pylint for you, and I'll try it out myself?

If the only issue is duplicate results, we could always just de-dup them before writing them to the log pane. But there shouldn't be duplicate results if there is only one call to communicate(). And the April PR code calls it just once per subprocess launch.

In the work for the April PR, I learned that using a PIPE and calling communicate() to get its data could lose data since the pipe has a very limited data capacity. If it overflows before getting read, the missing data is gone forever. Calling communicate() before the process is done apparently also doesn't necessarily clear out everything that was there. Between these two behaviors, it's better to call it just once at the end. Since the call blocks until done (or until the timeout, but I don't understand the side effects of a timeout yet), I put it on its own thread.

I'm not going to update my clone of the devel branch until we get this straightened out, since the changeset I have still seems to contain the April code.
