
Flush buffered errors after each file is processed #1294

Closed
gvanrossum opened this issue Mar 16, 2016 · 15 comments

@gvanrossum
Member

Currently we only flush errors at the very end of a run. It might be nicer if we flushed errors after each file is fully processed (i.e. once the type checking pass has finished). We buffer errors so we can sort them and de-dupe them, but there shouldn't be any new errors for a file once it's been processed.

@gvanrossum gvanrossum modified the milestone: 0.3.3 Mar 17, 2016
@rawrgulmuffins

I'm going to try to take on this bug. I'll update my progress once I've figured out what I think is a good solution and then I'll come back with a reference to my PR once it's ready.

@rawrgulmuffins

rawrgulmuffins commented Jun 3, 2016

I wasn't sure at first whether this was about unit tests or about error reporting, so I asked @gnprice for some help on this issue.

It turns out that, since the build now works on strongly connected components (SCCs) of the import graph, there's no longer a notion of processing one file at a time. So instead of trying to write errors out at the end of each file, we'll print them at the end of each strongly connected component.

Changes I intend to make

I'm going to change main.py so that when it calls build.build it passes an output_callback function. This callback will be invoked in build.py's process_graph, which is what processes the strongly connected components.

At the end of processing strongly connected components there shouldn't be any new errors. As such we should be able to output all of the error messages in a deduped manner.
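Roughly the shape I have in mind (these are simplified stand-ins rather than mypy's real build.build / process_graph signatures, and output_callback is just the name I'm proposing):

from typing import Callable, List

Messages = List[str]

# main.py side: pass a callback that prints whatever build hands back.
def main(sources: List[str]) -> None:
    build(sources, output_callback=lambda msgs: print('\n'.join(msgs)))

# build.py side: process_graph would invoke the callback after each SCC,
# since no new errors for those modules can appear once the SCC is done.
def build(sources: List[str],
          output_callback: Callable[[Messages], None]) -> None:
    for scc in [[s] for s in sources]:   # stand-in for the real SCC decomposition
        errors = ['%s: error: ...' % mod for mod in scc]   # stand-in for type checking
        output_callback(sorted(set(errors)))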

Test Plan

I need to ensure that in the scenario where File A imports File B, the output is:

"Start of File B Processing"
"File B output"
"Start of File A Processing"
"File A output"

In order to achieve this I'm going to mock the call to process_graph (which is called by dispatch, which is in turn called by build.build) so that it prints the file it's processing as its first action. Then I'll test to make sure we get error output from File B in the manner we expect, then the start of File A, then File A's output.
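A self-contained toy version of that ordering check (fake_build here is a stand-in for the mocked build path, not real mypy code):

import unittest


def fake_build(files, output_callback, log):
    # Stand-in for the mocked build path: files arrive in dependency order,
    # and each one announces itself before its buffered errors are flushed.
    for name, errors in files:
        log.append('Start of File %s Processing' % name)
        output_callback(errors)


class TestFlushPerFile(unittest.TestCase):
    def test_b_flushed_before_a_starts(self):
        log = []
        # File A imports File B, so B is processed (and flushed) first.
        files = [('B', ['File B output']), ('A', ['File A output'])]
        fake_build(files, output_callback=log.extend, log=log)
        self.assertEqual(log, ['Start of File B Processing', 'File B output',
                               'Start of File A Processing', 'File A output'])


if __name__ == '__main__':
    unittest.main()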

Refactor

It's also been requested that, since I'm already changing main.py, I also rename the one-letter variables (a, e, f) and remove the type_check_only function, since its only line is a call to build.build.

@gvanrossum
Member Author

gvanrossum commented Jun 3, 2016 via email

@rawrgulmuffins

@gvanrossum Cool, thank you. I'll look for which line marks the end of an SCC. I'm going to work on the unit test after lunch.

@gvanrossum
Member Author

gvanrossum commented Jun 3, 2016 via email

@rawrgulmuffins

I've been struggling with this problem all day. The issue is that the callback way of doing things breaks the way the unit tests are currently structured.

testcheck.py:            res = build.build(target=build.TYPE_CHECK,
testsemanal.py:        result = build.build(target=build.SEMANTIC_ANALYSIS,
testsemanal.py:        res = build.build(target=build.SEMANTIC_ANALYSIS,
testsemanal.py:            result = build.build(target=build.SEMANTIC_ANALYSIS,
testsemanal.py:            result = build.build(target=build.SEMANTIC_ANALYSIS,
testtransform.py:        result = build.build(target=build.SEMANTIC_ANALYSIS,
testtypegen.py:            result = build.build(target=build.TYPE_CHECK,

There are a bunch of tests that rely on the build.build function returning a result. The problem is that if I want to stream results back, I need to remove the errors from the internal manager that's used. This means that the returned result won't have any errors in it and none of the current tests will work.

So my plan is to keep the current call graph for the tests (everything works just like it currently does) and have a different call path for printing results.

Currently I've changed the test suite to pass no-op callback functions (lambda: None) and I haven't changed the basic return path. But there's a part of me that wonders whether this should actually yield results from inside the process_graph for loop instead.
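To make the two shapes concrete (toy stand-ins only; none of these names are mypy's real internals):

from typing import Callable, Iterator, List

Messages = List[str]

def check_scc(scc: List[str]) -> Messages:
    # Stand-in for type-checking one strongly connected component.
    return ['%s: error: ...' % mod for mod in scc]

# Callback style: tests that don't care about streaming pass a no-op callback.
def process_graph(sccs: List[List[str]],
                  output_callback: Callable[[Messages], None] = lambda m: None) -> None:
    for scc in sccs:
        output_callback(check_scc(scc))

# Generator style: the caller decides whether to print each batch or collect them all.
def process_graph_iter(sccs: List[List[str]]) -> Iterator[Messages]:
    for scc in sccs:
        yield check_scc(scc)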

@rawrgulmuffins

The hardest part of this has just been figuring out how I'm going to actually test this.

@gvanrossum
Member Author

gvanrossum commented Jun 4, 2016 via email

@rawrgulmuffins

rawrgulmuffins commented Jun 4, 2016

Here's my skeleton. It doesn't include the changes to main yet, but it does cover the build.build call graph down to process_graph.

All of the unit tests pass and there's no fundamental change to the code base other than adding the possibility of running the callback function at the end of the SCC loop.

Quick note: I had to move the build manager in order to get the type checking to pass, which is why the line count is so long.

@gnprice
Collaborator

gnprice commented Jun 4, 2016

@rawrgulmuffins and I chatted for a bit about this. The tricky bit to test is the end-to-end fact that we manage to spit out the errors for file B before doing the work to type-check A, if A -> B is the import graph. To do that, he's been looking for a way to log something in or next to the type-checking work; usually he'd use the logging module, adding a debug-level log line and capturing that in the test.

We don't currently use logging (though likely it'd be useful for us to do so!), but happily there is the DUMP_INFER_STATS flag, which causes us to spit out some statistics right after type-checking a given file. So I suggested setting that for the test and looking for that line.

@gvanrossum
Member Author

Maybe you can just use a custom output_callback that records something the
unit test can check for? You may have to write a unit test for this in
Python rather than just use the check*.test files.
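Something along these lines (purely illustrative; build.build doesn't take an output_callback argument today):

from typing import List

recorded: List[List[str]] = []

def recording_callback(messages: List[str]) -> None:
    recorded.append(list(messages))

# The Python-level test would hand recording_callback to build.build(...)
# and afterwards assert on the flush order, e.g. that file B's errors
# arrived in the first batch:
#     assert recorded[0] and recorded[0][0].startswith('b')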


@rawrgulmuffins

rawrgulmuffins commented Jun 4, 2016

As an update, I have a failing unit test. I've set things up so that I'm temporarily setting manager.flags to include DUMP_INFER_STATS and using that as the sentinel value. I need to head for my train now, but I think I have a pretty straight path to finishing this. Potentially done tomorrow or early this week.

I haven't made a PR for it but all of my code is on the branch that I referenced earlier.

@gvanrossum
Member Author

gvanrossum commented Jun 4, 2016 via email

@gvanrossum gvanrossum modified the milestones: Future, 0.4.x Jul 7, 2016
@gvanrossum gvanrossum removed this from the Future milestone Mar 29, 2017
@gvanrossum
Member Author

So this is still open, and rawrgulmuffins' efforts have come to naught. It would be a simple win, though, for large codebases like the one we have at Dropbox, where it's often frustrating to have to wait several extra minutes for the rest of the codebase to be processed when a simple (non-blocking) error is detected.

Out of an abundance of caution it may be best to flush errors only after each SCC has been processed. For codebases with no or few import cycles this comes down to the same thing (each file will be in its own SCC, or perhaps there will be a few SCCs containing a handful of files each), and due to the way SCCs are processed (see process_stale_scc()) we won't know for sure that we've generated all the errors for a file until its entire SCC has been processed.

@gvanrossum
Member Author

Note that if we do this, generate_unused_ignore_notes() also would have to be done on a per-SCC basis.

msullivan added a commit that referenced this issue Dec 20, 2017
In order to avoid duplicate error messages for errors produced in both
load_graph() and process_graph() and to prevent misordered error
messages in a number of places, lists of error messages are now
tracked per-file.

These lists are collected and printed out when a file is complete.  To
maintain consistency with clients that use .messages() (namely,
tests), messages are generated file-at-a-time even when not printing
them out incrementally.

Fixes #1294
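
An illustrative sketch of that bookkeeping (the names here are invented, not mypy's actual Errors internals): messages are kept in per-file lists, flushed when a file completes, and the same file-at-a-time grouping backs the final .messages() result.

from typing import Dict, List

class FileErrorLists:
    def __init__(self, print_incrementally: bool = True) -> None:
        self.by_file: Dict[str, List[str]] = {}
        self.order: List[str] = []
        self.print_incrementally = print_incrementally

    def add(self, path: str, message: str) -> None:
        if path not in self.by_file:
            self.by_file[path] = []
            self.order.append(path)
        self.by_file[path].append(message)

    def file_complete(self, path: str) -> None:
        # Flush this file's batch now if we are streaming output; either way
        # the messages stay grouped per file.
        if self.print_incrementally:
            for line in self.by_file.get(path, []):
                print(line)

    def messages(self) -> List[str]:
        # File-at-a-time ordering, matching what clients such as the test
        # suite see whether or not messages were printed incrementally.
        return [line for path in self.order for line in self.by_file[path]]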
msullivan added a commit that referenced this issue Jan 4, 2018
gvanrossum pushed a commit that referenced this issue Jan 9, 2018