
Flush buffered errors after each file is processed #1294

Closed
gvanrossum opened this issue Mar 16, 2016 · 15 comments

@gvanrossum
Member

Currently we only flush errors at the very end of a run. It might be nicer if we flushed errors after each file is fully processed (i.e. once the type checking pass has finished). We buffer errors so we can sort them and de-dupe them, but there shouldn't be any new errors for a file once it's been processed.

@gvanrossum gvanrossum modified the milestone: 0.3.3 Mar 17, 2016
@rawrgulmuffins

I'm going to try to take on this bug. I'll update my progress once I've figured out what I think is a good solution and then I'll come back with a reference to my PR once it's ready.

@rawrgulmuffins

rawrgulmuffins commented Jun 3, 2016

I wasn't sure at first whether this was about unit tests or about error reporting, so I asked @gnprice for some help on this issue.

It turns out that, since the build now works on strongly connected components (SCCs) of the import graph, there's no longer a notion of processing one file at a time. So instead of trying to write errors out at the end of each file, we'll print them at the end of each strongly connected component.

Changes I intend to make

I'm going to change main.py so that when it calls build.build it passes an output_callback function. This callback will be invoked in build.py's process_graph, which is what processes the strongly connected components.

At the end of processing strongly connected components there shouldn't be any new errors. As such we should be able to output all of the error messages in a deduped manner.
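Roughly the shape I have in mind (these are simplified stand-ins rather than mypy's real build.build / process_graph signatures, and output_callback is just the name I'm proposing):

from typing import Callable, List

Messages = List[str]

# main.py side: pass a callback that prints whatever build hands back.
def main(sources: List[str]) -> None:
    build(sources, output_callback=lambda msgs: print('\n'.join(msgs)))

# build.py side: process_graph would invoke the callback after each SCC,
# since no new errors for those modules can appear once the SCC is done.
def build(sources: List[str],
          output_callback: Callable[[Messages], None]) -> None:
    for scc in [[s] for s in sources]:   # stand-in for the real SCC decomposition
        errors = ['%s: error: ...' % mod for mod in scc]   # stand-in for type checking
        output_callback(sorted(set(errors)))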

Test Plan

I need to ensure that in the scenario where File A imports File B, the output is:

"Start of File B Processing"
"File B output"
"Start of File A Processing"
"File A output"

In order to achieve this I'm going to mock the call to process_graph (which is called by dispatch, which is in turn called by build.build) so that it prints the file it's processing as its first action. Then I'll test to make sure we get error output from File B in the manner we expect, then the start of File A, then File A's output.
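A self-contained toy version of that ordering check (fake_build here is a stand-in for the mocked build path, not real mypy code):

import unittest


def fake_build(files, output_callback, log):
    # Stand-in for the mocked build path: files arrive in dependency order,
    # and each one announces itself before its buffered errors are flushed.
    for name, errors in files:
        log.append('Start of File %s Processing' % name)
        output_callback(errors)


class TestFlushPerFile(unittest.TestCase):
    def test_b_flushed_before_a_starts(self):
        log = []
        # File A imports File B, so B is processed (and flushed) first.
        files = [('B', ['File B output']), ('A', ['File A output'])]
        fake_build(files, output_callback=log.extend, log=log)
        self.assertEqual(log, ['Start of File B Processing', 'File B output',
                               'Start of File A Processing', 'File A output'])


if __name__ == '__main__':
    unittest.main()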

Refactor

It's also been requested that, since I'm already changing main.py, I also rename the one-letter variables (a, e, f) and remove the type_check_only function, since its only line is a call to build.build.

@gvanrossum
Member Author

gvanrossum commented Jun 3, 2016 via email

@rawrgulmuffins

@gvanrossum Cool, thank you. I'll look for which line marks the end of an SCC. I'm going to work on the unit test after lunch.

@gvanrossum
Member Author

gvanrossum commented Jun 3, 2016 via email

@rawrgulmuffins

I've been struggling with this problem all day. The issue is that the callback way of doing things breaks the way the unit tests are currently structured.

testcheck.py:            res = build.build(target=build.TYPE_CHECK,
testsemanal.py:        result = build.build(target=build.SEMANTIC_ANALYSIS,
testsemanal.py:        res = build.build(target=build.SEMANTIC_ANALYSIS,
testsemanal.py:            result = build.build(target=build.SEMANTIC_ANALYSIS,
testsemanal.py:            result = build.build(target=build.SEMANTIC_ANALYSIS,
testtransform.py:        result = build.build(target=build.SEMANTIC_ANALYSIS,
testtypegen.py:            result = build.build(target=build.TYPE_CHECK,

There are a bunch of tests that rely on the build.build function returning a result. The problem is that if I want to stream results back, I need to remove the errors from the internal manager that's used. This means that the returned result won't have any errors in it and none of the current tests will work.

So my plan is to keep the current call graph for the tests (everything works just like it currently does) and have a different call path for printing results.

Currently I've changed the test suite to pass no-op callback functions (lambda: None) and I haven't changed the basic return path. But there's a part of me that wonders whether this should actually yield results from inside the process_graph for loop instead.
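To make the two shapes concrete (toy stand-ins only; none of these names are mypy's real internals):

from typing import Callable, Iterator, List

Messages = List[str]

def check_scc(scc: List[str]) -> Messages:
    # Stand-in for type-checking one strongly connected component.
    return ['%s: error: ...' % mod for mod in scc]

# Callback style: tests that don't care about streaming pass a no-op callback.
def process_graph(sccs: List[List[str]],
                  output_callback: Callable[[Messages], None] = lambda m: None) -> None:
    for scc in sccs:
        output_callback(check_scc(scc))

# Generator style: the caller decides whether to print each batch or collect them all.
def process_graph_iter(sccs: List[List[str]]) -> Iterator[Messages]:
    for scc in sccs:
        yield check_scc(scc)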

@rawrgulmuffins

The hardest part of this has just been figuring out how I'm going to actually test this.

@gvanrossum
Member Author

gvanrossum commented Jun 4, 2016 via email

@rawrgulmuffins

rawrgulmuffins commented Jun 4, 2016

Here's my skeleton. It doesn't include the changes to main yet, but it does cover the build.build call graph down to process_graph.

All of the unit tests pass and there's no fundamental change to the code base other than adding the possibility of running the callback function at the end of the SCC loop.

Quick note: I had to move the build manager in order to get the type checking to pass, which is why the line count is so long.

@gnprice
Collaborator

gnprice commented Jun 4, 2016

@rawrgulmuffins and I chatted for a bit about this. The tricky bit to test is the end-to-end fact that we manage to spit out the errors for file B before doing the work to type-check A, if A -> B is the import graph. To do that, he's been looking for a way to log something in or next to the type-checking work; usually he'd use the logging module, adding a debug-level log line and capturing that in the test.

We don't currently use logging (though likely it'd be useful for us to do so!), but happily there is the DUMP_INFER_STATS flag, which causes us to spit out some statistics right after type-checking a given file. So I suggested setting that for the test and looking for that line.

@gvanrossum
Member Author

Maybe you can just use a custom output_callback that records something the
unit test can check for? You may have to write a unit test for this in
Python rather than just use the check*.test files.
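Something along these lines (purely illustrative; build.build doesn't take an output_callback argument today):

from typing import List

recorded: List[List[str]] = []

def recording_callback(messages: List[str]) -> None:
    recorded.append(list(messages))

# The Python-level test would hand recording_callback to build.build(...)
# and afterwards assert on the flush order, e.g. that file B's errors
# arrived in the first batch:
#     assert recorded[0] and recorded[0][0].startswith('b')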


@rawrgulmuffins

rawrgulmuffins commented Jun 4, 2016

As an update, I have a failing unit test. I've set things up so that I'm temporarily setting manager.flags to include DUMP_INFER_STATS and using that as the sentinel value. I need to head for my train now, but I think I have a pretty straight path to finishing this. Potentially done tomorrow or early this week.

I haven't made a PR for it but all of my code is on the branch that I referenced earlier.

@gvanrossum
Member Author

gvanrossum commented Jun 4, 2016 via email

@gvanrossum gvanrossum modified the milestones: Future, 0.4.x Jul 7, 2016
@gvanrossum gvanrossum removed this from the Future milestone Mar 29, 2017
@gvanrossum
Member Author

So this is still open, and rawrgulmuffins' efforts have come to naught. It would be a simple win, though, for large codebases like the one we have at Dropbox, where it's often frustrating to have to wait several extra minutes for the rest of the codebase to be processed when a simple (non-blocking) error is detected.

Out of an abundance of caution it may be best to flush errors only after each SCC has been processed. For codebases with no or few import cycles this comes down to the same thing (each file will be in its own SCC, or perhaps there will be a few SCCs containing a handful of files each), and due to the way SCCs are processed (see process_stale_scc()) we won't know for sure that we've generated all the errors for a file until its entire SCC has been processed.

@gvanrossum
Member Author

Note that if we do this, generate_unused_ignore_notes() also would have to be done on a per-SCC basis.

msullivan added a commit that referenced this issue Dec 20, 2017
In order to avoid duplicate error messages for errors produced in both
load_graph() and process_graph() and to prevent misordered error
messages in a number of places, lists of error messages are now
tracked per-file.

These lists are collected and printed out when a file is complete.  To
maintain consistency with clients that use .messages() (namely,
tests), messages are generated file-at-a-time even when not printing
them out incrementally.

Fixes #1294
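
An illustrative sketch of that bookkeeping (the names here are invented, not mypy's actual Errors internals): messages are kept in per-file lists, flushed when a file completes, and the same file-at-a-time grouping backs the final .messages() result.

from typing import Dict, List

class FileErrorLists:
    def __init__(self, print_incrementally: bool = True) -> None:
        self.by_file: Dict[str, List[str]] = {}
        self.order: List[str] = []
        self.print_incrementally = print_incrementally

    def add(self, path: str, message: str) -> None:
        if path not in self.by_file:
            self.by_file[path] = []
            self.order.append(path)
        self.by_file[path].append(message)

    def file_complete(self, path: str) -> None:
        # Flush this file's batch now if we are streaming output; either way
        # the messages stay grouped per file.
        if self.print_incrementally:
            for line in self.by_file.get(path, []):
                print(line)

    def messages(self) -> List[str]:
        # File-at-a-time ordering, matching what clients such as the test
        # suite see whether or not messages were printed incrementally.
        return [line for path in self.order for line in self.by_file[path]]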
msullivan added a commit that referenced this issue Jan 4, 2018
gvanrossum pushed a commit that referenced this issue Jan 9, 2018