Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Fix alerts found with lgtm #9278

Merged
merged 10 commits into from Jul 5, 2017

Conversation

@jhelie
Copy link
Contributor

@jhelie jhelie commented Jul 4, 2017

This PR fixes some alerts flagged by lgtm.com code queries. Most are minor improvements suggestions (mostly fixing some misconstructed child classes) but a few some should fix unintentional code behaviour.

The standard lgtm engineering analytics highlights dependencies, vulnerabilities and tracks introduced and fixed alerts as the code changes (and shows that scikit-learn keeps getting better) but you can investigate further by writing your own custom queries to explore your code base.

I personally find lgtm alerts most useful when using PR integration as it allows to check and deal with potential issues before they find their way into master (see for instance 5147fd0).

Full disclosure: I frequently use scikit-learn but also work on lgtm.com, hence my interest! Hope that helps, any question please ask.

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

Thanks. The string comparison by reference fix is the only one that appears to actually be a bug fix. In some cases we intentionally avoid super...__init__, but these look good to me.

Is there a way to mark LGTM alerts as "Won't fix" (e.g. the first)?

Your PR introduces flake8 errors.

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

From a glance at the alert, it looks like a number of them are valid. What's the best way to get a permalink for an alert?

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

Apart from Flake8 errors (please fix if you can), this LGTM.

@jnothman jnothman changed the title Fix alerts found with lgtm [MRG+1] Fix alerts found with lgtm Jul 5, 2017
self.n_iter = n_iter
self.callback = callback
self.callback = callback

This comment has been minimized.

@GaelVaroquaux

GaelVaroquaux Jul 5, 2017
Member

It seems to me that there is trailing whitespace here.

@GaelVaroquaux
Copy link
Member

@GaelVaroquaux GaelVaroquaux commented Jul 5, 2017

All these changes are good. I am 👍 for merge, but the style issues (flake8) should be addressed.

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

Still failing flake8, says Travis:

--------------------------------------------------------------------------------
./sklearn/decomposition/sparse_pca.py:265:33: W291 trailing whitespace
        self.callback = callback
                                ^
@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 5, 2017

@jnothman hopefully that was the last one but will update if not - replying to your other questions now!

@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 5, 2017

@jnothman permalink to alerts are obtained by clicking the (diagonal up) arrow at the end of the line highlighting the alert - while the question mark symbol next to it takes you to the code query.

e.g. for the reference vs value check the link is https://lgtm.com/projects/g/scikit-learn/scikit-learn/snapshot/1a981e87ba6f8d66ecfd6940184836446ded3f4c/files/benchmarks/bench_plot_randomized_svd.py#V185

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

yes, flake is happy.

@jnothman jnothman merged commit ac41281 into scikit-learn:master Jul 5, 2017
1 of 3 checks passed
1 of 3 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
ci/circleci Your tests passed on CircleCI!
Details
@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 5, 2017

But I should add that an alert does not need to have found its way into your code base to be referenced and discussed: if you turn on PR integration any introduced alerts will get a similar link that can be shared (obviously pushing new commits will re-trigger the analysis and that might fix the alert)

see for example: https://lgtm.com/projects/g/matplotlib/matplotlib/rev/pr-b7a573728fb8b7712d6aca3e99ef36d8f308c015 (this is an example of a fixed alert, but the idea is the same)

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

Thanks for this. I've posted another couple of LGTM-derived issues.

@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 5, 2017

@jnothman you're welcome. And regarding marking alerts as "won't fix": this is something we're actively working on, in the future you will be able to dismiss individual alerts, turn particular queries on/off and decide which parts of your code base should not be analysed (we automatically detect library, generated and test code but we might miss some).

We've recently come out of beta and we keep rolling out features, any feedback is welcome!

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

Ability to reject false positives would definitely be high on or top of the list. Other usability things I have noted:

  • the CI integration docs do not describe what circumstances lead to a red x.
  • when I view full source code for an alert, and then click back, all the alerts are collapsed again. It should be possible to hit back and see where I was up to in the listing.
  • should be able to filter alerts list by path.
@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 5, 2017

Thanks for this @jnothman! pinging @sjvs our product manager

@amueller
Copy link
Member

@amueller amueller commented Jul 5, 2017

do we want LGTM CI or do we wait for flagging false positives?
What tech are they using? the default queries are proprietary, right? Obviously it would be best if we could do this using open tools.

@sj
Copy link

@sj sj commented Jul 5, 2017

@jhelie: thanks for looping me in; @jnothman: thanks for the feedback!

Regarding CI documentation: a red x will only appear if it is not possible to analyse a revision. For Python analysis this shouldn't happen (famous last words...), but for languages like Java it's very difficult to analyse source code that doesn't compile. However, you're right: this needs to be documented more clearly; I will pass that on to our engineering and documentation teams!

I'll pass your report re. the back button on as well. Browsing and filtering alerts by path is very much on our schedule for the next few months — stay tuned!

@amueller
Copy link
Member

@amueller amueller commented Jul 5, 2017

I don't understand how lgtm infers input types. Does it use the docstring? or just fuzz it?

@sj
Copy link

@sj sj commented Jul 5, 2017

@amueller: the tech behind it is documented here: https://lgtm.com/docs/lgtm/about-lgtm. The queries are all open-sourced here: https://github.com/lgtmhq/lgtm-queries. Pull requests with improvements are really welcome!

We will soon introduce functionality running your own queries as part of PR integration; you can already get started with your own queries on the query console here: https://lgtm.com/query/project:12060052/lang:python/

We pride ourselves on our extremely low false-positive levels, made possible by the query technology and our vision: code = data. If you find any FPs: do let us know (or file a PR!). And if you'd like to know more about the tech behind it, do have a read of some of papers:

  • QL: Object-oriented Queries on Relational Data (Avgustinov et al., ECOOP 2016)
  • Type Inference for Datalog with Complex Type Hierarchies (Schäfer et al., POP 2010)
  • Type Inference for Datalog and Its Application to Query Optimisation (De Moor et al., PODS 2008)
  • Algebraic Data Types for Object-oriented Datalog (Schäfer et al.)

Any other questions: shoot me a message!

@amueller
Copy link
Member

@amueller amueller commented Jul 5, 2017

@sjvs awesome, thanks!

@sj
Copy link

@sj sj commented Jul 5, 2017

@amueller: your question about input types appeared as I was typing my previous response. The quick answer: lgtm performs points-to analysis which tracks the possible types of a variable. This analysis is being improved on a daily basis (literally as we speak).

To answer your original question:

do we want LGTM CI or do we wait for flagging false positives?

I'd say: yes, you do want that 😄.

On a more serious note: we launched lgtm.com fairly recently and are very keen to receive feedback from the community. So I'd really appreciate your thoughts and suggestions for improvements — including (especially!) on the continuous integration.

@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 5, 2017

@amueller: what @sjvs says, and please feel free to ping me if you suspect an FP (I can see a few dubious cases atm but always keen to hear from the project devs!)

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

@sj
Copy link

@sj sj commented Jul 5, 2017

Hehe — point taken, @jnothman. I'll make sure to look into increasing the update frequency soon! We're regularly polling > 50000 GitHub projects to keep the free results up-to-date for everybody, so we have to be a little bit careful 😉

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 5, 2017

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 6, 2017

Also @sjvs, have you considered supporting Cython? There's a shortage of quality control tools there, though its use is on the rise.

@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 7, 2017

@jnothman we're continuously monitoring language popularity and actively working on supporting more languages on lgtm.com so thanks for letting us know about your interest in Cython. At the moment I don't think it's scheduled just yet and I believe the C languages will come first.

Regarding your question on project activity: we're curious about this kind of questions too and I'm hoping to have time to write up a short analysis of all the projects on lgtm - there's always more data than time!

@jhelie
Copy link
Contributor Author

@jhelie jhelie commented Jul 7, 2017

At the moment it should take < 48 hours for the results of the analysis to appear - and congrats on fixing 2 alert btw :)

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 8, 2017

massich added a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
@sj
Copy link

@sj sj commented Jul 25, 2017

@jnothman: Cython certainly looks interesting. You're right, it's potentially easier to implement than C. Note that we've had C/C++ analysis support for a long time in our other products, so porting that to lgtm.com is less work than you may think it is.

I'll certainly add Cython to the list!

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants