New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Fix alerts found with lgtm #9278

Merged
merged 10 commits into from Jul 5, 2017

Conversation

Projects
None yet
5 participants
@jhelie
Contributor

jhelie commented Jul 4, 2017

This PR fixes some alerts flagged by lgtm.com code queries. Most are minor improvements suggestions (mostly fixing some misconstructed child classes) but a few some should fix unintentional code behaviour.

The standard lgtm engineering analytics highlights dependencies, vulnerabilities and tracks introduced and fixed alerts as the code changes (and shows that scikit-learn keeps getting better) but you can investigate further by writing your own custom queries to explore your code base.

I personally find lgtm alerts most useful when using PR integration as it allows to check and deal with potential issues before they find their way into master (see for instance 5147fd0).

Full disclosure: I frequently use scikit-learn but also work on lgtm.com, hence my interest! Hope that helps, any question please ask.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member

Thanks. The string comparison by reference fix is the only one that appears to actually be a bug fix. In some cases we intentionally avoid super...__init__, but these look good to me.

Is there a way to mark LGTM alerts as "Won't fix" (e.g. the first)?

Your PR introduces flake8 errors.

Member

jnothman commented Jul 5, 2017

Thanks. The string comparison by reference fix is the only one that appears to actually be a bug fix. In some cases we intentionally avoid super...__init__, but these look good to me.

Is there a way to mark LGTM alerts as "Won't fix" (e.g. the first)?

Your PR introduces flake8 errors.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member

From a glance at the alert, it looks like a number of them are valid. What's the best way to get a permalink for an alert?

Member

jnothman commented Jul 5, 2017

From a glance at the alert, it looks like a number of them are valid. What's the best way to get a permalink for an alert?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member

Apart from Flake8 errors (please fix if you can), this LGTM.

Member

jnothman commented Jul 5, 2017

Apart from Flake8 errors (please fix if you can), this LGTM.

@jnothman jnothman changed the title from Fix alerts found with lgtm to [MRG+1] Fix alerts found with lgtm Jul 5, 2017

Show outdated Hide outdated sklearn/decomposition/sparse_pca.py
self.n_iter = n_iter
self.callback = callback
self.callback = callback

This comment has been minimized.

@GaelVaroquaux

GaelVaroquaux Jul 5, 2017

Member

It seems to me that there is trailing whitespace here.

@GaelVaroquaux

GaelVaroquaux Jul 5, 2017

Member

It seems to me that there is trailing whitespace here.

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Jul 5, 2017

Member

All these changes are good. I am 👍 for merge, but the style issues (flake8) should be addressed.

Member

GaelVaroquaux commented Jul 5, 2017

All these changes are good. I am 👍 for merge, but the style issues (flake8) should be addressed.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member

Still failing flake8, says Travis:

--------------------------------------------------------------------------------
./sklearn/decomposition/sparse_pca.py:265:33: W291 trailing whitespace
        self.callback = callback
                                ^
Member

jnothman commented Jul 5, 2017

Still failing flake8, says Travis:

--------------------------------------------------------------------------------
./sklearn/decomposition/sparse_pca.py:265:33: W291 trailing whitespace
        self.callback = callback
                                ^
@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 5, 2017

Contributor

@jnothman hopefully that was the last one but will update if not - replying to your other questions now!

Contributor

jhelie commented Jul 5, 2017

@jnothman hopefully that was the last one but will update if not - replying to your other questions now!

@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 5, 2017

Contributor

@jnothman permalink to alerts are obtained by clicking the (diagonal up) arrow at the end of the line highlighting the alert - while the question mark symbol next to it takes you to the code query.

e.g. for the reference vs value check the link is https://lgtm.com/projects/g/scikit-learn/scikit-learn/snapshot/1a981e87ba6f8d66ecfd6940184836446ded3f4c/files/benchmarks/bench_plot_randomized_svd.py#V185

Contributor

jhelie commented Jul 5, 2017

@jnothman permalink to alerts are obtained by clicking the (diagonal up) arrow at the end of the line highlighting the alert - while the question mark symbol next to it takes you to the code query.

e.g. for the reference vs value check the link is https://lgtm.com/projects/g/scikit-learn/scikit-learn/snapshot/1a981e87ba6f8d66ecfd6940184836446ded3f4c/files/benchmarks/bench_plot_randomized_svd.py#V185

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member

yes, flake is happy.

Member

jnothman commented Jul 5, 2017

yes, flake is happy.

@jnothman jnothman merged commit ac41281 into scikit-learn:master Jul 5, 2017

1 of 3 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
ci/circleci Your tests passed on CircleCI!
Details
@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 5, 2017

Contributor

But I should add that an alert does not need to have found its way into your code base to be referenced and discussed: if you turn on PR integration any introduced alerts will get a similar link that can be shared (obviously pushing new commits will re-trigger the analysis and that might fix the alert)

see for example: https://lgtm.com/projects/g/matplotlib/matplotlib/rev/pr-b7a573728fb8b7712d6aca3e99ef36d8f308c015 (this is an example of a fixed alert, but the idea is the same)

Contributor

jhelie commented Jul 5, 2017

But I should add that an alert does not need to have found its way into your code base to be referenced and discussed: if you turn on PR integration any introduced alerts will get a similar link that can be shared (obviously pushing new commits will re-trigger the analysis and that might fix the alert)

see for example: https://lgtm.com/projects/g/matplotlib/matplotlib/rev/pr-b7a573728fb8b7712d6aca3e99ef36d8f308c015 (this is an example of a fixed alert, but the idea is the same)

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member

Thanks for this. I've posted another couple of LGTM-derived issues.

Member

jnothman commented Jul 5, 2017

Thanks for this. I've posted another couple of LGTM-derived issues.

@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 5, 2017

Contributor

@jnothman you're welcome. And regarding marking alerts as "won't fix": this is something we're actively working on, in the future you will be able to dismiss individual alerts, turn particular queries on/off and decide which parts of your code base should not be analysed (we automatically detect library, generated and test code but we might miss some).

We've recently come out of beta and we keep rolling out features, any feedback is welcome!

Contributor

jhelie commented Jul 5, 2017

@jnothman you're welcome. And regarding marking alerts as "won't fix": this is something we're actively working on, in the future you will be able to dismiss individual alerts, turn particular queries on/off and decide which parts of your code base should not be analysed (we automatically detect library, generated and test code but we might miss some).

We've recently come out of beta and we keep rolling out features, any feedback is welcome!

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member

Ability to reject false positives would definitely be high on or top of the list. Other usability things I have noted:

  • the CI integration docs do not describe what circumstances lead to a red x.
  • when I view full source code for an alert, and then click back, all the alerts are collapsed again. It should be possible to hit back and see where I was up to in the listing.
  • should be able to filter alerts list by path.
Member

jnothman commented Jul 5, 2017

Ability to reject false positives would definitely be high on or top of the list. Other usability things I have noted:

  • the CI integration docs do not describe what circumstances lead to a red x.
  • when I view full source code for an alert, and then click back, all the alerts are collapsed again. It should be possible to hit back and see where I was up to in the listing.
  • should be able to filter alerts list by path.
@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 5, 2017

Contributor

Thanks for this @jnothman! pinging @sjvs our product manager

Contributor

jhelie commented Jul 5, 2017

Thanks for this @jnothman! pinging @sjvs our product manager

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Jul 5, 2017

Member

do we want LGTM CI or do we wait for flagging false positives?
What tech are they using? the default queries are proprietary, right? Obviously it would be best if we could do this using open tools.

Member

amueller commented Jul 5, 2017

do we want LGTM CI or do we wait for flagging false positives?
What tech are they using? the default queries are proprietary, right? Obviously it would be best if we could do this using open tools.

@sjvs

This comment has been minimized.

Show comment
Hide comment
@sjvs

sjvs Jul 5, 2017

@jhelie: thanks for looping me in; @jnothman: thanks for the feedback!

Regarding CI documentation: a red x will only appear if it is not possible to analyse a revision. For Python analysis this shouldn't happen (famous last words...), but for languages like Java it's very difficult to analyse source code that doesn't compile. However, you're right: this needs to be documented more clearly; I will pass that on to our engineering and documentation teams!

I'll pass your report re. the back button on as well. Browsing and filtering alerts by path is very much on our schedule for the next few months — stay tuned!

sjvs commented Jul 5, 2017

@jhelie: thanks for looping me in; @jnothman: thanks for the feedback!

Regarding CI documentation: a red x will only appear if it is not possible to analyse a revision. For Python analysis this shouldn't happen (famous last words...), but for languages like Java it's very difficult to analyse source code that doesn't compile. However, you're right: this needs to be documented more clearly; I will pass that on to our engineering and documentation teams!

I'll pass your report re. the back button on as well. Browsing and filtering alerts by path is very much on our schedule for the next few months — stay tuned!

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Jul 5, 2017

Member

I don't understand how lgtm infers input types. Does it use the docstring? or just fuzz it?

Member

amueller commented Jul 5, 2017

I don't understand how lgtm infers input types. Does it use the docstring? or just fuzz it?

@sjvs

This comment has been minimized.

Show comment
Hide comment
@sjvs

sjvs Jul 5, 2017

@amueller: the tech behind it is documented here: https://lgtm.com/docs/lgtm/about-lgtm. The queries are all open-sourced here: https://github.com/lgtmhq/lgtm-queries. Pull requests with improvements are really welcome!

We will soon introduce functionality running your own queries as part of PR integration; you can already get started with your own queries on the query console here: https://lgtm.com/query/project:12060052/lang:python/

We pride ourselves on our extremely low false-positive levels, made possible by the query technology and our vision: code = data. If you find any FPs: do let us know (or file a PR!). And if you'd like to know more about the tech behind it, do have a read of some of papers:

  • QL: Object-oriented Queries on Relational Data (Avgustinov et al., ECOOP 2016)
  • Type Inference for Datalog with Complex Type Hierarchies (Schäfer et al., POP 2010)
  • Type Inference for Datalog and Its Application to Query Optimisation (De Moor et al., PODS 2008)
  • Algebraic Data Types for Object-oriented Datalog (Schäfer et al.)

Any other questions: shoot me a message!

sjvs commented Jul 5, 2017

@amueller: the tech behind it is documented here: https://lgtm.com/docs/lgtm/about-lgtm. The queries are all open-sourced here: https://github.com/lgtmhq/lgtm-queries. Pull requests with improvements are really welcome!

We will soon introduce functionality running your own queries as part of PR integration; you can already get started with your own queries on the query console here: https://lgtm.com/query/project:12060052/lang:python/

We pride ourselves on our extremely low false-positive levels, made possible by the query technology and our vision: code = data. If you find any FPs: do let us know (or file a PR!). And if you'd like to know more about the tech behind it, do have a read of some of papers:

  • QL: Object-oriented Queries on Relational Data (Avgustinov et al., ECOOP 2016)
  • Type Inference for Datalog with Complex Type Hierarchies (Schäfer et al., POP 2010)
  • Type Inference for Datalog and Its Application to Query Optimisation (De Moor et al., PODS 2008)
  • Algebraic Data Types for Object-oriented Datalog (Schäfer et al.)

Any other questions: shoot me a message!

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Jul 5, 2017

Member

@sjvs awesome, thanks!

Member

amueller commented Jul 5, 2017

@sjvs awesome, thanks!

@sjvs

This comment has been minimized.

Show comment
Hide comment
@sjvs

sjvs Jul 5, 2017

@amueller: your question about input types appeared as I was typing my previous response. The quick answer: lgtm performs points-to analysis which tracks the possible types of a variable. This analysis is being improved on a daily basis (literally as we speak).

To answer your original question:

do we want LGTM CI or do we wait for flagging false positives?

I'd say: yes, you do want that 😄.

On a more serious note: we launched lgtm.com fairly recently and are very keen to receive feedback from the community. So I'd really appreciate your thoughts and suggestions for improvements — including (especially!) on the continuous integration.

sjvs commented Jul 5, 2017

@amueller: your question about input types appeared as I was typing my previous response. The quick answer: lgtm performs points-to analysis which tracks the possible types of a variable. This analysis is being improved on a daily basis (literally as we speak).

To answer your original question:

do we want LGTM CI or do we wait for flagging false positives?

I'd say: yes, you do want that 😄.

On a more serious note: we launched lgtm.com fairly recently and are very keen to receive feedback from the community. So I'd really appreciate your thoughts and suggestions for improvements — including (especially!) on the continuous integration.

@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 5, 2017

Contributor

@amueller: what @sjvs says, and please feel free to ping me if you suspect an FP (I can see a few dubious cases atm but always keen to hear from the project devs!)

Contributor

jhelie commented Jul 5, 2017

@amueller: what @sjvs says, and please feel free to ping me if you suspect an FP (I can see a few dubious cases atm but always keen to hear from the project devs!)

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member
Member

jnothman commented Jul 5, 2017

@sjvs

This comment has been minimized.

Show comment
Hide comment
@sjvs

sjvs Jul 5, 2017

Hehe — point taken, @jnothman. I'll make sure to look into increasing the update frequency soon! We're regularly polling > 50000 GitHub projects to keep the free results up-to-date for everybody, so we have to be a little bit careful 😉

sjvs commented Jul 5, 2017

Hehe — point taken, @jnothman. I'll make sure to look into increasing the update frequency soon! We're regularly polling > 50000 GitHub projects to keep the free results up-to-date for everybody, so we have to be a little bit careful 😉

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member
Member

jnothman commented Jul 5, 2017

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 5, 2017

Member
Member

jnothman commented Jul 5, 2017

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 6, 2017

Member

Also @sjvs, have you considered supporting Cython? There's a shortage of quality control tools there, though its use is on the rise.

Member

jnothman commented Jul 6, 2017

Also @sjvs, have you considered supporting Cython? There's a shortage of quality control tools there, though its use is on the rise.

@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 7, 2017

Contributor

@jnothman we're continuously monitoring language popularity and actively working on supporting more languages on lgtm.com so thanks for letting us know about your interest in Cython. At the moment I don't think it's scheduled just yet and I believe the C languages will come first.

Regarding your question on project activity: we're curious about this kind of questions too and I'm hoping to have time to write up a short analysis of all the projects on lgtm - there's always more data than time!

Contributor

jhelie commented Jul 7, 2017

@jnothman we're continuously monitoring language popularity and actively working on supporting more languages on lgtm.com so thanks for letting us know about your interest in Cython. At the moment I don't think it's scheduled just yet and I believe the C languages will come first.

Regarding your question on project activity: we're curious about this kind of questions too and I'm hoping to have time to write up a short analysis of all the projects on lgtm - there's always more data than time!

@jhelie

This comment has been minimized.

Show comment
Hide comment
@jhelie

jhelie Jul 7, 2017

Contributor

At the moment it should take < 48 hours for the results of the analysis to appear - and congrats on fixing 2 alert btw :)

Contributor

jhelie commented Jul 7, 2017

At the moment it should take < 48 hours for the results of the analysis to appear - and congrats on fixing 2 alert btw :)

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 8, 2017

Member
Member

jnothman commented Jul 8, 2017

massich added a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
@sjvs

This comment has been minimized.

Show comment
Hide comment
@sjvs

sjvs Jul 25, 2017

@jnothman: Cython certainly looks interesting. You're right, it's potentially easier to implement than C. Note that we've had C/C++ analysis support for a long time in our other products, so porting that to lgtm.com is less work than you may think it is.

I'll certainly add Cython to the list!

sjvs commented Jul 25, 2017

@jnothman: Cython certainly looks interesting. You're right, it's potentially easier to implement than C. Note that we've had C/C++ analysis support for a long time in our other products, so porting that to lgtm.com is less work than you may think it is.

I'll certainly add Cython to the list!

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+1] Fix alerts found with lgtm (#9278)
* compare values rather than references

* add __ne__ to class

* add missing self parameter

* properly initialise child class using super()

* remove useless loop variables and assigments

* remove useless return statements

* respect PEP8 style

* fix another flake8 check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment