Skip to content

Conversation

RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Apr 27, 2020

Main purpose of this PR:

To properly support keyword-only parameters. That is done in c508e89.

Secondary purpose:

Properly support default values for keyword-only parameters. That is done in c5e14f5 and the updates to the database.

Default values for positional arguments follow a rule, so if an argument has a default value, later positional arguments must also have default values.

The database only stores the actual default values, and nothing about the arguments that doesn't have default values.

This turns out to be a major problem for Argument.getKwDefault(i), since default values for keyword-only arguments doesn't have the same rule. So if you know there is one default value, you can't tell if it is associated with foo or bar, as in the examples below:

def a(*, foo=None, bar):
    pass

def b(*, foo, bar=None):
    pass

This PR (and corresponding internal extractor change) fixes this, by making Argument.getKwDefault(i) return the default value for keyword-only parameter i (instead of the i'th default for a keyword-only parameter). Argument.getDefault is changed in the same manner to keep consistency.

Currently keyword-only parameters are not handled properly :(
I want to ensure we handle when only _some_ parameters have default/annotations
The result is the same, but `getAKeywordOnlyArg` is the method used everywhere
else in the code.
Default values for positional arugments follow a rule, so if an argument has a
default value, later positional arguments must also have default values.

The database only stores the actual default values, and nothing about the
arguments that doesn't have default values.

This turns out to be a major problem for Argument.getKwDefault(i), since default
values for keyword-only arguments doesn't have the same rule. So if you know
there is one default value, you can't tell if it is associated with `foo` or
`bar`, as in the examples below:

```
def a(*, foo=None, bar):
    pass

def b(*, foo, bar=None):
    pass
```
This commit is based on a change to the extractor
@RasmusWL RasmusWL added depends on internal PR This PR should only be merged in sync with an internal Semmle PR Python labels Apr 27, 2020
@RasmusWL RasmusWL marked this pull request as ready for review May 7, 2020 09:40
@RasmusWL RasmusWL requested a review from a team as a code owner May 7, 2020 09:40
@RasmusWL
Copy link
Member Author

RasmusWL commented May 7, 2020

I tested that the upgrade script works properly,

  • first off by looking at python/ql/test/3/library-tests/functions/test.py
  • I also looked at the CodeQL DB for django and checked
    • the number of Arguments.getAKwDefault did not change from original DB when running the query from master-branch, to the upgraded db (643 results).
    • I also checked that with the new library functions in this PR, Function.getKeywordOnlyArg(_).getDefault() gave the same number of results.

@RasmusWL
Copy link
Member Author

RasmusWL commented May 7, 2020

Woops, seems like I forgot the part about 2. Run a Stats/CPP job and update ql repo with the results. 🤔 will have to figure out if it is required for this change or not 🤔

@RasmusWL
Copy link
Member Author

Our internal docs say that:

QL queries are compiled in the context of a database schema and a database statistics file. The former provides definitions of types and existential relations, while the latter provides information to the optimiser. Note in particular that QL queries are not compiled in the context of a populated database; the role of the statistics file is to be a distilled version of a prototypical database. If a statistics file is not provided, is inaccurate, or is incomplete, then the optimiser can transform queries so that they perform extremely badly.

Since this upgrade only renumbers a few items in the py_exprs relation, the cardinality clearly doesn't change. The histograms (relating how much the values in one column depend upon the values in the other), shouldn't change in any meaningful way either.

I think it's probably worth it to look into whether we should update the .dbscheme.stats file, but if we can keep it out of this PR that would be great 👍

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. I think we need to think a bit more about how we might improve the interface with the generated AST classes. I'm leaning more towards wrapping most of the methods in there so that we get more control over naming and documentation. But that's beyond the scope of this PR. 🙂

newidx =
max(int i |
exists(Parameter_ param | param = callable.getInnerScope().getKwonlyarg(i) |
param.getLocation().getStartLine() < expr.getLocation().getStartLine()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ordering bit seems mildly magical. An explanatory comment is probably in order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic

But I tried to explain in 5a18b08 :)

tausbn
tausbn previously approved these changes May 26, 2020
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think this looks good. Time to scrutinise the internals. 🔍

@RasmusWL RasmusWL requested a review from tausbn July 2, 2020 12:25
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanatory comment! This looks good to me! 🙂

@tausbn tausbn merged commit ba634af into github:master Jul 2, 2020
@RasmusWL RasmusWL deleted the python-keyword-only-args branch July 2, 2020 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
depends on internal PR This PR should only be merged in sync with an internal Semmle PR Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants