Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve type inference for user code #25103

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

oscarbenjamin
Copy link
Contributor

@oscarbenjamin oscarbenjamin commented May 3, 2023

References to other Issues or PRs

See also long discussion in gh-17945

Brief description of what is fixed or changed

Adds type hints for various attributes and methods of Basic and Expr. Adds a new DefinedFunction class as a subclass of Function that should be the base class for defined functions like sin, cos etc.

These changes are enough for simple end user code involving sympy expressions to be statically analysable so that code completion and editor warnings can work with e.g. the pyright language server that is often used with vscode and other editors (I use pyright from within vim myself now).

It is also possible to use pyright as a type checker e.g.:

pip install pyright
pyright sympy

(You need to have node and npm installed for this to work.)

Running pyright like this over the codebase takes about 5 minutes on this computer and for master shows:

42405 errors, 926 warnings, 0 informations

With this PR it is:

26598 errors, 39 warnings, 0 informations

So this removes around half of all of the "errors" in the codebase and pretty much all the warnings. Most of the other errors are probably not that hard to improve with some type declarations but each case requires a bit of analysis and I thought I would stop here before making the diff any larger.

Other comments

It is increasingly common that users of sympy will use an editor like vscode that performs analysis of the code either through type inference or using type hints using a langauge server like pyright. Personally I don't use vscode but I do have the pyright language server running inside vim. The sympy codebase does too many strange things for type inference to work without some type hints. What that means is that if you have pyright running in your editor then just opening a file that uses sympy will cause pyright to consume huge amounts of RAM and CPU while it tries to analyse the sympy codebase. Even then it fails and so the result looks something like this:
image
In other words ordinary code using sympy shows up as being full of errors. The errors are because type inference fails even for basic things like cos(1). It is even worse if you open up a sympy test suite file e.g. test_manual.py:
image

The number one thing that confuses pyright is Function.__new__ and I have looked at how to make accurate type hints for Function.__new__ but it is impossible because it basically works like this:

class Function(Expr):
    def __new__(cls, *args, **kwargs):
        if cls is Function:
            # f = Function('f'), returns an Expr subclass
            return dynamically_created_subclass_of_AppliedUndef
        else:
            # expr = cos(1), returns an Expr instance
            return super().__new__(cls, *args, **kwargs)

class sin(Function):
    # sin is a Function subclass

f = Function('f')  # f is an Expr subclass

What this __new__ method does is that it returns two completely different types of object. More confusingly Function.__new__ returns a completely different type of object from any FunctionSubclass.__new__ method. There is just no way to type a method such that it returns a different tpye when called from a subclass.

The only way that I could make sense of this was to introduce a new class:

class Function(Expr):
    def __new__(cls, *args, **kwargs) -> Type[AppliedUndef]:
        if cls is Function:
            # f = Function('f'), returns an Expr subclass
            return dynamically_created_subclass_of_AppliedUndef
        else:
            # Tell the type checker to ignore this. It will only be used by
            # a direct subclass of Function but not by any subclass of 
            # DefinedFunction. We only need to keep this for compatibility.
            return cls._new_(*args, **kwargs)  # type: ignore

    @classmethod
    def _new_(cls, *args, **kwargs) -> Expr:
        return super().__new__(cls, *args, **kwargs)

class DefinedFunction(Function):
    # Override Function.__new__
    def __new__(cls, *args, **kwargs) -> Expr:
        return cls._new_(*args, **kwargs)

class sin(DefinedFunction):
     ...

f = Function('f')  # f is an Expr subclass

Most of the lines changed in the diff are just changing many classes to subclass DefinedFunction instead of Function.

With these changes pyright understands the types of sympy expressions so that it does not report errors and can handle autocompletion etc:
image
image
image
Obviously this kind of completion/help only works if the language server can infer something about the types of variables like e2 in the code.

Release Notes

  • core
    • Type hints have been added to improve type inference for code that uses sympy expressions. This should improve code completion and help tips for end users using an editor such as vscode that typically uses the pyright language server.

@oscarbenjamin oscarbenjamin added core mypy pyright Related to pyright, pylance and their vscode extensions labels May 3, 2023
@sympy-bot
Copy link

sympy-bot commented May 3, 2023

Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • core
    • Type hints have been added to improve type inference for code that uses sympy expressions. This should improve code completion and help tips for end users using an editor such as vscode that typically uses the pyright language server. (#25103 by @oscarbenjamin)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.13.

Click here to see the pull request description that was parsed.
<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->

See also long discussion in gh-17945

#### Brief description of what is fixed or changed

Adds type hints for various attributes and methods of Basic and Expr. Adds a new `DefinedFunction` class as a subclass of `Function` that should be the base class for defined functions like `sin`, `cos` etc.

These changes are enough for simple end user code involving sympy expressions to be statically analysable so that code completion and editor warnings can work with e.g. the pyright language server that is often used with vscode and other editors (I use pyright from within vim myself now).

It is also possible to use pyright as a type checker e.g.:
```
pip install pyright
pyright sympy
```
(You need to have node and npm installed for this to work.)

Running pyright like this over the codebase takes about 5 minutes on this computer and for master shows:
```
42405 errors, 926 warnings, 0 informations
```
With this PR it is:
```
26598 errors, 39 warnings, 0 informations
```
So this removes around half of all of the "errors" in the codebase and pretty much all the warnings. Most of the other errors are probably not that hard to improve with some type declarations but each case requires a bit of analysis and I thought I would stop here before making the diff any larger.

#### Other comments

It is increasingly common that users of sympy will use an editor like vscode that performs analysis of the code either through type inference or using type hints using a langauge server like pyright. Personally I don't use vscode but I do have the pyright language server running inside vim. The sympy codebase does too many strange things for type inference to work without some type hints. What that means is that if you have pyright running in your editor then just opening a file that uses sympy will cause pyright to consume huge amounts of RAM and CPU while it tries to analyse the sympy codebase. Even then it fails and so the result looks something like this:
![image](https://user-images.githubusercontent.com/1159732/236049488-483bc1c8-bc34-4ff0-a4bf-0ea2ee04b7aa.png)
In other words ordinary code using sympy shows up as being full of errors. The errors are because type inference fails even for basic things like `cos(1)`. It is even worse if you open up a sympy test suite file e.g. `test_manual.py`:
![image](https://user-images.githubusercontent.com/1159732/236054374-a21e7597-508e-4afb-9e86-e4df7796af8b.png)

The number one thing that confuses pyright is `Function.__new__` and I have looked at how to make accurate type hints for `Function.__new__` but it is impossible because it basically works like this:
```python
class Function(Expr):
    def __new__(cls, *args, **kwargs):
        if cls is Function:
            # f = Function('f'), returns an Expr subclass
            return dynamically_created_subclass_of_AppliedUndef
        else:
            # expr = cos(1), returns an Expr instance
            return super().__new__(cls, *args, **kwargs)

class sin(Function):
    # sin is a Function subclass

f = Function('f')  # f is an Expr subclass
```
What this `__new__` method does is that it returns two completely different types of object. More confusingly `Function.__new__` returns a completely different type of object from any `FunctionSubclass.__new__` method. There is just no way to type a method such that it returns a different tpye when called from a subclass.

The only way that I could make sense of this was to introduce a new class:
```python
class Function(Expr):
    def __new__(cls, *args, **kwargs) -> Type[AppliedUndef]:
        if cls is Function:
            # f = Function('f'), returns an Expr subclass
            return dynamically_created_subclass_of_AppliedUndef
        else:
            # Tell the type checker to ignore this. It will only be used by
            # a direct subclass of Function but not by any subclass of 
            # DefinedFunction. We only need to keep this for compatibility.
            return cls._new_(*args, **kwargs)  # type: ignore

    @classmethod
    def _new_(cls, *args, **kwargs) -> Expr:
        return super().__new__(cls, *args, **kwargs)

class DefinedFunction(Function):
    # Override Function.__new__
    def __new__(cls, *args, **kwargs) -> Expr:
        return cls._new_(*args, **kwargs)

class sin(DefinedFunction):
     ...

f = Function('f')  # f is an Expr subclass
```
Most of the lines changed in the diff are just changing many classes to subclass `DefinedFunction` instead of `Function`.

With these changes pyright understands the types of sympy expressions so that it does not report errors and can handle autocompletion etc:
![image](https://user-images.githubusercontent.com/1159732/236052375-f9b73b23-3a77-432f-97d8-82adb97436bd.png)
![image](https://user-images.githubusercontent.com/1159732/236052616-f2e1f155-cdc3-4a1e-92b7-f46efba76293.png)
![image](https://user-images.githubusercontent.com/1159732/236053033-98842f46-f4f9-46fb-9d86-bd736393a90f.png)
Obviously this kind of completion/help only works if the language server can infer something about the types of variables like `e2` in the code.

#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers. Formerly, `log(-x)` incorrectly gave `-log(x)`.

* physics.units
  * Corrected a semantical error in the conversion between volt and statvolt which
    reported the volt as being larger than the statvolt.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
* core
    * Type hints have been added to improve type inference for code that uses sympy expressions. This should improve code completion and help tips for end users using an editor such as vscode that typically uses the pyright language server.
<!-- END RELEASE NOTES -->

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

Adds a new DefinedFunction class as a subclass of Function that should be the base class for defined functions like sin, cos etc.

I'm not a fan of this. If I understand the code correctly, it currently isn't a backwards compatibility break, in that things that subclass Function will also still work. But the temptation is certainly there now to do that. It also in general just feels like the sort of change that just adds extra complexity for no reason other than to satisfy mypy.

Is there way to make mypy default to thinking that Function subclasses are actually Function, so that the only thing that wouldn't work is UndefinedFunctions?

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

Is there way to make mypy default to thinking that Function subclasses are actually Function, so that the only thing that wouldn't work is UndefinedFunctions?

Specifically does just inverting the if check to check if cls is not Function first work? It seems like it's just picking the first return type it can find.

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

To be fair, I do think the overloading of Function is a bad design, and would have preferred if it were avoided. I would be a little happier with refactoring here if it were to actually go all the way with something better (e.g., #4787).

@oscarbenjamin
Copy link
Contributor Author

Is there way to make mypy default to thinking that Function subclasses are actually Function, so that the only thing that wouldn't work is UndefinedFunctions?

Specifically does just inverting the if check to check if cls is not Function first work? It seems like it's just picking the first return type it can find.

Firstly we are talking about pyright here rather than mypy. If you want to see how this works then I suggest trying out vscode with the standard Python language plugins.

As for the suggestion of inverting the if check no that won't work because we also need to be able to do this:

f = Function('f')
e = f(1)

Here any type analysis tool needs to understand that f (which is returned from Function.__new__) is an Expr subclass so that e is an Expr instance. The type analysis needs to understand both of the types that are returned by Function.__new__ and both of the ways in which Function is used.

To be fair, I do think the overloading of Function is a bad design, and would have preferred if it were avoided.

Agreed. A lot of the complexity here comes from two things:

  1. Undefined functions are classes
  2. Function is overloaded to return an undefined function or be a superclass for defined functions.

Both of those are design flaws. Neither is fixable without breaking compatibility though.

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

As for the suggestion of inverting the if check no that won't work because we also need to be able to do this:

Yes I know that, but it will be significantly fewer errors this way than from not understanding Function subclasses.

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

Alternatively, can we just tell it that __new__ returns Union[UndefinedFunction, Function]? I get that it can't understand that that's different for subclasses, but is that really a big deal? I think the only issue is that it wouldn't be able to tell that cos(x) is not a class and that cos(x)(x) is an error. Or is the type checker not able to handle a metaclass/class Union like this?

@github-actions
Copy link

github-actions bot commented May 3, 2023

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1
means a speed up and greater than 1 means a slowdown. Green lines
beginning with + are slowdowns (the PR is slower then master or
master is slower than the previous release). Red lines beginning
with - are speedups.

Significantly changed benchmark results (PR vs master)

Significantly changed benchmark results (master vs previous release)

| Change   | Before [a00718ba]    | After [bbc54928]    |   Ratio | Benchmark (Parameter)                                                |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------------|
| -        | 69.5±2ms             | 45.2±0.5ms          |    0.65 | integrate.TimeIntegrationRisch02.time_doit(10)                       |
| -        | 69.6±1ms             | 44.2±0.3ms          |    0.63 | integrate.TimeIntegrationRisch02.time_doit_risch(10)                 |
| +        | 17.8±0.4μs           | 30.5±0.3μs          |    1.71 | integrate.TimeIntegrationRisch03.time_doit(1)                        |
| -        | 5.42±0.03ms          | 2.88±0.02ms         |    0.53 | logic.LogicSuite.time_load_file                                      |
| -        | 72.6±0.3ms           | 29.1±0.3ms          |    0.4  | polys.TimeGCD_GaussInt.time_op(1, 'dense')                           |
| -        | 26.0±0.2ms           | 17.3±0.08ms         |    0.67 | polys.TimeGCD_GaussInt.time_op(1, 'expr')                            |
| -        | 73.9±0.4ms           | 29.0±0.3ms          |    0.39 | polys.TimeGCD_GaussInt.time_op(1, 'sparse')                          |
| -        | 254±1ms              | 127±0.7ms           |    0.5  | polys.TimeGCD_GaussInt.time_op(2, 'dense')                           |
| -        | 257±2ms              | 126±0.3ms           |    0.49 | polys.TimeGCD_GaussInt.time_op(2, 'sparse')                          |
| -        | 652±2ms              | 371±1ms             |    0.57 | polys.TimeGCD_GaussInt.time_op(3, 'dense')                           |
| -        | 655±5ms              | 373±2ms             |    0.57 | polys.TimeGCD_GaussInt.time_op(3, 'sparse')                          |
| -        | 492±2μs              | 288±2μs             |    0.59 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(1, 'dense')            |
| -        | 1.77±0.01ms          | 1.05±0.01ms         |    0.6  | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(2, 'dense')            |
| -        | 5.81±0.02ms          | 3.10±0.02ms         |    0.53 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(3, 'dense')            |
| -        | 444±4μs              | 233±1μs             |    0.52 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(1, 'dense')               |
| -        | 1.48±0.02ms          | 678±7μs             |    0.46 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(2, 'dense')               |
| -        | 4.89±0.03ms          | 1.67±0.01ms         |    0.34 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(3, 'dense')               |
| -        | 375±3μs              | 210±1μs             |    0.56 | polys.TimeGCD_SparseGCDHighDegree.time_op(1, 'dense')                |
| -        | 2.44±0.01ms          | 1.25±0.01ms         |    0.51 | polys.TimeGCD_SparseGCDHighDegree.time_op(3, 'dense')                |
| -        | 10.1±0.03ms          | 4.45±0.02ms         |    0.44 | polys.TimeGCD_SparseGCDHighDegree.time_op(5, 'dense')                |
| -        | 359±2μs              | 169±0.7μs           |    0.47 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(1, 'dense')            |
| -        | 2.53±0.03ms          | 907±5μs             |    0.36 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(3, 'dense')            |
| -        | 9.49±0.09ms          | 2.66±0.01ms         |    0.28 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(5, 'dense')            |
| -        | 1.03±0ms             | 426±4μs             |    0.41 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'dense')           |
| -        | 1.72±0.01ms          | 505±2μs             |    0.29 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'sparse')          |
| -        | 6.02±0.04ms          | 1.83±0.01ms         |    0.3  | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'dense')           |
| -        | 8.49±0.05ms          | 1.50±0.01ms         |    0.18 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'sparse')          |
| -        | 286±2μs              | 66.0±0.5μs          |    0.23 | polys.TimePREM_QuadraticNonMonicGCD.time_op(1, 'sparse')             |
| -        | 3.40±0.05ms          | 395±4μs             |    0.12 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'dense')              |
| -        | 3.99±0.03ms          | 278±2μs             |    0.07 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'sparse')             |
| -        | 7.08±0.08ms          | 1.28±0.01ms         |    0.18 | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'dense')              |
| -        | 8.73±0.06ms          | 836±7μs             |    0.1  | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'sparse')             |
| -        | 5.05±0.02ms          | 3.02±0.01ms         |    0.6  | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(2, 'sparse') |
| -        | 12.2±0.08ms          | 6.61±0.03ms         |    0.54 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'dense')  |
| -        | 22.3±0.1ms           | 9.12±0.03ms         |    0.41 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'sparse') |
| -        | 5.24±0.03ms          | 867±2μs             |    0.17 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(1, 'sparse')    |
| -        | 12.6±0.02ms          | 7.05±0.04ms         |    0.56 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(2, 'sparse')    |
| -        | 102±0.8ms            | 25.7±0.2ms          |    0.25 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'dense')     |
| -        | 168±0.4ms            | 54.8±0.2ms          |    0.33 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'sparse')    |
| -        | 174±0.9μs            | 112±0.7μs           |    0.64 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'dense')      |
| -        | 358±1μs              | 218±3μs             |    0.61 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'sparse')     |
| -        | 4.27±0.05ms          | 856±10μs            |    0.2  | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'dense')      |
| -        | 5.19±0.05ms          | 383±1μs             |    0.07 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'sparse')     |
| -        | 19.9±0.06ms          | 2.82±0.01ms         |    0.14 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'dense')      |
| -        | 22.8±0.06ms          | 636±7μs             |    0.03 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'sparse')     |
| -        | 479±1μs              | 134±0.8μs           |    0.28 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(1, 'sparse') |
| -        | 4.74±0.04ms          | 631±2μs             |    0.13 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'dense')  |
| -        | 5.24±0.03ms          | 141±4μs             |    0.03 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'sparse') |
| -        | 13.1±0.2ms           | 1.31±0.01ms         |    0.1  | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'dense')  |
| -        | 14.0±0.1ms           | 141±0.8μs           |    0.01 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'sparse') |
| -        | 133±2μs              | 75.4±0.6μs          |    0.57 | solve.TimeMatrixOperations.time_rref(3, 0)                           |
| -        | 249±0.5μs            | 89.1±0.4μs          |    0.36 | solve.TimeMatrixOperations.time_rref(4, 0)                           |
| -        | 24.3±0.1ms           | 11.5±0.04ms         |    0.47 | solve.TimeSolveLinSys189x49.time_solve_lin_sys                       |
| -        | 28.7±0.2ms           | 15.5±0.1ms          |    0.54 | solve.TimeSparseSystem.time_linsolve_Aaug(20)                        |
| -        | 56.3±0.6ms           | 25.2±0.2ms          |    0.45 | solve.TimeSparseSystem.time_linsolve_Aaug(30)                        |
| -        | 28.4±0.2ms           | 15.4±0.1ms          |    0.54 | solve.TimeSparseSystem.time_linsolve_Ab(20)                          |
| -        | 55.7±0.4ms           | 24.8±0.3ms          |    0.45 | solve.TimeSparseSystem.time_linsolve_Ab(30)                          |

Full benchmark results can be found as artifacts in GitHub Actions
(click on checks at the top of the PR).

@oscarbenjamin
Copy link
Contributor Author

Yes I know that, but it will be significantly fewer errors this way than not understanding Function subclasses.

It is possible to tell pyright that Function.__new__ returns Expr just by adding -> Expr to the signature and then type: ignore at the lines that contradict that. I think we should be aiming to make this work in general for user code though (even if that means sympy's internal code needing to use a lot of type: ignore).

It is awkward to make this work but all of the reasons it is awkward are directly connected to the sorts of thing that I already thought were poor design decisions regardless of any type checking tool.

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

As for the suggestion of inverting the if check no that won't work because we also need to be able to do this:

Yes I know that, but it will be significantly fewer errors this way than from not understanding Function subclasses.

And really the problem here is the Function('f') behavior, not class f(Function). So if anything were to not work out of the box with a type checker, it should be the thing that dynamically creates a class.

Agreed. A lot of the complexity here comes from two things:

  1. Undefined functions are classes
  2. Function is overloaded to return an undefined function or be a superclass for defined functions.

Both of those are design flaws. Neither is fixable without breaking compatibility though.

Making Function subclasses like cos be objects could be really hard, but I wonder just how hard it would be to make Function('f') an object. I doubt many people do f = Function('f'); class g(f):..., and I think we can make isinstance continue to work with __instancecheck__. It wouldn't solve the overloading problem, of course, but there's quite a few other issues related to undefined functions being dynamic (like pickling not working).

It is awkward to make this work but all of the reasons it is awkward are directly connected to the sorts of thing that I already thought were poor design decisions regardless of any type checking tool.

Yes, but making a new subclass that has to be used everywhere is just trading one awkwardness for another. The way I see it, we can either

  1. Tell the type checker that Function.__new__ always returns a Function subclass, or
  2. Create a separate DefinedFunction subclass as you've used here and use it everywhere.

Option 1 is very simple. It doesn't require changes throughout the codebase, and doesn't introduce (or suggest) any backwards incompatibilities. The downside is that some type checks are still wrong, but it's still a significant improvement over the current status quo. And even after option 2 there are still thousands of other type errors from pyright.

Option 2 requires changing things throughout the whole codebase. It's maybe not technically a compatibility break, but it looks like one, and could easily become one in the future if people aren't careful. For instance, the custom functions guide now becomes wrong: "this guide serves both as a guide to end users who want to define their own custom functions and to SymPy developers wishing to extend the functions included with SymPy." It would need to be updated to say "Functions defined inside of SymPy should use DefinedFunction". That immediately opens the question of whether DefinedFunction is a public API or just for internal use. I don't want it to be public because I don't like it as as solution. I'd rather Function always be the class, and something else (breaking) happen for UndefinedFunctions. If we're going to break things it should be a solution that we actually like, and it should fix all the known issues with functions, not just this one issue with type checkers.

@oscarbenjamin
Copy link
Contributor Author

Alternatively, can we just tell it that __new__ returns Union[UndefinedFunction, Function]?

It really needs to be Union[type[AppliedUndef], Function] but no that does not work. The union would mean that only methods common to both types can be used which would make most things look like an error in both cases:
image

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

Also, I'm pretty ignorant of type stuff, but can this be solved with overloads? If __new__ is passed as string it should return an UndefinedFunction, and otherwise it returns a Function instance. That would break type checking of cos('x') but shouldn't affect correct code.

@oscarbenjamin
Copy link
Contributor Author

Also, I'm pretty ignorant of type stuff, but can this be solved with overloads? If __new__ is passed as string it should return an UndefinedFunction

You might be ignorant of typing but I'm pretty sure you know this:

In [1]: cos('1')
Out[1]: cos(1)

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

You might be ignorant of typing but I'm pretty sure you know this:

That's not supposed to happen and we should get rid of it. That's why I said it wouldn't be able to catch that error, but it should work for correct code. If the type checker breaks down in that one corner case, that seems like a much better alternative to having it not work on any undefined functions.

@oscarbenjamin
Copy link
Contributor Author

Option 2 requires changing things throughout the whole codebase.

It's a very simple change though. It just changes the base class to be one whose __new__ method does only what is needed for each of the subclasses that inherits from it.

We have two uses of Function:

  1. f = Function('f')
  2. class sin(Function): ...

If I was to make a hypothetical grand redesign of how all of this works I would keep the first and break the second. Users use f = Function('f') all the time and there is nothing particularly wrong with it as an API. In my hypothetical redesign though we would note use classes as expression heads at all so class sin(Function) would not make any sense any more.

@asmeurer
Copy link
Member

asmeurer commented May 3, 2023

Making all expression heads or even just all Function subclasses objects is a much harder break than just making Function('f') an object. It's a good idea but it will require a significant effort to be able to do it. It will most likely require some nasty meta-programming to keep some level of backwards compatibility (which would itself break all type checkers). Either that or we make a clean break in SymPy 2.0 that requires all custom SymPy user code to be updated.

@oscarbenjamin
Copy link
Contributor Author

We could possibly use overload to say that if the argument is a string then Function.__new__ will return an undefined function. It can also happen when the argument is a Symbol (Function(Symbol('f'))) and there isn't a way to overload that. I would rather aim towards a situation where the types are actually accurate (so that they can be checked) but I can give this a try.

u = [a.name for a in args if isinstance(a, UndefinedFunction)]
if u:
raise TypeError('Invalid argument: expecting an expression, not UndefinedFunction%s: %s' % (
's'*(len(u) > 1), ', '.join(u)))
obj = super().__new__(cls, *args, **options)
obj: Expr = super().__new__(cls, *args, **options) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _new_ should be used directly from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type: ignore here is because mypy rejects having a cls.__new__ method whose return hint does not imply that it returns an instance of cls:

from __future__ import annotations

class A:
    def __new__(cls) -> A:
        return super().__new__(cls)

class B(A):
    def __new__(cls) -> A:
        return super().__new__(cls)

reveal_type(B())

Here mypy rejects this although pyright accepts it:

$ mypy q.py 
q.py:8: error: Incompatible return type for "__new__" (returns "A", but must return a subtype of "B")  [misc]
q.py:11: note: Revealed type is "q.B"
Found 1 error in 1 file (checked 1 source file)
$ pyright q.py 
  q.py:11:13 - information: Type of "B()" is "A"
0 errors, 0 warnings, 1 information 

I could change this to say that the return type is AppliedUndef and then mypy would not complain. I am not sure if this should really be considered a bug in mypy though.

The return type Expr was needed for Function.__new__ because you can have e.g. sin(pi) -> 0 so we have no guarantee that sin(obj) is of type sin but it should always be of type Expr. With AppliedUndef there is no evaluation though so it would be accurate to say that it returns AppliedUndef. I'm just not sure if that is more useful to users than Expr (which is not incorrect since AppliedUndef is a subclass of Expr).

@oscarbenjamin
Copy link
Contributor Author

We could possibly use overload to say that if the argument is a string then Function.__new__ will return an undefined function

It is possible to make this work and to get pyright to understand it like this:

class Function:
    @overload
    def __new__(cls, *args: str, **options) -> Type[AppliedUndef]:
        ...

    @overload
    def __new__(cls, *args: int | float | Expr, **options) -> Expr:
        ...

    @cacheit
    def __new__(cls, *args, **options) -> Type[AppliedUndef] | Expr:
        # body of __new__

Then pyright understands what is happening but mypy rejects this:

sympy/core/function.py:443: error: "__new__" must return a class instance (got "Type[AppliedUndef]")  [misc]
sympy/core/function.py:447: error: Incompatible return type for "__new__" (returns "Expr", but must return a subtype of "Function")  [misc]
sympy/core/function.py:451: error: "__new__" must return a class instance (got "Union[Type[AppliedUndef], Expr]")  [misc]

Those can be ignored with type: ignore but mypy still thinks that Function('f') is of type Function rather than Type[AppliedUndef]. So mypy will complain about something like f = Function('f'); e = f(x):

sympy/solvers/ode/single.py:1999: error: "Function" not callable  [operator]
sympy/solvers/ode/single.py:2000: error: "Function" not callable  [operator]
sympy/solvers/ode/single.py:2008: error: "Function" not callable  [operator]

@sylee957
Copy link
Member

sylee957 commented May 5, 2023

I may suggest the other way around, to split the class factory part, and the object constructor part from the Function.
The constructor part could just be Function
let's say, we cound encourage users to change the API with f = function('f'), or could be other workaround to make that both working.

@oscarbenjamin
Copy link
Contributor Author

we could encourage users to change the API with f = function('f')

We would still need to support the existing f = Function('f') and we would still need a way to make the type hints work for Function.__new__. I think that this changes what users should do in a way that is unnecessary. Using Function as a base class is much less common than using Function to create an undefined function object.

@oscarbenjamin
Copy link
Contributor Author

I opened a mypy issue (python/mypy#15182) but I don't think that there is any way to make this work for mypy without adding DefinedFunction or similar: there needs to be two separate classes.

@sylee957
Copy link
Member

sylee957 commented May 5, 2023

I thought that def Function could be okay too, but the latest edit to the tutorial rather starts encouraging using subclasses.
https://docs.sympy.org/latest/guides/custom-functions.html#creating-a-custom-function
That could be edited too with DefinedFunction

@bjodah
Copy link
Member

bjodah commented May 15, 2023

Getting type hints working with SymPy would be great so I really like the direction of this PR! I don't know enough about the implementation details around Function to have much valuable input.

@asmeurer wrote:

Either that or we make a clean break in SymPy 2.0 that requires all custom SymPy user code to be updated.

I opened gh-25145 to discuss some ramifications around a future 2.0.

@oscarbenjamin
Copy link
Contributor Author

Currently this has merge conflicts but I think that it would be good to get this in.

There are various alternative suggestions above but they are mostly irrelevant to the main point: without substantial changes to the codebase or breaking compatibilty the approach here is the best way to make accurate type hints for common usage of sympy. The downside is that it needs an extra class that doesn't really do anything but at the same time that class doesn't do anything so it is not really a significant problem.

Various people seem to object to anything related to typing on general principle rather than because it has any significant downsides for sympy. The number one usage of type hints is just editor autocompletion which is something that many Python programmers now come to expect. Not supporting basic type inference for editors will make many people dislike using sympy.

I think that in future it would be better if interfaces were designed in a way that is simpler from a typing perspective. Many of the things that type checkers struggle with in the sympy codebase are also things that are problematic for other reasons as well. The point of this PR is not to change those things but just to find the minimal way to get type checkers to understand the code as it is.

Regardless of anything happening in the future like sympy 2.0 or a new way to define functions like sin, cos etc there is no way to move forwards with typing in sympy at all unless it starts with something like this PR. Any possibility of making type inference work for users is blocked on this PR.

@sylee957
Copy link
Member

sylee957 commented Jan 30, 2024

I would vote on going this
I would also note that it may take significantly more effort from the author to reach the success,
than multiple commentors suggestions other directions, which may not reached the success yet.

I agree that I am also scared by lots of red underlines on SymPy codebase, that editors gives.

And I also agree that avoiding type unhappy code at the first place, just gives more definite and concise code,
than many ways to write ‘super dynamic’ code, which I now consider very esoteric programming, if I am a new beginner of Python now.

zachetienne added a commit to nrpy/nrpy that referenced this pull request Feb 6, 2024
…mpy] was a mistake to add it before adding the corresponding type hints." -- sympy/sympy#22337 . So we disable it on sympy as well. Check also this for progress in adding proper typehinting to sympy: sympy/sympy#25103
@oscarbenjamin oscarbenjamin mentioned this pull request Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Merge conflict mypy pyright Related to pyright, pylance and their vscode extensions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants