Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support defining methods of a function in multiple cells #177

Closed
karlwessel opened this issue Jun 10, 2020 · 19 comments · Fixed by #538
Closed

Support defining methods of a function in multiple cells #177

karlwessel opened this issue Jun 10, 2020 · 19 comments · Fixed by #538
Labels
reactivity The Pluto programming paradigm user request Requested using the feedback form inside Pluto notebooks

Comments

@karlwessel
Copy link
Contributor

The other solution is much harder to implement consistently - i.e. being able to detect multiple methods with the same argument types (instead of just detecting multiple functions with the same name) and deleting only the methods from one cell, while leaving the others.

All of this is possible within the scope of Pluto, but not something for the next couple of months :)

Originally posted by @fonsp in #109 (comment)

@karlwessel
Copy link
Contributor Author

karlwessel commented Jun 10, 2020

I would like to track this feature in an actual issue, since at least I am really really missing this.

Multiple dispatch and therefore multiple methods of the same function is a key feature of Julia. Often those methods are implemented in completely different contexts (for example Base.show). Having to put them all in the same cell makes exploratory implementation of custom types really hard.

Repository owner deleted a comment from karlwessel Jun 11, 2020
Repository owner deleted a comment from karlwessel Jun 11, 2020
@fonsp
Copy link
Owner

fonsp commented Jun 11, 2020

(split into two issues: #178)

@fonsp fonsp added the reactivity The Pluto programming paradigm label Jun 11, 2020
@fonsp
Copy link
Owner

fonsp commented Jun 16, 2020

It would be helpful if someone with an eye for edge cases (@karlwessel) would write the test cases for this before thinking up a solution.

(Writing test cases is fairly easy, and they don't need to be perfect yet.)

@fonsp
Copy link
Owner

fonsp commented Jun 16, 2020

Strictly speaking, two cells f(x::Int) = 1 & f(x::Int) = 2 should not be allowed, and we can find those cases and throw a MultipleDefinitionsError. However, there are some very difficult situations like:

f(x::Int) = 1 & f(x::Int64) = 2 (built in type synonyms)

A = Number & f(x::A) = 1 & f(x::Number) = 2 (dynamic types)

A = rand(Bool) ? Number : String & f(x::A) = 1 & f(x::Number) = 2 (aaah)

f = 1 & f(x) = 1 (variables vs methods)

f(x::Int) = 1; f(x::String) = 2 && f(x::String) = 3; f(x::Vector) = 4 (multiple methods per cell)

f(x::Int) = 1; f(x::Int) = 2 && f(x::Vector) = 3 (number of methods in code might not equal number of defined methods)

f() = 1 && f(; s) = 2 (kwargs)

f(x::T) where T = 1 && f(x) = 2

f(x::(rand(Bool) ? Int64 : Float64)) = 1 &
f(x::(rand(Bool) ? Int64 : Float64)) = 2

If all of these situations are allowed without a MultipleDefinitionsError (which is the easy solution), then a notebook could be ambiguous (ignoring the SECRET fact that the order in which cells are placed is used as a tie-breaker when two cells are mutually independent.) They need to be handled correctly.

@karlwessel
Copy link
Contributor Author

Is it possible to detect those cases during execution of a notebook by comparing the methodswith count for each function for which there are cells defining methods for? And throw a runtime exception if one cell that should increase the count doesn't because it replaced another method?

@fonsp
Copy link
Owner

fonsp commented Jun 17, 2020

That's going in the right direction! (Although my 5th example would not get detected with that method)

@karlwessel
Copy link
Contributor Author

Well, maybe we don't have to catch all corner cases...

@fonsp fonsp added the user request Requested using the feedback form inside Pluto notebooks label Jul 22, 2020
@fonsp
Copy link
Owner

fonsp commented Aug 5, 2020

Building on your counting methods idea:

It's very nice if we can keep all of the syntax analysis in one place, before running the code, and then run cells in a precomputed order. Checking the number of defined methods after running cells (in the middle of a reactive run), and then taking that information back to do something special makes that tricky. But there are three reasons to do this kind of "dynamic" code analysis instead of static anaylsis:

The common hurdles in these three are:

  • the dependency graph can only be determined after runtime
  • the dependency graph depends on the global state
  • we'd like to know things about external packages without
    • analysing their source code, nor
    • importing them into the server process

Solution?

We need to do some dynamic analysis, but we can try to keep it simple and isolated. The runner process will have three new APIs:

do_signatures_overlap(signatures::Array{NTuple{Type}})::Bool

symbols_exported_by_module(name::Symbol)::Set{Symbol}

macroexpand(expr::Expr)::Expr

to solve these three problems, respectively.

do_signatures_overlap?

Create a new test module, import all globals into the test module, and then define a dummy functions with the given signatures. If the number of methods is less than the number of signatures, then they overlap. (Thanks @karlwessel!)

symbols_exported_by_module

Create a new test module, call the using statement there, and then return Core.eval(testmodule, :(names($(name_of_usinged_module))) ).

macroexpand

This is just Base.macroexpand :)

Separate using+import from all other code

using and import already get some of special treatment - they always execute first in a "tie" scenario of a reactive run. This is a heuristic: we know that our syntax analysis fails for using so this is a hacky way to fix most scenarios. All using X sub-expressions are also extracted from cell code and stored separately.

To support the solutions above, we need to go from:

  1. syntax analysis (in server process)
  2. reactive run (in worker process)

to:

  1. mini syntax analysis to find using (in server process)
  2. call the using statements (in worker process)
  3. syntax analysis (in server process), with symbols_exported_by_module to fill in the gaps
  4. reactive run (in worker process), but use do_signatures_overlap during the run to somehow figure out if cells should err.

@fonsp
Copy link
Owner

fonsp commented Aug 5, 2020

We can simplify the do_signatures_overlap quite a bit by dropping support for:
A = B
f(x::A) = 1
f(x::B) = 2

Although it is common to have user-defined types, I hope that it's uncommon for user defined aliases to cause this kind of trouble. So:

do_signatures_overlap?

Create a new test module, import all globals into the test module, create dummy types for all unknown types, and then define dummy functions with the given signatures. If the number of methods is less than the number of signatures, then they overlap.

So:

function f(x::Int, y::MyType, z::random_type())
    asdf
end

will become

struct var"MyType" end
struct var"random_type()" end

function f(x::Int, y::var"MyType", z::var"random_type()")
end

(var"..." is how you define variables with possibly illegal names in Julia).

  1. mini syntax analysis to find using (in server process)
  2. call the using statements (in worker process)
  3. syntax analysis (in server process), with symbols_exported_by_module and do_signatures_overlap to fill in the gaps
  4. reactive run (in worker process)

@karlwessel
Copy link
Contributor Author

karlwessel commented Sep 10, 2020

f(x::Int) = 1 & f(x::Int64) = 2 (built in type synonyms)

A = Number & f(x::A) = 1 & f(x::Number) = 2 (dynamic types)

A = rand(Bool) ? Number : String & f(x::A) = 1 & f(x::Number) = 2 (aaah)

f = 1 & f(x) = 1 (variables vs methods)

f(x::Int) = 1; f(x::String) = 2 && f(x::String) = 3; f(x::Vector) = 4 (multiple methods per cell)

f(x::Int) = 1; f(x::Int) = 2 && f(x::Vector) = 3 (number of methods in code might not equal number of defined methods)

f() = 1 && f(; s) = 2 (kwargs)

f(x::T) where T = 1 && f(x) = 2

f(x::(rand(Bool) ? Int64 : Float64)) = 1 &
f(x::(rand(Bool) ? Int64 : Float64)) = 2

I guess the next step is me implementing those tests :-)

@fonsp fonsp mentioned this issue Sep 10, 2020
@fonsp
Copy link
Owner

fonsp commented Sep 10, 2020

Yes please! Although I think we will be going with a solution where half of those tests are broken :)

@karlwessel

This comment has been minimized.

@karlwessel
Copy link
Contributor Author

Yes please! Although I think we will be going with a solution where half of those tests are broken :)

Why is that? Is the method-counting version to complicated?

@fonsp

This comment has been minimized.

@karlwessel

This comment has been minimized.

@fonsp
Copy link
Owner

fonsp commented Sep 10, 2020

Why is that? Is the method-counting version to complicated?

Counting methods is not too complicated, but then there needs to be a feedback loop: too few methods were detected, so we error those cells manually, delete the methods, and then continue.

Instead of:
Syntax analysis > topological ordering > execution

where each of those steps does not depend on the results of later steps

@karlwessel
Copy link
Contributor Author

It thought that was the process you described in your comment above. What is the simpler alternative?

@Pangoraw
Copy link
Collaborator

Pangoraw commented Oct 4, 2020

Hi!
I made a little proof of concept to solve this issue using a shallow module where all the types referenced are redefined using struct var"$name" end and where the functions have an empty body (following @fonsp's idea). Iterating on @karlwessel's idea of counting the number of methods defined, we visit the methods table to extract all defined methods for a function. We can then reduce by using the signature as a key to identify which signature is defined more than once. This prevent one problem I have identified with counting: We couldn't have overridden method from the Base package for example since the number of methods would have stayed the same.

The complete process would be:

  • Collect the function signatures in ExpressionExplorer so that the symbols states contains the type signature and not just the references.
  • Before a reactive run if a function name is assigned in multiple cells we can construct this shallow module with a empty definition for every type.
  • Define a get_errors function which can identify which methods are duplicated
  • For every function name which is assigned more than once run get_errors to identify whether or not there are duplicate methods
  • Match the duplicate methods with their respective cells using a LineNumberNode containing the cell id for each shallow function
  • Dispatch errors for those cells
  • Remove previously dispatched MultipleDefinitions errors, topology[cell].assignments will no longer contains symbols for function names

What don't work yet

  • Handle curly types Union{A, B}, Option{A} and so on...
  • Handle optional parameters a::Bool=false
  • Handle ::Any Parameters (parameters without types) -> f(a) === f(a::Any)
  • Handle type aliases A = Int64
  • Currently I am "polluting" the Main module with these shallow modules, using another process with Distributed will be cleaner
  • Currently, the detection is run every time there is a change on every cells, there must be a way to filter every function that has not changed
  • Plenty of tests -> #471

A quick demo of the progress

methods_detection

I have pushed my changes on this branch if you want to take a look at the implementation and compare the changes. If the methods implemented seems good to you I can open a draft PR to discuss more technical details of the implementation 🎇

@fonsp
Copy link
Owner

fonsp commented Oct 6, 2020

(ramble)

Hi @Pangoraw!

Thanks so much for your contribution! I've sent you an email in case you would like to call!

Your work looks promising, but motivated by @karlwessel (#177 (comment)) I think we should take a different direction:
These tricky corner cases are very difficult to solve, and will probably make Pluto's internals much more complicated (if we go with the crazy solution I proposed, although I see some good middle ground solutions).

So instead, I think we should only attempt to catch super simple cases like f(x) = 1 vs f(wow::Any) = 2, and leave anything more complicated for later.

I'm also not sure whether I like having the function signature info in its SymbolState. I think it would be better to store it in its name. I have started working on this in a new branch, so far it's just refactoring old code.

During my refactoring, I found that a bunch of tricky code can be eliminated! #536

@fonsp fonsp linked a pull request Oct 7, 2020 that will close this issue
@fonsp fonsp closed this as completed in #538 Oct 7, 2020
fonsp added a commit that referenced this issue Oct 7, 2020
Co-authored-by: Karl Wessel <karl.wessel@stud.uni-goettingen.de>
Co-authored-by: fonsp <fonsvdplas@gmail.com>
Co-authored-by: Jelmar Gerritsen <jelmargerritsen@gmail.com>
Co-authored-by: Michiel Dral <m.c.dral@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reactivity The Pluto programming paradigm user request Requested using the feedback form inside Pluto notebooks
Projects
None yet
3 participants