👒 Include function signature in reactive edge - fix #177 #538

fonsp · 2020-10-07T00:50:51Z

If you want to try it out:

pkg> activate --temp
pkg> add Pluto#canonical-function-root-as-reactive-link
julia> import Pluto; Pluto.run()

Co-authored-by: Karl Wessel <karl.wessel@stud.uni-goettingen.de>

fonsp · 2020-10-07T01:24:45Z

Important bits are:

#177 is solved

https://github.com/fonsp/Pluto.jl/pull/538/files#diff-d4b8b4704dfa87745eb39b155c966ae4

New type to represent nodes in the reactive graph, with more detailed front and back edges

Pluto.jl/src/analysis/ReactiveNode.jl

Lines 3 to 49 in 1fb58c4

    "Every cell is a node in the reactive graph. The nodes/points/vertices are the _cells_, and the edges/lines/arrows are the _dependencies between cells_. In a reactive notebook, these dependencies are the **global variable references and definitions**. (For the mathies: a reactive notebook is represented by a _directed multigraph_. A notebook without reactivity errors is an _acyclic directed multigraph_.) This struct contains the back edges (`references`) and forward edges (`definitions`, `funcdefs_with_signatures`, `funcdefs_without_signatures`) of a single node.

    Before 0.12.0, we could have written this struct with just two fields: `references` and `definitions` (both of type `Set{Symbol}`) because we used variable names to form the reactive links. However, to support defining _multiple methods of the same function in different cells_ (https://github.com/fonsp/Pluto.jl/issues/177), we needed to change this. You might want to think about this old behavior first (try it on paper) before reading on.

    The essential idea is that edges are still formed by variable names. Simple global variables (`x = 1`) are registered by their name as `Symbol`, but _function definitions_ `f(x::Int) = 5` are sometimes stored in two ways:

    - by their name (`f`) as `Symbol`, in `funcdefs_without_signatures`, and
    - by their name with its method signature as `FunctionNameSignaturePair`, in `funcdefs_with_signatures`.

    The name _without_ signature is most important: it is used to find the reactive dependencies between cells. The name _with_ signature is needed to detect multiple cells that define methods with the _same_ signature (`f(x) = 1` and `f(x) = 2`), which is illegal. This is why we do not collect `definitions`, `funcdefs_with_signatures` and `funcdefs_without_signatures` onto a single pile: we need them separately for different searches.
    "
    Base.@kwdef struct ReactiveNode
        references::Set{Symbol} = Set{Symbol}()
        definitions::Set{Symbol} = Set{Symbol}()
        funcdefs_with_signatures::Set{FunctionNameSignaturePair} = Set{FunctionNameSignaturePair}()
        funcdefs_without_signatures::Set{Symbol} = Set{Symbol}()
    end

    function Base.union!(a::ReactiveNode, bs::ReactiveNode...)
        union!(a.references, (b.references for b in bs)...)
        union!(a.definitions, (b.definitions for b in bs)...)
        union!(a.funcdefs_with_signatures, (b.funcdefs_with_signatures for b in bs)...)
        union!(a.funcdefs_without_signatures, (b.funcdefs_without_signatures for b in bs)...)
        return a
    end

    "Turn a `SymbolsState` into a `ReactiveNode`. The main differences are:

    - A `SymbolsState` is a nested structure of function definitions inside function definitions inside... This conversion flattens this structure by merging `SymbolsState`s from defined functions.
    - `ReactiveNode` functions as a cache to improve efficiency, by turning the nested structures into multiple `Set{Symbol}`s with fast lookups."
    function ReactiveNode(symstate::SymbolsState)
        result = ReactiveNode(
            references=Set{Symbol}(symstate.references),
            definitions=Set{Symbol}(symstate.assignments),
        )

        # defined functions are 'exploded' into the cell's reactive node
        union!(result, (ReactiveNode(body_symstate) for (_, body_symstate) in symstate.funcdefs)...)

        # now we will add the function names to our edges:
        push!(result.references, (symstate.funccalls .|> join_funcname_parts)...)
        for (namesig, body_symstate) in symstate.funcdefs
            push!(result.funcdefs_with_signatures, namesig)
            push!(result.funcdefs_without_signatures, join_funcname_parts(namesig.name))
        end

        return result
    end
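To make the flattening in `ReactiveNode(symstate)` concrete, here is a minimal self-contained sketch. `SymState`, `Node` and `NameSig` are hypothetical, simplified stand-ins, not Pluto's types: a signature is just a vector of argument-type symbols (Pluto uses `FunctionNameSignaturePair` with a canonicalized expression), and function calls are plain `Symbol`s rather than name parts passed through `join_funcname_parts`.

```julia
# Hypothetical simplified stand-ins for Pluto's SymbolsState / ReactiveNode.
# A "signature" here is only a vector of argument-type symbols.
const NameSig = Tuple{Symbol,Vector{Symbol}}

Base.@kwdef struct SymState
    references::Set{Symbol} = Set{Symbol}()
    assignments::Set{Symbol} = Set{Symbol}()
    funccalls::Set{Symbol} = Set{Symbol}()
    funcdefs::Dict{NameSig,SymState} = Dict{NameSig,SymState}()  # method => its body
end

Base.@kwdef struct Node
    references::Set{Symbol} = Set{Symbol}()
    definitions::Set{Symbol} = Set{Symbol}()
    funcdefs_with_signatures::Set{NameSig} = Set{NameSig}()
    funcdefs_without_signatures::Set{Symbol} = Set{Symbol}()
end

# Merge the edges of other nodes into `a`, field by field.
function Base.union!(a::Node, bs::Node...)
    for b in bs
        union!(a.references, b.references)
        union!(a.definitions, b.definitions)
        union!(a.funcdefs_with_signatures, b.funcdefs_with_signatures)
        union!(a.funcdefs_without_signatures, b.funcdefs_without_signatures)
    end
    return a
end

# Same steps as ReactiveNode(symstate): flatten method bodies into the cell,
# add called names as references, record each method with and without signature.
function Node(s::SymState)
    result = Node(references=copy(s.references), definitions=copy(s.assignments))
    union!(result, (Node(body) for body in values(s.funcdefs))...)
    union!(result.references, s.funccalls)
    for namesig in keys(s.funcdefs)
        push!(result.funcdefs_with_signatures, namesig)
        push!(result.funcdefs_without_signatures, first(namesig))
    end
    return result
end

# A cell containing `function f(x::Int) y + g(x) end`, described by hand:
cell = SymState(funcdefs=Dict((:f, [:Int]) => SymState(references=Set([:y]), funccalls=Set([:g]))))
node = Node(cell)
```

The body's global reference `y` and call `g` end up as references of the cell itself, while `f` is recorded once by name (for reactivity) and once together with its signature (for clash detection).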

Pluto.jl/src/analysis/Topology.jl

Lines 63 to 88 in 496d5b5

    "Return the cells that reference any of the given symbols. Recurses down function calls, but not down cells."
    function where_referenced(notebook::Notebook, topology::NotebookTopology, myself::Cell)::Array{Cell,1}
        to_compare = union(topology[myself].definitions, topology[myself].funcdefs_without_signatures)
        where_referenced(notebook, topology, to_compare)
    end

    function where_referenced(notebook::Notebook, topology::NotebookTopology, to_compare::Set{Symbol})::Array{Cell,1}
        return filter(notebook.cells) do cell
            !disjoint(to_compare, topology[cell].references)
        end
    end

    "Return the cells that assign to any of the given symbols. Recurses down function calls, but not down cells."
    function where_assigned(notebook::Notebook, topology::NotebookTopology, myself::Cell)::Array{Cell,1}
        self = topology[myself]
        return filter(notebook.cells) do cell
            other = topology[cell]
            !(
                disjoint(self.definitions,                 other.definitions) &&
                disjoint(self.definitions,                 other.funcdefs_without_signatures) &&
                disjoint(self.funcdefs_without_signatures, other.definitions) &&
                disjoint(self.funcdefs_with_signatures,    other.funcdefs_with_signatures)
            )
        end
    end
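The two searches above can be tried out on a toy notebook. This sketch assumes a `Dict` from hypothetical cell IDs to a simplified `Node` (same field names as `ReactiveNode`, but signatures are vectors of type symbols), uses Base's `isdisjoint` in place of Pluto's `disjoint` helper, and sorts the results instead of keeping notebook order.

```julia
# Hypothetical simplified node: signatures are (name, [argument types]).
const NameSig = Tuple{Symbol,Vector{Symbol}}

Base.@kwdef struct Node
    references::Set{Symbol} = Set{Symbol}()
    definitions::Set{Symbol} = Set{Symbol}()
    funcdefs_with_signatures::Set{NameSig} = Set{NameSig}()
    funcdefs_without_signatures::Set{Symbol} = Set{Symbol}()
end

# Node for a cell that defines a single method `name(::sig...)`.
method_node(name::Symbol, sig::Symbol...) = Node(
    funcdefs_with_signatures=Set([(name, collect(Symbol, sig))]),
    funcdefs_without_signatures=Set([name]),
)

# Cells that reference anything `myself` defines, matched by name only.
function where_referenced(topology::Dict{String,Node}, myself::String)
    mine = union(topology[myself].definitions, topology[myself].funcdefs_without_signatures)
    sort!([c for (c, n) in topology if !isdisjoint(mine, n.references)])
end

# Cells whose definitions clash with `myself` (this includes `myself`).
function where_assigned(topology::Dict{String,Node}, myself::String)
    s = topology[myself]
    sort!([c for (c, o) in topology if !(
        isdisjoint(s.definitions, o.definitions) &&
        isdisjoint(s.definitions, o.funcdefs_without_signatures) &&
        isdisjoint(s.funcdefs_without_signatures, o.definitions) &&
        isdisjoint(s.funcdefs_with_signatures, o.funcdefs_with_signatures)
    )])
end

topology = Dict(
    "a" => method_node(:f, :Int),       # f(x::Int) = 1
    "b" => method_node(:f, :String),    # f(x::String) = 2
    "c" => method_node(:f, :Int),       # f(y::Int) = 3, same signature as "a"
    "d" => Node(references=Set([:f])),  # f(1)
)
```

Cells `a` and `b` can coexist because their signatures differ, while `c` clashes with `a`. Cell `d` calls `f`, so it depends on every cell that defines a method of `f`, because references are matched on the name without signature.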

We canonicalize the expression containing the method signature:

Pluto.jl/src/analysis/ExpressionExplorer.jl

Lines 697 to 823 in 8f030b6

    ###
    # CANONICALIZE FUNCTION DEFINITIONS
    ###

    """
    Turn a function definition expression (`Expr`) into a "canonical" form, in the sense that two methods that would evaluate to the same method signature have the same canonical form. Part of a solution to https://github.com/fonsp/Pluto.jl/issues/177. Such a canonical form cannot be achieved statically with 100% correctness (impossible), but we can make it good enough to be practical.

    # Wait, "evaluate to the same method signature"?

    In Pluto, you cannot define **the same global variable** in different cells. This is needed for reactivity to work, and it avoids ambiguous notebooks and stateful stuff. This rule used to also apply to functions: you had to place all methods of a function in one cell. (Go and read more about methods in Julia if you haven't already.) But this is quite annoying, especially because multiple dispatch is so important in Julia code. So we allow methods of the same function to be defined across multiple cells, but we still want to throw errors when you define **multiple methods with the same signature**, because one overrides the other. For example:

    ```julia
    julia> f(x) = 1
    f (generic function with 1 method)

    julia> f(x) = 2
    f (generic function with 1 method)
    ```

    After adding the second method, the function still has only 1 method. This is because the second definition overrides the first one, instead of being added to the method table. This example should be illegal in Pluto, for the same reason that defining `f = 1` and `f = 2` in different cells is illegal. So our problem is: how do we know that two cells will define overlapping methods?

    Ideally, we would just evaluate the user's code and **count methods** afterwards, letting Julia do the work. Unfortunately, we need to know this info _before_ we run cells, otherwise we don't know in which order to run a notebook! There are ways to break this circle, but it would complicate our process quite a bit.

    Instead, we will do _static analysis_ on the function definition expressions to determine whether they overlap. This is non-trivial. For example, `f(x)` and `f(y::Any)` define the same method. Trickier examples are here: https://github.com/fonsp/Pluto.jl/issues/177#issuecomment-645039993

    # Wait, "function definition expressions"?

    For example:

    ```julia
    e = :(function f(x::Int, y::String)
            x + y
        end)

    dump(e, maxdepth=2)

    #=
    gives:

    Expr
      head: Symbol function
      args: Array{Any}((2,))
        1: Expr
        2: Expr
    =#
    ```

    The first arg is the function head:

    ```julia
    e.args[1] == :(f(x::Int, y::String))
    ```

    # Mathematics

    Our problem is to find a way to compute the equivalence relation ~ on `H × H`, with `H` the set of function head expressions, defined as:

    `a ~ b` iff evaluating both expressions results in a function with exactly one method.

    _(More precisely, evaluating `Expr(:function, x, Expr(:block))` with `x ∈ {a, b}`.)_

    The equivalence classes correspond one-to-one to the possible Julia methods.

    Instead of finding a closed form algorithm for `~`, we search for a _canonical form_: a function `canonical: H -> H` that chooses one canonical expression per equivalence class. It has the property

    `canonical(a) = canonical(b)` implies `a ~ b`.

    We use this **canonical form** of the function's definition expression as its "signature". We compare these canonical forms when determining whether two function expressions will result in overlapping methods.

    # Example

    ```julia
    e1 = :(f(x, z::Any))
    e2 = :(g(x, y))

    canonalize(e1) == canonalize(e2)
    ```

    ```julia
    e1 = :(f(x))
    e2 = :(g(x, y))

    canonalize(e1) != canonalize(e2)
    ```

    ```julia
    e1 = :(f(a::X, b::wow(ie), c,      d...; e=f) where T)
    e2 = :(g(z::X, z::wow(ie), z::Any, z...     ) where T)

    canonalize(e1) == canonalize(e2)
    ```
    """
    function canonalize(ex::Expr)
        if ex.head == :where
            Expr(:where, canonalize(ex.args[1]), ex.args[2:end]...)
        elseif ex.head == :call
            # ex.args[1] is the function name, we don't want it
            interesting = filter(ex.args[2:end]) do arg
                !(arg isa Expr && arg.head == :parameters)
            end
            hide_argument_name.(interesting)
        elseif ex.head == :(::)
            canonalize(ex.args[1])
        elseif ex.head == :curly || ex.head == :(<:)
            # for struct definitions, which we hackily treat as functions
            nothing
        else
            @error "Failed to canonalize this strange looking function" ex
            nothing
        end
    end

    # for `function g end`
    canonalize(::Symbol) = nothing

    function hide_argument_name(ex::Expr)
        if ex.head == :(::) && length(ex.args) > 1
            Expr(:(::), nothing, ex.args[2:end]...)
        elseif ex.head == :(...)
            Expr(:(...), hide_argument_name(ex.args[1]))
        elseif ex.head == :kw
            Expr(:kw, hide_argument_name(ex.args[1]), nothing)
        else
            ex
        end
    end

    hide_argument_name(::Symbol) = Expr(:(::), nothing, :Any)
    hide_argument_name(x::Any) = x
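The docstring examples can be checked with a standalone copy of `canonalize` and `hide_argument_name` from the excerpt above (docstring omitted). The last pair is a case the static analysis misses: on 64-bit Julia, `Int` and `Int64` are the same type, so both heads define the same method, yet the expressions differ syntactically and so do their canonical forms. This matches the one-directional property stated above, `canonical(a) = canonical(b)` implies `a ~ b`, but not the converse.

```julia
# Standalone copy of the two functions above, so the checks below run on their own.
function canonalize(ex::Expr)
    if ex.head == :where
        Expr(:where, canonalize(ex.args[1]), ex.args[2:end]...)
    elseif ex.head == :call
        # ex.args[1] is the function name; keyword arguments (:parameters) are dropped too
        interesting = filter(ex.args[2:end]) do arg
            !(arg isa Expr && arg.head == :parameters)
        end
        hide_argument_name.(interesting)
    elseif ex.head == :(::)
        canonalize(ex.args[1])
    elseif ex.head == :curly || ex.head == :(<:)
        nothing
    else
        @error "Failed to canonalize this strange looking function" ex
        nothing
    end
end
canonalize(::Symbol) = nothing

function hide_argument_name(ex::Expr)
    if ex.head == :(::) && length(ex.args) > 1
        Expr(:(::), nothing, ex.args[2:end]...)
    elseif ex.head == :(...)
        Expr(:(...), hide_argument_name(ex.args[1]))
    elseif ex.head == :kw
        Expr(:kw, hide_argument_name(ex.args[1]), nothing)
    else
        ex
    end
end
hide_argument_name(::Symbol) = Expr(:(::), nothing, :Any)
hide_argument_name(x::Any) = x

canonalize(:(f(x, z::Any))) == canonalize(:(g(x, y)))   # true
canonalize(:(f(x))) == canonalize(:(g(x, y)))           # false
canonalize(:(f(x; k=1))) == canonalize(:(g(y)))         # true: keyword arguments are ignored
canonalize(:(f(x::Int))) == canonalize(:(g(x::Int64)))  # false, although identical on 64-bit Julia
```

A false negative like the last one means two cells could still silently define the same method, so the canonical form trades completeness for being computable before any cell runs.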

https://github.com/fonsp/Pluto.jl/blob/496d5b5c2f5dcdee348a6a9ad1972e1585151325/test/MethodSignatures.jl
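For comparison, the relation `~` itself can be computed by brute force, following the docstring's definition: evaluate `Expr(:function, x, Expr(:block))` for both heads in a fresh module and count the methods. In this sketch, the helpers `with_name` and `same_method` are hypothetical (not part of Pluto or its test file). Both heads are renamed to a single function name first, since `~` ignores the name. The approach requires evaluating code, which the docstring explains Pluto must avoid before it knows the run order, and it only works for heads whose types resolve in a bare module.

```julia
# Replace the function name in a head like `f(x::Int)` or `f(x::T) where T`.
# Only :call and :where heads are handled in this sketch.
function with_name(head::Expr, name::Symbol)
    h = copy(head)  # `copy` on an Expr also copies nested Exprs
    if h.head == :where
        h.args[1] = with_name(h.args[1], name)
    elseif h.head == :call
        h.args[1] = name
    else
        throw(ArgumentError("unsupported function head: $(h.head)"))
    end
    return h
end

# `a ~ b` iff defining both heads (under one name) leaves exactly one method.
function same_method(a::Expr, b::Expr)
    m = Module()  # fresh, throwaway namespace
    f = Core.eval(m, Expr(:function, with_name(a, :f), Expr(:block)))
    Core.eval(m, Expr(:function, with_name(b, :f), Expr(:block)))
    return length(methods(f)) == 1
end
```

Used as a ground-truth oracle, `same_method(:(f(x::Int)), :(g(x::Int64)))` returns `true` on 64-bit Julia, exposing exactly the kind of case that a purely syntactic canonical form cannot detect.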

fonsp · 2020-10-07T11:02:13Z

Haha, the rewrite is 100x faster (WHAT) on the first run, because I found some code that was causing lots of allocations.

On subsequent runs, there is a 1.5x - 2x runtime speedup, nice! No difference in allocs.

See results here:
https://github.com/fonsp/disorganised-mess/blob/master/pluto%20analysis%20benchmark.jl

Tested on a DigitalOcean machine with 2 vCPUs and 4 GB of RAM.

(run in VS Code with Alt+Enter)

Co-authored-by: fonsp <fonsvdplas@gmail.com>
Co-authored-by: Jelmar Gerritsen <jelmargerritsen@gmail.com>
Co-authored-by: Michiel Dral <m.c.dral@gmail.com>

fonsp and others added 21 commits October 6, 2020 10:32

b0d5b6b cleanup
e157d06 FunctionNameSignaturePair
f245354 heyo
8f3ab27 asdf
8fafdcb ✂
a0923e5 fixie fix
ab4c44a include cell ID in to_delete_funcs
ab13217 Multiplemethoddefinitiontests (#537)
        Co-authored-by: Karl Wessel <karl.wessel@stud.uni-goettingen.de>
71a2e8b close but no sigar
5dd5d9d closer
8e82310 no test no problem
8d0cc20 fixed bonds
a38725a done!!!!
4af9f83 with tests?
b752dd6 hannes
09a3742 godsamme
6db8b0f blind fix
4fb1bc0 try again
6c8c4a6 asdf
965d909 fewer test on windows
db5a4fc idem

fonsp linked an issue Oct 7, 2020 that may be closed by this pull request: Support defining methods of a function in multiple cells #177 (Closed)

fonsp added 2 commits October 7, 2020 00:55

9f53eba cut
496d5b5 docstrings

fonsp and others added 4 commits October 7, 2020 12:01

8f030b6 mother of all coments
59c84d1 upstream (#540)
        Co-authored-by: fonsp <fonsvdplas@gmail.com>
        Co-authored-by: Jelmar Gerritsen <jelmargerritsen@gmail.com>
        Co-authored-by: Michiel Dral <m.c.dral@gmail.com>
adc55ac fixed limbo bonds
e4344b2 docstirngs
1fb58c4 cleanup

fonsp changed the title ~~Include function signature in reactive edge - fix #177~~ 👒 Include function signature in reactive edge - fix #177 Oct 7, 2020

fonsp merged commit 88ab8a3 into master Oct 7, 2020

fonsp deleted the canonical-function-root-as-reactive-link branch October 7, 2020 19:47

fonsp mentioned this pull request Mar 15, 2021: Pluto doesn't like adding constructors to structs #732 (Closed)

fonsp mentioned this pull request Sep 24, 2021: pluto notebook issues GiggleLiu/NiLang.jl#77 (Open)
