👒 Include function signature in reactive edge - fix #177 #538

Merged: 28 commits, merged on Oct 7, 2020

fonsp (Owner) commented Oct 7, 2020

If you want to try it out:

```
pkg> activate --temp
pkg> add Pluto#canonical-function-root-as-reactive-link
julia> import Pluto; Pluto.run()
```

fonsp linked an issue on Oct 7, 2020 that may be closed by this pull request

fonsp (Owner, Author) commented Oct 7, 2020

Important bits are:

#177 is solved

https://github.com/fonsp/Pluto.jl/pull/538/files#diff-d4b8b4704dfa87745eb39b155c966ae4

New type to represent nodes in the reactive graph, with more detailed forward and back edges:

```julia
"Every cell is a node in the reactive graph. The nodes/points/vertices are the _cells_, and the edges/lines/arrows are the _dependencies between cells_. In a reactive notebook, these dependencies are the **global variable references and definitions**. (For the mathies: a reactive notebook is represented by a _directed multigraph_. A notebook without reactivity errors is an _acyclic directed multigraph_.) This struct contains the back edges (`references`) and forward edges (`definitions`, `funcdefs_with_signatures`, `funcdefs_without_signatures`) of a single node.

Before 0.12.0, we could have written this struct with just two fields: `references` and `definitions` (both of type `Set{Symbol}`) because we used variable names to form the reactive links. However, to support defining _multiple methods of the same function in different cells_ (https://github.com/fonsp/Pluto.jl/issues/177), we needed to change this. You might want to think about this old behavior first (try it on paper) before reading on.

The essential idea is that edges are still formed by variable names. Simple global variables (`x = 1`) are registered by their name as `Symbol`, but _function definitions_ `f(x::Int) = 5` are stored in two ways:

- by their name (`f`) as `Symbol`, in `funcdefs_without_signatures`, and
- by their name with its method signature as `FunctionNameSignaturePair`, in `funcdefs_with_signatures`.

The name _without_ signature is the most important: it is used to find the reactive dependencies between cells. The name _with_ signature is needed to detect multiple cells that define methods with the _same_ signature (`f(x) = 1` and `f(x) = 2`); this is illegal. This is why we do not collect `definitions`, `funcdefs_with_signatures` and `funcdefs_without_signatures` onto a single pile: we need them separately for different searches.
"
Base.@kwdef struct ReactiveNode
    references::Set{Symbol} = Set{Symbol}()
    definitions::Set{Symbol} = Set{Symbol}()
    funcdefs_with_signatures::Set{FunctionNameSignaturePair} = Set{FunctionNameSignaturePair}()
    funcdefs_without_signatures::Set{Symbol} = Set{Symbol}()
end

function Base.union!(a::ReactiveNode, bs::ReactiveNode...)
    union!(a.references, (b.references for b in bs)...)
    union!(a.definitions, (b.definitions for b in bs)...)
    union!(a.funcdefs_with_signatures, (b.funcdefs_with_signatures for b in bs)...)
    union!(a.funcdefs_without_signatures, (b.funcdefs_without_signatures for b in bs)...)
    return a
end

"Turn a `SymbolsState` into a `ReactiveNode`. The main differences are:

- A `SymbolsState` is a nested structure of function definitions inside function definitions inside... This conversion flattens this structure by merging `SymbolsState`s from defined functions.
- `ReactiveNode` functions as a cache to improve efficiency, by turning the nested structures into multiple `Set{Symbol}`s with fast lookups."
function ReactiveNode(symstate::SymbolsState)
    result = ReactiveNode(
        references=Set{Symbol}(symstate.references),
        definitions=Set{Symbol}(symstate.assignments),
    )
    # defined functions are 'exploded' into the cell's reactive node
    union!(result, (ReactiveNode(body_symstate) for (_, body_symstate) in symstate.funcdefs)...)
    # now we will add the function names to our edges:
    push!(result.references, (symstate.funccalls .|> join_funcname_parts)...)
    for (namesig, body_symstate) in symstate.funcdefs
        push!(result.funcdefs_with_signatures, namesig)
        push!(result.funcdefs_without_signatures, join_funcname_parts(namesig.name))
    end
    return result
end

"Return the cells that reference any of the given symbols. Recurses down function calls, but not down cells."
function where_referenced(notebook::Notebook, topology::NotebookTopology, myself::Cell)::Array{Cell,1}
    to_compare = union(topology[myself].definitions, topology[myself].funcdefs_without_signatures)
    where_referenced(notebook, topology, to_compare)
end
function where_referenced(notebook::Notebook, topology::NotebookTopology, to_compare::Set{Symbol})::Array{Cell,1}
    return filter(notebook.cells) do cell
        !disjoint(to_compare, topology[cell].references)
    end
end

"Return the cells that assign to any of the given symbols. Recurses down function calls, but not down cells."
function where_assigned(notebook::Notebook, topology::NotebookTopology, myself::Cell)::Array{Cell,1}
    self = topology[myself]
    return filter(notebook.cells) do cell
        other = topology[cell]
        !(
            disjoint(self.definitions, other.definitions) &&
            disjoint(self.definitions, other.funcdefs_without_signatures) &&
            disjoint(self.funcdefs_without_signatures, other.definitions) &&
            disjoint(self.funcdefs_with_signatures, other.funcdefs_with_signatures)
        )
    end
end
```
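The split between name-based and signature-based edges can be illustrated with a small, self-contained sketch. It does not use Pluto's real types: `NameSig`, `ToyNode`, `method_node`, `conflicts` and `depends_on` are hypothetical stand-ins for `FunctionNameSignaturePair`, `ReactiveNode`, `where_assigned` and `where_referenced`, and the "signature" here is a hand-written vector of type names instead of a canonicalized expression. One detail likely carries over to any real implementation: a struct that stores a mutable value such as a `Vector` or an `Expr` needs its own `==` and `hash`, otherwise two equal signatures would be treated as different elements of a `Set`.

```julia
# Hypothetical stand-in for FunctionNameSignaturePair: a function name plus a
# signature (here simply a vector of argument type names).
struct NameSig
    name::Symbol
    signature::Vector{Any}
end
# Vectors are mutable, so the default `==` (which falls back to `===`) would
# treat equal signatures as different. Define structural equality and hashing
# so that NameSig behaves as a Set element.
Base.:(==)(a::NameSig, b::NameSig) = a.name == b.name && a.signature == b.signature
Base.hash(a::NameSig, h::UInt) = hash(a.signature, hash(a.name, h))

# Hypothetical, simplified version of ReactiveNode.
Base.@kwdef struct ToyNode
    references::Set{Symbol} = Set{Symbol}()
    definitions::Set{Symbol} = Set{Symbol}()
    funcdefs_with_signatures::Set{NameSig} = Set{NameSig}()
    funcdefs_without_signatures::Set{Symbol} = Set{Symbol}()
end

# A method definition is registered under both keys: name and name+signature.
function method_node(name::Symbol, signature::Vector)
    ToyNode(
        funcdefs_with_signatures = Set([NameSig(name, signature)]),
        funcdefs_without_signatures = Set([name]),
    )
end

# Mirrors the four checks in `where_assigned`: two nodes clash if they define the
# same variable, a variable and a function with the same name, or the same method.
function conflicts(a::ToyNode, b::ToyNode)::Bool
    !(isdisjoint(a.definitions, b.definitions) &&
      isdisjoint(a.definitions, b.funcdefs_without_signatures) &&
      isdisjoint(a.funcdefs_without_signatures, b.definitions) &&
      isdisjoint(a.funcdefs_with_signatures, b.funcdefs_with_signatures))
end

# Mirrors `where_referenced`: `user` depends on `provider` if it references any
# name that `provider` defines, counting variables and function names alike.
function depends_on(user::ToyNode, provider::ToyNode)::Bool
    provided = union(provider.definitions, provider.funcdefs_without_signatures)
    !isdisjoint(provided, user.references)
end

cell_int    = method_node(:f, Any[:Int])        # f(x::Int) = 1
cell_string = method_node(:f, Any[:String])     # f(x::String) = 2
cell_int2   = method_node(:f, Any[:Int])        # f(y::Int) = 3, same method as cell_int
cell_x      = ToyNode(definitions = Set([:f]))  # f = 4
cell_user   = ToyNode(references = Set([:f]))   # f(10)

conflicts(cell_int, cell_string)  # false: different methods of f
conflicts(cell_int, cell_int2)    # true: same signature
conflicts(cell_int, cell_x)       # true: f is both a function and a variable
depends_on(cell_user, cell_int) && depends_on(cell_user, cell_string)  # true
```

The calling cell is linked to both method cells through the plain name `f`, while only the signature set decides whether two method cells clash.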

We canonicalize the expression containing the method signature:

###
# CANONICALIZE FUNCTION DEFINITIONS
###
"""
Turn a function definition expression (`Expr`) into a "canonical" form, in the sense that two methods that would evaluate to the same method signature have the same canonical form. Part of a solution to https://github.com/fonsp/Pluto.jl/issues/177. A canonical form that is 100% correct cannot be computed statically (it is impossible), but we can make it good enough to be practical.
# Wait, "evaluate to the same method signature"?
In Pluto, you cannot define **the same global variable** in different cells. This is needed for reactivity to work, and it avoids ambiguous notebooks and stateful stuff. This rule used to also apply to functions: you had to place all methods of a function in one cell. (Go and read more about methods in Julia if you haven't already.) But this is quite annoying, especially because multiple dispatch is so important in Julia code. So we allow methods of the same function to be defined across multiple cells, but we still want to throw errors when you define **multiple methods with the same signature**, because one overrides the other. For example:
```julia
julia> f(x) = 1
f (generic function with 1 method)
julia> f(x) = 2
f (generic function with 1 method)
```
After adding the second method, the function still has only 1 method. This is because the second definition overrides the first one, instead of being added to the method table. This example should be illegal in Pluto, for the same reason that `f = 1` and `f = 2` in different cells is illegal. So our problem is: how do we know whether two cells will define overlapping methods?
Ideally, we would just evaluate the user's code and **count methods** afterwards, letting Julia do the work. Unfortunately, we need to know this info _before_ we run cells, otherwise we don't know in which order to run a notebook! There are ways to break this circle, but it would complicate our process quite a bit.
Instead, we will do _static analysis_ on the function definition expressions to determine whether they overlap. This is non-trivial. For example, `f(x)` and `f(y::Any)` define the same method. Trickier examples are here: https://github.com/fonsp/Pluto.jl/issues/177#issuecomment-645039993
# Wait, "function definition expressions"?
For example:
```julia
e = :(function f(x::Int, y::String)
x + y
end)
dump(e, maxdepth=2)
#=
gives:
Expr
head: Symbol function
args: Array{Any}((2,))
1: Expr
2: Expr
=#
```
The first arg is the function head:
```julia
e.args[1] == :(f(x::Int, y::String))
```
# Mathematics
Our problem is to find a way to compute the equivalence relation ~ on `H × H`, with `H` the set of function head expressions, defined as:
`a ~ b` iff evaluating both expressions results in a function with exactly one method.
_(More precisely, evaluating `Expr(:function, x, Expr(:block))` with `x ∈ {a, b}`.)_
The equivalence classes correspond one-to-one to the possible Julia methods.
Instead of finding a closed form algorithm for `~`, we search for a _canonical form_: a function `canonical: H -> H` that chooses one canonical expression per equivalence class. It has the property
`canonical(a) = canonical(b)` implies `a ~ b`.
We use this **canonical form** of the function's definition expression as its "signature". We compare these canonical forms when determining whether two function expressions will result in overlapping methods.
# Example
```julia
e1 = :(f(x, z::Any))
e2 = :(g(x, y))
canonalize(e1) == canonalize(e2)
```
```julia
e1 = :(f(x))
e2 = :(g(x, y))
canonalize(e1) != canonalize(e2)
```
```julia
e1 = :(f(a::X, b::wow(ie), c, d...; e=f) where T)
e2 = :(g(z::X, z::wow(ie), z::Any, z... ) where T)
canonalize(e1) == canonalize(e2)
```
"""
```julia
function canonalize(ex::Expr)
    if ex.head == :where
        Expr(:where, canonalize(ex.args[1]), ex.args[2:end]...)
    elseif ex.head == :call
        # ex.args[1] is the function name; we don't want it
        interesting = filter(ex.args[2:end]) do arg
            # drop the `:parameters` node that holds the keyword arguments
            !(arg isa Expr && arg.head == :parameters)
        end
        hide_argument_name.(interesting)
    elseif ex.head == :(::)
        # return type annotation, e.g. `f(x)::Int`
        canonalize(ex.args[1])
    elseif ex.head == :curly || ex.head == :(<:)
        # for struct definitions, which we hackily treat as functions
        nothing
    else
        @error "Failed to canonalize this strange looking function" ex
        nothing
    end
end

# for `function g end`
canonalize(::Symbol) = nothing

function hide_argument_name(ex::Expr)
    if ex.head == :(::) && length(ex.args) > 1
        # `x::T` loses its name: Expr(:(::), nothing, T)
        Expr(:(::), nothing, ex.args[2:end]...)
    elseif ex.head == :(...)
        # splatted argument `args...`
        Expr(:(...), hide_argument_name(ex.args[1]))
    elseif ex.head == :kw
        # optional positional argument `x = default`: the default value is dropped
        Expr(:kw, hide_argument_name(ex.args[1]), nothing)
    else
        ex
    end
end
# an untyped argument `x` is treated as `x::Any`
hide_argument_name(::Symbol) = Expr(:(::), nothing, :Any)
hide_argument_name(x::Any) = x
```
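For comparison, the docstring above describes the check that Pluto would ideally run: evaluate the definitions and count methods afterwards. The sketch below implements that dynamic version of `a ~ b`. It may be useful as a reference when testing a static canonical form, but it is not something Pluto can call before it knows the run order. `rename_head` and `same_method_dynamically` are hypothetical helpers, not part of Pluto. The function name is replaced by a fixed name because `~` compares signatures rather than names, and every type in the signature must be defined, so an example like `b::wow(ie)` would throw.

```julia
# Hypothetical helper: replace the function name in a head expression such as
# `f(x::Int)` or `f(x::T) where T`, so that both heads define the same function.
function rename_head(head::Expr, newname::Symbol)::Expr
    if head.head == :where
        Expr(:where, rename_head(head.args[1], newname), head.args[2:end]...)
    elseif head.head == :call
        # keep all arguments (including a `:parameters` node), swap the name
        Expr(:call, newname, head.args[2:end]...)
    else
        error("unsupported function head: $(head)")
    end
end

# The dynamic check: evaluate both heads as empty methods in a fresh module and
# count methods. If the second definition overwrote the first, one method is
# left, which means `a ~ b`.
function same_method_dynamically(a::Expr, b::Expr)::Bool
    sandbox = Module()  # anonymous module with Base imported, so `Int` etc. resolve
    for head in (a, b)
        Core.eval(sandbox, Expr(:function, rename_head(head, :probe), Expr(:block)))
    end
    length(methods(Core.eval(sandbox, :probe))) == 1
end

same_method_dynamically(:(f(x, z::Any)), :(g(x, y)))    # true
same_method_dynamically(:(f(x)), :(g(x, y)))            # false
same_method_dynamically(:(f(x::Int)), :(f(y::String)))  # false
same_method_dynamically(:(f(x; a=1)), :(g(y; b=2)))     # true: keywords do not dispatch
```

The last line is the reason `canonalize` filters out the `:parameters` node: keyword arguments do not take part in dispatch, so they cannot distinguish two methods.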

https://github.com/fonsp/Pluto.jl/blob/496d5b5c2f5dcdee348a6a9ad1972e1585151325/test/MethodSignatures.jl

fonsp (Owner, Author) commented Oct 7, 2020

Haha, the rewrite is 100x faster on the first run (WHAT), because I found some code that caused lots of allocations.

On subsequent runs, there is a 1.5x - 2x runtime speedup, nice! No difference in allocs.

See results here:
https://github.com/fonsp/disorganised-mess/blob/master/pluto%20analysis%20benchmark.jl

Tested on a DigitalOcean machine with 2 vCPUs and 4 GB of RAM.

(run in VS Code with Alt+Enter)

fonsp and others added 4 commits October 7, 2020 12:01
Co-authored-by: fonsp <fonsvdplas@gmail.com>
Co-authored-by: Jelmar Gerritsen <jelmargerritsen@gmail.com>
Co-authored-by: Michiel Dral <m.c.dral@gmail.com>
fonsp changed the title from "Include function signature in reactive edge - fix #177" to "👒 Include function signature in reactive edge - fix #177" on Oct 7, 2020
fonsp merged commit 88ab8a3 into master on Oct 7, 2020
fonsp deleted the canonical-function-root-as-reactive-link branch on October 7, 2020 at 19:47
Linked issue that this pull request may close: Support defining methods of a function in multiple cells