Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange interpolation at boundary #9

Closed
vladdez opened this issue Jul 7, 2022 · 19 comments
Closed

Strange interpolation at boundary #9

vladdez opened this issue Jul 7, 2022 · 19 comments

Comments

@vladdez
Copy link
Contributor

vladdez commented Jul 7, 2022

Hi,
I am plotting some EEG topographies together with @behinger. We observed strange interpolation at the top boundary. This is with the default interpolator ClaughTochter
image

DelaunyMesh has same issue
image

Any ideas? Has it to do with the fill-value?
Cheers,
Vladimir & Bene

@palday
Copy link
Collaborator

palday commented Jul 8, 2022

I suspect the problem would also occur at all the other boundaries if there were electrodes close to them. I'll investigate a bit.

@yakir12
Copy link

yakir12 commented Jul 19, 2022

I think that the main problem here is that of extrapolation: How does one correctly extrapolate from data in two or three dimensional irregular coordinates?

The approach taken here is to add an outer perimeter that encloses the known data. This perimeter can have a certain geometry (shape), padding (how far out does it stretch outside the known coordinates), pad_value (defaults to zero), and number of discrete data points it has. All of these parameters influence the "quality" of the extrapolation. A perimeter with low padding (i.e. close to the data), low pad value (relative to the data), and densely populated will "push away" hot spots from their original coordinates, which is what we're seeing here.

I therefore suspect that datasets with a relatively low number of points will suffer more than denser datasets.

I managed to improve on some examples by choosing values that are different to the defaults. However, no set of values worked for all of my examples (i.e. I did not find globally better defaults).

One expensive idea I imagine could work is if we used some diffusion model to allow the data to diffuse outwards freely and then crop the result with some shape (i.e. geometry). This seems to me like the most natural type of extrapolation.

For context, neither Interpolations.jl nor Dierckx.jl tackle 2D extrapolation of irregularly spaced coordinates, so this is not a simple problem.

@behinger
Copy link
Collaborator

that makes a lot of sense - iirc MNE sets the padding values to the closest electrode, instead of having the same value for all padded locations (not sure exactly how padding works here).

I could not set padding=0. or padding=0.2, I could only change the pad_value. Do you know why?

@yakir12
Copy link

yakir12 commented Jul 19, 2022

sets the padding values to the closest electrode

This would be an improvement, but I suspect it won't always work. I wonder how do they manage to end up with a circle/ellipse using that rule..?

I could not set padding=0. or padding=0.2, I could only change the pad_value. Do you know why?

I suspect it's because there is a depreciation check to warn about that specific keyword that has been used before for padding figures (actually Scenes) from within the plotting function. I see the same issue (unrelated to this specific issue though). It's worth filing an issue separately and I'm sure it would be appreciated if you did so 😄

@behinger
Copy link
Collaborator

Actually, I missremembered the exact comment. They were planning to do that, but now just put it to the mean value as far as I can see:
https://github.com/mne-tools/mne-python/blob/9e4a0b492299d3638203e2e6d2264ea445b13ac0/mne/viz/topomap.py#L642

They either put the extra points on the convex hull, or on the circle of the plot.

EEGlab uses griddata internally which simply returns NaNs for extrapolated values.

Proposed solution: Put pad_value = mean(data) by default. (havent tried it)

Regarding the issue, I wrote it on my list, but will soon be on parental leave, so not sure it will happen..

@behinger
Copy link
Collaborator

I fixed the padding bug, by renaming it to enlarge, padding seems to be somewhat protected, thus als no issue on makie.jl


Just in case here was my finished MWE, before I thought that padding might be protected somehow @SimonDanisch might immediately know whether it is worth to post to Makie.jl or not.

	@recipe(TestPlot,data) do scene
	return Attributes(
		padding = 0.1
	)
end
function Makie.plot!(p::TestPlot)
	plot(p.data)
end
	testplot([1,2,3],padding=5)

@SimonDanisch
Copy link
Member

Ah yeah, I guess that was an old, deprecated scene attribute ...

@yakir12
Copy link

yakir12 commented Jul 20, 2022

A really easy cheat we could use to avoid the original issue here is to treat the raw data as a zero-image where the pixels closest to the coordinates are equal to the coordinates' corresponding values. Then convolve the image with say a gaussian with large enough variance. The result can then either be the end result of the topology, or can be used as the input data to the topology. That's actually really simple.

@behinger
Copy link
Collaborator

behinger commented Jul 20, 2022

@ykair12 (I hope I got the idea) while possible, I don't think that would be a great interpolator, if variance is too small, every electrode is simply a "bump", if variance is too large, we get mean value everywhere. Thus each electrode distance would require their specific variance specification - if the distance is too large, we would automatcally interpolate with 0 . IIRC, something very similar is implemented via #5 ScatteredInterpolation.jl directly as an interpolator via Gaussian

Maybe I understood it wrongly, how would that solve the boundary issue? In the boundary we would interpolate towards 0 still - correct?

@yakir12
Copy link

yakir12 commented Jul 20, 2022

I agree that we would need to choose a good-enough variance, just like a smoothing factor. But unlike introducing novel zero points at the perimeter that "push away" peaks from their original coordinates, what I suggest will keep peaks at their original locations.

@yakir12
Copy link

yakir12 commented Jul 21, 2022

Here is an implementation of my suggestion:

using SparseArrays, Statistics, LinearAlgebra
using GLMakie, ImageFiltering, Delaunay, Interpolations
function toindex(x, r, n)
    x += r
    x /= 2r/(n - 1)
    return round(Int, x + 1)
end
function sparseimg(v, cxy, r, n)
    i = toindex.(first.(cxy), r, n)
    j = toindex.(last.(cxy), r, n)
    return collect(sparse(i, j, v, n, n))
end
function plottopo(v, xy; buff = 0.05, l = 13, σ_factor = 8)
    n = 2l + 1
    c = mean(xy)
    xy .-= c
    r = maximum(norm, xy)
    r *= 1 + buff
    img0 = sparseimg(v, xy, r, n)
    imgw = imfilter(img0, Kernel.gaussian(n/σ_factor))
    xs = range(-r, r, n) .+ c[1]
    ys = range(-r, r, n) .+ c[2]
    #
    keep = [ij for ij in CartesianIndices((n, n)) if norm(Tuple(ij) .- l .- 1)  l]
    #
    data = imgw[keep]
    positions = Point2.(xs, ys')[keep]
    itp = LinearInterpolation((xs, ys), imgw)
    for θ in range(0, 2π, 51)[1:end-1]
        p = c + r * Point2(reverse(sincos(θ)))
        push!(positions, p)
        push!(data, itp[p...])
    end
    #
    m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));
    #
    fig = Figure()
    ax = Axis(fig[1,1], aspect = DataAspect())
    mesh!(ax, m.points, m.simplices, color = data)
    scatter!(ax, xy .+ c, color = v, colormap = :reds)
    fig
end
using TopoPlots
data, positions = TopoPlots.example_data()
plottopo(vec(data[:,6,3]), positions)

Which looks like this:
tmp

Now it's just the matter of integrating this into the existing code here. I'll start a draft PR.

@yakir12
Copy link

yakir12 commented Jul 21, 2022

I found an even better shortcut: instead of convolving gaussian filters with a sparse image, I can just compose a mixture model from Distributions.jl! Much much cheaper:

using Statistics, LinearAlgebra
using GLMakie, Delaunay, Distributions
function random_disk_points(c, n, r)
    positions = fill(c, n)
    for i in 1:n
        R = r * sqrt(rand())
        θ = 2π * rand()
        positions[i] += Point2(R .* sincos(θ))
    end
    return positions
end
function get_perimeter(c, n, r)
    positions = fill(c, n)
    for (i, θ) in pairs(range(0, 2π, n + 1)[1:end-1])
        positions[i] += Point2(r .* sincos(θ))
    end
    return positions
end
function plottopo(v, xy; buff = 0.05, σ_factor = 5, n = 200)
    c = mean(xy)
    r = maximum(norm, xy .- c)
    r *= 1 + buff
    positions = copy(xy)
    random = random_disk_points(c, n, r)
    perimeter = get_perimeter(c, 50, r)
    append!(positions, random, perimeter)
    σ = r/σ_factor
    p = v .- minimum(v)
    p /= sum(p)
    a = MixtureModel(MvNormal, tuple.(xy, σ), p)
    data = pdf.(Ref(a), positions)
    m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));
    fig = Figure()
    ax = Axis(fig[1,1], aspect = DataAspect())
    mesh!(ax, m.points, m.simplices, color = data)
    scatter!(ax, xy, color = v, colormap = :reds)
    fig
end
using TopoPlots
data, xy = TopoPlots.example_data()
v = vec(data[:,6,3])
fig = plottopo(v, xy)

tmp

@behinger
Copy link
Collaborator

cool! could you plot data[:,340,1] instead?

One Q: Maybe I missunderstand something, but isnt there the same artefact as in the previous interpolators?

@yakir12
Copy link

yakir12 commented Jul 21, 2022

cool! could you plot data[:,340,1] instead?

Sure:
tmp

One Q: Maybe I missunderstand something, but isnt there the same artefact as in the previous interpolators?

No. The points along the perimeter in TopoPlots.jl are all zero-valued, and therefore affect the resulting gradient. The points along the perimeter here are calculated from the mixture model and are therefore closer to the data we have (without the need to assume that they fall to absolute zero exactly at the perimeter).

Here is an example where the standard deviation is much smaller:

fig = plottopo(v, xy; σ_factor=20, n=10000)

You can see that the peaks match the data exactly:
tmp

@behinger
Copy link
Collaborator

Ah. I was confused about the scatter colorscale.

I tried playing around with your solution, not sure whats going on, this is how it should look like (data340)
grafik

could it be that negative values are "removed" due to the v .- minimum(v)?

@yakir12
Copy link

yakir12 commented Jul 22, 2022

Ah. I was confused about the scatter colorscale.

Yeah, I just set the color to the value of the specific point so that I'll better understand the contribution of each point to the resulting gradient.

Late last night, I realised why you asked me to try data[:,340,1] and how my implementation (with the mixture model) was wrong! Instead of using the Probability Density Function of a mixture model, I should use the sum of the Gaussian functions directly.

Here I define a gaussian function with a set standard deviation, a mean equal to the coordinate of the data, and a maximum equal to the data's "height" or value. The sum of all the points' gaussians describes the topology of the data.

Here is the implementation and result.

using Statistics, LinearAlgebra
using GLMakie, Delaunay, Distributions
iso_2d_gaussian(μx, μy, σ, v, x, y) = v*exp(-((x - μx)^2 + (y - μy)^2)/2σ^2) 
iso_2d_gaussian(μ, σ, v, xy) = iso_2d_gaussian..., σ, v, xy...)
function random_disk_points(c, n, r)
    positions = fill(c, n)
    for i in 1:n
        R = r * sqrt(rand())
        θ = 2π * rand()
        positions[i] += Point2(R .* sincos(θ))
    end
    return positions
end
function get_perimeter(c, n, r)
    positions = fill(c, n)
    for (i, θ) in pairs(range(0, 2π, n + 1)[1:end-1])
        positions[i] += Point2(r .* sincos(θ))
    end
    return positions
end
function plottopo(v, xy; buff = 0.05, σ_factor = 5, n = 200)
    c = mean(xy)
    r = maximum(norm, xy .- c)
    r *= 1 + buff
    positions = copy(xy)
    random = random_disk_points(c, n, r)
    perimeter = get_perimeter(c, 50, r)
    append!(positions, random, perimeter)
    σ = r/σ_factor
    fun = x -> sum(zip(xy, v)) do (μ, h)
        iso_2d_gaussian(μ, σ, h, x)
    end
    data = fun.(positions)
    m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));
    fig = Figure()
    ax = Axis(fig[1,1])#, aspect = DataAspect())
    mesh!(ax, m.points, m.simplices, color = data)
    scatter!(ax, xy, color = :transparent, strokecolor = :black, strokewidth = 1, markersize = 5)#v, colormap = :reds)
    fig
end
using TopoPlots
data, xy = TopoPlots.example_data()
v = vec(data[:,340,1])
fig = plottopo(v, xy)

tmp

I think that while this is better than introducing tons of zeros (or pad_values) at the perimeter, it still suffers from the heavy-handed smoothing...

@behinger
Copy link
Collaborator

I guess it works well for high density caps, and maybe less so for 10/20 caps?
I gess that is true for all interpolation schemes.

I think it is worth to look into the MNE scheme how they do it (linked above), and what griddata in Matlab does

@yakir12
Copy link

yakir12 commented Jul 22, 2022

OK, what a loop. I was wrong about a few things, most importantly, Dierckx.jl does indeed tackle 2D extrapolation of irregularly spaced coordinates! Following is a really simple implementation that seems to work well (although there are some linear artefacts in the extrapolation).

There are two instances where new points need to be added to the data:

  1. along the perimeter
  2. gridded data for plotting the contour lines

While the interpolator (i.e. Dierckx's spline in the following case) is used to generate points for the second instance, the first instance just uses zeros (i.e. pad_value) in the current implementation of TopoPlots.jl. Here, I use the interpolator to generate points for the perimeter as well.

My next step is to try and use the (superior?) ClaughTochter interpolator.

using GLMakie
using TopoPlots

using Statistics, LinearAlgebra
using Dierckx, Delaunay

# get the data 
data, xy = TopoPlots.example_data()
v = vec(data[:,340,1])

# some parameters
buff = 0.05
kx = ky = 2
s = 0.5
resolution = 512

# create the interpolating spline
itp = Spline2D(first.(xy), last.(xy), v; kx, ky, s)

# create perimeter
c = mean(xy)
r = maximum(norm, xy .- c)
r *= 1 + buff
perimeter_xy = decompose(Point2f, Circle(c, r))
perimeter_v = evaluate(itp, first.(perimeter_xy), last.(perimeter_xy))

# add the perimeter to the data
positions = vcat(xy, perimeter_xy)
data = vcat(v, perimeter_v)

# delaunay triangulation for the mesh
m = delaunay(convert(Matrix{Float64}, hcat(first.(positions), last.(positions))));

# gridded data for the contour
xl = range(c[1] - r, c[1] + r, resolution)
yl = range(c[2] - r, c[2] + r, resolution)
z = evalgrid(itp, xl, yl)
for (i, x) in pairs(xl), (j, y) in pairs(yl)
    if norm(Point2f(x, y) - c) > r
        z[i, j] = 0 # could be set to NaN too...
    end
end

# plot it all
fig = Figure()
ax = Axis(fig[1,1], aspect = DataAspect())
mesh!(ax, m.points, m.simplices, color = data)
contour!(ax, xl, yl, z, levels=6)
scatter!(ax, xy, color = :transparent, strokecolor = :black, strokewidth = 1, markersize = 5)

tmp

@yakir12
Copy link

yakir12 commented Jul 25, 2022

The saga continues...

I found that given some data "within a circle", such as the case with EEG, the extrapolation problem is easiest to solve if we just set the 4 corners of the enclosing square to zero. So by adding 4 new coordinates to the original data, one coordinate at each corner of the square that encloses the (enlarged) circle that itself encloses all the original coordinates, and by setting those 4 points to zero, we solve the whole thing. We now have our original data plus 4 zero corners, and now we can interpolate within this square all the data we need (i.e. the perimeter of the circle for the Delaunay mesh plot, and the gridded data within that circle for the contour plot). The only assumption is that data at the corners of the enclosing square is equal to zero (or some user defined value).

In the case of rectangular data, enlarging/extrapolating the data is dubious to begin with, but the same principals apply here as well.

using Statistics, LinearAlgebra
using SciPy, Delaunay

using GLMakie
using TopoPlots

function get_circle(xy; buff = 0.1)
    c = mean(xy)
    r = (1 + buff)*maximum(x -> norm(x - c), xy)
    return c, r
end

function get_boundingbox(c, r)
    bb_xy = Point2f[c + r*Point2f(sx, sy) for sx in (-1, 1) for sy in (-1, 1)]
    bb_v = zeros(4)
    return bb_xy, bb_v
end

function get_interpolator(xy, v, c, r; tol = 1e-6, maxiter = 400, rescale = false)
    # get the bounding box
    bb_xy, bb_v = get_boundingbox(c, r)
    append!(bb_xy, xy)
    append!(bb_v, v)

    # interpolate within the bounding box
    itp = SciPy.interpolate.CloughTocher2DInterpolator(Tuple.(bb_xy), bb_v; tol, maxiter, rescale)
    return itp
end


function plottopo(v, xy)
    # find enclosing circle
    c, r = get_circle(xy)
    itp = get_interpolator(xy, v, c, r)

    # gridded data
    resolution = 100
    xl = range(c[1] - r, c[1] + r, length = resolution)
    yl = range(c[2] - r, c[2] + r, length = resolution)
    z = collect(itp(xl' .* ones(length(yl)), ones(length(xl))' .* yl)')

    # set interpolated data outside enclosing circle to NaN
    incircle = [norm(Point2f(x, y) - c)  r for (i, x) in pairs(xl), (j, y) in pairs(yl)]
    z[(!).(incircle)] .= NaN

    # add bounding geometry to the data
    cxy = decompose(Point2f, Circle(c, r))
    cv = only.(itp.(cxy))
    xyl = Point2f.(xl, yl')
    append!(cxy, xyl[incircle])
    append!(cv, z[incircle])

    # delaunay triangulation for the mesh
    m = delaunay(convert(Matrix{Float64}, hcat(first.(cxy), last.(cxy))));

    # plot it all
    fig = Figure()
    ax = Axis(fig[1,1])
    mesh!(ax, m.points, m.simplices, color = cv, colormap = Reverse(:RdBu), shading=false)
    contour!(ax, xl, yl, z, levels=6, color=(:black, 0.5), linestyle=:dot)
    scatter!(ax, xy, color = v, colormap = Reverse(:RdBu), strokecolor = :black, strokewidth = 1, markersize = 5)
    return fig
end

and

data, xy = TopoPlots.example_data()
v = vec(data[:,340,1])
plottopo(v, xy)

results in
tmp

yakir12 added a commit to yakir12/TopoPlots.jl that referenced this issue Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants