Support frames #134

yakir12 · 2018-09-25T18:11:14Z

I often need to do some stats on image frames from some video. Sometimes the size of the images, their number, and their encoding mean the whole set is too big to hold in memory. So... can we have the stats hold whole frames (i.e. arrays of colors, or just arrays of floats)?
So I could:

y = [rand(10,10) for i in 1:3]
s = Series(Mean())
fit!(s, y)

and have it return the mean across the outer dimension (so its size would be 10 x 10)?

joshday · 2018-09-25T18:24:51Z

If I'm understanding correctly, you could:

1) Create a new stat

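using OnlineStats          # provides OnlineStat, Mean, and fit!
import OnlineStatsBase     # needed to extend _fit!

# Holds one stat per matrix element; n counts the matrices seen so far.
mutable struct ElementwiseStat{T<:OnlineStat} <: OnlineStat{Matrix}
    value::Matrix{T}
    n::Int
end
ElementwiseStat(n, p, stat = Mean()) = ElementwiseStat([copy(stat) for i in 1:n, j in 1:p], 0)
# Update every element's stat with the matching entry of y.
OnlineStatsBase._fit!(o::ElementwiseStat, y) = (fit!.(o.value, y); o.n += 1)

# Fit three 10×10 matrices, one at a time.
fit!(ElementwiseStat(10, 10), [rand(10,10) for i in 1:3])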

2) Use Broadcasting

julia> o = [Mean() for i in 1:10, j in 1:10]
10×10 Array{Mean{EqualWeight},2}:
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0  …  Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0  …  Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0

julia> fit!.(o, rand(10,10))
10×10 Array{Mean{EqualWeight},2}:
 Mean: n=1 | value=0.722181   Mean: n=1 | value=0.541984   …  Mean: n=1 | value=0.979697
 Mean: n=1 | value=0.514386   Mean: n=1 | value=0.101196      Mean: n=1 | value=0.646448
 Mean: n=1 | value=0.0673274  Mean: n=1 | value=0.271753      Mean: n=1 | value=0.619896
 Mean: n=1 | value=0.213448   Mean: n=1 | value=0.379661      Mean: n=1 | value=0.456309
 Mean: n=1 | value=0.438634   Mean: n=1 | value=0.769196      Mean: n=1 | value=0.617309
 Mean: n=1 | value=0.658326   Mean: n=1 | value=0.0374227  …  Mean: n=1 | value=0.911561
 Mean: n=1 | value=0.742625   Mean: n=1 | value=0.469128      Mean: n=1 | value=0.644414
 Mean: n=1 | value=0.741743   Mean: n=1 | value=0.0346486     Mean: n=1 | value=0.0938855
 Mean: n=1 | value=0.519947   Mean: n=1 | value=0.480253      Mean: n=1 | value=0.487188
 Mean: n=1 | value=0.539101   Mean: n=1 | value=0.926043      Mean: n=1 | value=0.341837
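
To get back the plain 10 x 10 matrix of means asked for in the original question, the fitted stats can be unwrapped with `value`. The following is a minimal sketch using the broadcasting approach; the `frames` data and the name `μ` are made up for illustration, and real frames would be loaded one at a time instead of held in a vector.

```julia
using OnlineStats

# Three small stand-in "frames" (illustrative data only).
frames = [rand(10, 10) for _ in 1:3]

# One Mean per pixel, as in approach 2.
o = [Mean() for i in 1:10, j in 1:10]

# Update every pixel's Mean with the matching entry of each frame.
for frame in frames
    fit!.(o, frame)
end

# Unwrap the running means into an ordinary 10×10 matrix of floats.
μ = value.(o)
```

For the `ElementwiseStat` from approach 1, the same unwrapping would presumably be `value.(s.value)`, since its `value` field holds the matrix of stats.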

yakir12 · 2018-09-26T08:26:13Z

This is exactly what I wanted, awesome, thank you!
Any wisdom as to which of the two suggested methods would be faster? EDIT: after some benchmarking I found that the broadcasting method allocates ~10 times less than method 1, and is therefore slightly faster. Both use however twice the memory that this approach does:

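# Assumes sz (frame size) and imgs (directory of frames) are defined elsewhere.
function fun()
    μ = zeros(sz...)                  # running sum, one entry per pixel
    for i in readdir(imgs)
        μ .+= Float64.(Gray.(load(joinpath(imgs, i))))   # add the frame as grayscale floats
    end
    μ ./= length(readdir(imgs))       # divide by the frame count to get the mean
end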

Also, since the elements in the images are typically RGB{N0f8}, fit! is understandably trying to iterate through the RGB values. So I thought I'd use an FTSeries to first transform the color to a float with:

julia> s = [FTSeries(Mean(); transform = x -> Float64(Gray(x))) for i in 1:5, j in 1:5]
5×5 Array{FTSeries{Number,Tuple{Mean{EqualWeight}},getfield(OnlineStats, Symbol("##16#18")),getfield(Main, Symbol("##26#28"))},2}:
 FTSeries
  └── Mean: n=0 | value=0.0  …  FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0

julia> fit!.(s, rand(RGB{N0f8}, 5, 5))
ERROR: MethodError: no method matching iterate(::RGB{Normed{UInt8,8}})
Closest candidates are:
  iterate(::Core.SimpleVector) at essentials.jl:578
  iterate(::Core.SimpleVector, ::Any) at essentials.jl:578
  iterate(::ExponentialBackOff) at error.jl:171
  ...
Stacktrace:
 [1] fit!(::FTSeries{Number,Tuple{Mean{EqualWeight}},getfield(OnlineStats, Symbol("##16#18")),getfield(Main, Symbol("##26#28"))}, ::RGB{Normed{UInt8,8}}) at /home/yakir/.julia/packages/OnlineStatsBase/Se4Hf/src/OnlineStatsBase.jl:76
 [2] _broadcast_getindex at ./broadcast.jl:574 [inlined]
 [3] getindex at ./broadcast.jl:507 [inlined]
 [4] copy at ./broadcast.jl:758 [inlined]
 [5] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(fit!),Tuple{Array{FTSeries{Number,Tuple{Mean{EqualWeight}},getfield(OnlineStats, Symbol("##16#18")),getfield(Main, Symbol("##26#28"))},2},Array{RGB{Normed{UInt8,8}},2}}}) at ./broadcast.jl:724
 [6] top-level scope at none:0

But it's still trying to iterate before applying the transform. Any way I can flag fit! to avoid that "extraneous" iteration?

joshday · 2018-09-26T10:58:35Z

> Both use however twice the memory that this approach does

I'd guess swapping out the broadcasting with an explicit loop would help. You're also keeping track of less state/calculating fewer things in your function than Mean needs to.
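
A rough sketch of what the explicit-loop variant might look like is below. `fit_elementwise!` is a hypothetical helper name, not part of OnlineStats. The idea is that `fit!.(o, y)` collects the return value of every `fit!` call into a new array, while a plain loop updates the stats in place and allocates no result array.

```julia
using OnlineStats

# Hypothetical helper: update stats[i] with frame[i] without broadcasting,
# so no result array is built on each call.
function fit_elementwise!(stats, frame)
    for i in eachindex(stats, frame)   # errors if the two shapes differ
        fit!(stats[i], frame[i])
    end
    return stats
end

stats = [Mean() for i in 1:10, j in 1:10]
frames = [rand(10, 10) for _ in 1:3]   # illustrative data only
foreach(frame -> fit_elementwise!(stats, frame), frames)
```

Whether this closes the memory gap to the hand-written loop has not been measured here; each `Mean` still stores its own count and weight in addition to the running value.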

Making FTSeries better is on my list of things to do. It currently doesn't tap into Julia inference to figure out what the return type of the transform is. Maybe https://github.com/JuliaArrays/MappedArrays.jl would be a better approach for you.
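
One way the MappedArrays suggestion could be applied is sketched below, assuming the Colors and FixedPointNumbers packages are available for `RGB`, `Gray`, and `N0f8`. The names `img`, `o`, and `gray_view` are illustrative. `mappedarray` wraps the frame in a lazy view that converts each pixel on access, so `fit!` only ever sees `Float64` values and never tries to iterate an `RGB`.

```julia
using OnlineStats, MappedArrays, Colors, FixedPointNumbers

img = rand(RGB{N0f8}, 5, 5)            # stand-in for a loaded frame
o = [Mean() for i in 1:5, j in 1:5]    # one Mean per pixel

# Lazy view of img whose elements are Float64 gray levels; img is not copied.
gray_view = mappedarray(c -> Float64(Gray(c)), img)

# Each Mean receives a plain Float64, so no FTSeries transform is needed.
fit!.(o, gray_view)
```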

yakir12 · 2018-09-26T12:42:10Z

This is great, thank you so much for the help.

JobJob · 2018-10-08T15:37:20Z

Thanks to both of you for raising and providing solutions to this issue.

FWIW I benchmarked the 2 approaches and found they were pretty much the same in terms of memory and performance.

ElementwiseStat BenchmarkTools.Trial:
  memory estimate:  156.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     94.104 μs (0.00% GC)
  median time:      185.957 μs (0.00% GC)
  mean time:        221.334 μs (10.07% GC)
  maximum time:     49.221 ms (99.57% GC)
  --------------
  samples:          10000
  evals/sample:     1
========================================================
Broadcasted stat BenchmarkTools.Trial:
  memory estimate:  156.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     97.328 μs (0.00% GC)
  median time:      189.127 μs (0.00% GC)
  mean time:        223.295 μs (9.99% GC)
  maximum time:     49.724 ms (99.60% GC)
  --------------
  samples:          10000
  evals/sample:     1

using BenchmarkTools
using OnlineStats
import OnlineStatsBase

mutable struct ElementwiseStat{T<:OnlineStat} <: OnlineStat{Matrix}
    value::Matrix{T}
    n::Int
end

ElementwiseStat(rows, cols, stat = Mean()) = ElementwiseStat([copy(stat) for i in 1:rows, j in 1:cols], 0)
function OnlineStatsBase._fit!(o::ElementwiseStat, y)
    fit!.(o.value, y)
    o.n += 1
end

const nrows = 100
const ncols = 200
const data_count = 10
const data = [rand(Float64, nrows, ncols) for i in 1:data_count]

const elw_stat = ElementwiseStat(nrows, ncols, Variance())
# fit!(elw_stat, data)
b_elw = @benchmark fit!($elw_stat, $(data[2]))

const stat_mat = [Variance() for i in 1:nrows, j in 1:ncols]
# foreach(img->fit!.(stat_mat, img), data)
b_bcast = @benchmark fit!.($stat_mat, $(data[2]))

println()
display.((Text("ElementwiseStat "), b_elw))
println(); println("========================================================")
display.((Text("Broadcasted stat "), b_bcast))
println()

yakir12 · 2018-10-08T19:55:34Z

Wow, that is damn near identical. Awesome.

yakir12 closed this as completed Sep 26, 2018
