Support frames #134

Closed
yakir12 opened this issue Sep 25, 2018 · 6 comments

yakir12 commented Sep 25, 2018

I often need to compute statistics on image frames from a video. Sometimes the size of the images, their number, and their encoding mean the whole stack is too big to hold in memory. So... can a stat hold whole frames (i.e. arrays of colors, or just arrays of floats)?
That way I could:

using OnlineStats

y = [rand(10,10) for i in 1:3]   # three 10×10 "frames"
s = Series(Mean())
fit!(s, y)                       # desired: one mean per element, i.e. a 10×10 result

and have it return the mean across the outer dimension (so its size would be 10×10)?
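The per-element mean can be accumulated one frame at a time with the running-mean update μₙ = μₙ₋₁ + (xₙ − μₙ₋₁)/n, so only the current frame and one accumulator ever need to be in memory. A minimal sketch in plain Julia, assuming every frame is a numeric matrix of the same size; `running_mean` is a hypothetical helper, not an OnlineStats function:

```julia
# Hypothetical helper: per-element running mean over an iterator of
# equally sized frames. Only the accumulator and the current frame are held.
function running_mean(frames, sz)
    μ = zeros(sz)
    n = 0
    for frame in frames
        n += 1
        # μ_n = μ_{n-1} + (x_n - μ_{n-1}) / n, as one fused in-place broadcast
        μ .+= (frame .- μ) ./ n
    end
    return μ
end

# Frames come from a generator, so the full stack is never materialized
μ = running_mean((rand(10, 10) for _ in 1:3), (10, 10))
size(μ)  # (10, 10)
```

Both approaches suggested in the reply below apply this same kind of update to each element through OnlineStats' `Mean`.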

joshday (Owner) commented Sep 25, 2018

If I'm understanding correctly, you could:

1) Create a new stat

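using OnlineStats
import OnlineStatsBase

# One independent stat per pixel; a single observation is a whole frame (Matrix)
mutable struct ElementwiseStat{T<:OnlineStat} <: OnlineStat{Matrix}
    value::Matrix{T}
    n::Int
end
ElementwiseStat(rows, cols, stat = Mean()) = ElementwiseStat([copy(stat) for i in 1:rows, j in 1:cols], 0)
# Fit each pixel's stat to the matching element of frame y, then count the frame
OnlineStatsBase._fit!(o::ElementwiseStat, y) = (fit!.(o.value, y); o.n += 1)

fit!(ElementwiseStat(10, 10), [rand(10,10) for i in 1:3])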

2) Use Broadcasting

julia> o = [Mean() for i in 1:10, j in 1:10]
10×10 Array{Mean{EqualWeight},2}:
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0  …  Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0  …  Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0
 Mean: n=0 | value=0.0  Mean: n=0 | value=0.0     Mean: n=0 | value=0.0  Mean: n=0 | value=0.0

julia> fit!.(o, rand(10,10))
10×10 Array{Mean{EqualWeight},2}:
 Mean: n=1 | value=0.722181   Mean: n=1 | value=0.541984   …  Mean: n=1 | value=0.979697
 Mean: n=1 | value=0.514386   Mean: n=1 | value=0.101196      Mean: n=1 | value=0.646448
 Mean: n=1 | value=0.0673274  Mean: n=1 | value=0.271753      Mean: n=1 | value=0.619896
 Mean: n=1 | value=0.213448   Mean: n=1 | value=0.379661      Mean: n=1 | value=0.456309
 Mean: n=1 | value=0.438634   Mean: n=1 | value=0.769196      Mean: n=1 | value=0.617309
 Mean: n=1 | value=0.658326   Mean: n=1 | value=0.0374227  …  Mean: n=1 | value=0.911561
 Mean: n=1 | value=0.742625   Mean: n=1 | value=0.469128      Mean: n=1 | value=0.644414
 Mean: n=1 | value=0.741743   Mean: n=1 | value=0.0346486     Mean: n=1 | value=0.0938855
 Mean: n=1 | value=0.519947   Mean: n=1 | value=0.480253      Mean: n=1 | value=0.487188
 Mean: n=1 | value=0.539101   Mean: n=1 | value=0.926043      Mean: n=1 | value=0.341837

yakir12 (Author) commented Sep 26, 2018

This is exactly what I wanted, awesome, thank you!
Any wisdom as to which of the two suggested methods would be faster? EDIT: after some benchmarking I found that the broadcasting method allocates ~10 times less than method #1 and is therefore slightly faster. Both use, however, twice the memory that this approach does:

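using FileIO, Colors   # load, Gray

# Pixel-wise mean of all images in directory `imgs`, each of size `sz`
function fun(imgs, sz)
    files = readdir(imgs)
    μ = zeros(sz...)
    for f in files
        μ .+= Float64.(Gray.(load(joinpath(imgs, f))))
    end
    μ ./= length(files)
end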

Also, since the elements in the images are typically RGB{N0f8}, fit! is understandably trying to iterate through the RGB values. So I thought I'd use an FTSeries to first transform the color to a float with:

julia> s = [FTSeries(Mean(); transform = x -> Float64(Gray(x))) for i in 1:5, j in 1:5]
5×5 Array{FTSeries{Number,Tuple{Mean{EqualWeight}},getfield(OnlineStats, Symbol("##16#18")),getfield(Main, Symbol("##26#28"))},2}:
 FTSeries
  └── Mean: n=0 | value=0.0    FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0
 FTSeries
  └── Mean: n=0 | value=0.0     FTSeries
  └── Mean: n=0 | value=0.0

julia> fit!.(s, rand(RGB{N0f8}, 5, 5))
ERROR: MethodError: no method matching iterate(::RGB{Normed{UInt8,8}})
Closest candidates are:
  iterate(::Core.SimpleVector) at essentials.jl:578
  iterate(::Core.SimpleVector, ::Any) at essentials.jl:578
  iterate(::ExponentialBackOff) at error.jl:171
  ...
Stacktrace:
 [1] fit!(::FTSeries{Number,Tuple{Mean{EqualWeight}},getfield(OnlineStats, Symbol("##16#18")),getfield(Main, Symbol("##26#28"))}, ::RGB{Normed{UInt8,8}}) at /home/yakir/.julia/packages/OnlineStatsBase/Se4Hf/src/OnlineStatsBase.jl:76
 [2] _broadcast_getindex at ./broadcast.jl:574 [inlined]
 [3] getindex at ./broadcast.jl:507 [inlined]
 [4] copy at ./broadcast.jl:758 [inlined]
 [5] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(fit!),Tuple{Array{FTSeries{Number,Tuple{Mean{EqualWeight}},getfield(OnlineStats, Symbol("##16#18")),getfield(Main, Symbol("##26#28"))},2},Array{RGB{Normed{UInt8,8}},2}}}) at ./broadcast.jl:724
 [6] top-level scope at none:0

But it still tries to iterate over each RGB value before applying the transform. Is there any way to tell fit! to skip that "extraneous" iteration?

joshday (Owner) commented Sep 26, 2018

> Both use, however, twice the memory that this approach does

I'd guess replacing the broadcasting with an explicit loop would help. Your function also keeps track of less state and calculates fewer things than Mean needs to.
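A rough way to see the memory difference is to compare `Base.summarysize` for a plain `Float64` matrix and a matrix of small mutable per-pixel accumulators. `PixelMean` below is a hypothetical stand-in that, like a running mean, keeps its own count per pixel; OnlineStats' `Mean` may store different fields, so the exact ratio will differ:

```julia
# Hypothetical per-pixel accumulator: running mean plus its own count
mutable struct PixelMean
    μ::Float64
    n::Int
end

sz = (100, 200)
plain = zeros(sz)                                          # one Float64 per pixel
objs = [PixelMean(0.0, 0) for i in 1:sz[1], j in 1:sz[2]]  # one heap object per pixel

# Matrix{PixelMean} stores a pointer per pixel, and each pointed-to
# object holds both μ and n, so it needs several times the bytes of `plain`
println(Base.summarysize(plain))
println(Base.summarysize(objs))
```

A single `Float64` array with one shared frame count, as in the `fun` approach, avoids both the per-pixel pointer and the per-pixel count.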

Making FTSeries better is on my list of things to do. It currently doesn't tap into Julia's type inference to figure out what the return type of the transform is. Maybe https://github.com/JuliaArrays/MappedArrays.jl would be a better approach for you.
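Another way around the transform problem, without FTSeries, is to apply the transform inside the per-pixel update so the conversion is fused into the same broadcast and never produces a separate `Float64` frame. A minimal sketch in plain Julia under stated assumptions: pixels are hypothetical `NTuple{3,Float64}` stand-ins for `RGB` values, `luma` is a hypothetical transform (Rec. 601 luma weights) standing in for `x -> Float64(Gray(x))`, and `fit_transformed!` is an illustrative helper, not an OnlineStats or MappedArrays API:

```julia
# Hypothetical stand-in for x -> Float64(Gray(x)) on an (r, g, b) tuple
luma(px::NTuple{3,Float64}) = 0.299 * px[1] + 0.587 * px[2] + 0.114 * px[3]

# Update a per-pixel running mean with transform f applied to each element.
# All dotted calls fuse into one loop, so f.(frame) is never stored separately.
function fit_transformed!(μ::Matrix{Float64}, n::Int, f, frame)
    μ .+= (f.(frame) .- μ) ./ n
    return μ
end

# Frames are generated one at a time, so only one is in memory at once
frames = ([(rand(), rand(), rand()) for i in 1:5, j in 1:5] for _ in 1:3)
μ = zeros(5, 5)
for (n, frame) in enumerate(frames)
    fit_transformed!(μ, n, luma, frame)
end
```

MappedArrays.jl takes a different route to the same goal, presenting a lazily transformed view of a frame; the sketch above relies only on Base broadcast fusion.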

yakir12 (Author) commented Sep 26, 2018

This is great, thank you so much for the help.

yakir12 closed this as completed Sep 26, 2018

JobJob commented Oct 8, 2018

Thanks to both of you for raising and providing solutions to this issue.

FWIW I benchmarked the two approaches (script below the results) and found they were practically the same in terms of memory and performance.

ElementwiseStat BenchmarkTools.Trial:
  memory estimate:  156.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     94.104 μs (0.00% GC)
  median time:      185.957 μs (0.00% GC)
  mean time:        221.334 μs (10.07% GC)
  maximum time:     49.221 ms (99.57% GC)
  --------------
  samples:          10000
  evals/sample:     1
========================================================
Broadcasted stat BenchmarkTools.Trial:
  memory estimate:  156.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     97.328 μs (0.00% GC)
  median time:      189.127 μs (0.00% GC)
  mean time:        223.295 μs (9.99% GC)
  maximum time:     49.724 ms (99.60% GC)
  --------------
  samples:          10000
  evals/sample:     1
using BenchmarkTools
using OnlineStats
import OnlineStatsBase

mutable struct ElementwiseStat{T<:OnlineStat} <: OnlineStat{Matrix}
    value::Matrix{T}
    n::Int
end

ElementwiseStat(rows, cols, stat = Mean()) = ElementwiseStat([copy(stat) for i in 1:rows, j in 1:cols], 0)
function OnlineStatsBase._fit!(o::ElementwiseStat, y)
    fit!.(o.value, y)
    o.n += 1
end

const nrows = 100
const ncols = 200
const data_count = 10
const data = [rand(Float64, nrows, ncols) for i in 1:data_count]

const elw_stat = ElementwiseStat(nrows, ncols, Variance())
# fit!(elw_stat, data)
b_elw = @benchmark fit!($elw_stat, $(data[2]))

const stat_mat = [Variance() for i in 1:nrows, j in 1:ncols]
# foreach(img->fit!.(stat_mat, img), data)
b_bcast = @benchmark fit!.($stat_mat, $(data[2]))

println()
display.((Text("ElementwiseStat "), b_elw))
println(); println("========================================================")
display.((Text("Broadcasted stat "), b_bcast))
println()
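One plausible explanation for both benchmarks reporting the same 156.33 KiB (my reading, not confirmed in the thread): `fit!.(stat_mat, data[2])`, and the `fit!.` inside `_fit!`, each build a new 100×200 array holding the stat returned by every `fit!` call, i.e. one 8-byte reference per element, 100 × 200 × 8 = 160,000 bytes ≈ 156.25 KiB before any array header. The sketch below reproduces this in plain Julia with a hypothetical `PixelMean` accumulator and `update!` function (stand-ins, not OnlineStats types), and shows the explicit-loop alternative suggested above, which updates in place without building that array:

```julia
# Hypothetical per-pixel accumulator standing in for an OnlineStats stat
mutable struct PixelMean
    μ::Float64
    n::Int
end

# Returns the accumulator, mirroring how fit! returns the stat
function update!(a::PixelMean, x::Real)
    a.n += 1
    a.μ += (x - a.μ) / a.n
    return a
end

# Explicit loop: same per-element updates, no result array
function update_all!(accs::AbstractMatrix{PixelMean}, frame::AbstractMatrix)
    for i in eachindex(accs, frame)
        update!(accs[i], frame[i])
    end
    return accs
end

accs = [PixelMean(0.0, 0) for i in 1:100, j in 1:200]
frame = rand(100, 200)

res = update!.(accs, frame)   # broadcasting builds a new 100×200 Matrix{PixelMean}
sizeof(res)                   # 100 * 200 * 8 = 160000 bytes on a 64-bit machine

update_all!(accs, frame)      # second update of every pixel, done in place
```

Note that `res` holds references to the same accumulator objects as `accs`; only the outer array is new, which is why it costs one pointer per element rather than a copy of each stat.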

yakir12 (Author) commented Oct 8, 2018

Wow, that is damn near identical. Awesome.
