Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce interval storage size #98

Closed
omus opened this issue Jun 3, 2020 · 2 comments
Closed

Reduce interval storage size #98

omus opened this issue Jun 3, 2020 · 2 comments

Comments

@omus
Copy link
Collaborator

omus commented Jun 3, 2020

This is an idea I've had for a while on but haven't investigated. Currently Inclusivity is defined as:

struct Inclusivity
    first::Bool
    last::Bool
end

This structure uses 2-bytes to store these two booleans but we can theoretically reduce the size down to 2-bits. Unfortunately Julia only allows structures as small as 8-bits/1-byte but this is still an improvement. Potentially this could save a bunch of memory when it comes to storing arrays of Intervals.

@omus
Copy link
Collaborator Author

omus commented Jun 3, 2020

I did an experiment with the current version and this new version of Inclusivity:

struct Inclusivity
    val::UInt8

    Inclusivity(x::UInt8) = new(x & 0b11)
end


Inclusivity(x::Integer) = Inclusivity(trunc(UInt8, x))
Inclusivity(first::Bool, last::Bool) = Inclusivity(UInt8(first) | UInt8(last) << 1)

Base.copy(inc::Inclusivity) = Inclusivity(inc.val)
Base.broadcastable(inc::Inclusivity) = Ref(inc)

function Base.convert(::Type{T}, inc::Inclusivity) where T <: Integer
    return convert(T, getfield(inc, :val))
end

Base.first(inc::Inclusivity) = getfield(inc, :val) & 0b01 > 0
Base.last(inc::Inclusivity) = getfield(inc, :val) & 0b10 > 0

function Base.getproperty(inc::Inclusivity, f::Symbol)
    if f === :first
        first(inc)
    elseif f === :last
        last(inc)
    else
        getfield(inc, f)
    end
end

isclosed(inc::Inclusivity) = x == 0b11
isopen(inc::Inclusivity) = x == 0b00

The experiment just checked the total size of the Interval using the original and modified Inclusivity.

Original:

julia> function demo(T)
           x = Interval(T(1), T(2), true, false)
           @show T sizeof(T) sizeof(x)
       end
demo (generic function with 1 method)

julia> demo.(Base.BitInteger_types);
T = Int8
sizeof(T) = 1
sizeof(x) = 4
T = Int16
sizeof(T) = 2
sizeof(x) = 6
T = Int32
sizeof(T) = 4
sizeof(x) = 12
T = Int64
sizeof(T) = 8
sizeof(x) = 24
T = Int128
sizeof(T) = 16
sizeof(x) = 40
T = UInt8
sizeof(T) = 1
sizeof(x) = 4
T = UInt16
sizeof(T) = 2
sizeof(x) = 6
T = UInt32
sizeof(T) = 4
sizeof(x) = 12
T = UInt64
sizeof(T) = 8
sizeof(x) = 24
T = UInt128
sizeof(T) = 16
sizeof(x) = 40

Modified:

julia> function demo(T)
           x = Interval(T(1), T(2), true, false)
           @show T sizeof(T) sizeof(x)
       end
demo (generic function with 1 method)

julia> demo.(Base.BitInteger_types);
T = Int8
sizeof(T) = 1
sizeof(x) = 3
T = Int16
sizeof(T) = 2
sizeof(x) = 6
T = Int32
sizeof(T) = 4
sizeof(x) = 12
T = Int64
sizeof(T) = 8
sizeof(x) = 24
T = Int128
sizeof(T) = 16
sizeof(x) = 40
T = UInt8
sizeof(T) = 1
sizeof(x) = 3
T = UInt16
sizeof(T) = 2
sizeof(x) = 6
T = UInt32
sizeof(T) = 4
sizeof(x) = 12
T = UInt64
sizeof(T) = 8
sizeof(x) = 24
T = UInt128
sizeof(T) = 16
sizeof(x) = 40

It appears that due to memory alignment this size reduction doesn't help except for in the case where the interval uses endpoints of 1-byte

@omus omus changed the title Use 8-bits to store Inclusivity Reduce interval storage size Jun 4, 2020
@omus
Copy link
Collaborator Author

omus commented Jun 4, 2020

The work in #89 has brought up more ideas here. If we were to also introduce a representation for unbounded intervals how does that impact memory. A quick demonstration:

struct Interval1{T}
    lower::T
    upper::T
    lower_inc::Bool
    upper_inc::Bool
end

struct Interval2{T}
    lower::T
    upper::T
    lower_inc::Bool
    upper_inc::Bool
    lower_bounded::Bool
    upper_bounded::Bool
end

struct Interval3{T}
    lower::T
    upper::T
    bound::UInt8
end

struct Interval4{T,L,U}
    lower::T
    upper::T
end

We already know that due to memory alignment we can see the greatest impact with smaller T types so we'll start with Int8. Take note Interval1 cannot actually support an unbounded interval representation and is here just for comparison:

julia> sizeof(Interval1{Int8}(1, 2, true, false))
4

julia> sizeof(Interval2{Int8}(1, 0, true, false, true, false))
6

julia> sizeof(Interval3{Int8}(1, 0, 0b0101))
3

julia> sizeof(Interval4{Int8,:closed,:unbounded}(1, 0))
2

Now lets inspect vectors of these eltypes. Note that all of these representations are bits types so no pointers should be present in the vectors:

julia> Base.summarysize(fill(Interval1{Int8}(1, 2, true, false), 100))
440

julia> Base.summarysize(fill(Interval2{Int8}(1, 0, true, false, true, false), 100))
640

julia> Base.summarysize(fill(Interval3{Int8}(1, 0, 0b0101), 100))
340

julia> Base.summarysize(fill(Interval4{Int8,:closed,:unbounded}(1, 0), 100))
240

Each vector has 40 bytes of overhead otherwise the results are as expected. We'll also test vectors of mixed types:

julia> Base.summarysize(repeat([Interval1{Int8}(1, 2, true, false), Interval1{Int8}(1, 2, false, true)], outer=50))
440

julia> Base.summarysize(repeat([Interval2{Int8}(1, 0, true, false, true, false), Interval2{Int8}(0, 1, false, true, false, true)], outer=50))
640

julia> Base.summarysize(repeat([Interval3{Int8}(1, 0, 0b0101), Interval3{Int8}(0, 1, 0b1010)], outer=50))
340

julia> Base.summarysize(repeat([Interval4{Int8,:closed,:unbounded}(1, 0), Interval4{Int8,:unbounded,:closed}(0, 1)], outer=50))
844

Using Interval4 with mixed bounds does introduce some overhead.

Let's look at a larger Int64 type as well:

julia> sizeof(Interval1{Int64}(1, 2, true, false))
24

julia> sizeof(Interval2{Int64}(1, 0, true, false, true, false))
24

julia> sizeof(Interval3{Int64}(1, 0, 0b0101))
24

julia> sizeof(Interval4{Int64,:closed,:unbounded}(1, 0))
16

The vector results are similar but the mixed vector results are

julia> Base.summarysize(repeat([Interval1{Int64}(1, 2, true, false), Interval1{Int64}(1, 2, false, true)], outer=50))
2440

julia> Base.summarysize(repeat([Interval2{Int64}(1, 0, true, false, true, false), Interval2{Int64}(0, 1, false, true, false, true)], outer=50))
2440

julia> Base.summarysize(repeat([Interval3{Int64}(1, 0, 0b0101), Interval3{Int64}(0, 1, 0b1010)], outer=50))
2440

julia> Base.summarysize(repeat([Interval4{Int64,:closed,:unbounded}(1, 0), Interval4{Int64,:unbounded,:closed}(0, 1)], outer=50))
872

Now Interval4 is seeing some good improvements

Finally, we'll look at Int128:

julia> Base.summarysize(repeat([Interval1{Int128}(1, 2, true, false), Interval1{Int128}(1, 2, false, true)], outer=50))
4040

julia> Base.summarysize(repeat([Interval2{Int128}(1, 0, true, false, true, false), Interval2{Int128}(0, 1, false, true, false, true)], outer=50))
4040

julia> Base.summarysize(repeat([Interval3{Int128}(1, 0, 0b0101), Interval3{Int128}(0, 1, 0b1010)], outer=50))
4040

julia> Base.summarysize(repeat([Interval4{Int128,:closed,:unbounded}(1, 0), Interval4{Int128,:unbounded,:closed}(0, 1)], outer=50))
904

I would say that using Interval4 provides a worthwhile memory savings with larger types (32-bits and above). Even though there can be some extra overhead with smaller types the memory footprint of these types is already small making this seem like a worthy trade off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant