Storing open-ness as a type parameter makes Postgres parsing type unstable #208

oxinabox · 2022-11-02T17:03:15Z

Postgres has a tzrange type, which LibPQ.jl returns as an Interval[ZonedDateTime} No openness type params, so it is type unstabl
But it doesn't know if it is open or closed because LibPQ stores intervals that may be open or closed in the same postgre column type. (including mixing them).

Given we interact with intervals from prosgre databases all the time.
and we dispatch on this type param very rarely if ever,
I wonder if it wouldn;t be better stored as a field rather than a type param.

However this would be a breaking change

The text was updated successfully, but these errors were encountered:

rofinn · 2022-11-08T21:36:11Z

FWIW, we do dispatch on those parameters in a few places, but rewriting that logic with an explicit condition or by passing those as arguments to subfunction could be a reasonable solution.

https://github.com/invenia/Intervals.jl/blob/master/src/interval.jl#L190

This would also make it easier to break up Intervals.jl into a set of IntervalSet.jl extensions as discussed here.

JuliaMath/IntervalSets.jl#67

oxinabox · 2022-11-28T15:49:24Z

@willtebbutt has a prototype for not doing this.
Not sure if he has run timing.
But it is extremely fast, and he is dealing with data that is explictly 1 array containing elements that mix boundedness.

willtebbutt · 2022-11-28T16:58:14Z

Yup. It's on a WIP MR to an internal package but there's not much code, so here's a snippet for reference:

struct DynamicInterval{T}
    infimum::T
    supremum::T
    lhs_is_open::Bool
    rhs_is_open::Bool
end

function Base.issubset(x::DynamicInterval, y::DynamicInterval)
    lower_is_in = if x.lhs_is_open
        x.infimum >= y.infimum
    else
        y.lhs_is_open ? (x.infimum > y.infimum) : (x.infimum >= y.infimum)
    end
    upper_is_in = if x.rhs_is_open
        x.supremum <= y.supremum
    else
        y.rhs_is_open ? (x.supremum < y.supremum) : (x.supremum <= y.supremum)
    end
    return lower_is_in && upper_is_in
end

Performance of construction of arrays of intervals with mixed open-ness / closed-ness and issubset is pretty good. For example:

using BenchmarkTools

_opens = [rand([true, false]) for _ in 1:1_000_000]
_closeds = [rand([true, false]) for _ in 1:1_000_000]
@benchmark [DynamicInterval(0.0, 1.0, o, c) for(o, c) in zip($_opens, $_closeds)]
BenchmarkTools.Trial: 794 samples with 1 evaluation.
 Range (min … max):  3.330 ms … 14.258 ms  ┊ GC (min … max): 0.00% …  6.20%
 Time  (median):     3.416 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.290 ms ±  4.255 ms  ┊ GC (mean ± σ):  7.31% ± 14.75%

  █              ▄                                 ▁▃▂     ▄  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄███▅▆▄▁▄█ ▇
  3.33 ms      Histogram: log(frequency) by time     14.1 ms <

 Memory estimate: 22.89 MiB, allocs estimate: 2.

xs = [DynamicInterval(0.0, 1.0, rand([true, false]), rand([true, false])) for _ in 1:1_000_000]
x = DynamicInterval(0.0, 1.0, false, false)

@benchmark map(Base.Fix1(issubset, $x), $xs)
BenchmarkTools.Trial: 2046 samples with 1 evaluation.
 Range (min … max):  2.130 ms …   3.905 ms  ┊ GC (min … max): 0.00% … 13.12%
 Time  (median):     2.389 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.436 ms ± 203.115 μs  ┊ GC (mean ± σ):  0.49% ±  2.54%

       ▁▂▅▆▅█▇▆▅▃▄▂▁                                           
  ▂▂▃▄▅█████████████▇▇▅▅▅▄▃▃▄▂▃▂▃▂▁▂▂▂▂▂▂▂▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▄
  2.13 ms         Histogram: frequency by time        3.36 ms <

 Memory estimate: 976.67 KiB, allocs estimate: 2.

Constrast this with:

using Intervals

_opens_typed = [rand([Open, Closed]) for _ in 1:1_000_000]
_closed_typed = [rand([Open, Closed]) for _ in 1:1_000_000]
@benchmark [Interval{o, c}(0.0, 1.0) for (o, c) in zip($_opens_typed, $_closed_typed)]
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  1.727 s …    2.053 s  ┊ GC (min … max): 13.63% … 22.27%
 Time  (median):     1.938 s               ┊ GC (median):    18.28%
 Time  (mean ± σ):   1.906 s ± 165.230 ms  ┊ GC (mean ± σ):  18.31% ±  4.32%

  █                                    █                   █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.73 s         Histogram: frequency by time         2.05 s <

 Memory estimate: 640.87 MiB, allocs estimate: 12000018.

xs_typed = [Interval{rand([Open, Closed]), rand([Open, Closed])}(0.0, 1.0) for _ in 1:1_000_000]
x_typed = Interval{Closed, Closed}(0.0, 1.0)

@benchmark map(Base.Fix1(issubset, $x_typed), $xs_typed)
BenchmarkTools.Trial: 85 samples with 1 evaluation.
 Range (min … max):  52.160 ms … 72.713 ms  ┊ GC (min … max):  0.00% … 0.00%
 Time  (median):     59.560 ms              ┊ GC (median):    10.26%
 Time  (mean ± σ):   59.213 ms ±  4.144 ms  ┊ GC (mean ± σ):   6.27% ± 4.96%

       █           ▂▂  ▂  ▂▂  ▂  ▂                             
  ▄▄█▆▄█▄▄█▄▆▆▄▄▁▆▄██▁▆█████▆███▄█▆▆▄▁▁▄▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  52.2 ms         Histogram: frequency by time        72.6 ms <

 Memory estimate: 31.47 MiB, allocs estimate: 1000002.

As you would hope, if you're working with a vector of intervals with fixed open/closed-ness, then the performance is better than the dynamic intervals (presumably just because it's able to eliminate a bunch of branching at compile time), but only by a factor of 3:

vs = [rand() for _ in 1:1_000_000]
@benchmark [Interval{Closed, Closed}(v - 0.5, 5.0) for v in $vs]
BenchmarkTools.Trial: 2499 samples with 1 evaluation.
 Range (min … max):  1.762 ms …   9.897 ms  ┊ GC (min … max): 0.00% …  9.05%
 Time  (median):     1.834 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.995 ms ± 505.555 μs  ┊ GC (mean ± σ):  7.18% ± 11.85%

    ▂█▆                                                        
  ▂▅████▅▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▄▄▄▃▃▂ ▃
  1.76 ms         Histogram: frequency by time        2.76 ms <

 Memory estimate: 15.26 MiB, allocs estimate: 2.

# change end-points so that the comparison isn't entirely vaccuous
xs_typed_fixed = [Interval{Closed, Closed}(rand() - 0.5, 5.0) for _ in 1:1_000_000]
@benchmark map(Base.Fix1(issubset, $x_typed), $xs_typed_fixed)
BenchmarkTools.Trial: 6879 samples with 1 evaluation.
 Range (min … max):  685.808 μs …   2.068 ms  ┊ GC (min … max): 0.00% … 40.18%
 Time  (median):     702.151 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   721.751 μs ± 116.212 μs  ┊ GC (mean ± σ):  1.76% ±  6.55%

  ██▅▃▁                                                         ▁
  █████▇▇▅▄▃▁▅▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▃▇▇▅▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▆█ █
  686 μs        Histogram: log(frequency) by time       1.53 ms <

 Memory estimate: 976.67 KiB, allocs estimate: 2.

These kind of performance profiles suggest that you probably want dynamic intervals if you either have mixed open / closed-ness, and presumably it's going to be useful in cases where you don't know whether you have constant open / closed-ness and therefore can't just make a collection of things up front. From the above, I would characterise the gains to be had my knowing the open / closed-ness up front as "modest", but definitely something you want to have if you can.

rofinn · 2022-11-28T17:28:54Z

Okay, so my understanding as it relates to JuliaMath/IntervalSets.jl#67 is that we probably want:

A shared AbstractInterval type in IntervalSets.jl which isn't parameterized by the bounds, which would make it easier for Intervals.jl to extend.
A required API for grabbing bounds which folks can use as a fallback for writing generic algorithms.
Both parameterized and non-parameterized concrete interval types depending on the application.
Update LibPQ.jl to use the non-parameterized type.

rofinn mentioned this issue Apr 4, 2023

Order of type parameters is somewhat problematic #170

Open

rofinn added this to the Intervals 2.0 milestone Apr 4, 2023

rofinn linked a pull request May 18, 2023 that will close this issue

[WIP] Interval bounds field #217

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Storing open-ness as a type parameter makes Postgres parsing type unstable #208

Storing open-ness as a type parameter makes Postgres parsing type unstable #208

oxinabox commented Nov 2, 2022

rofinn commented Nov 8, 2022

oxinabox commented Nov 28, 2022 •

edited

Loading

willtebbutt commented Nov 28, 2022 •

edited

Loading

rofinn commented Nov 28, 2022

Storing open-ness as a type parameter makes Postgres parsing type unstable #208

Storing open-ness as a type parameter makes Postgres parsing type unstable #208

Comments

oxinabox commented Nov 2, 2022

rofinn commented Nov 8, 2022

oxinabox commented Nov 28, 2022 • edited Loading

willtebbutt commented Nov 28, 2022 • edited Loading

rofinn commented Nov 28, 2022

oxinabox commented Nov 28, 2022 •

edited

Loading

willtebbutt commented Nov 28, 2022 •

edited

Loading