Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing open-ness as a type parameter makes Postgres parsing type unstable #208

Open
oxinabox opened this issue Nov 2, 2022 · 4 comments · May be fixed by #217
Open

Storing open-ness as a type parameter makes Postgres parsing type unstable #208

oxinabox opened this issue Nov 2, 2022 · 4 comments · May be fixed by #217
Milestone

Comments

@oxinabox
Copy link
Member

oxinabox commented Nov 2, 2022

Postgres has a tzrange type, which LibPQ.jl returns as an Interval[ZonedDateTime} No openness type params, so it is type unstabl
But it doesn't know if it is open or closed because LibPQ stores intervals that may be open or closed in the same postgre column type. (including mixing them).

Given we interact with intervals from prosgre databases all the time.
and we dispatch on this type param very rarely if ever,
I wonder if it wouldn;t be better stored as a field rather than a type param.

However this would be a breaking change

@rofinn
Copy link
Member

rofinn commented Nov 8, 2022

FWIW, we do dispatch on those parameters in a few places, but rewriting that logic with an explicit condition or by passing those as arguments to subfunction could be a reasonable solution.

https://github.com/invenia/Intervals.jl/blob/master/src/interval.jl#L190

This would also make it easier to break up Intervals.jl into a set of IntervalSet.jl extensions as discussed here.

JuliaMath/IntervalSets.jl#67

@oxinabox
Copy link
Member Author

oxinabox commented Nov 28, 2022

@willtebbutt has a prototype for not doing this.
Not sure if he has run timing.
But it is extremely fast, and he is dealing with data that is explictly 1 array containing elements that mix boundedness.

@willtebbutt
Copy link
Member

willtebbutt commented Nov 28, 2022

Yup. It's on a WIP MR to an internal package but there's not much code, so here's a snippet for reference:

struct DynamicInterval{T}
    infimum::T
    supremum::T
    lhs_is_open::Bool
    rhs_is_open::Bool
end

function Base.issubset(x::DynamicInterval, y::DynamicInterval)
    lower_is_in = if x.lhs_is_open
        x.infimum >= y.infimum
    else
        y.lhs_is_open ? (x.infimum > y.infimum) : (x.infimum >= y.infimum)
    end
    upper_is_in = if x.rhs_is_open
        x.supremum <= y.supremum
    else
        y.rhs_is_open ? (x.supremum < y.supremum) : (x.supremum <= y.supremum)
    end
    return lower_is_in && upper_is_in
end

Performance of construction of arrays of intervals with mixed open-ness / closed-ness and issubset is pretty good. For example:

using BenchmarkTools

_opens = [rand([true, false]) for _ in 1:1_000_000]
_closeds = [rand([true, false]) for _ in 1:1_000_000]
@benchmark [DynamicInterval(0.0, 1.0, o, c) for(o, c) in zip($_opens, $_closeds)]
BenchmarkTools.Trial: 794 samples with 1 evaluation.
 Range (min  max):  3.330 ms  14.258 ms  ┊ GC (min  max): 0.00%   6.20%
 Time  (median):     3.416 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.290 ms ±  4.255 ms  ┊ GC (mean ± σ):  7.31% ± 14.75%

  █              ▄                                 ▁▃▂     ▄  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄███▅▆▄▁▄█ ▇
  3.33 ms      Histogram: log(frequency) by time     14.1 ms <

 Memory estimate: 22.89 MiB, allocs estimate: 2.

xs = [DynamicInterval(0.0, 1.0, rand([true, false]), rand([true, false])) for _ in 1:1_000_000]
x = DynamicInterval(0.0, 1.0, false, false)

@benchmark map(Base.Fix1(issubset, $x), $xs)
BenchmarkTools.Trial: 2046 samples with 1 evaluation.
 Range (min  max):  2.130 ms    3.905 ms  ┊ GC (min  max): 0.00%  13.12%
 Time  (median):     2.389 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.436 ms ± 203.115 μs  ┊ GC (mean ± σ):  0.49% ±  2.54%

       ▁▂▅▆▅█▇▆▅▃▄▂▁                                           
  ▂▂▃▄▅█████████████▇▇▅▅▅▄▃▃▄▂▃▂▃▂▁▂▂▂▂▂▂▂▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▄
  2.13 ms         Histogram: frequency by time        3.36 ms <

 Memory estimate: 976.67 KiB, allocs estimate: 2.

Constrast this with:

using Intervals

_opens_typed = [rand([Open, Closed]) for _ in 1:1_000_000]
_closed_typed = [rand([Open, Closed]) for _ in 1:1_000_000]
@benchmark [Interval{o, c}(0.0, 1.0) for (o, c) in zip($_opens_typed, $_closed_typed)]
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min  max):  1.727 s     2.053 s  ┊ GC (min  max): 13.63%  22.27%
 Time  (median):     1.938 s               ┊ GC (median):    18.28%
 Time  (mean ± σ):   1.906 s ± 165.230 ms  ┊ GC (mean ± σ):  18.31% ±  4.32%

  █                                    █                   █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.73 s         Histogram: frequency by time         2.05 s <

 Memory estimate: 640.87 MiB, allocs estimate: 12000018.

xs_typed = [Interval{rand([Open, Closed]), rand([Open, Closed])}(0.0, 1.0) for _ in 1:1_000_000]
x_typed = Interval{Closed, Closed}(0.0, 1.0)

@benchmark map(Base.Fix1(issubset, $x_typed), $xs_typed)
BenchmarkTools.Trial: 85 samples with 1 evaluation.
 Range (min  max):  52.160 ms  72.713 ms  ┊ GC (min  max):  0.00%  0.00%
 Time  (median):     59.560 ms              ┊ GC (median):    10.26%
 Time  (mean ± σ):   59.213 ms ±  4.144 ms  ┊ GC (mean ± σ):   6.27% ± 4.96%

       █           ▂▂  ▂  ▂▂  ▂  ▂                             
  ▄▄█▆▄█▄▄█▄▆▆▄▄▁▆▄██▁▆█████▆███▄█▆▆▄▁▁▄▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  52.2 ms         Histogram: frequency by time        72.6 ms <

 Memory estimate: 31.47 MiB, allocs estimate: 1000002.

As you would hope, if you're working with a vector of intervals with fixed open/closed-ness, then the performance is better than the dynamic intervals (presumably just because it's able to eliminate a bunch of branching at compile time), but only by a factor of 3:

vs = [rand() for _ in 1:1_000_000]
@benchmark [Interval{Closed, Closed}(v - 0.5, 5.0) for v in $vs]
BenchmarkTools.Trial: 2499 samples with 1 evaluation.
 Range (min  max):  1.762 ms    9.897 ms  ┊ GC (min  max): 0.00%   9.05%
 Time  (median):     1.834 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.995 ms ± 505.555 μs  ┊ GC (mean ± σ):  7.18% ± 11.85%

    ▂█▆                                                        
  ▂▅████▅▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▄▄▄▃▃▂ ▃
  1.76 ms         Histogram: frequency by time        2.76 ms <

 Memory estimate: 15.26 MiB, allocs estimate: 2.

# change end-points so that the comparison isn't entirely vaccuous
xs_typed_fixed = [Interval{Closed, Closed}(rand() - 0.5, 5.0) for _ in 1:1_000_000]
@benchmark map(Base.Fix1(issubset, $x_typed), $xs_typed_fixed)
BenchmarkTools.Trial: 6879 samples with 1 evaluation.
 Range (min  max):  685.808 μs    2.068 ms  ┊ GC (min  max): 0.00%  40.18%
 Time  (median):     702.151 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   721.751 μs ± 116.212 μs  ┊ GC (mean ± σ):  1.76% ±  6.55%

  ██▅▃▁                                                         ▁
  █████▇▇▅▄▃▁▅▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▃▇▇▅▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▆█ █
  686 μs        Histogram: log(frequency) by time       1.53 ms <

 Memory estimate: 976.67 KiB, allocs estimate: 2.

These kind of performance profiles suggest that you probably want dynamic intervals if you either have mixed open / closed-ness, and presumably it's going to be useful in cases where you don't know whether you have constant open / closed-ness and therefore can't just make a collection of things up front. From the above, I would characterise the gains to be had my knowing the open / closed-ness up front as "modest", but definitely something you want to have if you can.

@rofinn
Copy link
Member

rofinn commented Nov 28, 2022

Okay, so my understanding as it relates to JuliaMath/IntervalSets.jl#67 is that we probably want:

  1. A shared AbstractInterval type in IntervalSets.jl which isn't parameterized by the bounds, which would make it easier for Intervals.jl to extend.
  2. A required API for grabbing bounds which folks can use as a fallback for writing generic algorithms.
  3. Both parameterized and non-parameterized concrete interval types depending on the application.
  4. Update LibPQ.jl to use the non-parameterized type.

@rofinn rofinn added this to the Intervals 2.0 milestone Apr 4, 2023
@rofinn rofinn linked a pull request May 18, 2023 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants