Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parsing support for Interval #128

Merged
merged 9 commits into from
Jul 14, 2020
Merged

Add parsing support for Interval #128

merged 9 commits into from
Jul 14, 2020

Conversation

omus
Copy link
Collaborator

@omus omus commented Jul 10, 2020

Adds parsing support for Interval{T}. I ended up just using comma as the parsing delimiter here as I expect it's the syntax most people would expect. Because this PR currently only supports the comma delimiter I changed the printing of intervals to also use a comma. Using a comma for the delimiter when printing has a chance of having a user think that a printed closed-interval is actually a Vector (e.g. [1, 2]). We could work around this by leaving printing using .. and just support both delimiters when parsing.

The parsing performance seems pretty good here despite using a regex. I expect we can improve the parsing performance further but I'm happy with this for now:

julia> using Intervals, BenchmarkTools

julia> @btime parse(Interval{Int}, "[1,2)")
  731.831 ns (13 allocations: 656 bytes)
Interval{Int64,Closed,Open}(1, 2)

julia> @btime parse(Int, "1")
  26.491 ns (0 allocations: 0 bytes)
1

Fixes #116.

Update: Both parsing of delimiters .. and , are supported now and printing has been left unchanged.

@omus
Copy link
Collaborator Author

omus commented Jul 10, 2020

I know that LibPQ rolled its own interval parsing code so I took a look at it to ensure this PR works nicely there. The only issue I see is that LibPQ wants to call it's own parsing function. I'll try to support that.

@codecov
Copy link

codecov bot commented Jul 10, 2020

Codecov Report

Merging #128 into master will increase coverage by 0.80%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   73.92%   74.72%   +0.80%     
==========================================
  Files           9       10       +1     
  Lines         441      455      +14     
==========================================
+ Hits          326      340      +14     
  Misses        115      115              
Impacted Files Coverage Δ
src/Intervals.jl 80.00% <ø> (ø)
src/parse.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be20134...82d93e1. Read the comment docs.

src/parse.jl Show resolved Hide resolved
@mattBrzezinski
Copy link
Member

The parsing performance seems pretty good here despite using a regex.

Are regexes in Julia inherently slow?

@mattBrzezinski
Copy link
Member

Just needs a version bump, I assume we would want to bring along a few of the other CRs across the line before making a new release though.

@omus
Copy link
Collaborator Author

omus commented Jul 10, 2020

The parsing performance seems pretty good here despite using a regex.

Are regexes in Julia inherently slow?

The issue isn't that regexes in Julia are slow but rather that regexes in general are slower than if you write a custom lower level parser. This is mostly due to the flexibility of regex and backtracking.

@omus
Copy link
Collaborator Author

omus commented Jul 10, 2020

Just needs a version bump, I assume we would want to bring along a few of the other CRs across the line before making a new release though.

For GitHub packages I've been bumping the version number outside of the PRs which avoids confusion on whether it has already been bumped. This will be 1.4.0 and I'll probably tag it soon after.

@mattBrzezinski
Copy link
Member

Just needs a version bump, I assume we would want to bring along a few of the other CRs across the line before making a new release though.

For GitHub packages I've been bumping the version number outside of the PRs which avoids confusion on whether it has already been bumped. This will be 1.4.0 and I'll probably tag it soon after.

Yeah that's what I meant, @iamed2 converted me into this practice :P

@omus
Copy link
Collaborator Author

omus commented Jul 10, 2020

Quick aside. I benchmarked using possessive quantifiers to prevent backtracking and there was no performance improvement in this case

src/parse.jl Outdated Show resolved Hide resolved
@iamed2
Copy link
Member

iamed2 commented Jul 10, 2020

IMO this should support ".." and we should continue to print "..". This is the standard notation for intervals in Julia.

@omus
Copy link
Collaborator Author

omus commented Jul 10, 2020

IMO this should support ".." and we should continue to print "..". This is the standard notation for intervals in Julia.

I'll make this change. I was starting to worry about how this could break tests which check printing.

omus added 7 commits July 13, 2020 15:09
Allows for round-trip parsing of printed intervals. Note I believe the
original reason we printed with `..` was to avoid any confusion between
a closed-interval and a vector (e.g. `[1, 2]`)
@omus omus changed the title Add parsing support for Interval / Change interval print delimiter Add parsing support for Interval Jul 13, 2020
@omus
Copy link
Collaborator Author

omus commented Jul 13, 2020

Parsing now supports both .. and , delimiters. I also improved quoting support by allowing an escaped quote character

@omus
Copy link
Collaborator Author

omus commented Jul 14, 2020

There's room for performance improvements here. I'll make a separate issue about that to be done later and get this feature out now

src/parse.jl Outdated Show resolved Hide resolved
test/interval.jl Outdated Show resolved Hide resolved
@omus
Copy link
Collaborator Author

omus commented Jul 14, 2020

Issue about improving performance: #131. The last round of regex changes didn't seem to negatively impact performance:

julia> @btime parse(Interval{Int}, "[1,2)")
  734.102 ns (13 allocations: 656 bytes)
Interval{Int64,Closed,Open}(1, 2)

@omus omus merged commit 728c8ef into master Jul 14, 2020
@omus omus deleted the cv/parsing branch July 14, 2020 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support parsing intervals from strings
4 participants