Feature/nom parser #9

Merged
merged 3 commits into from Nov 3, 2015

Conversation

Projects
None yet
2 participants
@hoodie
Contributor

hoodie commented Oct 30, 2015

Hi there,
I'm working on replacing the parsing implementation with nom.

this blog article has layed the groundwork.
I compared the performance of datetime, chrono and this. After switching from regex to nom the results of datetime are significantly better.
Bonus: with this datetime seems to also have a more complete implementation of the ISO8601 standard.

This is not mergeworthy yet, milliseconds are not supported yet, I just wanted to let you know.

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 1, 2015

Collaborator

Oh hey again,

I actually saw your parsing benchmarks the other day, and thought, aw man, is using a regex really that bad? And apparently it is! I’ve also heard of nom but never played around with it, and while I thought it would be faster I didn’t think it would be that much faster! So this is great.

A while back, I changed the parse module (83b464d) so that instead of parsing into LocalDateTime instances, it parses into field values, which makes the parsing implementation completely separate from the date implementation, allowing you to use the parser without using the dates. However, you’ve now got your own nom iso8601 crate, which also allows you to use the parser without using the dates. So it may be possible to replace the parse module entirely with this nom crate, as I don’t think that module does anything your crate doesn’t do too. What do you think?

Collaborator

ogham commented Nov 1, 2015

Oh hey again,

I actually saw your parsing benchmarks the other day, and thought, aw man, is using a regex really that bad? And apparently it is! I’ve also heard of nom but never played around with it, and while I thought it would be faster I didn’t think it would be that much faster! So this is great.

A while back, I changed the parse module (83b464d) so that instead of parsing into LocalDateTime instances, it parses into field values, which makes the parsing implementation completely separate from the date implementation, allowing you to use the parser without using the dates. However, you’ve now got your own nom iso8601 crate, which also allows you to use the parser without using the dates. So it may be possible to replace the parse module entirely with this nom crate, as I don’t think that module does anything your crate doesn’t do too. What do you think?

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 1, 2015

Collaborator

By the way, I was kind of hoping you didn’t notice that I removed some of your parsing tests when I made some recent changes. These tests have been living in a text file on my desktop since then, and I definitely don’t want to release a new version of datetime until all your code is back where it belongs. I hope you didn’t get the wrong idea where I accepted your PR and then removed its code… It’ll be back, I promise! 😥

Collaborator

ogham commented Nov 1, 2015

By the way, I was kind of hoping you didn’t notice that I removed some of your parsing tests when I made some recent changes. These tests have been living in a text file on my desktop since then, and I definitely don’t want to release a new version of datetime until all your code is back where it belongs. I hope you didn’t get the wrong idea where I accepted your PR and then removed its code… It’ll be back, I promise! 😥

@hoodie

This comment has been minimized.

Show comment
Hide comment
@hoodie

hoodie Nov 1, 2015

Contributor

hey, that nom iso8601 came out shortly after you merged my regex version - but I didn't think anything of it at the time.
What I didn't notice when I started trying to adjust it to fit rust-datetime was how incomplete it is.
Right now I'm both contributing to the nom-parser and here trying to mix and match both implementations. My goal is to make iso8601 (the crate) a self sufficient crate that does nothing but what it is supposed to and meanwhile remove all parsing logic from datetime. Since parsing a date and validating it are two different jobs entirely and datetime is suited much better for telling whether a year as a Feb29 or not than a set of parsers :D Right now I'm still struggling a bit with noms macro-DSL and giving it a nicer api than it currently has. Afterwards I'll carve out the remains of my regex implementation and plugin the nom parser as an external dependency.

The thing about my Regex implementation is not that it is that much slower. I never took any time to optimize it, but as you can tell from the PR I received, what slows it down is not the application of the regex, but the initialization. After that it is only about 2x-3x slower. The nom version also gets slower as I implement more of the standard. Still my hope is, at the end of this integration rust-datetime will have a more complete implementation of the iso8601 standard than chrono and still parse strings faster.

Contributor

hoodie commented Nov 1, 2015

hey, that nom iso8601 came out shortly after you merged my regex version - but I didn't think anything of it at the time.
What I didn't notice when I started trying to adjust it to fit rust-datetime was how incomplete it is.
Right now I'm both contributing to the nom-parser and here trying to mix and match both implementations. My goal is to make iso8601 (the crate) a self sufficient crate that does nothing but what it is supposed to and meanwhile remove all parsing logic from datetime. Since parsing a date and validating it are two different jobs entirely and datetime is suited much better for telling whether a year as a Feb29 or not than a set of parsers :D Right now I'm still struggling a bit with noms macro-DSL and giving it a nicer api than it currently has. Afterwards I'll carve out the remains of my regex implementation and plugin the nom parser as an external dependency.

The thing about my Regex implementation is not that it is that much slower. I never took any time to optimize it, but as you can tell from the PR I received, what slows it down is not the application of the regex, but the initialization. After that it is only about 2x-3x slower. The nom version also gets slower as I implement more of the standard. Still my hope is, at the end of this integration rust-datetime will have a more complete implementation of the iso8601 standard than chrono and still parse strings faster.

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 1, 2015

Collaborator

My goal is to make iso8601 (the crate) a self sufficient crate that does nothing but what it is supposed to and meanwhile remove all parsing logic from datetime.

Sounds good—that was what the parse module was supposed to be, though your crate will be more complete than my module is. Especially with regards to weird parsing edge cases...

Since parsing a date and validating it are two different jobs entirely and datetime is suited much better for telling whether a year as a Feb29 or not than a set of parsers :D

I agree! datetime’s current parser will quite happily accept “29th February 2015” as a date, because there’s no way I’m putting that logic in a parsing routine.

The thing about my Regex implementation is not that it is that much slower. I never took any time to optimize it, but as you can tell from the PR I received, what slows it down is not the application of the regex, but the initialization.

That did surprise me, actually. I hoped that as more and more work was done on the Regex crate, it would get faster, and eventually the regex would move from being created at runtime to being compiled into actual code at compile time. But if nom can do the job just as well today, then that’s great.

Anyway, the only thing I can think of that’ll change in datetime in the near future is the module where fixed-offset timezones are, and that’s quite a small change. I remember changing the API for you a little to make parsing easier last time, so with luck that won’t be necessary anymore.

Collaborator

ogham commented Nov 1, 2015

My goal is to make iso8601 (the crate) a self sufficient crate that does nothing but what it is supposed to and meanwhile remove all parsing logic from datetime.

Sounds good—that was what the parse module was supposed to be, though your crate will be more complete than my module is. Especially with regards to weird parsing edge cases...

Since parsing a date and validating it are two different jobs entirely and datetime is suited much better for telling whether a year as a Feb29 or not than a set of parsers :D

I agree! datetime’s current parser will quite happily accept “29th February 2015” as a date, because there’s no way I’m putting that logic in a parsing routine.

The thing about my Regex implementation is not that it is that much slower. I never took any time to optimize it, but as you can tell from the PR I received, what slows it down is not the application of the regex, but the initialization.

That did surprise me, actually. I hoped that as more and more work was done on the Regex crate, it would get faster, and eventually the regex would move from being created at runtime to being compiled into actual code at compile time. But if nom can do the job just as well today, then that’s great.

Anyway, the only thing I can think of that’ll change in datetime in the near future is the module where fixed-offset timezones are, and that’s quite a small change. I remember changing the API for you a little to make parsing easier last time, so with luck that won’t be necessary anymore.

@hoodie

This comment has been minimized.

Show comment
Hide comment
@hoodie

hoodie Nov 2, 2015

Contributor

Alright, I got it to work with nom and it's fast.
However I left in the parse.rs because it contains all of the glue I needed.
Unfortunately since we both tried to solve the same issues (you with lazy_static) this turned out to be a rather painful merge. I stopped after about 10minutes in and gave up, partly also because I don't know what else you changed.

Contributor

hoodie commented Nov 2, 2015

Alright, I got it to work with nom and it's fast.
However I left in the parse.rs because it contains all of the glue I needed.
Unfortunately since we both tried to solve the same issues (you with lazy_static) this turned out to be a rather painful merge. I stopped after about 10minutes in and gave up, partly also because I don't know what else you changed.

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 2, 2015

Collaborator

Ah, sorry about that. Want me to do it?

Collaborator

ogham commented Nov 2, 2015

Ah, sorry about that. Want me to do it?

@hoodie

This comment has been minimized.

Show comment
Hide comment
@hoodie

hoodie Nov 2, 2015

Contributor

Would be nice if you did

Contributor

hoodie commented Nov 2, 2015

Would be nice if you did

@ogham ogham self-assigned this Nov 3, 2015

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 3, 2015

Collaborator

This merge sucks :p

Could you please upload your iso8601 crate to crates.io? I can't find it there at the moment, and it's referred to locally in the Cargo.toml:

[dependencies]
iso8601 = { path = "../iso8601" }
Collaborator

ogham commented Nov 3, 2015

This merge sucks :p

Could you please upload your iso8601 crate to crates.io? I can't find it there at the moment, and it's referred to locally in the Cargo.toml:

[dependencies]
iso8601 = { path = "../iso8601" }
@hoodie

This comment has been minimized.

Show comment
Hide comment
@hoodie

hoodie Nov 3, 2015

Contributor

ups, sorry, just point it to my repo http://github.com/hoodie/iso8601
its not ready for crates.io yet, I'm still talking to the Original Maintainer who also got some PRs from me
This is all a bit hot-needley :D

Contributor

hoodie commented Nov 3, 2015

ups, sorry, just point it to my repo http://github.com/hoodie/iso8601
its not ready for crates.io yet, I'm still talking to the Original Maintainer who also got some PRs from me
This is all a bit hot-needley :D

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 3, 2015

Collaborator

OK!

Collaborator

ogham commented Nov 3, 2015

OK!

@hoodie

This comment has been minimized.

Show comment
Hide comment
@hoodie

hoodie Nov 3, 2015

Contributor

don't get me wrong, I consider my current version of the iso8601 to be very stable and well tested, also the API ought to remain virtually identical, I just didn't want to go ahead and release it without consorting with @badboy who originally wrote it

Contributor

hoodie commented Nov 3, 2015

don't get me wrong, I consider my current version of the iso8601 to be very stable and well tested, also the API ought to remain virtually identical, I just didn't want to go ahead and release it without consorting with @badboy who originally wrote it

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 3, 2015

Collaborator

Don't worry, that makes sense. Given that crates.io has all the crates in a single namespace, it's good to be polite and not take the "iso8601" name until it's agreed upon.

Collaborator

ogham commented Nov 3, 2015

Don't worry, that makes sense. Given that crates.io has all the crates in a single namespace, it's good to be polite and not take the "iso8601" name until it's agreed upon.

@ogham

This comment has been minimized.

Show comment
Hide comment
@ogham

ogham Nov 3, 2015

Collaborator

(PS, I have this merge mostly done, I'll finish it after lunch)

Collaborator

ogham commented Nov 3, 2015

(PS, I have this merge mostly done, I'll finish it after lunch)

@hoodie

This comment has been minimized.

Show comment
Hide comment
@hoodie

hoodie Nov 3, 2015

Contributor

Bon appetit

Contributor

hoodie commented Nov 3, 2015

Bon appetit

@ogham ogham merged commit a544d79 into rust-datetime:master Nov 3, 2015

@hoodie

This comment has been minimized.

Show comment
Hide comment
@hoodie

hoodie Nov 3, 2015

Contributor

\O/

Contributor

hoodie commented Nov 3, 2015

\O/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment