Nom matching #26

DrLuke · 2022-01-18T20:06:59Z

This is my attempt at implementing a nom-based matcher to get rid of the regex dependency, as explained in #23.

It works by parsing the address pattern and splitting it into `AddressPatternComponent`s. When matching, the matcher iterates over the components and applies the corresponding nom parser for each one, consuming the address to be matched piece by piece. If the whole address is consumed and every component parser has run, the match succeeds; otherwise it fails.
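The matching loop described above can be sketched in plain Rust. This is a simplified, std-only illustration, not the PR's code: the PR uses nom parsers and an `AddressPatternComponent` type, while the `Component` enum and the `parse_pattern`, `apply`, and `is_match` functions below are hypothetical and only model literals and the single-character `?` wildcard.

```rust
/// Hypothetical, reduced component type (the PR's `AddressPatternComponent` has more variants).
#[derive(Debug, Clone, PartialEq)]
enum Component {
    /// A literal run of characters, e.g. "/foo/".
    Literal(String),
    /// `?`: exactly one character (assumed here not to match '/').
    AnyChar,
}

/// Split a pattern into components: each `?` becomes `AnyChar`,
/// everything in between is accumulated into a `Literal`.
fn parse_pattern(pattern: &str) -> Vec<Component> {
    let mut components = Vec::new();
    let mut literal = String::new();
    for c in pattern.chars() {
        if c == '?' {
            if !literal.is_empty() {
                components.push(Component::Literal(std::mem::take(&mut literal)));
            }
            components.push(Component::AnyChar);
        } else {
            literal.push(c);
        }
    }
    if !literal.is_empty() {
        components.push(Component::Literal(literal));
    }
    components
}

/// Apply one component to the front of `input` and return the unconsumed rest.
fn apply<'a>(component: &Component, input: &'a str) -> Option<&'a str> {
    match component {
        Component::Literal(lit) => input.strip_prefix(lit.as_str()),
        Component::AnyChar => {
            let mut chars = input.chars();
            match chars.next() {
                Some(c) if c != '/' => Some(chars.as_str()),
                _ => None,
            }
        }
    }
}

/// Run every component in order; the address matches only if none fails
/// and nothing of the address is left over.
fn is_match(pattern: &str, address: &str) -> bool {
    let mut remainder = address;
    for component in parse_pattern(pattern) {
        match apply(&component, remainder) {
            Some(rest) => remainder = rest,
            None => return false,
        }
    }
    remainder.is_empty()
}
```

For example, `is_match("/foo/ba?", "/foo/bar")` succeeds, while `"/foo/ba"` fails because `?` finds nothing left to consume, and `"/foo/barx"` fails because `x` remains unconsumed.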

The one tricky part was implementing wildcards in combination with other elements (e.g. `/foo/*{bar,baz}andsometext`). The wildcard parser is greedy, so it tries to consume as many characters as possible.
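A purely greedy `*` would swallow the `bar` in `/foo/*bar` and then fail, so it has to give characters back when the components after it do not match. The sketch below shows one way to do that (greedy with backtracking over shorter consumptions). It is an assumption for illustration, not a description of the PR's nom parser; `Component` and `match_components` are hypothetical names, and `*` is assumed not to cross a `/`.

```rust
/// Hypothetical component type covering the example pattern.
#[derive(Debug, Clone, PartialEq)]
enum Component {
    Literal(String),
    /// `{a,b,...}`: exactly one of several alternatives.
    Choice(Vec<String>),
    /// `*`: any run of characters except '/'.
    Wildcard,
}

/// Match `components` against the whole of `input`.
fn match_components(components: &[Component], input: &str) -> bool {
    let (first, rest) = match components.split_first() {
        Some(split) => split,
        // No components left: succeed only if the address was fully consumed.
        None => return input.is_empty(),
    };
    match first {
        Component::Literal(lit) => input
            .strip_prefix(lit.as_str())
            .map_or(false, |tail| match_components(rest, tail)),
        Component::Choice(options) => options.iter().any(|opt| {
            input
                .strip_prefix(opt.as_str())
                .map_or(false, |tail| match_components(rest, tail))
        }),
        Component::Wildcard => {
            // The longest run `*` may take: up to the next '/' or the end.
            let max = input.find('/').unwrap_or(input.len());
            // Greedy: try the longest consumption first, then back off one char at a time.
            (0..=max)
                .rev()
                .filter(|&n| input.is_char_boundary(n))
                .any(|n| match_components(rest, &input[n..]))
        }
    }
}
```

With the pattern split into `Literal("/foo/")`, `Wildcard`, `Choice(["bar", "baz"])`, and `Literal("andsometext")`, the address `/foo/xxbazandsometext` matches once the wildcard backs off to consume only `xx`. Backtracking like this can be exponential for patterns with many wildcards, which is one reason a real implementation may want a smarter strategy.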

This is still early WIP. The matcher is fully implemented, but everything around it (documentation, removing unused functions, etc.) still needs work. I'd be happy about comments, though.

klingtnet · 2022-01-23T19:56:50Z

Just a quick sign of life: I will try to review the implementation sometime next week. I am already pretty happy to see nom matching being implemented, since this should resolve #23.

klingtnet

Here's my first take on reviewing the implementation. It looks promising, but we may need to optimize some parts.

src/address.rs

Co-authored-by: Andreas Linz <klingt.net@gmail.com>

DrLuke · 2022-02-16T22:16:54Z

As far as I'm concerned this can be merged (once the remaining conversations are resolved).

Some (potential) follow-up TODOs:

* We need a struct for addresses so we only have to validate them once instead of every time `match_address` is invoked

* Optimize away the `remainder` in `match_address`

* nom is capable of reporting where exactly matching failed, and for what reason. I'm not sure if this feature is necessary though.
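For the first TODO, a common Rust approach is a newtype that can only be constructed through validation, so matching code could accept it without re-checking. A minimal sketch under assumptions: `OscAddress`, `AddressError`, and the character rules (must start with `/`; printable ASCII except space and `#*,?[]{}`, loosely following OSC's reserved characters) are hypothetical and not taken from the PR.

```rust
use std::convert::TryFrom;

/// Hypothetical: an address that has already passed validation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OscAddress(String);

/// Hypothetical validation errors.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AddressError {
    MissingLeadingSlash,
    IllegalChar(char),
}

/// Assumed rule: printable ASCII except pattern/reserved characters.
/// `is_ascii_graphic` already excludes space and control characters.
fn is_legal_address_char(c: char) -> bool {
    c.is_ascii_graphic() && !matches!(c, '#' | '*' | ',' | '?' | '[' | ']' | '{' | '}')
}

impl<'a> TryFrom<&'a str> for OscAddress {
    type Error = AddressError;

    /// Validation happens exactly once, here.
    fn try_from(s: &'a str) -> Result<Self, Self::Error> {
        if !s.starts_with('/') {
            return Err(AddressError::MissingLeadingSlash);
        }
        if let Some(c) = s.chars().find(|&c| !is_legal_address_char(c)) {
            return Err(AddressError::IllegalChar(c));
        }
        Ok(OscAddress(s.to_owned()))
    }
}

impl OscAddress {
    /// Borrow the validated address; callers can rely on it being well-formed.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

A matcher taking `&OscAddress` instead of `&str` would then make "validated" part of the type, so the check cannot be forgotten or repeated.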

klingtnet

Again, good work and sorry for the delay 🙈

src/address.rs

klingtnet · 2022-03-05T19:23:28Z

> As far as I'm concerned this can be merged (once the remaining conversations are resolved).

From my side as well. The only remaining point is the range flip (support for reversed character ranges), which in my opinion should be removed.

> Some (potential) follow-up TODOs:
>
> * We need a struct for addresses so we only have to validate them once instead of every time `match_address` is invoked
>
> * Optimize away the `remainder` in `match_address`
>
> * nom is capable of reporting where exactly matching failed, and for what reason. I'm not sure if this feature is necessary though.

All the potential follow-ups are reasonable and should be defined in GitHub issues.
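For the second follow-up, one way to drop a mutable `remainder` is to thread the unconsumed input through `Iterator::try_fold`, which also stops at the first component that fails. In this hypothetical sketch each component is a plain `fn(&str) -> Option<&str>` that returns the unconsumed rest; the PR's components are nom parsers, so the real signature would differ.

```rust
/// Hypothetical component signature: consume a prefix, return the unconsumed rest.
type ComponentParser = fn(&str) -> Option<&str>;

/// Match without `let mut remainder`: `try_fold` carries the remaining input
/// from one component to the next and short-circuits on the first `None`.
fn match_address(components: &[ComponentParser], address: &str) -> bool {
    components
        .iter()
        .try_fold(address, |remainder, parse| parse(remainder))
        // Every component ran; the match succeeds only if nothing is left over.
        .map_or(false, |rest| rest.is_empty())
}

// Example literal components (illustrative only).
fn slash(input: &str) -> Option<&str> {
    input.strip_prefix('/')
}

fn foo(input: &str) -> Option<&str> {
    input.strip_prefix("foo")
}

fn bar(input: &str) -> Option<&str> {
    input.strip_prefix("bar")
}
```

With `[slash, foo, slash, bar]` as components, `/foo/bar` matches, while `/foo/baz` fails at `bar` and `/foo/barx` fails because `x` is left over.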

klingtnet · 2022-03-20T08:40:27Z

I hadn't noticed that you now fail on reversed character ranges, otherwise I would have merged this already. Anyhow, I created three follow-up issues based on the optimizations from #26 (comment). Thanks again for the contribution!
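Rejecting reversed ranges fits naturally into expanding a character class. A minimal sketch under assumptions: `expand_class` and `ClassError` are illustrative names, the input is the text between `[` and `]`, and OSC's leading `!` negation is not handled. A range such as `z-a` returns an error instead of being flipped, and collecting into a `HashSet` removes duplicates from overlapping entries (the commit history above also mentions replacing a deduplication function with a hash set).

```rust
use std::collections::HashSet;

/// Hypothetical errors for malformed character classes.
#[derive(Debug, PartialEq, Eq)]
enum ClassError {
    /// e.g. `z-a`: start is greater than end.
    ReversedRange(char, char),
    /// e.g. `a-` with nothing after the dash (treated as an error in this sketch).
    DanglingDash,
}

/// Expand the body of a character class like `a-cx` into a set of characters.
fn expand_class(body: &str) -> Result<HashSet<char>, ClassError> {
    let chars: Vec<char> = body.chars().collect();
    let mut set = HashSet::new();
    let mut i = 0;
    while i < chars.len() {
        if i + 1 < chars.len() && chars[i + 1] == '-' {
            // A range `start-end` needs a character after the dash.
            let start = chars[i];
            let end = *chars.get(i + 2).ok_or(ClassError::DanglingDash)?;
            if start > end {
                // Fail instead of silently flipping the range.
                return Err(ClassError::ReversedRange(start, end));
            }
            set.extend(start..=end);
            i += 3;
        } else {
            set.insert(chars[i]);
            i += 1;
        }
    }
    Ok(set)
}
```

For example, `a-cb` expands to the three characters `a`, `b`, `c` (the duplicate `b` disappears in the set), while `z-a` yields `ReversedRange('z', 'a')`.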

DrLuke · 2022-03-20T10:08:36Z

Thank you for the great review!

DrLuke added 8 commits January 17, 2022 23:10

Add nom-based matching

12bcdcc

Add tests

d3f29c2

Improve naming

4ab835a

Add docs

d658652

Add method to check for allowed characters

1e6f054

Fix name of test

1fff928

Use new character tester

a78d754

Only contain legal characters

2bebd56

DrLuke added 18 commits January 23, 2022 21:45

Remove extra '/' character

49171dc

Test if wildcard matches all legal characters

2b4d566

Use correct test function

71221d0

Restructure

926950e

Add method to verify address

587752d

Add verify_address_pattern

79c3a7d

Verify address pattern before parsing

5244f1a

Add tests

08716fe

Remove obsolete tests

0856d16

Improve match_address

b5523e0

Remove obsolete code

8908575

Add choice parser

93ec200

Add character class parser

62c03bb

Deduplicate expanded character classes

15771b8

Increase test coverage

3fb59d0

Remove unused imports

343a2bc

rustfmt

9a847bd

Apply clippy suggestions

b99832b

klingtnet requested changes Feb 8, 2022

DrLuke and others added 2 commits February 16, 2022 22:08

Add back accidentally removed comment

c9265c2

Co-authored-by: Andreas Linz <klingt.net@gmail.com>

Use into

876fc48

Co-authored-by: Andreas Linz <klingt.net@gmail.com>

DrLuke and others added 7 commits February 16, 2022 22:11

Simplify

5c6d6eb

Replace deduplication function with hashset

96897e8

Use iterator instead of loop

53ae8ca

Clarify docs

bcdae05

Use predefined ASCII character classes

6054dbb

Co-authored-by: Andreas Linz <klingt.net@gmail.com>

Return error instead of panicking

f176155

Remove regex dependency

cf893a4

DrLuke changed the title ~~WIP: Nom matching~~ Nom matching Feb 16, 2022

Remove regex dependency

0448afd

klingtnet requested changes Mar 5, 2022

src/address.rs (outdated)

src/address.rs

DrLuke added 3 commits March 13, 2022 22:11

Remove support for reversed character ranges

2da33e3

Make parser fail on reversed character ranges

4ecc3b3

Simplify parser

e524bc1

This was referenced Mar 20, 2022

Report nom parsing failure in errors #27

Open

Get rid of mutable remainder in match_address #28

Closed

Draft: Only match addresses once #29

Closed

klingtnet merged commit 33f9522 into klingtnet:master Mar 20, 2022

DrLuke deleted the nom-matching branch March 20, 2022 10:08
