Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Parse" (evaluate) generates smaller assembly #95

Closed
Andersama opened this issue Mar 30, 2020 · 6 comments
Closed

"Parse" (evaluate) generates smaller assembly #95

Andersama opened this issue Mar 30, 2020 · 6 comments

Comments

@Andersama
Copy link
Contributor

I had to write a bit of a helper to mimic what could be core to the ctre api, but here's a compiler explorer link that illustrates the difference.

https://gcc.godbolt.org/z/VDkQA-

This could also be incorporated in the search_re function, if at compile time the for loop is removed. The key part is to be able to analyze the tree and be sure that an ^ anchor token / assert_begin is what sits at the start of the regex, or at all sub expressions.

Simple test add:

template <typename Iterator, typename EndIterator, typename ...Pattern>
constexpr inline auto search_re(const Iterator begin, const EndIterator end, sequence<assert_begin, Pattern...> pattern) noexcept {
	using return_type = decltype(regex_results(std::declval<Iterator>(), find_captures(pattern)));
	using iterator_category = typename std::iterator_traits<Iterator>::iterator_category;
	return evaluate(begin, begin, end, return_type{}, ctll::list<start_mark, sequence<assert_begin, Pattern...>, end_mark, accept>());
}

And the generated assembly becomes identical.

hanickadot pushed a commit that referenced this issue Sep 9, 2020
… for ^ in all paths could be costly, but I will think about it.
@hanickadot
Copy link
Owner

Doing the analysis could be costly in current architecture of CTRE, I'm already doing something similar in case of greedy => possessive cycle optimisation. Anyway something similar as your parse function could be useful meantime. So I added function starts_with

Added in 4c97c59

hanickadot pushed a commit that referenced this issue Sep 9, 2020
… the for cycle in `search` (full solution to #95)
@hanickadot
Copy link
Owner

Ok, it was actually easier than I thought :D

Done in ec4bd97

@hanickadot
Copy link
Owner

@Andersama
Copy link
Contributor Author

Andersama commented Sep 10, 2020

Year or so later, starts_with is definitely a good name over parse 👍 I'll take it. Did my pr on pattern analysis make sense? Or is that still a wash? It would allow you to add a size check on the input string to see if it's within a valid range, it's a bit iffy conceptually for anything like variable length encoded utf8 strings, but otherwise I'd imagine it'd improve performance on more complicated regexs.

In my playing around with my edit working out the minimum and maximum length isn't particularly bad because I think so far as I can remember I tried to avoid instantiating a ton of templates.

@hanickadot
Copy link
Owner

I'm not sure about the PR, as you said it would be problematic with variable length encodings, and in future I want to have better support for unicode in general. Also it would introduce check for a length on every input, CTRE is able to work with zero-terminated strings too, which means basically running strlen. And one of the biggest issue is it would introduce longer compilation time due the analysis.

@Andersama
Copy link
Contributor Author

Well in the event those things can be specialized for w/o too hard of a heavy hit on compile times. My thought process was one bounds check per input string might not be too bad* assuming a random access iterator. It could also be limited to inputs where the pattern has particularly large bounds, I'm not sure the throughput of the library off-hand, but I'd imagine there's some rough # of character comparisons where the length check would make sense to do.

Would adding variations of the original functions with bounds checking impact compile times if the functions aren't instantiated? E.G. ctre::match_checked_bounds_re<>()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants