"Parse" (evaluate) generates smaller assembly #95

Andersama · 2020-03-30T04:43:04Z

I had to write a small helper to mimic what could be a core part of the CTRE API, but here's a Compiler Explorer link that illustrates the difference.

https://gcc.godbolt.org/z/VDkQA-

This could also be incorporated into the search_re function, provided the for loop can be removed at compile time. The key part is being able to analyze the tree and be sure that a ^ anchor token (assert_begin) sits at the start of the regex, or at the start of every subexpression.

A simple test is to add this overload:

template <typename Iterator, typename EndIterator, typename ...Pattern>
constexpr inline auto search_re(const Iterator begin, const EndIterator end, sequence<assert_begin, Pattern...> pattern) noexcept {
	using return_type = decltype(regex_results(std::declval<Iterator>(), find_captures(pattern)));
	using iterator_category = typename std::iterator_traits<Iterator>::iterator_category;
	// The pattern starts with assert_begin, so only the first position can match:
	// evaluate once from begin instead of looping over every start position.
	return evaluate(begin, begin, end, return_type{}, ctll::list<start_mark, sequence<assert_begin, Pattern...>, end_mark, accept>());
}

And the generated assembly becomes identical.
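
To make the overload trick concrete without pulling in CTRE, here is a minimal, self-contained sketch. Everything in it is a toy stand-in and an assumption for illustration: `toy_search`, `match_at`, `match_atom`, `search_result` and `no_match` are hypothetical names, and the `assert_begin`, `character` and `sequence` types only mimic the shape of the CTRE types, not their implementation. It shows that an overload taking `sequence<assert_begin, Ts...>` is more specialized than one taking `sequence<Ts...>`, so the compiler picks the loop-free version whenever the pattern starts with an anchor.

```cpp
#include <cstddef>
#include <string_view>

// Toy pattern atoms (hypothetical stand-ins, NOT the CTRE types).
struct assert_begin {};
template <char C> struct character {};
template <typename... Ts> struct sequence {};

constexpr std::size_t no_match = static_cast<std::size_t>(-1);

// Match one atom at position pos; return the next position or no_match.
template <char C>
constexpr std::size_t match_atom(std::string_view in, std::size_t pos, character<C>) {
    return (pos < in.size() && in[pos] == C) ? pos + 1 : no_match;
}
constexpr std::size_t match_atom(std::string_view, std::size_t pos, assert_begin) {
    return pos == 0 ? pos : no_match;
}

// Match every atom of the sequence in order, starting at pos.
template <typename... Ts>
constexpr bool match_at(std::string_view in, std::size_t pos, sequence<Ts...>) {
    bool ok = true;
    // Each atom runs only while all previous atoms succeeded.
    ((ok = ok && (pos = match_atom(in, pos, Ts{})) != no_match), ...);
    return ok;
}

struct search_result {
    bool matched;
    std::size_t starts_tried;  // how many start positions were attempted
};

// Generic search: try every start position.
template <typename... Ts>
constexpr search_result toy_search(std::string_view in, sequence<Ts...> p) {
    std::size_t tried = 0;
    for (std::size_t i = 0; i <= in.size(); ++i) {
        ++tried;
        if (match_at(in, i, p)) return {true, tried};
    }
    return {false, tried};
}

// More specialized overload: an anchored pattern can only match at position 0,
// so there is no loop at all.
template <typename... Ts>
constexpr search_result toy_search(std::string_view in, sequence<assert_begin, Ts...> p) {
    return {match_at(in, 0, p), 1};
}

using ab = sequence<character<'a'>, character<'b'>>;
using anchored_ab = sequence<assert_begin, character<'a'>, character<'b'>>;

static_assert(toy_search("xxab", ab{}).starts_tried == 3);
static_assert(toy_search("xxab", anchored_ab{}).starts_tried == 1);
```

This sketch only handles an anchor that is literally the first element of the top-level sequence. Proving that every alternative of a pattern starts with `^` needs the deeper tree analysis discussed in this issue.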


hanickadot · 2020-09-09T07:07:31Z

Doing the analysis could be costly in the current architecture of CTRE; I'm already doing something similar for the greedy => possessive cycle optimisation. Anyway, something similar to your parse function could be useful in the meantime, so I added the function starts_with.

Added in 4c97c59


hanickadot · 2020-09-09T07:29:34Z

Ok, it was actually easier than I thought :D

Done in ec4bd97

hanickadot · 2020-09-09T07:31:59Z

https://compiler-explorer.com/z/nvjG4E

Andersama · 2020-09-10T03:46:59Z

A year or so later, starts_with is definitely a better name than parse 👍 I'll take it. Did my PR on pattern analysis make sense, or is that still a wash? It would let you add a size check on the input string to see whether its length is within the valid range for the pattern. It's a bit iffy conceptually for variable-length encodings such as UTF-8, but otherwise I'd imagine it would improve performance on more complicated regexes.

In my playing around with my edit, working out the minimum and maximum length isn't particularly bad because, as far as I can remember, I tried to avoid instantiating a ton of templates.

hanickadot · 2020-09-10T05:33:26Z

I'm not sure about the PR. As you said, it would be problematic with variable-length encodings, and in the future I want to have better Unicode support in general. It would also introduce a length check on every input; CTRE is able to work with zero-terminated strings too, which basically means running strlen. And one of the biggest issues is that it would lengthen compilation time due to the analysis.

Andersama · 2020-09-10T08:54:02Z

Well, that's in the event those cases can be specialized for without too heavy a hit on compile times. My thought process was that one bounds check per input string might not be too bad, assuming a random access iterator. It could also be limited to inputs where the pattern has particularly large bounds. I'm not sure of the library's throughput offhand, but I'd imagine there's some rough number of character comparisons beyond which the length check would make sense.

Would adding variants of the original functions with bounds checking impact compile times if the functions aren't instantiated? E.g. ctre::match_checked_bounds_re<>()
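
One way to reconcile the two concerns above is to do the length check only when the length is available in constant time, and to skip it for zero-terminated input where it would amount to strlen. The sketch below assumes this design; `length_precheck`, `precheck_view` and `zero_terminated_sentinel` are hypothetical names for this example and are not part of CTRE.

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>
#include <type_traits>

// Hypothetical end marker for zero-terminated strings (NOT a CTRE type).
struct zero_terminated_sentinel {};

// Returns false only when the input length is known in O(1) and lies outside
// [min_len, max_len]. Returns true when the length is in range or unknown.
template <typename Iterator, typename EndIterator>
constexpr bool length_precheck([[maybe_unused]] Iterator begin,
                               [[maybe_unused]] EndIterator end,
                               [[maybe_unused]] std::size_t min_len,
                               [[maybe_unused]] std::size_t max_len) {
    using category = typename std::iterator_traits<Iterator>::iterator_category;
    if constexpr (std::is_same_v<Iterator, EndIterator> &&
                  std::is_base_of_v<std::random_access_iterator_tag, category>) {
        // Random access range: the distance is a single subtraction.
        const auto n = static_cast<std::size_t>(end - begin);
        return n >= min_len && n <= max_len;
    } else {
        // Sentinel or non-random-access input: computing the length would
        // mean walking the whole input, so skip the check.
        return true;
    }
}

// Small helper so the check can be exercised on a string_view.
constexpr bool precheck_view(std::string_view s, std::size_t min_len, std::size_t max_len) {
    return length_precheck(s.begin(), s.end(), min_len, max_len);
}

static_assert(!precheck_view("hello", 1, 3));  // 5 code units, outside [1, 3]
static_assert(precheck_view("hello", 1, 8));   // inside [1, 8]
```

Because the decision is made with `if constexpr`, the zero-terminated path compiles down to a constant `true` and adds no run-time cost; the compile-time cost of computing the pattern bounds themselves is a separate question.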

hanickadot pushed a commit that referenced this issue Sep 9, 2020

Partial fix for #95, adding function starts_with, analyzing subtree…

4c97c59

… for ^ in all paths could be costly, but I will think about it.

hanickadot added the nice-to-have label Sep 9, 2020

hanickadot pushed a commit that referenced this issue Sep 9, 2020

added detection for pattern which starts with anchor so we can remove…

ec4bd97

… the for cycle in `search` (full solution to #95)

hanickadot closed this as completed Sep 9, 2020

Andersama mentioned this issue Jan 5, 2021

Optimization: Check for implicit anchor #165

Open
