Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stalled. Mcol 3771 benchmark re2 to replace boost regex #1552

Conversation

mariadb-SergeyZefirov
Copy link
Contributor

Relevant JIRA ticket: https://jira.mariadb.org/browse/MCOL-3771?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=169986

What I did here is to remove a bunch of "#ifdef linux" conditionals from different parts of code and moved them into single header file (utils/common/regexp-port.h). That file defines a regex containing structure and everyone creates that structures and uses unified interface.

To utilize re2 I had to link it somehow and I added it as an external CMake project. This is second relatively big change.

I also did a dozen or so fixes of warnings, most probably I will have to revert some of them if not all.

@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-3771-benchmark-RE2-to-replace-boost-regex branch 4 times, most recently from 6ce7817 to b7d43a9 Compare November 4, 2020 09:40
@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-3771-benchmark-RE2-to-replace-boost-regex branch 4 times, most recently from e1d11f3 to 4242799 Compare November 13, 2020 16:11
@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-3771-benchmark-RE2-to-replace-boost-regex branch 2 times, most recently from 7bfe503 to ae1db38 Compare November 26, 2020 08:02
Started changing things

Progress keeping commit

Refactoring result

Better naming

With the name "regex.h" our own file occured in include file search path
set before POSIX's "regex.h", which we included from our regex.h for RE
functionality on Linux.

Renaming it to regex-port.h prevented this misadventure.

Primitive processor sports new regexes

And now the goal is to put them everywhere else.

Clean build for mariadb+columnstore

Fix warnings - remove VLA warnings

Fix warnings

I've found ONE justified use of variable length arrays in code so far:

/home/mariadb-SergeyZefirov/work/mariadb/sergueyz/MariaDBEnterprise/storage/columnstore/columnstore/utils/windowfunction/wf_udaf.cpp: In member function ‘virtual bool windowfunction::WF_udaf::dropValues(int64_t, int64_t)’:
/home/mariadb-SergeyZefirov/work/mariadb/sergueyz/MariaDBEnterprise/storage/columnstore/columnstore/utils/windowfunction/wf_udaf.cpp:152:56: warning: variable length array ‘flags’ is used [-Wvla]
         uint32_t flags[getContext().getParameterCount()];

This array is defined inside a loop.

Most probably, getContext().getParameterCount() is bound by some
reasonably big value and it is possible to define it using that value.

Investigating.

Also, there's a lot of "type range" warnings like "x < 0 is always false
for unsigned x". I also need to read code around these warnings and look
whether these comparisons are genuinely needed.

Fix warnings: clean build

Test use of libre2

Fix unnecessary install for libre2

Cmake does not attempt to install it into /usr anymore.

Remove regex zoo - now we use our only regex interface

Got missing symbols in the link stage

Going to investigate.

An idea to test on dev machine

cmake itself (very likely) or CMakeLists.txt in re2 ignore INSTALL_DIR
and our good superhuman StackOverflow intellect suggests direct setting
of INSTALL_DIR_PREFIX.

That strange behavior is 5 years old at this time.

Working installation command

Now to final touch - adding proper libraries, library link, etc.

Now we can use re2 if need to

Revert fixes of warnings

Revert warning fixes

Revert warning fixes

Revert warning fixes

Revert warning fixes

Trying fix for missing libre2.so

Commented two problematic subcomponents

Fix extraneous 'add_library'

Fresh libmarias3

Fix compilation error

libre2 should link now

Fix of incorrect flag used

Trying to save permissions during copy

Figuring out permissions

Rename idb_regex_t to mcs_regex_t

Mark slowdown reasons

Name our libre2 build differently

Better name for a library

Try to get better regexp performance

Fix compile errors

Fix install of libre2

Fight the sigsegv
@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-3771-benchmark-RE2-to-replace-boost-regex branch from ae1db38 to 4c34af1 Compare November 26, 2020 08:05
@drrtuy drrtuy changed the title Mcol 3771 benchmark re2 to replace boost regex Stalled. Mcol 3771 benchmark re2 to replace boost regex Mar 8, 2021
@drrtuy
Copy link
Collaborator

drrtuy commented Apr 6, 2021

With the latest patch on collation-aware LIKE MCS started to use MDB like facilities. JFYI MCS already uses PCRE provided by MDB for RLIKE processing.

@drrtuy drrtuy closed this Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants