Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++11 Regular Expressions #41

Merged
merged 6 commits into from
Aug 22, 2014
Merged

C++11 Regular Expressions #41

merged 6 commits into from
Aug 22, 2014

Conversation

mattyclarkson
Copy link
Contributor

I would like to get benchmark working on Windows. I see that #29 has done a lot of work in that respect but resulted in #30 about the replacement of the Regex class with std::regex.

This PR implements a C++11 backend to the Regex class that passes the unit tests. This is more to start a discussion of how to proceed with this branch.

From what I can see Regex is only used in one place in benchmark.cc so the Regex class could be dropped all together. However this is opposed by @dominichamon

i actually had some issues when i tried using std::regex for this project as the matching wasn't quite the same. I don't remember the details, i'm afraid, but i'd want much more testing before making this change.

So I need to make the unit tests much more thorough than what they are now. I'm not sure what strings I should be testing to make sure that the C++11 backend is actually doing the same thing (it uses the same extended matching flags).

There are other issues here, whilst __cplusplus >= 201103L is the correct check to work out if we are in C++11 mode gcc 4.7 and 4.8 don't actually have <regex> it only came in 4.9. So switching to just using std::regex would result in dropping support for anything before 4.9. So a much better check would be for CMake to compile a snippet to work out if we have regular expressions and fallback to POSIX regular expressions as neccessary.

Soooooo, where shall we go from here? I'm happy to put in as much effort as needed until benchmark builds from CMake for MinGW. VS should drop out nicely from this as CMake is awesome...but MSVC is crap so who knows what will happen there. Maybe clang to the rescue 😉

@ckennelly
Copy link
Contributor

I must admit that I didn't intend for those regular expressions tests to be comprehensive. (Given I was fixing a memory leak by adopting RAII semantics, I mostly wanted the test to check that.)

@mattyclarkson
Copy link
Contributor Author

@ckennelly, that's OK. I just don't want to break the project if I start adding in changes 😨

@ckennelly
Copy link
Contributor

CMake has a few built in modules, one of which is CHECK_INCLUDE_FILE_CXX (it's at least standard with 2.8.12) to verify <regex> is available. Alternatively, if you're in need of something more general, CHECK_CXX_SOURCE_COMPILES also should do the trick if you want to try compiling re.cc (or something similar).

(At the moment, I'm a bit envious of Google Test's glob/wildcard-style filtering: https://code.google.com/p/googletest/source/browse/trunk/src/gtest.cc#444)

@pleroy
Copy link
Contributor

pleroy commented Jul 31, 2014

On #29: unfortunately there was a first round of comments which I addressed/answered and then the review thread died out. And I went on vacation and stuff and didn't ping the thread. I guess it would be good to revive it.

On Regex: I agree that if we use the C++11 <regex> we can just drop the entire class and use <regex> directly in the flag parsing. As a matter of fact I don't quite understand @dominichamon's objection to using the C++11 library since it's hard to believe that there's a lot of code (scripts?) out there depending on the exact syntax of regexs in flags.

@mattyclarkson
Copy link
Contributor Author

@pleroy, that's alright, sometimes people get busy! Hoping to revive the Windows port.

Dropping the Regex class would nuke support for gcc 4.7 and 4.8 which have entirely functional C++11 libraries but don't have <regex> implemented. Having it fall back to POSIX regex is a win in those situations. If the collaborators are happy to require full C++11 support then we can move forward with just using <regex>.

Regarding @dominichamon objection I think he saw some issues with the matching of the regular expressions. This could have been an issue with the std::regex_constants::syntax_option_type not being set correctly. There are various flavours of regex syntax available. That is just a wild guess, I can't talk for him 😉

For now I'm just trying to get issue #30 solved so I can move on to the Windows building. Just need a shout out on what I should do. From @ckennelly comments above:

  1. Detect the correct regex header using CMake
  2. Implement the correct Regex back end depending on what was detected
  3. Determine some strings that should be matched that can test the back end correctly

I'll try to do 1 and 2 today but may slip until after the weekend. As the regex is basically the --benchmark_filter option used as a matching I guess I could just create some arbitrary string tests - do you guys have any --benchmark_filter options that you use daily that you could contribute to the tests?

I'm actually quite keen to keep the Regex class in there as it does provide a simple abstraction of the regular expression engines out there, someone might want to pull in PCRE in a fork or something.

@pleroy
Copy link
Contributor

pleroy commented Jul 31, 2014

@mattyclarkson: For testing, I'd suggest to make sure that you can select various groups of tests in benchmark_test.cc, e.g., the ones that have to do with Pi or exclude the LongTest or some such. I have not used particularly complicated regex myself but it you're interested you can find some of the command lines that I am using in the comments at the beginning of e.g. https://github.com/mockingbirdnest/Principia/blob/master/benchmarks/n_body_system.cpp (and other files in that directory).

@mattyclarkson
Copy link
Contributor Author

@pleroy, thanks, will look into it!

@mattyclarkson
Copy link
Contributor Author

The current commits do the following:

  • Tidy up CMakeLists.txt
  • Snippets to detect the correct regular expression engine
    • These need to be snippets to make sure the regex engine works (gcc < 4.9)
  • Implements the C++11 backend
  • Adds a test case for some more complicated regex test case patterns

RFC

set(CMAKE_CXX_FLAGS "-Wall -Werror -pedantic-errors --std=c++0x")
set(CMAKE_CXX_FLAGS_DEBUG "-g -O0 -DDEBUG")
set(CMAKE_CXX_FLAGS_RELEASE "-fno-strict-aliasing -O3 -DNDEBUG")

# Set OS
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
add_definitions(-DOS_MACOSX)
add_definitions(-DOS_MACOSX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 spaces too much?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@dmah42
Copy link
Member

dmah42 commented Jul 31, 2014

The {{benchmark_filter}} option was originally meant to match the googletest {{gtest_filter}} option, but that is more of a glob than a regex, iirc. I'm actually ok with anything here that works consistently across versions.

The most common filter I use is "prefix_.*" to run a subset of benchmarks. That should work everywhere.

@mattyclarkson
Copy link
Contributor Author

I think this is ready for merge, if everyone is happy with it.

@mattyclarkson
Copy link
Contributor Author

I'll rebase this in the morning off master to catch the other merge PRs

@mattyclarkson
Copy link
Contributor Author

This is rebased on master now.

@mattyclarkson
Copy link
Contributor Author

Hold on before merging - the regular expression is matching correctly but the benchmark_test is filtering everything out.

@dmah42
Copy link
Member

dmah42 commented Aug 7, 2014

It might be the default which would need to be '.*' now.
On Aug 7, 2014 9:07 AM, "Matt Clarkson" notifications@github.com wrote:

Hold on before merging - the regular expression is matching correctly but
the benchmark_test is filtering everything out.

Reply to this email directly or view it on GitHub
#41 (comment).

@mattyclarkson
Copy link
Contributor Author

@dominichamon, have submitted a fix for that. That does mean that you now have to write a filter that is an exact regex match. Do you want me to make the regexec POSIX function do an exact match also?

@mattyclarkson
Copy link
Contributor Author

I guess the regexec is actually doing regex_search not regex_match. Would you like me to change the C++11 engine to regex_search? Not sure how to change the regexec without going for re_match GNU regular expression function.

@dmah42
Copy link
Member

dmah42 commented Aug 7, 2014

This is expected by users coming from Google test so I think that would be
a good change.
On Aug 7, 2014 9:37 AM, "Matt Clarkson" notifications@github.com wrote:

I guess the regexec is actually doing regex_search not regex_match. Would
you like me to change the C++11 engine to regex_search? Not sure how to
change the regexec without going for re_match GNU regular expression
function.

Reply to this email directly or view it on GitHub
#41 (comment).

@mattyclarkson
Copy link
Contributor Author

OK. Changed to regex_search and dropped patch for default . to .* as it is not needed now. Tested with both the C++11 backend and POSIX backend and they both pass the same tests.

@dmah42
Copy link
Member

dmah42 commented Aug 7, 2014

Great! That's the problem that I had when i did the naive move to c++11 regex :)

checking the patch now.

@@ -4,17 +4,17 @@ project (benchmark)
# Make sure we can import out CMake functions
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")

# We need threads in this project
# Resolve dependent packages
find_package(Threads REQUIRED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this now be in an 'if !c++11 ' block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well std::thread still requires you to like -pthread (well at least it did on gcc 4.7). I'll have a look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - if you don't have the find_package(Theads REQUIRED) you get linker errors due to pthread not being found. When I get around to doing some Windows porting I'll look into what this means on that platform. Looking at the FindThread.cmake it looks like it doesn't really do much for Windows, but that'll be another problem for another day I guess!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right - because they use pthreads under the hood for implementation. This may not be necessary for all compiler/OS versions, but it probably is ok to leave in.

@mattyclarkson
Copy link
Contributor Author

Rebased on master to resolve merge conflicts.

@dmah42
Copy link
Member

dmah42 commented Aug 12, 2014

there's a bug without your change: "*Calculate*" matches nothing but "Calculate*" matches the BM_CalculatePi benchmarks. This latter is as expected.

with your change, none of the following match BM_CalculatePi:

$ test/benchmark_test --benchmark_filter="Calculate*"
$ test/benchmark_test --benchmark_filter="*Calculate*"
$ test/benchmark_test --benchmark_filter="Calculate.*"
$ test/benchmark_test --benchmark_filter=".*Calculate.*"
$ test/benchmark_test --benchmark_filter=".*Calculate"
$ test/benchmark_test --benchmark_filter=".*Calculate.*"

@mattyclarkson
Copy link
Contributor Author

@dominichamon, *Calculate* isn't a regular expression, I thought that --filter was just a regular expression? Let me fix this up.

@mattyclarkson
Copy link
Contributor Author

I've pushed a set of tests to the benchmark_test branch that'll allow a set of tests for --benchmark_filter flag that I can then make sure pass at all times. If we could work on merging #47 I will have a test set to fix this bug?

It seems that --benchmark_filter=*Calculate doesn't let any tests run on master. I've added a test for this.

@dmah42
Copy link
Member

dmah42 commented Aug 20, 2014

#47 is merged

@mattyclarkson
Copy link
Contributor Author

Rebased on master. I ran make test and re-ran your examples above on Arch Linux (gcc 4.9.1):

std::regex, 100% tests passed, 0 tests failed out of 7:

$ test/benchmark_test --benchmark_filter="Calculate*"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter="*Calculate*"
$ # Doesn't run any tests (expected, invalid regex)
$ test/benchmark_test --benchmark_filter="Calculate.*"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter=".*Calculate.*"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter=".*Calculate"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter=".*Calculate.*"
$ # Runs BM_CalculatePi tests

POSIX regex, 100% tests passed, 0 tests failed out of 7:

$ test/benchmark_test --benchmark_filter="Calculate*"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter="*Calculate*"
$ # Doesn't run any tests (expected, invalid regex)
$ test/benchmark_test --benchmark_filter="Calculate.*"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter=".*Calculate.*"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter=".*Calculate"
$ # Runs BM_CalculatePi tests
$ test/benchmark_test --benchmark_filter=".*Calculate.*"
$ # Runs BM_CalculatePi tests

Could you re-run on your box, to see if the tests pass and try some filters?

@dmah42
Copy link
Member

dmah42 commented Aug 20, 2014

Output from test/re_test:

$ cmake .
-- Check compiler flag -Wall
-- Check compiler flag -Wall -- works
-- Check compiler flag -Wshadow
-- Check compiler flag -Wshadow -- works
-- Check compiler flag -Werror
-- Check compiler flag -Werror -- works
-- Check compiler flag -pedantic-errors
-- Check compiler flag -pedantic-errors -- works
-- Check compiler release flag -fno-strict-aliasing
-- Check compiler release flag -fno-strict-aliasing -- works
-- git Version: v0.0.0
-- Version: 0.0.0
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX - Success
-- Performing Test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX - Failed
-- Performing Test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dominic/git/benchmark
dougie:~/git/benchmark [mattyclarkson-regex] $ ninja
[14/14] Linking CXX executable test/re_test
dougie:~/git/benchmark [mattyclarkson-regex] $ test/re_test
Running main() from gtest_main.cc
[==========] Running 7 tests from 1 test case. 
[----------] Global test environment set-up.
[----------] 7 tests from Regex
[ RUN      ] Regex.RegexSimple
test/re_test.cc:23: Failure
Value of: re.Match("a")
  Actual: false
Expected: true
test/re_test.cc:24: Failure
Value of: re.Match("aa")
   Actual: false
Expected: true
test/re_test.cc:25: Failure
Value of: re.Match("baa")
  Actual: false
Expected: true
[  FAILED  ] Regex.RegexSimple (0 ms)
[ RUN      ] Regex.RegexWildcard
test/re_test.cc:33: Failure
Value of: re.Match("")
  Actual: false
Expected: true
test/re_test.cc:34: Failure
Value of: re.Match("a")
  Actual: false
Expected: true
test/re_test.cc:35: Failure
Value of: re.Match("aa")
  Actual: false
Expected: true
[  FAILED  ] Regex.RegexWildcard (0 ms)
[ RUN      ] Regex.RegexAny
test/re_test.cc:45: Failure
Value of: re.Match("a")
  Actual: false
Expected: true
test/re_test.cc:46: Failure
Value of: re.Match("aa")
  Actual: false
Expected: true
[  FAILED  ] Regex.RegexAny (0 ms)
[ RUN      ] Regex.RegexExact
test/re_test.cc:54: Failure
Value of: re.Match("a")
  Actual: false
Expected: true
[  FAILED  ] Regex.RegexExact (0 ms)
[ RUN      ] Regex.RegexComplicated
test/re_test.cc:60: Failure
Value of: re.Init("([0-9]+ )?(mon|low)key(s)?", NULL)
  Actual: false
Expected: true
test/re_test.cc:62: Failure
Value of: re.Match("something monkey hands")
  Actual: false
Expected: true
test/re_test.cc:63: Failure
Value of: re.Match("1 lowkey")
  Actual: false
Expected: true
test/re_test.cc:64: Failure
Value of: re.Match("19 monkeys")
  Actual: false
Expected: true
[  FAILED  ] Regex.RegexComplicated (3 ms)
[ RUN      ] Regex.InvalidNoErrorMessage
[       OK ] Regex.InvalidNoErrorMessage (0 ms)
[ RUN      ] Regex.Invalid
[       OK ] Regex.Invalid (0 ms)
[----------] 7 tests from Regex (3 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (3 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 5 tests, listed below:
[  FAILED  ] Regex.RegexSimple
[  FAILED  ] Regex.RegexWildcard
[  FAILED  ] Regex.RegexAny
[  FAILED  ] Regex.RegexExact
[  FAILED  ] Regex.RegexComplicated

 5 FAILED TESTS

Looking into it a bit further.

Environment:

$ uname -a
Linux dougie 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ $CXX --version
Ubuntu clang version 3.4-1ubuntu3 (tags/RELEASE_34/final) (based on LLVM 3.4)
Target: x86_64-pc-linux-gnu
Thread model: posix

Same errors with:

$ $CXX --version
Ubuntu clang version 3.5-1ubuntu1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix

@dmah42
Copy link
Member

dmah42 commented Aug 20, 2014

Also:

$ test/benchmark_test 
Reading /proc/self/cputime_ns failed. Using getrusage().
Starting new interval; stopping in 500000
Per-iteration overhead for doing nothing: 1.92423e-08
Skipping BM_Factorial
Skipping BM_CalculatePiRange
Skipping BM_CalculatePi
Skipping BM_CalculatePi
Skipping BM_CalculatePi
Skipping BM_SetInsert
Skipping BM_Sequential<std::vector<int>>
Skipping BM_Sequential<std::list<int>>
Skipping BM_StringCompare
Skipping BM_SetupTeardown
Skipping BM_LongTest
Benchmarking on 4 X 2501 MHz CPUs
2014/08/20-16:04:00
CPU scaling is enabled: Benchmark timings may be noisy.
DEBUG: Benchmark    Time(ns)    CPU(ns) Iterations
--------------------------------------------------
$

@mattyclarkson
Copy link
Contributor Author

OK, I haven't tested with clang yet. I've got Ubuntu, Debian, Cent OS, Fedora VMs so will do a round of testing tomorrow with clang and gcc

@mattyclarkson
Copy link
Contributor Author

Ah, the reason is that the check_cxx_source_compiles doesn't run the code. gcc passes this but the actual regular expression engine is unimplemented. I'll change it to try_run or something like that.

@mattyclarkson
Copy link
Contributor Author

New commits fix this. They use try_run to make sure the regular expression engine actually works. Tested on Ubuntu 14.04.

I've fixed up the compiler flag tests as well as they weren't actually working.

@mattyclarkson
Copy link
Contributor Author

I checked try_run support and it's in the docs back to 2.8.0 so should be OK on drone.io

@dmah42 dmah42 merged commit 9593e64 into google:master Aug 22, 2014
@dmah42
Copy link
Member

dmah42 commented Aug 22, 2014

Merged! Thanks!

Now to dig into why clang doesn't like std::regex :(

@dmah42
Copy link
Member

dmah42 commented Aug 22, 2014

libstdc++ vs libc++, apparently.

@mattyclarkson
Copy link
Contributor Author

I haven't actually used clang for a while but I was under the impression
they were fully c++11 and c++14 compliant? Guess it depends on version. At
least it will fallback to one of the other engines!
On 22 Aug 2014 19:34, "Dominic Hamon" notifications@github.com wrote:

libstdc++ vs libc++, apparently.


Reply to this email directly or view it on GitHub
#41 (comment).

@dmah42
Copy link
Member

dmah42 commented Aug 23, 2014

It is, but libstdc++ isn't. Or something.

We could use libc++ if it is available but I'm not sure it is worth it.
On 23 Aug 2014 05:52, "Matt Clarkson" notifications@github.com wrote:

I haven't actually used clang for a while but I was under the impression
they were fully c++11 and c++14 compliant? Guess it depends on version. At
least it will fallback to one of the other engines!
On 22 Aug 2014 19:34, "Dominic Hamon" notifications@github.com wrote:

libstdc++ vs libc++, apparently.

Reply to this email directly or view it on GitHub
#41 (comment).

Reply to this email directly or view it on GitHub
#41 (comment).

@mattyclarkson
Copy link
Contributor Author

Ah, understood. Problem for another day... On to the Windows porting!
On 23 Aug 2014 15:31, "Dominic Hamon" notifications@github.com wrote:

It is, but libstdc++ isn't. Or something.

We could use libc++ if it is available but I'm not sure it is worth it.
On 23 Aug 2014 05:52, "Matt Clarkson" notifications@github.com wrote:

I haven't actually used clang for a while but I was under the
impression
they were fully c++11 and c++14 compliant? Guess it depends on version.
At
least it will fallback to one of the other engines!
On 22 Aug 2014 19:34, "Dominic Hamon" notifications@github.com wrote:

libstdc++ vs libc++, apparently.

Reply to this email directly or view it on GitHub
#41 (comment).

Reply to this email directly or view it on GitHub
#41 (comment).


Reply to this email directly or view it on GitHub
#41 (comment).

@mattyclarkson mattyclarkson deleted the regex branch August 28, 2014 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants