
Benchmarks comparing with other DFA-Regex-Engines? #10

Open
almondtools opened this issue Feb 27, 2022 · 6 comments

@almondtools

I have written a regex benchmark comparing different regex engines for Java. I recently found your approach and am curious how it performs compared to the alternatives:

  • You can run the benchmark on your own
  • If your project were available as an artifact in a Maven repository, I would offer to extend regexbenchmark with your project and run a new benchmark.
@hyperpape (Owner)

Thanks for the note. I'll have a look at your benchmarks, and keep them in mind.

Right now, I have a few things that I think need to be addressed before I push this to Maven and cut a 0.1 release.

@hyperpape (Owner)

@almondtools I was looking at the benchmarks. Are there any scripts for handling the output?

@almondtools (Author)

I am not certain I understand ... I would suggest that you implement a triple:

  • A benchmark extends MatcherBenchmark
  • An automaton implements Automaton, which is referenced in the benchmark (and which is a wrapper around your algorithm)
  • A test extends MatcherBenchmarkTest

The tests search for a pattern in a sample and compare the number of results found with a reference implementation. They do not check whether each result is found at the correct location, but I think the large test corpus (of the scaling benchmarks) makes it unlikely that a benchmark passes by pure luck.
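For illustration, here is a minimal sketch of the wrapper part of such a triple, using the JDK's java.util.regex as a stand-in engine. The interface shape here (the countMatches method) is an assumption for illustration; the actual Automaton interface in regexbenchmark may look different:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical adapter: wraps a regex engine behind a small interface so a
// MatcherBenchmark can drive it. Wrapping needle (or any other engine)
// would follow the same shape.
public class JdkAutomaton {

    private final Pattern pattern;

    public JdkAutomaton(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Count all matches in the sample; a MatcherBenchmarkTest would compare
    // this count against the count from a reference implementation.
    public int countMatches(String sample) {
        Matcher matcher = pattern.matcher(sample);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```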

Does that help?

@hyperpape (Owner) commented Aug 15, 2023

Sorry, my earlier question was a bit vague.

Yes, I was able to implement those in a branch I have locally, and doing so helped me find two bugs in needle.

However, when I run the tests, it seems to give mostly unstructured output to the console. Is there a good technique for turning that data into a table or other format that's good for analysis so I can easily compare my library to others? I didn't know if I missed something in your repo that does that, or if there's a nicer way than reading the results and extracting data by hand.
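One option I could fall back on is asking JMH itself for machine-readable output. A sketch, assuming the benchmarks are ordinary JMH benchmarks that can be driven through the Runner API:

```java
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(".*MatcherBenchmark.*")    // regex over benchmark names
                .resultFormat(ResultFormatType.CSV) // JSON and LaTeX also exist
                .result("jmh-results.csv")          // file the results land in
                .build();
        new Runner(options).run();
    }
}
```

The command-line equivalent would be `java -jar benchmarks.jar -rf csv -rff jmh-results.csv`.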

@almondtools (Author)

You probably found the *bench*.cmd files. They write the benchmark data to CSV and text output (examples are attached). Unfortunately, I did not develop tools to analyze or visualize the benchmark results. I did build such tools for stringbench, but it was a lot of effort and they are probably not easy to reuse.

I also noticed that the benchmarks will have to be adjusted for other versions of Java/JMH; hopefully you have solved this already.

result.csv
result.txt
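If it helps as a starting point, a JMH CSV like the attached result.csv can be tabulated with a few lines of Java. The column positions assumed below (name in column 1, score in column 5, unit in column 7) should be checked against the actual file, since param columns or a different JMH version can change the layout:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CsvSummary {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of("result.csv"));
        // Skip the header row. The naive split is fine as long as no field
        // contains an embedded comma, which holds for plain benchmark names.
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",");
            String name = fields[0].replace("\"", "");
            String score = fields[4];
            String unit = fields[6].replace("\"", "");
            System.out.printf("%-60s %15s %s%n", name, score, unit);
        }
    }
}
```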

@hyperpape (Owner)

Whoops, my apologies. I overlooked the command files.
