Support pattern(regex) for StringArbitrary #68

Open
mhyeon-lee opened this issue Jul 27, 2019 · 26 comments

Comments

@mhyeon-lee
Contributor

mhyeon-lee commented Jul 27, 2019

Testing Problem

I want to generate strings that follow sophisticated patterns.
It would be nice to support pattern-based string generation, for example from a regex.

Suggested Solution

  • Email pattern string
Arbitraries.strings()
     .pattern("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-........")

Discussion

@jlink
Collaborator

jlink commented Jul 27, 2019

Supporting full regex syntax sounds like a monumental task. What are the basic features you need?

@mhyeon-lee
Contributor Author

mhyeon-lee commented Jul 28, 2019

@jlink
I need to generate custom-formatted strings.
For now, I would like to use the formats below.

  • Email format
  • URL format
  • CreditCard
  • Tel Number
  • Excluding specific chars (the opposite of StringArbitrary#withChars)
  • ....

@jlink
Collaborator

jlink commented Jul 28, 2019

What about being a bit more explicit about how those different formats compose, e.g.

import java.util.List;
import net.jqwik.api.*;

@Provide
Arbitrary<String> emails() {
	// A single part of the local name: letters, digits and the allowed special chars
	Arbitrary<String> part =
		Arbitraries.strings()
			.alpha().numeric()
			.withChars("!#$%&'*+/=?^_`{|}~-".toCharArray())
			.ofMinLength(1);
	// One to five parts, joined with "." further down
	Arbitrary<List<String>> nameParts = part.list().ofMinSize(1).ofMaxSize(5);
	Arbitrary<List<String>> domainParts =
		Arbitraries
			.strings().alpha().numeric().ofMinLength(1).ofMaxLength(20)
			.list().ofMinSize(1).ofMaxSize(5);

	Arbitrary<String> topLevelDomains = Arbitraries.of("com", "org", "net");
	// Combine the three arbitraries into name@domain.tld
	return Combinators.combine(nameParts, domainParts, topLevelDomains)
		.as((np, dp, tld) -> {
			String name = String.join(".", np);
			String domain = String.join(".", dp);
			return String.format("%s@%s.%s", name, domain, tld);
		});
}

It's definitely more verbose but IMO communicates better how an email (or any other format) is composed.

@mhyeon-lee
Contributor Author

@jlink
Thank you for your guidance.
I am currently using it in a similar way.

@jlink
Collaborator

jlink commented Jul 28, 2019

The feature is in the backlog but not yet scheduled for implementation. Thanks for using jqwik!

@mhyeon-lee
Contributor Author

There seems to be a JavaScript library for creating strings from a regex.
I haven't checked how it works.

https://www.browserling.com/tools/text-from-regex

@jlink
Collaborator

jlink commented Sep 3, 2019

@mhyeon-lee Thanks for the hint. I’ll check it out.

@abargnesi

There is also mifmif/Generex which uses cs-au-dk/dk.brics.automaton under the covers to generate strings from regular expressions. Both of these are Java libraries.

@jlink
Collaborator

jlink commented Oct 4, 2019

The problem with external libs in a testing framework is that they introduce a dependency that might also be in the (transitive) dependency list of the subject under test. That's why I'm fighting hard to not have those external deps in jqwik.

One possible solution is to provide regex generation as an external third-party extension.
@mhyeon-lee Would you be willing to go for such an extension with my help?
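
For a feel of what a dependency-free generator could involve, here is a minimal sketch in plain Java. It assumes a deliberately tiny pattern subset: literal characters and simple character classes such as [a-z0-9], each optionally followed by {n} or {m,n}. The class TinyRegexGenerator and its generate method are hypothetical names, not part of jqwik or of any library mentioned in this thread, and shrinking is not addressed at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: generates strings for a tiny regex subset without external deps. */
final class TinyRegexGenerator {

	/** Returns one string matching the (subset) pattern, drawing randomness from the given RNG. */
	static String generate(String pattern, Random random) {
		StringBuilder out = new StringBuilder();
		int i = 0;
		while (i < pattern.length()) {
			// 1. Read one element: either a character class or a single literal char.
			List<Character> alphabet = new ArrayList<>();
			char c = pattern.charAt(i);
			if (c == '[') {
				int close = pattern.indexOf(']', i);
				if (close < 0) throw new IllegalArgumentException("Unclosed '[' at index " + i);
				int j = i + 1;
				while (j < close) {
					char from = pattern.charAt(j);
					if (j + 2 < close && pattern.charAt(j + 1) == '-') {
						// A range like a-z: add every char from 'a' to 'z'.
						char to = pattern.charAt(j + 2);
						for (int x = from; x <= to; x++) alphabet.add((char) x);
						j += 3;
					} else {
						alphabet.add(from);
						j++;
					}
				}
				i = close + 1;
			} else {
				alphabet.add(c);
				i++;
			}
			if (alphabet.isEmpty()) throw new IllegalArgumentException("Empty character class");

			// 2. Read an optional quantifier {n} or {m,n}; the default is exactly one repetition.
			int min = 1;
			int max = 1;
			if (i < pattern.length() && pattern.charAt(i) == '{') {
				int close = pattern.indexOf('}', i);
				if (close < 0) throw new IllegalArgumentException("Unclosed '{' at index " + i);
				String[] bounds = pattern.substring(i + 1, close).split(",");
				min = Integer.parseInt(bounds[0].trim());
				max = bounds.length > 1 ? Integer.parseInt(bounds[1].trim()) : min;
				i = close + 1;
			}

			// 3. Emit a random number of random chars from the element's alphabet.
			int count = min + random.nextInt(max - min + 1);
			for (int k = 0; k < count; k++) {
				out.append(alphabet.get(random.nextInt(alphabet.size())));
			}
		}
		return out.toString();
	}
}
```

A call such as TinyRegexGenerator.generate("[a-z]{1,8}@[a-z0-9]{1,10}[.][a-z]{2,3}", new Random(42L)) yields an email-like string that java.util.regex.Pattern accepts for the same pattern. Alternation, groups, negated classes and escapes are all missing here, which hints at how much work full regex support would be.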

@mhyeon-lee
Contributor Author

@jlink
I'm already wrapping jqwik and trying various extensions.
If jqwik supported third-party extensions, that would be useful for me.

@jlink
Collaborator

jlink commented Oct 12, 2019

An Extension API is what I’m currently working on.
For a RegexArbitrary I think you would not need it though.

@luvarqpp
Contributor

Just a funny note about valid email addresses and regexes: covering all valid email addresses seems to be far more complex than it looks. See https://www.regular-expressions.info/email.html

@mmerdes

mmerdes commented Jan 29, 2021

this might help:

https://westergaard.eu/2019/08/generating-test-data-using-regular-expressions-with-java/

@jlink
Collaborator

jlink commented Feb 14, 2021

@mmerdes The article you link to really seems to offer a low effort approach that might work. Don’t you want to give it a try?

@mmerdes

mmerdes commented Mar 19, 2021

yes

@mmerdes

mmerdes commented Mar 22, 2021

also helpful: this tool for generating regexes from examples

https://github.com/pemistahl/grex

@jlink
Collaborator

jlink commented Mar 22, 2021

This would enable an interesting workflow: Start with a few examples and let the lib generate regexes from that, which can then be validated/modified by hand.

@jlink
Collaborator

jlink commented May 6, 2022

@jlink
Collaborator

jlink commented May 6, 2022

Moreover, the Chain abstraction introduced in 1.7.0 could also be helpful for implementing regexes.

Here's an example: https://github.com/jlink/jqwik/blob/main/documentation/src/test/java/net/jqwik/docs/state/RegexChainExample.java

@SimY4

SimY4 commented Jul 15, 2022

I just created a small project that supports generation of regex constrained strings for jqwik. Have a look and leave feedback if you're keen:

https://github.com/SimY4/coregex

jqwik usage example in unit tests: https://github.com/SimY4/coregex/blob/main/jqwik/src/test/java/com/github/simy4/coregex/jqwik/CoregexArbitraryConfiguratorTest.java

@jlink
Collaborator

jlink commented Jul 16, 2022

@SimY4 Cool project. Just browsed through the jqwik-related code and stumbled upon the shrinking. It looks (I may be wrong though) as if shrinking is not deterministic since some kind of RNG is being used. If that’s the case it would break repeatability of test runs. One option I see to solve this problem is to generate a seed while generating the initial shrinkable and then use this seed as input to the RNG for shrinking. But that’s just an idea.
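
The seed idea could be sketched roughly as follows, in plain Java. This is only an illustration under assumptions: SeededValue, generate and shrinkCandidates are hypothetical names, the class does not implement jqwik's actual Shrinkable interface, and "dropping one character" stands in for whatever shrinking steps a real regex generator would take.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: a generated value that carries the seed later used for shrinking. */
final class SeededValue {
	final String value;
	final long shrinkSeed;

	SeededValue(String value, long shrinkSeed) {
		this.value = value;
		this.shrinkSeed = shrinkSeed;
	}

	/** Generation draws the shrink seed from the same RNG that produced the value. */
	static SeededValue generate(Random random, int maxLength) {
		int length = 1 + random.nextInt(maxLength);
		StringBuilder sb = new StringBuilder();
		for (int i = 0; i < length; i++) {
			sb.append((char) ('a' + random.nextInt(26)));
		}
		return new SeededValue(sb.toString(), random.nextLong());
	}

	/**
	 * Shrinking creates a fresh RNG from the stored seed on every call,
	 * so the same value always yields the same candidates in the same order.
	 */
	List<SeededValue> shrinkCandidates(int howMany) {
		Random shrinkRandom = new Random(shrinkSeed);
		List<SeededValue> candidates = new ArrayList<>();
		for (int i = 0; i < howMany && value.length() > 1; i++) {
			int dropAt = shrinkRandom.nextInt(value.length());
			String smaller = value.substring(0, dropAt) + value.substring(dropAt + 1);
			// Each candidate gets its own seed so that it can be shrunk further, again deterministically.
			candidates.add(new SeededValue(smaller, shrinkRandom.nextLong()));
		}
		return candidates;
	}
}
```

Because no shared or time-based RNG is involved during shrinking, rerunning a property with the same generation seed would walk through the same shrink sequence.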

@adam-waldenberg

Apart from https://github.com/SimY4/coregex as suggested by @SimY4, there are also some older, more mature libraries that handle this. If introducing an external dependency for this is acceptable, it could be implemented fairly easily with Xeger or RgxGen:

https://github.com/agarciadom/xeger
https://github.com/curious-odd-man/RgxGen

This would allow for a pretty quick implementation of Arbitraries.strings().pattern/regex() and a @Chars(pattern/regex = "").

@jlink
Collaborator

jlink commented Nov 27, 2022

I haven’t looked at the exact libraries, but keep in mind that shrinking capabilities often account for a major part of arbitrary implementation. Often data generation libs do not cover that at all.
One option would be to create a jqwik extension module with this 3rd party dependency. Maybe you want to give it a try, @adam-waldenberg; I’d be willing to give support.

@adam-waldenberg

adam-waldenberg commented Nov 27, 2022

@jlink Yes, I could take a look at that. I took a quick look at the library, and it looks very simple. It even has functionality to estimate the number of unique values a regexp can generate.

Question: what is the preferred way for an extension to provide a new Arbitrary type, for example Arbitraries.strings().pattern(...)? Beyond that, it's just a @RegexChars annotation or similar.

@jlink
Collaborator

jlink commented Nov 27, 2022

`Arbitraries.strings()` is probably the wrong starting point since regex-generated strings won't have the same configuration capabilities as character-based strings. You could have a look at the web module, which starts its DSL from a freshly introduced class `Web`. Maybe something like `Regex.fromPattern(..)` could be the starting point for a regex module, along with a `@FromRegex` annotation as you suggested.

@LvargaDS

Just one note on the human aspect. If we were able to use a regexp to generate random strings, that regexp would most probably be copied from the code under test, which would lower the quality of PBT. PBT should restate the needs and invariants (properties) of a process independently. Part of that is defining the input requirements again and keeping them separate from the implementation.

Please at least add a WARNING to the documentation of this feature once it becomes available.
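
A small plain-Java sketch may illustrate the concern. Everything in it is made up for illustration: the product-code format, the assumed requirement that letters may be upper or lower case, and the names IndependentSpecSketch, isValidCode and generateFromRequirements. The implementation's regex is stricter than the assumed requirement, and only a generator written independently of that regex can notice.

```java
import java.util.Random;
import java.util.regex.Pattern;

/** Hypothetical sketch: an input spec written from requirements, not copied from the implementation. */
final class IndependentSpecSketch {

	// Made-up code under test: its regex accepts lowercase letters only.
	static final Pattern IMPLEMENTATION = Pattern.compile("[a-z]{2}-[0-9]{3}");

	static boolean isValidCode(String s) {
		return IMPLEMENTATION.matcher(s).matches();
	}

	// Generator derived from the (assumed) requirement:
	// two letters of either case, a dash, then three digits.
	static String generateFromRequirements(Random random) {
		StringBuilder sb = new StringBuilder();
		for (int i = 0; i < 2; i++) {
			char base = random.nextBoolean() ? 'a' : 'A';
			sb.append((char) (base + random.nextInt(26)));
		}
		sb.append('-');
		for (int i = 0; i < 3; i++) {
			sb.append((char) ('0' + random.nextInt(10)));
		}
		return sb.toString();
	}

	// Counts how many requirement-conforming inputs the implementation rejects.
	static int countRejected(long seed, int tries) {
		Random random = new Random(seed);
		int rejected = 0;
		for (int i = 0; i < tries; i++) {
			if (!isValidCode(generateFromRequirements(random))) rejected++;
		}
		return rejected;
	}
}
```

Strings generated from the implementation's own regex would be accepted by construction, so a test built that way could never reveal this mismatch, whereas countRejected reports it as soon as an uppercase letter is generated.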


8 participants