a{0,50} generate short string #54

Open
gianmarialari opened this issue Nov 11, 2019 · 8 comments

Comments

@gianmarialari

If I write a regex like a{0,100}, I expect Xeger.Generate() to return any one of the possible matching sequences ("a", "aa", "aaa", "aaaa" ...... "aaaaaaaaaaaaa[....]aaaaaaaaaaaaaa").

But Xeger.Generate() almost never generates a sequence longer than 15 characters. On Stack Overflow, in the question "Issue generating multiple occurrence with Fare/Xeger", I was told:

It looks like Xeger is randomly selecting possible transitions at each step and then appending the string matching that transition to the result. For your regex, once the matching string has 1 a, there are two possible allowed transitions: "Add another a" or "End of string". [...]

So, if I understood correctly, this means that the probability of getting a long string is extremely low.
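To put a number on that, here is a minimal sketch of the stop-or-continue model described in the quoted answer. It assumes each step chooses between "add another a" and "end of string" with equal probability, which is an assumption taken from that explanation and not verified against the Xeger source. The class and method names are hypothetical. Under this assumption the chance of getting at least k characters is 2^-k, so a string longer than 15 characters appears with probability 2^-16, roughly 0.0015%.

```csharp
using System;

static class StopOrContinueModel
{
    // Probability of emitting at least k characters when every step
    // continues with probability 1/2 (assumed model, valid for 0 <= k <= max).
    public static double ProbabilityOfAtLeast(int k) => Math.Pow(0.5, k);

    // Simulates one walk for a{0,max}: at each state flip a fair coin,
    // append one character on "continue" and stop otherwise.
    public static int SampleLength(Random rng, int max)
    {
        int length = 0;
        while (length < max && rng.Next(2) == 1)
        {
            length++;
        }
        return length;
    }
}
```

Calling StopOrContinueModel.SampleLength(new Random(), 100) many times and building a histogram of the results should show lengths concentrated near 0 and 1, which matches the behaviour reported here if the assumed model is right.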

Is there a simple way to make a{0,100} really generate sequences between 0 and 100 characters long? (I mean, with similar frequency :)).
Thank you, g.

@moodmosaic
Owner

Thank you for reporting this, @gianmarialari.

This looks like a Xeger issue to me, and it'd be better if we could ping the maintainer(s) of the current upstream Java version. I believe they can be found at https://github.com/bluezio/xeger.

I'd be really interested in hearing any thoughts on this. Perhaps it has already been improved upstream, in which case we can just update the C#/.NET fork with what has changed over there.

@gianmarialari
Author

gianmarialari commented Nov 14, 2019 via email

@moodmosaic
Owner

Great, just saw it, agarciadom/xeger#3. Let's see what we get back.

@gianmarialari
Author

Ciao Moodmosaic,

Here is a small program that transforms a string like "a{n1 ,m1} bill{ n2, m2} carl{3}" into "a{r1} bill{r2} carl{3}", where each r is a random number between n and m (inclusive).

using System;
using System.Text.RegularExpressions;

namespace RegexQuantifier
{
    class Program
    {
        // Shared generator, so that successive draws do not start from the same time-based seed.
        static readonly Random Rng = new Random();

        // Replaces every "{n,m}" in the input with "{r}", where r is a random integer in [n, m].
        static string ConvertQuantifier(string input)
        {
            string result = input;
            foreach (Match match in Regex.Matches(input, @"\{\s*\d+\s*,\s*\d+\s*\}"))
            {
                string quantifier = match.Value;
                Match firstNumber = Regex.Match(quantifier, @"\d+");
                int min = int.Parse(firstNumber.Value);
                int max = int.Parse(firstNumber.NextMatch().Value);
                int r = Rng.Next(min, max + 1); // the upper bound of Next is exclusive
                // Escape the quantifier so "{n,m}" is matched literally instead of being parsed as regex syntax.
                result = Regex.Replace(result, Regex.Escape(quantifier), "{" + r + "}");
            }
            return result;
        }
        static void Main(string[] args)
        {
            string input = "a{10 ,20} bill{ 0,   20} carl{3}";
            Console.WriteLine("Source string: " + input);
            Console.WriteLine("Output string: " + ConvertQuantifier(input));
        }
    }
}

My program probably contains a few errors, it is surely not efficient and it could surely be written better, but I hope others can enjoy it.

Thank you Moodmosaic.
G.

@moodmosaic
Owner

Thank you, @gianmarialari 👍

@gianmarialari
Author

gianmarialari commented Nov 19, 2019

@moodmosaic, here is a new version of the previous program.

The ConvertQuantifiers function is written in a more modular way, and hopefully it is a bit clearer. More importantly, it fixes a bug. Unfortunately I'm not a regex expert, so I can't say whether it works with every regex string, but if I understood the regex quantifier syntax correctly, it should :)

I hope others will find it useful.

using System;
using System.Text.RegularExpressions;

namespace RegexQuantifier
{
    class Program
    {
        // Shared generator, so that successive draws do not start from the same time-based seed.
        static readonly Random Rng = new Random();

        // Replaces each occurrence of {n,m} in the input with {r}, where r = rnd(n, m).
        static string ConvertQuantifiers(string input)
        {
            // Escapes the braces so the quantifier text can be used as a literal regex pattern.
            string EscapeQuantifiers(string inputQ) => inputQ.Replace("{", @"\{").Replace("}", @"\}");

            // Transforms {n,m} into {r}, where r is a random integer in [n, m].
            string TransformMinMaxToR(string inputMM)
            {
                Match firstNumber = Regex.Match(inputMM, @"\d+");
                int min = int.Parse(firstNumber.Value);
                int max = int.Parse(firstNumber.NextMatch().Value);
                int r = Rng.Next(min, max + 1); // the upper bound of Next is exclusive
                return "{" + r + "}";
            }

            string result = input;
            foreach (Match match in Regex.Matches(input, @"\{\s*\d+\s*,\s*\d+\s*\}"))
            {
                string minMax = match.Value;
                string r = TransformMinMaxToR(minMax);
                string minMaxEscaped = EscapeQuantifiers(minMax);
                result = Regex.Replace(result, minMaxEscaped, r);
            }
            return result;
        }
        static void Main(string[] args)
        {
            Console.WriteLine("Given a regex pattern, replaces each quantifier {n,m} with {r}, where r=rnd(n,m)");
            Console.WriteLine("Example:");
            string input = "a{10 ,20} bill{ 0,   20} carl{3} (a[bc]{3,40})?xyz|ghi{0,10}.*hello";
            Console.WriteLine("Input : " + input);
            Console.WriteLine("Output: " + ConvertQuantifiers(input));
        }
    }
}
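One limitation of replacing by matched text is that two identical quantifiers in the same pattern end up with the same random count, because the first replacement rewrites every occurrence. A possible variation, sketched below, does the rewrite in a single pass with a MatchEvaluator so that each {n,m} gets its own draw. The names QuantifierSampler and FixQuantifiers are hypothetical and are not part of Fare. The sketch also does not handle open-ended quantifiers such as {n,}, escaped braces, or braces inside character classes.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierSampler
{
    // Matches {n,m}, allowing whitespace around the numbers as in the programs above.
    private static readonly Regex RangeQuantifier =
        new Regex(@"\{\s*(\d+)\s*,\s*(\d+)\s*\}");

    // Rewrites each {n,m} as {r}, drawing r independently and uniformly from [n, m].
    public static string FixQuantifiers(string pattern, Random rng)
    {
        return RangeQuantifier.Replace(pattern, match =>
        {
            int min = int.Parse(match.Groups[1].Value);
            int max = int.Parse(match.Groups[2].Value);
            if (max < min)
            {
                return match.Value; // leave an invalid range untouched
            }
            // The upper bound of Random.Next is exclusive, hence max + 1.
            return "{" + rng.Next(min, max + 1) + "}";
        });
    }
}
```

The rewritten pattern could then be passed to Xeger in place of the original one, for example by building the generator from QuantifierSampler.FixQuantifiers("a{0,100}", new Random()). Because a fixed count is chosen before generation, a quantifier nested inside another repeated group uses the same count on every repetition, so the resulting distribution is still not uniform over all matching strings.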

@moodmosaic
Owner

That's great! Perhaps we can add some examples in the library!

@gianmarialari
Author

If you think I can help please let me know, I will be glad to help.
Ciao, g.

2 participants