GitHub - kunalspathak/Antigen: Fuzzing tool to validate .NET code generation

What is it?

Antigen is a fuzzer that generates random C# programs on the fly to test .NET's RyuJIT.

How does it work?

Antigen generates random C# programs and execute them in baseline and test mode. In baseline mode there is minimal optimizations enabled, while in test mode, it executes with full optimization, and some other stress switches to turn ON/OFF certain optimizations. It then compares the output of baseline and test execution and reports back the programs if their output didn't match or if there were any asserts hit during execution.

How does it help uncover issues?

Most of the unit tests are handwritten and might test a specific scenario so it might be hard to catch hidden issues. Antigen, like any other fuzzers, usually generate random C# code, sometimes having long expressions and statements that might not even be written in real world code patterns but does a great stress testing of the compilers and code generation component. By fuzzing, if there is any hidden issue in code generation, either we would hit the asserts in the compiler code or worst would give different output than baseline meaning the compiler didn't generate the machine code correctly. In either way, it uncovers several compiler issues.

What are the examples of expressions generated?

Antigen currently generates range of expressions:

Literals: All primitive types like int, long, float, char, string, etc.
Variable references
Unary and Binary operations
Assignments
Struct and nested struct declaration and usage
Method declaration
Method calls

Below is a sample expression generated by Antigen.

s_double_10 -= ((double)(((double)(((double)(((double)(((double)(((double)(LeafMethod4() - -2)) + ((double)(s_double_10 %= ((double)((s_double_10) + 58)))))) - ((double)(double_55 += ((double)(s_double_10 + LeafMethod4())))))) % ((double)((((double)(((double)(((double)(double_55 /= ((double)((s_double_10) + 22)))) + double_28)) / ((double)((((double)(s_double_10 *= ((double)(LeafMethod4() + double_28))))) + 96))))) + 2)))) + ((double)(((double)(((double)(((double)(p_double_44 += double_55)) % ((double)((p_double_44) + 44)))) - ((double)(((double)(double_55 - p_double_44)) + ((double)(p_double_44 += double_28)))))) + ((double)(((double)(double_28 += ((double)(s_double_10 / ((double)((double_28) + 7)))))) + ((double)(((double)(double_55 % ((double)((s_double_10) + 30)))) + LeafMethod4())))))))) * ((double)(((double)(((double)(((double)(double_28 + ((double)(double_55 * LeafMethod4())))) * ((double)(double_28 /= ((double)((((double)(LeafMethod4() + double_28))) + 72)))))) * ((double)(((double)(((double)(double_28 * LeafMethod4())) % ((double)((((double)(double_55 % ((double)((LeafMethod4()) + 57))))) + 8)))) + ((double)(((double)(LeafMethod4() % ((double)((s_double_10) + 67)))) * ((double)(double_28 += s_double_10)))))))) + p_double_44))));

Note: Antigen currently cast every right side expression to the left side variable type and there is a TODO item to get rid of them. Likewise, there is a work item to eliminate unwanted parenthesis.

What are the examples of statements generated?

Antigen currently generates range of statements:

Variable Declaration
Assignment statements
if-else statements
Loops: for, while-do, do-while
try-catch-finally
switch-case

Here is a section of program that Antigen generated:

byte_1 &= ((byte)(s_byte_1 &= ((byte)(((byte)(s_byte_1 %= ((byte)((((byte)(((byte)(((byte)(byte_1 * s_byte_1)) % ((byte)((((byte)(s_byte_1 ^= s_byte_1))) + 96)))) % ((byte)((byte_1) + 43))))) + 77)))) + ((byte)(LeafMethod1() + ((byte)(((byte)(((byte)(s_byte_1 - s_byte_1)) / ((byte)((((byte)(LeafMethod1() * LeafMethod1()))) + 65)))) - ((byte)(((byte)(s_byte_1 / ((byte)((LeafMethod1()) + 61)))) | ((byte)(byte_1 * byte_1))))))))))));
int __loopvar9 = s_loopInvariant - 12, __loopSecondaryVar9_0 = s_loopInvariant - 10;
do
{
    __loopvar9 += 4;
    if (__loopvar9 > s_loopInvariant + 4)
        break;
    __loopSecondaryVar9_0 += 3;
    s_double_4 %= ((double)(double_4 = LeafMethod4()));
    if (bool_0)
    {
        int __loopvar7 = s_loopInvariant + 10;
        while ((((bool)(bool_0 = ((bool)(bool_0 = bool_0))))))
        {
            __loopvar7 -= 4;
            if (__loopvar7 <= s_loopInvariant - 4)
                break;
            LeafMethod10();
            long_7 ^= ((long)(((long)(long_7 %= ((long)((((long)(((long)(((long)(long_7 & LeafMethod7())) - ((long)(long_7 = LeafMethod7())))) + ((long)(s_long_7 <<= ((int)(((int)(LeafMethod6() & LeafMethod6())) ^ ((int)(p_int_5 ^= int_6))))))))) + 90)))) * long_7));
            LeafMethod15();
        }

        LeafMethod11();
        s_uint_12 <<= s_int_6;
    }
    else
    {
        int __loopvar8 = s_loopInvariant, __loopSecondaryVar8_0 = s_loopInvariant - 10;
        for (;; __loopSecondaryVar8_0 += 3)
        {
            __loopvar8 += 4;
            if (__loopvar8 > s_loopInvariant + 16)
                break;
            s_sbyte_8 &= ((sbyte)(((sbyte)(sbyte_8 -= ((sbyte)(s_sbyte_8 = ((sbyte)(sbyte_8 >> ((int)(((int)(s_int_6 - LeafMethod6())) + ((int)(int_6 - LeafMethod6())))))))))) * s_sbyte_8));
            sbyte_8 /= ((sbyte)(((sbyte)(((sbyte)(LeafMethod8() << ((int)(((int)(int_6 >> 1)) % ((int)((((int)(int_6 += ((int)(LeafMethod6() & LeafMethod6()))))) + 28)))))) * ((sbyte)(((sbyte)(sbyte_8 <<= s_int_6)) % ((sbyte)((((sbyte)(sbyte_8 &= sbyte_8))) + 31)))))) ^ ((sbyte)(((sbyte)(s_sbyte_8 <<= LeafMethod6())) >> LeafMethod6()))));
            s2_15.ulong_1 %= ((ulong)(ulong_13 ^ ((ulong)(((ulong)(((ulong)(ulong_13 = ((ulong)(((ulong)(s_ulong_13 -= s_ulong_13)) * ((ulong)(s2_15.ulong_1 %= ((ulong)((s2_15.ulong_1) + 51)))))))) * ((ulong)(((ulong)(s_ulong_13 % ((ulong)((((ulong)(ulong_13 &= s2_15.ulong_1))) + 10)))) / ((ulong)((LeafMethod13()) + 71)))))) * ulong_13))));
            s_int_6 &= ((int)(LeafMethod6() % ((int)((((int)(s_int_6 -= ((int)(((int)(((int)(4 | ((int)(s_int_6 |= s_int_6)))) - s_int_6)) | ((int)(((int)(((int)(s_int_6 % ((int)((1) + 26)))) ^ ((int)(int_6 ^= s_int_6)))) / ((int)((((int)(s_int_6 = ((int)(int_6 - int_6))))) + 11))))))))) + 12))));
            long_7 ^= ((long)(((long)(((long)(((long)(s_long_7 <<= ((int)(p_int_5 >>= s_int_6)))) << ((int)(int_6 -= ((int)(((int)(p_int_5 -= int_6)) % ((int)((((int)(LeafMethod6() - s_int_6))) + 22)))))))) ^ ((long)(long_7 * LeafMethod7())))) | ((long)(((long)(((long)(long_7 | long_7)) >> ((int)(s_int_6 -= LeafMethod6())))) ^ s_long_7))));
            LeafMethod10();
            uint_12 = ((uint)(uint_12 - ((uint)(((uint)(p_uint_0 |= ((uint)(p_uint_0 >>= ((int)(((int)(s_int_6 |= int_6)) >> ((int)(s_int_6 << 94)))))))) * ((uint)(((uint)(((uint)(((uint)(s_uint_12 + LeafMethod12())) / ((uint)((uint_12) + 22)))) % ((uint)((((uint)(((uint)(s_uint_12 * LeafMethod12())) % ((uint)((((uint)(s_uint_12 + LeafMethod12()))) + 51))))) + 54)))) % ((uint)((((uint)(((uint)(((uint)(uint_12 % ((uint)((LeafMethod12()) + 10)))) + ((uint)(LeafMethod12() / ((uint)((LeafMethod12()) + 66)))))) & ((uint)(s_uint_12 = ((uint)(uint_12 | uint_12))))))) + 86))))))));
            s2_15.ulong_1 &= ulong_13;
            s_ushort_11 *= ushort_11;
            sbyte_8 &= ((sbyte)(sbyte_8 = LeafMethod8()));
        }
    }

    LeafMethod3();
}
while ((((bool)(int_6 == ((int)(((int)(((int)(int_6 * ((int)(((int)(s_int_6 /= ((int)((LeafMethod6()) + 43)))) ^ ((int)(int_6 ^= int_6)))))) & ((int)(int_6 -= ((int)(s_int_6 &= ((int)(int_6 % ((int)((p_int_5) + 21)))))))))) % ((int)((((int)(s_int_6 *= ((int)(((int)(int_6 &= ((int)(p_int_5 >> s_int_6)))) & ((int)(((int)(s_int_6 -= LeafMethod6())) ^ ((int)(LeafMethod6() | s_int_6))))))))) + 48))))))));

There is a long list of other statements, functionality that needs to be added like having more than 1 class, single dimension and multi-dimension arrays, SIMD APIs, etc.

How many test cases are validated / hour?

To recap, Antigen generates C# program, runs it using Corerun in baseline mode and another Corerun that runs in test mode. There is fair amount of improvement that can be done to this process, but for now, with this model, on a dual-core machine, with 2 threads, Antigen can generate and validate 1000 test cases / hour.

Where is it used?

Antigen is incorporated in dotnet/runtime repository to run weekly and on-demand on PRs that are making changes to RyuJIT.

What are the real issues found?

Below are some of the examples of .NET issues found by Antigen:

Can we get a reduce repro code?

Antigen also comes with a component called Trimmer which would trim the C# code as much as possible while still making sure that the original issue reproduces. Currently, it has very limited capability and is slow, but there are plans to improve it going forward.

What's up with the name "Antigen"?

"Antigen" name was chosen as a reminder that this tool was developed during Covid era. The name comes from one of the Covid-19 testing methodology "Rapid Antigen test (RAT)". Just as RAT was used to detect covid symptoms, Antigen tool is used to detect any issues in .NET.

Are there any other fuzzers?

There are lot of fuzzers to test compilers. One of them is Fuzzlyn developed by Jakob Botsch Nielsen.

Name		Name	Last commit message	Last commit date
Latest commit History 118 Commits
Antigen		Antigen
Trimmer		Trimmer
Utils		Utils
.editorconfig		.editorconfig
.gitignore		.gitignore
Antigen.sln		Antigen.sln
LICENSE		LICENSE
README.md		README.md
progress.md		progress.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Antigen

Antigen

Trimmer

Trimmer

Utils

Utils

.editorconfig

.editorconfig

.gitignore

.gitignore

Antigen.sln

Antigen.sln

LICENSE

LICENSE

README.md

README.md

progress.md

progress.md

Repository files navigation

What is it?

How does it work?

How does it help uncover issues?

What are the examples of expressions generated?

What are the examples of statements generated?

How many test cases are validated / hour?

Where is it used?

What are the real issues found?

Can we get a reduce repro code?

What's up with the name "Antigen"?

Are there any other fuzzers?

About

Releases

Packages

Languages

License

kunalspathak/Antigen

Folders and files

Latest commit

History

Repository files navigation

What is it?

How does it work?

How does it help uncover issues?

What are the examples of expressions generated?

What are the examples of statements generated?

How many test cases are validated / hour?

Where is it used?

What are the real issues found?

Can we get a reduce repro code?

What's up with the name "Antigen"?

Are there any other fuzzers?

About

Resources

License

Stars

Watchers

Forks

Languages