kunalspathak/Antigen

What is it?

Antigen is a fuzzer that generates random C# programs on the fly to test .NET's RyuJIT.

How does it work?

Antigen generates random C# programs and executes each one in baseline mode and in test mode. In baseline mode, minimal optimizations are enabled. In test mode, the program runs with full optimization plus some stress switches that turn certain optimizations on or off. Antigen then compares the output of the baseline and test executions and reports the program if the outputs differ or if any asserts were hit during execution.
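The comparison step can be sketched as a small pure function. This is a minimal illustration, not Antigen's actual code: `RunResult`, `Verdict`, and `DifferentialCheck` are hypothetical names, and how the output and the assert flag are captured from each run is assumed to happen elsewhere.

```csharp
using System;

// Hypothetical types for illustration; none of these names come from Antigen's source.
public enum Verdict { Pass, OutputMismatch, AssertionHit }

// What one execution (baseline or test) of a generated program produced.
public sealed record RunResult(string Output, bool AssertHit);

public static class DifferentialCheck
{
    // Classify a generated program from its baseline run and its test run.
    public static Verdict Compare(RunResult baseline, RunResult test)
    {
        // An assert in either run is reported regardless of the printed output.
        if (baseline.AssertHit || test.AssertHit)
            return Verdict.AssertionHit;

        // Otherwise the two runs must print exactly the same text.
        return string.Equals(baseline.Output, test.Output, StringComparison.Ordinal)
            ? Verdict.Pass
            : Verdict.OutputMismatch;
    }
}
```

Only programs whose verdict is not `Pass` would need to be kept for investigation; everything else can be discarded.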

How does it help uncover issues?

Most unit tests are handwritten and target a specific scenario, so they can miss hidden issues. Antigen, like other fuzzers, generates random C# code, sometimes with long expressions and statements that would rarely appear in real-world code but that stress-test the compiler and its code generation component well. If there is a hidden issue in code generation, fuzzing can either hit an assert in the compiler code or, worse, produce output that differs from the baseline, meaning the compiler did not generate correct machine code. Either way, a compiler issue is uncovered.

What are the examples of expressions generated?

Antigen currently generates a range of expressions:

  • Literals: All primitive types like int, long, float, char, string, etc.
  • Variable references
  • Unary and Binary operations
  • Assignments
  • Struct and nested struct declaration and usage
  • Method declaration
  • Method calls

Below is a sample expression generated by Antigen.

s_double_10 -= ((double)(((double)(((double)(((double)(((double)(((double)(LeafMethod4() - -2)) + ((double)(s_double_10 %= ((double)((s_double_10) + 58)))))) - ((double)(double_55 += ((double)(s_double_10 + LeafMethod4())))))) % ((double)((((double)(((double)(((double)(double_55 /= ((double)((s_double_10) + 22)))) + double_28)) / ((double)((((double)(s_double_10 *= ((double)(LeafMethod4() + double_28))))) + 96))))) + 2)))) + ((double)(((double)(((double)(((double)(p_double_44 += double_55)) % ((double)((p_double_44) + 44)))) - ((double)(((double)(double_55 - p_double_44)) + ((double)(p_double_44 += double_28)))))) + ((double)(((double)(double_28 += ((double)(s_double_10 / ((double)((double_28) + 7)))))) + ((double)(((double)(double_55 % ((double)((s_double_10) + 30)))) + LeafMethod4())))))))) * ((double)(((double)(((double)(((double)(double_28 + ((double)(double_55 * LeafMethod4())))) * ((double)(double_28 /= ((double)((((double)(LeafMethod4() + double_28))) + 72)))))) * ((double)(((double)(((double)(double_28 * LeafMethod4())) % ((double)((((double)(double_55 % ((double)((LeafMethod4()) + 57))))) + 8)))) + ((double)(((double)(LeafMethod4() % ((double)((s_double_10) + 67)))) * ((double)(double_28 += s_double_10)))))))) + p_double_44))));

Note: Antigen currently casts every right-hand-side expression to the type of the left-hand-side variable, and there is a TODO item to remove these casts. Likewise, there is a work item to eliminate unneeded parentheses.
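To show how nested, cast-wrapped expressions like the sample can arise, here is a minimal sketch of a recursive random expression generator. It is an assumption-laden illustration rather than Antigen's implementation: `ExprSketch`, the leaf names, and the operator list are hypothetical. The `+ constant` added to the right operand of `/` and `%` mirrors the shape visible in the sample, presumably there to make a zero divisor less likely.

```csharp
using System;

// Hypothetical sketch; not taken from Antigen's source.
public static class ExprSketch
{
    // Illustrative leaves: variables, a method call, and a literal.
    private static readonly string[] Leaves = { "int_1", "s_int_2", "LeafMethod()", "5" };
    private static readonly string[] Ops = { "+", "-", "*", "/", "%" };

    // Builds a random binary expression tree of the given depth as C# source text.
    public static string Generate(Random rng, int depth, string type = "int")
    {
        if (depth <= 0)
            return Leaves[rng.Next(Leaves.Length)];

        string left = Generate(rng, depth - 1, type);
        string right = Generate(rng, depth - 1, type);
        string op = Ops[rng.Next(Ops.Length)];

        // For division and remainder, offset the divisor by a small positive constant.
        if (op == "/" || op == "%")
            right = $"(({type})(({right}) + {rng.Next(1, 100)}))";

        // Cast every sub-expression to the target type, as in the sample above.
        return $"(({type})({left} {op} {right}))";
    }
}
```

Calling `ExprSketch.Generate(new Random(), 3)` returns one such expression as a string; passing a seeded `Random` makes the result reproducible, which matters when a failing program has to be regenerated.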

What are the examples of statements generated?

Antigen currently generates a range of statements:

  • Variable Declaration
  • Assignment statements
  • if-else statements
  • Loops: for, while-do, do-while
  • try-catch-finally
  • switch-case

Here is a section of a program that Antigen generated:

byte_1 &= ((byte)(s_byte_1 &= ((byte)(((byte)(s_byte_1 %= ((byte)((((byte)(((byte)(((byte)(byte_1 * s_byte_1)) % ((byte)((((byte)(s_byte_1 ^= s_byte_1))) + 96)))) % ((byte)((byte_1) + 43))))) + 77)))) + ((byte)(LeafMethod1() + ((byte)(((byte)(((byte)(s_byte_1 - s_byte_1)) / ((byte)((((byte)(LeafMethod1() * LeafMethod1()))) + 65)))) - ((byte)(((byte)(s_byte_1 / ((byte)((LeafMethod1()) + 61)))) | ((byte)(byte_1 * byte_1))))))))))));
int __loopvar9 = s_loopInvariant - 12, __loopSecondaryVar9_0 = s_loopInvariant - 10;
do
{
    __loopvar9 += 4;
    if (__loopvar9 > s_loopInvariant + 4)
        break;
    __loopSecondaryVar9_0 += 3;
    s_double_4 %= ((double)(double_4 = LeafMethod4()));
    if (bool_0)
    {
        int __loopvar7 = s_loopInvariant + 10;
        while ((((bool)(bool_0 = ((bool)(bool_0 = bool_0))))))
        {
            __loopvar7 -= 4;
            if (__loopvar7 <= s_loopInvariant - 4)
                break;
            LeafMethod10();
            long_7 ^= ((long)(((long)(long_7 %= ((long)((((long)(((long)(((long)(long_7 & LeafMethod7())) - ((long)(long_7 = LeafMethod7())))) + ((long)(s_long_7 <<= ((int)(((int)(LeafMethod6() & LeafMethod6())) ^ ((int)(p_int_5 ^= int_6))))))))) + 90)))) * long_7));
            LeafMethod15();
        }

        LeafMethod11();
        s_uint_12 <<= s_int_6;
    }
    else
    {
        int __loopvar8 = s_loopInvariant, __loopSecondaryVar8_0 = s_loopInvariant - 10;
        for (;; __loopSecondaryVar8_0 += 3)
        {
            __loopvar8 += 4;
            if (__loopvar8 > s_loopInvariant + 16)
                break;
            s_sbyte_8 &= ((sbyte)(((sbyte)(sbyte_8 -= ((sbyte)(s_sbyte_8 = ((sbyte)(sbyte_8 >> ((int)(((int)(s_int_6 - LeafMethod6())) + ((int)(int_6 - LeafMethod6())))))))))) * s_sbyte_8));
            sbyte_8 /= ((sbyte)(((sbyte)(((sbyte)(LeafMethod8() << ((int)(((int)(int_6 >> 1)) % ((int)((((int)(int_6 += ((int)(LeafMethod6() & LeafMethod6()))))) + 28)))))) * ((sbyte)(((sbyte)(sbyte_8 <<= s_int_6)) % ((sbyte)((((sbyte)(sbyte_8 &= sbyte_8))) + 31)))))) ^ ((sbyte)(((sbyte)(s_sbyte_8 <<= LeafMethod6())) >> LeafMethod6()))));
            s2_15.ulong_1 %= ((ulong)(ulong_13 ^ ((ulong)(((ulong)(((ulong)(ulong_13 = ((ulong)(((ulong)(s_ulong_13 -= s_ulong_13)) * ((ulong)(s2_15.ulong_1 %= ((ulong)((s2_15.ulong_1) + 51)))))))) * ((ulong)(((ulong)(s_ulong_13 % ((ulong)((((ulong)(ulong_13 &= s2_15.ulong_1))) + 10)))) / ((ulong)((LeafMethod13()) + 71)))))) * ulong_13))));
            s_int_6 &= ((int)(LeafMethod6() % ((int)((((int)(s_int_6 -= ((int)(((int)(((int)(4 | ((int)(s_int_6 |= s_int_6)))) - s_int_6)) | ((int)(((int)(((int)(s_int_6 % ((int)((1) + 26)))) ^ ((int)(int_6 ^= s_int_6)))) / ((int)((((int)(s_int_6 = ((int)(int_6 - int_6))))) + 11))))))))) + 12))));
            long_7 ^= ((long)(((long)(((long)(((long)(s_long_7 <<= ((int)(p_int_5 >>= s_int_6)))) << ((int)(int_6 -= ((int)(((int)(p_int_5 -= int_6)) % ((int)((((int)(LeafMethod6() - s_int_6))) + 22)))))))) ^ ((long)(long_7 * LeafMethod7())))) | ((long)(((long)(((long)(long_7 | long_7)) >> ((int)(s_int_6 -= LeafMethod6())))) ^ s_long_7))));
            LeafMethod10();
            uint_12 = ((uint)(uint_12 - ((uint)(((uint)(p_uint_0 |= ((uint)(p_uint_0 >>= ((int)(((int)(s_int_6 |= int_6)) >> ((int)(s_int_6 << 94)))))))) * ((uint)(((uint)(((uint)(((uint)(s_uint_12 + LeafMethod12())) / ((uint)((uint_12) + 22)))) % ((uint)((((uint)(((uint)(s_uint_12 * LeafMethod12())) % ((uint)((((uint)(s_uint_12 + LeafMethod12()))) + 51))))) + 54)))) % ((uint)((((uint)(((uint)(((uint)(uint_12 % ((uint)((LeafMethod12()) + 10)))) + ((uint)(LeafMethod12() / ((uint)((LeafMethod12()) + 66)))))) & ((uint)(s_uint_12 = ((uint)(uint_12 | uint_12))))))) + 86))))))));
            s2_15.ulong_1 &= ulong_13;
            s_ushort_11 *= ushort_11;
            sbyte_8 &= ((sbyte)(sbyte_8 = LeafMethod8()));
        }
    }

    LeafMethod3();
}
while ((((bool)(int_6 == ((int)(((int)(((int)(int_6 * ((int)(((int)(s_int_6 /= ((int)((LeafMethod6()) + 43)))) ^ ((int)(int_6 ^= int_6)))))) & ((int)(int_6 -= ((int)(s_int_6 &= ((int)(int_6 % ((int)((p_int_5) + 21)))))))))) % ((int)((((int)(s_int_6 *= ((int)(((int)(int_6 &= ((int)(p_int_5 >> s_int_6)))) & ((int)(((int)(s_int_6 -= LeafMethod6())) ^ ((int)(LeafMethod6() | s_int_6))))))))) + 48))))))));

There is a long list of other statements and functionality still to be added, such as more than one class, single-dimensional and multi-dimensional arrays, SIMD APIs, etc.

How many test cases are validated / hour?

To recap, Antigen generates a C# program and runs it with one Corerun in baseline mode and another Corerun in test mode. There is a fair amount of room to improve this process, but for now, with this model, Antigen can generate and validate 1000 test cases per hour on a dual-core machine with 2 threads.

Where is it used?

Antigen is incorporated in the dotnet/runtime repository, where it runs weekly and on demand on PRs that change RyuJIT.

What are the real issues found?

Below are some of the examples of .NET issues found by Antigen:

Can we get reduced repro code?

Antigen also comes with a component called Trimmer, which trims the C# code as much as possible while making sure that the original issue still reproduces. Currently it has very limited capability and is slow, but there are plans to improve it.
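The general idea of trimming while preserving a repro can be sketched as a greedy reduction loop. This is not Trimmer's actual algorithm, only a simple stand-in technique: `TrimSketch` is a hypothetical name, the program is modeled as a flat list of statements, and `stillReproduces` is an assumed caller-supplied check.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of greedy statement removal; not Antigen's Trimmer.
public static class TrimSketch
{
    // Drops statements one at a time, keeping a removal only if the
    // caller-supplied predicate says the issue still reproduces.
    public static List<string> Trim(
        IReadOnlyList<string> statements,
        Func<IReadOnlyList<string>, bool> stillReproduces)
    {
        var current = new List<string>(statements);
        bool changed = true;

        // Repeat until a full pass removes nothing.
        while (changed)
        {
            changed = false;
            for (int i = current.Count - 1; i >= 0; i--)
            {
                var candidate = new List<string>(current);
                candidate.RemoveAt(i);

                if (stillReproduces(candidate))
                {
                    current = candidate;
                    changed = true;
                }
            }
        }

        return current;
    }
}
```

In a real setting each predicate call would mean recompiling and rerunning the candidate program, so the number of checks dominates the cost of a reduction like this.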

What's up with the name "Antigen"?

The name "Antigen" was chosen as a reminder that this tool was developed during the Covid era. It comes from one of the Covid-19 testing methods, the "Rapid Antigen Test (RAT)". Just as RAT was used to detect Covid-19, the Antigen tool is used to detect issues in .NET.

Are there any other fuzzers?

There are many fuzzers for testing compilers. One of them is Fuzzlyn, developed by Jakob Botsch Nielsen.
