TIG-1115 Modify generate.jstestfuzz to support context functionality #8

Merged (13 commits), Nov 13, 2019

@vrachev commented Nov 8, 2019

Posting this for visibility; I'll merge it once the jstestfuzz side of TIG-1115 is done. It's not ready yet because there are breaking changes I need to resolve first.

@vrachev changed the title from "[WIP] TIG-1115 tracking fixes" to "TIG-1115 Modify generate.jstestfuzz to support context functionality" on Nov 11, 2019
@vrachev (Author) commented Nov 11, 2019

This is ready for a look.

@@ -62,15 +62,6 @@ describe("bin/nearleyc", function() {
}
});

it('builds for CoffeeScript', function() {

@vrachev (Author) commented Nov 11, 2019

CoffeeScript syntax with objects is weird. I spent some time trying to figure out how to get it right but couldn't quite get there.

For example, this object fails to parse:

obj: {num: '5', str: 'valid so far', func1: () => "valid so far", func2: () => "fails on last comma"}

I got annoyed and just deleted the test.

@vrachev (Author) commented Nov 11, 2019

Never mind, it's broken again.

@vrachev (Author) commented Nov 11, 2019

OK, I fixed the issue (an incorrect regex). Ready for review again.

Here's insert.ne:

@preprocessor jstestfuzz

# Grammar for generating input data for fuzzers to execute against.
#
# This grammar is used by a version of nearley-unparse with weights specified to control how
# likely different rules are chosen. As a result, the following guidelines should be kept in mind:
#
#  * Avoid macros that generate multiple rules. Macros compile down to several internal rules, so
#    it can be difficult to determine how to specify the weights for those rules.
#  * Avoid EBNF syntax. Like macros, EBNF expressions generate several internal rules that cannot be
#    determined without looking at the compiled grammar.

# Rules
DocumentObj -> "{_id: " ^INC ", " StringRule NumRule DateRule ArrayRule ObjectRule "}"
Array -> "[" ValueList "]"
Array -> "[]"
Array -> NullType

StringRule -> ^KEY_str ": " StringValue ", " | null
NumRule -> ^KEY_num ": " NumValue ", " | null
DateRule -> ^KEY_date ": " DateValue ", " | null
ArrayRule -> ^KEY_array ": " Array ", " {^
    (generator, context) => {
        let context_KEY_array = context.copy();
        context_KEY_array.keyStack.push('array');
        return [context, context_KEY_array, context_KEY_array, context_KEY_array];
    }
^}
ArrayRule -> null 
ObjectRule -> ^KEY_obj ": " NestedObj ", " {^
    (generator, context) => {
        let context_KEY_obj = context.copy();
        context_KEY_obj.keyStack.push('obj');
        return [context, context_KEY_obj, context_KEY_obj, context_KEY_obj];
    }
^}
ObjectRule -> null 

NestedObj -> DocumentObj
NestedObj -> "{}"
NestedObj -> NullType

ValueList -> Value
ValueList -> ValueList ", " Value

Value -> StringValue
Value -> NumValue
Value -> DateValue
Value -> Array
Value -> DocumentObj

StringValue -> ^STRING
StringValue -> NullType
NumValue -> ^NUMBER
NumValue -> ^REAL
NumValue -> "Infinity"
NumValue -> NullType
DateValue -> ^DATE
DateValue -> NullType

# Null Type Values

NullType -> "null"
# NullType -> "undefined"   # disabled: 'undefined' values will not be generated until SERVER-37392 is resolved
NullType -> "NaN"
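The header comment says this grammar is consumed by a version of nearley-unparse that uses weights to control how likely each rule is to be chosen. The PR doesn't show how those weights are applied, so the following is only a sketch of standard weighted random choice among a nonterminal's alternatives. `Alternative` and `pickWeighted` are hypothetical names, and the weights are made up for illustration.

```typescript
// Hypothetical shape: one alternative of a nonterminal plus its relative weight.
interface Alternative {
  symbols: string[];
  weight: number; // relative likelihood; weights need not sum to 1
}

// Pick one alternative with probability weight / totalWeight.
// `rand` must return a float in [0, 1); it is injectable so tests can be deterministic.
function pickWeighted(alts: Alternative[], rand: () => number = Math.random): Alternative {
  const total = alts.reduce((sum, alt) => sum + alt.weight, 0);
  if (!(total > 0)) {
    throw new Error("need at least one alternative with a positive weight");
  }
  let remaining = rand() * total;
  for (const alt of alts) {
    remaining -= alt.weight;
    if (remaining < 0) {
      return alt; // a zero-weight alternative can never trigger this
    }
  }
  // Floating-point rounding can leave `remaining` at exactly 0; fall back to the
  // last alternative that has a positive weight.
  for (let i = alts.length - 1; i >= 0; i--) {
    if (alts[i].weight > 0) {
      return alts[i];
    }
  }
  throw new Error("unreachable");
}

// The two enabled NullType alternatives from insert.ne, with made-up weights.
const nullTypeAlts: Alternative[] = [
  { symbols: ['"null"'], weight: 3 },
  { symbols: ['"NaN"'], weight: 1 },
];

// Prints "null" (with quotes) roughly 75% of the time, otherwise "NaN".
console.log(pickWeighted(nullTypeAlts).symbols[0]);
```

Injecting `rand` keeps the choice reproducible; a fuzzer would more likely pass a seeded PRNG so failing inputs can be regenerated.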

Here's insert.compiled.ts:


// Generated automatically by nearley, version 0.3.0
// http://github.com/Hardmath123/nearley
export type TokenValue = any;

export interface Token {
    produce: (generator: any, context: any) => any;
    name?: string;
    test?: (value: TokenValue) => any;
};

export interface Literal {
    literal: string;
}

export type NearleySymbol = Token | Literal | RegExp | string

export interface NearleyRule {
    name: string;
    symbols: NearleySymbol[];
    postprocess?: (d: any[], loc?: number, reject?: {}) => any;
    updateContext?: (generator: any, context: any) => any;
}

export interface Compiled {
    ParserRules: NearleyRule[];
    ParserStart: string;
}

export function compileGrammar(): Compiled {
    // Bypasses TS6133. Allow declared but unused functions.
    // tslint:disable-next-line
    function id(d: any[]): any { return d[0]; }

    

    // tslint:disable-next-line: variable-name
    const ParserRules: NearleyRule[] = [
    {"name": "DocumentObj$string$1", "symbols": [{"literal":"{"}, {"literal":"_"}, {"literal":"i"}, {"literal":"d"}, {"literal":":"}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "DocumentObj$string$2", "symbols": [{"literal":","}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "DocumentObj", "symbols": ["DocumentObj$string$1", {produce: (generator, context) => generator.get(context, "INC"), name: "INC"}, "DocumentObj$string$2", "StringRule", "NumRule", "DateRule", "ArrayRule", "ObjectRule", {"literal":"}"}], "updateContext": (generator, context) => context},
    {"name": "Array", "symbols": [{"literal":"["}, "ValueList", {"literal":"]"}], "updateContext": (generator, context) => context},
    {"name": "Array$string$1", "symbols": [{"literal":"["}, {"literal":"]"}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "Array", "symbols": ["Array$string$1"], "updateContext": (generator, context) => context},
    {"name": "Array", "symbols": ["NullType"], "updateContext": (generator, context) => context},
    {"name": "StringRule$string$1", "symbols": [{"literal":":"}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "StringRule$string$2", "symbols": [{"literal":","}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "StringRule", "symbols": [{produce: (generator, context) => generator.get(context, "KEY_str"), name: "KEY_str"}, "StringRule$string$1", "StringValue", "StringRule$string$2"], "updateContext": (generator, context) => context},
    {"name": "StringRule", "symbols": [], "updateContext": (generator, context) => context},
    {"name": "NumRule$string$1", "symbols": [{"literal":":"}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "NumRule$string$2", "symbols": [{"literal":","}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "NumRule", "symbols": [{produce: (generator, context) => generator.get(context, "KEY_num"), name: "KEY_num"}, "NumRule$string$1", "NumValue", "NumRule$string$2"], "updateContext": (generator, context) => context},
    {"name": "NumRule", "symbols": [], "updateContext": (generator, context) => context},
    {"name": "DateRule$string$1", "symbols": [{"literal":":"}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "DateRule$string$2", "symbols": [{"literal":","}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "DateRule", "symbols": [{produce: (generator, context) => generator.get(context, "KEY_date"), name: "KEY_date"}, "DateRule$string$1", "DateValue", "DateRule$string$2"], "updateContext": (generator, context) => context},
    {"name": "DateRule", "symbols": [], "updateContext": (generator, context) => context},
    {"name": "ArrayRule$string$1", "symbols": [{"literal":":"}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "ArrayRule$string$2", "symbols": [{"literal":","}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "ArrayRule", "symbols": [{produce: (generator, context) => generator.get(context, "KEY_array"), name: "KEY_array"}, "ArrayRule$string$1", "Array", "ArrayRule$string$2"], "updateContext": 
        (generator, context) => {
            let context_KEY_array = context.copy();
            context_KEY_array.keyStack.push('array');
            return [context, context_KEY_array, context_KEY_array, context_KEY_array];
        }
        },
    {"name": "ArrayRule", "symbols": [], "updateContext": (generator, context) => context},
    {"name": "ObjectRule$string$1", "symbols": [{"literal":":"}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "ObjectRule$string$2", "symbols": [{"literal":","}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "ObjectRule", "symbols": [{produce: (generator, context) => generator.get(context, "KEY_obj"), name: "KEY_obj"}, "ObjectRule$string$1", "NestedObj", "ObjectRule$string$2"], "updateContext": 
        (generator, context) => {
            let context_KEY_obj = context.copy();
            context_KEY_obj.keyStack.push('obj');
            return [context, context_KEY_obj, context_KEY_obj, context_KEY_obj];
        }
        },
    {"name": "ObjectRule", "symbols": [], "updateContext": (generator, context) => context},
    {"name": "NestedObj", "symbols": ["DocumentObj"], "updateContext": (generator, context) => context},
    {"name": "NestedObj$string$1", "symbols": [{"literal":"{"}, {"literal":"}"}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "NestedObj", "symbols": ["NestedObj$string$1"], "updateContext": (generator, context) => context},
    {"name": "NestedObj", "symbols": ["NullType"], "updateContext": (generator, context) => context},
    {"name": "ValueList", "symbols": ["Value"], "updateContext": (generator, context) => context},
    {"name": "ValueList$string$1", "symbols": [{"literal":","}, {"literal":" "}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "ValueList", "symbols": ["ValueList", "ValueList$string$1", "Value"], "updateContext": (generator, context) => context},
    {"name": "Value", "symbols": ["StringValue"], "updateContext": (generator, context) => context},
    {"name": "Value", "symbols": ["NumValue"], "updateContext": (generator, context) => context},
    {"name": "Value", "symbols": ["DateValue"], "updateContext": (generator, context) => context},
    {"name": "Value", "symbols": ["Array"], "updateContext": (generator, context) => context},
    {"name": "Value", "symbols": ["DocumentObj"], "updateContext": (generator, context) => context},
    {"name": "StringValue", "symbols": [{produce: (generator, context) => generator.get(context, "STRING"), name: "STRING"}], "updateContext": (generator, context) => context},
    {"name": "StringValue", "symbols": ["NullType"], "updateContext": (generator, context) => context},
    {"name": "NumValue", "symbols": [{produce: (generator, context) => generator.get(context, "NUMBER"), name: "NUMBER"}], "updateContext": (generator, context) => context},
    {"name": "NumValue", "symbols": [{produce: (generator, context) => generator.get(context, "REAL"), name: "REAL"}], "updateContext": (generator, context) => context},
    {"name": "NumValue$string$1", "symbols": [{"literal":"I"}, {"literal":"n"}, {"literal":"f"}, {"literal":"i"}, {"literal":"n"}, {"literal":"i"}, {"literal":"t"}, {"literal":"y"}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "NumValue", "symbols": ["NumValue$string$1"], "updateContext": (generator, context) => context},
    {"name": "NumValue", "symbols": ["NullType"], "updateContext": (generator, context) => context},
    {"name": "DateValue", "symbols": [{produce: (generator, context) => generator.get(context, "DATE"), name: "DATE"}], "updateContext": (generator, context) => context},
    {"name": "DateValue", "symbols": ["NullType"], "updateContext": (generator, context) => context},
    {"name": "NullType$string$1", "symbols": [{"literal":"n"}, {"literal":"u"}, {"literal":"l"}, {"literal":"l"}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "NullType", "symbols": ["NullType$string$1"], "updateContext": (generator, context) => context},
    {"name": "NullType$string$2", "symbols": [{"literal":"N"}, {"literal":"a"}, {"literal":"N"}], "postprocess": d => d.join(''), "updateContext": (generator, context) => context},
    {"name": "NullType", "symbols": ["NullType$string$2"], "updateContext": (generator, context) => context}
];

    // tslint:disable-next-line: variable-name
    const ParserStart = "DocumentObj";

    return {ParserRules, ParserStart};
}
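The code that calls `updateContext` lands in later jstestfuzz PRs, so here is only a hypothetical sketch of how a consumer could walk compiled rules like the ones above: rule names expand recursively, `{literal}` symbols are emitted as-is, token objects are asked to `produce` a value, and each symbol receives the context that `updateContext` assigned to it. It assumes `updateContext` returns either one context shared by every symbol or an array with one context per symbol, which is what the generated `ArrayRule` and `ObjectRule` entries suggest. `contextsFor`, `unparse`, `choose`, and the toy grammar are illustrative, not nearley-unparse APIs.

```typescript
// Minimal local copies of the Token, Literal and NearleyRule shapes in insert.compiled.ts.
interface TokenSymbol {
  produce: (generator: any, context: any) => any;
  name?: string;
}
interface LiteralSymbol {
  literal: string;
}
// A plain string names another rule to expand.
type RuleSymbol = TokenSymbol | LiteralSymbol | string;
interface GrammarRule {
  name: string;
  symbols: RuleSymbol[];
  updateContext?: (generator: any, context: any) => any;
}

// Assumption: updateContext returns either one context shared by every symbol,
// or an array with exactly one context per symbol (as ArrayRule/ObjectRule do).
function contextsFor(rule: GrammarRule, generator: unknown, context: unknown): unknown[] {
  const result = rule.updateContext ? rule.updateContext(generator, context) : context;
  if (Array.isArray(result)) {
    if (result.length !== rule.symbols.length) {
      throw new Error(`${rule.name}: expected ${rule.symbols.length} contexts, got ${result.length}`);
    }
    return result;
  }
  return rule.symbols.map(() => result);
}

// Hypothetical unparser: expand `start`, handing each symbol the context assigned to it.
function unparse(
  rules: GrammarRule[],
  start: string,
  generator: unknown,
  context: unknown,
  choose: (alternatives: GrammarRule[]) => GrammarRule,
  depth = 0
): string {
  if (depth > 50) {
    throw new Error("maximum expansion depth exceeded");
  }
  const alternatives = rules.filter((r) => r.name === start);
  if (alternatives.length === 0) {
    throw new Error(`no rule named ${start}`);
  }
  const rule = choose(alternatives);
  const contexts = contextsFor(rule, generator, context);
  return rule.symbols
    .map((sym, i) => {
      if (typeof sym === "string") {
        return unparse(rules, sym, generator, contexts[i], choose, depth + 1); // nonterminal
      }
      if ("literal" in sym) {
        return sym.literal; // fixed text
      }
      return String(sym.produce(generator, contexts[i])); // ^TOKEN
    })
    .join("");
}

interface DemoContext {
  keyStack: string[];
}

// Toy grammar shaped like ArrayRule: the key token sees the outer context,
// the value sees a copy with "array" pushed onto keyStack.
const demoRules: GrammarRule[] = [
  { name: "Doc", symbols: [{ literal: "{" }, "ArrayRule", { literal: "}" }] },
  {
    name: "ArrayRule",
    symbols: [
      { produce: (_g: unknown, ctx: DemoContext) => `k${ctx.keyStack.length}`, name: "KEY_array" },
      { literal: ": " },
      "Value",
    ],
    updateContext: (_g: unknown, ctx: DemoContext) => {
      const inner: DemoContext = { keyStack: ctx.keyStack.concat("array") };
      return [ctx, inner, inner];
    },
  },
  { name: "Value", symbols: [{ produce: (_g: unknown, ctx: DemoContext) => ctx.keyStack.join("."), name: "PATH" }] },
];

console.log(unparse(demoRules, "Doc", null, { keyStack: [] }, (alts) => alts[0])); // prints {k0: array}
```

The depth guard simply throws; a real fuzzer would more likely steer toward terminating alternatives (such as `NullType`) as depth grows, since rules like `Value -> DocumentObj` are recursive.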

A couple of things to note:

  1. Because nearley bootstraps itself, it's difficult to make non-invasive changes to how `.ne` files are compiled. I added support for compiling tokens into something like `{produce: (generator, context) => generator.get(context, "NUMBER"), name: "NUMBER"}` so that engineers don't have to write that out for every token. However, because `%` tokens are used internally to bootstrap nearley, I couldn't simply change how they are compiled. Instead, I added `^` for tokens and `{^ ... ^}` for JavaScript blocks that get compiled into `updateContext`.
  2. For a rule such as:

         ArrayRule -> ^KEY_array ": " Array ", " {^
             (generator, context) => {
                 let context_KEY_array = context.copy();
                 context_KEY_array.keyStack.push('array');
                 return [context, context_KEY_array, context_KEY_array, context_KEY_array];
             }
         ^}

     the JavaScript blob is assigned to `updateContext`. That function is then called in `nearley-unparse`, and it will be the new mechanism for passing context around. This may be easier to follow in the upcoming TIG-1115 PRs in jstestfuzz, where the function actually gets called.
  3. I've mostly given up on trying to make non-invasive changes to nearley. At most, I'm trying not to break existing tests; beyond that, it isn't worth trying to make the changes fit in.
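The `{^ ... ^}` blocks rely on `context.copy()` returning an independent context: if the copy shared its `keyStack` array with the original, the `push('array')` would also appear in the parent context and leak into sibling rules such as `ObjectRule`. The PR doesn't show the context class, so `FuzzContext`, its `path()` helper, and `arrayRuleUpdateContext` below are a hypothetical sketch; only `copy()` and `keyStack` come from the grammar.

```typescript
// Hypothetical context class: the grammar only shows that a context exposes copy() and keyStack.
class FuzzContext {
  keyStack: string[];

  constructor(keyStack: string[] = []) {
    this.keyStack = keyStack;
  }

  // Copy the array, not just the reference, so a push on the copy
  // cannot leak into the parent context or into sibling rules.
  copy(): FuzzContext {
    return new FuzzContext(this.keyStack.slice());
  }

  // Dotted path of the enclosing keys, e.g. "obj.array".
  path(): string {
    return this.keyStack.join(".");
  }
}

// The ArrayRule {^ ... ^} block from insert.ne, adapted with types and camelCase names.
function arrayRuleUpdateContext(_generator: unknown, context: FuzzContext): FuzzContext[] {
  const contextKeyArray = context.copy();
  contextKeyArray.keyStack.push("array");
  // One entry per symbol: ^KEY_array, ": ", Array, ", "
  return [context, contextKeyArray, contextKeyArray, contextKeyArray];
}

const root = new FuzzContext(["obj"]);
const [keyCtx, valueCtx] = arrayRuleUpdateContext(null, root);
console.log(keyCtx.path()); // prints obj
console.log(valueCtx.path()); // prints obj.array
console.log(root.path()); // prints obj (the parent is unchanged)
```

The key token keeps the outer context because the key itself belongs to the enclosing object; only what follows it is "inside" the array.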

@vrachev requested a review from @guoyr on Nov 11, 2019, 22:01

@guoyr left a comment

LGTM! Excellent stuff. I'm going to add a few tests so I can play around with it a bit.

@vrachev (Author) commented Nov 13, 2019

I added a test for lexing `jsForUpdateContext`.
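The lexer change isn't shown in this thread, so the following is only a hypothetical sketch of turning a `{^ ... ^}` block into a `jsForUpdateContext` token with a regular expression; it is not claimed to be the regex that was fixed earlier. `JsBlockToken`, `JS_BLOCK`, and `lexJsForUpdateContext` are illustrative names. The main point it shows is that the body match has to be lazy, or two blocks in the same grammar file get merged into one.

```typescript
// Hypothetical token shape for a {^ ... ^} block.
interface JsBlockToken {
  type: "jsForUpdateContext";
  value: string; // the JavaScript between the delimiters, trimmed
  end: number; // index just past the closing "^}"
}

// Anchored at the start of the sliced input. The lazy [\s\S]*? stops at the FIRST "^}",
// so two blocks in one grammar are lexed separately; a greedy [\s\S]* would swallow
// everything up to the LAST "^}" in the file. Caveat: a body that itself contains
// "^}" would end the block early.
const JS_BLOCK = /^\{\^([\s\S]*?)\^\}/;

function lexJsForUpdateContext(source: string, pos: number): JsBlockToken | null {
  const m = JS_BLOCK.exec(source.slice(pos));
  if (m === null) {
    return null; // no block starts at `pos`
  }
  return { type: "jsForUpdateContext", value: m[1].trim(), end: pos + m[0].length };
}

const grammarText = 'A -> "a" {^ (g, c) => c ^}\nB -> "b" {^ (g, c) => [c] ^}';
const first = lexJsForUpdateContext(grammarText, grammarText.indexOf("{^"));
console.log(first && first.value); // prints (g, c) => c
```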

@vrachev merged commit ccb659a into mongodb-forks:master on Nov 13, 2019