@paustint paustint commented Sep 23, 2019

2.0.0

Summary

Version 2.0 brings significant bundle size and performance improvements. This library now uses Chevrotain instead of antlr4. With this change, everything related to parsing had to be rewritten from scratch. Chevrotain uses pure JavaScript to handle lexing, parsing, and visiting the generated AST/CST, as opposed to using a grammar file and generating a JavaScript parser based on the grammar.

With this change, the data model was reviewed and analyzed, and there are some significant breaking changes to the data structures. Review the 🔥breaking changes🔥 below for a detailed description of each breaking change.

Bundle Size

soql-parser-js bundles all of the library code and three dependencies into the JavaScript bundle: chevrotain, regexp-to-ast (which chevrotain relies on), and lodash.get (required by chevrotain). Previously, antlr4 was not bundled and had to be installed separately.

To compare the bundle size, the small program below was compiled using the default webpack configuration, and the resulting bundles were compared.

  • Version 1.x: 545kb (this includes all required dependencies)
  • Version 2.0: 197kb (this includes all required dependencies)
var soqlParser = require('soql-parser-js');

const query = soqlParser.parseQuery(`SELECT Id FROM Account WHERE Id = 'FOO'`);
console.log('query', query);
const soql = soqlParser.composeQuery(query);
console.log('soql', soql);

Benchmarks

Here is an example benchmark from parsing all of the unit tests 1,000 times:

OLD PARSER: ~6.2 seconds for ~60K parses
NEW PARSER: ~2.25 seconds for ~60K parses
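
A timing loop along the following lines could produce a comparison like the one above. This is only a minimal sketch, not the benchmark that was actually run: `timeParses` and the stand-in `fakeParse` are hypothetical names, and a real measurement would pass the library's `parseQuery` and the unit-test queries instead.

```typescript
// Hypothetical helper: runs `parse` over every query `iterations` times and
// reports how many parses were performed and how long they took.
function timeParses(
  parse: (soql: string) => unknown,
  queries: string[],
  iterations: number
): { parses: number; seconds: number } {
  const start = Date.now();
  let parses = 0;
  for (let i = 0; i < iterations; i++) {
    for (const soql of queries) {
      parse(soql);
      parses++;
    }
  }
  return { parses, seconds: (Date.now() - start) / 1000 };
}

// Stand-in parser so the sketch runs on its own (assumption: replace it with
// the library's parseQuery to measure real parsing).
const fakeParse = (soql: string): string[] => soql.split(' ');

const result = timeParses(
  fakeParse,
  ['SELECT Id FROM Account', 'SELECT Name FROM Contact'],
  1000
);
console.log(`${result.parses} parses in ${result.seconds} seconds`);
```

With roughly 60 test queries and 1,000 iterations, the same loop would perform about 60K parses, which matches the scale quoted above.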

Breaking Changes 🔥

General Changes

  • The CLI was removed.
  • The parseQuery() function no longer accepts options as a second parameter.
  • rawValue will always have a space between parameters, e.g. GROUPING(Id, BillingCountry).
  • Some literalType values may have differing case from prior versions, regardless of the data input.
    • TRUE, FALSE, and all functions except those listed below will always be returned in uppercase, regardless of case of input.
    • Exceptions:
      • toLabel, convertTimezone, convertCurrency will always be in camelCase.
    • Added new available types for DateLiteral and DateNLiteral.
  • A new LiteralType value was added for APEX_BIND_VARIABLE.
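
The casing rule for function names can be pictured with a small helper. This is an illustrative sketch of the documented behavior, not the library's own code; `normalizeFunctionName` and `CAMEL_CASE_FUNCTIONS` are hypothetical names.

```typescript
// The three functions listed as exceptions, keyed by their lowercase form.
const CAMEL_CASE_FUNCTIONS = new Map<string, string>([
  ['tolabel', 'toLabel'],
  ['converttimezone', 'convertTimezone'],
  ['convertcurrency', 'convertCurrency'],
]);

// Hypothetical helper mirroring the rule above: the exceptions come back in
// camelCase, every other function name comes back in uppercase.
function normalizeFunctionName(input: string): string {
  return CAMEL_CASE_FUNCTIONS.get(input.toLowerCase()) ?? input.toUpperCase();
}

console.log(normalizeFunctionName('count'));           // COUNT
console.log(normalizeFunctionName('TOLABEL'));         // toLabel
console.log(normalizeFunctionName('convertcurrency')); // convertCurrency
```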

Compose Query

  • getComposedField() is deprecated; use getField() instead. getComposedField() will remain available for backward compatibility.
  • getField()/getComposedField() has the following changes:
    1. The fn property has been deprecated (but still exists); use functionName instead.
    2. The from property has been removed for subqueries. The relationshipName is required to be populated to compose a subquery.
  • On the FormatOptions interface fieldMaxLineLen was renamed to fieldMaxLineLength.
export interface FormatOptions {
  numIndent?: number;
- fieldMaxLineLen?: number;
+ fieldMaxLineLength?: number;
  fieldSubqueryParensOnOwnLine?: boolean;
  whereClauseOperatorsIndented?: boolean;
  logging?: boolean;
}
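
Callers that still hold 1.x-style options could rename the property before passing them to 2.0. The following is a minimal sketch of one way to do that; `LegacyFormatOptions` and `migrateFormatOptions` are hypothetical names and are not part of the library.

```typescript
// 2.0 shape, copied from the interface above.
interface FormatOptions {
  numIndent?: number;
  fieldMaxLineLength?: number;
  fieldSubqueryParensOnOwnLine?: boolean;
  whereClauseOperatorsIndented?: boolean;
  logging?: boolean;
}

// 1.x shape: identical except for the old property name.
interface LegacyFormatOptions {
  numIndent?: number;
  fieldMaxLineLen?: number;
  fieldSubqueryParensOnOwnLine?: boolean;
  whereClauseOperatorsIndented?: boolean;
  logging?: boolean;
}

// Hypothetical helper: copies every option unchanged and renames
// fieldMaxLineLen to fieldMaxLineLength when it is set.
function migrateFormatOptions(legacy: LegacyFormatOptions): FormatOptions {
  const { fieldMaxLineLen, ...rest } = legacy;
  return fieldMaxLineLen === undefined
    ? rest
    : { ...rest, fieldMaxLineLength: fieldMaxLineLen };
}

const migrated = migrateFormatOptions({ numIndent: 2, fieldMaxLineLen: 80 });
console.log(migrated); // { numIndent: 2, fieldMaxLineLength: 80 }
```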

Parse Query

  • rawValue will now be included on Field if objectPrefix is defined.
  • alias may be included on Field, if defined.
  • On FieldFunctionExpression, fn was renamed to functionName. This was done because all other usages of fn were FunctionExp, but it was a string in this case.
  • The parameters type on FieldFunctionExpression was modified to allow an array of varying types.
  • Removed from property from FieldSubquery.
  • having was removed from QueryBase and now lives as a property on GroupByClause.
  • On the Condition object, literalType may be an array. This will be an array if value is an array and there are variable types within the value. For example: WHERE Foo IN ('a', null, 'b') would produce literalType: ['STRING', 'NULL', 'STRING'].
  • The GroupByClause has the following modifications:
    • field is now optional, and will be populated only if the grouping is on a single field.
    • type has been renamed to fn and will be populated when CUBE and ROLLUP are used.
    • The having clause has been moved as a top-level property to the GroupByClause and will be populated only if a having clause is present.
  • The HavingCondition now has a literalType that will be populated with the type of the value property.
  • FunctionExp has the following modifications:
    • text was renamed to rawValue to be more consistent with other places in the data model.
    • name was renamed to functionName.
    • parameter was renamed to parameters and the type was changed to (string | FunctionExp)[] to support nested functions. This will ALWAYS be an array now even if there is only one parameter.
    • fn was removed, as nested functions are always stored as an entry in the parameters array.
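
Because literalType on a Condition may now be either a single value or an array, code that reads it has to handle both shapes. The sketch below shows one way to normalize it to one literal type per value. It makes several assumptions: `literalTypesPerValue` is a hypothetical helper, the `LiteralType` alias and `Condition` interface are simplified stand-ins for the real types, and the `value` strings are hand-written for illustration.

```typescript
// Simplified for illustration: the real LiteralType is a union of specific
// strings, and the real Condition has more properties than shown here.
type LiteralType = string;

interface Condition {
  field?: string;
  operator: string;
  value?: string | string[];
  literalType?: LiteralType | LiteralType[];
}

// Hypothetical helper: returns one literal type per value, whether
// literalType was stored as a single value or as an array.
function literalTypesPerValue(condition: Condition): LiteralType[] {
  const { value, literalType } = condition;
  if (value === undefined || literalType === undefined) return [];
  if (Array.isArray(literalType)) return literalType;
  const single: LiteralType = literalType;
  const values = Array.isArray(value) ? value : [value];
  return values.map(() => single);
}

// Mixed types, as in WHERE Foo IN ('a', null, 'b').
const mixed: Condition = {
  field: 'Foo',
  operator: 'IN',
  value: ["'a'", 'null', "'b'"],
  literalType: ['STRING', 'NULL', 'STRING'],
};

// Uniform types: literalType stays a single value.
const uniform: Condition = {
  field: 'Foo',
  operator: 'IN',
  value: ["'a'", "'b'"],
  literalType: 'STRING',
};

console.log(literalTypesPerValue(mixed));   // STRING, NULL, STRING
console.log(literalTypesPerValue(uniform)); // STRING, STRING
```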
export interface Field {
  type: 'Field';
  field: string;
  objectPrefix?: string;
+ rawValue?: string;
+ alias?: string;
}

export interface FieldFunctionExpression {
  type: 'FieldFunctionExpression';
- fn: string;
+ functionName: string;
- parameters?: string[] | FieldFunctionExpression[];
+ parameters?: (string | FieldFunctionExpression)[];
  alias?: string;
  isAggregateFn?: boolean;
  rawValue?: string;
}

export interface FieldRelationship {
  type: 'FieldRelationship';
  field: string;
  relationships: string[];
  objectPrefix?: string;
  rawValue?: string;
+ alias?: string;
}

export interface FieldSubquery {
  type: 'FieldSubquery';
  subquery: Subquery;
- from?: string;
}

export interface QueryBase {
  fields: FieldType[];
  sObjectAlias?: string;
  where?: WhereClause;
  limit?: number;
  offset?: number;
  groupBy?: GroupByClause;
- having?: HavingClause;
  orderBy?: OrderByClause | OrderByClause[];
  withDataCategory?: WithDataCategoryClause;
  withSecurityEnforced?: boolean;
  for?: ForClause;
  update?: UpdateClause;
}

export interface Condition {
  openParen?: number;
  closeParen?: number;
  logicalPrefix?: LogicalPrefix;
  field?: string;
  fn?: FunctionExp;
  operator: Operator;
  value?: string | string[];
  valueQuery?: Query;
- literalType?: LiteralType;
+ literalType?: LiteralType | LiteralType[];
  dateLiteralVariable?: number;
}

export interface GroupByClause {
- field: string | string[];
+ field?: string | string[];
- type?: GroupByType;
+ fn?: FunctionExp;
+ having?: HavingClause;
}

export interface HavingCondition {
  openParen?: number;
  closeParen?: number;
  field?: string;
  fn?: FunctionExp;
  operator: string;
  value: string | number;
+ literalType?: String;
}

export interface FunctionExp {
- text?: string;
+ rawValue?: string;
- name?: string;
+ functionName?: string;
  alias?: string;
- parameter?: string | string[];
+ parameters?: (string | FunctionExp)[];
  isAggregateFn?: boolean;
- fn?: FunctionExp;
}
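
Code that stored 1.x FunctionExp objects could be converted to the 2.0 shape with a helper along these lines. This is a sketch under stated assumptions, not a migration tool shipped with the library: `LegacyFunctionExp` and `migrateFunctionExp` are hypothetical names, the legacy object is hand-written, and placing a nested `fn` ahead of any plain `parameter` values is an assumption about how 1.x combined the two.

```typescript
// 1.x shape, reconstructed from the removed (-) lines above.
interface LegacyFunctionExp {
  text?: string;
  name?: string;
  alias?: string;
  parameter?: string | string[];
  isAggregateFn?: boolean;
  fn?: LegacyFunctionExp;
}

// 2.0 shape, copied from the added (+) lines above.
interface FunctionExp {
  rawValue?: string;
  functionName?: string;
  alias?: string;
  parameters?: (string | FunctionExp)[];
  isAggregateFn?: boolean;
}

// Hypothetical helper: renames the properties, always produces an array for
// `parameters`, and moves a nested `fn` into that array.
function migrateFunctionExp(legacy: LegacyFunctionExp): FunctionExp {
  const parameters: (string | FunctionExp)[] = [];
  if (legacy.fn) {
    // Assumption: the nested function goes first.
    parameters.push(migrateFunctionExp(legacy.fn));
  }
  if (legacy.parameter !== undefined) {
    const plain = Array.isArray(legacy.parameter) ? legacy.parameter : [legacy.parameter];
    parameters.push(...plain);
  }
  const migrated: FunctionExp = { parameters };
  if (legacy.text !== undefined) migrated.rawValue = legacy.text;
  if (legacy.name !== undefined) migrated.functionName = legacy.name;
  if (legacy.alias !== undefined) migrated.alias = legacy.alias;
  if (legacy.isAggregateFn !== undefined) migrated.isAggregateFn = legacy.isAggregateFn;
  return migrated;
}

// Hand-written legacy object for FORMAT(MAX(Amount)), for illustration only.
const legacyExp: LegacyFunctionExp = {
  text: 'FORMAT(MAX(Amount))',
  name: 'FORMAT',
  fn: { text: 'MAX(Amount)', name: 'MAX', parameter: 'Amount', isAggregateFn: true },
};

const migratedExp = migrateFunctionExp(legacyExp);
console.log(JSON.stringify(migratedExp, null, 2));
```

In the migrated object, the nested MAX function appears as the single entry of the outer `parameters` array, and its own `parameters` is the array `['Amount']` rather than a bare string.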

resolves #72
resolves #34

@paustint paustint self-assigned this Sep 23, 2019
@paustint paustint changed the title V2 Version 2.0 Sep 23, 2019
Refactored to use Chevrotain instead of ANTLR
Added lex, parser, visitor
Adjusted code structure
Migrated to Webpack instead of Rollup
Documented breaking changes in the changelog

Fixes #34
fixed typeof when ELSE was not included in query
Updated README to include new documentation
Updated doc dependencies
added release-it back in
Upgraded and fixed types for doc website
Added support for APEX_BIND_VARIABLE literalType
Fixed minor bugs with parsing types with NULL as the rhs
Fixed bug with an array of variable types in where clause
worked around #74
Added useRawValueForFn as an option to compose to allow better control over composing functions
field aliasing is now only available for FieldFunctionExpressions instead of all fields
Added unit tests for some broken use-cases and fixed issues. (nested functions in groupBy did not parse or compose properly) resolves #75
Fixed the offset parsing to ensure that it is not improperly added as an alias

resolves #74
Added support to turn on or off apex bind variables - resolves #76
Added support for USING SCOPE
Added support for currency prefixed numbers in where clause expressions
Added two additional LiteralTypes to support currency prefixed numbers
Fixed bug with date N literals - not all items were working
cleaned up code comments
Added numerous additional test-cases based on SFDC documentation and minor bugfixes to support adjustments
Added check and forced a parsing error if parens are not matched

Added notes to do a large refactor in the future for better parsing of parens

resolves #80
the having clause now shares the same condition code as where clause
updated changelog
updated compose to support new having clause structure
updated parser to allow aggregate clause in value for having clauses
updated unit tests
@paustint paustint merged commit fa9d3cd into master Oct 6, 2019
Development

Successfully merging this pull request may close these issues.

Investigate moving away from ANTLR4 to an alternative TYPEOF does not work in IN clause SOQL queries
