Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code review #2

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Code review #2

wants to merge 11 commits into from

Conversation

jgonggrijp
Copy link

The purpose of this PR is not to actually merge the changes, but to illustrate my comments with code. It is a work in progress. For most convenient consumption, I suggest reading the commits one by one.

This comment also applies to the generated parser itself, but I'm
illustrating with the JSON example runner because all the variations
are visible within a short section of code: (1) const/let, (2) var
and (3) the absence of any such keyword.

Mixing (1) and (2) is frowned upon; it makes the scope of variables
unpredictable. (1) is newer, hence often considered better. In this
case, however, I would suggest that you consistently go with (2), at
least in the generated parser module. There are two reasons for my
recommendation:

 1. var has roughly the same scope semantics as local variables in
    Python. Since you are transpiling from Python, this makes it
    easier to generate reliable code.
 2. var is older, hence portable to more different JS environments.

(3), while allowed, should be avoided at all cost, as it always
creates global variables (which is not the same as var at module
scope). The fact that this form is allowed at all should be
considered a historical artifact.
You can often get away without the semicolons, but there are corner
cases where it can cause the JS interpreter/compiler to do the wrong
thing, especially after inadvertent subsequent edits. It is generally
considered better style to finish every statement with a semicolon
AND a line end. This also speeds up JS parsing in some cases.

This comment only applies to the example runner; it appears that the
generated parser consistently includes the semicolons. Nevertheless,
including them in your example code makes a better impression and
also reduces the probability that users will accidentally break the
code through editing.
JS users are used to dense code, with only single blank lines between
blocks or occasionally inside long blocks. You can stretch this a bit
by using double blank lines before classes and standalone functions
like in PEP8, but you will have to be very consistent so that it is
clear what convention you're maintaining. JS users will get irritated
seeing a blank line before a closing semicolon or a blank line
separating a single line of code from the rest of a function body.
Using multiline strings as documentation is not a thing in JS; you
are expected to always use comments, which should precede the thing
they document. The style I've demonstrated in the commit could be
broadly referred to as a literate commenting style, which is for
example practiced in Underscore and related libraries. It can be read
as-is together with the code, or it can be used to present the
comments and the code side by side, as illustrated below. Needless to
say, this necessitates writing a separate manual by hand (second
link).

https://underscorejs.org/docs/underscore-esm.html
https://underscorejs.org

A possible alternative style, which is very popular, is the jsdoc
format:

/**
 * Short description
 * Possibly longer explanation with examples
 * @param name description
 * @returns description
 */

This is more geared to automatically generating reference
documentation that is separate from the code. In order for such
separate documentation to be useful, the comments need to be quite
extensive, which can render the code itself less readable. It should
also be noted that the generated reference documentation is generally
still not sufficient to completely replace a purpose-written manual.

https://github.com/lodash/lodash/blob/c84fe82760fb2d3e03a63379b297a1cc1a2fce12/lodash.js#L459
Two-space indents are often considered a convention in JavaScript.
Even in the JS community, however, there is sizable subpopulation that
appreciates larger indents for better readability. Ultimately, it is
important not to do what is most popular but what you believe is best.
If you subscribe to PEP8's four-space indent recommendation, there is
no reason not to apply it in JS as well.

(The reason two-space indents became fashion in JS is also outdated.
When JS programmers were still writing module wrappers by hand and
there was no async/await syntax yet, indentation tended to go very
deep very quickly. This produced an incentive to sacrifice readability
in exchange for needing to wrap lines less often. There is no longer
a need to make this sacrifice.)
JS users expect the word "polyfill" to mean something very specific,
i.e., code that emulates a builtin feature of *JavaScript* if it is
not available by an outdated target environment. Using the word
differently might cause confusion.
When possible, avoid abbreviated names because they cause cognitive
strain. Highly idiomatic single-letter variables such as i/j/k in a
loop can be acceptable.

An additional remark on this snippet: it appears that the re_
parameter is just an alias of the Python re module emulation a few
lines up. Avoid this pattern if possible; using re directly will make
the code much easier to understand.
JavaScript's modern iterator support is of course convenient for
transpiling Python generators. Unfortunately, they do have a
performance cost. for-of loops also work on plain arrays, so when it
is possible to replace a generator function by a normal function that
returns a plain array, it is likely that this will make your parser
faster.

for-of loops on arrays, in turn, should be replaced by old-fashioned
for-;; loops for raw performance. The for-of construct involves
converting the array to iterator first through the Array[@@iterator]
method and then calling iterator#next on each loop iteration, inducing
considerable function call overhead.

Of course, these considerations only apply to hot loops.

An added bonus of reducing iterators and for-of constructs is that
compatibility transpilers like Babel will produce less boilerplate
code.
Introducing a library dependency like Underscore allows you to bring
back much of the notational elegance that Python brings through
itertools and functools, plus some JavaScript-specific extras.

Pros:

 - Shorter, more elegant, more maintainable code.
 - Removes the need for many of the generic functions that you have
   written in lines 87-333.
 - Completely portable; removes need for many constructs that cause
   compatibility transpilers to generate boilerplate code or introduce
   polyfills.

Cons:

 - Performance penalty is similar to for-of due to function call
   overhead. Nothing beats the raw performance of bare for-;; loops.
 - Having a dependency is perceived as a burden in the JS community.
   Underscore is however sufficiently small that its added weight
   can be completely offset by the reduced code size, especially
   after compatibility transpilation.
 - No support for iterators (although this can be easily patched).

This commit serves to illustrate how Underscore could be used. Some of
the code changes might be better left out. In general, I recommend to
use Underscore where possible and for-;; where necessary.

For completeness, I will mention that I maintain Underscore and that
there are alternatives such as Lodash and Ramda.
Please ignore this commit.
Could use some polishing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant