The following recent paper proposes a similar technique for representing ASTs.
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019.
code2vec: learning distributed representations of code. Proc. ACM
Program. Lang. 3, POPL, Article 40 (January 2019), 29 pages. DOI: https://doi.org/10.1145/3290353
Please add a discussion comparing ASTToken2Vec with this study, covering differences, potential issues, and so on.
In code2vec, the unit of embedding is a code snippet of a few lines. (Ours: a single lexical token.)
Its application is semantic labeling of code snippets: for example, a neural network receives a nameless function definition of a few lines and answers "reverse array" as a description of what the code does. (Ours: given a partial input program, predict the next few tokens.)
It finds that a human-given identifier can often be decomposed into vector components corresponding to fragments of meaning (e.g., equalsIgnoreCase ≈ equals + toLowerCase). This observation supports our use of a word2vec-like embedding for token prediction.
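To make the compositional claim concrete, here is a minimal sketch of the kind of vector-arithmetic check involved. The embedding vectors below are hypothetical toy values chosen so the relation holds, not the output of any trained model:

```python
import numpy as np

# Hypothetical token embeddings; in practice these would come from a
# trained word2vec-style model over code tokens.
emb = {
    "equals":           np.array([1.0, 0.0, 0.0]),
    "toLowerCase":      np.array([0.0, 1.0, 0.0]),
    "equalsIgnoreCase": np.array([0.9, 1.1, 0.1]),
    "hashCode":         np.array([0.0, 0.0, 1.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Check the analogy: equalsIgnoreCase ≈ equals + toLowerCase.
composed = emb["equals"] + emb["toLowerCase"]
for name, vec in emb.items():
    print(f"{name:18s} cos = {cosine(composed, vec):.3f}")
# The composed vector is closest to equalsIgnoreCase (cos ≈ 0.99).
```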
It uses an attention model called "path-based" attention, which can relate one AST node to another through a path such as "go up the tree via the name field, then down through the if statement, then down through the return statement."
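For intuition, here is a minimal sketch of extracting such an up-then-down path between two AST leaves, using Python's ast module on a toy function. The helper names and the up/join/down labeling are our own illustration, not code2vec's actual path encoding:

```python
import ast

SRC = """
def f(a):
    if a:
        return a
"""

tree = ast.parse(SRC)

# Record each node's parent so we can walk upward from a leaf.
parents = {}
for node in ast.walk(tree):
    for child in ast.iter_child_nodes(node):
        parents[child] = node

def ancestors(node):
    """Node-type names from the node's parent up to the module root."""
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(type(node).__name__)
    return chain

# The two occurrences of `a` in the body: the if-condition and the
# return value (ast.walk yields them in this order).
names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
src_leaf, dst_leaf = names[0], names[1]

up, down = ancestors(src_leaf), ancestors(dst_leaf)

# Strip shared ancestors; the last one stripped is the join point
# (lowest common ancestor) where the path turns from "up" to "down".
while up and down and up[-1] == down[-1]:
    join = up.pop()
    down.pop()

path = [f"up:{t}" for t in up] + [f"join:{join}"] + [f"down:{t}" for t in reversed(down)]
print(" -> ".join(path))  # join:If -> down:Return
```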