The following recent paper proposes a similar technique for representing ASTs.
Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019.
code2vec: learning distributed representations of code. Proc. ACM
Program. Lang. 3, POPL, Article 40 (January 2019), 29 pages. DOI: https://doi.org/10.1145/3290353
Please add a discussion comparing ASTToken2Vec with this study, covering differences, potential issues, and so on.
In code2vec, the unit of embedding is a code snippet of a few lines. (Ours: a single lexical token.)
Its application is semantic labeling of code snippets: for example, a neural network receives a nameless function definition of a few lines and answers "reverse array" as a description of what the code does. (Ours: given a partial input program, predict the next few tokens.)
It finds that a human-given identifier can often be decomposed into vector components corresponding to fragments of meaning (e.g., equalsIgnoreCase ≈ equals + toLowerCase). This observation supports our use of a word2vec-like embedding for token prediction.
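To make the compositional claim concrete, here is a minimal sketch of the kind of vector-arithmetic check involved. The embedding vectors below are hypothetical toy values chosen so the relation holds, not the output of any trained model:

```python
import numpy as np

# Hypothetical token embeddings; in practice these would come from a
# trained word2vec-style model over code tokens.
emb = {
    "equals":           np.array([1.0, 0.0, 0.0]),
    "toLowerCase":      np.array([0.0, 1.0, 0.0]),
    "equalsIgnoreCase": np.array([0.9, 1.1, 0.1]),
    "hashCode":         np.array([0.0, 0.0, 1.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Check the analogy: equalsIgnoreCase ≈ equals + toLowerCase.
composed = emb["equals"] + emb["toLowerCase"]
for name, vec in emb.items():
    print(f"{name:18s} cos = {cosine(composed, vec):.3f}")
# The composed vector is closest to equalsIgnoreCase (cos ≈ 0.99).
```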
It uses an attention model called "path-based" attention, which can relate one AST node to another through a path such as "go up the tree via the name field, then down through the if statement, then down through the return statement."
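For intuition, here is a minimal sketch of extracting such an up-then-down path between two AST leaves, using Python's ast module on a toy function. The helper names and the up/join/down labeling are our own illustration, not code2vec's actual path encoding:

```python
import ast

SRC = """
def f(a):
    if a:
        return a
"""

tree = ast.parse(SRC)

# Record each node's parent so we can walk upward from a leaf.
parents = {}
for node in ast.walk(tree):
    for child in ast.iter_child_nodes(node):
        parents[child] = node

def ancestors(node):
    """Node-type names from the node's parent up to the module root."""
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(type(node).__name__)
    return chain

# The two occurrences of `a` in the body: the if-condition and the
# return value (ast.walk yields them in this order).
names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
src_leaf, dst_leaf = names[0], names[1]

up, down = ancestors(src_leaf), ancestors(dst_leaf)

# Strip shared ancestors; the last one stripped is the join point
# (lowest common ancestor) where the path turns from "up" to "down".
while up and down and up[-1] == down[-1]:
    join = up.pop()
    down.pop()

path = [f"up:{t}" for t in up] + [f"join:{join}"] + [f"down:{t}" for t in reversed(down)]
print(" -> ".join(path))  # join:If -> down:Return
```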