I've noticed that the "jsondiff" and "diff" outputs of the gumtree executable produce edit scripts containing seemingly incorrect and inconsistent node IDs. Unfortunately, I can't find any documentation on the encoding used in the edit scripts. Based on my intuition, and the description given in the paper, the outputs seem to be incorrect.
My assumption is that the node IDs used in the edit scripts are generated by a depth-first (pre-order) traversal of the tree, starting at 0 (at the Program node). Any "Update" edits I make seem to validate this assumption, but "Move", "Delete" and "Insert" commands all seem either to use a different ID scheme or to be incorrect.
Example of incorrect(?) behaviour:
Take the program P, given by the source below:
int main() {
    printf("Hello");
    printf("world!");
}
Together with a modified form of the program P', given by the source code:
int main() {
    printf("Hello");
    printf("small");
    printf("universe!");
}
GumTree produces the following diff (gumtree diff P P'):
Insert ExprStatement(19) into Compound(29) at 1
Insert Some(18) into ExprStatement(19) at 0
Insert FunCall(16) into Some(18) at 0
Insert GenericString: ;(17) into Some(18) at 1
Insert Ident(12) into FunCall(16) at 0
Insert GenericList(15) into FunCall(16) at 1
Insert GenericString: printf(11) into Ident(12) at 0
Insert Left(14) into GenericList(15) at 0
Insert Constant: "small"(13) into Left(14) at 0
Update Constant: "world!"(13) to "universe!"
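For post-processing output like the above, here is a minimal Python sketch that turns each line into a structured record. It assumes the line format is exactly as printed in this issue; `parse_action`, `INSERT` and `UPDATE` are hypothetical names, not part of GumTree. "Move" and "Delete" lines are not shown here, so the sketch deliberately does not guess at their format.

```python
import re

# Patterns derived only from the Insert/Update lines shown in this issue.
INSERT = re.compile(
    r"^Insert (?P<label>.+)\((?P<id>\d+)\) into "
    r"(?P<parent>.+)\((?P<parent_id>\d+)\) at (?P<pos>\d+)$"
)
UPDATE = re.compile(r"^Update (?P<label>.+)\((?P<id>\d+)\) to (?P<value>.+)$")


def parse_action(line):
    """Parse one edit-script line into a dict, or raise ValueError."""
    line = line.strip()
    m = INSERT.match(line)
    if m:
        return {
            "kind": "insert",
            "label": m["label"],
            "id": int(m["id"]),
            "parent": m["parent"],
            "parent_id": int(m["parent_id"]),
            "pos": int(m["pos"]),
        }
    m = UPDATE.match(line)
    if m:
        return {
            "kind": "update",
            "label": m["label"],
            "id": int(m["id"]),
            "value": m["value"],
        }
    # Move/Delete formats are not shown in this issue, so they are rejected.
    raise ValueError("unrecognised edit-script line: " + line)


first = parse_action("Insert ExprStatement(19) into Compound(29) at 1")
last = parse_action('Update Constant: "world!"(13) to "universe!"')
```

Here `first["parent_id"]` is 29 and `last["id"]` is 13, which are the two IDs discussed in this issue.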
Using a little script I wrote, we get the following node numbers for P (where the second number is the position of the node in the corresponding file):
Assigning number 0 to Program at 0
Assigning number 1 to Definition at 0
Assigning number 2 to Definition at 0
Assigning number 3 to ParamList at 8
Assigning number 4 to GenericString at 4
Assigning number 5 to Compound at 11
Assigning number 6 to ExprStatement at 15
Assigning number 7 to Some at 15
Assigning number 8 to FunCall at 15
Assigning number 9 to Ident at 15
Assigning number 10 to GenericString at 15
Assigning number 11 to GenericList at 21
Assigning number 12 to Left at 22
Assigning number 13 to Constant at 22
Assigning number 14 to GenericString at 30
Assigning number 15 to ExprStatement at 34
Assigning number 16 to Some at 34
Assigning number 17 to FunCall at 34
Assigning number 18 to Ident at 34
Assigning number 19 to GenericString at 34
Assigning number 20 to GenericList at 40
Assigning number 21 to Left at 41
Assigning number 22 to Constant at 41
Assigning number 23 to GenericString at 50
Assigning number 24 to FinalDef at 54
and the following numbers for P'
Assigning number 0 to Program at 0
Assigning number 1 to Definition at 0
Assigning number 2 to Definition at 0
Assigning number 3 to ParamList at 8
Assigning number 4 to GenericString at 4
Assigning number 5 to Compound at 11
Assigning number 6 to ExprStatement at 15
Assigning number 7 to Some at 15
Assigning number 8 to FunCall at 15
Assigning number 9 to Ident at 15
Assigning number 10 to GenericString at 15
Assigning number 11 to GenericList at 21
Assigning number 12 to Left at 22
Assigning number 13 to Constant at 22
Assigning number 14 to GenericString at 30
Assigning number 15 to ExprStatement at 34
Assigning number 16 to Some at 34
Assigning number 17 to FunCall at 34
Assigning number 18 to Ident at 34
Assigning number 19 to GenericString at 34
Assigning number 20 to GenericList at 40
Assigning number 21 to Left at 41
Assigning number 22 to Constant at 41
Assigning number 23 to GenericString at 49
Assigning number 24 to ExprStatement at 53
Assigning number 25 to Some at 53
Assigning number 26 to FunCall at 53
Assigning number 27 to Ident at 53
Assigning number 28 to GenericString at 53
Assigning number 29 to GenericList at 59
Assigning number 30 to Left at 60
Assigning number 31 to Constant at 60
Assigning number 32 to GenericString at 72
Assigning number 33 to FinalDef at 76
Notice that there is no Compound at 29. In fact, node 29 in P' is a GenericList, and P has no node 29 at all. The Update edits, however, do refer to the correct nodes. Also, where are the IDs for the inserted elements coming from? Intuitively, I would expect the diff to provide an edit script that I could, in theory, execute using only the parsed AST of P.
All help would be much appreciated! Deepest apologies if the misunderstanding is on my part.
Many thanks,
Chris
Update: I now realise that GumTree is assigning numbers in post-order, rather than pre-order, as I had assumed. The insertion IDs are still off (they're the IDs after the event, not before; i.e. it's not a script from X to Y). I'm going to write a script to correct the IDs to produce the script I want :-)
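To sanity-check the post-order explanation, here is a minimal Python sketch. `Node`, `call_stmt`, `program` and `labels_by_id` are hypothetical helpers, not GumTree's API. The shape of each printf statement is inferred from the Insert actions above, and the nesting of ParamList, GenericString and Compound under the inner Definition is an assumption. Under that assumption, pre-order numbering reproduces the listing for P', while post-order numbering reproduces the IDs used in the edit script.

```python
class Node:
    """Bare-bones labelled tree node (hypothetical, not GumTree's class)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def labels_by_id(root, order):
    """Return node labels so that list index == node ID, counting from 0."""
    out = []

    def visit(node):
        if order == "pre":
            out.append(node.label)
        for child in node.children:
            visit(child)
        if order == "post":
            out.append(node.label)

    visit(root)
    return out


def call_stmt(text):
    # Shape of one printf statement, inferred from the Insert actions.
    return Node("ExprStatement", [
        Node("Some", [
            Node("FunCall", [
                Node("Ident", [Node("GenericString: printf")]),
                Node("GenericList", [
                    Node("Left", [Node("Constant: " + text)]),
                ]),
            ]),
            Node("GenericString: ;"),
        ]),
    ])


def program(strings):
    # ASSUMPTION: ParamList, GenericString and Compound are siblings
    # under the inner Definition node.
    compound = Node("Compound", [call_stmt(s) for s in strings])
    inner = Node("Definition",
                 [Node("ParamList"), Node("GenericString"), compound])
    return Node("Program", [Node("Definition", [inner]), Node("FinalDef")])


p_before = program(['"Hello"', '"world!"'])
p_after = program(['"Hello"', '"small"', '"universe!"'])

pre_after = labels_by_id(p_after, "pre")
post_after = labels_by_id(p_after, "post")
post_before = labels_by_id(p_before, "post")

print(pre_after[5], pre_after[29])      # Compound GenericList
print(post_after[29], post_after[19])   # Compound ExprStatement
print(post_after[13], post_before[13])  # Constant: "small" Constant: "world!"
```

Both the inserted Constant "small" and the updated Constant "world!" carry ID 13, but in different trees: Insert actions are numbered against P', Update actions against P. That matches the observation that the insertion IDs are the IDs after the event.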
ChrisTimperley changed the title from "Inconsistent / incorrect node numbering in C edit scripts" to "Unintuitive node numbering in C edit scripts" on Aug 28, 2016