Unintuitive node numbering in C edit scripts #29

Closed
ChrisTimperley opened this issue Aug 27, 2016 · 1 comment

ChrisTimperley commented Aug 27, 2016

I've noticed that the "jsondiff" and "diff" outputs of the gumtree executable produce edit scripts containing seemingly incorrect and inconsistent node IDs / numbers. Unfortunately, I can't find any documentation on the encoding used in the edit scripts. Based on my intuition, and the description given in the paper, the outputs certainly seem to be incorrect.

My assumption is that the node IDs used in the edit scripts are generated using a depth-first search of the tree, starting at 0 (at the Program node). Any "Update" edits I make seem to validate this assumption, but "Move", "Delete" and "Insert" commands all seem to either use a different ID scheme, or they're incorrect.
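For concreteness, the pre-order scheme assumed here can be sketched as below. This is only an illustration: the `Node` class and the toy tree are hypothetical, and none of it is GumTree's API or real parser output.

```python
class Node:
    """Hypothetical minimal AST node: a label plus ordered children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def number_preorder(root):
    """Return (id, label) pairs, numbering each parent before its children."""
    numbered = []

    def visit(node):
        numbered.append((len(numbered), node.label))  # parent gets its ID first
        for child in node.children:
            visit(child)

    visit(root)
    return numbered


# Toy tree for illustration only (not real GumTree output).
tree = Node("Program", [
    Node("Definition", [
        Node("ParamList"),
        Node("Compound", [Node("ExprStatement")]),
    ]),
    Node("FinalDef"),
])

preorder_ids = number_preorder(tree)
for node_id, label in preorder_ids:
    print(f"Assigning number {node_id} to {label}")
```

Under this scheme the root always gets ID 0 and a node's ID is always smaller than the IDs of its descendants.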

Example of incorrect(?) behaviour:
Take the program P, given by the source below:

int main() {
  printf("Hello");
  printf("world!");
}

Now take a modified form of the program, P', given by the source below:

int main() {
  printf("Hello");
  printf("small");
  printf("universe!");
}

GumTree yields the following diff (gumtree diff P P'):

Insert ExprStatement(19) into Compound(29) at 1
Insert Some(18) into ExprStatement(19) at 0
Insert FunCall(16) into Some(18) at 0
Insert GenericString: ;(17) into Some(18) at 1
Insert Ident(12) into FunCall(16) at 0
Insert GenericList(15) into FunCall(16) at 1
Insert GenericString: printf(11) into Ident(12) at 0
Insert Left(14) into GenericList(15) at 0
Insert Constant: "small"(13) into Left(14) at 0
Update Constant: "world!"(13) to "universe!"

Using a little script I wrote, we get the following node numbers for P (where the second number is the position of the node in the corresponding file):

Assigning number 0 to Program at 0
Assigning number 1 to Definition at 0
Assigning number 2 to Definition at 0
Assigning number 3 to ParamList at 8
Assigning number 4 to GenericString at 4
Assigning number 5 to Compound at 11
Assigning number 6 to ExprStatement at 15
Assigning number 7 to Some at 15
Assigning number 8 to FunCall at 15
Assigning number 9 to Ident at 15
Assigning number 10 to GenericString at 15
Assigning number 11 to GenericList at 21
Assigning number 12 to Left at 22
Assigning number 13 to Constant at 22
Assigning number 14 to GenericString at 30
Assigning number 15 to ExprStatement at 34
Assigning number 16 to Some at 34
Assigning number 17 to FunCall at 34
Assigning number 18 to Ident at 34
Assigning number 19 to GenericString at 34
Assigning number 20 to GenericList at 40
Assigning number 21 to Left at 41
Assigning number 22 to Constant at 41
Assigning number 23 to GenericString at 50
Assigning number 24 to FinalDef at 54

and the following numbers for P':

Assigning number 0 to Program at 0
Assigning number 1 to Definition at 0
Assigning number 2 to Definition at 0
Assigning number 3 to ParamList at 8
Assigning number 4 to GenericString at 4
Assigning number 5 to Compound at 11
Assigning number 6 to ExprStatement at 15
Assigning number 7 to Some at 15
Assigning number 8 to FunCall at 15
Assigning number 9 to Ident at 15
Assigning number 10 to GenericString at 15
Assigning number 11 to GenericList at 21
Assigning number 12 to Left at 22
Assigning number 13 to Constant at 22
Assigning number 14 to GenericString at 30
Assigning number 15 to ExprStatement at 34
Assigning number 16 to Some at 34
Assigning number 17 to FunCall at 34
Assigning number 18 to Ident at 34
Assigning number 19 to GenericString at 34
Assigning number 20 to GenericList at 40
Assigning number 21 to Left at 41
Assigning number 22 to Constant at 41
Assigning number 23 to GenericString at 49
Assigning number 24 to ExprStatement at 53
Assigning number 25 to Some at 53
Assigning number 26 to FunCall at 53
Assigning number 27 to Ident at 53
Assigning number 28 to GenericString at 53
Assigning number 29 to GenericList at 59
Assigning number 30 to Left at 60
Assigning number 31 to Constant at 60
Assigning number 32 to GenericString at 72
Assigning number 33 to FinalDef at 76

Notice that there is no Compound numbered 29 in either listing. In fact, P has no node 29 at all. Yet the Update edits do seem to refer to the correct nodes. Also, where are the IDs for the inserted elements coming from? Intuitively, I would expect the diff to provide an edit script that I could theoretically execute, using only the parsed AST of P.

All help would be much appreciated! Deepest apologies if the misunderstanding is on my part.

Many thanks,

Chris

Update: I now realise that GumTree is assigning numbers in post-order, rather than pre-order, as I had assumed. The insertion IDs are still off (they're the IDs after the event, not before; i.e. it's not a script from X to Y). I'm going to write a script to correct the IDs to produce the script I want :-)
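As a rough check of the post-order reading, the sketch below numbers a hand-built tree for P' in post-order. The tree shape is an assumption reconstructed from the listings and the diff above (in particular, that ParamList and the function-name GenericString are leaves directly under the inner Definition), and the `Node` class and helper names are hypothetical, not GumTree's API.

```python
class Node:
    """Hypothetical minimal AST node: a label plus ordered children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def statement(text):
    """Assumed 9-node shape of one `printf(...);` statement."""
    return Node("ExprStatement", [
        Node("Some", [
            Node("FunCall", [
                Node("Ident", [Node("GenericString: printf")]),
                Node("GenericList", [Node("Left", [Node(f"Constant: {text}")])]),
            ]),
            Node("GenericString: ;"),
        ]),
    ])


def number_postorder(root):
    """Return (id, label) pairs, numbering children before their parent."""
    numbered = []

    def visit(node):
        for child in node.children:
            visit(child)
        numbered.append((len(numbered), node.label))  # parent gets its ID last

    visit(root)
    return numbered


# Assumed tree for P' (three printf statements inside main's Compound).
compound = Node("Compound", [
    statement('"Hello"'), statement('"small"'), statement('"universe!"'),
])
p_prime = Node("Program", [
    Node("Definition", [
        Node("Definition", [Node("ParamList"), Node("GenericString"), compound]),
    ]),
    Node("FinalDef"),
])

postorder_ids = dict(number_postorder(p_prime))
print(postorder_ids[29])  # Compound
print(postorder_ids[19])  # ExprStatement
print(postorder_ids[13])  # Constant: "small"
```

With this assumed shape, the IDs line up with the Insert operations in the diff (Compound(29), ExprStatement(19), Constant: "small"(13)), which is consistent with the inserted nodes being numbered by their post-order position in P' rather than in P.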

ChrisTimperley changed the title from "Inconsistent / incorrect node numbering in C edit scripts" to "Unintuitive node numbering in C edit scripts" on Aug 28, 2016

ChrisTimperley (author) commented

Implemented a little script to produce an edit script with the expected IDs. Might be worth explaining the numbering system in the Wiki a little more?
