Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indentation and Comments inconsistent with the original code #42

Closed
Symbolk opened this issue Sep 14, 2018 · 44 comments · Fixed by #89
Closed

Indentation and Comments inconsistent with the original code #42

Symbolk opened this issue Sep 14, 2018 · 44 comments · Fixed by #89
Labels
Projects

Comments

@Symbolk
Copy link

Symbolk commented Sep 14, 2018

Since the code is parsed and prettyprinted with JavaCC, though indented with JavaParser, the indentation of the original code and some comments (it's strange that some are orphan comments but some are not) are lost. When diff with git merged results using textual tools (like git diff or VSCode diff), there will be too many differences. And in some cases, comments are as important as codes.

I am trying to preserve the indentation and comments in original files, but in face of some strange problems. I want to know as authors or contributors, what's your way of thinking about implementing it? @guilhermejccavalcanti @pauloborba @gaabs

@pauloborba
Copy link
Collaborator

@leoribeiro36 this might be the explanation for some of the differences that lead to s3m FNs. I would not expect many FNs due to that, but maybe a few.

@pauloborba
Copy link
Collaborator

@Symbolk I guess the basic idea would be to keep comments and indentation info in the AST (as in Eclipse JDT), but I haven't really thought in detail about that. Could you please detail a bit more the problems you are having? Have you also seen this alternative approach? https://www.modelica.org/events/modelica2008/Proceedings/sessions/session6a3.pdf

@pauloborba
Copy link
Collaborator

@jvcoutinho could you please add tests to check how the tool should handle comments? please check @Symbolk comments above, and ask for the concrete merge scenarios in which he observed lost comments.

@jvcoutinho
Copy link
Collaborator

@Symbolk as requested above by @phmb, can you please post here the files that lead to outputs losing comments? You can compact them.

@jvcoutinho
Copy link
Collaborator

s3m
git

Comments problem example

@pauloborba
Copy link
Collaborator

@jvcoutinho could not understand that. both developers commented different declarations? what's the merge result?

@jvcoutinho
Copy link
Collaborator

jvcoutinho commented Oct 9, 2018

@pauloborba Actually, by their description, one developer included a field while another developer commented an existing one.

The problem is that the comment disappeared.

@jvcoutinho
Copy link
Collaborator

The only way I could reproduce this exact scenario is when the field in base is also commented (and it seems to be the case in the example):

comment3
comment3output

But that's reasonable, since only right changed the file.

@jvcoutinho
Copy link
Collaborator

jvcoutinho commented Oct 9, 2018

Although, I'd tested more and found some inconsistencies (?), indeed.

comment
commentoutput

comment2
comment2output

@pauloborba
Copy link
Collaborator

yes @jvcoutinho, we need to debug that. please check when base has field A, left removes A, right adds a comment just before A. the result should contain only the comment. does the tool returns an empty class?

@jvcoutinho
Copy link
Collaborator

yes, @pauloborba
comment4
comment4output

@pauloborba
Copy link
Collaborator

OK @jvcoutinho, so we have enough information now. I believe s3m stores comments in the closest declaration node, not in special CommentNodes, right @guilhermejccavalcanti?

@guilhermejccavalcanti
Copy link
Owner

Right, comments are stored into a separate field prefix of each node:
image

Also, they are not merged. If the three versions have comments, the left comment prevails due to precedence order of the merge calls and how the merged node is created. First, merge(left,base), then merge(leftBase, right), each call creates the merged node as a clone of the first parameter, which is always left, and update its body/subtree with the merge result. So, besides updating node body/subtree, would be necessary to update the prefix field, perhaps calling textual merge on it.

@Symbolk
Copy link
Author

Symbolk commented Oct 10, 2018

Sorry for the delay to reply.

Originally, I want to develope a merge tool based on jFSTMerge, which produces better or more reasonable merge results than git textual merge.

However, since the code is parsed, merged and then prettyprinted with AST, the original indentation and orphan comments can be missing in the merged results, which are undesirable in practical projects.

For example, in some big teams or projects (say AOSP), developers are required to wrap their newly added or edited code within two block comments, for subsequent review, blame or debug:

import android.widget.Toolbar;

/* Peter/62645 20180637  begin*/
//[HSM]
import android.view.Menu;
/* Peter/62645 20180637  end*/
import com.android.internal.annotations.GuardedBy;

(P.S. the comment format: /* Developer/ID Date begin(end) */)

Under this case, the jFSTMerge results drops the end comment while the git merge results does not (of course). Here are the 3-way contents:

Ours:

import android.widget.Toolbar;

/* Peter/62645 20180637  begin*/
//[HSM]
import android.view.Menu;
/* Peter/62645 20180637  end*/
import com.android.internal.annotations.GuardedBy;

Base:

import android.widget.Toolbar;

import com.android.internal.annotations.GuardedBy;

Theirs:

import android.widget.Toolbar;

import com.android.internal.annotations.GuardedBy;

jFSTMerge results:

import android.widget.Toolbar;

/* <DTS2014042818262 xiongshiyi/00165767 20140428 begin */
//[HSM]
import android.hsm.HwSystemManager;

import com.android.internal.annotations.GuardedBy;

@Symbolk
Copy link
Author

Symbolk commented Oct 10, 2018

yes, @pauloborba
comment4
comment4output

What is the 3-way diff tool you are using? Seems VSCode?

@jvcoutinho
Copy link
Collaborator

@Symbolk Yes, it is VSCode.
You can split the editor 2 times and activate Zen Mode (Ctrl + K + Z by default) to get this view.

@Symbolk
Copy link
Author

Symbolk commented Oct 11, 2018

@Symbolk Yes, it is VSCode.
You can split the editor 2 times and activate Zen Mode (Ctrl + K + Z by default) to get this view.

Ok, thanks. Previously I thought that was an extension to provide the view like BeyondCompare 3-way merge, I am thinking about developing one like this.

@pauloborba
Copy link
Collaborator

that would be great @Symbolk. you might want to have a look at https://semanticmerge.com.

@pauloborba
Copy link
Collaborator

@jvcoutinho it seems we have similar problems with annotations too

@guilhermejccavalcanti
Copy link
Owner

guilhermejccavalcanti commented Oct 11, 2018 via email

@pauloborba pauloborba added this to To do in s3m Oct 31, 2018
@pauloborba
Copy link
Collaborator

@jvcoutinho ask if ISI developers adopt an indentation tool/standard

@jvcoutinho
Copy link
Collaborator

@pauloborba this might be the next to go

@pauloborba
Copy link
Collaborator

yes @jvcoutinho, please go ahead

@pauloborba
Copy link
Collaborator

that would be great @Symbolk. you might want to have a look at https://semanticmerge.com.

have you made progress here @Symbolk?

@jvcoutinho
Copy link
Collaborator

jvcoutinho commented Jun 28, 2019

@pauloborba @guilhermejccavalcanti I've been looking the way the creator of Spoofax, Eelco Visser, had treated parsing and pretty-printing. In his paper "Declarative Specification of Indentation Rules", he says that the grammar should have special declarations to specify the indentation. These declarations are ignored by the parser (for layout-sensitive languages like Haskell or Python, these declarations play an important role in parsing too) but relevant for the pretty-printer.

@guilhermejccavalcanti
Copy link
Owner

guilhermejccavalcanti commented Jun 29, 2019 via email

@pauloborba
Copy link
Collaborator

@jvcoutinho you could first try to solve the comments issue.

this should be simpler, and satisfactory for companies that use a code formatting standard/tool. s3m could simply merge and then call the formatting tool.

then you try to solve the indentation/white space issue.

@jvcoutinho
Copy link
Collaborator

jvcoutinho commented Jul 9, 2019

So, the comments issue.
Currently, S3M stores comments in the FSTTerminal field prefix. It contains all the tokens since the end of the last terminal read. This implies comments depend on terminals to be considered. Also, textual merge runs over this field, leading possibly to conflicts.

There are two main problems with the current approach:

  • False positives;
  • Orphan comments.

In the scenario below, left removed method a() and right added a comment above method a() and another below a().

Merge

Comments represent no semantic "threat". I believe comments merge shouldn't lead to conflicts (S3M used to consider left version, while SemanticMerge always consider right version). Anyway, resolving false positives is the easy part.

Orphan comments, on the other hand, are harder to deal with. I propose two solutions:

  • Use unstructured merge to search for comments that s3m had forgotten. This would be basically a textual approach and it's more susceptible to errors.
  • Store comments as tree nodes (just like JDT parser does). In this approach, they have no identifiers, so we would have to build a handler for them.

What do you think, @pauloborba @guilhermejccavalcanti?

@jvcoutinho
Copy link
Collaborator

  1. Check if it's worth to update the grammar to include an attribute comments in a FSTNonTerminal node.

  2. Check if we should keep a comment of a removed node.

  3. S3M doesn't ignore comments in the middle of the signature:
    comment

@pauloborba
Copy link
Collaborator

pauloborba commented Jul 10, 2019 via email

@jvcoutinho
Copy link
Collaborator

S3M reports 2 conflicts (the prefix field and the declaration field) in different lines.

comment

@jvcoutinho
Copy link
Collaborator

Indentação
preservar original;
se nós forem adicionados ou removidos? (não há problema)
se left/right adicionarem exatamente o mesmo método, qual escolher?
espaçamento entre métodos, contar linhas ou padronizar?

Comentários
fazer com que a chave que fecha a classe ("}") virasse um FSTTerminal para capturar os comentários órfãos como prefixo

@guilhermejccavalcanti
Copy link
Owner

@jvcoutinho me deparei com um possível bug no cenário em anexo, dá uma checada por favor.
Basicamente, em left e base havia Comentário seguido de Anotação. E right inverteu isso, ficou Anotação seguida de Comentário. Left não mudou nada no comentário nem na anotação, e deu conflito.

comment.zip

@jvcoutinho
Copy link
Collaborator

Bem, de fato é um bug. Vou colocar os trechos de código aqui para explicar.

LEFT/BASE

@LocalBean
@Stateless
/**
 * Classe de servido para gerenciar a entidade Apelido.
 * 
 * @author rbonifacio
 */
public class PalavraChaveServico extends ServicoGenerico<PalavraChave> {

RIGHT

/**
 * Classe de servido para gerenciar a entidade Apelido.
 * 
 * @author rbonifacio
 */
@LocalBean
@Stateless
public class PalavraChaveServico extends AbstractServicoSisdot<PalavraChave> {

MERGE

<<<<<<< MINE
=======
/**
 * Classe de servido para gerenciar a entidade Apelido.
 * 
 * @author rbonifacio
 */
>>>>>>> YOURS
@LocalBean
@Stateless
public class  PalavraChaveServico extends AbstractServicoSisdot<PalavraChave> {

Em left, tanto as anotações, o comentário e o token public foram colocados dentro do corpo de um mesmo nó (a princípio, esse nó conteria apenas os modificadores do método, o que sugere bug também na gramática). Em right, as anotações e o public foram colocados dentro do corpo de um mesmo nó. Porém, o comentário entrou como prefixo desse nó.

Assim, o conflito se deu quando o merge textual foi chamado nos prefixos. O estranho é que o prefixo de left é idêntico ao prefixo da base, então não sei ao certo o motivo do conflito. O não-estruturado não reportou.

Aí vai alguns prints:
Nó de left/base:
2019-08-02 (3)

Nó de right:
2019-08-02 (4)

@pauloborba
Copy link
Collaborator

@jvcoutinho please check if this is related to the case where left removes declaration but not its associated comment. add a test case for that.

also check how modifiers are represented in the tree.

@jvcoutinho
Copy link
Collaborator

jvcoutinho commented Aug 6, 2019

@pauloborba

also check how modifiers are represented in the tree.

Exactly one Modifiers node is created for each class declaration, and it contains all of the modifiers in its content (even if the class has no modifiers, resulting in an empty string)

please check if this is related to the case where left removes declaration but not its associated comment.

the result is only the comment (and only one, no duplicates), but wrapped in a conflict:
image

although, just as the bug above, one of the contributions and the base are exactly the same. I believe prefix merge has a internal bug.

@pauloborba
Copy link
Collaborator

@jvcoutinho yes, it should be handled as a child of a non-terminal node, right?

@guilhermejccavalcanti
Copy link
Owner

guilhermejccavalcanti commented Aug 6, 2019

I believe that the comments are not in the prefix because they are beetwen the annotations and the modifier. So, I guess this is the condition implemented on featurehouse "the comments are prefix only if there is nothing before them". Not sure if this is a grammar issue since this is based on original Java 8 grammar. We should avoid changes on grammar unless it is wrong, otherwise the tool will have behavior hard to explain.

Now, the question about "why is a conflict reported if there is only a single change?". This happens because the base version of the comment is empty, so, as there is no base, textual merge applies a two-way merge, and reports a conflict for every difference.

image

So, a slightly improvement would be to compare the strings representing the three versions, and when only one of them is different, the difference is merge result. This will fix this issue.
This is actually what we do for the entire files (https://github.com/guilhermejccavalcanti/jFSTMerge/blob/master/src/main/java/br/ufpe/cin/app/JFSTMerge.java#L210)

@pauloborba
Copy link
Collaborator

@jvcoutinho would we have a similar problem when merging a method body that is empty in the base?

@jvcoutinho
Copy link
Collaborator

@pauloborba No because its signature is included in the content to be merged. Base could never be empty in these cases.

@jvcoutinho
Copy link
Collaborator

Now, the question about "why is a conflict reported if there is only a single change?". This happens because the base version of the comment is empty, so, as there is no base, textual merge applies a two-way merge, and reports a conflict for every difference.

@guilhermejccavalcanti @pauloborba Fixed in the current PR (#89).

@pauloborba
Copy link
Collaborator

@jvcoutinho please add tests

@pauloborba
Copy link
Collaborator

@guilhermejccavalcanti could you please review that?

@pauloborba
Copy link
Collaborator

@pauloborba No because its signature is included in the content to be merged. Base could never be empty in these cases.

just to make sure that a similar problem doesn't happen when handling bodies in isolation.

@pauloborba pauloborba moved this from To do to Done in s3m Oct 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
s3m
  
Done
Development

Successfully merging a pull request may close this issue.

4 participants