Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Add operations to manipulate a document #828

Merged
merged 7 commits into from Jan 25, 2018

Conversation

gibarsin
Copy link
Contributor

@gibarsin gibarsin commented Jan 3, 2018

Summary

To be able to make the fixes persist on the file on which the AST was created on, a new handler is needed that is able to manage all the operations (insert, delete and replace) and apply them in order to the original file.

Additions

  • Document interface to represent a file in which fixes can be applied.
  • DocumentOperation represents one operation (insert, delete or replace) that will be managed by the DocumentOperationsApplier as to when or in what order to apply all of them.
  • RegionByLine and RegionByOffset are both representations of regions in a document. The difference between them are, when one is used over the other. The RegionByLine is first used when an operation is created because the regions of the tokens are represented in this manner and there is no way to map it to (offset, length) until the operation has reached the document, which internally holds a map <LineNumber, Offset> to convert this type of region to a RegionByOffset.

Additional notes

  • DocumentFile which is an implementation of Document uses a reader and a writer to apply the fixes onto the file. The file is read only once, which improves efficiency, but it is expected to receive all the operations sorted.

@gibarsin
Copy link
Contributor Author

gibarsin commented Jan 3, 2018

I cannot seem to be able to use lambdas since 1.8 or some API methods since 1.9 because the source compiling version is 1.7, is this done on purpose?

@oowekyala
Copy link
Member

@gibarsin Yes, most of pmd is compiled with Java 7, except the UI module, and the Apex and Scala modules IIRC. You don't seem to be using many lambdas anyway, it should be simple enough to fix?

@gibarsin
Copy link
Contributor Author

gibarsin commented Jan 4, 2018

@oowekyala Everything's Java 7 compliant now, thanks for the answer!

@jsotuyod jsotuyod added this to the 6.1.0 milestone Jan 7, 2018
@jsotuyod jsotuyod self-assigned this Jan 8, 2018
@jsotuyod jsotuyod changed the title Add operations to manipulate a document [core] Add operations to manipulate a document Jan 8, 2018
private int currentPosition = 0;

private final Path temporaryPath = Files.createTempFile("pmd-", null);
private final Writer writer = new FileWriter(temporaryPath.toFile());
Copy link
Member

@jsotuyod jsotuyod Jan 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer Files.newBufferedWriter to avoid finalizers. Refers to AvoidFileStream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't use this method because it's documented since Java 1.8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was looking at the method without the charset parameter.

private final Writer writer = new FileWriter(temporaryPath.toFile());

public DocumentFile(final File file) throws IOException {
reader = new BufferedReader(new FileReader(file));
Copy link
Member

@jsotuyod jsotuyod Jan 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer Files.newBufferedReader to avoid finalizers. Refers to AvoidFileStream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't use this method because it's documented since Java 1.8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was looking at the method without the charset parameter.


while (scanner.hasNextLine()) {
lineToOffset.add(currentGlobalOffset);
currentGlobalOffset += scanner.nextLine().length();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this offset computation is actually wrong. scanner.nextLine() will ignore new-line characters (which may be 1 or 2 depending on the usage of LF / CR). Your tests should validate this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you recommend that I access the private field lineToOffset in a test to assert the stored values? One approach is to make it protected and create a class which extends this one which adds a getter to the field, but I do not know if it's a pleasant way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can start by having your tests not use single-line files... And you should check different new lines are honored. Your current implementation for writing lines also breaks that by forcing the usage of the system-defined new line characters.

reader.close();
writer.close();
Files.copy(temporaryPath, Paths.get(filePath), StandardCopyOption.REPLACE_EXISTING);
temporaryPath.toFile().delete();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do this copy + delete rather than simply rename?

return comparison;
}

private boolean areInsertOperations(final RegionByLine r1, final RegionByLine r2) {
Copy link
Member

@jsotuyod jsotuyod Jan 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

areInsertOperations doesn't check if the operation is an insert? That's weird...

Copy link
Contributor Author

@gibarsin gibarsin Jan 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the method is misleading because it checks for two insert operations at the same offset in the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing guarantees such assumption

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An insert operation doesn't take a region of length greater than zero in the file, but it has an offset to where insert it. For both operations to be an insertion at the same offset they have to start at the same line and column and their length representation at the file to be zero (i.e. beginLine == endLine and beginColumn == endColumn). This is what the method is checking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an implementation detail of an insert operation that:

  1. this class shouldn't know
  2. this class can't guarantee will not be true for any other type of operation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. By the class you mean the DocumentOperationNonOverlappingRegionsComparator or DocumentOperationsApplierForNonOverlappingRegions?
  2. It is true that you can create a replace/delete operation with the same region as an insert operation, although it would not make any sense in a real case scenario. The underlying question in this method would be: if both start at the same offset, will they bother each other? If both operations don't start at the same offset, then I have to check that one starts after the other one finishes. If both operations start at the same offset and have length zero then they can be put one after the other, because they won't overlap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For this scenario, neither.
  2. If that's what your method actually checks, that's how this method should be called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a solution I can delegate the comparison of operations to the comparison of their regions by using the comparable interface implemented in a region.


public DocumentOperationsApplierForNonOverlappingRegions(final Document document) {
this.document = Objects.requireNonNull(document);
comparator = new DocumentOperationNonOverlappingRegionsComparator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no reason for this to be an instance field, could be a static final


@After
public void deleteTemporaryFiles() {
temporaryFile.delete();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TemporaryFolder does this on it's own

…ine separators in the same source file && other minor fixes
@gibarsin
Copy link
Contributor Author

gibarsin commented Jan 8, 2018

The way in which the length of the line separator is being obtained now works (tested with one source file with different types of line separators) but DocumentFile knows "too much" about what group of the matchResult to query to have the line separation match by the scanner. I think a better approach would be to extend the Scanner class and add the method getLineLengthWithLineSeparator() which is currently being held in the DocumentFile class.

Copy link
Member

@jsotuyod jsotuyod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't bother about that right now. You seem to be constantly over-engineering your solutions; and are yet to:

  1. produce a minimal working sample
  2. actually edit an AST during analysis
  3. bind this to an AST edition

There will be plenty of time to refactor once anything proves to be a problem.

private final Path temporaryPath = Files.createTempFile("pmd-", null);
private final Writer writer = new FileWriter(temporaryPath.toFile());
private final Path temporaryPath = Files.createTempFile("pmd-", ".tmp");
private final Writer writer = Files.newBufferedWriter(temporaryPath, StandardCharsets.UTF_8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the charset assumption may be wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anywhere I can get the charset for the file i'm editing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PMD itself takes it from the user input (CLI, Gradle, Maven, etc.), we should probably honor that. Just have both encodings be the same, and taken at the constructor, so you can hook it to the current execution configuration at the proper layer.


public DocumentFile(final File file) throws IOException {
reader = new BufferedReader(new FileReader(file));
reader = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the charset assumption may be wrong

this.filePath = file.getAbsolutePath();
mapLinesToOffsets();
}

private void mapLinesToOffsets() throws IOException {
try (Scanner scanner = new Scanner(filePath)) {
try (Scanner scanner = new Scanner(Paths.get(filePath))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not have filePath be a Path directly?

@@ -85,7 +85,7 @@ public int compare(final DocumentOperation o1, final DocumentOperation o2) {
return comparison;
}

private boolean areInsertOperations(final RegionByLine r1, final RegionByLine r2) {
private boolean areInsertOperationsAtTheSameOffset(final RegionByLine r1, final RegionByLine r2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we keep not knowing (from these classes perspective) if these are insert operations. You either check the operation type, or you drop that from the name.

@@ -146,4 +163,8 @@ private void writeUntilEOF() throws IOException {
writer.write(line);
}
}

public List<Integer> getLineToOffset() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why public? at most it should be package-private if this is test-only

@gibarsin
Copy link
Contributor Author

gibarsin commented Jan 8, 2018

I have checked all the problems you mentioned and it seems there are none left.

@jsotuyod jsotuyod added an:enhancement An improvement on existing features / rules in:pmd-internals Affects PMD's internals labels Jan 9, 2018
@oowekyala oowekyala added in:autofixes Affects the autofixes framework and removed in:pmd-internals Affects PMD's internals labels Jan 23, 2018
@jsotuyod jsotuyod merged commit 40ad978 into pmd:master Jan 25, 2018
jsotuyod added a commit that referenced this pull request Jan 25, 2018
@jsotuyod
Copy link
Member

jsotuyod commented Jan 25, 2018

Merged! Thanks for the wait.

Bare in mind I moved the document package outside the util, as these are not really utilities but core functionality. Lots of the stuff under util is undergoing deprecation processes anyway (we already killed a lot of code in 6.0.0, we deprecated even more for 7.0.0, including the old designer code in there)

@gibarsin
Copy link
Contributor Author

@jsotuyod Thanks for the heads up! No worries about the waiting time.

@gibarsin gibarsin deleted the documentOperations branch January 25, 2018 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
an:enhancement An improvement on existing features / rules in:autofixes Affects the autofixes framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants