Need to get raw text from DiffRow #44

Closed
rosta opened this issue Jul 27, 2019 · 4 comments
@rosta
Contributor

rosta commented Jul 27, 2019

Code snippet

	DiffRowGenerator generator = DiffRowGenerator.create()
			.showInlineDiffs(true)
			.mergeOriginalRevised(false)
			.inlineDiffByWord(true)
			.oldTag(f -> "~~")
			.newTag(f -> "**")
			.ignoreWhiteSpaces(true)
			.build();
	List<DiffRow> rows = generator.generateDiffRows(content1, content2);
	int line = 1;
	for (DiffRow row : rows) {
		if (isIncluded(row)) {
			// Write out the markdown ...
		}
		line++;
	}

The function isIncluded() is implemented as

	private boolean isIncluded(DiffRow row) {
		if (row.getTag() == Tag.EQUAL) {
			return false;
		}
		return excludePatterns.stream()
				.noneMatch(p -> p.matcher(row.getOldLine()).find()
						|| p.matcher(row.getNewLine()).find());
	}

where excludePatterns is a list of compiled regular expressions provided by the user.

Challenge

The pattern is matched on the formatted lines, so the user has to provide regular expressions such as
\* &lt;dt&gt;Generated&lt;/dt&gt;&lt;dd&gt;[0-9 :~*-]*&lt;/dd&gt;
rather than
\* <dt>Generated</dt><dd>[0-9 :-]*</dd>
which is more readable and less error-prone.

As a matter of fact, the first regular expression doesn't work anyway, because the markdown tag can fall in the middle of the </dd> tag. For example:

&lt;dd&gt;2019-07-**26** **05:00&lt;**/dd&gt;
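To make the failure concrete, here is a minimal sketch that uses only java.util.regex and none of the library's classes. The class name and the two full sample lines are made up for illustration; only the `<dd>` fragment of the formatted line is taken from the example above.

```java
import java.util.regex.Pattern;

public class RawVersusFormattedMatch {

    // Readable pattern, written against the raw source line.
    public static final Pattern RAW_PATTERN =
            Pattern.compile("\\* <dt>Generated</dt><dd>[0-9 :-]*</dd>");

    // Pattern written against the formatted line: HTML entities, plus the
    // markdown tag characters (~ and *) allowed inside the timestamp.
    public static final Pattern ESCAPED_PATTERN =
            Pattern.compile("\\* &lt;dt&gt;Generated&lt;/dt&gt;&lt;dd&gt;[0-9 :~*-]*&lt;/dd&gt;");

    public static boolean matches(Pattern pattern, String line) {
        return pattern.matcher(line).find();
    }

    public static void main(String[] args) {
        // Illustrative lines (assumed, not taken from a real diff run).
        String raw = " * <dt>Generated</dt><dd>2019-07-26 05:00</dd>";
        String formatted =
                " * &lt;dt&gt;Generated&lt;/dt&gt;&lt;dd&gt;2019-07-**26** **05:00&lt;**/dd&gt;";

        System.out.println(matches(RAW_PATTERN, raw));           // true
        System.out.println(matches(ESCAPED_PATTERN, formatted)); // false
        System.out.println(matches(RAW_PATTERN, formatted));     // false
    }
}
```

The escaped pattern fails on the formatted line because the closing `**` lands between `&lt;` and `/dd&gt;`, so the literal `&lt;/dd&gt;` never appears in the text being matched.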

NB: I could have used the reportLinesUnchanged(boolean) builder config, but then I would lose the formatted lines, which are used to output the markdown. (See also the comment below.)

Request

Provide the following methods in DiffRow:
public String getRawOldLine()
and
public String getRawNewLine().
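As a rough sketch of how the filter could look with such accessors: RowWithRaw below is a hypothetical stand-in for DiffRow, not the library's class, and getRawOldLine() / getRawNewLine() are the requested methods, which do not exist in the library as of this issue. The sample lines are again made up.

```java
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class RawLineFilterSketch {

    /** Hypothetical stand-in for DiffRow that also keeps the unformatted text. */
    public static final class RowWithRaw {
        private final boolean equal;
        private final String oldLine;     // formatted, used for the markdown output
        private final String newLine;     // formatted, used for the markdown output
        private final String rawOldLine;  // as read from the original file
        private final String rawNewLine;  // as read from the revised file

        public RowWithRaw(boolean equal, String oldLine, String newLine,
                          String rawOldLine, String rawNewLine) {
            this.equal = equal;
            this.oldLine = oldLine;
            this.newLine = newLine;
            this.rawOldLine = rawOldLine;
            this.rawNewLine = rawNewLine;
        }

        public boolean isEqual() { return equal; }
        public String getOldLine() { return oldLine; }
        public String getNewLine() { return newLine; }
        public String getRawOldLine() { return rawOldLine; }
        public String getRawNewLine() { return rawNewLine; }
    }

    /** Same logic as isIncluded() above, but matching on the raw lines. */
    public static boolean isIncluded(RowWithRaw row, List<Pattern> excludePatterns) {
        if (row.isEqual()) {
            return false;
        }
        return excludePatterns.stream()
                .noneMatch(p -> p.matcher(row.getRawOldLine()).find()
                        || p.matcher(row.getRawNewLine()).find());
    }

    public static void main(String[] args) {
        List<Pattern> excludePatterns = Collections.singletonList(
                Pattern.compile("\\* <dt>Generated</dt><dd>[0-9 :-]*</dd>"));

        RowWithRaw row = new RowWithRaw(false,
                " * &lt;dt&gt;Generated&lt;/dt&gt;&lt;dd&gt;2019-07-~~25~~ ~~04:00&lt;~~/dd&gt;",
                " * &lt;dt&gt;Generated&lt;/dt&gt;&lt;dd&gt;2019-07-**26** **05:00&lt;**/dd&gt;",
                " * <dt>Generated</dt><dd>2019-07-25 04:00</dd>",
                " * <dt>Generated</dt><dd>2019-07-26 05:00</dd>");

        // The readable pattern now excludes the row; the formatted lines
        // remain available for writing the markdown.
        System.out.println(isIncluded(row, excludePatterns)); // false
    }
}
```

The user-supplied patterns stay readable because they only ever see the raw text, while getOldLine() and getNewLine() keep serving the markdown output.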

(By the way many thanks for this library, it's been very useful)

@rosta
Contributor Author

rosta commented Jul 29, 2019

It appears that the reportLinesUnchanged configuration has no effect when showInlineDiffs is true. That is, '<' and '>' characters are converted to HTML entities regardless.

@wumpz
Collaborator

wumpz commented Jul 29, 2019

That is right. However, this feature should be optional, since it will double the memory needed for each line.

I will look into it but do not have much time at the moment.

@rosta
Contributor Author

rosta commented Jul 29, 2019

I agree completely that it should be optional.

A pull request with a possible implementation has been opened for review.

@wumpz wumpz self-assigned this Jul 29, 2019