Need to get raw text from DiffRow #44

rosta · 2019-07-27T15:08:18Z

Code snippet

	DiffRowGenerator generator = DiffRowGenerator.create()
			.showInlineDiffs(true)
			.mergeOriginalRevised(false)
			.inlineDiffByWord(true)
			.oldTag(f -> "~~")
			.newTag(f -> "**")
			.ignoreWhiteSpaces(true)
			.build();
	List<DiffRow> rows = generator.generateDiffRows(content1, content2);
	int line = 1;
	for (DiffRow row : rows) {
		if (isIncluded(row)) {
			// Write out the markdown ...
		}
		line++;
	}

The function isIncluded() is implemented as

	private boolean isIncluded(DiffRow row) {
		if (row.getTag() == Tag.EQUAL) {
			return false;
		}
		return excludePatterns.stream()
				.noneMatch(p -> p.matcher(row.getOldLine()).find()
						|| p.matcher(row.getNewLine()).find());
	}

where excludePatterns is a list of compiled regular expressions provided by the user.

Challenge

The patterns are matched against the formatted lines, so the user has to provide regular expressions such as
\* <dt>Generated</dt><dd>[0-9 :~*-]*</dd>
rather than the more readable and less error-prone
\* <dt>Generated</dt><dd>[0-9 :-]*</dd>

As a matter of fact, the first regular expression does not work anyway, because the Markdown marker can fall in the middle of the </dd> tag. For example:

<dd>2019-07-**26** **05:00<**/dd>

NB: I could have used the reportLinesUnchanged(boolean) builder option, but then I would lose the formatted lines, which are used to output the Markdown. (See also the comment below.)
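Until raw text is available, one possible workaround is to approximate it from the formatted line before matching. The sketch below is an assumption-laden illustration, not part of java-diff-utils: `FormattedLineMatcher`, `approximateRaw` and `matchesAny` are hypothetical names. It assumes `~~` and `**` are the only inline markers (as configured in the snippet above) and that only '<' and '>' were converted to HTML entities.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of java-diff-utils.
final class FormattedLineMatcher {

    // Approximates the raw text of a formatted line by dropping the "~~" and "**"
    // markers and turning the "&lt;" / "&gt;" entities back into '<' and '>'.
    static String approximateRaw(String formatted) {
        return formatted
                .replace("~~", "")
                .replace("**", "")
                .replace("&lt;", "<")
                .replace("&gt;", ">");
    }

    // Returns true if any pattern is found in the approximated raw text.
    static boolean matchesAny(List<Pattern> patterns, String formattedLine) {
        String raw = approximateRaw(formattedLine);
        return patterns.stream().anyMatch(p -> p.matcher(raw).find());
    }

    public static void main(String[] args) {
        List<Pattern> excludePatterns = Arrays.asList(
                Pattern.compile("\\* <dt>Generated</dt><dd>[0-9 :-]*</dd>"));
        String formatted =
                "* &lt;dt&gt;Generated&lt;/dt&gt;&lt;dd&gt;2019-07-**26** **05:00&lt;**/dd&gt;";
        System.out.println(approximateRaw(formatted));
        System.out.println(matchesAny(excludePatterns, formatted));
    }
}
```

This approach is lossy: a raw line that itself contains `**`, `~~`, `&lt;` or `&gt;` (for example the `/**` that opens a Javadoc comment) would be altered, which is why access to the real raw text would be preferable.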

Request

Provide the following methods in DiffRow:
public String getRawOldLine()
and
public String getRawNewLine()
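To make the request concrete, here is a rough sketch of the shape I have in mind. `RawAwareRow` is a hypothetical stand-in written only for illustration; it is not the library's DiffRow, and the constructor and the null convention for uncaptured raw text are my assumptions.

```java
// Hypothetical stand-in for illustration; this is not the library's DiffRow.
final class RawAwareRow {
    private final String oldLine;     // formatted text, with inline markers
    private final String newLine;
    private final String rawOldLine;  // unformatted text, or null if not captured
    private final String rawNewLine;

    RawAwareRow(String oldLine, String newLine, String rawOldLine, String rawNewLine) {
        this.oldLine = oldLine;
        this.newLine = newLine;
        this.rawOldLine = rawOldLine;
        this.rawNewLine = rawNewLine;
    }

    public String getOldLine() { return oldLine; }
    public String getNewLine() { return newLine; }

    // The two requested accessors: the text exactly as it appeared in the input.
    public String getRawOldLine() { return rawOldLine; }
    public String getRawNewLine() { return rawNewLine; }

    public static void main(String[] args) {
        RawAwareRow row = new RawAwareRow("a ~~b~~", "a **c**", "a b", "a c");
        System.out.println(row.getRawOldLine() + " -> " + row.getRawNewLine());
    }
}
```

With accessors like these, isIncluded() could match the exclude patterns against the raw text while the Markdown output keeps using the formatted lines.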

(By the way, many thanks for this library; it has been very useful.)


rosta · 2019-07-29T05:18:13Z

It appears that the reportLinesUnchanged option has no effect when showInlineDiffs is true. That is, '<' and '>' characters are converted to HTML entities regardless.

wumpz · 2019-07-29T06:45:07Z

That is right. However, this feature should be optional, since it will double the memory needed for each line.

I will look into it but do not have much time at the moment.

rosta · 2019-07-29T07:07:28Z

I agree completely that it should be optional.

A pull request with a possible implementation has been opened for review.

github-actions · 2020-12-14T02:02:33Z

Stale issue message

wumpz self-assigned this Jul 29, 2019

github-actions bot added the no-issue-activity label Dec 14, 2020

github-actions bot closed this as completed Dec 21, 2020

wumpz removed the no-issue-activity label Dec 21, 2020
