Fix `diffTrimmedLines` not keep whitespaces in the output #219

Mingun · 2018-04-21T16:25:15Z

This fix a bug with use of ignoreWhitespace option. Before this fix you get back array with trimmed versions of the lines in the input, that not the expected behavior

Mingun · 2018-10-01T07:31:21Z

@kpdecker could you please look at this (and other my PRs)?

ghedwards · 2021-01-12T01:25:14Z

I found what seems to be an easier solution , remove the trim from the tokenize and move it to the extractCommon , something like this ?

if ( this.options.ignoreWhitespace ) {
				while (newPos + 1 < newLen && oldPos + 1 < oldLen && this.equals(newString[newPos + 1].trim(), oldString[oldPos + 1].trim())) {
					newPos++;
					oldPos++;
					commonCount++;
				}
			} else {
				while (newPos + 1 < newLen && oldPos + 1 < oldLen && this.equals(newString[newPos + 1], oldString[oldPos + 1])) {
					newPos++;
					oldPos++;
					commonCount++;
				}
      }

danowensdev · 2022-09-30T11:28:34Z

@kpdecker Did you take a look at this?

ExplodingCabbage · 2023-12-15T17:40:14Z

This fix a bug with use of ignoreWhitespace option. Before this fix you get back array with trimmed versions of the lines in the input, that not the expected behavior

Hmmm... isn't it? After reading the docs on ignoreWhitespace and diffTrimmedLines for the first time just now, I was honestly pretty unsure what to expect:

diffTrimmedLines trims leading and trailing whitespace from each line, then runs diffLines, returning a diff as if none of the lines in the texts being differed ever had leading or trailing whitespace
When two lines differ only by whitespace, diffTrimmedLines treats them as equal and returns a diff showing the line from the old string being preserved (as if the line in the new string had always had the old string's leading and trailing whitespace)
When two lines differ only by whitespace, diffTrimmedLines treats them as equal and returns a diff showing the line from the new string being preserved (as if the line in the old string had always had the new string's leading and trailing whitespace)
When two lines differ only by whitespace, diffTrimmedLines returns some kind of special component in the result indicating a line was preserved but with whitespace-only changes

From some quick experimentation, I gather that what's actually happening under the hood is option 1, but you were intuitively expecting option 3. Maybe that's reasonable but it's really nonobvious to me that your expectation will be universal, and this is a breaking change, which makes me against merging it. (Also, mightn't some people expect option 2 - i.e. that the value properties they see in the result will reflect the whitespace the line had in the old text, rather than in the new text?)

The existing API and names IMO add confusion here that we can never escape from without a backwards compatibility break. To me, the name diffTrimmedLines suggests behaviour 1 above, but the name ignoreWhitespace suggests behaviour 2, 3, or 4, yet in fact they're the same thing. (Why do we have two ways to invoke the same behaviour anyway?)

One way to give the people expecting behaviour 3 what they want without breaking backwards compatibility would be to have diff components returned by diffTrimmedLines contain value, oldValue, and newValue properties. Here's a mocked up example of what I'm imagining the output being:

> old = 'There was an old man named Michael Finnegan    \n' +
... 'He had whiskers on his chinnegan    \n' +
... 'The wind came along and blew them in again    \n' +
... 'Poor old Michael Finnegan    \n'
> nyoo = '    There was an old man named Michael Finnegan\n' +
... '    He went fishing with a pin again\n' +
... '    Caught a fish but it flopped back in again\n' +
... '    Poor old Michael Finnegan\n'
> diff.diffTrimmedLines(old, nyoo)
[
  {
    count: 1,
    oldValue: 'There was an old man named Michael Finnegan    \n',
    value: 'There was an old man named Michael Finnegan\n',
    newValue: '    There was an old man named Michael Finnegan\n'
  },
  {
    count: 2,
    added: undefined,
    removed: true,
    oldValue: 'He had whiskers on his chinnegan    \n' +
      'The wind came along and blew them in again    \n',
    value: 'He had whiskers on his chinnegan\n' +
      'The wind came along and blew them in again\n'
  },
  {
    count: 2,
    added: true,
    removed: undefined,
    value: 'He went fishing with a pin again\n' +
      'Caught a fish but it flopped back in again\n',
    newValue: '    He went fishing with a pin again\n' +
      '    Caught a fish but it flopped back in again\n'
  },
  {
    count: 1,
    oldValue: 'Poor old Michael Finnegan    \n',
    value: 'Poor old Michael Finnegan\n',
    newValue: '    Poor old Michael Finnegan\n'
  }
]

Ugh, but... I notice that jsdiff already has an ignoreCase option that you'd expect to behave analogously to ignoreWhitespace, but ignoreCase behaves in the way this PR suggests implementing - i.e. it puts the value from the new text into the value property:

> diff.diffChars("hello world", "Hello, World!", {ignoreCase: true})
[
  { count: 5, value: 'Hello' },
  { count: 1, added: true, removed: undefined, value: ',' },
  { count: 6, value: ' World' },
  { count: 1, added: true, removed: undefined, value: '!' }
]

So my proposed compability-preserving compromise idea has the downside of leaving the behaviours of ignoreCase and ignoreWhitespace inconsistent with each other, which this PR resolves.

A kinda horrible situation. I'll think about this some more and come back to it.

ExplodingCabbage · 2023-12-20T17:46:11Z

I've just realised that the ignoreWhitespace option of diffWords also puts the whitespace from the new text into value, unlike the ignoreWhitespace option of diffLines - so the behaviour isn't even consistent between the two, currently! An argument in favour of this PR...

Mingun mentioned this pull request Jan 10, 2019

Is there a way to preserve whitespace in the result ? #241

Closed

Fix kpdecker#241: diffTrimmedLines now keep whitespaces in the output

1b186fd

Mingun force-pushed the diffTrim-use-payload branch from d163f66 to 1b186fd Compare January 10, 2019 19:45

ExplodingCabbage added the breaking-change label Dec 18, 2023

ExplodingCabbage mentioned this pull request Dec 20, 2023

diffChars case insensitive #128

Closed

ExplodingCabbage added the bugfix label Dec 20, 2023

ExplodingCabbage mentioned this pull request Dec 20, 2023

Newlines are not properly compared when using diffWords #140

Closed

This was referenced Jan 10, 2024

[question] Is it possible to preserve the whitespace when using ignoreWhitespace ? It breaks the formatting #389

Closed

Unexpected diffLines result on a basic example #324

Open

ExplodingCabbage mentioned this pull request Feb 14, 2024

Sort out behaviour of newlineIsToken and ignoreWhitespace #486

Merged

5 tasks

ExplodingCabbage closed this in #486 Feb 15, 2024

ExplodingCabbage mentioned this pull request Feb 16, 2024

Add further assertion to test, as suggested by Mingun #491

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `diffTrimmedLines` not keep whitespaces in the output #219

Fix `diffTrimmedLines` not keep whitespaces in the output #219

Mingun commented Apr 21, 2018

Mingun commented Oct 1, 2018

ghedwards commented Jan 12, 2021

danowensdev commented Sep 30, 2022

ExplodingCabbage commented Dec 15, 2023

ExplodingCabbage commented Dec 20, 2023

Fix diffTrimmedLines not keep whitespaces in the output #219

Fix diffTrimmedLines not keep whitespaces in the output #219

Conversation

Mingun commented Apr 21, 2018

Mingun commented Oct 1, 2018

ghedwards commented Jan 12, 2021

danowensdev commented Sep 30, 2022

ExplodingCabbage commented Dec 15, 2023

ExplodingCabbage commented Dec 20, 2023

Fix `diffTrimmedLines` not keep whitespaces in the output #219

Fix `diffTrimmedLines` not keep whitespaces in the output #219