Add difference highlighting to ExUnit #4430

Merged
merged 10 commits into from Apr 5, 2016

Projects

None yet
@lexmag
Member
lexmag commented Mar 25, 2016
@lexmag
Member
lexmag commented Mar 27, 2016

This how it looks with big inputs:
screen shot 2016-03-27 at 21 14 23

Test code:

defmodule ExUnitDiffTest do
  use ExUnit.Case

  @map1 Enum.into(1..50, %{}, &{&1, &1})
  @map2 Enum.reduce(5..10, @map1, &Map.delete(&2, &1))

  test "the truth" do
    assert @map1 == @map2
  end
end
@whatyouhide whatyouhide and 2 others commented on an outdated diff Mar 27, 2016
lib/elixir/lib/string.ex
@@ -1812,4 +1812,91 @@ defmodule String do
{comm + 1, trans, current}
end
end
+
+ @spec difference(t, t) :: [{:eq | :ins | :del, t}] | nil
+ def difference(str1, str2) do
@whatyouhide
whatyouhide Mar 27, 2016 Member

Any reason why we're not calling this diff instead of difference? diff may identify what this does in an easier way (as it's a known concept).

@lexmag
lexmag Mar 27, 2016 Member

I've chosen difference as it's very common for Elixir to have fully spelled names (I wish head and tail would exist), for example, it goes nicely along equivalent? function.

diff may identify what this does in an easier way (as it's a known concept)

As you may see linked paper use diff in text only once for UNIX command.
Windows users, in turn, could wonder why we not called this function some other name.

@whatyouhide
whatyouhide Mar 27, 2016 Member

I didn't read the paper yet :), but yeah it seems fair enough. I just feel that the "diff" is a concept as a whole, even if it's just an abbreviation of difference.

@josevalim
josevalim Mar 31, 2016 Member

Reminder to rename this to meyers_difference and include docs. :)

@ericmj
Member
ericmj commented Mar 27, 2016

I may be looking at it wrong but to me the diff coloring doesn't make sense in the picture you posted.

@josevalim
Member

@lexmag this is great! ❤️ It is definitely a step in the proper direction.

I am wondering though if we should break this per data type. For example, comparing a map with a list with likely emit the wrong diff. And as @ericmj pointed out, the diff for maps in this particular case looks a bit off too.

What if generally do this:

  1. We first compare data types: "Left side is a PID, right side is a Tuple". No diffing necessary.
  2. If the data types match and are not structs, we do per data-type comparison. For example, for maps, we could say: "keys X are missing from left", "keys Y are missing from right", "different values for keys Z" (similar for lists, etc)
  3. If the types are not native (i.e. structs) then we rely on your current difference because I don't think we could do better in such cases?

The only issue I see with this proposal is how we are going to handle composition. For example, if we have a list of maps and the map on the second element of the list is different due to a missing key, how to show this information? How deep should we traverse?

/cc @myronmarston who has a lot of experience on this I believe :)

@lexmag
Member
lexmag commented Mar 28, 2016

I've intentionally uploaded this screenshot to highlight that difference doesn't always look as one may expect.
Please note, we work with inspected values. For example:
lhs part: => 43, 10 => 10, 9 => 9, 19 => 19,
rhs part: => 43, 19 => 19,
Since we work with strings but not actual pairs, we getting => 43, 1 part as a match, that's showed on screenshot.
In most situation it works well:
screen shot 2016-03-28 at 13 10 15
Nevertheless, I agree it could be even better if we find difference individually on data type level.

@whatyouhide
Member

I feel like it's worth mentioning that I think String.difference/2 is still a very useful and solid function, maybe a little on the border of what belongs to the stdlib and what doesn't but still great to have.

That said, the examples that @lexmag showed highlight that the string diff approach works very well for some data types (integers, printable strings, printable char lists, atoms) where diffing on the result of inspect/2 yields great results. For "container" data types (maps, lists, <<>> binaries) I think a diff like the one proposed by @josevalim could be visually better. Still, the problems of how deep the diff should go remains.

Btw, can't structs be handled just by diffing each field with the algorithm above?

@josevalim
Member

For structs, we can diff the fields as long as it does not have its own inspect implementation. Also, we should probably call the function meyer_difference or something similar to mirror jaro_distance.

@myronmarston
Contributor

/cc @myronmarston who has a lot of experience on this I believe :)

If you're referring to the diffing we do in RSpec...I've never been particularly happy with it, TBH, and what @lexmag has done here already looks better :). I don't have much to add.

@lexmag
Member
lexmag commented Mar 29, 2016

I propose to go with inspect + String.myers_difference for every data type, with real world input it works flawlessly.
Speaking about the issue on first screenshot, I must to mention it's very artificial example: a lot of spaces and ones1. In my real world tests this never has happened.
It's worth to point that even this mishighlighting gives a good clue what element is additional.

Also if we add extra character " around 1 in those inputs it works as expected:
screen shot 2016-03-30 at 00 26 40

Would like to know what are your thoughts on it? :bowtie:

@whatyouhide
Member

Whatever we may end up doing in the future, I think this is still an awesomely awesome improvement over what we have today, and a wonderful first iteration :), so I'm definitely 👍 on this. We can work on trying to see where @josevalim's algorithm goes after this is in!

@josevalim
Member

I agree this is a great improvement but I am not convinced it is the best way to show this information for maps. Most of the information in the diff above is not useful, except by a handful of keys. If we could focus on those, it would be even better. For example, even if we format it like this:

%{..., "10" => "10" (in red), "3" => "3" (in green), ...}

It would give us perfect precision on what does not match.

@josevalim
Member

@lexmag another way to put my latest comment is: what if for maps and structs and we do a pre-pass allowing us to only include the fields that have changed and putting everything else under ...?

@josevalim josevalim commented on the diff Mar 29, 2016
lib/ex_unit/lib/ex_unit/formatter.ex
end
defp make_into_lines(reasons, padding) do
padding <> Enum.join(reasons, "\n" <> padding) <> "\n"
end
+ defp format_diff(struct, formatter) do
@josevalim
josevalim Mar 29, 2016 Member

What if we make this public as format_diff(left, right, formatter) so we can unit test it? :)

@lexmag
Member
lexmag commented Mar 29, 2016

Alright, now equal parts are compacted.
EDIT: improved compaction:
screen shot 2016-03-30 at 02 10 21

@josevalim
Member

@lexmag Sorry, maybe I was not clear, I would like the compactation to happen at the data type level and not at the inspected type level. Doing this would also solve part of the quote dance issue. :) Maybe you can move your PR to inside the Elixir repository? This way I can also push commits to it. Maybe this would help move the discussion forward?

@lexmag
Member
lexmag commented Mar 30, 2016

@josevalim sorry, I understood you right from the beginning. I just wanted, for a start, to see how it'll will work if we keep inspection (sill believe in this approach :bowtie:), "data type level" compaction will be the next tryout. Also I'll push these changes inside the Elixir repository, so you can play with it. :)
Though I'd like to keep current PR open not to break discussion apart.

@josevalim
Member

@lexmag ah perfect. I don't think compaction in general is good because in case of integers and strings we may lose important context. I am pretty sure we can compact on maps. I am still unsure if we can do so for lists/tuples. :D

@lexmag
Member
lexmag commented Mar 30, 2016

I've pushed a commit that adds separate diffing for maps.
There is a comparison with the previous one:

screen shot 2016-03-31 at 01 35 09

screen shot 2016-03-31 at 01 34 41

Still some work is left: structs, keywords style, "..." delimiter.

@josevalim josevalim commented on an outdated diff Mar 31, 2016
lib/ex_unit/lib/ex_unit/formatter.ex
+ {:ok, ^val1} ->
+ acc
+ {:ok, val2} ->
+ {surplus, [{key, {val1, val2}} | altered]}
+ :error ->
+ {[{key, val1} | surplus], altered}
+ end
+ end)
+ missing = Enum.reject(right, fn {key, _} -> Map.has_key?(left, key) end)
+ result = Enum.reduce(missing, [], fn({key, val}, acc) ->
+ [formatter.(:diff_insert, inspect(key) <> " => " <> inspect(val)) | acc]
+ end)
+ result = Enum.reduce(surplus, result, fn({key, val}, acc) ->
+ [formatter.(:diff_delete, inspect(key) <> " => " <> inspect(val)) | acc]
+ end)
+ result = Enum.reduce(altered, result, fn({key, {val1, val2}}, acc) ->
@josevalim
josevalim Mar 31, 2016 Member

Maybe we should use the regular difference here. I.e. call format_diff/3 recursively.

@josevalim
josevalim Mar 31, 2016 Member

Calling it recursively means nested maps would work too. :)

@josevalim
Member

This is really sweat! ❤️ 💚 💙 💛 💜

@paulcsmith
Contributor

I love the latest iteration. So easy to parse and quickly see what's going on. Awesome work!!

@josevalim josevalim commented on an outdated diff Apr 3, 2016
lib/ex_unit/lib/ex_unit/formatter.ex
+ end
+
+ @doc """
+ Formats the difference between `left` and `right`.
+
+ Returns `nil` if they are not the same data type.
+ """
+ def format_diff(left, right, formatter_fun)
+
+ def format_diff(<<left::bytes>>, <<right::bytes>>, formatter) do
+ String.myers_difference(left, right)
+ |> Enum.map_join(&format_diff_fragment(&1, formatter))
+ |> String.replace("\n", "\n" <> @inspect_padding)
+ end
+
+ def format_diff(%name{} = left, %name{} = right, formatter) do
@josevalim
josevalim Apr 3, 2016 Member

This is awesome! ❤️

@josevalim josevalim commented on an outdated diff Apr 3, 2016
lib/ex_unit/lib/ex_unit/formatter.ex
+ end
+
+ def format_diff(%name{} = left, %name{} = right, formatter) do
+ left = Map.from_struct(left)
+ right = Map.from_struct(right)
+ format_map_diff(left, right, inspect(name), formatter)
+ end
+
+ def format_diff(%_{}, %_{}, _formatter), do: nil
+
+ def format_diff(%{} = left, %{} = right, formatter) do
+ format_map_diff(left, right, "", formatter)
+ end
+
+ def format_diff(left, right, formatter)
+ when is_number(left) and is_number(right)
@josevalim
josevalim Apr 3, 2016 Member

Please indent the when with four spaces from def, six spaces total (that's how it is in most of Elixir's source).

@josevalim josevalim commented on an outdated diff Apr 3, 2016
lib/ex_unit/lib/ex_unit/formatter.ex
+
+ def format_diff(%{} = left, %{} = right, formatter) do
+ format_map_diff(left, right, "", formatter)
+ end
+
+ def format_diff(left, right, formatter)
+ when is_number(left) and is_number(right)
+ when is_tuple(left) and is_tuple(right)
+ when is_list(left) and is_list(right) do
+ format_diff(inspect(left), inspect(right), formatter)
+ end
+
+ def format_diff(_left, _right, _formatter), do: nil
+
+ defp format_map_diff(left, right, name, formatter) do
+ {surplus, altered} =
@josevalim
josevalim Apr 3, 2016 Member

Put the calculation of surplus, altered and missing into a separate function called map_diff. :)

@lexmag lexmag changed the title from [WIP] Add difference highlighting to ExUnit to Add difference highlighting to ExUnit Apr 3, 2016
@lexmag
Member
lexmag commented Apr 4, 2016

This is ready for review.

There is an extensive showcase:
screen shot 2016-04-04 at 16 06 22
screen shot 2016-04-04 at 16 06 43

@josevalim
Member

That's awesome! Btw, we have an examples directory inside ex_unit. Let's add all of those formatting cases into a separate file in there called difference.exs so we can check those easily!

@whatyouhide whatyouhide and 1 other commented on an outdated diff Apr 4, 2016
lib/elixir/lib/string.ex
@@ -1812,4 +1812,104 @@ defmodule String do
{comm + 1, trans, current}
end
end
+
+ @doc """
+ Returns a keyword list that represents an edit graph.
+
+ The algorithm is outlined in the "An O(ND) Difference Algorithm and Its Variations" paper.
@whatyouhide
whatyouhide Apr 4, 2016 Member

Maybe we can add a link to this paper? Maybe https://www.cis.upenn.edu/~bcpierce/courses/dd/papers/diff.ps.

@lexmag
lexmag Apr 4, 2016 Member

I thought about that, but doesn't seem there is an official/reliable link exists.

@whatyouhide
whatyouhide Apr 4, 2016 Member

Then we can at least mention the author?

"An [...]" by E. Meyers.

@josevalim
Member

@lexmag this looks good to go. Only two small things left:

  1. Please don't forget to add your current example set to the examples directory
  2. Let's fix the indentation of when

And :shipit:

@whatyouhide whatyouhide and 1 other commented on an outdated diff Apr 4, 2016
lib/elixir/lib/string.ex
@@ -1755,8 +1755,8 @@ defmodule String do
end
end
- @compile {:inline, decompose: 1}
- defp decompose(string) do
+ @compile {:inline, dissect: 1}
+ defp dissect(string) do
@whatyouhide
whatyouhide Apr 4, 2016 Member

@lexmag wdyt about calling this something more straightforward, like chars_and_their_count? It's a private function and it's used in very few places, so the long name shouldn't be a problem. Maybe I'm missing something, but neither dissect nor decompose make it easy for me to see what this function does.

@lexmag
lexmag Apr 4, 2016 Member

Renamed to chars_and_length as its second part is an equivalent of String.length/1.

@whatyouhide whatyouhide commented on the diff Apr 4, 2016
lib/elixir/lib/string.ex
+ compact_reverse(rest, [{kind, char <> chars} | acc])
+ end
+
+ defp compact_reverse([elem | rest], acc) do
+ compact_reverse(rest, [elem | acc])
+ end
+
+ defp each_diagonal(diag, limit, _paths, next_paths) when diag > limit do
+ {:next, Enum.reverse(next_paths)}
+ end
+
+ defp each_diagonal(diag, limit, paths, next_paths) do
+ {path, rest} = proceed_path(diag, limit, paths)
+ with {:cont, path} <- follow_snake(path) do
+ each_diagonal(diag + 2, limit, rest, [path | next_paths])
+ end
@whatyouhide
whatyouhide Apr 4, 2016 Member

This is one instance where based on my taste I would avoid using with. We're just matching on a single <- clause, so we would only need one case, and this with makes the return value of each_diagonal harder to grasp. Personally, this is clearer to read for me:

case follow_snake(path) do
  {:done, edits} ->
    {:done, edits}
  {:cont, path} ->
    each_diagonal(diag + 2, limit, rest, [path | next_paths])
end

(or the other way around, with the {:cont, path} patter first, either is fine)

@lexmag, wdyt?

@lexmag
lexmag Apr 4, 2016 Member

@whatyouhide I'll stick with with as it quite the same if we put {:done, edits} (result -> result) as second pattern.

@josevalim
Member

:shipit:

@lexmag lexmag merged commit 1057356 into elixir-lang:master Apr 5, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@lexmag lexmag deleted the lexmag:ex_unit-difference-highlighting branch Apr 5, 2016
@lexmag
Member
lexmag commented Apr 5, 2016

🎉

@josevalim
Member

😻

@michalmuskala
Contributor

This is perfect. Thank you @lexmag. I'd say that only that feature would warrant releasing next version. It's currently a huge pain to analyze errors from ExUnit with bigger data.

@josevalim
Member

Yes. We are consistently adding excellent features to master, hopefully we can have a release around june/july.

@chrisarcand

Wow. Nice work @lexmag! This is stellar!

@venkatd
Contributor
venkatd commented Apr 6, 2016

@lexmag love what you've done here :)

I think icdiff does a great job at this if you're looking for some inspiration. I like how they opted to show the output in two columns for easy comparison.

image

My personal pain was in comparing multi-line statements so I had hacked together a naive compare helper method to print out the differences. It looks like this:
image

Point is, I think showing a visual comparison of LHS and RHS can be situation specific. It'd be great to have some hooks to customize the formatting.

Thoughts?

@lexmag
Member
lexmag commented Apr 6, 2016

@venkatd I think with the highlighting from this PR your example will get much better output. :bowtie:

I agree comparison of LHS and RHS can be situation specific, we'd like to know what that cases are to see if we can improve/adjust default formatting, or if it requires custom formatter.
Currently ExUnit supports :formatters option where you can specify your own formatter (it defaults to ExUnit.CLIFormatter).

@mindreframer

Amazing!!! 😎 😃

@sunaku
Contributor
sunaku commented Apr 30, 2016

Great work @lexmag! 👍 I had the same dream for Elixir back in PR #3420 but got sidetracked yakshaving a parallel/functional diff algorithm (which I can now complete at my leisure, without guilt of holding Elixir back, and release as a separate hex package) that's as quick as git-diff(1). 🎅 Thanks for making it real. 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment