Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

highlight differences between assertion LHS & RHS #3420

Closed
wants to merge 2 commits into from

Conversation

sunaku
Copy link
Contributor

@sunaku sunaku commented Jun 22, 2015

This patch enhances ExUnit.Formatter.format_kind_reason/4 by adding a
new diff: indicator that visually highlights the differences between
the left- and right-hand sides of a failed assertion using git-diff(1).

It also registers a new :diff configuration option, which lets users
specify additional command-line options for the git-diff(1) program.

To see this patch in action, try running the following assertions:

assert "hello world" == "world wide web"
assert %{a: 1, b: 2} == %{b: 2}
assert [1, 2, 3] == [4, 2, 6]
assert :hello == :world
assert 123 == 426

Although these examples are trivial, this functionality is indispensable
when comparing complex data structures found in real-world test suites.

📷 Here are some screenshots of the functionality this patch provides:

selection_057
selection_056
selection_055

See also https://sunaku.github.io/minitest-colordiff.html#bin-gdiff

This patch enhances ExUnit.Formatter.format_kind_reason/4 by adding a
new `diff:` indicator that visually highlights the differences between
the left- and right-hand sides of a failed assertion using git-diff(1).

It also registers a new `:diff` configuration option, which lets users
specify _additional_ command-line options for the git-diff(1) program.

To see this patch in action, try running the following assertions:

    assert "hello world" == "world wide web"
    assert %{a: 1, b: 2} == %{b: 2}
    assert [1, 2, 3] == [4, 2, 6]
    assert :hello == :world
    assert 123 == 426

Although these examples are trivial, this functionality is indispensable
when comparing complex data structures found in real-world test suites.
@eksperimental
Copy link
Contributor

this is beautiful.

I would like to see the same examples highlighted in the LHS and RHS lines and omitting the diff line (ala github), I have the feeling that they could be simpler to understand provided we are dealing with results longer than just a few chars

@josevalim
Copy link
Member

this is beautiful.

Agreed.

I am not a fan of invoking git automatically though and for every assertion. Maybe we can leave git diff running as a port and then write to it command by command and get the output as needed?

@sunaku
Copy link
Contributor Author

sunaku commented Jun 22, 2015

@eksperimental Injecting highlighted regions parsed from git-diff(1) output back into the original LHS and RHS strings sounds tricky. 😱 But otherwise, the diff-highlight script provided with Git (at /usr/share/doc/git/contrib/diff-highlight/diff-highlight on my system) seems to fulfill your preferred diffing style. In contrast, this patch implements the alternative, character-wise style of diffing. 😅

In my experience, both styles are useful in their own way and, as you're rightfully observed, sometimes one is more useful than the other. But in terms of this patch, I believe we should continue to provide the original LHS and RHS values unmodified (in case the user wants to copy the values out into IEX for quick manipulation) and not inject potentially syntax-breaking fragments (I'm not speaking about the ANSI color escapes; those are typically ignored by the terminal text selection/copy mechanism) into their midst. 🎯

Thus I believe having an additional diff: indicator with character-wise diffs (as this patch currently does) is a reasonable trade-off to injecting difference highlights back into the LHS and RHS values. :neckbeard:

@sunaku
Copy link
Contributor Author

sunaku commented Jun 22, 2015

@josevalim The git diff command is only run when an assertion fails and both LHS and RHS are present. To my knowledge, it isn't possible to have the program stay alive to receive more diffing requests after it handles the initial one: it simply dies afterward and must be launched again for the next request.

@josevalim
Copy link
Member

Even though, it still imposes a requirement on git and we certainly should not have that. We could make it configurable but, if it is configurable, nobody will use it.

Maybe the best idea is to port Meyers algorithm for diff-ing. More info:

@lexmag
Copy link
Member

lexmag commented Jun 22, 2015

As for me, it looks nice and might really be helpful, but git-diff just makes it feel like a first step toward the https://pbs.twimg.com/media/Bmm42jWCMAAEVPS.jpg.

@ericmj
Copy link
Member

ericmj commented Jun 22, 2015

This does indeed look awesome but I agree with the other commentators; we need to implement the diffing algorithms in Elixir.

@sunaku
Copy link
Contributor Author

sunaku commented Jun 22, 2015

I tried to remove the dependency on git diff in commit sunaku@b4d0788 using the tdiff library @josevalim suggested, but the results are too coarse-grained (word-level differences are lost 😱):
selection_058

Oh well, hopefully I'll get a chance to hack on the tdiff library this weekend or whatever. 😰

@sunaku
Copy link
Contributor Author

sunaku commented Jun 23, 2015

Good news! 🎉 It turns out that, in my haste, I made a mistake in using the tdiff library. 😅

I have now updated this pull request with an additional commit that swaps in tdiff correctly.

And the best part 🏆 is that this new git-less aproach works better than my original patch! 💪

selection_059
selection_060
selection_061

Thanks @josevalim for suggesting the tdiff library and everyone for your guidance! 😺

Now, the next obstacle I'm facing is that make test_ex_unit in the root folder is failing due to the newly added tdiff dependency inside the lib/ex_unit/mix.exs file:

** (UndefinedFunctionError) undefined function: :tdiff.diff/2 (module :tdiff is not available)

Any suggestions on how to fix this (before I hack away indiscriminately at the Makefile)? 😕

@sunaku
Copy link
Contributor Author

sunaku commented Jun 23, 2015

Hurray! 🎉 The last obstacle has been cleared. 😤 Please review and merge! :shipit:

@bitwalker
Copy link
Contributor

Just as a passive observer is it ok that ex_unit would be taking on a dependency on a third-party library, a non-Elixir one at that, and especially one that hasn't been updated in 4 years? Not that libs have to be continuously updated if they do what they intend to do well, and the fact that it's Erlang vs Elixir is mostly irrelevant, but just taking on a third-party dep seems like a first (unless I've missed something). I guess my impression was that @sunaku would need to implement the diffing algorithm in Elixir, in ex_unit itself.

@josevalim
Copy link
Member

@sunaku I didn't mean to use the tdiff library itself but I meant it as an example of how it is possible to write our own. As @bitwalker says, the only way we can get this feature into core is by having no dependencies. Even if it is an Erlang project, it is still code we would have to maintain and there are other complications like licenses (which tdiff doesn't have, so we must always assume we can't use it).

@josevalim
Copy link
Member

@sunaku in any case, your current patch shows that using a diff implementation in pure erlang/elixir is a viable option, so good job on validating this assumption before we move forward! :D

@sunaku
Copy link
Contributor Author

sunaku commented Jun 23, 2015

I see now, thanks for clarifying. 😅 I'll implement a small ExUnit.Diff module next. 👷

@sunaku
Copy link
Contributor Author

sunaku commented Jun 25, 2015

💀 After struggling to implement the LCS algorithm, 🚧 building a sequence of edit operations from it, 😱 discovering its abysmal performance, 📝 failing to memoize it after several attempts (it's complicated to pass a HashDict around that can be modified by any level of the recursion in FP 😭), 📚 (re-)reading research papers on LCS / diffing algorithms again and again, 😌 I finally understood the vdelta algorithm well enough to build my own (:innocent: or so I think) variant that appears to work in O(n+m) time and space! 😤

🏆 Here's the code (I still need to document it properly and complete the test suite):

https://github.com/sunaku/elixir/blob/ex_unit_diff_dev/lib/ex_unit/lib/ex_unit/diff.ex

🔥 Here's an example of how to use it (notice how fast it is! 😊):

ExUnit.Diff.diff(String.graphemes(File.read!("mix.lock")), String.graphemes(File.read!("mix.exs")))

Update: My algorithm made assumptions that don't apply in the general case. 😓 Nice try but no cigar!

@josevalim
Copy link
Member

Awesome, good job! In the documentation, don't forget to mention any paper you have used as reference. It would help someone who needs to maintain the code later. :)

@josevalim
Copy link
Member

Ping! Let us know if you need help moving forward with this.

@sunaku
Copy link
Contributor Author

sunaku commented Jun 29, 2015

😞 I spent all my free time last week working on my algorithm, rethinking the problem in different ways (directed graph, sparse adjacency list, clustering contiguous sequences, resolving ties with weights, etc.) but no matter what I tried, the dreaded O(m^2) time complexity was in my face at every turn because I could never quite avoid having to compute the LCS (longest common subsequence) of the inputs. :neckbeard:

On the plus side, I found ways to trick the Erlang tdiff library (because it's greedy) 😇 into returning suboptimal results 💩, so porting it as-is to Elixir isn't something I'd recommend we do. And as expected, Git's diff implementation remains superior in terms of speed as well as resilience to such trickery. 😈

⌛ Yesterday, I found some newer research on solving the LCS problem in clever ways. Give me another week's time to absorb it and try things out. If it can't wait, proceed with any Elixir diff implementation. 👍

@josevalim
Copy link
Member

No rush, just wondering! I would honestly try to port myers paper, I have no idea how complex it would be though.

@sunaku
Copy link
Contributor Author

sunaku commented Jun 29, 2015

The Myers algorithm is rather complicated (finding "snakes" by traversing from both top and bottom, hoping to meet in the middle). 😅 I'm more hopeful about the newer research on dominant points:

Majid Sazvar, Mahmoud Naghibzadeh, and Nayyereh Saadati. 2012. Quick-MLCS: a new algorithm for the multiple longest common subsequence problem. In Proceedings of the Fifth International C* Conference on Computer Science and Software Engineering (C3S2E '12). ACM, New York, NY, USA, 61-66. DOI=10.1145/2347583.2347591 http://doi.acm.org/10.1145/2347583.2347591

@sunaku
Copy link
Contributor Author

sunaku commented Jul 7, 2015

Good news! 🎉 I was able to invent a new algorithm to solve this problem quickly (as fast as git diff but with higher quality results). 😤 I'm refining it now and will soon write an article and then a paper on it. 🎓 Expect an update (with the final code for this PR) in a few weeks time. 🚀 Thanks for your patience! 😅

@williamgueiros
Copy link
Contributor

👍

@josevalim
Copy link
Member

@sunaku I am closing this because the PR as is won't be merged. However, we are all anxiously waiting for your updates and new PR! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants