Add an assertBytesEqual to unittest and use it for bytes assertEqual #54373
Comments
Just as with regular strings, when a failed comparison involves byte strings it is really helpful to have a diff showing which bytes have changed. The attached patch adds an assertBytesEqual method to unittest and uses it for comparing bytes in assertEqual. The diff output is created by first breaking the byte strings into lines using 'splitlines(True)', just as in assertMultiLineEqual, but then each line is converted to a string using 'repr' before the two sets of lines are passed to ndiff. This results in output containing escaped bytes, making it easier to compare the differences in the byte strings. (NB: this patch is the end result of several attempts to make the unittest output more useful in the email package when testing bytes handling.)
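The diffing approach described in this comment can be sketched roughly as follows. This is not the attached patch; the function name `bytes_line_diff` and the sample data are assumptions made for illustration, and only `bytes.splitlines`, `repr`, and `difflib.ndiff` come from the description.

```python
import difflib


def bytes_line_diff(first: bytes, second: bytes) -> str:
    """Hypothetical helper: ndiff two byte strings, one repr()'d line per row."""
    # splitlines(True) keeps the line endings, so changed endings show up too.
    first_lines = [repr(line) for line in first.splitlines(True)]
    second_lines = [repr(line) for line in second.splitlines(True)]
    # ndiff's "?" guide lines end in a newline of their own; strip it so the
    # joined output has no blank rows.
    return "\n".join(
        line.rstrip("\n") for line in difflib.ndiff(first_lines, second_lines)
    )


if __name__ == "__main__":
    print(bytes_line_diff(b"Subject: hi\r\nbody\xff\r\n",
                          b"Subject: hi\r\nbody\xfe\r\n"))
```

Because each line goes through repr, non-printable bytes and line endings appear as escapes such as \xff and \r\n in the diff.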
After talking with Michael on #python-dev, I've revised the patch to make it a real assertBytesEqual method rather than a pretend-the-bytes-are-strings method. This version allows the byte strings to be split on an arbitrary byte string, which makes it more useful for getting diffs of structured binary data if that data has a convenient break point. The default is no splitting, which is what assertEqual uses when comparing bytes. One of the tests is failing; it's late and I can't figure out where the extra space is coming from. Maybe someone else will check it out and spot the problem before I get back to it.
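For illustration, splitting on an arbitrary byte string before diffing might look something like the sketch below. The names `split_bytes` and `bytes_diff`, the `sep` parameter, and the sample records are all hypothetical; the revised patch's actual signature is not shown in this thread.

```python
import difflib
from typing import List, Optional


def split_bytes(data: bytes, sep: Optional[bytes] = None) -> List[bytes]:
    """Hypothetical helper: split on sep, keeping sep at the end of each chunk."""
    if not sep:
        # Default: no splitting, the whole value is a single chunk.
        return [data]
    parts = data.split(sep)
    chunks = [part + sep for part in parts[:-1]]
    if parts[-1]:
        chunks.append(parts[-1])
    return chunks


def bytes_diff(first: bytes, second: bytes, sep: Optional[bytes] = None) -> str:
    """Hypothetical helper: ndiff the repr() of each chunk."""
    left = [repr(chunk) for chunk in split_bytes(first, sep)]
    right = [repr(chunk) for chunk in split_bytes(second, sep)]
    return "\n".join(line.rstrip("\n") for line in difflib.ndiff(left, right))


if __name__ == "__main__":
    record_a = b"\x01id=7\x00name=ann\x00flag=\xff"
    record_b = b"\x01id=7\x00name=bob\x00flag=\xff"
    # NUL-separated fields give the diff a convenient break point.
    print(bytes_diff(record_a, record_b, sep=b"\x00"))
```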
My best guess currently is that the failing test is a bug in difflib, but I haven't dug into that code yet to prove it. It's still possible it's something stupid in my code, but as far as I can see there's no character over where that - is in the difflib output (that is, the two lines that it says are different appear identical when viewed with cat -A).
Am discussing this with the OP on IRC and tabling it for a while so we can better think out the API. The goal is to let assertEqual(a, b) do straight comparisons of raw bytes, but to give a nice-looking diff (possibly translated with line breaks or some such) when the test fails. This will be helpful in testing the email module. The current patch requires that assertBytesEqual be exposed and called directly so that a user can specify a split-at argument. The purpose of that argument is to approximate the line breaking that occurs naturally in text. The OP does not want to decode the bytes prior to the equality test, but does want a readable diff whenever the bytes represent ASCII text.
David - would you get a good approximation of what you want simply with:
(This actually returns "b'first'" "b'second'", so you may want a convenience function that chops the leading b' and the trailing '.) As ascii returns unicode, it would automatically delegate to assertMultiLineEqual. The obvious way to hook this up by default for assertEqual is to have the split character be '\n'. This would not be meaningful when using assertEqual to compare bytes that *aren't* text.
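One possible reading of the ascii() idea, including the convenience function mentioned above, is sketched here. The helper name `strip_bytes_repr` is an assumption and not part of unittest. Note that ascii() escapes newlines, so the resulting text is a single line unless it is split separately.

```python
import unittest


def strip_bytes_repr(data: bytes) -> str:
    """Hypothetical helper: ascii() of a bytes object without the b'...' wrapper."""
    text = ascii(data)  # e.g. "b'first'" (or b"..." if the data contains ')
    # Drop the leading b and opening quote, and the closing quote.
    return text[2:-1]


class AsciiCompareExample(unittest.TestCase):
    def test_same_bytes(self):
        # Both arguments are str, so assertEqual hands them to
        # assertMultiLineEqual when they differ.
        self.assertEqual(strip_bytes_repr(b"first\n"),
                         strip_bytes_repr(b"first\n"))


if __name__ == "__main__":
    unittest.main()
```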
Rejecting this one for reasons we discussed earlier. The assertEqual() method needs to be the primary interface. Everything else is starting to mix content and presentation (i.e. passing in separators). The existing repr() works fine with bytes, and Michael's suggested ascii() cast would be the preferred technique in the common cases. What might be useful is a less specialized patch letting assertEqual() take an argument pointing to some repr or pre-processing function that would be called after an equality test fails but before it is diffed. That would support a clear separation of concerns and be easily extensible by users who need something more than an ascii() cast.
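A rough sketch of what such a pre-diff hook could look like is given below. Nothing like this exists in unittest; the method name `assertEqualPreDiff`, the `pre_diff` parameter, and the `decode_ascii` helper are hypothetical and only illustrate the separation of the equality test from the presentation step.

```python
import difflib
import unittest


def decode_ascii(data: bytes) -> str:
    """Hypothetical pre-diff function: show bytes as text, escaping non-ASCII."""
    return data.decode("ascii", "backslashreplace")


class PreDiffTestCase(unittest.TestCase):
    def assertEqualPreDiff(self, first, second, pre_diff=repr, msg=None):
        """Hypothetical assertion: compare raw values, diff pre-processed ones."""
        if first == second:
            return
        # pre_diff runs only after the equality test has failed, so it cannot
        # influence whether the two values are considered equal.
        left = pre_diff(first).splitlines()
        right = pre_diff(second).splitlines()
        diff = "\n".join(
            line.rstrip("\n") for line in difflib.ndiff(left, right)
        )
        self.fail(msg or "values differ:\n" + diff)
```

A test would then call something like self.assertEqualPreDiff(bstr1, bstr2, pre_diff=decode_ascii), leaving the comparison itself on the raw bytes.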
Agreed on the closing. The pre-diff processing function would be a great addition. For the record, I am currently satisfying my use case by doing this: self.assertEqual(bstr1.split(b'\n'), bstr2.split(b'\n')) which produces a very readable diff.
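The workaround above can be tried in isolation with a small script like this one. The function name `failure_message` and the sample header data are made up for the example. Since both arguments are lists, assertEqual reports the first differing element along with a diff of the two lists.

```python
import unittest


def failure_message(bstr1: bytes, bstr2: bytes):
    """Return the assertEqual failure text for the split lists, or None if equal."""
    case = unittest.TestCase()
    try:
        case.assertEqual(bstr1.split(b"\n"), bstr2.split(b"\n"))
    except AssertionError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Hypothetical sample data: two messages that differ in one header.
    print(failure_message(b"From: a\nTo: b\n", b"From: a\nTo: c\n"))
```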
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.