ref(grouping): Restrict values in IPv4 regex#114362
Conversation
cvxluo
left a comment
There was a problem hiding this comment.
should we add a test for this too?
So I actually wrote a test for the false-positive-ness part of the IP pattern (asserting that it either did or didn't call the # Cases where we might or might not trigger a false positive with our IP regex (necessitating use of
# the slower fallback parameterization method if we do). The goal is to have as many of these as
# possible have False for their third parameter.
ip_false_positive_cases = [
# (name, input, whether callback is expected to have been called)
("ip - too many initial characters", "12345::6:789", False),
("ip - too many final characters", "123:4::56789", False),
("ip - too many initial colons", ":::1121", False),
("ip - too many interior colons", "1231:::1121", True),
("ip - too many final colons", "1231:::", False),
("ip - three colons alone", ":::", False),
("ip - single leading colon", "Script error. :0:0", False),
("ip - single trailing colon", "12::31:", False),
("ip - too few segments", "12:31:99", True),
("ip - v4 leading zeros", "11.21.12.001", False),
("ip - v4 segment > 255", "12.31.12.908", False),
("ip - v4 too many segments", "11.21.12.31.12", False),
("date - colon btwn date and time", "21/Nov/2012:12:31:12", True),
]
@pytest.mark.parametrize(("name", "input", "callback_call_expected"), ip_false_positive_cases)
@patch("sentry.grouping.parameterization.is_valid_ip", wraps=is_valid_ip)
def test_ip_false_positives(
mock_is_valid_ip: MagicMock, name: str, input: str, callback_call_expected: bool
) -> None:
parameterizer.parameterize(input)
if callback_call_expected:
mock_is_valid_ip.assert_called()
else:
mock_is_valid_ip.assert_not_called()I could add it back in... WDYT? As for showing that in the end we don't parameterize bogus values as IP addresses, the third- and second-to-last cases above are also included as regular tests: sentry/tests/sentry/grouping/test_parameterization.py Lines 335 to 346 in 52922b1 They don't (yet) show up how they ideally might, but we (correctly) don't label them as IPs. |
personally i think this test is pretty reasonable, e.g. i understand better what change you're going for by reading the examples in the test |
2922327 to
46ce9a9
Compare
|
Okay, so I added the test in #114458 and rebased this on top of that, so that now the changes here are reflected in just the relevant test cases getting updated. |
This tightens our IPv4 regex to only allow valid values in each spot (nothing higher than 255, no leading zeros). No behavior changes, but should reduce the number of times we're hitting the false-positive fallback (which is slower than the happy path).
This tightens our IPv4 regex to only allow valid values in each spot (nothing higher than 255, no leading zeros). No behavior changes, but should reduce the number of times we're hitting the false-positive fallback (which is slower than the happy path).