
ref(grouping): Restrict values in IPv4 regex#114362

Merged
lobsterkatie merged 2 commits into master from kmclb-restrict-values-in-ipv4-regex
Apr 30, 2026

Conversation

@lobsterkatie
Member

This tightens our IPv4 regex to only allow valid values in each spot (nothing higher than 255, no leading zeros). No behavior changes, but should reduce the number of times we're hitting the false-positive fallback (which is slower than the happy path).
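
For readers unfamiliar with the idea, here is a minimal sketch of what "only allow valid values in each spot" can look like. It is not the regex from this PR: `LOOSE_IPV4`, `OCTET`, `STRICT_IPV4`, and `matches` are hypothetical names invented for illustration, and the lookarounds are just one way to keep a pattern from matching inside a longer dotted number.

```python
import re

# Hypothetical patterns for illustration only; NOT the regex changed in this PR.

# Loose: any four dot-separated groups of 1-3 digits, so "12.31.12.908" matches.
LOOSE_IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

# One octet: 0-255, with no leading zeros ("0" is fine, "001" is not).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)"

# Strict: four valid octets, not preceded or followed by another digit or dot,
# so the pattern can't match a slice of a longer dotted sequence.
STRICT_IPV4 = re.compile(rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?![\d.])")


def matches(pattern: re.Pattern[str], text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None
```

With these definitions, `matches(LOOSE_IPV4, "12.31.12.908")` is true while `matches(STRICT_IPV4, "12.31.12.908")` is false, so a stricter pattern never hands that string to a slower validation step in the first place.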

@github-actions (bot) added the Scope: Backend label (automatically applied to PRs that change backend components) Apr 29, 2026
@lobsterkatie lobsterkatie marked this pull request as ready for review April 29, 2026 21:33
@lobsterkatie lobsterkatie requested a review from a team as a code owner April 29, 2026 21:33
Contributor

@cvxluo cvxluo left a comment

should we add a test for this too?

@lobsterkatie
Member Author

> should we add a test for this too?

So I actually wrote a test for the false-positive-ness part of the IP pattern (asserting that it either did or didn't call the is_valid_ip helper on things we know not to be valid IP addresses) but it felt kinda like overkill? It looked like this:

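from unittest.mock import MagicMock, patch

import pytest

# NOTE: `is_valid_ip` and `parameterizer` are assumed to be imported/defined elsewhere in the test
# module; their import paths aren't shown in this comment.

# Cases where we might or might not trigger a false positive with our IP regex (necessitating use of
# the slower fallback parameterization method if we do). The goal is to have as many of these as
# possible have False for their third parameter.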
ip_false_positive_cases = [
    # (name, input, whether callback is expected to have been called)
    ("ip - too many initial characters", "12345::6:789", False),
    ("ip - too many final characters", "123:4::56789", False),
    ("ip - too many initial colons", ":::1121", False),
    ("ip - too many interior colons", "1231:::1121", True),
    ("ip - too many final colons", "1231:::", False),
    ("ip - three colons alone", ":::", False),
    ("ip - single leading colon", "Script error. :0:0", False),
    ("ip - single trailing colon", "12::31:", False),
    ("ip - too few segments", "12:31:99", True),
    ("ip - v4 leading zeros", "11.21.12.001", False),
    ("ip - v4 segment > 255", "12.31.12.908", False),
    ("ip - v4 too many segments", "11.21.12.31.12", False),
    ("date - colon btwn date and time", "21/Nov/2012:12:31:12", True),
]


@pytest.mark.parametrize(("name", "input", "callback_call_expected"), ip_false_positive_cases)
@patch("sentry.grouping.parameterization.is_valid_ip", wraps=is_valid_ip)
def test_ip_false_positives(
    mock_is_valid_ip: MagicMock, name: str, input: str, callback_call_expected: bool
) -> None:
    parameterizer.parameterize(input)

    if callback_call_expected:
        mock_is_valid_ip.assert_called()
    else:
        mock_is_valid_ip.assert_not_called()

I could add it back in... WDYT?
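
For anyone skimming, the test above relies on a "spy" pattern: wrap the real validator in a mock so it still runs, then assert on whether it was called. Below is a self-contained sketch of that idea, under loud assumptions: `CANDIDATE`, `parameterize`, and this `is_valid_ip` are hypothetical stand-ins rather than Sentry's actual parameterizer, and the "fallback" here simply leaves the text unchanged, which may differ from what the real code does.

```python
import ipaddress
import re
from unittest.mock import MagicMock

# Hypothetical loose candidate pattern (assumption): four groups of 1-3 digits.
CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def is_valid_ip(value: str) -> bool:
    """Stand-in validator built on the stdlib; not Sentry's helper."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def parameterize(text: str, validator=is_valid_ip) -> str:
    """Replace validated candidates with <ip>; leave false positives untouched."""

    def replace(match: re.Match[str]) -> str:
        candidate = match.group(0)
        # Every regex hit costs a validator call, even if the hit is bogus.
        return "<ip>" if validator(candidate) else candidate

    return CANDIDATE.sub(replace, text)


# Spy on the validator: `wraps` makes the mock delegate to the real function.
spy = MagicMock(wraps=is_valid_ip)
result = parameterize("host 12.31.12.908 down", validator=spy)

# The loose pattern matched, so the validator ran once and rejected the value.
spy.assert_called_once_with("12.31.12.908")
```

With a stricter candidate pattern, the same input would never reach the validator, which is the situation `assert_not_called()` checks for in the test cases above.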

As for showing that in the end we don't parameterize bogus values as IP addresses, the third- and second-to-last cases above are also included as regular tests:

(
    "ip - v4, leading zeros",
    "11.21.12.001",
    "<int>.<int>.<int>.<int>",
    "<float>.<float>",
),
(
    "ip - v4, segment > 255",
    "12.31.12.908",
    "<int>.<int>.<int>.<int>",
    "<float>.<float>",
),

They don't (yet) show up how they ideally might, but we (correctly) don't label them as IPs.

@cvxluo
Contributor

cvxluo commented Apr 30, 2026

> So I actually wrote a test for the false-positive-ness part of the IP pattern (asserting that it either did or didn't call the is_valid_ip helper on things we know not to be valid IP addresses) but it felt kinda like overkill? It looked like this:

personally i think this test is pretty reasonable, e.g. i understand better what change you're going for by reading the examples in the test

@lobsterkatie lobsterkatie force-pushed the kmclb-restrict-values-in-ipv4-regex branch from 2922327 to 46ce9a9 Compare April 30, 2026 19:19
@lobsterkatie
Member Author

Okay, so I added the test in #114458 and rebased this on top of that, so that now the changes here are reflected in just the relevant test cases getting updated.

@lobsterkatie lobsterkatie merged commit 2e38b9c into master Apr 30, 2026
76 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-restrict-values-in-ipv4-regex branch April 30, 2026 19:32
cleptric pushed a commit that referenced this pull request May 5, 2026