Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ahocorasick: improve matching with subdomains #2331

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

IvanNardi
Copy link
Collaborator

@IvanNardi IvanNardi commented Feb 27, 2024

The basic idea is to have the following logic:

  • pattern "DOMAIN" matches the domain itself (i.e exact match) and any subdomains (i.e. "ANYTHING.DOMAIN")
  • pattern "DOMAIN." matches also any strings for which is a prefix [please, note that this kind of match is handy but it is quite dangerous...]
  • pattern "-DOMAIN" matches also any strings for which is a postfix

Examples:

  • pattern "wikipedia.it":
    • "wikipiedia.it" -> OK
    • "foo.wikipedia.it -> OK
    • "foowikipedia.it -> NO MATCH
    • "wikipedia.it.com -> NO MATCH
  • pattern "wikipedia.":
    • "wikipedia.it" -> OK
    • "foo.wikipedia.it -> OK
    • "foowikipedia.it -> NO MATCH
    • "wikipedia.it.com -> OK
  • pattern "-wikipedia.it":
    • "wikipedia.it" -> NO MATCH
    • "foo.wikipedia.it -> NO MATCH
    • "0001-wikipedia.it -> OK
    • "foo.0001-wikipedia.it -> OK

Bottom line:

  • exact match
  • prefix with "." (always, implicit)
  • prefix with "-" (only if explicitly set)
  • postfix with "." (only if explicitly set)

That means that the patterns cannot start with '.' anymore.

Close #2330

@IvanNardi IvanNardi marked this pull request as draft February 27, 2024 11:06
@IvanNardi IvanNardi force-pushed the ahocorasick branch 4 times, most recently from c12b755 to 98ec618 Compare February 27, 2024 17:18
@lucaderi
Copy link
Member

I am good with the patch even though I think that it should be better to just patch the engine and not the individual protocols. When ready we can merge it

@IvanNardi IvanNardi force-pushed the ahocorasick branch 2 times, most recently from a7ee66f to 82f6dc5 Compare February 28, 2024 18:54
@IvanNardi IvanNardi marked this pull request as ready for review February 28, 2024 18:55
@IvanNardi
Copy link
Collaborator Author

Latest version changes the code only in the callback

@IvanNardi
Copy link
Collaborator Author

The error reported by the fuzzing job is unrelated to this change: it is exactly the bug reported in #2258

The basic idea is to have the following logic:
* pattern "DOMAIN" matches the domain itself (i.e exact match) *and* any
subdomains (i.e. "ANYTHING.DOMAIN")
* pattern "DOMAIN." matches *also* any strings for which is a prefix
[please, note that this kind of match is handy but it is quite
dangerous...]
* pattern "-DOMAIN" matches *also* any strings for which is a postfix

Examples:
* pattern "wikipedia.it":
  * "wikipiedia.it" -> OK
  * "foo.wikipedia.it -> OK
  * "foowikipedia.it -> NO MATCH
  * "wikipedia.it.com -> NO MATCH
* pattern "wikipedia.":
  * "wikipedia.it" -> OK
  * "foo.wikipedia.it -> OK
  * "foowikipedia.it -> NO MATCH
  * "wikipedia.it.com -> OK
* pattern "-wikipedia.it":
  * "wikipedia.it" -> NO MATCH
  * "foo.wikipedia.it -> NO MATCH
  * "0001-wikipedia.it -> OK
  * "foo.0001-wikipedia.it -> OK

Bottom line:
* exact match
* prefix with "." (always, implicit)
* prefix with "-" (only if esplicitly set)
* postfix with "." (only if esplicitly set)

That means that the patterns cannot start with '.' anymore.

Close ntop#2330
Copy link

sonarcloud bot commented Mar 6, 2024

Quality Gate Passed Quality Gate passed

Issues
263 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@IvanNardi IvanNardi merged commit 21da53d into ntop:dev Mar 6, 2024
24 of 33 checks passed
@IvanNardi IvanNardi deleted the ahocorasick branch March 6, 2024 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ahocorasick: wrong match
2 participants