-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: phishing detection #9610
base: main
Are you sure you want to change the base?
Feat: phishing detection #9610
Conversation
} | ||
|
||
private function isLink(string $text): bool { | ||
$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for not using a custom Regexp.
But if I understand correctly this method isn't supposed to match only fully qualified URLs, but all text that looks roughly like a URL (as part of a link text).
If I'm correct then FILTER_VALIDATE_URL
would be too strict because it doesn't accept e.g. cloud.nextcloud.com/p/something
. Additionally it doesn't accept URLs containing non-ASCII characters.
I'd suggest to use a library to find URLs in text. The Horde lib "Text_Filter" contains a class that links all found URLs, maybe some methods of that can be used?
Also maybe the method could be renamed to something more telling (e.g. "textLooksLikeALink()"), and get a comment about its purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://gist.github.com/gruber/8891611 maybe this can be a more robust solution ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Text_Filter is based on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the original source, yes. But a library might bring the benefit that possible changes (there's a list of TLDs in there, which may change) are tended for by multiple people/projects – I'm unsure about the Horde libs in this regard, though.
use OCA\Mail\AddressList; | ||
use OCA\Mail\Contracts\ITrustedSenderService; | ||
|
||
class PhishingDetectionService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this new class needs tests
} | ||
foreach ($zippedArray as $zipped) { | ||
if($this->isLink($zipped['linkText'])) { | ||
if (str_contains($zipped['linkText'], $zipped['href']) === false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to use a library to normalize the href-content, because encodings and funny details (e.g. data-URLs) can easily break this comparison even if both parts actually reference the same resource.
Also currently this check would flag a link that e.g. doesn't contain its URL-scheme in its link text, which seems a bit harsh to me? (Example that would be flagged: <a href="https://cloud.nextcloud.com/">cloud.nextcloud.com</a>
).
} | ||
|
||
private function isLink(string $text): bool { | ||
$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for not using a custom Regexp.
But if I understand correctly this method isn't supposed to match only fully qualified URLs, but all text that looks roughly like a URL (as part of a link text).
If I'm correct then FILTER_VALIDATE_URL
would be too strict because it doesn't accept e.g. cloud.nextcloud.com/p/something
. Additionally it doesn't accept URLs containing non-ASCII characters.
I'd suggest to use a library to find URLs in text. The Horde lib "Text_Filter" contains a class that links all found URLs, maybe some methods of that can be used?
Also maybe the method could be renamed to something more telling (e.g. "textLooksLikeALink()"), and get a comment about its purpose?
]; | ||
} | ||
foreach ($zippedArray as $zipped) { | ||
if($this->isLink($zipped['linkText'])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches only if the link text only contains one URL-like "word". An anchor element with a link text like "login at cloud.nextcloud.com" would not be checked in this code.
Great presentation today 👍 Sorry, I didn't have the opportunity to review the pr.
|
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
71db744
to
88e87cd
Compare
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not test but left some comments on the code.
0c914e1
to
b3d696b
Compare
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
815468d
to
450ceaf
Compare
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
$this->customEmailCheck = $customEmailCheck; | ||
$this->dateCheck = $dateCheck; | ||
$this->replyToCheck = $replyToCheck; | ||
$this->list = new PhishingDetectionList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a temporary variable. Move it into checkHeadersForPhishing
or anywhere else it's used.
The current way introduces side effects and if you call checkHeadersForPhishing
twice you get outdated results too from the previous invocation.
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Ref #9453