Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: phishing detection #9610

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Feat: phishing detection #9610

wants to merge 16 commits into from

Conversation

hamza221
Copy link
Contributor

@hamza221 hamza221 commented Apr 29, 2024

Ref #9453

}

private function isLink(string $text): bool {
$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not using a custom Regexp.

But if I understand correctly this method isn't supposed to match only fully qualified URLs, but all text that looks roughly like a URL (as part of a link text).

If I'm correct then FILTER_VALIDATE_URL would be too strict because it doesn't accept e.g. cloud.nextcloud.com/p/something. Additionally it doesn't accept URLs containing non-ASCII characters.

I'd suggest to use a library to find URLs in text. The Horde lib "Text_Filter" contains a class that links all found URLs, maybe some methods of that can be used?

Also maybe the method could be renamed to something more telling (e.g. "textLooksLikeALink()"), and get a comment about its purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/gruber/8891611 maybe this can be a more robust solution ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text_Filter is based on it

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the original source, yes. But a library might bring the benefit that possible changes (there's a list of TLDs in there, which may change) are tended for by multiple people/projects – I'm unsure about the Horde libs in this regard, though.

use OCA\Mail\AddressList;
use OCA\Mail\Contracts\ITrustedSenderService;

class PhishingDetectionService {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this new class needs tests

lib/Service/PhishingDetectionService.php Outdated Show resolved Hide resolved
}
foreach ($zippedArray as $zipped) {
if($this->isLink($zipped['linkText'])) {
if (str_contains($zipped['linkText'], $zipped['href']) === false) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to use a library to normalize the href-content, because encodings and funny details (e.g. data-URLs) can easily break this comparison even if both parts actually reference the same resource.

Also currently this check would flag a link that e.g. doesn't contain its URL-scheme in its link text, which seems a bit harsh to me? (Example that would be flagged: <a href="https://cloud.nextcloud.com/">cloud.nextcloud.com</a>).

}

private function isLink(string $text): bool {
$pattern = '/^(https?:\/\/|www\.|[a-zA-Z0-9-]+\.[a-zA-Z]{2,})/i';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not using a custom Regexp.

But if I understand correctly this method isn't supposed to match only fully qualified URLs, but all text that looks roughly like a URL (as part of a link text).

If I'm correct then FILTER_VALIDATE_URL would be too strict because it doesn't accept e.g. cloud.nextcloud.com/p/something. Additionally it doesn't accept URLs containing non-ASCII characters.

I'd suggest to use a library to find URLs in text. The Horde lib "Text_Filter" contains a class that links all found URLs, maybe some methods of that can be used?

Also maybe the method could be renamed to something more telling (e.g. "textLooksLikeALink()"), and get a comment about its purpose?

lib/Service/PhishingDetectionService.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetectionService.php Outdated Show resolved Hide resolved
];
}
foreach ($zippedArray as $zipped) {
if($this->isLink($zipped['linkText'])) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches only if the link text only contains one URL-like "word". An anchor element with a link text like "login at cloud.nextcloud.com" would not be checked in this code.

@kesselb
Copy link
Contributor

kesselb commented May 28, 2024

Great presentation today 👍

Sorry, I didn't have the opportunity to review the pr.

  1. I'm uncertain about the reply-to warning. The purpose of reply-to is to send the response to a different email. I wonder if we want to use a less scary warning for such a case. For example, our calendar invitations are also using reply-to.

  2. The PhishingDetectionService reminds me a bit about our CheckSetupController in server. I think we could move every check to an own class. Testing is also easier than. Wdyt @ChristophWurst?

Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
@hamza221 hamza221 mentioned this pull request May 29, 2024
15 tasks
Copy link
Member

@st3iny st3iny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not test but left some comments on the code.

lib/IMAP/ImapMessageFetcher.php Outdated Show resolved Hide resolved
lib/PhishingDetectionResult.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetection/ReplyToCheck.php Outdated Show resolved Hide resolved
lib/PhishingDetectionList.php Outdated Show resolved Hide resolved
lib/PhishingDetectionList.php Show resolved Hide resolved
lib/IMAP/ImapMessageFetcher.php Outdated Show resolved Hide resolved
lib/PhishingDetectionList.php Outdated Show resolved Hide resolved
lib/PhishingDetectionResult.php Outdated Show resolved Hide resolved
lib/Service/PhishingDetection/DateCheck.php Outdated Show resolved Hide resolved
lib/Address.php Show resolved Hide resolved
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
$this->customEmailCheck = $customEmailCheck;
$this->dateCheck = $dateCheck;
$this->replyToCheck = $replyToCheck;
$this->list = new PhishingDetectionList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a temporary variable. Move it into checkHeadersForPhishing or anywhere else it's used.

The current way introduces side effects and if you call checkHeadersForPhishing twice you get outdated results too from the previous invocation.

Signed-off-by: Hamza Mahjoubi <hamzamahjoubi221@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants