Email Protection Systems Generate Invalid Traffic #9798

Closed
mabumusa1 opened this issue Mar 17, 2021 · 12 comments
Labels
bug Issues or PR's relating to bugs email Anything related to email enhancement Any improvement to an existing feature or functionality stale Issues which have not received an update within 90 days T3 Hard difficulty to fix (issue) or test (PR) tracking Anything related to tracking

Comments

@mabumusa1
Member

mabumusa1 commented Mar 17, 2021

| Q | A |
| --- | --- |
| Mautic version | ANY |
| PHP version | ANY |
| Browser | ANY |

Bug Description

This is not a Mautic bug in itself, but it affects Mautic heavily. Many email protection systems, such as https://www.proofpoint.com/us/products/email-security-and-protection/email-protection, do the following to emails sent by Mautic:

  1. They collect all the emails.
  2. They parse the emails and follow every link, including the tracking pixels, so every email sent is marked as read.
  3. They scramble the UTM codes and follow the links with the scrambled UTM values, which corrupts the analytics.
  4. The traffic comes from different IPs with different headers, including the User-Agent, which makes the invalid traffic hard to block.
  5. To make matters worse, they follow the unsubscribe link and unsubscribe the whole list you sent to. Without a preference center, your entire list is marked as DNC.

I opened this thread for discussion, as there is no clear way to solve it.

@mabumusa1 mabumusa1 added the needs-triage For new issues/PRs that need to be triaged label Mar 17, 2021
@kuzmany
Member

kuzmany commented Mar 18, 2021

We have had a lot of discussion about this in our company. We've talked about two solutions:

  1. Write an algorithm to detect bot clicks based on timing, thresholds, and the number of clicks. This requires a lot of changes, such as a new is_bot column in email_stats and channel_url_trackables, and the results would be questionable.
    This is similar to what HubSpot has already done: https://community.hubspot.com/t5/Email-Marketing-Tool/Are-Bots-Affecting-Your-Email/td-p/302428

  2. Invisible reCAPTCHA v3

Add a page before the redirect that runs reCAPTCHA and decides based on the score:

  • good score - redirect to the page
  • bad score - show a link so the visitor can go to the page manually

I like this idea.

@mabumusa1 do you have any opinion?
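
For what it's worth, option 1 above could look roughly like the sketch below. This is only an illustration, not how Mautic does it: the dict keys, the 10-second window and the 3-click threshold are all assumptions, and the returned indexes stand in for rows that would get is_bot set.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_bot_clicks(clicks, window=timedelta(seconds=10), max_clicks=3):
    """Return the indexes of clicks that look automated.

    `clicks` is a list of dicts with the hypothetical keys
    "email_id", "contact_id", "url" and "at" (a datetime).
    A click is flagged when more than `max_clicks` clicks by the same
    contact on the same email fall inside one `window`.
    """
    by_recipient = defaultdict(list)
    for index, click in enumerate(clicks):
        by_recipient[(click["email_id"], click["contact_id"])].append(index)

    flagged = set()
    for indexes in by_recipient.values():
        indexes.sort(key=lambda i: clicks[i]["at"])
        start = 0
        for end in range(len(indexes)):
            # Shrink the window from the left until it spans at most `window`.
            while clicks[indexes[end]]["at"] - clicks[indexes[start]]["at"] > window:
                start += 1
            if end - start + 1 > max_clicks:
                flagged.update(indexes[start:end + 1])
    return flagged


# Example: five clicks one second apart, then one click an hour later.
t0 = datetime(2021, 3, 17, 9, 0, 0)
example_clicks = [
    {"email_id": 1, "contact_id": 7, "url": f"https://example.com/{n}",
     "at": t0 + timedelta(seconds=n)}
    for n in range(5)
]
example_clicks.append({"email_id": 1, "contact_id": 7,
                       "url": "https://example.com/0",
                       "at": t0 + timedelta(hours=1)})
print(sorted(flag_bot_clicks(example_clicks)))
```

The weak point is the one already mentioned: a fast human and a slow scanner look alike, so any threshold chosen here is a guess that has to be checked against real data.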

@YosuCadilla

@mabumusa1 I developed a Mautic plugin for a client who had this problem.

This type of software is usually used by large companies running their own email servers.

We tried to find an elegant, scientific way to solve the problem, such as identifying the browser agent and IPs, and we looked at the data for other ways to isolate the bad clicks or whatever was producing them. We tested a few approaches; none worked consistently...

Browser agents change a lot. We identified a few browser agents doing a lot of damage, but in the next round of emails those had changed. Also, if you check the data over time, browser agents that were clearly harmful for one email were part of what looked like legitimate clicks in other emails. A large number of fake clicks usually comes from a handful of "bad browser agents", but that is maybe 50% of the total. The rest of the bad browsers make just one or a few clicks, so patterns are hard to define. Maybe a good job for an AI.

I was also unsuccessful at isolating IPs. These are corporate servers behind corporate networks with reverse proxies, reaching out to the internet over a pool of IPs. In one extreme case, the same person (contact) clicked on a link from 5 different locations around the US, from coast to coast, within a 10-second window.
The issue is that both legitimate and fake clicks usually come from the very same IPs, so no joy...

What ended up working decently well for us was to add an invisible link to all outgoing emails. Once a click on the invisible link happens, we check all the clicks in a 10-second window and eliminate all of them (we copy them to a different table for further analysis).
This works under the assumption that security bots/scanners click on all the links in an email, and it is eliminating (probably) well over 85% of the bad clicks. However, when we look at the data there are a few SMALL inconsistencies here and there, so the method is not perfect. Tweaking the time window and the position of the invisible link in the email increased the effectiveness by 10%, so it is well worth dedicating some time to this.
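
To make the idea concrete, here is a minimal sketch of the invisible-link filter. It is not the plugin's actual code: the dict keys, the honeypot URL, and the choice to scope the window to one contact and one email are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def split_clicks(clicks, honeypot_url, window=timedelta(seconds=10)):
    """Split clicks into (kept, quarantined).

    `clicks` is a list of dicts with the hypothetical keys
    "email_id", "contact_id", "url" and "at" (a datetime).
    Every click within +/- `window` of a honeypot click by the same
    contact on the same email is quarantined, including the honeypot
    click itself.
    """
    # Collect the times at which each recipient hit the invisible link.
    honeypot_hits = {}
    for click in clicks:
        if click["url"] == honeypot_url:
            key = (click["email_id"], click["contact_id"])
            honeypot_hits.setdefault(key, []).append(click["at"])

    kept, quarantined = [], []
    for click in clicks:
        key = (click["email_id"], click["contact_id"])
        hits = honeypot_hits.get(key, [])
        if any(abs(click["at"] - hit) <= window for hit in hits):
            quarantined.append(click)  # copy to a separate table for analysis
        else:
            kept.append(click)
    return kept, quarantined
```

Run from a cronjob, `quarantined` would be moved to the analysis table and only `kept` would stay in the stats. A bot that skips the invisible link is not caught at all by this.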

After a few adjustments to the scripts, the CMO of the company decided this method was doing the job well enough, and there was no need to double the development cost to squeeze an extra 5% reliability, hence no further development or research was deemed necessary.
It's been working for a few months already, we have some surprises now and then, but nothing big enough to make us consider more research or new development for now.

If you ask me, this is an excellent problem for an AI. Finding patterns is what these excel at, so if we ever decide to improve the current scripts, I will strongly recommend training an AI with the data from the Mautic database and seeing what comes out.

Another thing that might change is the moment in time we run the filters. Right now we are running the scripts from a cronjob, hence the data first makes it to the database and then is evaluated and removed if deemed wrong.
The next iteration, if it ever happens, will be implemented as an external, real-time pre-filter (probably at the Apache level), so the bad clicks never make it to the database in the first place.
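
One thing a real-time version has to deal with is that some bot clicks arrive before the invisible link is hit, so they cannot be rejected on arrival. A rough sketch of one way around that is to hold each click for the length of the window before writing it. Everything here is hypothetical (class name, dict keys, the buffering itself), and it assumes clicks arrive in time order.

```python
from datetime import datetime, timedelta


class ClickPreFilter:
    """Hold clicks for `window` before releasing them for storage."""

    def __init__(self, honeypot_url, window=timedelta(seconds=10)):
        self.honeypot_url = honeypot_url
        self.window = window
        self.pending = []   # clicks still inside their holding period
        self.tripped = {}   # (email_id, contact_id) -> time of last honeypot hit

    def add(self, click):
        key = (click["email_id"], click["contact_id"])
        if click["url"] == self.honeypot_url:
            self.tripped[key] = click["at"]
            # Drop held clicks from the same recipient made shortly before the hit.
            self.pending = [
                p for p in self.pending
                if (p["email_id"], p["contact_id"]) != key
                or click["at"] - p["at"] > self.window
            ]
            return
        last_hit = self.tripped.get(key)
        if last_hit is not None and click["at"] - last_hit <= self.window:
            return  # arrived shortly after a honeypot hit: discard
        self.pending.append(click)

    def release(self, now):
        """Return the clicks whose holding period has passed."""
        ready = [p for p in self.pending if now - p["at"] > self.window]
        self.pending = [p for p in self.pending if now - p["at"] <= self.window]
        return ready
```

The trade-off is that every click reaches the database at least one window late, which may or may not matter for campaign actions triggered by clicks.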

Interesting possibility with the Recaptcha @kuzmany, so basically every link would point to or be intercepted by a "proxy page" where the Recaptcha lives and then redirected to the real page, right?

@kuzmany
Member

kuzmany commented Mar 19, 2021

@YosuCadilla thank you for sharing your experience.

> What ended up working decently well for us was to add an invisible link to all outgoing emails, then once a click to the invisible link happens, we check all the clicks in a 10-second window and we eliminate all of them... Tweaking the time window as well as the position of the invisible link on the email allowed us to increase the effectiveness by 10%, so well worth dedicating some time to this.

Did you increase or decrease the time threshold?
Does that mean that, after tweaking, the invisible link caught at least 90% of the bot clicks?

> Interesting possibility with the Recaptcha @kuzmany, so basically every link would point to or be intercepted by a "proxy page" where the Recaptcha lives and then redirected to the real page, right?

Yes. All URLs are tracked, so it's easy to add a page before the redirection routine (stats, redirect) and continue to the standard redirection once verification has passed.
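
A minimal sketch of the decision step on that intermediate page, assuming the server has already sent the token to Google's siteverify endpoint and parsed the JSON reply into a dict. The function name and return values are hypothetical, and 0.5 is only a starting threshold to tune against real traffic.

```python
def decide_redirect(verify_response, target_url, threshold=0.5):
    """Decide what the intermediate page should do with a click.

    `verify_response` is assumed to be the parsed JSON from a
    reCAPTCHA v3 verification, which carries a boolean "success" and,
    when successful, a "score" between 0.0 and 1.0.
    Returns ("redirect", url) for a good score and
    ("interstitial", url) otherwise, where the interstitial shows a
    link the visitor can click manually.
    """
    passed = (
        verify_response.get("success") is True
        and verify_response.get("score", 0.0) >= threshold
    )
    return ("redirect" if passed else "interstitial", target_url)


print(decide_redirect({"success": True, "score": 0.9}, "https://example.com/offer"))
print(decide_redirect({"success": True, "score": 0.1}, "https://example.com/offer"))
```

Only the "redirect" branch would run the usual stats routine automatically; on the interstitial, the click would be counted when the visitor follows the manual link.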

@YosuCadilla

> Did you increase or decrease the time threshold?

There isn't a perfect number. Each click seems to take about one second (though this can vary by destination domain), so the final timing depends on the position of the invisible link relative to the other links in the email and on the number of links in the email.
For example, we ended up using +/- 10 seconds, because the invisible link is in the middle and there are 7-8 links in each email.
I think you can increase the window to 20, 30 or even more seconds. The risk is that if the real recipient happens to open the email and click a link within this window, that click is discarded. So the shorter the window the better, as long as it leaves enough time to catch all the fake clicks.
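
As a back-of-the-envelope helper for picking a starting window, something like the sketch below follows from the reasoning above. The formula and the safety factor are assumptions for illustration, not what we run; the result still needs tuning against real data.

```python
def suggested_window_seconds(total_links, honeypot_position,
                             seconds_per_click=1.0, safety_factor=2.0):
    """Estimate a +/- window (in seconds) around the invisible link.

    `total_links` counts every link in the email, including the
    invisible one; `honeypot_position` is its 1-based position.
    The estimate is the time a scanner needs to click the longer run
    of links on either side of the invisible link, multiplied by a
    hypothetical safety factor.
    """
    if not 1 <= honeypot_position <= total_links:
        raise ValueError("honeypot_position must be between 1 and total_links")
    links_before = honeypot_position - 1
    links_after = total_links - honeypot_position
    return max(links_before, links_after) * seconds_per_click * safety_factor


# Eight visible links plus the invisible link placed in the middle.
print(suggested_window_seconds(total_links=9, honeypot_position=5))
```

Placing the invisible link in the middle keeps the longer side as short as possible, which is why the position matters as much as the window itself.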

> Does that mean that, after tweaking, the invisible link caught at least 90% of the bot clicks?

A clarification about the percentage of true clicks (effectiveness): what we measured is the percentage of true/failed detections among the positives (emails with a click on the invisible link), meaning how good or bad the script is at catching all the fake clicks once the invisible link has been clicked. If a bot clicks just one, a few, or all of the links except the invisible link, we don't see anything at all (and that's why the method is good enough but far from perfect).

However, our level of detected bad clicks matches what others described on the HubSpot thread, and the click ratios are now much more aligned with industry standards.

@RCheesley RCheesley added bug Issues or PR's relating to bugs email Anything related to email enhancement Any improvement to an existing feature or functionality tracking Anything related to tracking T3 Hard difficulty to fix (issue) or test (PR) and removed needs-triage For new issues/PRs that need to be triaged labels Mar 24, 2021
@stale

stale bot commented Jun 22, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issues which have not received an update within 90 days label Jun 22, 2021
@stale

stale bot commented Jul 7, 2021

This issue has been automatically closed because it has not had recent activity. If the reported issue persists, please create a new issue and link back to this one for reference. Thank you for your contributions.

@adiux
Contributor

adiux commented Sep 10, 2021

I think this issue is important and we should discuss and address it in the community.

@adiux adiux reopened this Sep 10, 2021
@stale stale bot removed the stale Issues which have not received an update within 90 days label Sep 10, 2021
@mautibot

This issue has been mentioned on Mautic Community Forums. There might be relevant details there:

https://forum.mautic.org/t/possible-work-around-for-reporting-open-and-clicks-without-bot-data/16989/13

@kuzmany
Member

kuzmany commented Oct 5, 2021

We've already been working on a solution with a reCAPTCHA page before the destination page.
I will report data when we get it.
This PR is part of it: #10503

@stale

stale bot commented Jan 3, 2022

This issue or PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you would like to keep it open please let us know by replying and confirming that this is still relevant to the latest version of Mautic and we will try to get to it as soon as we can. Thank you for your contributions.

@stale stale bot added the stale Issues which have not received an update within 90 days label Jan 3, 2022
@stale

stale bot commented Jan 17, 2022

This issue or PR has been automatically closed because it has not had recent activity. In the case of issues, if it persists in the latest version of Mautic, please create a new issue and link back to this one for reference. With PRs if you wish to pick up the PR and update it so that it can be considered for a future release, please comment and we will re-open it. Thank you for your contributions.

@stale stale bot closed this as completed Jan 17, 2022
@github-actions
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If this issue is continuing with the latest stable version of Mautic, please open a new issue that references this one.

6 participants