Extend list of known Referrer Spammers #5099

Closed
mattab opened this issue May 6, 2014 · 128 comments

Comments

@mattab
Member

@mattab mattab commented May 6, 2014

In #2268 we have implemented a Referrer spam list, initially seeded with the worst of all spammers: semalt.

However it turns out there are thousands of other spammers that attack Piwik users with their lame websites. It will be hard to keep track of them all and write the list in the config file here.

List of spammers:

What is our best way to move forward?

@anonymous-piwik-user

@anonymous-piwik-user anonymous-piwik-user commented May 6, 2014

I'll start:

How about an antispam plugin that adds a 'Spam' button next to each visit? Not sure about how this would work on the backend, whether it is a straight filter, a learning feature (SpamAssassin) for each installation or a database managed some place (akismet).

@anonymous-piwik-user

@anonymous-piwik-user anonymous-piwik-user commented May 7, 2014

Well, I don't have the technical know-how to discuss how to do this. But a button next to each referer which can send it to a block list, sounds like a good idea to me.

Mostly I'm visiting to report what I think is another referer spammer:

web.mail.comcast.net (http://web.mail.comcast.net/zimbra/mail?app=mail)

@tassoman
Contributor

@tassoman tassoman commented May 7, 2014

I can't say if it's useful in this case but in the past I was very happy with Bad Behavior

@anonymous-piwik-user

@anonymous-piwik-user anonymous-piwik-user commented May 7, 2014

Re: Bad Behavior

I don't think it's the right tool for this job. I had a longer response that I spent ~20 minutes writing, but it was destroyed by the Trac spam checker for a pattern that I don't recall having in the message.

I think if we are going to have a discussion on spam, then we need to disable spam control on this task, have the discussion some place else entirely, or post all messages to pastebin, because it seems like every message I try to post here is getting caught by the spam checker.

@mattab
Member Author

@mattab mattab commented May 8, 2014

> it was destroyed by the Trac spam checker for a pattern that I don't recall having in the message.

I highly recommend using this; it has saved me hours, even days, of frustration: https://addons.mozilla.org/en-US/firefox/addon/lazarus-form-recovery/

@anonymous-piwik-user

@anonymous-piwik-user anonymous-piwik-user commented May 9, 2014

Replying to tassoman:
I can't say if it's useful in this case but in the past I was very happy with Bad Behavior (link removed)

I have Bad Behavior installed (on my SMF/Tiny Portal site), yet still am "visited" by semalt almost daily (at least until the Referer Spam blacklist comes out in a stable release).

And I have another question. Would it be a 2-way button, like a toggle, so that if you accident-ally clicked it, you could click again to unblock? If undoing the block will be hard to do, perhaps it should not be quite so convenient. Or maybe have a confirmation "are you sure you want to ....." or "OK" button?


Idk what might be wrong with this message, but I also got the spam error. It said it has a "dental" pattern. What the heck??!

(I'm removing the link from the quote, maybe that's the problem.)(No, still "dental"..... I'll try removing the whole quote.) (Maybe it means a text string? I'll try breaking up accident-ally.)

Edit -- Bingo! Seems to be the text string.

@mattab
Member Author

@mattab mattab commented May 11, 2014

In acb1bc2: Actually call the Referrer Spam check.
Fixes #2268 Refs #5099

@canajun2eh

@canajun2eh canajun2eh commented May 25, 2014

I think the list of referrers to be excluded from Piwik reports is a good idea, and Matt has done a good job in getting this started.

However, the current implementation requires manual editing of the configuration file, which is generally not recommended for the average user. It would be better if there were a user interface for this feature, much like the way in which the "Settings --> Manage Websites --> Global websites settings --> Global list of user agents to exclude" list can be managed by a Superuser. Site spammers to be excluded should be listed one per line sorted alphabetically. Adding or removing a referrer would be as simple as adding or removing an entry from the displayed list.

The out-of-the-box list should be pre-loaded with semalt.com and possibly one or two other known site spammers. If somebody then really wants to count such referrers, they can remove them from the predefined list.

This simple user interface would be much easier to implement than having a button somewhere to add a displayed referrer to the Exclude list.

@anonymous-piwik-user

@anonymous-piwik-user anonymous-piwik-user commented May 31, 2014

Whatever solution is used, I hope that the future of this nuisance is considered. When I first started looking at access logs, maybe 12 years ago, I never saw any such thing as referral spam. Now, as seen in this pastebin, I get 54 spammers from one week. I don't recall when they started doing this, but I suspect it is only going to grow. I don't know if it will reach the epic proportions of comment spam, but whatever tool is created I hope this is taken into account to reduce workload of future devs.

@mattab mattab added this to the 2.x - The Great Piwik 2.x Backlog milestone Jul 8, 2014
@brynnd

@brynnd brynnd commented Jul 20, 2014

Ok, looks like I'm official now :-p Hope I'm doing this right....

I got a new referer the other day, and when I followed the link, my system security (ESET) blocked it. Here's the URL: http+://youtube-downloader.savetubevideo.com/youtube-downloader.php?u=http+://mydomain.

And I realize there's such a thing as false positives, etc. But that's the 1st time I've gotten an apparently dangerous site, as a referrer. And that is a suspicious (looking) URL. So far, just that 1 incidence. I suppose I could find more info about it, from ESET, if it would be helpful.

All best :-)

PS - Hhmm....getting "an error" when I try to post. (What helpful error msg!) I seem to recall someplace where putting a URL in a comment would cause an error. So I'll mess up URL above with http+://etc. Cross fingers....

Nope -- still "There was an error".....
Oh wait -- there's another url within an url....cross fingers again....

@brynnd

@brynnd brynnd commented Jul 22, 2014

fyi, it wasn't the link -- it was one of my Firefox extensions. I'll have to figure out which one so I can post this (or just disable all of them again).

Anyway, I think I have another referrer spammer. Twice in 2 days, and the link goes to a nearly blank page (3 dots and nothing else). The URL is: http://musicas.kambasoft.com/2.php?u=http://mydomain.

Thanks,
brynn

PS -- AdBlock Plus is the culprit ;-)

@jlj

@jlj jlj commented Jul 26, 2014

; If you find new spam entries in Referrers>Websites, please report them here: #5099

Same top list of referrer spams for me as for brynnd:

  1. http://semalt.semalt.com/crawler.php?u=http://mydomain.com (already in global.ini.php)
  2. http://youtube-downloader.savetubevideo.com/youtube-downloader.php?u=http://mydomain.com
  3. http://musicas.kambasoft.com/2.php?u=http://mydomain.com
@brynnd

@brynnd brynnd commented Jul 27, 2014

And new version of the kambasoft:

http://5.kambasoft.com/2.php?u=http://mydomain.com

@brynnd

@brynnd brynnd commented Jul 28, 2014

Maybe should just make it *.kambasoft.com. (New one today: http://9.kambasoft.com/2.php?u=mydomain)

@jlj

@jlj jlj commented Jul 29, 2014

@brynnd, setting the known Referrer Spammers list in piwik config file piwik/config/global.ini.php with these 3 domains had the expected effect of removing all known referrer spam visits from my site's statistics in the last 3 days.

[Edit Feb 19, 2015: clarification that the configuration shall be made in the local config file]
To do this, add the lines below in the [Tracker] section of your local config file piwik/config/config.ini.php (create the section if it does not exist):
[/Edit]

; Comma separated list of known Referrer Spammers, ie. bot visits that set a fake Referrer field.
; All Visits with a Referrer URL host set to one of these will be excluded.
; If you find new spam entries in Referrers>Websites, please report them here: https://github.com/piwik/piwik/issues/5099
referrer_urls_spam = "semalt.com,savetubevideo.com,kambasoft.com"

Hope it will do the same for you. :-)
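
As an illustration of what the setting above is described as doing, here is a minimal Python sketch. It is not Piwik's actual (PHP) implementation. It assumes that the comma-separated value is split into hosts, and that a visit is excluded when the referrer's host equals a listed host or is a subdomain of one, which is consistent with the effect reported above for hosts such as musicas.kambasoft.com. The function names `parse_spam_list` and `is_spam_referrer` are hypothetical.

```python
from urllib.parse import urlparse


def parse_spam_list(value):
    """Split the comma-separated config value into a set of lowercase hosts."""
    return {host.strip().lower() for host in value.split(",") if host.strip()}


def is_spam_referrer(referrer_url, spam_hosts):
    """Return True if the referrer's host is a listed host or a subdomain of one.

    Assumption: subdomains of a listed host count as spam as well.
    """
    host = (urlparse(referrer_url).hostname or "").lower()
    return any(host == spam or host.endswith("." + spam) for spam in spam_hosts)


spam_hosts = parse_spam_list("semalt.com,savetubevideo.com,kambasoft.com")
print(is_spam_referrer("http://5.kambasoft.com/2.php?u=http://mydomain.com", spam_hosts))  # True
print(is_spam_referrer("http://example.org/some-page", spam_hosts))  # False
```

Matching on the parsed host rather than on the raw URL string avoids false positives when a listed domain only appears in the query string of an otherwise legitimate referrer.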

@brynnd

@brynnd brynnd commented Jul 30, 2014

Thank you jlj. For the most part, I'm content to wait for next stable release, for newly reported spammers to be added. But it's good to know how to do it manually :-)

@brynnd

@brynnd brynnd commented Aug 17, 2014

Another:

getpocket.com

@Globulopolis
Contributor

@Globulopolis Globulopolis commented Aug 31, 2014

@brynnd

@brynnd brynnd commented Sep 2, 2014

urlopener.blogspot.com

I'm not sure if I would exactly call this one spam. But it's not the kind of referrer which I think Piwik intends to provide.

sabl0r pushed a commit to sabl0r/piwik that referenced this issue Sep 23, 2014
@brynnd

@brynnd brynnd commented Oct 10, 2014

herahair.com

@mattab mattab removed this from the Mid term milestone Oct 11, 2014
@brynnd

@brynnd brynnd commented Apr 10, 2015

I guess this is a new trend, and perhaps needs to be considered in a permanent feature. Now it looks like these guys are changing their domain names just slightly, to bypass our blocks!

It leads me to wonder (AGAIN!) what benefit these guys gain in doing this. "Spam" is "spam" and I guess no one really knows or understands its purpose. But I still do wonder why they waste time on this. What benefit could there possibly be? For anyone?

Note that also there is buttons-for-website.com and buttons-for-your-website.com ggrrr!

Also, another new one: best-seo-offer.com

@AgentGod

@AgentGod AgentGod commented Apr 11, 2015

+1 best-seo-offer.com
+1 buttons-for-your-website.com

@fvdm

@fvdm fvdm commented Apr 11, 2015

+1 for regex support, would be very useful

@campino2k

@campino2k campino2k commented Apr 13, 2015

+1 best-seo-offer.com

@mnapoli
Contributor

@mnapoli mnapoli commented Apr 13, 2015

> I guess this is a new trend, and perhaps needs to be considered in a permanent feature. Now it looks like these guys are changing their domain names just slightly, to bypass our blocks!

I agree this is getting more and more of a problem. By the way it seems that Google Analytics is not tackling this problem yet (I see all those spammers in GA), which gives Piwik a nice little plus.

Anyway I believe too that we need to find a more robust solution for this: as a user I don't like to have to wait for new Piwik releases to exclude new spammers (my data keeps being polluted in the meantime). I also think the current way of reporting spammers (GitHub issue) is not the best.

We need:

  • to make it easier for users to report new spammers
  • Piwik to auto-update the spammers list
  • while still keeping the list up to date in new releases (for the Piwik installs that are set up to avoid any external network call)

@mattab I'd like to open a separate issue to discuss a solution (I have a few ideas) so that we keep this issue for reporting spammers, what do you think? Should we tackle this soon or let it be for now?
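
The three requirements above can be sketched as follows. This is a minimal Python illustration, not a design taken from Piwik: the function name `current_spam_list`, the cache layout, and the 24-hour refresh interval are all assumptions made for the example. The idea is that a downloaded list is preferred when it is available and fresh, while the list bundled with the release remains the fallback for installs that make no external network calls.

```python
import time

ONE_DAY = 24 * 60 * 60  # refresh interval in seconds (an assumed value)


def current_spam_list(bundled, cache, fetch=None, now=None, max_age=ONE_DAY):
    """Return the freshest spammer list available without requiring network access.

    bundled: list shipped with the release (always available).
    cache:   dict with "hosts" and "fetched_at" from the last download, or empty.
    fetch:   callable returning a list of hosts, or None when external calls are disabled.
    """
    now = time.time() if now is None else now
    stale = not cache or now - cache["fetched_at"] > max_age
    if fetch is not None and stale:
        try:
            cache["hosts"] = list(fetch())
            cache["fetched_at"] = now
        except OSError:
            pass  # download failed: keep whatever we already have
    return cache.get("hosts", bundled)


cache = {}
# Install with external calls disabled: falls back to the bundled list.
offline = current_spam_list(["semalt.com"], cache)
# Install with a (stubbed) download function: the downloaded list wins.
online = current_spam_list(["semalt.com"], cache, fetch=lambda: ["semalt.com", "kambasoft.com"])
print(offline, online)
```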

@mattab
Member Author

@mattab mattab commented Apr 14, 2015

> @mattab I'd like to open a separate issue to discuss a solution (I have a few ideas) so that we keep this issue for reporting spammers, what do you think? Should we tackle this soon or let it be for now?

Something we can try with low effort is to put the list in a separate file with one spammer per line, and then ask the community to issue pull requests. Because it's so easy to make a PR using the GitHub web interface alone, it could be a quick and efficient solution.

+1 to discussing the auto-update feature in a separate new issue, to make sure it is followed up.

edit: as it looks like Referrer spamming is here to stay, I guess many people managing websites will have issues with this. there is value in Piwik sharing our list with the world!
-> maybe we create a separate repo with just the spammer list?
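
A one-spammer-per-line file is also trivial to consume. The following is a minimal Python sketch of reading such a file; the function name `load_spammers` is hypothetical, and skipping blank lines and case-insensitive duplicates is an assumption about what a consumer would want, not a documented format rule.

```python
def load_spammers(text):
    """Parse a one-domain-per-line list, skipping blank lines and duplicates."""
    seen = set()
    hosts = []
    for line in text.splitlines():
        host = line.strip().lower()
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


sample = "semalt.com\nkambasoft.com\n\nSemalt.com\n"
print(load_spammers(sample))  # ['semalt.com', 'kambasoft.com']
```

With this layout, a contribution that adds one spammer is a one-line diff, which keeps pull requests easy to review.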

@mnapoli
Contributor

@mnapoli mnapoli commented Apr 14, 2015

I've opened #7674 to continue the discussion.

@mnapoli
Contributor

@mnapoli mnapoli commented Apr 17, 2015

It seems we are not the only ones trying to build such a list, see http://www.reddit.com/r/Wordpress/comments/2qteln/i_want_to_build_a_list_of_referrer_spam_links_to/ (seems to be updated regularly)

@Fensterbank
Contributor

@Fensterbank Fensterbank commented Apr 18, 2015

While there is no better way to report spammers at the moment, I'll continue here :)

  • buttons-for-your-website.com (which is not the same as buttons-for-website.com, which is already on the list)
  • best-seo-offer.com
@AgentGod

@AgentGod AgentGod commented Apr 18, 2015

Entire list alphabetically:

4webmasters.org,7makemoneyonline.com,adcash.com,anticrawler.org,best-seo-offer.com,best-seo-solution.com,bestwebsitesawards.com,blackhatworth.com,buttons-for-website.com,buttons-for-your-website.com,cenokos.ru,cenoval.ru,cityadspix.com,darodar.com,econom.co,iskalko.ru,edakgfvwql.ru,forum.smailik.org,Get-Free-Traffic-Now.com,gobongo.info,googlsucks.com,hulfingtonpost.com,humanorightswatch.org,ilovevitaly.co,ilovevitaly.com,ilovevitaly.ru,kambasoft.com,luxup.ru,make-money-online.7makemoneyonline.com,myftpupload.com,o-o-6-o-o.ru,o-o-8-o-o.ru,priceg.com,prlog.ru,ranksonic.info,ranksonic.org,savetubevideo.com,screentoolkit.com,semalt.com,semalt.semalt.com,seoexperimenty.ru,simple-share-buttons.com,slftsdybbg.ru,social-buttons.com,socialseet.ru,superiends.org,theguardlan.com,vodkoved.ru,websocial.me,ykecwqlixx.ru

@adegans

@adegans adegans commented Apr 18, 2015

I think this can easily be countered if someone builds a thingy in Piwik to download a new list every so many hours, say once every 24 hours.

It would also need a serving system where people can submit links. If 10 (?) people submit the same link, it's added to the downloadable list, which each Piwik setup can fetch every 24 hours. This could employ a simple filter to catch double submissions like www.semalt.com vs. semalt.com so the list stays somewhat compact and clean.

This submission system can be a thing in Github (Similar to torrent blocklists - https://gist.github.com/johntyree/3331662) or something more fancy.
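
The submission idea above can be sketched in a few lines of Python. Everything here is illustrative: the names `normalize` and `accepted_domains` are hypothetical, the threshold of 10 is the tentative figure from this comment, and counting distinct reporters (rather than raw submissions) is an added assumption so that one person cannot push a domain onto the list alone.

```python
from collections import defaultdict

THRESHOLD = 10  # the tentative "10 (?)" from the proposal; the right value is open


def normalize(domain):
    """Lowercase and drop a leading "www." so www.semalt.com and semalt.com count as one."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def accepted_domains(submissions, threshold=THRESHOLD):
    """submissions: iterable of (reporter_id, domain) pairs.

    A domain is accepted once `threshold` distinct reporters have submitted it.
    """
    reporters = defaultdict(set)
    for reporter_id, domain in submissions:
        reporters[normalize(domain)].add(reporter_id)
    return sorted(d for d, who in reporters.items() if len(who) >= threshold)


# Ten different reporters submit semalt.com, half of them with a "www." prefix.
reports = [(i, "www.semalt.com" if i % 2 else "semalt.com") for i in range(10)]
reports.append((0, "example.org"))  # a single report is not enough
print(accepted_domains(reports))  # ['semalt.com']
```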

I'm not overly familiar with Github and new to Piwik. Just came across this because I have a referrer spam problem also.

@mnapoli
Contributor

@mnapoli mnapoli commented Apr 19, 2015

The list has been moved to https://github.com/piwik/referrer-spam-blacklist in order to make it more visible and more practical to maintain.

Please open issues and pull requests in that new repository :)

If you are interested to know how this list will be handled, read this issue: #7674

@Fensterbank thanks I confirm those 2, I have added them to the new list.

@mnapoli mnapoli closed this Apr 19, 2015
@mattab
Member Author

@mattab mattab commented Apr 21, 2015

How do I contribute to the Referrer spammer Piwik list?

To add a new referrer spammer to the list, click here to edit the spammers.txt file and create a pull request. Alternatively you can create a new issue.

Looking forward in the future to maintaining this spammer list together as a community 👍

mattab added a commit that referenced this issue Apr 29, 2015
mattab added a commit that referenced this issue Apr 29, 2015
@sbrickey

@sbrickey sbrickey commented May 5, 2015

Since the list is now maintained via a file on GitHub, any chance that the admin site can check for newer versions, and ideally download them automatically?

@mnapoli
Contributor

@mnapoli mnapoli commented May 6, 2015

@brynnd

@brynnd brynnd commented Jun 6, 2015

Well, it's exciting to see some new energy and seeing this project move forwards!

Unfortunately, I'm not very familiar with programming or how this site works. When I click the link in mattab's last msg, to edit the spammers.txt file, it says:

-- You need to fork this repository to propose changes.
-- Sorry, you’re not able to edit this repository directly— you need to fork it and propose your changes from there instead.

I don't know what that means to "fork this repository". So I used the other option and reported a new spammer (another semalt variety). But I'm still not sure how to proceed with my piwik installation.

I did read #7674, but unfortunately, I don't understand much of it. I also read https://github.com/piwik/referrer-spam-blacklist, but again, don't understand much.

If I have the current version of Piwik, am I getting all the spammers blocked? Or do I need to continue adding new spammers to my config.ini.php file?

Thank you very much :-)

@mnapoli
Contributor

@mnapoli mnapoli commented Jun 6, 2015

@brynnd it's fine if you weren't able to edit the file directly, opening an issue is good too. When it is added to the list it will be included in the new Piwik version, so that's why it's important to keep Piwik up to date.

In the future we want to auto-update the list so that you get the latest spammers blocked even before the new Piwik release is available (issue #7674).

@brynnd

@brynnd brynnd commented Jun 9, 2015

Thanks mnapoli!

Where can I check the current list, so I don't accidentally add duplicates to the list? Especially these new semalt-related ones, where they're just changing the domain by a character or 2?

@mattab
Member Author

@mattab mattab commented Jun 9, 2015

@brynnd latest version is at: https://github.com/piwik/referrer-spam-blacklist/

Please note: you don't need to add semalt variations if they are sub-domains of semalt.com (the same goes for any other listed spammer). But if they are new domain names (not sub-domains), then please suggest the new spammers on this project: https://github.com/piwik/referrer-spam-blacklist/
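
For contributors who want to check a candidate before reporting it, the rule above can be sketched in Python. This is only an illustration: `covered_by` is a hypothetical helper, and the short `blacklist` here is a hand-picked sample from this thread, not the real file.

```python
def covered_by(candidate, blacklist):
    """Return the listed domain that already covers candidate, or None.

    A candidate is covered if it is listed itself or is a sub-domain of a listed domain.
    """
    candidate = candidate.strip().lower()
    for listed in blacklist:
        if candidate == listed or candidate.endswith("." + listed):
            return listed
    return None


blacklist = ["semalt.com", "kambasoft.com", "buttons-for-website.com"]
for domain in ["semalt.semalt.com", "semaltmedia.com", "buttons-for-your-website.com"]:
    print(domain, "->", covered_by(domain, blacklist) or "new, worth reporting")
```

Here semalt.semalt.com is already covered by semalt.com, while semaltmedia.com and buttons-for-your-website.com are separate domain names and would need their own entries.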

@ghost

@ghost ghost commented Jun 25, 2015

I've added several domains to piwik/config/config.ini.php as described above. A week has passed and every day I see new entries from these same domains. Am I missing something basic, or is this worth opening a new issue?

This is at the end of my config.ini.php, and I've restarted Apache (and later the server) but modifying this configuration file has had no effect at all:

[Tracker]
referrer_urls_spam = "100dollars-seo.com,semaltmedia.com"
@mnapoli
Contributor

@mnapoli mnapoli commented Jun 25, 2015

@tombrossman with the latest Piwik versions this INI config option isn't used anymore. There will be a new Piwik release very soon (probably tomorrow), else you can update to the latest beta and those spammers will be blocked.

With Piwik 2.14 the spammers list will be updated automatically.

@ghost

@ghost ghost commented Jun 25, 2015

Ah, thanks - that makes sense now. I thought it was me doing something stupid again...

@jpjp

@jpjp jpjp commented Jul 13, 2015

Is anti-referral spamming included in piwik 2.14 and enabled by default? Is there any way to retroactively apply it, like with the geoip location dbs? I couldn't find any documentation on this. Thanks.

@mnapoli
Contributor

@mnapoli mnapoli commented Jul 19, 2015

> Is anti-referral spamming included in piwik 2.14 and enabled by default?

Yes

> Is there any way to retroactively apply it, like with the geoip location dbs?

No, it doesn't apply retroactively.
