
Create a website to report spammers #97

Open
mnapoli opened this issue Aug 10, 2015 · 10 comments

Comments

@mnapoli
Contributor

mnapoli commented Aug 10, 2015

Using GitHub issues and pull requests to report new spammers is starting to show its limits:

  • we want confirmation when adding new domains
  • bulk pull requests can't be validated at once
  • hard to find intersections between bulk pull requests to validate single domains (e.g. Add seoanalyses.com #96)
  • pull request conflicts
  • trouble with keeping the list sorted (some PRs ignore the sort)
  • poor traceability (for various reasons, the person committing to the repo is sometimes not the one who opened the issue or pull request)
  • not so easy for users that are not very familiar with GitHub

How about we create a website dedicated to fighting referrer spam?

  • the list would be exposed and downloadable on the website
  • users could report new spammers very easily (a simple form)
  • users could vote on domains already reported to confirm them as spammers
  • each domain would have its own page listing everybody who reported it or confirmed it as a spammer (good traceability; hopefully it would also show up on Google when people search for it)
  • domains with enough votes could be merged back into the list (still hosted on GitHub): manually at first, and automatically later

Users could log in using GitHub (at first) so that we are sure one person can vote once, and to avoid vote manipulation.
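A minimal sketch of the proposed flow: one vote per GitHub account, promotion once a domain passes a vote threshold, and a merge step that always emits a sorted, deduplicated list (which would also remove the sorting problem listed earlier). `DomainReport`, `domains_to_merge`, `merge_into_list` and the `min_votes` parameter are hypothetical; the proposal does not fix how many votes are "enough".

```python
from dataclasses import dataclass, field


@dataclass
class DomainReport:
    """Hypothetical record for one reported domain on the proposed site."""
    domain: str
    # GitHub logins of everyone who reported or confirmed this domain.
    # A set means each account counts at most once.
    voters: set[str] = field(default_factory=set)

    def vote(self, github_login: str) -> bool:
        """Record a vote; return False if this account has already voted."""
        if github_login in self.voters:
            return False
        self.voters.add(github_login)
        return True


def domains_to_merge(reports: dict[str, DomainReport], min_votes: int) -> list[str]:
    """Domains that have collected at least `min_votes` distinct votes."""
    return sorted(r.domain for r in reports.values() if len(r.voters) >= min_votes)


def merge_into_list(existing: list[str], new_domains: list[str]) -> list[str]:
    """Merge confirmed domains into the blacklist, deduplicated and sorted."""
    return sorted(set(existing) | set(new_domains))
```

Because the merged list is regenerated rather than edited by hand, contributors would no longer need to keep it sorted themselves.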

This would also be a good way to promote Piwik and its spammer blacklist initiative.

ping @mattab and @quba, with whom we discussed the idea.

@calebpaine

How would this website operate? How would you prevent the spammers from gaming the system to downvote their domains? I like the concept/idea, but it seems very ambitious for right now.

@mnapoli
Contributor Author

mnapoli commented Aug 10, 2015

In a first version users could log in using their GitHub account. That way there will be no more problems than what we can have today.

What do you find ambitious?

@desbma
Contributor

desbma commented Aug 11, 2015

I think this is a good initiative, however I see two main challenges:

  • For it to be efficient, you need to maximize the number of voters. Referer spam is not a problem specific to Piwik. Are you willing to promote the site to a larger audience (not only Piwik users) and provide tools like Google Analytics filters, etc.?
    In short, will you make the site "let's fight referer spam", and not just "let's improve Piwik's blocklist"?
  • If the site becomes popular enough, as @calebpaine said, there is a risk that spammers will use it to downvote their own spam domains, or even worse, to flag competitors' domains as spam. How will you prevent that?

Random possible ideas to make the system more reliable, and "confirm" a domain as spam:

  • If domain A gets spammed with domain B as referer, automatically check whether there is a link from B to A. If not, we know the request has been forged and does not come from a legitimate HTTP client. This is not easy to do, however, with highly dynamic sites, pages specific to logged-in users, etc.
  • Set up a honeypot: a domain with no content, not indexed by search engines. I bet their spam bots just scan IP ranges and send requests when TCP port 80 is open. All domains sent as referer to this honeypot can be confirmed as spam.
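Both checks can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: fetching page B's HTML is left out (any HTTP client would do), the honeypot is assumed to log requests in the Apache/nginx "combined" log format, and the function names are hypothetical. As noted above, a missing link is only a hint for dynamic or login-only pages, not proof.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links_to(referrer_html: str, target_domain: str) -> bool:
    """Check 1: does referrer page B actually link to our domain A?"""
    collector = _LinkCollector()
    collector.feed(referrer_html)
    target = target_domain.lower()
    for href in collector.hrefs:
        host = urlparse(href).hostname or ""  # relative links have no host
        if host == target or host.endswith("." + target):
            return True
    return False


# Assumed "combined" log format: ... "request" status bytes "referer" "user-agent"
_REFERER_FIELD = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')


def honeypot_referrer_domains(log_lines, own_host: str) -> set[str]:
    """Check 2: every referrer host seen by an unlinked honeypot is suspect."""
    domains = set()
    for line in log_lines:
        match = _REFERER_FIELD.search(line)
        if not match or match.group(1) in ("", "-"):
            continue  # no referer was sent
        host = urlparse(match.group(1)).hostname
        if host and host != own_host:
            domains.add(host)
    return domains
```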

@mnapoli
Contributor Author

mnapoli commented Aug 12, 2015

For it to be efficient, you need to maximize the number of voters.

One solution we discussed was to create a feature in Piwik to let users report spammers (quick solution: link to the website, better solution: report a referrer in one click).

Referer spam is not a problem specific to Piwik. Are you willing to promote the site to a larger audience (not only Piwik users) and provide tools like Google Analytics filters, etc.?
In short, will you make the site "let's fight referer spam", and not just "let's improve Piwik's blocklist"?

Promoting the website would happen for sure. As for tools, I'm sure they are out of scope for a first version; in the long term, I don't know.

If the site becomes popular enough, as @calebpaine said, there is a risk that spammers will use it to downvote their own spam domains, or even worse, to flag competitors' domains as spam. How will you prevent that?

This has been answered already.

@desbma
Contributor

desbma commented Aug 12, 2015

Rephrasing my thoughts: how will you prevent a spammer from creating two GitHub accounts and downvoting a legitimate domain (or upvoting a spammy domain)?

@mnapoli
Contributor Author

mnapoli commented Aug 12, 2015

The same issue exists today, yet it isn't a problem. If the quality of votes is an issue, we'll find a solution. There's no point in freezing any progress just because challenges might appear in the future.

@desbma
Contributor

desbma commented Aug 12, 2015

The same issue exists today, yet it isn't a problem.

The only difference is the number of users. I doubt the spammers know about this list yet, but if it becomes very popular they probably will.

If the quality of votes is an issue, we'll find a solution. There's no point in freezing any progress just because challenges might appear in the future.

Nobody said you should freeze anything, but there is no harm in thinking before building.

One way to make abusing the system harder is to require a number of votes proportional to the total number of voters: for example, 5 votes with 100 users, 50 votes with 1,000 users, etc.
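The proportional rule can be written as a one-line function. The 5% default comes from the example ratio (100 users → 5 votes, 1,000 → 50); the function name and the `minimum` floor are assumptions added for illustration.

```python
def required_votes(total_voters: int, percent: int = 5, minimum: int = 1) -> int:
    """Votes needed to confirm a domain: `percent` % of all voters, rounded up.

    Hypothetical helper; `minimum` keeps the threshold meaningful while
    the site still has very few voters.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(minimum, (total_voters * percent + 99) // 100)
```

For example, `required_votes(1000)` gives 50, while `required_votes(10, minimum=3)` gives 3 instead of a single vote.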

@brynnd

brynnd commented Aug 18, 2015

I can only comment as a user who is committed to reporting spammers. I do feel a little intimidated by GitHub. For example, I saw a new issue with the "awaiting confirmation" label, and I can't figure out how to add that label to the issue I just posted about a new spammer.

So I would welcome a simpler website. However, I agree with an earlier comment: how do you prevent spammers from joining?

I don't think you could prevent spammers from joining, so the site would need a way for users to report other users who always seem to vote against confirming spammers.

Just a couple of thoughts from a not-so-tech-savvy user :-)

@paulhudson

Hey all,

I think I'd like a solution similar to the DNSBL lists out there, or Drupal's Mollom (https://www.mollom.com).
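For readers unfamiliar with DNSBLs: a domain-based list is queried by prepending the domain to the list's DNS zone and checking whether the name resolves. A minimal sketch of how a referrer-spam list could be consulted that way, assuming a hypothetical zone name (the thread does not propose one) and the usual DNSBL convention of answering listed names with addresses in 127.0.0.0/8:

```python
import socket

# Hypothetical zone; nothing in this discussion defines one.
BLOCKLIST_ZONE = "referrer-spam.example"


def dnsbl_query_name(referrer_domain: str, zone: str = BLOCKLIST_ZONE) -> str:
    """Build the DNS name to look up for a domain-based blocklist.

    Domain lists (unlike IP lists, which reverse the octets) simply
    prepend the domain being checked to the list's zone.
    """
    return f"{referrer_domain.strip('.').lower()}.{zone}"


def is_listed(referrer_domain: str, zone: str = BLOCKLIST_ZONE) -> bool:
    """True if the zone returns an address record for the domain."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(referrer_domain, zone))
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is an OSError subclass
        return False  # no record: treat as not listed
    # By DNSBL convention, listed names resolve to 127.0.0.0/8 addresses.
    return answer.startswith("127.")
```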

  • No upvotes, just downvotes for banning
  • a manual removal process for banned referers, as on Spamhaus
  • perhaps a simple API for downvoting?
  • spam traps could be a great idea and work well for signup and email spam

As for a rule base:

  • perhaps a weighted vote threshold for blocking, relative to the overall volume of reports?
  • the first block could last for a period X; then compare reports received during the block with those received after it expires
  • a manual removal process, with removals auto-accepted only, say, 3 times
  • require block votes/reports to come from different class C (/24) IP ranges, etc., within a sensible time range to qualify
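A sketch of two of these rules combined: a threshold that grows with the overall volume of recent reports, and a requirement that reports come from distinct class C (/24) ranges within a time window. The `Report` type, `should_block`, and every default (7-day window, 1% of recent reports, at least 3 distinct ranges) are hypothetical, and reporter addresses are assumed to be IPv4.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Report:
    """Hypothetical record: one block report for a referrer domain."""
    domain: str
    reporter_ip: str  # assumed IPv4
    at: datetime


def subnet_24(ip: str) -> ipaddress.IPv4Network:
    """The /24 ("class C") network containing an IPv4 address."""
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def should_block(
    domain: str,
    reports: list[Report],
    now: datetime,
    window: timedelta = timedelta(days=7),
    percent_of_volume: int = 1,
    min_subnets: int = 3,
) -> bool:
    # Only reports inside the time window qualify.
    recent = [r for r in reports if now - window <= r.at <= now]
    # Each /24 counts once, so many reports from one range weigh as one vote.
    subnets = {subnet_24(r.reporter_ip) for r in recent if r.domain == domain}
    # Threshold scales with the overall report volume (integer ceiling division).
    threshold = max(min_subnets, (len(recent) * percent_of_volume + 99) // 100)
    return len(subnets) >= threshold
```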

I'm not sure that having people sign up to GitHub or anywhere else really helps, or at least that it is worth the barrier it creates for contributors.

I'm happy to contribute dev time into this.

@gdementen

+1 for a solution to let users report spammers directly within Piwik


6 participants