Spam protection measures #1420

Open
Traumflug opened this issue Dec 4, 2015 · 25 comments

@Traumflug

Let's face it: DokuWiki comes with close to zero spam protection measures. Spammers get in only hours after an installation and then new users flood in, one every 3 minutes.

So let me talk a bit about my several years of experience in this field:

  • Blocking by IP address is pretty pointless. These spammers have thousands of IPs available. Many of them are dynamic DSL addresses, so if you block them, you also block legitimate users.
  • Blocking by keywords helps a bit, but not much. Ever noticed that the spammer's favorite "Nike" is also contained in the legitimate word "Elektroniken"?
  • Making their spam useless, like adding a 'rel=nofollow', is pretty pointless as well. They spam as much as they can, without looking right or left or at what's actually happening.
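
The keyword false positive mentioned above can be reproduced in a few lines. This is only an illustrative sketch in Python, not DokuWiki code; the blocklist and the function names are made up for the example. It contrasts a plain substring test with a whole-word test.

```python
import re

# Hypothetical blocklist with the single entry from the example above.
BLOCKED_WORDS = ["nike"]


def naive_match(text: str) -> bool:
    """Substring test: flags any text that merely contains a blocked word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def whole_word_match(text: str) -> bool:
    """Whole-word test: flags a blocked word only when it stands alone."""
    alternatives = "|".join(re.escape(word) for word in BLOCKED_WORDS)
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Whole-word matching avoids the "Elektroniken" case, but it is still easy for a spammer to evade, which fits the "helps a bit, but not much" assessment.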

What does work? Well, these spammers have weaknesses, too.

  1. One of them is that they generate plenty of accounts. All with different email addresses and from different IPs, but an average wiki doesn't see a flood of 1000 new users a day, even if it's open to everybody.
  2. They don't read the pages shown. Just small deviations from the default registration process, well described for legitimate users, keep them away.
  3. Almost all their spam posts contain external links. Spreading such links is the entire point of their spamming, so they can't work around doing so. On MediaWiki, rejecting external links in an account's first 5 edits has worked very well: 98% less spam. It's perfectly fine to explain the workaround (make a number of edits without external links first) on the rejection page. Legitimate users read what's written there, so they get their stuff in without too much hassle anyway.
  4. My current spammer doesn't use the link in the confirmation email. I get email rejection notices, yet the account is created anyway. Apparently this link is predictable. Fail.
  5. My current spammer also sends no user agent. I'll get back to this in a minute.
  6. IP address. It looks valid (192.95.4.148), but it doesn't respond to a ping and has no DNS entry. Not sure how they manage to send HTTP requests from such an address, but simple tests like a DNS lookup would kick them out.
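
Point 3 can be sketched roughly as follows. This is a hypothetical Python illustration, not the MediaWiki or DokuWiki implementation: the function name, the way accepted edits are counted, and the very simple link pattern are all assumptions. Only the threshold of 5 comes from the text.

```python
import re

# Very rough external-link detector; a real wiki would check its own link syntax.
EXTERNAL_LINK = re.compile(r"https?://", re.IGNORECASE)

# Threshold taken from the MediaWiki experience described above.
MIN_EDITS_BEFORE_LINKS = 5


def may_save_edit(accepted_edits: int, new_text: str) -> bool:
    """Allow external links only once the account has enough accepted edits."""
    if accepted_edits >= MIN_EDITS_BEFORE_LINKS:
        return True
    # Newer accounts may still save, as long as the text contains no link.
    return EXTERNAL_LINK.search(new_text) is None
```

A rejected edit would then lead to a page explaining the rule, so that legitimate users know they only need a few link-free edits first.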

Our wiki is here: RepRap DIY. For nice looking wikis, DokuWiki is really great!

@Traumflug
Author

As I prefer to put code to my bold words, here's a first patch: https://gist.github.com/Traumflug/850fd50085380cb6a6c7

It checks for a valid user agent on registration, tackling 5. in the opening post.
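
For readers who don't want to open the gist: the idea of such a check can be sketched like this. This is a Python illustration under the assumption that "valid" simply means "present and not blank"; the linked PHP patch may well test more than that, and the function name is invented here.

```python
from typing import Optional


def has_plausible_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if a User-Agent header was sent and is not blank."""
    # None models a missing header; whitespace-only values count as empty.
    return bool(user_agent and user_agent.strip())
```

A registration handler would call this before creating the account and refuse the request if it returns False.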

That said, it's likely a good idea to do the same check earlier, before the confirmation email is sent. IIRC, sending a confirmation email is done by an optional, non-default plugin, so I'm currently not sure where to do this.

@selfthinker
Collaborator

Have you tried anything listed on https://www.dokuwiki.org/antispam?
It's possible that the CAPTCHA plugin already does what your patch does. (I don't know the plugin very well, but I would assume it does that from the description on that page.)

@Traumflug
Author

These recommendations on the antispam page are actually what led me to the conclusion that better default protection is urgently needed. An unaltered installation leaves the doors wide open for spammers, so people face a steep learning curve under time pressure ... or quickly switch to another wiki software.

The Captcha plugin looks nice and captchas are indeed useful, but it a) is not part of the default installation and b) has nothing to do with the patch provided above.

The patch does a simple plausibility test, something carefully written software should always do.

@selfthinker
Collaborator

Instead of changing anything in the core, an option could also be to include the CAPTCHA plugin in the core. If you're downloading DokuWiki from the official download page, you can add that plugin (and a few others) to your download. It's the very first plugin on the list. Maybe it should be highlighted more how important it is?

How does the CAPTCHA have nothing to do with the patch you provided? The main goal of your patch is to try to find out if the registration is coming from a spam bot or not, right? Isn't that the same thing a CAPTCHA does? You use a different method than the plugin, but the plugin utilises many different methods. And its default method is also invisible, doing something automatic in the background.
If it will improve things, I would rather want to add your method to the CAPTCHA plugin and not to the core.

Maybe the main problem here is that we're not communicating various anti spam features enough? Even the simple fact that you can disable registrations is something new DokuWiki users only learn the hard way. For that reason we had moved the option for that into the installation screen. I'm not sure what is the best way to make all of these things clearer?

@Traumflug
Author

> The main goal of your patch is to try to find out if the registration is coming from a spam bot or not, right?

OK, they both have a similar goal. Still they implement that goal in an entirely different way. Putting all such stuff into the Captcha plugin would make this plugin not a captcha plugin, but a fix-missing-core-features plugin.

> I'm not sure what is the best way to make all of these things clearer?

Make basic spam protection part of the core. Such measures are far more essential than nice-to-have features like LDAP authentication or abbreviations, which are in core now. Making the download a few megabytes bigger doesn't matter, having a spam attack only hours after a fresh installation does. Having an option to turn off spam protection for intranets is fine, having a requirement to add it or turn it on after installation is not.

"Works out of the box with default settings" is the most important property of any modern software. 9 of 10 people don't even look at settings, much less read instructions.

Plugins are certainly a nice strategy, but they also have a problem: there are a whole lot of them, so it's difficult to find out which ones are well supported, which ones work well, which ones solve the problem at hand. For example, I've chosen the preregister plugin over Captcha, because it looked "more official", better integrated, independent from Google. Even so, it took several hours of investigation before I could make this decision. As far as I can see there is no distinction between official ( = well supported) plugins and quick hacks ( = works for the developer only).

I hope I don't offend you too much with all these statements. With other projects I had to learn the hard way that users try very hard to be as dumb as possible. That said, dealing nicely with the super-dumbo makes projects successful :-)

@mprins
Contributor

mprins commented Dec 6, 2015

@Traumflug a large part of the DW installed base is never accessible from the www, making anti-spam measures irrelevant for that user base.

Your patch basically does browser sniffing, which is a brittle approach at best, if not a bad idea. (You would probably be better off using a redirect rule in your webserver to just kick those out at the front door.)

@Traumflug
Author

OK. I take it that www users and open communities aren't welcome. No problem. As long as such simple measures are unique to my installation they work much better anyway, saving me a lot of work.

@michitux
Collaborator

michitux commented Dec 6, 2015

@Traumflug open communities are definitely welcome. Dokuwiki.org is one of them. Anti-spam measures like blacklists for words and URLs commonly used by spammers as well as a mass-revert tool are part of DokuWiki core.

As far as I've understood what you are suggesting is a patch that disallows registrations with empty user agents. This measure might be effective at the moment, but if we include this patch it is very likely that spammers will adapt very fast, as it is basically no problem at all for them to send a valid user agent. I'm not sure the amount of code and the additional translation strings are worth the effort. I agree though that it is a measure to distinguish between bots and humans, which is exactly the goal of the captcha plugin.

A note concerning the captcha plugin: This plugin has nothing to do with Google. You might be confusing it with the reCaptcha plugins, which indeed use Google services. The captcha plugin is developed by Andreas Gohr, the main developer of DokuWiki. Furthermore it is offered on the official DokuWiki download page. I can't see how preregister might look more official than that.

However, I think what preregister promises should be a core feature. I can imagine that it is effective, it definitely increases the amount of work needed by spammers (for every account, it's not a simple modification of their scripts), and it is also useful in other ways, as for subscriptions you are probably legally required to have a verified email address anyway. I've just had a short look at the code of the preregister plugin, and I wouldn't recommend using it in its current form as it has security issues. I have contacted the author and added a security warning to the plugin page. I can also confirm that the tokens are not very random: md5(time()) is very predictable.
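
To illustrate why a token derived only from the clock is predictable, here is a small Python sketch. It is not the plugin's code: `weak_token` merely mimics the md5(time()) construction mentioned above, and the one-hour search window is an arbitrary assumption. Anyone who roughly knows when an account was registered can enumerate all candidate tokens.

```python
import hashlib
import secrets
from typing import Optional


def weak_token(timestamp: int) -> str:
    """Mimics md5(time()): the token depends only on the clock second."""
    return hashlib.md5(str(timestamp).encode("ascii")).hexdigest()


def guess_weak_token(token: str, around: int, window: int = 3600) -> Optional[int]:
    """Recover the timestamp by trying every second in a window around a guess."""
    for candidate in range(around - window, around + window + 1):
        if weak_token(candidate) == token:
            return candidate
    return None


def strong_token() -> str:
    """A token drawn from the OS random source; it cannot be derived from the time."""
    return secrets.token_urlsafe(32)
```

With a two-hour window there are only 7201 candidates, so the search finishes almost instantly. A random token has no such shortcut.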

@michitux
Collaborator

michitux commented Dec 6, 2015

@Traumflug You also mentioned the difficulty of finding good/recommended plugins. We are aware of that issue, but unfortunately we have no solution for it. One of our attempts has been to recommend some plugins for certain use cases; we called these "solutions". Unfortunately they are far from complete or even maintained, partly due to the lack of people who care about them. Do you have any suggestions for how we could improve that situation? If so, we should take that discussion to the mailing list.

@Klap-in
Collaborator

Klap-in commented Dec 6, 2015

A more recent initiative is at the page https://www.dokuwiki.org/solutions:nice

There is no category for spam / hardening registration / etc. yet.
That would be an interesting addition.

@Traumflug
Author

> As far as I've understood what you are suggesting is a patch that disallows registrations with empty user agents.

The suggestion is to make spam protection a core part of DokuWiki. Start safe, allow relaxing in closed environments. Currently that's not the case. Building a fortress which is still inviting to legitimate users isn't easy and certainly can't be done with a simple plugin. Look at how much email procedures and protocols have changed over the last 20 years; this might give an idea of how much might be necessary to get wikis into a similarly safe state.

Looking at the comments above I can't see any will to proceed in this direction, so what would be the point of me writing more such code? The given patch is only one of about 10 I have already discovered a need for. And there are likely many more holes to plug.

> I have contacted the [preregister plugin] author and added a security warning to the plugin page.

IMHO, writing bug reports would be better than adding generic warnings. Also, contacting the author tries to make him responsible. Not likely to work: people don't like to be made responsible, to be forced to work. It's the whole DokuWiki community which wants plugins to be safe, so it's also fine to report issues to the whole community ( = here) and to let everybody work on it.

Regarding this md5(time()), there's already a per-installation generated 'salt' for user passwords, to me it looks like a natural choice to use this 'salt' for the recaptcha token, too.
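
One way to read the salt suggestion is a keyed hash over the account data. The following is a sketch in Python, not DokuWiki code: the function names, the message layout, and the choice of HMAC-SHA256 are assumptions. It only helps as long as the salt stays secret.

```python
import hashlib
import hmac


def make_token(secret_salt: bytes, username: str, issued_at: int) -> str:
    """Derive a confirmation token from a secret salt plus account data."""
    message = f"{username}:{issued_at}".encode("utf-8")
    return hmac.new(secret_salt, message, hashlib.sha256).hexdigest()


def verify_token(secret_salt: bytes, username: str, issued_at: int, token: str) -> bool:
    """Recompute the token and compare in constant time."""
    expected = make_token(secret_salt, username, issued_at)
    return hmac.compare_digest(expected, token)
```

Without the salt, knowing the username and the registration time is no longer enough to forge the link. A purely random token stored with the pending account would be an equally valid choice.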

@selfthinker
Collaborator

In order to try to understand what your requirements are: What exactly is not covered by the CAPTCHA plugin? The only thing I understood is that it's not part of the core. But apart from that?

> Building a fortress which is still inviting to legitimate users isn't easy and certainly can't be done with a simple plugin.

Why do you think a plugin cannot do what the core can do? If that were the case, it would be bad software design. I would also think that it has to be a plugin, as otherwise you would need to release the whole core whenever a spammer adjusts their script to overcome whatever we came up with. Doing that in a plugin makes much more sense, as you can update it any time.

> Also, contacting the author tries to make him responsible.

Why would you say that he is not responsible? He is responsible, it's his code! Letting the community care about and fix those things sounds like a nice idea, except that it rarely works like that in the ideal world. See https://en.wikipedia.org/wiki/Diffusion_of_responsibility
"The whole community" cannot fix anything; it will always be an individual who fixes something. That way nothing would ever get done. And "here" is also not equivalent to "the whole community". I bet many more people will see the warning than read issues here on GitHub.

@Traumflug
Author

> What exactly is not covered by the CAPTCHA plugin?

For example, there's nothing which measures natural human behaviour, like editing speed, number of edits, perfection of edits, duplicate edits.

Another example: Wikipedia, certainly very experienced with spammers, introduced the review mechanism. An edit enters the default visible page only if it comes from a user with at least 150 edits, after being reviewed by a sighter ( >= 250 edits) or, after a few days, automatically. For this you need something to count an account's edits, you have to run maintenance scripts, you need additional HTML and CSS, additional mechanisms to view a page depending on the current user, a mechanism to elevate user privileges automatically, and so on.
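
The visibility rule itself is small; the effort lies in everything around it. A hypothetical Python sketch of just the rule follows. The thresholds 150 and 250 come from the description above, while the 3-day automatic approval is an assumption, since the text only says "a few days". The class and function names are invented.

```python
from dataclasses import dataclass
from typing import Optional

AUTO_VISIBLE_EDITS = 150  # author threshold from the description above
REVIEWER_EDITS = 250      # sighter threshold from the description above
AUTO_APPROVE_DAYS = 3     # assumption: the text only says "a few days"


@dataclass
class Edit:
    author_edit_count: int
    reviewer_edit_count: Optional[int]  # None while nobody has reviewed the edit
    age_days: float


def is_visible(edit: Edit) -> bool:
    """Decide whether an edit appears in the default view of the page."""
    if edit.author_edit_count >= AUTO_VISIBLE_EDITS:
        return True
    reviewer = edit.reviewer_edit_count
    if reviewer is not None and reviewer >= REVIEWER_EDITS:
        return True
    # Unreviewed edits become visible automatically after the waiting period.
    return edit.age_days >= AUTO_APPROVE_DAYS
```

Edit counting, per-user page rendering and the maintenance jobs mentioned above are all outside this sketch.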

Maybe one can put all this into a plugin, but then this plugin would also turn core upside down.

> He is responsible, it's his code!

Glad you share this attitude so openly. Seeing this I'll make sure to not contribute any code, because people trying to make me responsible for yet another (part of an) open source project is the last thing I want. I'm working on DokuWiki now because I have a problem to solve. I happily take extra care and extra efforts for being a good, helpful open source citizen (like writing an issue report like this, like uploading a patch, like offering to review the patch to better align with the project's goals). But not long after my installation works the way I need it I'll move on to the next project, solving the next problem.

My own golden rule is: never try to enforce anything! People always contribute for their own advantage only.

@selfthinker
Collaborator

> For example, there's nothing which measures natural human behaviour, like editing speed, number of edits, perfection of edits, duplicate edits.

That sounds like a good idea... for a plugin. I'm pretty sure in MediaWiki it's also done by a plugin (or even several). ;-)

> Glad you share this attitude so openly. Seeing this I'll make sure to not contribute any code, because people trying to make me responsible for yet another (part of an) open source project is the last thing I want.

Just to make it clear: I was talking about plugins, not the core. Contributions to the core would obviously need to be fixed by anyone in the community. For that reason we wouldn't add anything we don't feel comfortable about.

> My own golden rule is: never try to enforce anything! People always contribute for their own advantage only.

Who is trying to enforce anything on anybody? My understanding is that you want to force "the community" to take care of all 1165 plugins and 131 templates? It's not that we wouldn't want to, it's that we simply cannot do that. It is simply unrealistic and impossible!
Which Open Source community do you know which does that?
Apart from that, we're not keeping anyone from doing that either; people "are allowed" to fix other people's issues, of course (and they do, sometimes).

I find your comments disrespectful. DokuWiki development is already declining for the last 2 years or so because we don't have enough developers. Nobody gets paid for working on DokuWiki. You're asking us to spend even more of our free time to fix thousands of issues from hundreds of inexperienced people's code? (Not saying the preregister plugin is from an inexperienced dev, haven't looked at it, just saying the majority of plugins are.)

@Traumflug
Author

> Who is trying to enforce anything on anybody?

You. Trying to make a person responsible is trying to enforce that person to put in additional work.

> DokuWiki development is already declining for the last 2 years

This is currently the same for pretty much every open source project. The ones which don't suffer are driven by commercial interests (and paid developers). Obviously the attitudes which made open source big for some 20 years no longer work as well as they used to.

> You're asking us to spend even more of our free time to fix thousands of issues from hundreds of inexperienced people's code?

It's disappointing you receive it this way. Nowhere above did I ask you to do any work besides copy&pasting the commit into the repo. I asked for the freedom to contribute to the project. The answer is that several people try hard to find reasons to turn the (admittedly small) contribution away. If so many people prefer to keep insufficiencies there's no point in fighting this.

@selfthinker
Collaborator

> > Who is trying to enforce anything on anybody?
>
> You. Trying to make a person responsible is trying to enforce that person to put in additional work.

I wonder if we're having a misunderstanding or simply different world views...
I'm not asking anybody to be responsible and I'm surely not enforcing it (because I can't!). Everybody is responsible for their own actions. How can someone do something (i.e. anything, outside the coding world or inside) and not be responsible for it? That's a simple fact of life.
Of course, you can share responsibility, but only with people who have consented to that, you cannot force it on a community.

The good thing in the Open Source world is that whenever someone stops caring for something they did (guessing 95% of all cases), someone else has the possibility to pick it up (guessing 5% of all cases).

The most important question is: Who is doing that "additional work"? The answer is simple: Whoever cares!
And in most cases that means: No-one!

A side question is: Who is responsible for whatever requires that "additional work"?
My answer is: The author is responsible. But I know that that will not necessarily mean anything, depending on if the author cares or not.
Your answer is: The community is responsible. But you should also know that that will not necessarily mean anything, depending on if an individual or group in the community cares or not.

Ultimately it leads to the same result.

> The answer is that several people try hard to find reasons to turn the (admittedly small) contribution away. If so many people prefer to keep insufficiencies there's no point in fighting this.

(Ignoring the trolling part of it.) We feel responsible for the core and would like to do what's best for it. If we disagree on what's best, that's probably due to a difference in experience or culture or philosophy. All we can do is try to make each other understand each other's viewpoint. But that seems to be failing right now...

@Traumflug
Author

> I'm not asking anybody to be responsible [...]

> My answer is: The author is responsible. [...]

In my world, such a combination is called a contradiction.

@turnermm
Contributor

turnermm commented Dec 7, 2015

@michitux was in touch with me about the security hole in preregister and I've patched it.

@selfthinker
Collaborator

> > I'm not asking anybody to be responsible [...]
> >
> > My answer is: The author is responsible. [...]
>
> In my world, such a combination is called a contradiction.

Not in my world. I'm not making anyone responsible, they are responsible. Is no one in your world responsible for anything?
Maybe "responsibility" means something else in both our worlds...

I have no idea how to say this in any other words so that you can understand me. I officially give up now.

@splitbrain
Collaborator

I am on vacation, so just a few notes:

  • I'd be happy to have DokuWiki more spam resilient by default
  • if you have code that would achieve that, please share by opening individual pull requests
  • some code might be better in a plugin, some is better suited for core. That should be decided case by case. Refactoring either way is simple. Pull requests are needed for that
  • keep the discussion civil and on topic please

@furun
Contributor

furun commented Dec 8, 2015

If I may share some of my experience...
This is a bit of text, but it is an important topic.

My wiki was spammed soon after I set it up.
So I started developing an anti-spam plugin.
Something that was meant to be a little tool became a MONSTER.
Defending against a large number of possible attacks, and blocking them intelligently and fully automatically, is a big task!

I understand the argument for having some features and functions in the main code and not in a plugin (I made this argument for spam defence too at the beginning; I like core functions in core code very much). But spam defence is so complex that it makes sense to use a plugin (event-based system) for it. (I had to define my own events in the DW core code to make my plugin work.)
The plugin could be part of the main code, but I changed my opinion about spam blocking in core code when I created my plugin. It is too complex. (Even speed can become an issue; I needed to optimize some functions because they slowed down the code too much.)
(Some smart blocking techniques need information buffering/caching/logging, and therefore MySQL.)

A simple check for empty header information is not enough at all. Non-empty headers can carry malicious payloads too.
A database of information such as spam links and IP addresses alone needs at least one person to maintain it, and that takes work and time!

Spam blocking is a never-ending story and needs work. This work should be done independently of DokuWiki, as a general wiki spam defence project. And it needs an active community for maintenance. (I am not aware whether one already exists.)
I think DW can only support basic and simple spam defence functionality: only database-independent checks.

A plugin that could be useful and could be implemented (no big databases):
http://bad-behavior.ioerror.us/
(Recently, the MS bots added a new IP range for their search bots, which caused thousands of block warnings from bad-behavior in my logs... Everything must be maintained... forever... Spam defence development has no finish line. Unfortunately :-( )

Part of my database is this:
http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1
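
How such a list could be used can be sketched as follows. This is a Python illustration under assumptions: that the list contains one regular-expression fragment per line with `#` comments, and that fragments are matched against the host part of links. The sample entries and function names are invented, and how MediaWiki itself compiles the list is not shown here.

```python
import re
from typing import List, Optional, Pattern

# Hypothetical sample in the assumed format: one regex fragment per line,
# '#' starts a comment, blank lines are ignored.
SAMPLE_BLACKLIST = r"""
# pharmacy spam
cheap-pills\.example
spam-shop\.example   # trailing comment
"""


def load_patterns(raw: str) -> List[str]:
    """Strip comments and blank lines, keep the regex fragments."""
    patterns = []
    for line in raw.splitlines():
        fragment = line.split("#", 1)[0].strip()
        if fragment:
            patterns.append(fragment)
    return patterns


def build_matcher(patterns: List[str]) -> Optional[Pattern[str]]:
    """Combine all fragments into one regex applied to the host of a link."""
    if not patterns:
        return None
    joined = "|".join(f"(?:{p})" for p in patterns)
    return re.compile(r"https?://[^\s/]*(?:" + joined + ")", re.IGNORECASE)


def contains_blacklisted_link(text: str, matcher: Optional[Pattern[str]]) -> bool:
    """True if the text contains a link whose host matches a blacklist entry."""
    return matcher is not None and matcher.search(text) is not None
```

Compiling everything into a single pattern keeps the per-edit cost low, which matters given the speed concerns mentioned above.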

And some blocks should be done in .htaccess, before an attacker reaches PHP. DokuWiki could ship some of the common ones, as comments or as separate information.
Like:
RewriteEngine On
# '<' is escaped because RewriteCond treats a pattern starting with '<' as a string comparison
RewriteCond %{HTTP_USER_AGENT} \<\? [OR,NC]
RewriteCond %{HTTP_USER_AGENT} \?> [OR,NC]
RewriteCond %{HTTP_USER_AGENT} libwww [OR,NC]
RewriteCond %{HTTP_USER_AGENT} Python-urllib [OR,NC]
RewriteCond %{HTTP_REFERER} \<\? [OR,NC]
RewriteCond %{HTTP_REFERER} \?> [NC]
# Answer matching requests with 403 Forbidden
RewriteRule ^ - [F]
... and many, many more, with no guarantee against false positives...

CAPTCHA is so important that it could be part of the DW bundle. (The visual CAPTCHA has some weaknesses, and the audio CAPTCHA is easily breakable because it has no distortion or noise.)
http://www.captcha.ru/en/kcaptcha/
is a bit stronger, but none is ever perfect.
Even a CAPTCHA cannot do everything, but it is mostly enough for small websites with few visitors.

(I managed to keep automated spammers off my page for many years. And sometimes I get a reward for my work in the form of a frustrated "fxx you" from a human hacker when his changed IP is blocked over and over again. I still have a few vandals anyway. But these days I no longer have time for this much work.)
(And... why is my plugin not open source? It takes so much time, which I don't have anymore. And spam defence can cause malfunctions, which is OK for me alone and one page, but not for the public. And the code and database are a monster and should be rewritten... In short, I cannot take responsibility for my code; no time. Therefore, not open source.)

And finally, it would be interesting to know how severe the spam problem is for DW users in general.

@turnermm
Contributor

turnermm commented Dec 8, 2015

There is in fact a bad-behavior plugin: https://www.dokuwiki.org/plugin:badbehaviour

@furun
Contributor

furun commented Dec 8, 2015

@turnermm Yes.
But: last updated on 2013-03-24.
Sadly, spam defence needs permanent updates.
The developer is Andreas Gohr and, I speculate, simply overloaded with too much work. (And all for free, but developers need to eat :-)

The new original version includes
bad-behavior-mediawiki.php
Maybe the bad-behavior developers are willing to implement a
bad-behavior-dokuwiki.php
?

And new header spam checks should be sent directly to
http://bad-behavior.ioerror.us/

@furun
Contributor

furun commented Dec 8, 2015

Another database for bot and IP blocking:
https://www.abuse.ch/

@selfthinker
Collaborator

And then there is stopforumspam.com. Not sure how that compares to the others, but that's what we use for the forum (which provides the authentication for the wiki on dokuwiki.org).
