suppress creating accounts by bots/scripts #1083

Open
malenki opened this issue Nov 9, 2015 · 23 comments
Labels
moderation Related to moderator features, like reports, issues or user blocks

Comments

@malenki

malenki commented Nov 9, 2015

Looking at the user accounts created from November 5 to 9 (that is, during the last 108 hours), I count 11318 new accounts. A simple

grep _name *| cut -d '"' -f 4| grep -E -i "[a-z][0-9]{1,3}" -c 

gives 6576 occurrences of names like this:

William925i6f
William925i6fs0
William936f5an8
William937j8ka4
William937l8mc4
William937l8md4
William937m8n9
William939v1zr0
William947m8n
William947r3yu4
William948v2wd6
William959x5if7
William962h2ro1
William975o4he9
William9b2h1dn8
William9f7j7ja4
William9i4d2kx5
William9l9f5px2
William9o3k6uf1
William9q0q0qf6
William9q9l8hu1
William9q9o9nc4
William9r0o8jx2
William9r1t1tk6
William9s1t9pg6
William9s1u0tj7
William9s2s0cs2

There are also some false positives, but a more strict

grep _name *| cut -d '"' -f 4| grep -E -i "[a-z][0-9]{1,3}[a-z][0-9]{1,3}" -c

still has 4367 hits. Looking for names with spam in them:

grep _name *| egrep -i "premium|cash|bank|credit|mobil|phone|handy|pharma|viagra|free|generic" -c

gives a quite meagre 30 results.
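
For readers who would rather not chain grep and cut, the same three checks can be sketched in Python. This is only an illustration under assumptions: three of the sample names come from the list above, `mapper42` and `FreeMaps` are made-up ordinary-looking names, and the helper names are hypothetical. The patterns are copied from the grep commands, with `re.IGNORECASE` standing in for `-i`.

```python
import re

# Patterns copied from the grep commands above; re.IGNORECASE mirrors -i.
LOOSE = re.compile(r"[a-z][0-9]{1,3}", re.IGNORECASE)
STRICT = re.compile(r"[a-z][0-9]{1,3}[a-z][0-9]{1,3}", re.IGNORECASE)
SPAM_WORDS = re.compile(
    r"premium|cash|bank|credit|mobil|phone|handy|pharma|viagra|free|generic",
    re.IGNORECASE,
)


def count_matches(pattern, names):
    """Count names containing the pattern anywhere, like `grep -c`."""
    return sum(1 for name in names if pattern.search(name))


def share(hits, total):
    """Percentage of hits among total, rounded to one decimal place."""
    return round(100 * hits / total, 1)


# Three names from the list above plus two hypothetical ones.
names = ["William925i6f", "William947m8n", "William9q0q0qf6", "mapper42", "FreeMaps"]

loose_hits = count_matches(LOOSE, names)
strict_hits = count_matches(STRICT, names)
spam_hits = count_matches(SPAM_WORDS, names)

# Shares of the 11318 accounts, using the hit counts reported above.
loose_share = share(6576, 11318)
strict_share = share(4367, 11318)
```

On this tiny sample the loose pattern also flags `mapper42`, which is the kind of false positive the stricter pattern avoids.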

With the above I only want to show that, at least during the last four days, around 50% of the newly registered accounts seem to have been created by bots/scripts and presumably won't be used for the benefit of OSM. Rather than having admins remove these users by hand, it would save work if registering accounts by script were harder.

I know that it is easy to say "I'd like to have" but hard to solve the issue. Though I hope you can find a solution.

See also: #841
My diary entry resulting in this issue.
The data sample I used can be found here.

@katpatuka

Well spotted!

At the very least, a captcha on the signup page would be good if scripts can be used to create accounts.

@gravitystorm
Collaborator

@katpatuka Yes, a captcha would be nice, but as far as I know there aren't any non-proprietary, effective captchas available.

I investigated this a few months ago (with respect to the wiki, not this site) and I found https://www.mediawiki.org/wiki/Extension:ConfirmEdit provides a good, up-to-date overview of the options. I don't think we'd go for the only one on that list marked as "high" effectiveness (English-only, potential for adverts), and ReCaptcha always has the problem that by using it, we're helping Google create a proprietary map dataset!

If anyone knows of an effective, non-proprietary captcha then that would be very useful.

@tomhughes
Member

Are these accounts actually doing anybody any harm?

We're not a startup that is using user numbers as a measure of success, so if they're just sitting there then who cares?

At the end of the day anything we do will just be a constant arms race where we make life harder and harder for the real users.

If there was a good solution then that would be wonderful, but there isn't, or everybody would be using it!

@Zverik
Contributor

Zverik commented Nov 9, 2015

I am strongly against captcha, because it raises the bar dramatically. We should aim to simplify registration process, not make it harder. These spammy accounts do no harm, I suppose, until they vandalize the map.

@gplv2

gplv2 commented Nov 19, 2015

It's meant to raise the bar. Prevention is always easier than getting things cured. If someone doesn't know how to register (a 1-time action) for OSM as it stands now, that person is probably not suited to map around.

@planemad

I noticed this while joining publiclab.org, and it was pretty fun to fill out. Of course, you need to know English for this.

[Screenshot, 2015-11-26]

@mikelmaron
Contributor

@planemad I'd be curious to hear from PublicLab if that helped cut down their spam problems

@getschomp

We could use a honeypot or hidden form field that only bots see. https://github.com/markets/invisible_captcha https://github.com/curtis/honeypot-captcha
That seems like the simplest and most unobtrusive solution.
Could I maybe work on this?
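
A minimal sketch of the honeypot idea, framework-agnostic and not based on the API of either linked gem: the signup form carries an extra field that is hidden from humans with CSS, and the server rejects any submission in which that field is filled in. The field name `website_url` and the function name are assumptions chosen for illustration.

```python
# Hypothetical name of the decoy field. Real users never see it because the
# form hides it with CSS, so only scripts that fill every input populate it.
HONEYPOT_FIELD = "website_url"


def looks_like_bot(form_data):
    """Return True when the hidden decoy field carries any non-blank value."""
    value = form_data.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


# Example submissions (made-up data).
human = {"display_name": "new_mapper", "email": "user@example.org", "website_url": ""}
script = {
    "display_name": "William925i6f",
    "email": "bot@example.org",
    "website_url": "http://example.com",
}
```

A script that only fills in the known fields, or that honours the CSS, would pass this check, so it filters naive bots only.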

@planemad

@getschomp the honeypot definitely sounds like the smartest idea so far.

@d1g

d1g commented Jan 30, 2017

> Are these accounts actually doing anybody any harm?

@tomhughes they complicate otherwise easy analysis, so yes; useless accounts are useless for anything

http://www.openstreetmap.org/user/SimonPoole/diary/40246

@simonpoole
Contributor

@d1g there is no indication that these accounts are being added automatically, quite the contrary.

@srravya

srravya commented Mar 14, 2017

A user diary completely spammed with comments. Except for the first comment (which was legit), the rest seem to come from multiple spam accounts.

@d1g

d1g commented Mar 16, 2017

@simonpoole, I saw how several sites asked users to confirm accounts in the next 3 months or a year. Then they withdrew unused accounts.

We can deactivate unused accounts (without any edits or comments since 2004) once.

A clear public announcement beforehand is a must, of course.

@HolgerJeromin
Contributor

> We can deactivate unused accounts (without any edits or comments since 2004) once.

What would be the benefit?

@simonpoole
Contributor

@d1g I would be (very very) strongly against doing that. Every day we have users who "reactivate" their pre-licence-change account by accepting the CTs: 400 since the beginning of this year alone, thousands over the past years.

All these accounts were last used -before- May 2010. There is no reason to believe that this pattern is different for more recent accounts either, so by forcibly removing them we would simply be shooting ourselves in the foot.

@d1g

d1g commented Mar 21, 2017

@HolgerJeromin, to filter out scripted accounts (they semi-defeat the benefits of registration). It may work for very active communities. Credibility of users matters in OSM, probably even more than in Wikimedia projects.

I can buy hundreds of accounts virtually anywhere on the black market. Doesn't mean they would be used for anything good.

But, as Simon pointed out, it is painful in OSM: logins can span years.

@CloCkWeRX

One area this manifests in is via the diary feature (and corresponding RSS feeds), and we now have the default of review-my-changesets set for newer accounts.

Could we optionally use a (potentially proprietary) captcha on a new diary post when:

  • There are no reviewed changesets associated with the account
  • And/or the editor doesn't have the cookie (or whatever mechanism iD uses to prompt new editors with the tutorial) present

Options such as https://github.com/desirepath41/visualCaptcha do exist now; even if they aren't maintained at the moment.

With very specific criteria, we could avoid accidentally adding barriers for new editors; and only mildly inconvenience people who jump into editing via JOSM or other editors.
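
The criteria above could be expressed as a small predicate. This is a hypothetical sketch: the function and parameter names are invented here, and because the comment leaves open whether the two criteria combine with "and" or with "or", both readings are offered through a flag.

```python
def needs_captcha(reviewed_changesets, has_tutorial_marker, require_both=True):
    """Decide whether a new diary post should be challenged.

    reviewed_changesets: number of reviewed changesets on the account.
    has_tutorial_marker: True if the editor's "tutorial seen" marker is present.
    require_both: True means both criteria must hold ("and");
                  False means either one is enough ("or").
    """
    no_reviews = reviewed_changesets == 0
    no_marker = not has_tutorial_marker
    if require_both:
        return no_reviews and no_marker
    return no_reviews or no_marker
```

With `require_both=True`, an account with at least one reviewed changeset is never challenged, which matches the stated aim of not adding barriers for people who are already editing.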

@CloCkWeRX

Another potential option: rate limit diary posts per account/IP address after the first .. 3-4? 10? with an exponential backoff. Large institutions, VPN users and similar may be slightly affected; but this could be done by excluding posts from the RSS feed (modelling a diary "publish at" timestamp, and only selecting posts to publish between Time.now and 3.days.ago or similar)
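
A rough sketch of how such a "publish at" delay might be computed. Every constant here is an assumption, since the comment leaves the threshold open, and the function names are hypothetical: the first few posts go out immediately, and each further post doubles the delay up to a cap.

```python
from datetime import datetime, timedelta

FREE_POSTS = 3                    # assumed threshold ("3-4? 10?" in the comment)
BASE_DELAY = timedelta(hours=1)   # assumed delay for the first throttled post
MAX_DELAY = timedelta(days=3)     # assumed cap, borrowed from the 3-day window
MAX_DOUBLINGS = 16                # keeps the multiplication far from overflow


def feed_delay(recent_posts):
    """Delay before the next post is included in the feed.

    recent_posts: posts already made by this account/IP address.
    """
    if recent_posts < FREE_POSTS:
        return timedelta(0)
    doublings = min(recent_posts - FREE_POSTS, MAX_DOUBLINGS)
    return min(BASE_DELAY * 2 ** doublings, MAX_DELAY)


def publish_at(created_at, recent_posts):
    """Timestamp from which the feed would include the post."""
    return created_at + feed_delay(recent_posts)


created = datetime(2017, 3, 21, 12, 0)
fifth_post_visible = publish_at(created, 4)  # fifth post: one doubling
```

The feed query would then select only posts whose `publish_at` lies in the past, so the post itself is still saved immediately and only its appearance in the feed is delayed.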

@tomhughes
Member

That would do absolutely nothing to stop the current spam attacks.

@natrius

natrius commented Mar 16, 2021

There are currently reports in the German forum of more spam-related activity (https://forum.openstreetmap.org/viewtopic.php?pid=822680#p822680). A captcha could cause problems for visually impaired people. Other than that, the honeypot is a nice first solution.

See https://switching.software/replace/google-recaptcha/ and a longer text about why reCaptcha is probably not needed, https://nearcyan.com/you-probably-dont-need-recaptcha/; several suggestions for alternatives are also listed there. A simple question-and-answer field looks like it could work, but it's an additional thing everybody has to answer. The same is true of a captcha, though.

Restricting PM for a specific time may help, but is not good for people who register just to create a note and/or want to ask someone legitimate questions.

@tomhughes
Member

Yes there's some activity but it's a tiny number of accounts and quite likely manual so a captcha won't help at all.

@tuckerrc added the moderation label Mar 16, 2021
@Dimitar5555
Contributor

> I am strongly against captcha, because it raises the bar dramatically. We should aim to simplify registration process, not make it harder. These spammy accounts do no harm, I suppose, until they vandalize the map.

That quote didn't age very well. 😅

Approximately 5k bot accounts were blocked between 19 and 20 August by SomeoneElse alone. Who knows how many more are sitting and waiting for the "wake up packet". The worst part is that the DWG has to dedicate time to blocking them and people have to dedicate even more time to cleaning up. This work is definitely not pointless, but it would be better if people didn't have to do it. The other problem is that object versions get inflated very quickly, which could become a serious problem in the future.

Cloudflare offers a (seemingly good) free service called Cloudflare Turnstile. It works differently from ReCaptcha and doesn't require the user to do anything (except to check a box). It can also be made invisible if you don't want the box to be seen.
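
For context, a hedged sketch of what the server-side half of a Turnstile check might look like. As far as I understand Cloudflare's documentation, the widget adds a token to the form (in a field named `cf-turnstile-response`) and the server posts that token together with a secret key to the siteverify endpoint, which answers with JSON containing a `success` flag. The endpoint and field names should be checked against the current docs before use. The function names and the injectable `post` parameter are assumptions made so the logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as documented by Cloudflare at the time of writing (verify before use).
SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def http_post_form(url, fields):
    """POST form-encoded fields and return the decoded JSON response."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=body, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def signup_passes_turnstile(token, secret_key, post=http_post_form):
    """Return True only if the verification service reports the token as valid.

    token: value the widget placed in the submitted signup form.
    secret_key: the site's private key (hypothetical configuration value).
    post: function performing the HTTP call; replaceable for testing.
    """
    if not token:
        return False  # nothing was submitted, so skip the network call
    result = post(SITEVERIFY_URL, {"secret": secret_key, "response": token})
    return result.get("success") is True
```

Like any third-party check, this adds an external dependency to the signup path, so a timeout or outage has to be handled one way or the other.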

@tomhughes
Member

Well, I know, because I've been actively working with DWG to deal with those accounts, so thanks for the constructive commentary, but I will now get back to doing useful work.
