Spam reporting / flagging #19283

fnetX · 2022-03-31T16:23:13Z

Feature Description

One of the biggest issues for larger Gitea installations like Codeberg is that there is no easy way to report misbehaviour (such as spam and abusive comments) directly in the UI.
At Codeberg, we rely on workarounds: people contact us via third-party channels (email, Mastodon, Matrix, etc.), and we are working on a moderation system that simplifies some of our workflows: https://codeberg.org/Codeberg/moderation

With upcoming federation, this problem will likely spread to many more instances. If an admin doesn't want to allow registration but still wants to receive activity (e.g. issues, pull requests) from other instances, they'll likely face spam sooner or later. I'd even go so far as to say that we can't enable federation until this issue is solved.

Back to our moderation toolbox: We don't think Gitea should provide full-fledged moderation features like the ones we need (including user warnings, maybe quota enforcement, quarantining, a public log, etc.), so to keep that responsibility out of the Gitea codebase, we are developing this as a service that hooks into the Gitea database.
On the other hand, there are some rudimentary tasks that should probably be covered by Gitea. That includes, of course, the basic API calls such a tool needs (e.g. #15588). Now I'd like to discuss what else needs to be implemented in Gitea.

My proposal is the following:

Gitea should allow collecting reports for content of all types: at least issues/comments, repos, users and orgs, and maybe even per-file reports in a repo.
There should be an admin dashboard that shows them as a simple list to review and dismiss (action buttons like "remove" or "ban" can be included, but don't have to be).
This should also be available via the API so that external tools can hook in.

While I'd say this shouldn't be too hard to implement for Gitea in its current form (e.g. create a new "reports" table with issue_id, comment_id, user_id etc., an optional comment and a dismissed=0|1 state), and in fact we'd likely be available to work on such a solution, I wonder whether this system should rather be built with federation in mind: automatically reporting content to the origin instance, propagating decisions, etc. The way Mastodon (and other fediverse apps) do it works fine, but it sounds too complex for us to "quickly implement".
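For illustration, a minimal sketch of the non-federated "reports" table described above, using SQLite from Python's standard library. Only issue_id, comment_id, user_id, the optional comment and the dismissed flag come from the proposal; reporter_id, repo_id and the helper functions are hypothetical additions, and none of this is Gitea's actual schema or API.

```python
import sqlite3
from typing import Optional

# Hypothetical "reports" table. Exactly one target column is set per report.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    reporter_id INTEGER NOT NULL,
    issue_id    INTEGER,
    comment_id  INTEGER,
    repo_id     INTEGER,
    user_id     INTEGER,
    comment     TEXT,
    dismissed   INTEGER NOT NULL DEFAULT 0 CHECK (dismissed IN (0, 1))
)
"""

TARGETS = ("issue_id", "comment_id", "repo_id", "user_id")


def file_report(db: sqlite3.Connection, reporter_id: int, target: str,
                target_id: int, comment: Optional[str] = None) -> int:
    """Store a report against one piece of content and return its id."""
    if target not in TARGETS:
        raise ValueError(f"unknown report target: {target}")
    # The column name comes from the fixed TARGETS whitelist, so formatting it
    # into the SQL is safe; the values are still bound as parameters.
    cur = db.execute(
        f"INSERT INTO reports (reporter_id, {target}, comment) VALUES (?, ?, ?)",
        (reporter_id, target_id, comment),
    )
    db.commit()
    return cur.lastrowid


def open_reports(db: sqlite3.Connection) -> list:
    """The list an admin dashboard (or an external tool via an API) would show."""
    return db.execute(
        "SELECT id, reporter_id, issue_id, comment_id, repo_id, user_id, comment "
        "FROM reports WHERE dismissed = 0 ORDER BY id"
    ).fetchall()


def dismiss_report(db: sqlite3.Connection, report_id: int) -> None:
    """Mark a report as reviewed without deleting it."""
    db.execute("UPDATE reports SET dismissed = 1 WHERE id = ?", (report_id,))
    db.commit()
```

Usage: run `SCHEMA` once on a connection, call `file_report(db, reporter_id, "comment_id", 42, "spam")` from the report button, and let the dashboard page on `open_reports(db)`. Keeping dismissed reports rather than deleting them leaves a trail that a federated version could later propagate.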

What do you think? Which way should we go here? Should this be built with federation in mind, and thus probably wait until the codebase is better prepared in this regard? Currently I can't find enough references on what this could look like.

Thank you very much for the consideration.

Screenshots

No response

techknowlogick · 2022-04-13T20:11:44Z

consolidating with #12

fnetX · 2022-04-14T08:52:00Z

I don't think closing is a good idea, as I see a huge difference between the two issues: this issue is a discussion about how to collaborate on implementing instance-wide or cross-instance spam fighting of all kinds.

I read the other issue before opening this one, and it's more of a concept for moderating your own content on another instance, e.g. by blocking users from your projects. Looking at other platforms, you also can't cover both user-level blocking based on personal preference and a moderation backend for the platform itself in a single issue, IMO.

Mikaela · 2022-04-22T04:48:28Z

I think this also needs a user/organization-wide blocking option (#17453) for use while waiting for an admin to review the report.

richmahn · 2023-10-10T18:25:45Z

I notice a lot of spam users sign up just to put their website in their profile, and in their description they usually claim to be an escort service, consultant, web developer, or one of a few other "professions". Here is how I find my spam users:

They last logged in on the same day they created their account (they have to log in to set the website and description), they have set those two fields, and they don't have any repos (usually; some do create a repo just to add a website URL to it as well).

As a first step, we could discourage this kind of behavior by not allowing the website to be set within 24 hours of creating an account.

Then we could search the description and website for keywords like "escort", "consulting", etc., and make that list customizable in the config file.
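A rough sketch of these triage rules as code, assuming a simplified profile record. `UserProfile`, `looks_like_spam`, `may_set_website` and the keyword list are hypothetical names for illustration, not Gitea's user model or settings; the result is meant as a flag for admin review, not an automatic ban.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical default; the suggestion above is to make this configurable.
DEFAULT_KEYWORDS = ("escort", "consulting")


@dataclass
class UserProfile:
    # Illustrative fields only, not Gitea's actual user model.
    created: datetime
    last_login: datetime
    website: str = ""
    description: str = ""
    repo_count: int = 0


def looks_like_spam(u: UserProfile, keywords=DEFAULT_KEYWORDS) -> bool:
    """Flag a profile for admin review using the heuristics described above."""
    same_day = u.last_login.date() == u.created.date()
    has_both_fields = bool(u.website) and bool(u.description)
    # Pattern 1: filled in website + description on signup day, never came
    # back, and created no repositories.
    if same_day and has_both_fields and u.repo_count == 0:
        return True
    # Pattern 2: the profile text contains one of the configured keywords.
    text = f"{u.website} {u.description}".lower()
    return any(k.lower() in text for k in keywords)


def may_set_website(u: UserProfile, now: datetime,
                    delay: timedelta = timedelta(hours=24)) -> bool:
    """Proposed deterrent: the website field unlocks only once the account is `delay` old."""
    return now - u.created >= delay
```

Pattern 1 matches the manual check exactly, so it can be run against existing accounts; the keyword list will produce false positives for legitimate consultants, which is why the output should feed a review queue.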

lunny · 2023-10-11T07:42:37Z

Have you enabled login captcha?

techknowlogick · 2023-10-11T17:08:07Z

@lunny Sadly, captchas have limitations when real humans sign up to post spam, and some AI can bypass captchas too.

@richmahn What I've been doing for Gitea.com, and have shared with many other instances (Blender, CB, etc.), is to set up a DB row event trigger that calls an HTTP endpoint when certain user information is updated. I did it this way because webhooks for user information have not been implemented in the application yet.

I also set up a webhook for issue creation/modification that runs the content through SpamAssassin; if anything triggers, the account is temporarily restricted and an alert is sent to the admin.
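A sketch of the decision logic such a webhook receiver might run, kept free of any HTTP framework so it can be tested directly. The payload field names (`action`, `issue.title`, `issue.body`, `sender.login`) follow the Gitea issue webhook payload as we understand it and should be checked against your Gitea version; `is_spam`, `restrict_user` and `alert_admin` are hypothetical callbacks (the classifier could be SpamAssassin or anything else).

```python
import json
from typing import Callable


def handle_issue_event(raw_body: bytes,
                       is_spam: Callable[[str], bool],
                       restrict_user: Callable[[str], None],
                       alert_admin: Callable[[str], None]) -> str:
    """Classify an issue webhook delivery and react; returns what was done."""
    payload = json.loads(raw_body)
    # Only new or edited issues are interesting for spam checks.
    if payload.get("action") not in ("opened", "edited"):
        return "ignored"
    issue = payload.get("issue") or {}
    text = f"{issue.get('title', '')}\n\n{issue.get('body', '')}"
    login = (payload.get("sender") or {}).get("login", "")
    if not is_spam(text):
        return "ok"
    # Mirror the workflow above: restrict the account first, then let a human decide.
    restrict_user(login)
    alert_admin(f"possible spam from {login}: {issue.get('title', '')!r}")
    return "restricted"
```

Restricting rather than deleting keeps false positives cheap to undo, since the admin alert is the point where a human makes the final call.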

I also created the concept of placeholder users/orgs: instead of deleting a spam user and freeing the username/email, we mark them as a placeholder/reserved, so once detected, spammers are unable to reuse that information to sign up again.
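The placeholder idea can be sketched with an in-memory registry. `Registry`, `register` and `retire_spammer` are hypothetical names standing in for the user table and the admin action; the point is only that retiring a spammer moves their name and email into a reserved set that registration checks, instead of freeing them.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Hypothetical in-memory stand-in for the user table.
    users: dict = field(default_factory=dict)            # username -> email
    reserved_names: set = field(default_factory=set)
    reserved_emails: set = field(default_factory=set)

    def register(self, username: str, email: str) -> bool:
        """Refuse names/emails that are taken or reserved (case-insensitive)."""
        name, mail = username.lower(), email.lower()
        if name in self.users or name in self.reserved_names:
            return False
        if mail in self.users.values() or mail in self.reserved_emails:
            return False
        self.users[name] = mail
        return True

    def retire_spammer(self, username: str) -> None:
        """Instead of freeing the account data, keep the name and email reserved.

        Raises KeyError if the user does not exist.
        """
        name = username.lower()
        mail = self.users.pop(name)
        self.reserved_names.add(name)
        self.reserved_emails.add(mail)
```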

karolyi · 2023-12-13T11:29:21Z

@techknowlogick How are you able to feed plain text (non-email format) into SpamAssassin? Your idea sounds feasible to me, but the best I could gather is that you need to embed the text in an emulated email so SpamAssassin can evaluate it. It relies heavily on the email headers, so it probably won't give a reliable score.
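One way to do the wrapping described above, sketched with Python's standard `email` package: build a minimal message around the user-submitted text and pipe it to `spamc -c`, which, per the spamc(1) man page as we understand it, prints `score/threshold` and exits with 1 for spam. The placeholder addresses are assumptions, and the header concern raised above still applies, since header-based rules have little real data to work with.

```python
import subprocess
from email.message import EmailMessage
from email.utils import formataddr


def wrap_as_email(text: str, author: str, subject: str = "Gitea content") -> bytes:
    """Wrap user-submitted text in a minimal RFC 5322 message for SpamAssassin."""
    msg = EmailMessage()
    # Placeholder addresses (.invalid TLD); only the body carries real content.
    msg["From"] = formataddr((author, "noreply@users.invalid"))
    msg["To"] = "moderation@localhost.invalid"
    msg["Subject"] = subject
    msg.set_content(text)
    return msg.as_bytes()


def spamc_check(message: bytes) -> tuple:
    """Ask a running spamd via `spamc -c`; returns (is_spam, "score/threshold")."""
    proc = subprocess.run(["spamc", "-c"], input=message, capture_output=True)
    return proc.returncode == 1, proc.stdout.decode().strip()
```

Check how your spamc version reports an unreachable spamd before trusting the exit code in production.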

I've been getting random registrations (even with a captcha and required email confirmation) on my Gitea instance. Today I got 4 new registrations linking to sites where you can download cracked programs. Probably SEO spam; this has happened many times before. Looking at the source of the registrations, they came from IP addresses in Pakistan and India, and someone was going through the motions, so it wasn't automated. Needless to say, I purged them manually hours later when I woke up.

The "profile changed" callback would be a really useful mechanism as there might come a time where users will register and then change their profiles to spam ones later on.

I think this will become an increasingly bigger issue over time.

mscherer · 2024-02-06T09:28:25Z

Since the spam is targeting SEO (most of the time), maybe it is worth checking every new URI against URIBL? That way spammer URIs would be detected and blocked, rather than doing it at registration time.

I am not sure whether URIBL still works and whether this is still worthwhile.
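A URIBL check is a DNS lookup: the domain is prepended to the list's zone, and an answer means "listed" while NXDOMAIN means "not listed". The sketch below assumes the `multi.uribl.com` zone and, as we understand URIBL's documentation, that `127.0.0.1` signals a refused query (for example via a large public resolver) rather than a listing. The two-label domain reduction is a deliberate simplification; real code should use the Public Suffix List. Check URIBL's current usage terms before querying at volume.

```python
import socket
from urllib.parse import urlsplit

URIBL_ZONE = "multi.uribl.com"  # assumption: verify against URIBL's current docs


def lookup_name(uri: str, zone: str = URIBL_ZONE) -> str:
    """Build the DNS name to query for the registered domain of `uri`."""
    host = (urlsplit(uri).hostname or "").rstrip(".")
    if not host:
        raise ValueError(f"no host in {uri!r}")
    # Naive: keep the last two labels. example.co.uk would wrongly become co.uk;
    # use the Public Suffix List in real code.
    base = ".".join(host.split(".")[-2:])
    return f"{base}.{zone}"


def is_listed(uri: str, zone: str = URIBL_ZONE) -> bool:
    """True if the DNSBL returns a listing answer for the URI's domain."""
    try:
        answer = socket.gethostbyname(lookup_name(uri, zone))
    except socket.gaierror:
        return False  # NXDOMAIN: not listed
    # 127.0.0.1 is treated as "query refused", not as a listing.
    return answer.startswith("127.0.0.") and answer != "127.0.0.1"
```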

karolyi · 2024-05-06T11:26:49Z

Just an idea here, having deleted countless spamming users:

Gitea should put users who enter anything URL-like into their description and/or website into a must-be-approved-manually state. That's why they register.
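A sketch of what "anything URL-like" could mean in practice: any non-empty website field, or a description containing a scheme URL, a `www.` prefix, or a bare domain. The regex and `needs_manual_approval` are hypothetical and intentionally broad; strings like `node.js` will also match, which is acceptable when the outcome is a review queue rather than a ban.

```python
import re

# Broad pattern: scheme URLs (https://...), www.-prefixed hosts, and bare
# domains such as crackedapps.example. Expect some false positives.
URL_LIKE = re.compile(
    r"(?i)\b(?:[a-z][a-z0-9+.-]*://|www\.)\S+"
    r"|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b"
)


def needs_manual_approval(website: str, description: str) -> bool:
    """Hold the account for admin approval if the profile contains any link."""
    if website.strip():
        return True
    return URL_LIKE.search(description) is not None
```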

MichaelHinrichs · 2024-07-24T16:49:58Z

Since a report button still hasn't been added, here are some instances of spam, to show just how bad this problem is.
Look at how absurdly long this list is. Some entire servers are filled with nothing but spam. Hopefully this will motivate someone to give this issue priority.

https://git.deuxfleurs.fr/Tasconnectlogistics
https://git.deuxfleurs.fr/willidea
https://git.deuxfleurs.fr/james7088
https://git.deuxfleurs.fr/pawlaneau
https://git.deuxfleurs.fr/RavanSeo1
https://git.deuxfleurs.fr/mailsdaddy

https://git.ourworld.tf/accidentinjurylawyers6718
https://git.ourworld.tf/frydge4446
https://git.ourworld.tf/bunkbedsstore8841
https://git.ourworld.tf/g28carkeys3626
https://git.ourworld.tf/mymobilityscooters7339

https://code.antopie.org/FuriaS
https://code.antopie.org/nikhilofficialtour
https://code.antopie.org/linzalamba215
https://code.antopie.org/tuffgear
https://code.antopie.org/kumkum
https://code.antopie.org/kerry765
https://code.antopie.org/clintonjavery
https://code.antopie.org/wellnesscounselingseo
https://code.antopie.org/nevastechbc
https://code.antopie.org/Nirmala

https://nusaeiwyj.com/gitea/explore/users
https://gitjh.fun/explore/users
https://gitr.pro/explore/users

karolyi · 2024-07-24T18:53:21Z

@MichaelHinrichs, this is why I've moved away from Gitea to Forgejo, where you at least get emails about new registrations, so you can delete them in a timely manner.

Since it annoyed the hell out of me, I started investigating, and I think I now have a working solution that stops the spammers. It is a mixture of IP range banning and user-agent checks, combined with fail2ban IP-level banning (with increasing ban times) and AbuseIPDB reporting. It is working now, but I'm sure they'll change their tactics over time, and then I'll adjust mine accordingly.
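The "increasing ban times" part can be illustrated with a simple doubling scheme per IP. This is a sketch of the idea only, with hypothetical names and defaults; fail2ban (0.11 and later) has its own `bantime.increment` option with its own formula and settings, which this does not reproduce.

```python
from collections import defaultdict


class EscalatingBans:
    """Each repeat offence doubles the ban duration, capped at max_seconds."""

    def __init__(self, base_seconds: int = 600, max_seconds: int = 7 * 24 * 3600):
        self.base = base_seconds
        self.max = max_seconds
        self.offences = defaultdict(int)   # ip -> number of earlier bans
        self.banned_until = {}              # ip -> unix timestamp when ban ends

    def ban(self, ip: str, now: float) -> int:
        """Ban `ip` starting at `now`; returns the ban length in seconds."""
        duration = min(self.base * 2 ** self.offences[ip], self.max)
        self.offences[ip] += 1
        self.banned_until[ip] = now + duration
        return duration

    def is_banned(self, ip: str, now: float) -> bool:
        return now < self.banned_until.get(ip, 0)
```

Doubling makes a one-off false positive cheap while repeat registrations from the same address quickly hit the cap.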

A large part of the spammers are Indians/Pakistanis; they probably get paid for registrations (they have to solve a captcha and then confirm their email), whereas others seem to use bots for discovery and then register manually (again because of the captcha). According to the AbuseIPDB reports, the bots that are not Indian/Pakistani use VPN services (not Tor).

From what I've seen, I'm starting to suspect that there's a bigger operation behind these spammers. Then again, they're clearly not the sharpest tools in the shed.

Fingers crossed: since I set up my hand-crafted system, I haven't had any new spammer registrations. This has held for about a week now.

This should be done for everybody who maintains a Gitea/Forgejo/whatever instance with open registrations. Too bad some people just resell the service and don't care about keeping it clean.

lunny · 2024-07-28T05:29:17Z

I don't think receiving an email for every new registration will really help identify spam users on bigger instances. Maybe the strategy does work for small instances.

We need a comprehensive proposal against spammers. An abuse/ban system is a basic requirement, but it doesn't help smaller instances.

karolyi · 2024-07-28T09:52:51Z

While you are ruminating on options here, thousands of gitea instances get spammed into oblivion.

I couldn't wait it out and took action on my own. Whatever you see here described by me is the outcome of it. For me, spammers have subsided and I only get real registrations for the time being.

But to be constructive to the actual topic at hand, I've found this while looking for solutions: https://stopforumspam.com/

It's not my service, but I'll look into integrating it into my services. An AbuseIPDB integration could be useful as well. With Gitea's current development pace, I don't think it's going to happen in a timely manner, to be honest.
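A sketch of how a registration hook might query both services. The endpoints and fields used here (StopForumSpam's `api?ip=&email=&username=&json` query with `appears`/`frequency` in the response; AbuseIPDB v2 `check` with `ipAddress`, `maxAgeInDays`, a `Key` header, and `data.abuseConfidenceScore`) reflect our understanding of their public APIs and should be verified against the current documentation. The helper names and thresholds are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

SFS_API = "https://api.stopforumspam.org/api"          # verify against current docs
ABUSEIPDB_CHECK = "https://api.abuseipdb.com/api/v2/check"


def sfs_url(ip: str, email: str, username: str) -> str:
    """StopForumSpam lookup URL; the bare "json" flag asks for JSON output."""
    query = urlencode({"ip": ip, "email": email, "username": username})
    return f"{SFS_API}?{query}&json"


def sfs_flagged(resp: dict, min_frequency: int = 1) -> bool:
    """True if any queried field has been reported at least min_frequency times."""
    for name in ("ip", "email", "username"):
        entry = resp.get(name) or {}
        if entry.get("appears") and entry.get("frequency", 0) >= min_frequency:
            return True
    return False


def abuseipdb_request(ip: str, api_key: str, max_age_days: int = 90) -> urllib.request.Request:
    """Build an AbuseIPDB v2 check request (the API key goes in the Key header)."""
    query = urlencode({"ipAddress": ip, "maxAgeInDays": max_age_days})
    return urllib.request.Request(
        f"{ABUSEIPDB_CHECK}?{query}",
        headers={"Key": api_key, "Accept": "application/json"},
    )


def abuseipdb_score(resp: dict) -> int:
    """Abuse confidence score (0-100) from a check response."""
    return int((resp.get("data") or {}).get("abuseConfidenceScore", 0))


def fetch_json(req) -> dict:
    """Perform the HTTP call; accepts a URL string or a Request."""
    with urllib.request.urlopen(req, timeout=10) as r:
        return json.load(r)
```

Usage: `sfs_flagged(fetch_json(sfs_url(ip, email, name)))` at signup, and hold the account for review if it returns True or if `abuseipdb_score(fetch_json(abuseipdb_request(ip, key)))` exceeds a threshold you pick.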

techknowlogick · 2024-08-17T03:05:48Z

@karolyi Indeed. Since gitea.com is a public instance with open registration, we get a ton of spam there too, and it's a high priority for me as well. I've been sharing as much knowledge with others as I can. In my (infinitely expanding) backlog of tasks, I have "write a blog post about various ways to protect against spam", as a lot of what I use is hardcoded to my environment and needs to be cleaned up. Something you could use right now is DN42's https://git.dn42.dev/dn42/dn42userd, which they've used with what sounds like great success (note: the repo lacks a license, so use at your own risk). I've also just pushed #31852, which will help with this too.

fnetX added type/feature Completely new functionality. Can only be merged if feature freeze is not active. type/proposal The new feature has not been accepted yet but needs to be discussed first. labels Mar 31, 2022
