Spam reporting / flagging #19283
Comments
consolidating with #12 |
I don't think closing is a good idea, as I see a huge difference between the two issues. This issue is a discussion about how to collaborate on instance-wide or cross-instance spam fighting of all kinds, and how to implement it. I read the other issue before opening this one, and it's more of a concept for self-moderating your own content on another instance, e.g. by blocking users from your projects. Looking at other platforms, you also can't cover both user-level blocking by personal preference and a moderation backend for the platform itself in one issue, IMO. |
I think this also needs a user/organization-wide blocking option (#17453) while waiting for an admin to review the report. |
This comment was marked as off-topic.
I notice a lot of spam users sign up just to put their website in their profile, and in their description they usually write about being an escort service, consultant, web developer, or one of a few other "professions". Here is how I find my spam users: they last logged in on the same day they created their account (they have to, in order to set the website and description), they have set those two fields, and they don't have any repos (usually; some do create a repo just to add a website URL to it as well). You'd think we could first discourage this kind of behavior by not allowing the website to be set within 24 hours of creating the account. Then we could have some keywords to search for in the description and website, like "escort", "consulting" etc., and make that list customizable in the config file. |
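The heuristic described in the comment above could be sketched roughly as follows. This is only an illustration: the `Account` record, its field names, and the keyword list are hypothetical and do not reflect Gitea's actual schema or configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical keyword list; the comment suggests making this configurable.
SPAM_KEYWORDS = ("escort", "consulting")


@dataclass
class Account:
    # Hypothetical record; field names are illustrative, not Gitea's schema.
    created: datetime
    last_login: datetime
    website: str
    description: str
    repo_count: int


def looks_like_profile_spam(acct: Account, keywords=SPAM_KEYWORDS) -> bool:
    """Flag accounts matching the pattern described in the comment."""
    # Last login happened within a day of account creation.
    same_day_only = acct.last_login - acct.created < timedelta(hours=24)
    # Both the website and the description have been filled in.
    has_profile_links = bool(acct.website.strip()) and bool(acct.description.strip())
    # Most spam accounts have no repos; some create one, so also check keywords.
    no_repos = acct.repo_count == 0
    text = f"{acct.website} {acct.description}".lower()
    keyword_hit = any(k in text for k in keywords)
    return same_day_only and has_profile_links and (no_repos or keyword_hit)
```

A check like this would probably be better used to queue accounts for review than to delete them automatically, since a legitimate new user can match the same pattern.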
Have you enabled login captcha? |
@lunny Sadly, captcha has limitations when real humans sign up to post spam, and some AI can bypass captcha as well. @richmahn What I've been doing for Gitea.com, and have shared with many other instances (blender, CB, etc.), is to set up a DB row event trigger that calls an HTTP endpoint when certain user information is updated. I did it this way because user information webhooks have yet to be implemented in the application. I also set up a webhook for issue creation/modification and run the content through SpamAssassin; should anything trigger, the account is temporarily restricted and an alert is sent to the admin. I also created the concept of placeholder users/orgs: instead of deleting a spam user and freeing the username/email, the account is set as a placeholder/reserved, so once detected the spammer is unable to re-use the same information to sign up again. |
@techknowlogick How are you able to feed plain text (non-email format) into SpamAssassin? Your idea sounds feasible to me, but the best I could gather is that you need to embed the text into an emulated email format so SpamAssassin can evaluate it. It relies heavily on the email headers, so it probably won't give a reliable score. I've been getting random registrations (even with captcha and the required email confirmation) on my Gitea instance. Today I got 4 new registrations, linking to sites where you can download cracked programs. Probably SEO spam; it has happened many times before. Looking at the source of the registrations, they were IP addresses from Pakistan and India, going through the motions, so it wasn't automated. Needless to say, I purged them manually hours later when I woke up. The "profile changed" callback would be a really useful mechanism, as there might come a time when users register and then change their profiles to spam ones later on. I think this will become an increasingly bigger issue over time. |
Since the spam is targeting SEO (most of the time), maybe it is worth checking any new URI against URIBL, so that spammer URIs are detected and blocked, rather than doing it at registration time? I am not sure whether URIBL still works and whether this is still worthwhile. |
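For the question above about embedding plain text in an emulated email, a minimal sketch using Python's standard `email` package might look like this. The addresses and the function name are made up for illustration; nothing here reflects how Gitea.com actually does it.

```python
from email.message import EmailMessage


def wrap_as_email(text: str, author: str, subject: str) -> bytes:
    """Wrap plain text (e.g. an issue body) in a synthetic RFC 5322 message."""
    msg = EmailMessage()
    # Placeholder addresses under the reserved .invalid TLD; assumes `author`
    # is a simple username without spaces or special characters.
    msg["From"] = f"{author}@forge.invalid"
    msg["To"] = "moderation@forge.invalid"
    msg["Subject"] = subject
    msg.set_content(text)
    return msg.as_bytes()
```

The resulting bytes could then be piped to a content filter's standard input. As the comment points out, header-based rules have little to work with on a synthetic message, so any score would mostly reflect body and URI rules.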
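A URI blocklist check of the kind suggested above is usually a DNS lookup of the domain under the list's zone. The sketch below assumes that convention; the default zone name and the interpretation of the answer are assumptions that should be checked against the list operator's documentation and usage policy.

```python
import socket
from typing import Optional
from urllib.parse import urlparse


def uribl_query_name(url: str, zone: str = "multi.uribl.com") -> str:
    """Build the DNS name to query for the host part of `url`."""
    # Accept bare hosts such as "example.com/path" as well as full URLs.
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    return f"{host.rstrip('.')}.{zone}"


def uribl_lookup(url: str, zone: str = "multi.uribl.com") -> Optional[str]:
    """Return the A record answer for the blocklist query, or None if absent.

    The caller has to interpret the returned address: lists typically encode
    categories in it, and some answers mean "query refused", not "listed".
    """
    try:
        return socket.gethostbyname(uribl_query_name(url, zone))
    except socket.gaierror:
        return None
```

Note that such lists generally expect the registered domain rather than an arbitrary subdomain, which this sketch does not attempt to extract.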
Just an idea here, having deleted countless spamming users: Gitea should put users that enter any URL-like thing into their description and/or website, on a must-be-approved-manually status. It's why they register. |
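As a rough illustration of the "URL-like thing" check proposed above, a regular expression could decide whether a profile goes into a manual-approval queue. The pattern and function name are hypothetical, and the pattern is deliberately loose, so it will also match harmless text such as "node.js".

```python
import re

# Matches explicit URLs ("http://", "https://", "www.") and bare
# domain-like tokens such as "best-deals.example.com".
URL_LIKE = re.compile(
    r"\b(?:https?://|www\.)\S+|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b",
    re.IGNORECASE,
)


def needs_manual_approval(website: str, description: str) -> bool:
    """True if the profile has a website set or a URL-like description."""
    return bool(website.strip()) or bool(URL_LIKE.search(description))
```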
@MichaelHinrichs, this is why I've moved away from Gitea to Forgejo, where at least you're able to get emails about new registrations, so you can delete them promptly. Since it annoyed the hell outta me, I started to investigate, and I think I now have a working solution that stops the spammers. It is a mixture of IP range banning and user agent checks, combined with fail2ban IP-level banning (with increasing ban times) and abuseipdb reporting. It is working now, but I'm sure they'll change their tactics over time, and I'll adjust mine accordingly. A big part of the spammers are Indians/Pakistanis, who probably get paid for registrations (they have to solve a captcha and then confirm their email), whereas others seem to be using bots for discovery and then registering manually (again because of the captcha). The bots, if not from India/Pakistan, are using VPN services (not Tor), according to the abuseipdb reports. From what I've seen, I'm starting to suspect that there's a bigger operation at play behind these spammers. Then again, they're clearly not the sharpest tools in the shed. Fingers crossed: since I set up my hand-crafted system, I haven't gotten any new spammer registrations. This has lasted for about a week now. This should be done on behalf of everybody who maintains a Gitea/Forgejo/whatever instance with open registrations. Too bad some people just resell the service and don't care about its cleanliness. |
I don't think receiving emails for every new registration will really help bigger instances figure out which users are spammers. Maybe the strategy works for a small instance. We need a comprehensive proposal to fight spammers. An abuse/ban system is a basic requirement, but it does not help smaller instances. |
While you are ruminating on options here, thousands of gitea instances get spammed into oblivion. I couldn't wait it out and took action on my own. Whatever you see here described by me is the outcome of it. For me, spammers have subsided and I only get real registrations for the time being. But to be constructive to the actual topic at hand, I've found this while looking for solutions: https://stopforumspam.com/ It's not my service, but I'll look into integrating it for my services. Also, an abuseipdb integration could be useful as well. With the current development pace of gitea, I don't think it's gonna happen timely, to be honest. |
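To illustrate how a lookup against a shared spammer database such as the one linked above might feed a sign-up decision, here is a sketch that only interprets a JSON reply. The payload shape (per-field objects with `appears` and `confidence`) is an assumption about what such a service returns, and the threshold is arbitrary; the HTTP request itself is left out and should follow the service's own API documentation.

```python
import json


def is_known_spammer(payload: str, min_confidence: float = 50.0) -> bool:
    """Decide from an assumed JSON reply whether a sign-up looks like spam.

    Assumed shape: {"success": 1, "ip": {"appears": 1, "confidence": 97.5}, ...}
    with optional "email" and "username" entries of the same form.
    """
    data = json.loads(payload)
    for key in ("ip", "email", "username"):
        entry = data.get(key)
        if not isinstance(entry, dict) or not entry.get("appears"):
            continue
        # Only act on entries the service itself rates as likely spam.
        if float(entry.get("confidence", 0)) >= min_confidence:
            return True
    return False
```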
@karolyi Indeed. As gitea.com is a public instance with open registration, we get a ton of spam there too, and it's high priority for me as well. I've been sharing as much knowledge with others as I can. In my (infinitely expanding) backlog of tasks, I have "write a blog post about various ways to protect against spam", as a lot of my tooling is hardcoded to my environment and needs to be cleaned up. Something you could use right now is DN42's https://git.dn42.dev/dn42/dn42userd, which they've used to what sounds like great success (note: the repo lacks a license, so use at your own risk). I've also just pushed #31852, which will help out with things too. |
Feature Description
One of the biggest issues of larger Gitea installs like Codeberg is that there is no easy way to report misbehaviour (like spam and abusive comments) directly in the UI.
For Codeberg, we use some workarounds: people contact us via third-party channels (email, Mastodon, Matrix etc.), and we are working on a moderation system that simplifies some workflows for us: https://codeberg.org/Codeberg/moderation
With upcoming federation, this problem will likely expand to many more instances. If an admin doesn't want to allow registration, but still wants to receive activity (e.g. issues, pulls) from another instance, they'll likely face the issue of spam sooner or later. I'd even go so far as to say that we can't enable federation until this issue is solved.
Back to our moderation toolbox: We don't think that Gitea should provide the full-fledged moderation features we require (including user warnings, maybe quota enforcement, quarantining, a public log etc.), and to take that responsibility away from the Gitea codebase, we are developing this as a service that hooks into the Gitea database.
On the other hand, there are some rudimentary tasks that should probably be covered by Gitea. That includes, of course, the basic API calls such a tool needs (e.g. #15588). And now I'd like to discuss what else needs to be implemented in Gitea.
My proposal is the following:
While I'd say that this shouldn't be too hard to implement for Gitea in its current form (e.g. create a new table "reports" with issue_id, comment_id, user_id etc., an optional comment, and a dismissed=0|1 state), and in fact we'd likely be available to provide such a solution, I don't know whether this system should rather be built with federation in mind, allowing content to be reported automatically to the origin instance, decisions to be propagated, etc. The way Mastodon (and other fediverse apps) do it already works fine, but this sounds too complex for us to "quickly implement".
What do you think? What is the way to go here? Should this be built with federation in mind, and thus probably wait until the codebase is better prepared in this regard? Currently I can't find enough references on what this could look like.
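The "reports" table mentioned above could look roughly like the following sketch, written against SQLite purely for illustration. The column names issue_id, comment_id, user_id, comment and dismissed come from the proposal; `id` and `reporter_id`, as well as all function names, are assumptions added to make the example self-contained, and nothing here is tied to Gitea's actual models.

```python
import sqlite3

SCHEMA = """
CREATE TABLE reports (
    id          INTEGER PRIMARY KEY,
    reporter_id INTEGER NOT NULL,          -- assumed: who filed the report
    issue_id    INTEGER,                   -- reported issue, if any
    comment_id  INTEGER,                   -- reported comment, if any
    user_id     INTEGER,                   -- reported user, if any
    comment     TEXT,                      -- optional note from the reporter
    dismissed   INTEGER NOT NULL DEFAULT 0 CHECK (dismissed IN (0, 1))
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the reports table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def file_report(conn, reporter_id, *, issue_id=None, comment_id=None,
                user_id=None, comment=None) -> int:
    """Insert a report and return its id."""
    cur = conn.execute(
        "INSERT INTO reports (reporter_id, issue_id, comment_id, user_id, comment)"
        " VALUES (?, ?, ?, ?, ?)",
        (reporter_id, issue_id, comment_id, user_id, comment),
    )
    conn.commit()
    return cur.lastrowid


def dismiss(conn, report_id: int) -> None:
    """Mark a report as dismissed after admin review."""
    conn.execute("UPDATE reports SET dismissed = 1 WHERE id = ?", (report_id,))
    conn.commit()


def open_reports(conn) -> list:
    """Return ids of reports still awaiting review."""
    return [row[0] for row in
            conn.execute("SELECT id FROM reports WHERE dismissed = 0 ORDER BY id")]
```

A federated variant would presumably need additional columns, such as the origin instance of the reported content, which is part of the open question raised here.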
Thank you very much for the consideration.