Spam detection in Junction #273
Comments
How about also,
|
All these three should be implemented. |
The order for implementing above should |
How about just using a third party to evaluate the spammy nature of a particular comment? I suggest using I'm personally not in favour of coding the spam detection logic into Junction itself. If not Akismet, then a published PyPI library that gives a spam score for a comment should do. [1] |
akismet 👍 |
Reading the docs suggests |
I am taking this up. Will try and get it ready for PyCon 2017. |
First I will collect comments from the last 3-4 PyCons and, using one of the libraries, get its spam detection rating for each. The idea is to find out how many true and false positives are reported. I will update here with some libraries that are pure Python as far as possible. |
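A rough sketch of that evaluation, assuming the comments are already hand-labelled as spam or not spam. `confusion_counts` and `keyword_detector` are hypothetical names; the keyword check is only a stand-in for whichever library ends up being tested, and the sample comments are invented for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple


def confusion_counts(
    labelled: Iterable[Tuple[str, bool]],
    is_spam: Callable[[str], bool],
) -> Dict[str, int]:
    """Count true/false positives and negatives for a detector.

    `labelled` yields (comment_text, actually_spam) pairs.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for text, actual in labelled:
        predicted = is_spam(text)
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1
        elif not predicted and actual:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def keyword_detector(text: str) -> bool:
    # Hypothetical stand-in for a real spam library: flags one phrase only.
    return "buy now" in text.lower()


# Invented sample data, not taken from Junction.
sample = [
    ("Buy now, cheap watches!", True),
    ("Great proposal, will there be slides?", False),
    ("Click here for free followers", True),
    ("Could you add a section on testing?", False),
]

counts = confusion_counts(sample, keyword_detector)
print(counts)
```

Swapping `keyword_detector` for a wrapper around each candidate library would give comparable counts per library on the same labelled set.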
On Thu, 6 Apr 2017 at 11:34 AM Anand B Pillai ***@***.***> wrote:
First will collect comments across last 3-4 PyCons and using one of the
libraries - get its spam detection rating - on each. Idea is to find out
how much true and false positives are reported. Will update here with some
libraries which are pure Python as much as possible.
I haven't seen any spam in the last few years' public comments, so it may be
better to rely on some external dataset as well. But the application domain
here is proposal comments, and one of the available spam identification
datasets belongs to the email category [0].
[0]: http://csmining.org/index.php/spam-email-datasets-.html
|
Here is a CSV file which has spam and non-spam content from Junction. |
Unfortunately the spam data was lost in one of the recent changes. It needs to be dumped somewhere, maybe in a different branch. |
Recently @pythonhacker and @harisibrahimkv reported a few comments that look like spam. Junction right now only allows logged-in users to comment.
There are two levels of detecting/marking a comment as spam. During signup, check the reputation of the user's email/IP; if it is a potential spam email ID/IP, block the user. A legitimate user can write back to the concerned person to activate the account.
Another way is spam detection based on the content. There are two ways to implement it: users can report a comment as spam, and once a threshold is crossed the comment is differentiated in the UI; and an automatic spam detector can mark a comment as spam.
Once a user has crossed the threshold of spam postings, the account should be blocked.
Reddit does something similar.
@vigneshsarma thoughts?
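To make the report-threshold and account-blocking idea above concrete, here is a minimal in-memory sketch. It is not Junction code: `SpamTracker`, its method names, and the default thresholds are all assumptions for illustration, and a real version would live in Django models rather than dictionaries.

```python
from collections import defaultdict


class SpamTracker:
    """Hypothetical tracker for user reports and per-author spam counts."""

    def __init__(self, report_threshold: int = 3, user_spam_limit: int = 2):
        # Both thresholds are assumed values, not taken from Junction.
        self.report_threshold = report_threshold
        self.user_spam_limit = user_spam_limit
        self.comment_author = {}          # comment_id -> author_id
        self.reports = defaultdict(set)   # comment_id -> ids of reporters
        self.spam_comments = set()
        self.spam_count = defaultdict(int)  # author_id -> spam comments
        self.blocked_users = set()

    def add_comment(self, comment_id: str, author_id: str) -> None:
        self.comment_author[comment_id] = author_id

    def report(self, comment_id: str, reporter_id: str) -> None:
        # A set means repeat reports from one user count only once.
        self.reports[comment_id].add(reporter_id)
        if len(self.reports[comment_id]) >= self.report_threshold:
            self.mark_spam(comment_id)

    def mark_spam(self, comment_id: str) -> None:
        # Also the entry point an automatic detector could call directly.
        if comment_id in self.spam_comments:
            return
        self.spam_comments.add(comment_id)
        author = self.comment_author[comment_id]
        self.spam_count[author] += 1
        if self.spam_count[author] >= self.user_spam_limit:
            self.blocked_users.add(author)


tracker = SpamTracker(report_threshold=2, user_spam_limit=2)
tracker.add_comment("c1", "u1")
tracker.add_comment("c2", "u1")
tracker.report("c1", "r1")
tracker.report("c1", "r1")  # duplicate report, ignored
tracker.report("c1", "r2")  # second distinct reporter crosses the threshold
tracker.mark_spam("c2")     # e.g. flagged by an automatic detector
```

The UI can then render anything in `spam_comments` differently, and the signup/login flow can consult `blocked_users`.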