Spam detection in Junction #273

kracekumar · 2015-06-02T10:43:16Z

Recently, @pythonhacker and @harisibrahimkv reported that a few comments look like spam. Junction currently allows only logged-in users to comment.

There are two levels at which a comment can be detected and marked as spam. The first is at signup: check the reputation of the user's email/IP, and if it is a potential spam email ID/IP, block the user. A legitimate user can write back to the concerned person to get the account activated.
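A minimal sketch of the signup check, assuming hypothetical blocklists of email domains and IP networks (the names `BLOCKED_EMAIL_DOMAINS`, `BLOCKED_NETWORKS`, `is_suspicious_signup` and `signup` are illustrative, not part of Junction). Flagged signups are created inactive rather than rejected, so a legitimate user can still ask for activation:

```python
import ipaddress

# Hypothetical blocklists; real ones would be loaded from a maintained
# source instead of being hard-coded. The values are documentation ranges.
BLOCKED_EMAIL_DOMAINS = {"spam-domain.example", "mailinator.example"}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def email_domain(email: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()


def is_suspicious_signup(email: str, ip: str) -> bool:
    """True if the email domain or the signup IP is on a blocklist."""
    if email_domain(email) in BLOCKED_EMAIL_DOMAINS:
        return True
    addr = ipaddress.ip_address(ip)
    # IPv6 addresses never match IPv4 networks (and vice versa).
    return any(addr in net for net in BLOCKED_NETWORKS)


def signup(email: str, ip: str) -> dict:
    """Create a (hypothetical) account record; suspicious ones start inactive."""
    return {"email": email, "is_active": not is_suspicious_signup(email, ip)}
```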

The second level is spam detection based on content. This can be implemented in two ways: users can report a comment as spam, and once a threshold of reports is crossed, the comment is shown differently in the UI; or an automatic spam detector can mark it as spam.
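Both content-based paths can be sketched as follows, assuming a hypothetical in-memory `Comment` model and a made-up `REPORT_THRESHOLD` of 3. Reports are counted per distinct user so one person cannot push a comment over the threshold alone, and the automatic detector is any callable that maps text to a spam/not-spam decision:

```python
from dataclasses import dataclass, field
from typing import Callable, Set

REPORT_THRESHOLD = 3  # hypothetical; tune once real report data exists


@dataclass
class Comment:
    comment_id: int
    author: str
    text: str
    reporters: Set[str] = field(default_factory=set)
    is_spam: bool = False  # the UI can render flagged comments differently


def report_spam(comment: Comment, reporter: str) -> None:
    """Record a user report; repeat reports from the same user are ignored."""
    comment.reporters.add(reporter)
    if len(comment.reporters) >= REPORT_THRESHOLD:
        comment.is_spam = True


def auto_check(comment: Comment, detector: Callable[[str], bool]) -> None:
    """Let an automatic detector (any text -> bool callable) flag the comment."""
    if detector(comment.text):
        comment.is_spam = True
```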

Once a user crosses a threshold of spam posts, their account should be blocked.
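A minimal sketch of per-user blocking, assuming a hypothetical `SpamLedger` that is told whenever one of a user's comments is marked as spam (by reports or by a detector). The limit of 3 is an arbitrary placeholder:

```python
from collections import Counter

SPAM_POST_LIMIT = 3  # hypothetical threshold


class SpamLedger:
    """Tracks how many of each user's comments were marked as spam."""

    def __init__(self, limit: int = SPAM_POST_LIMIT):
        self.limit = limit
        self.counts = Counter()
        self.blocked = set()

    def record_spam(self, username: str) -> None:
        """Count one more spam comment; block the user at the limit."""
        self.counts[username] += 1
        if self.counts[username] >= self.limit:
            self.blocked.add(username)

    def can_post(self, username: str) -> bool:
        return username not in self.blocked
```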

Reddit does something similar.

@vigneshsarma, thoughts?

pythonhacker · 2015-06-09T12:53:48Z

How about also:

1. Rate-limiting comments from the same login? If you posted a comment on a talk at time t, you cannot post another comment before t + delta_t, where delta_t could even be an hour.
2. Not allowing postings from known spam IPs. There are well-known lists we can scrub against to build an internal spam IP list.
3. Not allowing postings with fewer than n characters. This prevents postings like "All the best Sir!", "Wonderful Sir!" etc. and forces people to think and write something. Possibly keep n at something like 120 or 140 chars.
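The per-login time threshold (delta_t) and the minimum length (n) from these suggestions can be sketched together as one pre-posting check. `CommentGate` is a hypothetical name, the state is kept in memory for illustration, and the limit is applied per user across all talks (keying by user and talk would be an equally valid reading):

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


class CommentGate:
    """Hypothetical pre-posting check: minimum length plus a per-user cooldown."""

    def __init__(self, min_length: int = 120, interval: timedelta = timedelta(hours=1)):
        self.min_length = min_length            # "n" characters
        self.interval = interval                # "delta_t"
        self.last_post: Dict[str, datetime] = {}

    def check(self, user: str, text: str, now: Optional[datetime] = None) -> Optional[str]:
        """Return a rejection reason, or None (and record the post) if allowed."""
        if now is None:
            now = datetime.now()
        if len(text.strip()) < self.min_length:
            return "too_short"
        last = self.last_post.get(user)
        if last is not None and now < last + self.interval:
            return "rate_limited"
        self.last_post[user] = now
        return None
```

Rejected attempts do not reset the cooldown, so a user who keeps retrying is not penalised beyond the original delta_t.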

kracekumar · 2015-06-10T05:43:28Z

All three of these should be implemented.

kracekumar · 2015-06-10T05:45:55Z

The implementation order should be Point 3 -> Point 2 -> Point 1. We need to prevent spam account creation as well.

theskumar · 2015-06-15T11:24:31Z

How about just using a third party to evaluate how spammy a particular comment is? I suggest using Akismet [1], if nobody has any objection.

Personally, I'm not in favour of coding the spam detection logic into Junction itself. If not Akismet, then a published PyPI library that gives a spam score for a comment should do.

[1]
https://github.com/miracle2k/python-akismet
https://akismet.com/
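A rough sketch of calling Akismet's comment-check endpoint directly with the standard library, rather than guessing the API of the python-akismet package linked above. The endpoint URL and form fields (`api_key`, `blog`, `user_ip`, `user_agent`, `comment_type`, `comment_author`, `comment_content`) and the plain-text `true`/`false` answer follow Akismet's public REST documentation as we understand it; older docs put the key in the hostname instead, so verify against the current docs before relying on this:

```python
import urllib.parse
import urllib.request

# Assumed endpoint, per Akismet's REST docs; check before use.
COMMENT_CHECK_URL = "https://rest.akismet.com/1.1/comment-check"


def build_params(api_key, blog_url, user_ip, user_agent, content, author=""):
    """Form fields for a comment-check call on a proposal comment."""
    return {
        "api_key": api_key,
        "blog": blog_url,          # front page URL of the Junction site
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }


def parse_response(body: str) -> bool:
    """Akismet answers 'true' for spam and 'false' for not spam."""
    answer = body.strip()
    if answer == "true":
        return True
    if answer == "false":
        return False
    # e.g. "invalid" when the key or required fields are wrong
    raise ValueError(f"unexpected Akismet response: {answer!r}")


def is_spam(api_key, blog_url, user_ip, user_agent, content, author="", timeout=5.0):
    """POST the comment to Akismet and return True if it is judged spam."""
    data = urllib.parse.urlencode(
        build_params(api_key, blog_url, user_ip, user_agent, content, author)
    ).encode("utf-8")
    # Supplying data= makes urllib send a form-encoded POST request.
    request = urllib.request.Request(COMMENT_CHECK_URL, data=data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))
```

Keeping request building and response parsing in separate functions lets them be tested without a network call or an API key.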

ChillarAnand · 2015-06-15T13:59:17Z

akismet 👍

kracekumar · 2015-07-05T08:14:28Z

From the docs, it looks like a WordPress API key is required. I am sure there are other libraries/services; if someone can do a quick prototype, that would be awesome.

pythonhacker · 2017-04-06T06:03:10Z

I am taking this up. Will try and get it ready for PyCon 2017.

pythonhacker · 2017-04-06T06:04:38Z

First, I will collect comments from the last 3-4 PyCons and use one of the libraries to get a spam detection rating for each. The idea is to find out how many true and false positives are reported. I will update here with some libraries, pure Python as far as possible.
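Measuring true and false positives needs comments with a known label plus each library's verdict. A small sketch, assuming the labels and predictions are already available as `(actual_is_spam, predicted_is_spam)` pairs (the function names are illustrative):

```python
from typing import Dict, Iterable, Tuple


def confusion_counts(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, int]:
    """Count outcomes from (actual_is_spam, predicted_is_spam) pairs."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in pairs:
        if predicted and actual:
            counts["tp"] += 1      # spam correctly flagged
        elif predicted:
            counts["fp"] += 1      # legitimate comment wrongly flagged
        elif not actual:
            counts["tn"] += 1      # legitimate comment correctly passed
        else:
            counts["fn"] += 1      # spam that slipped through
    return counts


def precision_recall(counts: Dict[str, int]) -> Tuple[float, float]:
    """Precision = tp / flagged, recall = tp / actual spam (0.0 if undefined)."""
    flagged = counts["tp"] + counts["fp"]
    spam = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged if flagged else 0.0
    recall = counts["tp"] / spam if spam else 0.0
    return precision, recall
```

For comment moderation the false-positive count deserves the closest look, since each one is a legitimate comment hidden from proposal authors and reviewers.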

kracekumar · 2017-04-09T07:59:50Z


I haven't seen any spam in the last few years of public comments, so it may be better to rely on an external dataset as well. However, the application domain here is proposal comments, while one of the available spam identification datasets is from the email domain [0].

[0]: http://csmining.org/index.php/spam-email-datasets-.html


kracekumar · 2017-06-22T20:03:08Z

Here is a CSV file with spam and non-spam content from Junction.
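The attached file is not preserved here, so its column names are unknown; the sketch below assumes hypothetical columns `content` and `label` (with values `spam`/`ham`). It trains a pure-Python multinomial naive Bayes classifier with add-one smoothing, offered only as one possible baseline for an automatic detector, not something the thread settled on:

```python
import csv
import math
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def load_rows(fileobj) -> List[Tuple[str, bool]]:
    """Read (text, is_spam) pairs; assumes columns 'content' and 'label'."""
    reader = csv.DictReader(fileobj)
    return [(row["content"], row["label"].strip().lower() == "spam") for row in reader]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows: Iterable[Tuple[str, bool]]) -> "NaiveBayes":
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        for text, label in rows:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_score(self, text: str, label: bool) -> float:
        """Log prior plus log likelihood of the known words under one class."""
        n_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
        denom = self.totals[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def is_spam(self, text: str) -> bool:
        return self.log_score(text, True) > self.log_score(text, False)
```

Usage would be `NaiveBayes().fit(load_rows(open(path, newline="")))`, then `is_spam(text)` on new comments; its verdicts can be scored with the same true/false-positive counts as any third-party library.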

ananyo2012 · 2020-03-20T18:48:33Z

Unfortunately, the spam data was lost in one of the recent changes. We need to dump it somewhere, maybe in a different branch.

kracekumar added the type/enhancement label Jun 10, 2015

pythonhacker self-assigned this Apr 6, 2017

pythonhacker changed the title from "Spam deduction in Junction" to "Spam detection in Junction" Apr 6, 2017

kracekumar mentioned this issue Jun 10, 2017: Prevent user from posting spam #530 (Closed)

palnabarun added this to the Enhancements milestone Mar 20, 2020

ananyo2012 added the triage/needs-design-decision label Mar 20, 2020

ananyo2012 mentioned this issue Mar 20, 2020: Fighting Spam #569 (Open)
