SNP Filter

SNP Filter is currently used to detect posts containing terrorist propaganda. It reads social media posts and filters them against three tag sets:

Terrorism Tags
Motivation Tags
Out-of-context (OOC) Tags

Sample contents of each tag set are given in the table below:

Tag Set	Content
Terrorism Tags (S1)	Terrorist agents or terrorist actions, such as {isis, kill, threaten, taliban}
Motivation Tags (S2)	{Encourage, Inspire, etc.}
OOC Tags (S3)	{"Name of People from Unrelated Context", etc.}

Algorithms

Preprocess Data

Input: twitter
Output: preprocessedTwitter
1) Split the post into independent sentences
2) Remove the stop words from the text
3) Decompose compound tokens, such as #supportISIS => support ISIS
4) Stem the words to their roots, such as support, supported, supporting => support
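The four steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the code in this repository: the class name `PreprocessSketch`, the tiny stop-word list, the camel-case rule for splitting hashtags, and the naive suffix-stripping stemmer are all assumptions made for the example. A real pipeline would likely use a full stop-word list and a proper stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the preprocessing steps; not the project's actual class.
final class PreprocessSketch {

    // Tiny illustrative stop-word list (assumption); a real list is much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "is", "are", "to", "of", "and", "in", "we"));

    // Step 1: split a post into sentences after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String post) {
        List<String> sentences = new ArrayList<>();
        for (String s : post.split("(?<=[.!?])\\s+")) {
            if (!s.trim().isEmpty()) {
                sentences.add(s.trim());
            }
        }
        return sentences;
    }

    // Step 3: "#supportISIS" -> "support ISIS", splitting where a lowercase
    // letter is followed by an uppercase letter (assumed rule).
    static String decomposeHashtag(String token) {
        String body = token.startsWith("#") ? token.substring(1) : token;
        return body.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
    }

    // Step 4: naive suffix stripping, standing in for a real stemmer.
    static String stem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 4 && word.endsWith("ed")) {
            return word.substring(0, word.length() - 2);
        }
        return word;
    }

    // Steps 2 to 4 applied to one sentence; returns lowercase, stemmed tokens.
    static List<String> preprocess(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String raw : sentence.split("\\s+")) {
            // Drop punctuation but keep '#' so hashtags can still be recognised.
            String cleaned = raw.replaceAll("[^A-Za-z0-9#]", "");
            if (cleaned.isEmpty() || STOP_WORDS.contains(cleaned.toLowerCase(Locale.ROOT))) {
                continue; // Step 2: remove stop words
            }
            for (String part : decomposeHashtag(cleaned).split(" ")) {
                tokens.add(stem(part.toLowerCase(Locale.ROOT)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String post = "We supported the cause. Join #supportISIS today!";
        for (String sentence : splitSentences(post)) {
            System.out.println(preprocess(sentence));
        }
        // prints [support, cause]
        // prints [join, support, isis, today]
    }
}
```

Lowercasing every token is an extra normalisation assumed here so that tokens can be compared directly with lowercase tags such as isis.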

Filter Algorithm

Input: a new tweet
Output: accepted or not accepted
if the tweet contains words in filter set 1 (S1) then
    if the tweet contains words in filter set 2 (S2) then
        if the tweet contains words in filter set 3 (S3) then
            return not accepted
        else
            return accepted
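The nested checks above translate fairly directly into Java. The sketch below is an illustration under stated assumptions rather than the code in Mainapp.java: the class name `FilterSketch`, the method names, and the S3 entry `examplename` are hypothetical, and the tag sets hold only the sample values from the table. The pseudocode does not say what happens when a tweet misses S1 or S2; the sketch assumes such tweets are not accepted, which matches the description of the output as tweets with motivational terrorism content.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three-stage tag filter; not the project's actual class.
final class FilterSketch {

    // Sample tag sets taken from the table; real sets would be larger.
    static final Set<String> S1 = new HashSet<>(Arrays.asList("isis", "kill", "threaten", "taliban"));
    static final Set<String> S2 = new HashSet<>(Arrays.asList("encourage", "inspire"));
    // Placeholder out-of-context tag (assumption); the source gives no concrete names.
    static final Set<String> S3 = new HashSet<>(Arrays.asList("examplename"));

    // True if at least one token appears in the tag set.
    static boolean containsAny(List<String> tokens, Set<String> tags) {
        return !Collections.disjoint(tokens, tags);
    }

    // Expects preprocessed (lowercase, single-word) tokens of one tweet.
    static boolean accept(List<String> tokens) {
        if (containsAny(tokens, S1)) {
            if (containsAny(tokens, S2)) {
                if (containsAny(tokens, S3)) {
                    return false; // out-of-context tag present: not accepted
                } else {
                    return true;  // terrorism tag + motivation tag, no OOC tag
                }
            }
        }
        // Assumption: tweets without an S1 or S2 match are not accepted.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accept(Arrays.asList("isis", "inspire", "follower")));    // prints true
        System.out.println(accept(Arrays.asList("isis", "attack")));                 // prints false
        System.out.println(accept(Arrays.asList("isis", "inspire", "examplename"))); // prints false
    }
}
```

The nesting is kept to mirror the pseudocode; the same logic can be written as a single expression, `inS1 && inS2 && !inS3`.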

How to Run the Code

Basic Filtering

Mainapp.java performs the basic filtering. Run this file; the filtered set of tweets is printed to standard output. This set should contain the tweets that carry motivational terrorism content.

Limitations

1) Multiple Nouns

Examples of multiple nouns: suicide belt, cut their heads off, call upon

Because our matching algorithm matches keywords in a 1-gram model, it does not work well on multiple nouns. For example, it is hard to filter out a tweet that contains the term "suicide belt". The problem arises because, when we break down a sentence, the smallest unit is a token made up of one word, such as embassy, isis, or division. The tag sets are likewise made up of single words, such as terrorism, isis, and threaten.
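The limitation can be reproduced with a few lines of Java. The sketch below is only an illustration: the class `UnigramLimitSketch`, its methods, and the sample sentence are hypothetical. It shows that a two-word tag can never equal a one-word token. The bigram helper is included purely for contrast, as one possible direction; the source does not describe it as part of the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demonstration of the 1-gram limitation; not the project's code.
final class UnigramLimitSketch {

    // Split text into lowercase single-word tokens (the 1-gram model).
    static List<String> unigrams(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Join each pair of adjacent tokens with a space (assumed extension, for contrast only).
    static List<String> bigrams(List<String> tokens) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            pairs.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return pairs;
    }

    // True if any unit (token or token pair) is an exact member of the tag set.
    static boolean matches(List<String> units, Set<String> tags) {
        return !Collections.disjoint(units, tags);
    }

    public static void main(String[] args) {
        Set<String> tags = new HashSet<>(Arrays.asList("suicide belt"));
        List<String> tokens = unigrams("he wore a suicide belt");

        // Single-word tokens "suicide" and "belt" never equal the two-word tag.
        System.out.println(matches(tokens, tags));          // prints false
        // Adjacent pairs include "suicide belt", so the tag is found.
        System.out.println(matches(bigrams(tokens), tags)); // prints true
    }
}
```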
