jcolag/Keywords

Keywords

This is a quick project to demonstrate/prototype keyword searches for suspicious texts, with features to be added over time.

Briefly, Keywords extracts the words above a specified length from a text file, filters out common words, and groups the remaining words by stem/root word.
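As a rough illustration of that pipeline, here is a minimal sketch, not the project's actual code. The `keywords` and `crude_stem` methods, the `min_length` parameter, and the tiny `COMMON_WORDS` list are all hypothetical names invented for this example, and the suffix stripper is only a crude stand-in for a real stemmer such as the stemmify gem.

```ruby
require 'set'

# Hypothetical stand-in for the project's default common-word list.
COMMON_WORDS = Set.new(%w[about after because every their there these which would])

# Crude suffix stripper, standing in for a real stemming algorithm.
# It drops a trailing "s", then a trailing "ing" or "ed".
def crude_stem(word)
  word.sub(/s\z/, '').sub(/(ing|ed)\z/, '')
end

# Collect words longer than min_length, drop common words,
# and group whatever is left by (approximate) stem.
def keywords(text, min_length: 4, common: COMMON_WORDS)
  text.downcase
      .scan(/[a-z']+/)
      .select { |word| word.length > min_length }
      .reject { |word| common.include?(word) }
      .group_by { |word| crude_stem(word) }
end

groups = keywords('Forward this warning! Forwarding these warnings helps everyone.')
groups.each { |stem, words| puts "#{stem}: #{words.join(', ')}" }
```

In this sample, "this" is too short and "these" is on the common-word list, so both drop out, while "forward" and "forwarding" end up under the same stem.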

For the sake of looking productive, it then uses DuckDuckGo to search Snopes and quickly parses the results to see whether the contents (a recirculated e-mail, for example) represent a known scam.
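The search step might begin by building a query along these lines. This is only a sketch under assumptions: the `snopes_search_url` method is hypothetical, and the use of DuckDuckGo's plain-HTML results page with a `site:` filter is a guess at one workable approach, not necessarily what the project does. It deliberately stops short of fetching or parsing anything.

```ruby
require 'uri'

# Assumed endpoint: DuckDuckGo's plain-HTML results page.
SEARCH_ENDPOINT = 'https://duckduckgo.com/html/'

# Build a search URL that restricts results to Snopes.
# `keywords` is an array of words, such as the stems collected earlier.
def snopes_search_url(keywords)
  query = (['site:snopes.com'] + keywords).join(' ')
  "#{SEARCH_ENDPOINT}?#{URI.encode_www_form(q: query)}"
end

url = snopes_search_url(%w[forward warning])
puts url
```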

Warnings

Please don't use this for any sort of production work.

I mean, seriously, for the sake of expedience, I'm using a search engine as a search engine and (to use the term a bit liberally) spidering another site. I'm also assuming that neither site's layout will ever change. I even take a very naive view of HTML structure, just for the sake of it.

The number of things that can go wrong and the number of people you might offend are astronomical. So, just don't actually use the thing for anything more than a quick test or a learning experience.

Credits

Rather than try to reinvent the wheel with my own stemming algorithm, I happily use the stemmify gem to collect words by (likely) common root.

The default list of common words is an aggregate of Basic English, the typical vocabulary of Voice of America, and at least one list of Stop Words.

As mentioned, the search uses both DuckDuckGo and Snopes.
