
Python Biased Stop Words

License: MIT. Contact: Gregology.

Overview

Biases are bugs

Stop words are words that are filtered out before natural language data is processed. Text analysis often picks up non-causal correlations. Consider the following documents:

  • He is an astronaut, he is on Venus
  • He is an accountant, he is on Earth
  • She is an astronaut, she is on Mars

Clustering these documents into two topics will group them by gender. If we remove the gendered terms:

  • is an astronaut, is on Venus
  • is an accountant, is on Earth
  • is an astronaut, is on Mars

Clustering will then group them by job. Both clusterings are valid; however, if you are looking to employ an astronaut, you don't want male accountants showing up. Natural language contains many other non-causal relationships: religion, ethnicity, and age, to name but a few.
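To make the clustering effect concrete, here is a minimal sketch that compares the three example documents using cosine similarity over raw word counts. Everything in it is illustrative and not part of the package: GENDERED is a hypothetical two-word stand-in for the gendered genre, and tokenize, bag_of_words, cosine and nearest are hypothetical helpers. Bag-of-words cosine similarity is chosen only as a simple example; the package does not prescribe a similarity measure or clustering algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the 'gendered' genre; the real list is much longer.
GENDERED = frozenset({"he", "she"})

# The three example documents from the overview.
DOCS = [
    "He is an astronaut, he is on Venus",
    "He is an accountant, he is on Earth",
    "She is an astronaut, she is on Mars",
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text, stop_words=frozenset()):
    """Count tokens, skipping any that appear in stop_words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest(stop_words=frozenset()):
    """Return the index of the document most similar to the Venus one, plus all scores."""
    bags = [bag_of_words(d, stop_words) for d in DOCS]
    scores = {i: cosine(bags[0], bags[i]) for i in (1, 2)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    print(nearest())          # similarity with the gendered terms kept
    print(nearest(GENDERED))  # similarity with the gendered terms removed
```

Without filtering, the Venus astronaut is closest to the accountant (cosine of about 0.83 versus about 0.58), because the repeated "he" dominates the counts. After removing GENDERED it is closest to the Mars astronaut (about 0.875 versus 0.75).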

Available genres

  • Gendered Terms
  • US Names
  • Religious Terms (Partial)

More genres will be added soon. Contribute at https://github.com/gregology/biased-words

Interactive Notebook

Explore this package in an interactive notebook hosted by Binder.

Installation

biased-stop-words is available on PyPI

http://pypi.python.org/pypi/biased-stop-words

Install via pip

$ pip install biased-stop-words

Or via easy_install

$ easy_install biased-stop-words

Or install directly from the git repository (https://github.com/gregology/biased-words):

$ git clone --recursive https://github.com/gregology/biased-stop-words.git
$ cd biased-stop-words
$ python setup.py install

Basic usage

>>> from biased_stop_words import genres, get_stop_words
>>> genres()
'religious, gendered, us-common-names, us-names, us-male-names, us-female-names, gendered-nouns'
>>> get_stop_words('gendered', 'us-common-names')
[u'trenton', u'augustine', u'khalil', u'aiden', u'elisabeth', u'andre', u'khanum', u'elva', u'fran...
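The list returned by get_stop_words can be applied like any other stop-word list. The sketch below is a minimal example rather than part of the package: remove_biased_terms is a hypothetical helper, the regex tokenizer is an assumption, and the fallback list used when the package is not installed is a made-up stand-in for the real 'gendered' genre. It also assumes get_stop_words accepts a single genre name as well as several, as in the call above.

```python
import re

try:
    from biased_stop_words import get_stop_words
except ImportError:
    # Assumption: hypothetical stand-in so the sketch runs without the package.
    # The real 'gendered' genre is far longer.
    def get_stop_words(*genre_names):
        return ["he", "she", "him", "her", "his", "hers"]


def remove_biased_terms(text, *genre_names):
    """Hypothetical helper: drop tokens found in the requested stop-word genres."""
    # A set gives constant-time membership checks, even for long name lists.
    stop_words = {w.lower() for w in get_stop_words(*genre_names)}
    # Assumption: entries are single lowercase words, as in the sample output above.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]


print(remove_biased_terms("She is an astronaut, she is on Mars", "gendered"))
```

With the fallback list, this prints ['is', 'an', 'astronaut', 'is', 'on', 'mars'].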

Running tests

$ python biased_stop_words/tests.py

Python compatibility

Developed for Python 2 & 3.

