TwitterGenderPredictor

JT Wolohan

jwolohan@indiana.edu

Description

This is a Python implementation of Sap et al.'s gender prediction algorithm for Twitter. Given a large sample of users and a reasonable amount of text for each user, the algorithm should be about 90% accurate.

Sap, M., Park, G., Eichstaedt, J., Kern, M., Stillwell, D., Kosinski, M., ... & Schwartz, H. A. (2014). Developing age and gender predictive lexica over social media. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1146-1151).
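Sap et al. build their predictor from a weighted lexicon. Each term carries a learned weight, and a user's score is an intercept plus the sum of each lexicon term's weight multiplied by that term's relative frequency in the user's text. The sketch below illustrates that scoring rule under stated assumptions; it is not the code in SapGenderPrediction.py. The regex tokenizer is a simplified stand-in for the repository's tokenizer, and the toy weights and intercept are hypothetical, not the real lexicon values.

```python
import re
from collections import Counter


def tokenize(text):
    """Simplified tokenizer (assumption): lowercase; keep words, contractions, @mentions, #hashtags."""
    return re.findall(r"[@#]?\w+(?:'\w+)?", text.lower())


def lexicon_score(text, weights, intercept):
    """Weighted-lexicon score: intercept + sum of weight(term) * count(term) / total tokens."""
    tokens = tokenize(text)
    if not tokens:
        return intercept  # no evidence: fall back to the intercept alone
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(
        weight * counts[term] / total
        for term, weight in weights.items()
        if term in counts
    )


# Hypothetical toy lexicon; real weights would come from the trained lexicon file.
toy_weights = {"love": 0.5, "game": -0.4}
toy_intercept = -0.1
print(lexicon_score("I love this game and I love you", toy_weights, toy_intercept))
```

Dividing by the total token count turns each term's contribution into a relative frequency, so users who tweet a lot and users who tweet a little are scored on the same scale.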

Use

  1. Clone the repository.
  2. Import GndrPrdct from SapGenderPrediction.
  3. Instantiate a GndrPrdct object.
  4. Call its predict_gender method on a single string containing a user's tweets (for example, the tweets joined with spaces).
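Step 4 expects one string per user, so tweets collected as (user, tweet) pairs need to be grouped first. A minimal sketch, assuming the data arrives as an iterable of (user_id, tweet_text) pairs; the function name tweets_per_user is hypothetical and not part of this repository.

```python
from collections import defaultdict


def tweets_per_user(pairs):
    """Group (user_id, tweet_text) pairs into {user_id: one space-joined string}."""
    grouped = defaultdict(list)
    for user_id, text in pairs:
        grouped[user_id].append(text)
    # Join each user's tweets with spaces, as the example's " ".join(...) call does
    return {user_id: " ".join(texts) for user_id, texts in grouped.items()}


pairs = [("alice", "First tweet."), ("bob", "Hello!"), ("alice", "Second tweet.")]
print(tweets_per_user(pairs))  # {'alice': 'First tweet. Second tweet.', 'bob': 'Hello!'}
```

Each value in the returned dictionary can then be passed to predict_gender, giving one prediction per user.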

Predictions are returned as integers: 0 is a prediction of male and 1 is a prediction of female.
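A lexicon-based predictor turns its real-valued score into a 0/1 label with a threshold. The sketch below assumes that positive scores indicate female (matching the 1 = female encoding) and that the cutoff is 0; check SapGenderPrediction.py for the exact rule it applies. The names score_to_prediction and LABELS are hypothetical.

```python
# Hypothetical label mapping following the 0 = male, 1 = female encoding
LABELS = {0: "male", 1: "female"}


def score_to_prediction(score, threshold=0.0):
    """Return 1 (female) if the score is above the threshold, else 0 (male). Threshold is an assumption."""
    return 1 if score > threshold else 0


for s in (-0.3, 0.0, 0.42):
    p = score_to_prediction(s)
    print(s, p, LABELS[p])
```

A score of exactly 0 maps to male here; that tie-break is an arbitrary choice in this sketch.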

Example

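```python
# Step 2: import the classifier
from SapGenderPrediction import GndrPrdct

# Step 3: instantiate it
classifier = GndrPrdct()
tweets = ["This is a tweet.", "I'm another tweet!", "Hey, @realDonaldTrump, I'm yet another tweet!"]

# Step 4: predict from all of one user's tweets joined into a single string
prediction = classifier.predict_gender(" ".join(tweets))
print(prediction)  # 0 = male, 1 = female
```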
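To score many users at once, the same call can be applied to each user's combined text. A minimal sketch, assuming only that the classifier exposes predict_gender(text) returning 0 or 1 as described above; predict_users and count_predictions are hypothetical helpers. The classifier is passed in as an argument, so any object with a predict_gender method works.

```python
from collections import Counter


def predict_users(classifier, tweets_by_user):
    """Return {user_id: prediction} for a dict of {user_id: list of tweet strings}."""
    return {
        user_id: classifier.predict_gender(" ".join(tweets))
        for user_id, tweets in tweets_by_user.items()
    }


def count_predictions(predictions):
    """Tally predictions as {"male": n, "female": m} using the 0/1 encoding."""
    counts = Counter(predictions.values())
    return {"male": counts[0], "female": counts[1]}


# Usage with the repository's classifier (requires SapGenderPrediction to be importable):
# from SapGenderPrediction import GndrPrdct
# preds = predict_users(GndrPrdct(), {"user_a": ["First tweet.", "Second tweet."]})
# print(count_predictions(preds))
```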