Journal of Statistics Education Paper on Using OkCupid Data for Data Science Courses

OkCupid Profile Data for Intro Stats and Data Science Courses

Albert Y. Kim and Adriana Escobedo-Land

Data and code for the paper "OkCupid Profile Data for Introductory Statistics and Data Science Courses" (Journal of Statistics Education, July 2015, Volume 23, Number 2).

  • JSE.bib: bibliography file
  • JSE.pdf: PDF of document
  • JSE.Rnw: R Sweave document to recreate JSE.pdf.
  • JSE.R: R code used in document
  • okcupid_codebook.txt: codebook for all variables
  • profiles.csv.zip: CSV file of profile data (unzip this first)
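The paper's own code is in R (JSE.R). As a hedged Python sketch, the zipped CSV can also be read without extracting it to disk first. This assumes the archive holds a member named profiles.csv encoded as UTF-8; load_profiles is a hypothetical helper name, not part of this repository.

```python
import csv
import io
import zipfile


def load_profiles(zip_path, member="profiles.csv"):
    """Read profile rows straight from the zip archive.

    Assumption: the archive contains a member called "profiles.csv"
    (UTF-8). Pass a different `member` if your copy differs.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # ZipFile.open yields bytes; wrap it so DictReader receives text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Materialise the rows before the archive is closed.
            return list(csv.DictReader(text))
```

Every field comes back as a string, so numeric columns such as height need converting before analysis.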

Note:

  • Permission to use this data set was explicitly granted by OkCupid.
  • Usernames are not included.
  • The JSE.Rnw Sweave document was compiled using the knitr package. To compile it in RStudio, go to "Tools" -> "Project Options" -> "Sweave" -> "Weave Rnw files using:" and select knitr.

Preview

Distribution of Male and Female Heights
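A numeric counterpart to this plot can be sketched by summarising heights separately for each sex. This is a Python illustration, not the authors' R code, and it assumes (per okcupid_codebook.txt, to be checked) a sex column coded m/f and a height column in inches stored as text; height_summary_by_sex is a hypothetical name.

```python
from statistics import mean, median


def height_summary_by_sex(rows):
    """Count, mean and median height for each value of the sex column.

    Assumption: each row is a dict with "sex" and "height" keys, height
    given in inches as text. Rows with a blank height are skipped.
    """
    heights = {}
    for row in rows:
        h = (row.get("height") or "").strip()
        if not h:
            continue  # missing height: leave the row out
        heights.setdefault(row["sex"], []).append(float(h))
    return {
        sex: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for sex, vals in heights.items()
    }
```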

Joint Distribution of Sex and Sexual Orientation

A mosaic plot of the cross-classification of the 59,946 users' sex and sexual orientation.
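A mosaic plot draws a two-way contingency table: tile widths follow the marginal share of each sex, and tile heights follow the share of each orientation within that sex. A minimal Python sketch of those counts and conditional proportions, assuming columns named sex and orientation (the default keys are assumptions to check against the codebook; the function names are hypothetical):

```python
from collections import Counter


def cross_classify(rows, row_key="sex", col_key="orientation"):
    """Count users in every (row level, column level) cell, filling empty cells with 0."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_levels = sorted({k[0] for k in counts})
    col_levels = sorted({k[1] for k in counts})
    return {r: {c: counts.get((r, c), 0) for c in col_levels} for r in row_levels}


def conditional_proportions(table):
    """Within each row level, the share of users in each column level (the tile heights)."""
    out = {}
    for r, cells in table.items():
        total = sum(cells.values())
        out[r] = {c: n / total for c, n in cells.items()}
    return out
```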

Logistic Regression to Predict Gender

Linear regression (in red) and logistic regression (in blue) compared. Note that both the x-axis (height) and the y-axis (is female: 1 if the user is female, 0 if the user is male) have random jitter added to better visualize the number of points at each (height, gender) pair.

Fitted probabilities p-hat of each user being female, along with a decision threshold (in red) used to predict whether a user is female.
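Turning fitted probabilities into predictions is a comparison against the threshold. The threshold value is not stated here, so this hedged Python sketch takes it as a required argument; the coefficients are assumed to come from a logistic fit of is-female on height, and the function names are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_female(heights, intercept, slope, threshold):
    """Return (p_hat, predicted_female) pairs; predict female when p_hat >= threshold."""
    out = []
    for h in heights:
        p_hat = _sigmoid(intercept + slope * h)
        out.append((p_hat, p_hat >= threshold))
    return out


def misclassification_rate(predictions, is_female):
    """Fraction of users whose prediction disagrees with the observed 0/1 label."""
    wrong = sum(pred != bool(y) for (_, pred), y in zip(predictions, is_female))
    return wrong / len(is_female)
```

Raising the threshold labels fewer users female; lowering it labels more, so the choice trades one kind of error for the other.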
