okcupiddata


R package of cleaned profile data from "OkCupid Profile Data for Introductory Statistics and Data Science Courses" (Journal of Statistics Education, 2015). The data cover 59,946 OkCupid users who were living within 25 miles of San Francisco, had active profiles on June 26, 2012, were online in the previous year, and had at least one picture in their profile.

The data in this package are a "cleaned" version of the original data from the above paper, in that the following variables are modified for easier use by novices:

  • Essay responses: Due to file size restrictions, only the first 140 characters of each user's first essay response (essay0: my self summary) are included
  • Missing income values: Previously coded as -1, they are now coded as NA
  • All other missing values: Previously coded as "", they are now coded as NA
  • offspring and sign: String instances of "&rsquo;" are replaced with apostrophes
  • last_online: Date/time strings are converted to USA/Pacific timezone POSIXct date-time objects
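
The recoding steps above can be sketched in base R on a toy data frame. This is only an illustration under stated assumptions, not the package's actual cleaning script: the `raw` data frame and its values are invented, the raw `last_online` format (`YYYY-MM-DD-HH-MM`) is assumed, and `America/Los_Angeles` is used as the time zone name for US Pacific time. The apostrophe replacement for offspring and sign is left out.

```r
# Toy stand-in for the raw data; all values are invented for illustration.
raw <- data.frame(
  income      = c(-1, 20000, -1),
  essay0      = c(strrep("a", 200), "", "short summary"),
  last_online = c("2012-06-26-20-30", "2012-06-25-08-15", "2012-06-01-12-00"),
  stringsAsFactors = FALSE
)

cleaned <- raw

# Missing income values: -1 becomes NA
cleaned$income[cleaned$income == -1] <- NA

# All other missing values: "" becomes NA in every character column
is_chr <- vapply(cleaned, is.character, logical(1))
cleaned[is_chr] <- lapply(cleaned[is_chr], function(x) {
  x[x == ""] <- NA
  x
})

# Essay responses: keep only the first 140 characters
cleaned$essay0 <- substr(cleaned$essay0, 1, 140)

# last_online: parse the (assumed) raw format into POSIXct, Pacific time
cleaned$last_online <- as.POSIXct(
  cleaned$last_online,
  format = "%Y-%m-%d-%H-%M",
  tz = "America/Los_Angeles"
)

str(cleaned)
```

Recoding the sentinel values to NA lets functions such as `mean(x, na.rm = TRUE)` or `is.na()` handle missing entries directly, rather than each analysis having to remember that -1 or "" means missing.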

Note:

  • The original data, publication, code, and codebook can be found in the rudeboybert/JSE_OkCupid repository on GitHub (https://github.com/rudeboybert/JSE_OkCupid).
  • The original data, and hence also this cleaned data, did not include usernames.
  • Permission to use this data was explicitly granted by OkCupid.

Installation

Get the released version from CRAN:

install.packages("okcupiddata")

Or the development version from GitHub:

# If you haven't installed devtools yet, do so:
# install.packages("devtools")
devtools::install_github("rudeboybert/okcupiddata")

Load Data

To load the profile data, run:

data(profiles)

If you prefer the originally published Journal of Statistics Education data, which also include the complete essay responses, do not use this package; instead, run the following code:

# Download the data (run only once):
url <- "https://github.com/rudeboybert/JSE_OkCupid/blob/master/profiles.csv.zip?raw=true"
temp_zip_file <- tempfile(fileext = ".zip")
# mode = "wb" keeps the binary zip file intact on Windows
download.file(url, temp_zip_file, mode = "wb")
unzip(temp_zip_file, "profiles.csv")
# Load CSV into R:
profiles <- read.csv(file = "profiles.csv", header = TRUE, stringsAsFactors = FALSE)
