okcupiddata
R package of cleaned profile data from "OkCupid Profile Data for Introductory Statistics and Data Science Courses" (Journal of Statistics Education, 2015). The data cover 59,946 OkCupid users who were living within 25 miles of San Francisco, had active profiles on June 26, 2012, were online in the previous year, and had at least one picture in their profile.
The data in this package are a "cleaned" version of the original data from the above paper, in that the following variables are modified for easier use by novices:
- Essay responses: Due to file size restrictions, only the first 140 characters of each user's first essay response (essay0: my self summary) are included
- Missing income values: Previously coded as -1, they are now coded as NA
- All other missing values: Previously coded as "", they are now coded as NA
- offspring and sign: String instances of "&rsquo;" are replaced with apostrophes
- last_online: Date/time strings are converted to USA/Pacific timezone POSIXct date-time objects
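The recoding steps listed above can be sketched in base R on a toy data frame. This is an illustration only, not the package's actual cleaning script: the toy values, the "&rsquo;" entity, the "%Y-%m-%d-%H-%M" timestamp layout, and the "America/Los_Angeles" timezone name are all assumptions made for the example.

```r
# Toy stand-in for the raw data; every value here is invented for illustration.
raw <- data.frame(
  income      = c(-1, 80000, -1),
  offspring   = c("doesn&rsquo;t have kids", "", "has a kid"),
  essay0      = c(strrep("a", 200), "short summary", ""),
  last_online = c("2012-06-28-20-30", "2012-06-29-21-41", "2012-06-27-09-10"),
  stringsAsFactors = FALSE
)

cleaned <- raw

# Missing income values: -1 becomes NA.
cleaned$income[cleaned$income == -1] <- NA

# All other missing values: empty strings become NA in character columns.
is_chr <- vapply(cleaned, is.character, logical(1))
cleaned[is_chr] <- lapply(cleaned[is_chr], function(x) {
  x[x %in% ""] <- NA
  x
})

# Replace the HTML-entity apostrophe (assumed raw encoding) with a plain one.
cleaned$offspring <- gsub("&rsquo;", "'", cleaned$offspring, fixed = TRUE)

# Keep only the first 140 characters of the essay response.
cleaned$essay0 <- substr(cleaned$essay0, 1, 140)

# Parse the timestamp strings (assumed layout) into Pacific-time POSIXct values.
cleaned$last_online <- as.POSIXct(
  cleaned$last_online,
  format = "%Y-%m-%d-%H-%M",
  tz = "America/Los_Angeles"
)

str(cleaned)
```

Converting empty strings to NA before the other string operations keeps missing entries missing, since gsub() and substr() both pass NA through unchanged.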
Note:
- The original data, publication, code, and codebook can be found in the rudeboybert/JSE_OkCupid repository on GitHub.
- The original data, and hence also this cleaned data, did not include usernames.
- Permission to use this data was explicitly granted by OkCupid.
Installation
Get the released version from CRAN:
install.packages("okcupiddata")
Or the development version from GitHub:
# If you haven't installed devtools yet, do so:
# install.packages("devtools")
devtools::install_github("rudeboybert/okcupiddata")
Load Data
To load the profile data, run:
library(okcupiddata)
data(profiles)
If you prefer the originally published Journal of Statistics Education data, which also include the complete essay responses, do not use this package; instead, run the following code:
# Download the data (run only once):
url <- "https://github.com/rudeboybert/JSE_OkCupid/blob/master/profiles.csv.zip?raw=true"
temp_zip_file <- tempfile()
download.file(url, temp_zip_file, mode = "wb")  # binary mode keeps the zip intact on Windows
unzip(temp_zip_file, "profiles.csv")
# Load CSV into R:
profiles <- read.csv(file = "profiles.csv", header = TRUE, stringsAsFactors = FALSE)
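Because the cleaned data code missing values as NA, missingness can be summarised per variable. The helper below, na_share(), is a hypothetical name introduced for this sketch and is not part of the package. It is demonstrated on invented toy values so that it runs without the package, and it is applied to the real data only if okcupiddata is installed.

```r
# Share of missing (NA) values in each column of a data frame.
# na_share() is a hypothetical helper, not a function from okcupiddata.
na_share <- function(df) {
  vapply(df, function(col) mean(is.na(col)), numeric(1))
}

# Toy example with invented values.
toy <- data.frame(
  income = c(NA, 80000, NA, 20000),
  sign   = c("leo", NA, "virgo", "aries"),
  stringsAsFactors = FALSE
)
toy_share <- na_share(toy)
print(toy_share)

# With the package installed, the same helper applies to the real data.
if (requireNamespace("okcupiddata", quietly = TRUE)) {
  data(profiles, package = "okcupiddata")
  print(sort(na_share(profiles), decreasing = TRUE))
}
```

On the raw CSV the same call would understate missingness, since that file codes missing income as -1 and other missing values as "".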