Data for Doing Data Science

Data sets from Doing Data Science by Cathy O'Neil and Rachel Schutt. I highly recommend picking up a copy for yourself.

On pages 37-38:

This folder contains 31 simulated days of ads shown and clicks recorded on the New York Times home page. Rows represent users, and the variables are: Age, Gender (0 = female, 1 = male), Impressions (number of impressions), Clicks (number of clicks), and Signed_in (a binary indicator of whether the user was signed in). We need to create two new variables: age_group, which contains seven levels of Age ("<18", "18-24", "25-34", "35-44", "45-54", "55-64", and "65+"), and CTR, or click-through rate, calculated as the number of clicks divided by the number of impressions.
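As a rough illustration of those two derived variables, here is a minimal Python sketch using only the standard library. The helper names (`age_group`, `ctr`, `add_derived`) are my own and not from the book, and two choices are assumptions: the brackets are read as inclusive integer ranges, and CTR is returned as `None` when a row has zero impressions, since the ratio is undefined there.

```python
from bisect import bisect_right

# Lower bound of each bracket after "<18"; labels follow the list above.
AGE_BREAKS = [18, 25, 35, 45, 55, 65]
AGE_LABELS = ["<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def age_group(age):
    """Map an integer age to its bracket label (hypothetical helper)."""
    # bisect_right counts how many lower bounds are <= age,
    # which is exactly the index of the matching label.
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


def ctr(clicks, impressions):
    """Click-through rate = clicks / impressions.

    Assumption: return None when there are no impressions,
    because the ratio is undefined in that case.
    """
    if impressions == 0:
        return None
    return clicks / impressions


def add_derived(row):
    """Return a copy of one row (a dict of strings, as csv.DictReader
    would produce) with age_group and CTR added."""
    out = dict(row)
    out["age_group"] = age_group(int(row["Age"]))
    out["CTR"] = ctr(int(row["Clicks"]), int(row["Impressions"]))
    return out
```

To apply this to one of the daily files, each row from `csv.DictReader` can be passed through `add_derived`; the column names used are the ones listed above.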