
Stats 337: Readings in Applied Data Science

Stats 337 is a small discussion class available to Stanford students in Spring 2018. Students in this class will read 3-4 papers (or equivalent) per week, write a brief response, and then discuss the papers (and related ideas) in class.


These readings reflect my personal thoughts about applied data science, and are skewed towards topics that I think are important but generally underappreciated. They are not a systematic attempt to survey the field. That said, if you think there's something major that I've missed, please feel free to submit an issue (or pull request!). These readings will evolve as the quarter goes by.

Many of the readings come from Practical Data Science for Stats, a joint PeerJ collection and special issue of The American Statistician. Jenny Bryan and I pulled this collection together in order to publish some of the important parts of data science that were previously unpublished. Other readings are blog posts because so much of applied data science is outside the comfort zone of traditional academic fields.

Much of the development of this course has been driven by conversations on Twitter. Big thanks go to everyone who has helped me out! Key threads: classroom discussion, ethics, Google Sheets, citation management.

What the *&!% is data science? (Apr 2)

In-class resources

Data collection and collaboration (Apr 9)

In-class photos

Spend 3-5 minutes filling out class feedback.

Software engineering (Apr 16)

Collaborative Google Doc

DevOps (Apr 23)

Collaborative Google Doc

Teaching (Apr 30)

Reproducibility (May 7)

Ethics (May 14)

Career (May 21)



Annotated bibliographies

Many students in the Spring 2018 class elected to share their final annotated bibliographies.


This is a discussion-based class, so the majority of your final grade will come from your preparation for discussion (weekly 1-page responses, 30%) and your in-class participation (also 30%). This class is not meant to be self-contained, so the final component of your grade will be an annotated bibliography (40%) describing other papers that you read outside of this class. The goal of these assessments is to force you to do things that are in your own best interests, and to encourage you to learn helpful workflows that will stand you in good stead outside of this class.

I am not interested in policing excuses, so no late responses will be accepted, and absences from class will count as a zero for participation. That said, I also don't want one bad week to affect your final grade, so your lowest two scores in each category (responses and participation) will be dropped.
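As a rough illustration of how these weights and the drop-the-lowest-two rule could combine, here is a minimal Python sketch. It is not the actual grading procedure: the function names are hypothetical, and it assumes every score has already been converted to a number between 0 and 1, which the check/plus/minus system does not specify.

```python
def drop_lowest(scores, n_drop=2):
    """Return the scores with the n_drop lowest values removed."""
    return sorted(scores)[n_drop:]


def mean(values):
    """Average a list of numbers; an empty list averages to 0."""
    return sum(values) / len(values) if values else 0.0


def final_grade(responses, participation, bibliography):
    """Combine the three components with the 30/30/40 weights.

    Assumption: all inputs are on a 0-1 scale. The lowest two weekly
    scores are dropped from responses and from participation separately.
    """
    response_avg = mean(drop_lowest(responses))
    participation_avg = mean(drop_lowest(participation))
    return 0.3 * response_avg + 0.3 * participation_avg + 0.4 * bibliography


# Hypothetical student: one missed response, one weak week, two absences.
example = final_grade(
    responses=[1.0, 0.0, 1.0, 0.5, 1.0, 0.0, 1.0],
    participation=[1, 1, 0, 1, 1, 1, 1, 0],
    bibliography=0.75,
)
```

In this made-up example the two zero responses and the two absences are all dropped, so a couple of bad weeks leave the final grade untouched.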


Each week (after the first week), you need to turn in a 1-2 page written response to the papers that you read that week. The goal of the response is to ensure that you've read the weekly readings, thought about them, and connected them to your existing knowledge, interests, and experience. In your response, you should briefly summarise each paper (1-2 sentences to jog your memory when you re-read your notes), and then focus on your response to the paper: How did it make you feel? What questions were you left with? What do you think it got wrong? If you found one of the readings to be particularly thought-provoking, feel free to devote your entire response to that paper.

Each response will be graded on the check/plus/minus system. You will get a check if you briefly summarise the readings and add your own commentary. You will get a check-plus if you synthesize the readings, and combine them with outside knowledge/experience. You will get a check-minus if you only summarise the paper. (I will likely evolve these guidelines to be more concrete once I've read a few responses.)

If you're not familiar with reading academic papers (or you want to polish your skills), you might want to read these guidelines from Jeff Leek. I'd also highly recommend that you learn and use a citation management system. Having a system for managing citations is crucial if you plan to write a thesis. If you don't have an existing system, start by reading the advice of Caleb McDaniel.


This is a discussion class, so your classroom participation is essential. But don't worry if you're introverted, shy, or speak English as a second language: there will be plenty of opportunities to participate that don't require verbal agility. In this class, I'll be drawing on the techniques described in The Discussion Book by Stephen D. Brookfield and Stephen Preskill to make sure that everyone gets a chance to participate. I'll also collect regular feedback to make sure that everything is going well.

Annotated bibliography.

Your final project will be an annotated bibliography containing at least 20 papers or blog posts related to data science that we did not cover in this course. (See citation tracing)

Due June 6 (electronically)

There are three components to the bibliography:

  • Executive summary (25%). Introduce the overall theme of your bibliography in 1-2 paragraphs. Then use 1-2 pages to synthesise the most important or interesting ideas from your annotated bibliography.

  • Top 3 (25%). List the three papers that you would most highly recommend and describe briefly why.

  • Bibliography (50%). List all the papers you have read with a proper reference and any notes you find helpful.

Each component will be graded 1 (C), 2 (B), or 3 (A):

  • Executive summary:

    • 3:
    • 2:
    • 1:
  • Top 3:

    • 3: Your description of the top 3 papers makes me want to run out and read them immediately, and you make that easy with impeccable citations and links to PDFs.

    • 2:

    • 1: You have listed 3 papers and briefly described why they are interesting.

  • Bibliography:

    • 1: 6-10 papers
    • 2: 11-16 papers
    • 3: >25 papers


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.