Skip to content
Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Harper-Lite: Simple Lesson Discovery and Aggregation

Harper's goal is better learning outcomes with less effort; Harper-Lite's is to get the ball rolling with as little effort as possible. Its design principles are:

  1. Instead of building another repository or portal, we will make it easier for teachers to make their lessons findable, then aggregate the information they give us.
  2. We will use tagging, voting, and direct online annotation to provide feedback to lesson creators.
  3. We will assume that people are going to adapt and remix material instead of using it as-is.
  4. We will not require people to be technically sophisticated in order to comment, contribute, and collaborate.


Harper's design is shaped by:

  • The success of RSS for decentralized content syndication.
  • Mike Caulfield's analysis of Stack Overflow in terms of choral explanations.
  • This list of what has been learned from previous open educational resource (OER) projects.

The project is named after William Rainey Harper, first president of the University of Chicago, who encouraged universities to offer distance education courses in the 1890s. "Harper" also happens to be the name of my youngest niece.

I'm grateful to Mike Caulfield, Mine Çetinkaya-Rundel, Neal Davis, Garrett Grolemund, Alison Hill, Alyson Indrunas, François Michonneau, Bracken Mosbacker, Nick Radcliffe, Raniere Silva, Alasdair Smith, Jon Udell, and David Wiley for comments, and to Peter Quill for show us that you only need twelve percent of a plan to save the galaxy. Please see this page for a description of a larger and more complicated system.


To make a lesson easier to find and re-use, authors create one extra text file called harper.yml and put it in the lesson's root directory. This single-file convention is inspired by the feed.xml and feed.rss files used for blogging, and by the and files now commonly used in software projects; we use the name because Alasdair Smith pointed out that some systems use for the actual content of the lesson.

harper.yml is formatted as YAML and must contain all of the fields shown below:

schema: "harper-lite 0.1"
title: "The Title of the Lesson"
abstract: >
  A single-paragraph summary of the lesson, the shorter the better.
  It can be broken across multiple lines as shown here, and can
  use [Markdown]( formatting.
version: "1.3"
  - "Some Name <email@address>"
  - "Another Name <their@address>"
package: http://package.url/
license: http://license.url/
  - "package_1"
  - "package_2>="
  - "Some learning objective."
  - "Some other learning objective."
  - "a glossary term"
  - "a skill"
  - "a term defined in this lesson"
  - "a skill taught in this lesson"
notes: >
  One or more sentences describing things instructors ought to know, formatted with Markdown.
  For example, the notes could explain how long the lesson usually takes to do.


  • The schema field identifies this as Harper-Lite Version 0.1.
  • The title and abstract fields should be self-explanatory.
  • The version field is the version of the lesson (typically also the version of the package it's for).
  • Contributors do not need to provide email addresses.
  • The license value is the URL of the license rather than an abbreviation. We will strongly encourage people to point at licenses at
  • The package field is the URL for the downloadable lesson materials.
  • requirements specifies any packages required, organized under one sub-key per language, using the syntax preferred by that language's default package manager.
  • Learning objectives should be single sentences, each with an active verb, describing something observable.
  • The prereq and postreq fields are discussed below.
  • notes is for anything else that might be helpful.

The prereq field is a list of terms or skills the lesson requires learners to have; the postreq field is a list of terms or skills the lesson teaches. We have used these rather clumsy names to avoid collision with requirements (which here means "software requirements"). The items under prereq and postreq are meant to be keywords, not full explanations. These will be used to optimize search and to help people stitch lessons together (e.g., "The lesson I want requires X---where can I learn about it?").

An example of a Harper file is shown below;

schema: "harper-lite 0.1"
title: "Tests of Univariate Normality"
abstract: >
  How can we tell if univariate data is normally distributed?
  This lesson describes three tests and explains the strengths and weaknesses of each.
version: "1.3.1"
  - "Walter Bishop <>"
  - "nickr (>="
  - "Describe the 68-95-99.7 rule and explain why it works and when it fails."
  - "Describe and apply the Shapiro-Wilk test for normality of univariate data."
  - "Describe and apply the ECF test for normality of multivariate data."
  - "Normal distribution"
  - "Quantiles"
  - "Statistical power"
  - "Covariance matrix"
  - "Empirical characteristic function"
  - "Install R package"
  - "`quantile`"
  - "Kurtosis risk"
  - "68/95/99.7 Rule"
  - "Shapiro-Wilk test"
  - "ECF test"
notes: >
  This material was originally developed as part of a sequence on data integrity
  for seniors and graduate students in statistics, then modified for delivery in
  conference workshops.  The Shapiro-Wilk material feels pretty solid; the ECF
  material has only been used a couple of times, and probably needs more work.

Please see this file for a Harper description of the Software Carpentry introduction to the Unix shell.


We support collaboration with a voting mechanism like Stack Overflow's, but with subject headings drawn from lessons:

  1. An author registers a lesson by providing the URL to its Harper file, just as someone can register a blog by providing a URL to an aggregator.

  2. The site extracts data from the file and displays the lesson under each of teaches headings to make it easier to find.

  3. When someone views the page for a keyword, they can provide comments on the lessons that appear there and upvote or downvote them. This means that a single lesson could appear as the top "answer" for one keyword and the third "answer" to another.

  4. We will provide an interactive upload mode: after providing a URL for a lesson, the author is walked through a very simple online form that asks for the information that goes in the lesson's Harper file. We save that information and give it back to them to download and add to the lesson. We still require that the Harper file be in the lesson's root directory as a check on authenticity: allowing person X to describe a lesson created by person Y opens up too many opportunities for abuse.


How does an instructor find a lesson?

  1. She does a DuckDuckGo search for the word "lesson" and the keywords she's interested in. If we've done our job right, the lessons indexed by our site show up near the top of her search, just as Stack Overflow answers show up near the top of a search for a technical question.
  2. She comes to our site and does a targeted search for her keywords.

How does an author register a lesson?

  1. They create a directory on their website with a meaningful name that includes a Harper file and possibly a zip file containing starter materials (if the lesson's exercises need any).
  2. They sign into their account on our site and register the URL of the lesson directory.
  3. Our site validates their Harper file.

As noted above, we don't allow people to register lessons that they don't control, i.e., they have to be able to add to the lesson's website in order for us to accept the registration.

Critically, our site does some fuzzy matching to suggest rewording of terms to match ones in lessons that are already registered. The term "folksonomy" seems to have lost its luster, but encouraging convergence on a shared vocabulary seems crucial to making all of this work.

How does someone ask for a lesson?

Rather than waiting for people to register their own lessons, we provide a "please register this" workflow:

  1. Jeanne stumbles across a lesson she finds interesting.

  2. She checks if there's a record of it.

  3. If there isn't, but someone before her has expressed an interest, we add one to the "wanna have" count in our database.

  4. If there isn't any record in our database, we scrape the site to get contact info for the lesson's author, Vlad. We then send Vlad a single message saying, "Hey, someone's interested in your lesson."

  5. Vlad will probably ignore the message, but if he responds by creating a record for his lesson, we stash that so that future inquiries will resolve and notify Jeanne that the record is now available. This workflow serves the triple purpose of attracting more authors, not spamming them, and ensuring that the record for a lesson comes from someone identified in the lesson's content (i.e., probably not a disgruntled former student).

How does an author update a lesson?

  1. They sign into their account, go to "My Lessons", and push the "update" button next to the lesson in question.

Note that we archive past Harper files but not the actual lesson content. (We're an index not a repository.)


Harper: Lesson Discovery and Aggregation




No releases published


No packages published