This is a lesson on tidying data: specifically, what to do when a conceptual variable is spread across multiple data frames and across two or more variables within a data frame.

Data used: words spoken by characters of different races and genders in the Lord of the Rings movie trilogy.

  • 01-intro shows untidy and tidy data. Then we demonstrate how tidy data is more useful for analysis and visualization. Includes references, resources, and exercises.
  • 02-gather shows how to tidy data, using gather() from the tidyr package. Includes references, resources, and exercises.
  • 03-spread shows how to untidy data, using spread() from the tidyr package. This might be useful at the end of an analysis, for preparing figures or tables.
  • 04-tidy-bonus-content is not part of the lesson but may be useful as learners try to apply the principles of tidy data in more general settings. Includes links to packages used. It is out of date!
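
As a rough preview of the round trip covered in 02-gather and 03-spread, here is a minimal sketch. The data frame below is a made-up toy, not the lesson's actual data; the column names (Film, Race, Female, Male, Gender, Words) and all values are assumptions chosen purely for illustration.

```r
library(tidyr)

# Toy, made-up word counts (NOT the real lesson data).
# Untidy: the conceptual variable "Words" is split across two columns.
untidy <- data.frame(
  Film   = c("Film A", "Film A", "Film B", "Film B"),
  Race   = c("Elf", "Hobbit", "Elf", "Hobbit"),
  Female = c(10, 20, 30, 40),
  Male   = c(50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# gather(): stack the Female and Male columns into key/value pairs,
# giving one row per Film x Race x Gender combination.
tidy <- gather(untidy, key = "Gender", value = "Words", Female, Male)

# spread(): the inverse step, turning each Gender back into its own column.
wide <- spread(tidy, key = Gender, value = Words)

print(tidy)
print(wide)
```

In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(), though they remain available; the sketch sticks with the functions this lesson uses.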
