# Recasting data

Sometimes long data needs to be wide, and sometimes wide data needs to be long. I'll explain.

You will soon discover that long before you can visualize data, you need to get it into a form that the visualization library can deal with. One requirement that isn't immediately obvious is how your data is cast. Most of the data you will encounter will be wide -- each row represents a single entity, with multiple measures for that entity in separate columns. Think of states: a dataset could have one row per state, with columns for population, average life expectancy and other demographic data.

But what if your visualization library needs one row for each measure? That's where recasting your data comes in. We can use a library called `reshape2` to `melt` or `cast` the data, depending on what we need.
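To make the difference concrete, here is a minimal sketch in base R that builds the long version by hand, without any library. The numbers are two counties from the registered voters data used in the rest of this section; the names `wide` and `long` are just ones I picked for illustration.

```r
# Wide: one row per county, one column per measure.
# Values are the Adams and Antelope rows from the registered voters data.
wide <- data.frame(
  County = c("Adams", "Antelope"),
  Republican10 = c(10018, 3005),
  Democrat10 = c(5536, 1147)
)

# Long: one row per county per measure, built by hand for illustration.
long <- data.frame(
  County = rep(wide$County, times = 2),
  variable = rep(c("Republican10", "Democrat10"), each = nrow(wide)),
  value = c(wide$Republican10, wide$Democrat10)
)

dim(wide)  # 2 rows, 3 columns
dim(long)  # 4 rows, 3 columns
long
```

Same numbers, different shape: the measure names moved out of the column headers and into a column of their own.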

So let's transform a dataset we've already used -- registered voters in Nebraska -- from wide data to long data and back again. First, we'll import the library and then open the data. 

In [1]:
library(reshape2)

In [2]:
voters <- read.csv("../../Data/registeredvoters.csv")

In [3]:
head(voters)

County,Republican10,Democrat10,Libertarian10,Nonpartisan10,Total10,Republican16,Democrat16,Nonpartisan16,Libertarian16,Total16
Adams,10018,5536,6,2972,18532,10746,5027,3591,163,19527
Antelope,3005,1147,0,538,4690,3088,863,594,12,4557
Arthur,284,52,0,10,346,286,37,15,3,341
Banner,424,53,0,53,530,427,38,73,7,545
Blaine,314,56,0,24,394,310,43,29,2,384
Boone,2390,1156,0,408,3954,2469,901,404,11,3785


Making data long is, in most cases, very easy. We're going to create a new data frame called `longvoters` and `melt` our voters data into it. Then we'll run `head`, and you'll see each measure gets its own row -- so each county has 10 rows of data.

In [4]:
longvoters <- melt(voters)
head(longvoters)

Using County as id variables


County,variable,value
Adams,Republican10,10018
Antelope,Republican10,3005
Arthur,Republican10,284
Banner,Republican10,424
Blaine,Republican10,314
Boone,Republican10,2390
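The "Using County as id variables" message means `melt` guessed which column was the identifier. If you'd rather not rely on the guess, you can spell it out. This is a sketch, not the only way to do it: it uses a three-county data frame typed in from the output above instead of the CSV, and `sample_voters`, `party_year` and `registered` are names I made up for the example.

```r
library(reshape2)

# Three counties and two measures copied from the voters data,
# so the example runs without the CSV file.
sample_voters <- data.frame(
  County = c("Adams", "Antelope", "Arthur"),
  Republican10 = c(10018, 3005, 284),
  Democrat10 = c(5536, 1147, 52),
  stringsAsFactors = FALSE
)

# id.vars names the identifier column explicitly;
# variable.name and value.name rename the two new columns.
long_sample <- melt(
  sample_voters,
  id.vars = "County",
  variable.name = "party_year",
  value.name = "registered"
)

head(long_sample)
```

Naming the identifier yourself also silences the message, which is handy when a dataset has more than one non-numeric column and the guess might not be the one you want.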


Then we can put it back together again by casting it with `dcast`. With `dcast`, we give it a formula: the left side is our main identifier, which is County, and the right side is the column whose values become the new column names, which is `variable`. And you'll see that we've changed the data twice, but it looks identical to the original dataset.

In [9]:
widevoters <- dcast(longvoters, County ~ variable)
head(widevoters)

County,Republican10,Democrat10,Libertarian10,Nonpartisan10,Total10,Republican16,Democrat16,Nonpartisan16,Libertarian16,Total16
Adams,10018,5536,6,2972,18532,10746,5027,3591,163,19527
Antelope,3005,1147,0,538,4690,3088,863,594,12,4557
Arthur,284,52,0,10,346,286,37,15,3,341
Banner,424,53,0,53,530,427,38,73,7,545
Blaine,314,56,0,24,394,310,43,29,2,384
Boone,2390,1156,0,408,3954,2469,901,404,11,3785
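Rather than eyeballing the two tables, you can check the round trip in code. Here is a small sketch under the same assumption as before: a three-county data frame typed in from the output above stands in for the full CSV, and `sample_voters`, `long_sample` and `wide_sample` are illustrative names. It also passes `value.var` to `dcast` so the column holding the numbers is named explicitly instead of guessed.

```r
library(reshape2)

# Three counties and two measures copied from the voters data.
sample_voters <- data.frame(
  County = c("Adams", "Antelope", "Arthur"),
  Republican10 = c(10018, 3005, 284),
  Democrat10 = c(5536, 1147, 52),
  stringsAsFactors = FALSE
)

# Wide to long, then long back to wide.
long_sample <- melt(sample_voters, id.vars = "County")
wide_sample <- dcast(long_sample, County ~ variable, value.var = "value")

# Compare column names and values with the starting data frame.
identical(names(wide_sample), names(sample_voters))
all(wide_sample$Republican10 == sample_voters$Republican10)
all(wide_sample$Democrat10 == sample_voters$Democrat10)
```

One thing to keep in mind: `dcast` returns rows sorted by the identifier, so a dataset that wasn't already in alphabetical order by County would come back with its rows rearranged even though the values are the same.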


## Assignment

Melt the population estimates data from assignment 3. 

#### Rubric

1. Did you import the data correctly?
2. Did you apply melt correctly?
3. Did you explain your steps using Markdown comments?