# Column headers are values, not variable names

*This is one of the more common data manipulations to get to a tidy form!*

For example, many government data sets are in a format which is **good for visual lookup, but not for analysis and exploration**, with measurements in various years spread across multiple columns.

### A toy example – years as column headers

Let's define a simple, small DataFrame with that structure:

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({'state':['Maine','Alaska','Ohio'],
                  '2009':[1,2,3],
                  '2010':[4,5,6],
                  '2011':[7,8,9]})
df

| | state | 2009 | 2010 | 2011 |
|---|---|---|---|---|
| 0 | Maine | 1 | 4 | 7 |
| 1 | Alaska | 2 | 5 | 8 |
| 2 | Ohio | 3 | 6 | 9 |


## Transforming from *wide* into *tall* format

The problems with this format are:

- **The column headers are really values of a dimension, which should have its own *Year* column.**
- **The values spread across multiple rows and columns in the body of the table are a single measure, which should be in a single column.**

Confusingly, every tool seems to have its own term for this transformation:

- **In Pandas you do a "melt"**
- In the R `tidyr` package this is a "gather" (newer versions of `tidyr` supersede `gather()` with `pivot_longer()`)
- In OpenRefine it's a "Transpose->Transpose cells across columns into rows..." operation
- In Tableau this is called a "Pivot"
- Many call this process "un-pivoting", since a *Pivot Table* in Excel converts data in the opposite direction, from the *tall* format into *wide*. 
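To make the "un-pivoting" name concrete, here is a minimal sketch of the opposite direction, from *tall* to *wide*, using `DataFrame.pivot`. The `tall` table and its `year` and `number` column names are made up for illustration.

```python
import pandas as pd

# A small tall table: one row per (state, year) observation.
# Illustrative data only; the column names are assumptions.
tall = pd.DataFrame({
    'state': ['Maine', 'Alaska', 'Maine', 'Alaska'],
    'year': ['2009', '2009', '2010', '2010'],
    'number': [1, 2, 4, 5],
})

# Pivot: each distinct year becomes its own column (tall -> wide).
wide = tall.pivot(index='state', columns='year', values='number')
print(wide)
```

The result has one row per state and one column per year, which is the shape the toy DataFrame above started in. Melting is the step that undoes it.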


### Minimally, you need to specify 

- the DataFrame to "melt"
- a list of the columns that don't get "un-pivoted" – their values get repeated on every row generated from the same original row.

*Notice that the column headers by default end up in a column called "variable", and the table body values end up in a column called "value".*

In [3]:
df2 = pd.melt(df, ['state'])
df2

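| | state | variable | value |
|---|---|---|---|
| 0 | Maine | 2009 | 1 |
| 1 | Alaska | 2009 | 2 |
| 2 | Ohio | 2009 | 3 |
| 3 | Maine | 2010 | 4 |
| 4 | Alaska | 2010 | 5 |
| 5 | Ohio | 2010 | 6 |
| 6 | Maine | 2011 | 7 |
| 7 | Alaska | 2011 | 8 |
| 8 | Ohio | 2011 | 9 |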
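The same operation is also available as a DataFrame method. As a quick sketch (the names `via_function` and `via_method` are just illustrative), the two spellings can be checked against each other:

```python
import pandas as pd

df = pd.DataFrame({'state': ['Maine', 'Alaska', 'Ohio'],
                   '2009': [1, 2, 3],
                   '2010': [4, 5, 6],
                   '2011': [7, 8, 9]})

# Function form, as used above: the second positional argument is id_vars.
via_function = pd.melt(df, ['state'])

# Method form with the argument named explicitly.
via_method = df.melt(id_vars=['state'])

# Both spellings should produce the same table with the default column names.
print(via_function.equals(via_method))
print(list(via_method.columns))
```

Which form to use is mostly a matter of taste; the method form tends to read better inside a chain of other DataFrame methods.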


## More complete `pd.melt()` statement

More fully, you can explicitly specify the

- list of columns that don't get melted (and, thus, will get repeated): `id_vars=`
- list of columns that get melted from columns into rows: `value_vars=`
- name you want for the column that used to be column headers: `var_name=`
- name you want for the column that used to be the table body values: `value_name=`


In [4]:
df2 = pd.melt(df, id_vars=['state'], 
              value_vars=['2009','2010','2011'], 
              var_name='year', value_name='number')
df2

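| | state | year | number |
|---|---|---|---|
| 0 | Maine | 2009 | 1 |
| 1 | Alaska | 2009 | 2 |
| 2 | Ohio | 2009 | 3 |
| 3 | Maine | 2010 | 4 |
| 4 | Alaska | 2010 | 5 |
| 5 | Ohio | 2010 | 6 |
| 6 | Maine | 2011 | 7 |
| 7 | Alaska | 2011 | 8 |
| 8 | Ohio | 2011 | 9 |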
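One thing to watch for after melting: the year values came from column headers, so in this toy example they are still strings. The sketch below is one possible follow-up, not part of the original notebook. It converts them to integers and then filters and aggregates by year, the kind of analysis the tall format makes easy. The names `recent` and `totals` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'state': ['Maine', 'Alaska', 'Ohio'],
                   '2009': [1, 2, 3],
                   '2010': [4, 5, 6],
                   '2011': [7, 8, 9]})

df2 = pd.melt(df, id_vars=['state'],
              value_vars=['2009', '2010', '2011'],
              var_name='year', value_name='number')

# The year labels were column headers (strings), so convert them to integers.
df2['year'] = df2['year'].astype(int)

# With year as an ordinary numeric column, filtering and grouping
# work the same way no matter how many years the data covers.
recent = df2[df2['year'] >= 2010]
totals = df2.groupby('year')['number'].sum()
print(totals)
```

In the wide format, the same per-year totals would require naming each year column explicitly, and the code would need to change every time a new year was added.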
