In [22]:
import pandas as pd

## Reorganizing Data into a "Tidy" / "Long" / "Tabular" / "Design" DataFrame
### `pd.melt()`

The `pd.melt()` function and `DataFrame.melt()` method take a single dataframe and make it **taller**: the names of the selected columns move into one new column, their values move into another, and the identifier columns are repeated for every new row.

For example, it can turn this `df` DataFrame:

| Month | Year | Monday | Tuesday | Wednesday |
| :--:  | :--: | :--:   | :--:    | :--:      |
| January | 2021 | 0 | -2 | -1 |
| February | 2021 | 2 | 4 | -2 |

into this:

| Month | Year | Weekday | Temperature |
| :--:  | :--: | :--:    |  :--:       |
| January | 2021 | Monday | 0 |
| January | 2021 | Tuesday | -2 |
| January | 2021 | Wednesday | -1 |
| February | 2021 | Monday | 2 |
| February | 2021 | Tuesday | 4 | 
| February | 2021 | Wednesday | -2 |

with one line of code:

```python
pd.melt(
    df,
    id_vars=['Month', 'Year'],  # Columns kept as-is and repeated for each melted row
    value_vars=['Monday', 'Tuesday', 'Wednesday'],  # Columns to melt into rows
    var_name='Weekday',  # Name of the new column that holds the melted column names
    value_name='Temperature'  # Name of the new column that holds the melted values
)
```

**Note**: Melting a dataframe is also called *"tidying"* data, making a *"long"* dataframe from a *"wide"* dataframe, or building a *design matrix*.
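
To make the example above runnable end to end, here is a minimal sketch that rebuilds the small temperature table and melts it with the method form, `DataFrame.melt()`. The name `long_df` is only illustrative. `value_vars` is left out here, which (as far as pandas documents it) melts every column not listed in `id_vars`.

```python
import pandas as pd

# Rebuild the small "wide" table from the example above
df = pd.DataFrame({
    'Month': ['January', 'February'],
    'Year': [2021, 2021],
    'Monday': [0, 2],
    'Tuesday': [-2, 4],
    'Wednesday': [-1, -2],
})

# Method form of pd.melt(); with value_vars omitted, every column
# that is not in id_vars gets melted
long_df = df.melt(
    id_vars=['Month', 'Year'],
    var_name='Weekday',
    value_name='Temperature',
)
print(long_df)
```

The result holds the same six rows as the tidy table shown earlier, but in a different order: `melt()` stacks the melted columns one after another, so both Monday rows come first, then both Tuesday rows, then both Wednesday rows.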

#### Exercises

Let's practice tidying dataframes with the `pd.melt()` function.  

In [23]:
df = pd.DataFrame({
    'Attendee': ['Mark', 'Susan', 'June', 'Lingling'],
    'Group': ['MPI', 'LMU', 'LMU', 'MPI'],
    'Monday': [True, True, False, True],
    'Tuesday': [False, True, False, True],
    'Wednesday': [True, True, True, True],
})
df

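|   | Attendee | Group | Monday | Tuesday | Wednesday |
| :--: | :--: | :--: | :--: | :--: | :--: |
| 0 | Mark | MPI | True | False | True |
| 1 | Susan | LMU | True | True | True |
| 2 | June | LMU | False | False | True |
| 3 | Lingling | MPI | True | True | True |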


Melt this dataset into four columns: "Attendee", "Group", "DayOfWeek", and "Attended".

Melt the 1948 U.S. Unemployment dataset into three columns: Year, Month, and Unemployment.

In [19]:
from bokeh.sampledata import unemployment1948
data = unemployment1948.data[['Year', 'Jan', 'Feb', 'Mar']]
data.head()

|   | Year | Jan | Feb | Mar |
| :--: | :--: | :--: | :--: | :--: |
| 0 | 1948 | 4.0 | 4.7 | 4.5 |
| 1 | 1949 | 5.0 | 5.8 | 5.6 |
| 2 | 1950 | 7.6 | 7.9 | 7.1 |
| 3 | 1951 | 4.4 | 4.2 | 3.8 |
| 4 | 1952 | 3.7 | 3.8 | 3.3 |


Melt the Gapminder Fertility Rate dataset into three columns: Country, Year, and FertilityRate.

In [2]:
import bokeh
# bokeh.sampledata.download()
from bokeh.sampledata import gapminder
gapminder.fertility.head()

| Country | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | 1970 | 1971 | 1972 | 1973 | ... | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| Afghanistan | 7.671 | 7.671 | 7.671 | 7.671 | 7.671 | 7.671 | 7.671 | 7.671 | 7.671 | 7.671 | ... | 7.136 | 6.93 | 6.702 | 6.456 | 6.196 | 5.928 | 5.659 | 5.395 | 5.141 | 4.9 |
| Albania | 5.711 | 5.594 | 5.483 | 5.376 | 5.268 | 5.16 | 5.05 | 4.933 | 4.809 | 4.677 | ... | 2.004 | 1.919 | 1.849 | 1.796 | 1.761 | 1.744 | 1.741 | 1.748 | 1.76 | 1.771 |
| Algeria | 7.653 | 7.655 | 7.657 | 7.658 | 7.657 | 7.652 | 7.641 | 7.622 | 7.591 | 7.548 | ... | 2.448 | 2.507 | 2.58 | 2.656 | 2.725 | 2.781 | 2.817 | 2.829 | 2.82 | 2.795 |
| American Samoa | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Andorra | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
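
A hint for this exercise: here `Country` is the dataframe's index rather than a regular column, and `melt()` drops the index by default. The sketch below shows one possible way to handle that, using a two-row stand-in for `gapminder.fertility` with a few values copied from the output above. It assumes the year labels are strings, which may not match the real dataset.

```python
import pandas as pd

# Stand-in for gapminder.fertility: Country is the index, years are the columns.
# Assumption: the year column labels are strings.
fertility = pd.DataFrame(
    {'1964': [7.671, 5.711], '1965': [7.671, 5.594]},
    index=pd.Index(['Afghanistan', 'Albania'], name='Country'),
)

# melt() discards the index by default, so turn Country into a regular column first
tidy = fertility.reset_index().melt(
    id_vars=['Country'],
    var_name='Year',
    value_name='FertilityRate',
)

# Cast the melted labels so Year can be sorted and compared as a number
tidy['Year'] = tidy['Year'].astype(int)
print(tidy)
```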


### Data Merging Exercise: Full Data Analysis

Tidy all four datasets in the Gapminder data and merge them into a single tidy table. (Note: use multiple cells for this analysis. When you're done, save the result to a CSV file.)
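
Once two tables are tidy and share key columns, merging them can look like the sketch below. It is only an illustration of the pattern, not a solution: both frames are hypothetical stand-ins with placeholder values, the `LifeExpectancy` column name and the `gapminder_tidy.csv` filename are assumptions, and the real datasets may need different keys.

```python
import pandas as pd

# Two already-tidy stand-in tables; all values are placeholders, not real data
fertility = pd.DataFrame({
    'Country': ['A', 'A', 'B'],
    'Year': [2000, 2001, 2000],
    'FertilityRate': [2.0, 1.9, 3.1],
})
life_expectancy = pd.DataFrame({
    'Country': ['A', 'A', 'B'],
    'Year': [2000, 2001, 2001],
    'LifeExpectancy': [70.0, 70.5, 65.0],
})

# An outer merge keeps every Country/Year pair found in either table;
# pairs missing from one side get NaN in that side's value column
merged = pd.merge(fertility, life_expectancy, on=['Country', 'Year'], how='outer')

# index=False keeps the row index out of the saved file
merged.to_csv('gapminder_tidy.csv', index=False)
print(merged)
```

An outer merge is used here so that no observations are silently dropped; `how='inner'` would instead keep only the Country/Year pairs present in both tables.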