## Melts

The `pd.melt()` function and `DataFrame.melt()` method take a single dataframe and make it **taller** by taking data stored in column names and putting it into the rows along with the rest of the data, adding extra metadata in the process.

For example, it can turn this `df` DataFrame:

| Month | Year | Monday | Tuesday | Wednesday |
| :--:  | :--: | :--:   | :--:    | :--:      |
| January | 2021 | 0 | -2 | -1 |
| February | 2021 | 2 | 4 | -2 |

into this:

| Month | Year | Weekday | Temperature |
| :--:  | :--: | :--:    |  :--:       |
| January | 2021 | Monday | 0 |
| January | 2021 | Tuesday | -2 |
| January | 2021 | Wednesday | -1 |
| February | 2021 | Monday | 2 |
| February | 2021 | Tuesday | 4 | 
| February | 2021 | Wednesday | -2 |

with one line of code:

```python
pd.melt(
    df, 
    id_vars=['Month', 'Year'],  # Columns to keep as they are (repeated on every new row)
    value_vars=['Monday', 'Tuesday', 'Wednesday'],  # Columns to melt into rows
    var_name='Weekday',  # Name of the new column that holds the melted column names
    value_name='Temperature'  # Name of the new column that holds the melted values
)
```
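To try this call end to end, here is a minimal, self-contained sketch. It assumes `df` is built by hand from the wide example table (the text does not show how `df` was created), and the variable name `tidy` is only illustrative.

```python
import pandas as pd

# Assumption: df is typed in by hand to match the wide example table.
df = pd.DataFrame({
    'Month': ['January', 'February'],
    'Year': [2021, 2021],
    'Monday': [0, 2],
    'Tuesday': [-2, 4],
    'Wednesday': [-1, -2],
})

# 'tidy' is an illustrative name for the long-format result.
tidy = pd.melt(
    df,
    id_vars=['Month', 'Year'],
    value_vars=['Monday', 'Tuesday', 'Wednesday'],
    var_name='Weekday',
    value_name='Temperature',
)

print(tidy)
```

Note that `pd.melt()` stacks the melted columns one after another, so the rows come out grouped by weekday rather than by month. The values match the long table above; only the row order differs.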

**Note**: Melting a dataframe is also called *"tidying"* data, making a *"long"* dataframe from a *"wide"* dataframe, or building a *design matrix*.

#### Exercises

Let's practice tidying dataframes with the `pd.melt()` function. 

```python
import pandas as pd

df = (
    pd.read_csv('https://raw.githubusercontent.com/nickdelgrosso/CodeTeachingMaterials/main/datasets/worldbankdata.csv')
    .get(['Country Name', 'Country Code', '1960', '1970', '1980', '1990', '2000'])  # Keep only these columns
    .sample(10)  # Pick 10 random rows
    .reset_index(drop=True)
)
df.head()
```

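|   | Country Name | Country Code | 1960 | 1970 | 1980 | 1990 | 2000 |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| 0 | Congo, Rep. | COG | 5.88 | 6.259 | 6.178 | 5.347 | 5.134 |
| 1 | Sint Maarten (Dutch part) | SXM | NaN | NaN | NaN | NaN | NaN |
| 2 | Switzerland | CHE | 2.336 | 2.087 | 1.55 | 1.59 | 1.5 |
| 3 | Belarus | BLR | 2.67 | 2.31 | 2.03 | 1.91 | 1.31 |
| 4 | Equatorial Guinea | GNQ | 5.505 | 5.678 | 5.728 | 5.9 | 5.773 |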


Melt this dataset so that it has four columns: "Country Name", "Country Code", "Year", and "Fertility Rate".

## Regularizing, Splitting Text Data

Often, a single string holds several pieces of data, joined by a separator character. The `Series.str.split()` method splits such strings apart. With it, you can turn a DataFrame from this:

| line |
| :--: |
| hi_1 |
| bye_2|

into this:

| line | msg | num |
| :--: | :--: | :--: |
| hi_1 | hi | 1 |
| bye_2| bye | 2 |

using a single line:

```python
df[['msg', 'num']] = df['line'].str.split('_', expand=True)
```
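As a runnable version of this one-liner, here is a minimal sketch that assumes `df` is built by hand from the small example table. The final `astype(int)` step is an addition, not part of the original line: the split pieces are strings, so a conversion is needed if `num` should hold numbers.

```python
import pandas as pd

# Assumption: df is typed in by hand to match the example table.
df = pd.DataFrame({'line': ['hi_1', 'bye_2']})

# expand=True returns a DataFrame with one column per piece,
# which can be assigned to two new columns at once.
df[['msg', 'num']] = df['line'].str.split('_', expand=True)

# The pieces are still strings; convert explicitly if numbers are needed.
df['num'] = df['num'].astype(int)

print(df)
```

Without `expand=True`, `str.split()` returns a single column of Python lists instead of separate columns.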


Let's try it out!

```python
import pandas as pd

df = pd.DataFrame({
    'counts_XADD': ["1;3;5", "10;2;6"],
    'intensities_JJAKX': ['5_32_654', "10_1_99"],
})
df
```

|   | counts_XADD | intensities_JJAKX |
| :--: | :--: | :--: |
| 0 | 1;3;5 | 5_32_654 |
| 1 | 10;2;6 | 10_1_99 |


Rename the columns to keep only the part of each name before the underscore.

Split the Counts into Counts_1, Counts_2, and Counts_3.

Split the Intensities into Intensities_1, Intensities_2, and Intensities_3.