# Data Transformations with Pandas in Python - Transforming Data

Welcome to the notebook about transforming data. In this notebook we will look at two ways of transforming your data:
1. **Melting** - transforming your _wide_ table to a _long_ table
2. **Pivoting** - transforming your _long_ table to a _wide_ table

Good luck!

## 1. Melting: from wide to long table

We will work with the CSV file `meltData.csv`, which resembles the EMI station data files but is much smaller and contains random data, so that it is easier to see what happens. Run the cell below to load and check the data.

In [None]:
import pandas as pd
meltData = pd.read_csv('meltData.csv')
meltData

You can use `.melt()` if you want to bring certain _values_ from different columns into one column, without losing track of the _identity_ of those values (e.g., the _value_ `rainfall`, with _identities_ like `station`, `year`, `month`, etc.). 

To use `.melt()`, you need to know which columns in your dataframe contain _values_ and which columns contain information about the _identity_ of those values.

In our case, the columns with _values_ are `['10', '20', '30']`, and the _identifier_ columns are `['Year', 'Month']`.

In [None]:
id_columns = ['Year', 'Month']
value_columns = ['10', '20', '30']

Once we know the `id_columns` and `value_columns`, using the function `.melt()` is very easy. Use it on the dataframe you want to _melt_, and inside the function `.melt()` supply the `id_columns` to the argument `id_vars=`, and the `value_columns` to the argument `value_vars=`.

In [None]:
melted = meltData.melt(id_vars=id_columns, value_vars=value_columns)
melted
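To see what `.melt()` does without the CSV file at hand, here is a minimal sketch on a small hand-made dataframe. The dataframe `wide_df` and its numbers are made up for illustration and are not the contents of `meltData.csv`; only the column names follow the example above.

```python
import pandas as pd

# Hypothetical wide table: one row per (Year, Month), one column per day.
wide_df = pd.DataFrame({
    'Year': [2020, 2020],
    'Month': [1, 2],
    '10': [0.0, 1.5],
    '20': [3.2, 0.0],
    '30': [0.4, 2.1],
})

long_df = wide_df.melt(id_vars=['Year', 'Month'], value_vars=['10', '20', '30'])

# Each original row yields one row per value column: 2 rows x 3 value columns = 6 rows.
print(long_df.shape)
# The id columns are kept; the old headers go into 'variable', the numbers into 'value'.
print(list(long_df.columns))
long_df
```

The melted table stacks the value columns one after another: first all rows for `'10'`, then all rows for `'20'`, then all rows for `'30'`.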

If every column is either an id column or a value column, you can specify only `id_vars`; `.melt()` then treats all remaining columns as `value_vars`.

In [None]:
melted = meltData.melt(id_vars=id_columns)
melted

You can also specify the variable name and the value name. These become the column names, replacing the defaults `variable` and `value`.

In [None]:
melted = meltData.melt(id_vars=id_columns, var_name='Day', value_name='Rainfall')
melted
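One thing to keep in mind: the entries in the new `Day` column come from the old column headers, so they are strings (`'10'`, `'20'`, `'30'`) rather than numbers. The sketch below shows one possible way to convert and sort them. It uses a made-up one-row dataframe `wide_toy` (an assumption for illustration, not the real `meltData`).

```python
import pandas as pd

# Hypothetical one-row wide table with the same column layout as the example above.
wide_toy = pd.DataFrame({'Year': [2020], 'Month': [1],
                         '10': [0.0], '20': [3.2], '30': [0.4]})

melted_toy = wide_toy.melt(id_vars=['Year', 'Month'],
                           var_name='Day', value_name='Rainfall')

# The former column headers are strings; convert them to integers
# so that sorting and comparisons behave numerically.
melted_toy['Day'] = melted_toy['Day'].astype(int)

# Sort chronologically and renumber the rows.
melted_toy = melted_toy.sort_values(['Year', 'Month', 'Day']).reset_index(drop=True)
melted_toy
```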

**Exercise**: Use the function `.melt()` on the dataframe `emiMelt`.

Steps:
- Run the cell below to load the data from `emiMelt.xlsx` into the dataframe `emiMelt`.
- Check carefully which columns have the actual values (so, which are the `value_vars`), and which columns contain information about the _identity_ of those values (the `id_vars`).
- Use `emiMelt.melt()`, and inside the `.melt()` function supply the column names to the arguments `id_vars=` and `value_vars=`.
- Think about a possible use of `var_name=` and `value_name=`.

In [None]:
# Loading the data needed for the exercise
emiMelt = pd.read_excel('emiMelt.xlsx')
emiMelt

In [None]:
# Write your code for melting emiMelt here.

## 2. Pivoting: from long to wide table

We will work with the CSV file `pivotData.csv`, which looks very much like _melted_ station data (in other words, it is a _long_ table instead of a _wide_ table). It is much smaller than regular EMI station data and contains random data, so that it is easier to see what happens. Run the cell below to load and check the data.

In [None]:
pivotData = pd.read_csv('pivotData.csv')
pivotData

You can use the method `.pivot()` for creating different columns based on one column. For example, if you have a very _long_ table with data for different stations below each other, but you instead want to get a separate column per station, `.pivot()` is a perfect method.

To use `.pivot()`, you need to know what you eventually want as **index**, as **columns** and as **values**. In the case of `pivotData`, it would be nice to have the individual stations as separate **columns**, with rainfall **values**, and the `Year` and `Month` data as **index**.

In [None]:
pivoted = pivotData.pivot(index=['Year', 'Month'], columns='station', values='rainfall')
pivoted
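The same call can be tried on a small hand-made table. This is a minimal sketch: the dataframe `long_toy`, the station names `A` and `B`, and the numbers are made up for illustration and are not the contents of `pivotData.csv`. Passing a list to `index=` requires pandas 1.1.0 or newer.

```python
import pandas as pd

# Hypothetical long table: one row per (Year, Month, station) combination.
long_toy = pd.DataFrame({
    'Year': [2020, 2020, 2020, 2020],
    'Month': [1, 1, 2, 2],
    'station': ['A', 'B', 'A', 'B'],
    'rainfall': [0.0, 1.5, 3.2, 0.4],
})

# One column per station, one row per (Year, Month).
wide_toy = long_toy.pivot(index=['Year', 'Month'], columns='station', values='rainfall')
print(wide_toy.shape)

# Year and Month now form the index; reset_index() turns them back into ordinary columns.
flat_toy = wide_toy.reset_index()
print(list(flat_toy.columns))
flat_toy
```

Note that `.pivot()` raises a `ValueError` if the same combination of index and column values occurs more than once, so each (`Year`, `Month`, `station`) combination must appear only once in the long table.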

**Exercise**: Use the function `.pivot()` on the dataframe `emiPivot` to get one column per station, indexed by year, month and day.

Steps:
- Load the CSV file `emiPivot.csv` by running the cell below.
- Think carefully about the following three things:
    - What to set as `index=`?
    - What to set as `columns=`?
    - What to set as `values=`?
- Use `emiPivot.pivot()` and supply the right information to the arguments `index=`, `columns=` and `values=`.

In [None]:
emiPivot = pd.read_csv('emiPivot.csv')
emiPivot

In [None]:
# Write your code for pivoting emiPivot here.

**Exercise**: Use the function `.pivot()` on the dataframe `emiPivot` to get one column per year, for the station `Gondar A.P.` only.

Steps:
- Make a new dataframe that contains only the rows where `emiPivot.NAME == 'Gondar A.P.'`.
- Pivot that new dataframe, with `Month` and `day` as index, `Year` as columns, and `rainfall` as values.

In [None]:
# Write your code for pivoting emiPivot to columns per year for only Gondar A.P.