In [None]:
import pandas as pd

In [None]:
pd.options.display.max_columns = 50

# Pandas table rearrangements

For this exercise we'll use free data from the gapminder.org [repository](https://github.com/open-numbers/ddf--gapminder--systema_globalis), published under the CC-BY license.

Let's load two tables with information about the world's countries directly from GitHub:

In [None]:
countries = pd.read_csv(
    "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis"
    "/master/ddf--entities--geo--country.csv"
)

In [None]:
population = pd.read_csv(
    "https://raw.githubusercontent.com/open-numbers/ddf--gapminder--systema_globalis"
    "/master/countries-etc-datapoints/ddf--datapoints--population_total--by--geo--time.csv"    
)

## 1. Long form vs wide form tables

The same data can be stored in tables of different shapes, and pandas is no exception.

Depending on the task, you might need a particular form of the table.

Here are 10 random rows from the population table:

In [None]:
population.sample(10)

Each country has many rows, one per year, and the country's code is repeated on every one of them. This is the `long form` of this data.

The `wide form` of this data would not repeat the country code.

Instead, it would have a column for each year.

To convert from `long` to `wide` form, use the `pivot` dataframe method:

In [None]:
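# naming `values` keeps the columns a flat index of years instead of a two-level one
wide_form = population.pivot(index="geo", columns="time", values="population_total")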
wide_form
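To see what `pivot` does without downloading anything, here is a minimal sketch on a tiny made-up table (the codes `aaa`/`bbb` and the numbers are invented, not gapminder data). It also shows why naming `values` matters: without it, pandas builds a two-level column index (value column, year). Note that `pivot` expects each (index, column) pair to appear only once; duplicates raise a `ValueError`.

```python
import pandas as pd

# Hypothetical long-form data: one row per (country code, year) pair.
long_toy = pd.DataFrame({
    "geo": ["aaa", "aaa", "bbb", "bbb"],
    "time": [2000, 2001, 2000, 2001],
    "population_total": [10, 11, 20, 22],
})

# Long -> wide: one row per country, one column per year.
wide_toy = long_toy.pivot(index="geo", columns="time", values="population_total")
print(wide_toy)

# Without `values`, the remaining column name becomes an extra column level.
nested = long_toy.pivot(index="geo", columns="time")
print(nested.columns.nlevels)  # 2
```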

## 2. Index manipulations

In a pandas table, rows and columns have names, stored in the `index` and `columns` attributes respectively.

They can be changed, which is useful in a number of situations.
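A minimal sketch on a made-up two-row table, showing where pandas keeps the row and column names:

```python
import pandas as pd

# Hypothetical table: row names are country codes, column names are attributes.
df = pd.DataFrame(
    {"income": ["high", "low"], "region": ["europe", "africa"]},
    index=["aaa", "bbb"],
)
print(df.index.tolist())    # ['aaa', 'bbb']
print(df.columns.tolist())  # ['income', 'region']
```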

### 1. Move index (row names) inside the table

Sometimes you need the row names of your table as a regular column so you can operate on them. To do this, use the `reset_index` method of the dataframe. It returns the modified dataframe as a new object and leaves the original unchanged.

In [None]:
wide_form = wide_form.reset_index()
wide_form
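Here is a minimal sketch of `reset_index` on a small made-up wide table (codes and numbers are invented). The index named `geo` becomes an ordinary column, and the rows get a fresh `0..n-1` integer index:

```python
import pandas as pd

# Hypothetical wide table indexed by country code.
wide_toy = pd.DataFrame(
    {2000: [10, 20], 2001: [11, 22]},
    index=pd.Index(["aaa", "bbb"], name="geo"),
)

# The old index becomes a "geo" column; a new integer index is created.
flat = wide_toy.reset_index()
print(flat)

# The original DataFrame is not modified.
print(wide_toy.index.tolist())  # ['aaa', 'bbb']
```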

### 2. Set index (row names) to a specific column

To do that, use the `set_index` method; it also returns the modified dataframe.

In [None]:
countries_by_name = countries.set_index("country")
countries_by_name
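A minimal sketch on a made-up stand-in for the countries table: after `set_index`, rows can be looked up by their label with `.loc`.

```python
import pandas as pd

# Hypothetical miniature version of the countries table.
toy = pd.DataFrame({
    "country": ["aaa", "bbb"],
    "name": ["Country A", "Country B"],
})

# Index the rows by country code, then look one up by label.
by_code = toy.set_index("country")
print(by_code.loc["bbb", "name"])  # Country B
```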

### 3. Directly assigning index or columns

Notice that the `set_index` method does something extra: by default (`drop=True`) it removes the column it uses as the index from the table's columns. To avoid that, we can always assign directly to the dataframe's `index` or `columns`:

In [None]:
countries.index = countries.country
countries

In [None]:
countries_core_info = countries.loc[:, ["income_3groups", "world_4region", "name", "un_state"]]
countries_core_info.columns = ["income", "region", "name", "un_state"]
countries_core_info
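The difference between the two approaches is easy to see on a tiny made-up table (the codes and regions are invented). The sketch also shows that assigning `.columns` renames positionally, so the new list needs exactly one name per existing column; a list of the wrong length raises a `ValueError`.

```python
import pandas as pd

# Hypothetical miniature countries table.
toy = pd.DataFrame({
    "country": ["aaa", "bbb"],
    "world_4region": ["europe", "africa"],
})

# set_index moves "country" into the index and drops it from the columns.
via_set_index = toy.set_index("country")
print(via_set_index.columns.tolist())  # ['world_4region']

# Direct assignment uses the column as the index but keeps the column.
direct = toy.copy()
direct.index = direct["country"]
print(direct.columns.tolist())  # ['country', 'world_4region']

# Assigning .columns renames every column by position.
direct.columns = ["code", "region"]
print(direct.columns.tolist())  # ['code', 'region']
```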

### 4. Setting index to easily query data

One common reason to change a dataframe's index is to make querying easier.

Because pandas can return a subset of a table selected by index labels (as we've seen with the `loc` indexer), it is easy to align two tables that share a column of values. Here we look up country information for every row of `wide_form` using its `geo` codes:

In [None]:
countries_core_info.loc[wide_form.geo]
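The same idea on made-up data: passing a column of labels to `.loc` returns one row of the lookup table per label, in the same order and with repeats. Every label must exist in the index; since pandas 1.0 a missing label raises a `KeyError`. The table and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical lookup table indexed by country code.
info = pd.DataFrame(
    {"region": ["europe", "africa", "asia"]},
    index=["aaa", "bbb", "ccc"],
)

# Hypothetical data table referring to countries by code, out of order.
data = pd.DataFrame({"geo": ["ccc", "aaa", "ccc"], "value": [1, 2, 3]})

# One row of `info` per code in `data`, in the same order.
matched = info.loc[data["geo"]]
print(matched["region"].tolist())  # ['asia', 'europe', 'asia']

# .to_numpy() drops the code index so the values are attached row by row.
data["region"] = matched["region"].to_numpy()
print(data)
```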

## 3. Back to long form

The reverse of `pivot` is `melt`. I don't know a good mnemonic for remembering which is which.

We can use it to rearrange our table back into long form:

In [None]:
long_form = wide_form.melt(id_vars="geo")
long_form
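A minimal round-trip sketch on made-up data: pivot to wide form, move `geo` back into a column, then `melt` it. Every column not listed in `id_vars` is turned into rows of (`var_name`, `value_name`) pairs; the names `time` and `population_total` are passed explicitly here so the result matches the original table's columns.

```python
import pandas as pd

# Hypothetical long-form data, one row per (country code, year).
long_toy = pd.DataFrame({
    "geo": ["aaa", "aaa", "bbb", "bbb"],
    "time": [2000, 2001, 2000, 2001],
    "population_total": [10, 11, 20, 22],
})

# Long -> wide, with "geo" moved back into a regular column.
wide_toy = long_toy.pivot(
    index="geo", columns="time", values="population_total"
).reset_index()

# Wide -> long again.
back = wide_toy.melt(id_vars="geo", var_name="time", value_name="population_total")

# melt stacks column by column, so sort to restore the original row order.
back = back.sort_values(["geo", "time"]).reset_index(drop=True)
print(back)
```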

## 4. Classwork

1. Pick your favourite region
2. Create a table with years as the index and the total population of all countries in this region as the only column