# Pandas Dataframes

© Explore Data Science Academy

## Instructions to Students
- **Do not add or remove cells in this notebook. Do not edit or remove the `### START FUNCTION` or `### END FUNCTION` comments. Do not add any code outside of the functions you are required to edit. Doing any of this will lead to a mark of 0%!**
- Answer the questions according to the specifications provided.
- Use the given cell in each question to see if your function matches the expected outputs.
- Do not hard-code answers to the questions.
- The use of Stack Overflow, Google, and other online tools is permitted. However, copying a fellow student's code is not permissible and is considered a breach of the Honour Code below. Doing this will result in a mark of 0%.
- Good luck, and may the force be with you!

## Honour Code

I **YOUR NAME, YOUR SURNAME**, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the <a href="https://drive.google.com/open?id=1FXCIf425JLRx3JQi-ltSWppj8BCF3Np1" target="_blank">EDSA Student Manifesto</a>.

Non-compliance with the honour code constitutes a material breach of contract.

### Import the required libraries

In [3]:
import numpy as np
import pandas as pd

### Data

You will need these dataframes in order to answer the following questions.

In [4]:
country_map_df = pd.read_csv('country_code_map.csv', index_col='Country Code')
population_df = pd.read_csv('world_population.csv', index_col='Country Code')
meta_df = pd.read_csv('metadata.csv', index_col='Country Code')

_**Dataframe specifications:**_

The dataframes provide information about the population of the world for various years. Some things to note:
* All dataframes use `Country Code` as the index, which is a three-letter code referring to a country.
* The `country_map_df` data maps the `Country Code` to a `Country Name`.
* The `population_df` data contains the population of each country for every year from 1960 to 2017.
* The `meta_df` data contains meta information about each country, including its geographical region, its income group, and a comment on the country as a whole.
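As a small sketch of what this layout implies, the snippet below reads a made-up two-row CSV the same way the data cell does. The CSV text and the name `toy_pop` are hypothetical stand-ins, not the real files. It illustrates that `read_csv` produces string column labels for the years, so a year passed as an `int` has to be converted with `str(year)` before it is used as a column label.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for world_population.csv; values are made up.
csv_text = "Country Code,1960,1961\nAAA,100,110\nBBB,200,210\n"
toy_pop = pd.read_csv(io.StringIO(csv_text), index_col="Country Code")

print(toy_pop.index.name)               # Country Code
print(list(toy_pop.columns))            # ['1960', '1961'] (strings, not ints)
print(toy_pop.loc["BBB", str(1961)])    # 210
```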

In [5]:
country_map_df.head()

In [6]:
population_df.head()

In [7]:
meta_df.head()

Using this information, answer the questions below:

### Question 1

Write a function that returns the total population of a given geographic region for a given year.

_**Function Specifications:**_
* Should take as input a year as an `int` and region as a `str`.
* Should return an `int` corresponding to the population.
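One possible approach, sketched on made-up data: join the region labels onto the population table, sum per region, and look up the requested year. The frames `toy_meta` and `toy_pop` and the function `toy_total_pop_in_region` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical miniature versions of meta_df and population_df (made-up values).
idx = pd.Index(["AAA", "BBB", "CCC"], name="Country Code")
toy_meta = pd.DataFrame({"Region": ["North", "North", "South"]}, index=idx)
toy_pop = pd.DataFrame(
    {"1960": [10.0, 20.0, 5.0], "1970": [15.0, 25.0, 8.0]}, index=idx
)

def toy_total_pop_in_region(year, region):
    # Both frames share the Country Code index, so join aligns rows by country.
    joined = toy_meta[["Region"]].join(toy_pop)
    # One row per region, one column per year.
    totals = joined.groupby("Region").sum()
    # Year columns are strings; cast the result to int as the specification asks.
    return int(totals.loc[region, str(year)])

print(toy_total_pop_in_region(1960, "North"))  # 30
print(toy_total_pop_in_region(1970, "South"))  # 8
```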

In [235]:
### START FUNCTION
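def total_pop_in_region(year, region):
    # Attach each country's region to its yearly population columns.
    df = meta_df[['Region']].join(population_df)
    # Sum the populations of all countries in each region.
    df = df.groupby('Region').sum()
    # Year columns are strings; the specification asks for an int.
    return int(df.loc[region, str(year)])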
### END FUNCTION

In [None]:
total_pop_in_region(1960,'East Asia & Pacific')

_**Expected Outputs:**_
```python
total_pop_in_region(1960,'East Asia & Pacific') == 1029332591.0
total_pop_in_region(1970,'South Asia') == 712740919.0
```

### Question 2

Write a function that returns the global yearly population `Growth`, grouped by the `Income Group` and `Year`.

_**Function Specifications:**_
* Should not take any inputs.
* The years are currently presented as the column headings of the population table. The table will have to be melted to produce the appropriate format. You can use `df.melt` to do this, where the variable name should be `Year` and the value name should be `Growth`.
* Should group by the `Year` and `Income Group`.
* Should only have one column named `Growth`.
* The `Income Group` and the `Year` should be indices.
* The `Growth` is calculated by taking the year-on-year difference in total population, dividing it by the total population for that year, and multiplying by 100.
* Should return a `DataFrame`.
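The reshaping and growth steps can be sketched on a tiny made-up table. The frame `wide` and its values are hypothetical, and the sketch assumes the denominator is the current year's total rather than the previous year's, which is one reading of the specification.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year (made-up values).
wide = pd.DataFrame(
    {
        "Income Group": ["High income", "High income", "Low income"],
        "1960": [100.0, 100.0, 50.0],
        "1961": [110.0, 140.0, 100.0],
    }
)

# Wide to long: each (country, year) pair becomes one row.
long_df = wide.melt(id_vars=["Income Group"], var_name="Year", value_name="Growth")

# Total population per income group and year; groupby sorts both index levels.
totals = long_df.groupby(["Income Group", "Year"]).sum()

# Yearly difference within each group, divided by that year's total, times 100.
change = totals.groupby(level="Income Group")["Growth"].diff()
totals["Growth"] = change / totals["Growth"] * 100
print(totals)
```

In this toy table the first year of each group has no previous year to difference against, so its `Growth` is `NaN`, which matches the `NaN` shown for 1960 in the expected output.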

In [239]:
### START FUNCTION
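def population_difference_by_income():
    # Attach each country's income group to its yearly population columns.
    df = meta_df[['Income Group']].join(population_df)
    # Reshape from wide (one column per year) to long (one row per country and year).
    df = df.melt(
        id_vars=['Income Group'],
        var_name='Year',
        value_name='Growth'
    )
    # Total population per income group and year; groupby sorts both index levels.
    df = df.groupby(['Income Group', 'Year']).sum()
    # Year-on-year change within each group as a percentage of that year's total.
    # Assigning the whole column avoids chained assignment on a df.loc[group] copy.
    change = df.groupby(level='Income Group')['Growth'].diff()
    df['Growth'] = change / df['Growth'] * 100
    return df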
### END FUNCTION

In [None]:
df = population_difference_by_income()

_**Expected Output:**_
```python
population_difference_by_income().head()
```
> <table class="dataframe" border="1">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>Growth</th>
    </tr>
    <tr>
      <th>Income Group</th>
      <th>Year</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="5" valign="top">High income</th>
      <th>1960</th>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1961</th>
      <td>1.450978</td>
    </tr>
    <tr>
      <th>1962</th>
      <td>1.261630</td>
    </tr>
    <tr>
      <th>1963</th>
      <td>1.235893</td>
    </tr>
    <tr>
      <th>1964</th>
      <td>1.207633</td>
    </tr>
  </tbody>
</table>

### Question 3

Using the function you just created, write a function that returns the average population _growth_ over all years for a given income group. 

_**Function Specifications:**_
* Should take as input a `str` as the income group.
* Should raise a `ValueError` if the input is not a valid income group.
* Should return a `float`, rounded to 2 decimal places.
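A minimal sketch of the validate-then-round pattern these specifications describe. The series `toy_means`, its values, and the function `toy_ave_growth` are hypothetical and only stand in for the per-group averages.

```python
import pandas as pd

# Hypothetical per-group averages; the numbers are made up for illustration.
toy_means = pd.Series({"High income": 1.237, "Low income": 2.504}, name="Growth")

def toy_ave_growth(income_group):
    # Reject anything that is not a known group before doing the lookup.
    if income_group not in toy_means.index:
        raise ValueError(f"{income_group!r} is not a valid income group")
    # Convert to a plain float and round to 2 decimal places.
    return round(float(toy_means.loc[income_group]), 2)

print(toy_ave_growth("High income"))  # 1.24
try:
    toy_ave_growth("Medium income")
except ValueError as err:
    print(err)  # 'Medium income' is not a valid income group
```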

In [241]:
### START FUNCTION
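def ave_growth_by_income(income_group):
    df = population_difference_by_income()
    # Mean growth per income group; NaN entries (the first year) are skipped.
    df = df.groupby('Income Group').mean()

    if income_group not in df.index:
        raise ValueError(f'{income_group!r} is not a valid income group')

    # Select the scalar explicitly instead of converting a one-element Series.
    return round(float(df.loc[income_group, 'Growth']), 2)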
### END FUNCTION

In [None]:
ave_growth_by_income('Low income')

_**Expected Outputs:**_
```python
ave_growth_by_income('High income') == 0.81
ave_growth_by_income('Low income') == 2.55
```