# Assignment 6: Pandas Practice

### Instructions

Complete the tasks outlined below. In some cases the code may be partially written, and you need to finish the code. In other cases, you will need to write the entire line of code. Or, you may need to explain in plain English what a code chunk is doing.

When you are finished, save the notebook, push it to your fork of the repo, and copy/paste the link to your repo into the Assignment 6 Google form.

Feel free to create more cells as needed in order to explore the data and fully answer the questions.

## Import pandas

What is the standard import statement for pandas? Go ahead and import the pandas library.

In [None]:
import pandas as pd

## Reading in data

Read the data in `gapminder_gdp_americas.csv` (in the `data` directory) into a variable called `americas` and display its summary statistics. Remember to specify the `country` column as the index.

In [None]:
americas = pd.read_csv('data/gapminder_gdp_americas.csv')
print(americas.describe())

In [None]:
americas = pd.read_csv('data/gapminder_gdp_americas.csv', index_col='country')
print(americas.describe())

## Inspecting data

After reading the data for the Americas, use `help(americas.head)` and `help(americas.tail)` to find out what `DataFrame.head` and `DataFrame.tail` do.

1. What method call will display the first three rows of this data?
2. What method call will display the last three columns of this data? (Hint: you may need to change your view of the data.)

In [None]:
americas.head(n=3)

In [None]:
americas_flipped = americas.T

In [None]:
americas_flipped.tail(n=3)
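
As a small illustration of the idea above, the sketch below uses a made-up `toy` dataframe (not the gapminder data; all names and values are hypothetical) to show that `tail` on the transposed frame returns the last columns as rows, and that position-based slicing with `iloc` is one alternative that keeps the original orientation.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
toy = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)

# head(n=2) keeps the first two rows.
first_two_rows = toy.head(n=2)

# Transposing turns columns into rows, so tail(n=2) then keeps the
# last two original columns (shown as rows).
last_two_cols_as_rows = toy.T.tail(n=2)

# Alternative: slice the last two columns by position, no transpose needed.
last_two_cols = toy.iloc[:, -2:]

print(first_two_rows)
print(last_two_cols_as_rows)
print(last_two_cols)
```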

## Selection of Individual Values

Write an expression to find the Per Capita GDP of Chile in 1992.

In [None]:
americas.loc["Chile","gdpPercap_1992"]

What expression will produce an identical result?

In [None]:
americas.iloc[americas.index.get_loc("Chile"), americas.columns.get_loc("gdpPercap_1992")]
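
The relationship between label-based and position-based selection can be sketched on a tiny made-up dataframe. Everything here (`toy`, the country names, the numbers) is hypothetical; the point is only that `iloc` needs integer positions, which `Index.get_loc` can look up from labels.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
toy = pd.DataFrame(
    {"gdp_1992": [10.0, 20.0, 30.0], "gdp_1997": [11.0, 22.0, 33.0]},
    index=["Aland", "Borduria", "Cascadia"],
)

# Label-based lookup: row label, then column label.
by_label = toy.loc["Borduria", "gdp_1997"]

# Position-based lookup: translate each label into its integer position first.
row_pos = toy.index.get_loc("Borduria")
col_pos = toy.columns.get_loc("gdp_1997")
by_position = toy.iloc[row_pos, col_pos]

print(row_pos, col_pos)
print(by_label == by_position)
```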

## Reconstructing Data

Explain what each line in the following short program does: what is in first, second, etc.?

In [None]:
first = pd.read_csv('data/gapminder_all.csv', index_col='country')
second = first[first['continent'] == 'Americas']
third = second.drop('Puerto Rico')
fourth = third.drop('continent', axis = 1)
fourth.to_csv('result.csv')

### first

### second

### third

### fourth

### fifth


## Selecting Indices

Explain in simple terms what idxmin and idxmax do in the short program below. When would you use these methods?

In [None]:
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
print(data.idxmin())
print(data.idxmax())
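
To make the shape of the output easier to reason about, here is a minimal sketch on invented data (the `toy` frame and its labels are hypothetical). For each column, `idxmin` and `idxmax` return the index label of the row holding the smallest or largest value, rather than the value itself.

```python
import pandas as pd

# Hypothetical toy data, not taken from the gapminder files.
toy = pd.DataFrame(
    {"gdp_1952": [5.0, 2.0, 9.0], "gdp_2007": [7.0, 12.0, 8.0]},
    index=["Aland", "Borduria", "Cascadia"],
)

# One entry per column: the row label where that column is smallest / largest.
lowest = toy.idxmin()
highest = toy.idxmax()

print(lowest)
print(highest)
```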

## Practice with Selection

Using the `data` dataframe above, with GDP data for Europe, write an expression to select:


1. GDP per capita for all countries in 1982.
2. GDP per capita for Denmark for all years.
3. GDP per capita for all countries for years after 1985.
4. GDP per capita for each country in 2007 as a multiple of GDP per capita for that country in 1952.

In [None]:
data['gdpPercap_1982']

In [None]:
data.loc['Denmark',:]

In [None]:
data.loc[:,'gdpPercap_1985':]

In [None]:
data['gdpPercap_2007']/data['gdpPercap_1952']
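
Two of the selections above rely on behavior that is easy to miss, so here is a small sketch on made-up data (the `toy` frame, its column names, and its values are hypothetical). It assumes the column labels are sorted, in which case a label slice may start at a label that is not itself a column; it also shows that dividing two columns works row by row, aligned on the index.

```python
import pandas as pd

# Hypothetical toy data with sorted column labels.
toy = pd.DataFrame(
    {"gdp_1982": [1.0, 2.0], "gdp_1987": [2.0, 3.0], "gdp_1992": [4.0, 6.0]},
    index=["Aland", "Borduria"],
)

# "gdp_1985" is not a column, but because the column labels are sorted
# the slice starts at the first label that sorts after it.
after_1985 = toy.loc[:, "gdp_1985":]

# Element-wise division of two columns, one ratio per row.
ratio = toy["gdp_1992"] / toy["gdp_1982"]

print(after_1985)
print(ratio)
```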

## Import matplotlib's sublibrary pyplot using the standard import statement

In [None]:
import matplotlib.pyplot as plt

## Minima and Maxima

Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe. 

Modify it again to plot the maximum GDP per capita over time for Europe.

In [None]:
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')

In [None]:
data_europe.____.plot(label='min')
data_europe.____
plt.legend(loc='best')
plt.xticks(rotation=90)

In [None]:
data_europe.min().plot(label='min')
data_europe.max().plot(label='max')
plt.legend(loc='best')
plt.xticks(rotation=90)

## Correlations

Finish the code below to create a scatter plot showing the relationship between the minimum and maximum GDP per capita among the countries in Asia for each year in the data set. What relationship do you see (if any)?

In [None]:
data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country')
data_asia.describe().T.plot(kind='scatter', x='min', y='max')

### relationship of min/max GDP for countries in Asia


You might note that the variability in the maximum is much higher than that of the minimum. Take a look at the maximum over time, and at which countries hold the maximum and minimum in each year:

In [None]:
data_asia.max().plot()
print(data_asia.idxmax())
print(data_asia.idxmin())

### What are some observations you have of this plot?



## Save the plot

If you are satisfied with the plot you see you may want to save it to a file, perhaps to include it in a publication. There is a function in the matplotlib.pyplot module that accomplishes this: savefig. Calling this function, e.g. with

`plt.savefig('my_figure.png')`

will save the current figure to the file my_figure.png. The file format will automatically be deduced from the file name extension (other formats are pdf, ps, eps and svg).

Note that functions in plt refer to a global figure variable and after a figure has been displayed to the screen (e.g. with plt.show) matplotlib will make this variable refer to a new empty figure. Therefore, make sure you call plt.savefig before the plot is displayed to the screen, otherwise you may find a file with an empty plot.

When using dataframes, data is often generated and plotted to screen in one line, so it may not be obvious where a plt.savefig call fits. One option is to:
- save a reference to the current figure in a local variable (with plt.gcf)
- call the savefig method on that variable.

```
data.plot(kind='bar')         # draw the plot first
fig = plt.gcf()               # then get the figure that now holds it
fig.savefig('my_figure.png')  # save that figure to a file
```
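
Another way to get hold of the figure, sketched below under some assumptions, is to keep the Axes object that pandas' `plot` returns and ask it for its parent figure. The `toy` series, the temporary output path, and the choice of the non-interactive `Agg` backend (so the sketch runs without a display) are illustrative choices, not part of the assignment.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import pandas as pd

# Hypothetical toy data, invented for illustration.
toy = pd.Series([1.0, 3.0, 2.0], index=["1952", "1957", "1962"])

# plot() returns the matplotlib Axes it drew on.
ax = toy.plot()

# The Axes knows which figure it belongs to, so there is no need to rely
# on which figure happens to be "current".
fig = ax.get_figure()

# Save into a temporary directory (hypothetical location for this sketch).
out_path = os.path.join(tempfile.mkdtemp(), "toy_figure.png")
fig.savefig(out_path)

print(os.path.exists(out_path))
```

Because the figure comes from the Axes itself, this approach does not depend on the order of calls to `plt.gcf` and the plotting function.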

**Go ahead and save the plot produced by `data_asia.max().plot()` to a figure**

In [None]:
data_asia.max().plot()        # draw the plot in this cell
plt.savefig('my_figure.png')  # save it before the figure is displayed