# Welcome to our Interactive Jupyter Notebook Workshop

## A brief exploration of the Gapminder dataset


Today, we will simulate a data analysis workflow, which will allow us to play around with some of Jupyter's interactive aspects.

We will explore part of the **Gapminder** dataset using tools from two very popular R packages: **dplyr** and **ggplot2**. Enjoy!

## A few words about Gapminder


The Gapminder Foundation is a non-profit venture registered in Stockholm, Sweden. It promotes sustainable global development and achievement of the **United Nations Millennium Development Goals** through increased use and understanding of statistics and other information about social, economic and environmental development at local, national and global levels.


<img src="https://www.adamvowles.co.uk/wp-content/uploads/2015/05/gapminder.png" width="500">

## Let's get started with some data exploration!

In [None]:

# First we load all necessary packages for this session from CRAN 

# Data analysis packages
library(dplyr)
library(ggplot2)

# The package containing our dataset (installed from CRAN only if it is missing)
if (!requireNamespace('gapminder', quietly = TRUE)) install.packages('gapminder')
library(gapminder)

**Well done!**  We will now take a dive into our dataset.

In [None]:
# This allows us to take a sneak peek at our dataset
head(gapminder)

# And here we are trying to find out more about the type of variables in gapminder
str(gapminder)

# How many entries does our dataset contain?
nrow(gapminder)
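
To see where that row count comes from, it can help to count the distinct countries, continents and years. The sketch below is one way to do this with dplyr's `n_distinct()`; the name `overview` is only an illustrative choice and is not used elsewhere in this notebook.

```r
library(dplyr)
library(gapminder)

# Count the distinct countries, continents and years in the dataset
overview = gapminder %>%
  summarise(
    n_countries  = n_distinct(country),
    n_continents = n_distinct(continent),
    n_years      = n_distinct(year)
  )

overview

# Every country has one row per year, so this product equals nrow(gapminder)
overview$n_countries * overview$n_years
```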

In [None]:
# First we keep only the entries from 1957, which reduces the number of rows, and group them by country

gapminder_mini = gapminder %>%
    filter(year == 1957) %>%
    group_by(country)


# Let's have a look at the number of entries in gapminder_mini
nrow(gapminder_mini)

# Check out the average life expectancy in 1957 using the arithmetic mean - Note: the $ sign helps us select a variable of interest.
mean(gapminder_mini$lifeExp)

# It's your turn: I will give your group another year to try out. We will then compare life expectancy and population size for that year!
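
If you get stuck, here is a minimal sketch of how the exercise could be approached. The year 2007 and the names `my_year` and `year_summary` are placeholders chosen for illustration; swap in the year assigned to your group.

```r
library(dplyr)
library(gapminder)

# Placeholder year (an assumption for this sketch): replace with your group's year
my_year = 2007

# Average life expectancy and average population size across all countries
year_summary = gapminder %>%
  filter(year == my_year) %>%
  summarise(
    mean_lifeExp = mean(lifeExp),
    mean_pop     = mean(pop)
  )

year_summary
```

Using `summarise()` returns both averages in a single one-row table, which makes it easy to compare results between groups.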

**Great work!**

## A quick look at exploratory plotting!

We will explore the change in population size over time by continent.

In [None]:
# First we prepare our data 

by_year_continent = gapminder %>%
  group_by(year, continent) %>%
  summarise(meanpop = mean(pop))

# Then we plot! Which continent has seen the steepest increase in population size?

by_year_continent %>%
  ggplot(aes(x = year, y = meanpop, color = continent)) +
  geom_line()


# Can you do the same for life expectancy? What did you see?
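
One possible way to repeat the plot for life expectancy is sketched below. The names `lifeexp_by_year_continent`, `meanlifeExp` and `lifeexp_plot` are illustrative, and `.groups = "drop"` (available in dplyr 1.0.0 and later) simply returns an ungrouped table.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Average life expectancy per year and continent
lifeexp_by_year_continent = gapminder %>%
  group_by(year, continent) %>%
  summarise(meanlifeExp = mean(lifeExp), .groups = "drop")

# One line per continent, as in the population plot
lifeexp_plot = lifeexp_by_year_continent %>%
  ggplot(aes(x = year, y = meanlifeExp, color = continent)) +
  geom_line()

lifeexp_plot
```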


## We will now focus on the Asian continent to understand the previous result better





<img src="https://online.seterra.com/images/system/asia-borders.png" width="500">


In [None]:
# Here we use a filter to only keep data from the European continent - make the appropriate changes to isolate data from Asia

gap_asia =  gapminder %>%
  filter(continent == 'Europe') %>%
  group_by(year, country) %>%
  summarise(meanpop = mean(pop))


# We will now make a plot to look at pop size in the Asian countries

asia_plot = gap_asia %>%
  ggplot(aes(x = year, y = meanpop, color = country)) +
  geom_line()


asia_plot

# We can use a logarithmic transformation to visualise our data better - Can you propose a transformation?
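
One common option is a base-10 log scale on the y axis, which ggplot2 provides as `scale_y_log10()`. The sketch below rebuilds the Asian data from scratch so that it runs on its own; `asia_pop` and `asia_log_plot` are illustrative names. Because gapminder has one row per country and year, plotting `pop` directly gives the same lines as the mean computed above.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Keep only the Asian countries
asia_pop = gapminder %>%
  filter(continent == "Asia")

# Same line plot as before, with a base-10 log scale on the y axis
asia_log_plot = asia_pop %>%
  ggplot(aes(x = year, y = pop, color = country)) +
  geom_line() +
  scale_y_log10()

asia_log_plot
```

Inside this notebook, once `asia_plot` has been built from the Asian data, `asia_plot + scale_y_log10()` should give an equivalent result.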


## Color palettes

If the default palette is not working well for you, there are ways to customize the colours in your plots. Here we use **RColorBrewer**; some of its palettes are shown below:

<img src="http://a2.typepad.com/6a0105360ba1c6970c01b7c7187af2970b-800wi" width="700">


In [None]:
# First, let's load a package that contains numerous palettes for R
library(RColorBrewer)

# After running this chunk of code - experiment with another palette with perhaps more colours
asia_plot + scale_color_brewer(palette="Dark2")
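
Brewer palettes are small (Dark2 has only 8 colours), so a plot with one line per country will run out of colours. A possible workaround, sketched below, is to interpolate a longer palette with `colorRampPalette()` from base R and pass it to `scale_color_manual()`. The names `asia_pop`, `n_countries`, `many_colours` and `many_colours_plot` are illustrative.

```r
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(gapminder)

# Keep only the Asian countries
asia_pop = gapminder %>%
  filter(continent == "Asia")

# How many colours do we need? One per country.
n_countries = n_distinct(asia_pop$country)

# Stretch the 8 Dark2 colours to one colour per country by interpolation
many_colours = colorRampPalette(brewer.pal(8, "Dark2"))(n_countries)

many_colours_plot = asia_pop %>%
  ggplot(aes(x = year, y = pop, color = country)) +
  geom_line() +
  scale_color_manual(values = many_colours)

many_colours_plot
```

Interpolated colours can be hard to tell apart when there are this many lines, so this is a trade-off rather than a complete fix.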

## A bonus plot for the road - Looking for relationships in data

In [None]:
# Let's focus now once again on year 1957

gapminder_1957 = gapminder %>%
  filter(year == 1957)

# Is there a relationship between population size and life expectancy?
gapminder_1957 %>%
  ggplot(aes(x = pop, y = lifeExp)) +
  geom_point()

# If this plot was in a Notebook, what would you want to customize? Discuss with your group and we will try it in class.
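
As a starting point for that discussion, here is one possible set of tweaks: a log scale for population, colour by continent, readable labels and a lighter theme. None of these choices is the single right answer; the title and axis labels are illustrative wording, and `custom_plot` is an illustrative name.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

gapminder_1957 = gapminder %>%
  filter(year == 1957)

custom_plot = gapminder_1957 %>%
  ggplot(aes(x = pop, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.7) +   # slight transparency for overlapping points
  scale_x_log10() +           # population spans several orders of magnitude
  labs(
    title = "Life expectancy vs. population size, 1957",
    x = "Population (log scale)",
    y = "Life expectancy (years)",
    color = "Continent"
  ) +
  theme_minimal()

custom_plot
```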

# Congratulations on completing this workshop!

<img src="https://i.imgflip.com/22dl7c.jpg" width="600">

If you would like to get started with Jupyter Notebooks for your own work, feel free to contact me at:
nathalie_vladis@hms.harvard.edu