# Coding Exercises

## Visualizing government research and development spending


This exercise asks you to re-create the government R&D spending figures (seen below).

<img src="https://vdsbook.com/02-dslc_files/figure-html/fig-govt-spending-exploratory-explanatory-1.png" alt="Government Spending" width="600"/>

In [5]:
import pandas as pd
import numpy as np
import plotly.express as px


## Background information

The version of the data that we are using comes from "Tidy Tuesday" ([link](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-02-12#federal-research-and-development-spending-by-agency)), but the original source is the American Association for the Advancement of Science ([link](https://www.aaas.org/programs/r-d-budget-and-policy/historical-trends-federal-rd)). This version has already been cleaned by Tom Mock (a Tidy Tuesday facilitator), who shares the code he used to clean the data [here](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-02-12#federal-research-and-development-spending-by-agency).


## Load in the data

First, we load in the three files.

In [None]:
fed_spending = pd.read_csv("data/fed_rd.csv")
energy_spending = pd.read_csv("data/energy_spend.csv")
climate_spending = pd.read_csv("data/climate_spend.csv") 


Let's take a look at the first few rows of each dataset:

In [None]:
fed_spending.head()


Unnamed: 0,department,year,rd_budget,total_outlays,discretionary_outlays,gdp
0,DOD,1976,35696000000,371800000000.0,175600000000.0,1790000000000
1,NASA,1976,12513000000,371800000000.0,175600000000.0,1790000000000
2,DOE,1976,10882000000,371800000000.0,175600000000.0,1790000000000
3,HHS,1976,9226000000,371800000000.0,175600000000.0,1790000000000
4,NIH,1976,8025000000,371800000000.0,175600000000.0,1790000000000


In [None]:
energy_spending.head()

Unnamed: 0,department,year,energy_spending
0,Office of Science R&D,1997,3593300000.0
1,Adv Sci Comp Res*,1997,217200000.0
2,Basic Energy Sciences*,1997,933300000.0
3,Bio and Env Research*,1997,550600000.0
4,Fusion Energy Sciences*,1997,331400000.0


In [None]:
climate_spending.head()

Unnamed: 0,department,year,gcc_spending
0,NASA,2000,1590942000.0
1,NSF,2000,256249900.0
2,Commerce (NOAA),2000,91811460.0
3,Energy,2000,154846200.0
4,Agriculture,2000,76737940.0


## Prepare the data

Next, we prepare (pre-process) the data for the visualizations below. To simplify the data, we make the judgment call to aggregate the spending across all departments for each year.

In [None]:
# aggregate the energy spending by year
energy_by_year = energy_spending.groupby("year")["energy_spending"] \
    .sum() \
    .to_frame() \
    .reset_index()
energy_by_year

Unnamed: 0,year,energy_spending
0,1997,12625000000.0
1,1998,12662500000.0
2,1999,13651600000.0
3,2000,13349700000.0
4,2001,14511200000.0
5,2002,14717900000.0
6,2003,15042700000.0
7,2004,15343100000.0
8,2005,14717200000.0
9,2006,14194100000.0
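The group-and-sum pattern used in these cells can be checked on a small example. The sketch below uses a made-up toy frame (the department names and values are illustrative, not taken from the real data) and also shows an equivalent shorthand using `as_index=False`.

```python
import pandas as pd

# Toy stand-in for energy_spending (values are made up for illustration)
toy = pd.DataFrame({
    "department": ["A", "B", "A", "B"],
    "year": [1997, 1997, 1998, 1998],
    "energy_spending": [1.0, 2.0, 3.0, 4.0],
})

# Pattern used in this notebook: sum within each year,
# then turn the resulting Series back into a DataFrame
by_year = toy.groupby("year")["energy_spending"] \
    .sum() \
    .to_frame() \
    .reset_index()

# Equivalent shorthand: keep "year" as a regular column from the start
by_year_alt = toy.groupby("year", as_index=False)["energy_spending"].sum()

print(by_year)
```

Both forms give one row per year; which one to use is a matter of taste.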


In [None]:
# aggregate the climate spending by year
climate_by_year = climate_spending.groupby("year")["gcc_spending"] \
    .sum() \
    .to_frame("climate_spending") \
    .reset_index()
climate_by_year

Unnamed: 0,year,climate_spending
0,2000,2311730000.0
1,2001,2312562000.0
2,2002,2195398000.0
3,2003,2689233000.0
4,2004,2484252000.0
5,2005,2283907000.0
6,2006,2003552000.0
7,2007,2043967000.0
8,2008,2069039000.0
9,2009,2345556000.0


In [None]:
# extract the year-level federal spending totals (one row per year)
fed_spending_by_year = fed_spending[["year","total_outlays", "discretionary_outlays", "gdp"]] \
    .drop_duplicates() \
    .rename(columns = {"total_outlays": "total_spending",
                       "discretionary_outlays": "discretionary_spending"}) \
    .reset_index(drop=True)
fed_spending_by_year

Unnamed: 0,year,total_spending,discretionary_spending,gdp
0,1976,371800000000.0,175600000000.0,1790000000000
1,1977,409200000000.0,197100000000.0,2028000000000
2,1978,458700000000.0,218700000000.0,2278000000000
3,1979,504000000000.0,240000000000.0,2570000000000
4,1980,590900000000.0,276300000000.0,2797000000000
5,1981,678200000000.0,307900000000.0,3138000000000
6,1982,745700000000.0,325900000000.0,3314000000000
7,1983,808400000000.0,353300000000.0,3541000000000
8,1984,851800000000.0,379400000000.0,3953000000000
9,1985,946300000000.0,415800000000.0,4270000000000
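Using `drop_duplicates()` here relies on the year-level columns taking the same value for every department within a year. A minimal sketch of how that assumption could be checked is shown below, using a made-up toy frame rather than the real `fed_spending` data.

```python
import pandas as pd

# Toy stand-in for fed_spending (values are made up for illustration)
toy_fed = pd.DataFrame({
    "department": ["DOD", "NASA", "DOD", "NASA"],
    "year": [1976, 1976, 1977, 1977],
    "rd_budget": [10, 5, 12, 6],
    "total_outlays": [100.0, 100.0, 110.0, 110.0],
})

# Same pattern as above: keep only the year-level columns and de-duplicate
toy_by_year = toy_fed[["year", "total_outlays"]] \
    .drop_duplicates() \
    .reset_index(drop=True)

# If a year-level value differed between departments,
# that year would appear more than once and this would be False
one_row_per_year = toy_by_year["year"].is_unique
print(one_row_per_year)
```

Running the same `is_unique` check on `fed_spending_by_year["year"]` would confirm that the real data yields exactly one row per year.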


In [None]:
# aggregate the R&D budget by year
rd_budget_by_year = fed_spending.groupby("year")["rd_budget"] \
    .sum() \
    .to_frame("total_rd_budget") \
    .reset_index()
rd_budget_by_year

Unnamed: 0,year,total_rd_budget
0,1976,86227000000
1,1977,91807000000
2,1978,94864000000
3,1979,96601000000
4,1980,96305000000
5,1981,98304000000
6,1982,95448000000
7,1983,95010000000
8,1984,105371000000
9,1985,114818000000


We can then combine these yearly tables using left joins (`merge()` with `how="left"`), filter to the year 2000 onwards, and divide each of the spending variables by one million.


In [None]:
govt_spending = fed_spending_by_year.merge(rd_budget_by_year, on="year", how="left") \
    .merge(energy_by_year, on="year", how="left") \
    .merge(climate_by_year, on="year", how="left") \
    .query("year >= 2000") \
    .reset_index(drop=True)
# divide the spending variables by 1_000_000
spending_var_names = ["gdp", "total_rd_budget", "total_spending",
                      "discretionary_spending", "energy_spending", "climate_spending"]
govt_spending[spending_var_names] = govt_spending[spending_var_names] / 1_000_000
# look at the data
govt_spending

Unnamed: 0,year,total_spending,discretionary_spending,gdp,total_rd_budget,energy_spending,climate_spending
0,2000,1789000.0,614700.0,10148000.0,142299.0,13349.7,2311.730453
1,2001,1862800.0,649100.0,10565000.0,153197.0,14511.2,2312.561976
2,2002,2010900.0,734000.0,10877000.0,170354.0,14717.9,2195.3979
3,2003,2159900.0,824300.0,11332000.0,192010.0,15042.7,2689.233013
4,2004,2292800.0,895100.0,12089000.0,199104.0,15343.1,2484.251666
5,2005,2472000.0,968500.0,12889000.0,200099.0,14717.2,2283.906691
6,2006,2655000.0,1016700.0,13685000.0,199429.0,14194.1,2003.551973
7,2007,2728700.0,1041600.0,14323000.0,201827.0,14655.6,2043.966543
8,2008,2982500.0,1134800.0,14752000.0,200857.0,15298.2,2069.038746
9,2009,3517700.0,1237500.0,14415000.0,201275.0,16491.6,2345.5564
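A left join keeps every row of the left table and fills in `NaN` wherever the right table has no matching year, so it can be worth counting missing values after merging. The sketch below illustrates this on made-up toy tables; the `validate="one_to_one"` argument is an optional extra (not used in the cell above) that raises an error if either table has repeated years.

```python
import pandas as pd

# Toy tables (values are made up for illustration)
left = pd.DataFrame({"year": [1999, 2000, 2001],
                     "total_spending": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"year": [2000, 2001],
                      "climate_spending": [10.0, 20.0]})

# Left join on year; validate guards against accidental row duplication
merged = left.merge(right, on="year", how="left", validate="one_to_one")

# 1999 has no match in `right`, so its climate_spending is NaN
missing_per_column = merged.isna().sum()
print(missing_per_column)

# Filtering to 2000 onwards, as in the notebook
recent = merged.query("year >= 2000").reset_index(drop=True)
print(recent)
```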



## Exercise:

Now it's your turn to re-create the figures specified in the exercise.
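One possible starting point, offered as a sketch rather than the intended solution, is to reshape the wide table into long format so that each spending variable can be drawn as its own line. The toy frame, the column names `spending_type` and `amount_millions`, and the helper `plot_spending` are all hypothetical, and the result will not reproduce the styling of the target figure.

```python
import pandas as pd

# Toy stand-in for govt_spending (made-up values; the real frame has more columns)
toy_spending = pd.DataFrame({
    "year": [2000, 2001],
    "energy_spending": [13.0, 14.0],
    "climate_spending": [2.0, 3.0],
})

# Wide -> long: one row per (year, spending type) pair
long_spending = toy_spending.melt(
    id_vars="year",
    var_name="spending_type",
    value_name="amount_millions",
)
print(long_spending)


def plot_spending(long_df):
    """Hypothetical helper: draw one line per spending type over time."""
    # Imported here so the reshaping step above runs even without plotly installed
    import plotly.express as px
    return px.line(long_df, x="year", y="amount_millions", color="spending_type")
```

Calling `plot_spending(long_spending).show()` displays the figure; the same reshaping could be applied to `govt_spending` itself.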


In [None]:
![Government Spending](https://vdsbook.com/02-dslc_files/figure-html/fig-govt-spending-exploratory-explanatory-1.png)
