# Progress Report

### Finding Data
Major macroeconomic data can certainly be found on Bloomberg terminals, but there are a lot of other resources available. The St. Louis Federal Reserve keeps a lot of US macroeconomic data. Other sources include the IMF, Quandl, Yahoo Finance, the European University Institute, and the Conference Board; the list goes on and on.

I was looking for something that was all-inclusive. Different data from different vendors can be a bit of an annoyance (different units, different ways of measuring), so I was hoping for one standardized, complete set of data for all countries. 

Fortunately, I found the OECD website and stumbled across the "Complete Macroeconomic Dataset." It seemed to have everything I was looking for, and I found out that, as a McMaster student, I can get the dataset for free. So I downloaded it and looked around.

### First Look at Data

I first opened the file in Excel and was surprised by the amount of data here. Right off the bat, 43268 rows x 2021 columns. That doesn't sound like much, but macroeconomic data hasn't been recorded for that long. I expect to see a lot of empty cells for obscure series and for series that were not recorded until recently.

This creates the problem that I have far too many variables for the number of observations. I'll have to try to work around this.
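
One way to put a number on the sparsity is to compute the share of missing values per year and the count of usable observations per series. The sketch below runs on a small made-up data frame (`toy`), not on the real file; the column names and values are hypothetical.

```r
# Toy stand-in for the OECD data: rows are series, columns are years.
# All names and values here are made up for illustration.
toy <- data.frame(
  Series = c("GDP", "CPI", "Unemployment"),
  X1960  = c(NA, NA, NA),
  X1990  = c(1.2, NA, 7.5),
  X2015  = c(2.1, 1.4, 6.9),
  stringsAsFactors = FALSE
)

# Keep only the year columns (names of the form "X" followed by digits)
year_cols <- grep("^X[0-9]+$", names(toy), value = TRUE)

# Share of missing values in each year column
missing_share <- colMeans(is.na(toy[, year_cols]))
print(round(missing_share, 2))

# Number of non-missing observations available for each series
obs_per_series <- rowSums(!is.na(toy[, year_cols]))
names(obs_per_series) <- toy$Series
print(obs_per_series)
```

Series with very few observations are natural candidates to drop before any modelling.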

The data is split by Country/Subject/Measure/Unit and by Annual/Quarterly/Monthly. Using Excel, I split the data into three separate files, one per frequency.
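
The same split could probably be done in R instead of Excel. The sketch below assumes a column called `Frequency` holding the values "Annual", "Quarterly" and "Monthly"; that column name and the toy data are assumptions, not taken from the real file.

```r
# Toy data with a hypothetical Frequency column
toy <- data.frame(
  Country   = c("Canada", "Canada", "Japan", "Japan"),
  Frequency = c("Annual", "Monthly", "Annual", "Quarterly"),
  Value     = c(1.0, 0.1, 2.0, 0.5),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per frequency
by_freq <- split(toy, toy$Frequency)
print(names(by_freq))

# Write each piece to its own CSV file (a temporary directory is used here)
out_dir <- tempdir()
for (freq in names(by_freq)) {
  out_file <- file.path(out_dir, paste0("Data_", freq, ".csv"))
  write.csv(by_freq[[freq]], out_file, row.names = FALSE)
}
```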

Initial analysis was seeing what I was working with. I ran the following code to see each unique value for series and for country:

In [42]:
#Import the data
a_data <- read.csv("Data_Annually.csv")

# We want to see all the possible unique series, so we can determine which may be left out
unique(a_data$Subject)


825 series! Way too many to look at every combination, so we will need to know which are the most important. Maybe there is a way to do this without looking at every single combination?
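
One possible shortcut is to rank the series by how many countries report them and keep only the widely reported ones. This is a minimal sketch on made-up data; the threshold `min_countries` is an arbitrary value chosen for the example.

```r
# Toy long-format data: one row per country/series pair (values made up)
toy <- data.frame(
  Country = c("Canada", "Japan", "Mexico", "Canada", "Japan", "Canada"),
  Subject = c("GDP", "GDP", "GDP", "CPI", "CPI", "Share prices"),
  stringsAsFactors = FALSE
)

# Count how many distinct countries report each series
pairs <- unique(toy[, c("Country", "Subject")])
coverage <- sort(table(pairs$Subject), decreasing = TRUE)
print(coverage)

# Keep only series reported by at least `min_countries` countries
min_countries <- 2  # arbitrary threshold for this example
common_series <- names(coverage)[coverage >= min_countries]
print(common_series)
```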

In [43]:
# We want to see how many unique countries we are dealing with
unique(a_data$Country)


63 levels! While not every "country" has 825 series, there are still way too many here (42368 rows in the Excel file). Luckily, there are also groups that we can work with, such as:
* Major 5 Asia
* BRIICS
* Euro area
* EU
* G20
* Big Four Euro
* G7
* NAFTA
* OECD total with different flavours
* SDR: "SDRs (Special Drawing Rights) are international reserve assets created by the International Monetary Fund and allocated to its members to supplement existing reserve assets."
      
So we will definitely need to break these down into smaller pieces and figure out how many we will need.
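
A first step could be to separate the group aggregates from the individual countries. The sketch below uses a hand-written vector of group names (`aggregates`) and a toy data frame; the list is incomplete and would need to be extended to cover every group in the dataset.

```r
# Names of some group aggregates (not a complete list)
aggregates <- c("G20", "G7", "NAFTA", "OECD - Total",
                "Major Five Asia", "Four Big European")

# Toy data mixing countries and groups (values made up)
toy <- data.frame(
  Country = c("Canada", "G7", "Mexico", "NAFTA", "Slovenia", "G20"),
  Value   = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Flag the rows that belong to a group rather than a single country
is_aggregate <- toy$Country %in% aggregates

groups_only    <- toy[is_aggregate, ]
countries_only <- toy[!is_aggregate, ]

print(groups_only$Country)
print(countries_only$Country)
```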

Since Excel cannot handle more than $x$ columns, we have to work in R only from here on out. The catch is that I don't have much RAM, so operations can be really slow. It would be good to figure out how to use all 6 cores of my laptop to speed things up a bit!
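
For the multi-core idea, base R ships with the `parallel` package. The sketch below is only an illustration on a toy workload: the cluster size of 2 and the `chunks` list are assumptions, and since each worker holds its own copy of the data it receives, this may not help when RAM is the bottleneck.

```r
library(parallel)

# Number of logical cores reported by the machine (can be NA on some systems)
n_cores <- detectCores()
print(n_cores)

# Start a small cluster; on a 6-core laptop one might leave one core free
cl <- makeCluster(2)

# Toy workload: a list of numeric vectors standing in for per-country data
chunks <- list(a = 1:10, b = 11:20, c = 21:30)

# parLapply() distributes the list elements across the workers
chunk_means <- parLapply(cl, chunks, mean)

# Always shut the workers down when finished
stopCluster(cl)

print(unlist(chunk_means))
```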

### Beginning Analysis

I think it would make more sense to start with the more general view and get a better understanding of the data.

In [70]:
# Transpose so we can get proper colnames going
t <- data.frame(t(a_data))


In [71]:
# Remove all rows whose names contain "FL" (the FLAGS rows), leaving the metadata rows and the year rows
t <- t[!grepl("FL", row.names(t)),]

In [72]:
# Pull out the row that holds the country name for each column
z <- t["Country",]

Now that the rows are cohesive, we will split the data into one data frame per country.

In [73]:
# Use the country names as column names in the whole dataset
# so that each column is labelled by the country it belongs to
colnames(t) <- as.character(unlist(z))

In [74]:
countries <- unique(colnames(t))

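# Create one data frame per country, named after the country.
# The loop variable is called `country` to avoid shadowing base::c
for (country in countries){
  df <- t[colnames(t) == country]
  assign(country, df)
  print(country)
}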

[1] "Major Five Asia"
[1] "Argentina"
[1] "Australia"
[1] "Austria"
[1] "Belgium"
[1] "Bulgaria"
[1] "Brazil"
[1] "BRIICS economies - Brazil. Russia. India. Indonesia. China and South Africa"
[1] "Canada"
[1] "Switzerland"
[1] "Chile"
[1] "China (People's Republic of)"
[1] "Colombia"
[1] "Costa Rica"
[1] "Cyprus"
[1] "Czech Republic"
[1] "Germany"
[1] "Denmark"
[1] "Euro area (19 countries)"
[1] "Spain"
[1] "Estonia"
[1] "European Union (28 countries)"
[1] "Finland"
[1] "France"
[1] "G20"
[1] "Four Big European"
[1] "G7"
[1] "United Kingdom"
[1] "Greece"
[1] "Hungary"
[1] "Indonesia"
[1] "India"
[1] "Ireland"
[1] "Iceland"
[1] "Israel"
[1] "Italy"
[1] "Japan"
[1] "Korea"
[1] "Lithuania"
[1] "Luxembourg"
[1] "Latvia"
[1] "Mexico"
[1] "Malta"
[1] "NAFTA"
[1] "Netherlands"
[1] "Norway"
[1] "New Zealand"
[1] "OECD - Total"
[1] "OECD - Europe"
[1] "OECD + Major Six NME"
[1] "OECD total excluding the euro area"
[1] "Poland"
[1] "Portugal"
[1] "Romania"
[1] "Russia"
[1] "Saudi Arabia"
[1] "SD
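
An alternative to creating one global variable per country with `assign()` would be to keep the pieces in a named list, which avoids awkward variable names such as "China (People's Republic of)". The sketch below uses a toy data frame (`wide`) with duplicated column names standing in for the transposed data.

```r
# Toy wide table whose column names are country names (duplicates allowed)
wide <- data.frame(a = 1:2, b = 3:4, c = 5:6)
colnames(wide) <- c("Slovenia", "Canada", "Slovenia")

countries <- unique(colnames(wide))

# Build a named list: one data frame per country
by_country <- lapply(countries, function(country) {
  wide[, colnames(wide) == country, drop = FALSE]
})
names(by_country) <- countries

print(names(by_country))
print(ncol(by_country[["Slovenia"]]))
```

A single country can then be looked up with `by_country[["Slovenia"]]`, and the whole list can be passed to `lapply()` for per-country work.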

In [76]:
# Test with a random country
Slovenia

Unnamed: 0,Slovenia,Slovenia.1,Slovenia.2,Slovenia.3,Slovenia.4,Slovenia.5,Slovenia.6,Slovenia.7,Slovenia.8,Slovenia.9,⋯,Slovenia.1063,Slovenia.1064,Slovenia.1065,Slovenia.1066,Slovenia.1067,Slovenia.1068,Slovenia.1069,Slovenia.1070,Slovenia.1071,Slovenia.1072
Series.code,SVN.B6BLPI01.CXCU,SVN.B6BLPI01.CXCUSA,SVN.B6BLPI01.NCCU,SVN.B6BLPI01.NCCUSA,SVN.B6BLSE01.CXCU,SVN.B6BLSE01.CXCUSA,SVN.B6BLSE01.NCCU,SVN.B6BLSE01.NCCUSA,SVN.B6BLSI01.CXCU,SVN.B6BLSI01.CXCUSA,⋯,SVN.XTIMVA01.GPSA,SVN.XTIMVA01.GYSA,SVN.XTIMVA01.NCML,SVN.XTIMVA01.NCMLSA,SVN.XTIMVA01.STSA,SVN.XTNTVA01.CXML,SVN.XTNTVA01.CXMLSA,SVN.XTNTVA01.NCML,SVN.XTNTVA01.NCMLSA,SVN.XTNTVA01.STSA
LOCATION,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,⋯,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN
Country,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,⋯,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia
Series.code.1,SVN.B6BLPI01.CXCU,SVN.B6BLPI01.CXCUSA,SVN.B6BLPI01.NCCU,SVN.B6BLPI01.NCCUSA,SVN.B6BLSE01.CXCU,SVN.B6BLSE01.CXCUSA,SVN.B6BLSE01.NCCU,SVN.B6BLSE01.NCCUSA,SVN.B6BLSI01.CXCU,SVN.B6BLSI01.CXCUSA,⋯,SVN.XTIMVA01.GPSA,SVN.XTIMVA01.GYSA,SVN.XTIMVA01.NCML,SVN.XTIMVA01.NCMLSA,SVN.XTIMVA01.STSA,SVN.XTNTVA01.CXML,SVN.XTNTVA01.CXMLSA,SVN.XTNTVA01.NCML,SVN.XTNTVA01.NCMLSA,SVN.XTNTVA01.STSA
LOCATION.1,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,⋯,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN,SVN
Country.1,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,⋯,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia,Slovenia
SUBJECT,B6BLPI01,B6BLPI01,B6BLPI01,B6BLPI01,B6BLSE01,B6BLSE01,B6BLSE01,B6BLSE01,B6BLSI01,B6BLSI01,⋯,XTIMVA01,XTIMVA01,XTIMVA01,XTIMVA01,XTIMVA01,XTNTVA01,XTNTVA01,XTNTVA01,XTNTVA01,XTNTVA01
Subject,Balance of payments BPM6 > Current account Balance > Primary income > Total Balance,Balance of payments BPM6 > Current account Balance > Primary income > Total Balance,Balance of payments BPM6 > Current account Balance > Primary income > Total Balance,Balance of payments BPM6 > Current account Balance > Primary income > Total Balance,Balance of payments BPM6 > Current account Balance > Services > Total Balance,Balance of payments BPM6 > Current account Balance > Services > Total Balance,Balance of payments BPM6 > Current account Balance > Services > Total Balance,Balance of payments BPM6 > Current account Balance > Services > Total Balance,Balance of payments BPM6 > Current account Balance > Secondary income > Total Balance,Balance of payments BPM6 > Current account Balance > Secondary income > Total Balance,⋯,International Trade > Imports > Value (goods) > Total,International Trade > Imports > Value (goods) > Total,International Trade > Imports > Value (goods) > Total,International Trade > Imports > Value (goods) > Total,International Trade > Imports > Value (goods) > Total,International Trade > Net trade > Value (goods) > Total,International Trade > Net trade > Value (goods) > Total,International Trade > Net trade > Value (goods) > Total,International Trade > Net trade > Value (goods) > Total,International Trade > Net trade > Value (goods) > Total
MEASURE,CXCU,CXCUSA,NCCU,NCCUSA,CXCU,CXCUSA,NCCU,NCCUSA,CXCU,CXCUSA,⋯,GPSA,GYSA,NCML,NCMLSA,STSA,CXML,CXMLSA,NCML,NCMLSA,STSA
Measure,US Dollars. sum over component sub-periods,US Dollars. sum over component sub-periods. s.a.,National currency. sum over component sub-periods,National currency. sum over component sub-periods s.a,US Dollars. sum over component sub-periods,US Dollars. sum over component sub-periods. s.a.,National currency. sum over component sub-periods,National currency. sum over component sub-periods s.a,US Dollars. sum over component sub-periods,US Dollars. sum over component sub-periods. s.a.,⋯,Growth rate previous period. s.a.,Growth rate same period previous year. s.a.,National currency. monthly level,National currency. monthly level. s.a.,Level. rate or national currency. s.a.,US Dollars. monthly level,US Dollars. monthly level. s.a.,National currency. monthly level,National currency. monthly level. s.a.,Level. rate or national currency. s.a.
