Run the cell below.

In [1]:
import pandas as pd
import pandas_datareader as pdr

ind = pd.DataFrame({'weight': ['ew', 'vw', 'ew', 'vw'],
                    'year': ['2010', '2010', '2011', '2011'],
                    'ret': [0.1, 0.15, 0.05, 0.01],
                    'N': [5, 3, 4, 10]})
ind

,weight,year,ret,N
0,ew,2010,0.1,5
1,vw,2010,0.15,3
2,ew,2011,0.05,4
3,vw,2011,0.01,10


Reshape the data in the ``ret`` and ``N`` columns of the ``ind`` dataframe, so that each year gets its own row and different weight types ('ew' and 'vw') get their own columns. Call this new, reshaped dataframe ``ind_long``. Print ``ind_long``.

In [2]:
ind_long = ind.pivot(index='year', columns='weight', values=['ret','N'])
ind_long

,ret,ret,N,N
weight,ew,vw,ew,vw
year,,,,
2010,0.1,0.15,5.0,3.0
2011,0.05,0.01,4.0,10.0


Reshape ``ind_long`` back into a dataframe that has the same columns as ``ind``. Call this new dataframe ``ind_wide``. Print ``ind_wide``.

In [3]:
ind_wide = ind_long.stack(level=1).reset_index()
ind_wide

,year,weight,ret,N
0,2010,ew,0.1,5.0
1,2010,vw,0.15,3.0
2,2011,ew,0.05,4.0
3,2011,vw,0.01,10.0
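
Note that ``N`` comes back as a float (``5.0`` rather than ``5``) after the round trip. One way to check that the round trip recovers the original data is sketched below, assuming it is acceptable to cast ``N`` back to integers and to compare after sorting. The names ``restored`` and ``original`` are illustrative.

```python
import pandas as pd

ind = pd.DataFrame({'weight': ['ew', 'vw', 'ew', 'vw'],
                    'year': ['2010', '2010', '2011', '2011'],
                    'ret': [0.1, 0.15, 0.05, 0.01],
                    'N': [5, 3, 4, 10]})
ind_long = ind.pivot(index='year', columns='weight', values=['ret', 'N'])
ind_wide = ind_long.stack(level=1).reset_index()

# Put the columns in the original order, restore the integer dtype of N,
# and sort rows so the comparison does not depend on row order.
restored = (ind_wide[ind.columns.tolist()]
            .astype({'N': 'int64'})
            .sort_values(['year', 'weight'])
            .reset_index(drop=True))
original = ind.sort_values(['year', 'weight']).reset_index(drop=True)

print(restored)
print(restored.equals(original))
```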


Use the ``pandas_datareader`` package to download the 'GDP' and 'GNP' series from the St. Louis Fed's FRED database for the year 2020. The result should have 4 rows, one for each quarter of 2020. You can download both series at the same time (i.e. into the same dataframe) by supplying the 'GDP' and 'GNP' names as elements of a single list. Call this new dataframe ``macro1`` and print it.

In [4]:
macro1 = pdr.DataReader(name = ['GDP','GNP'], data_source = 'fred', start='2020-01-01', end='2020-12-31')
macro1

DATE,GDP,GNP
2020-01-01,21481.367,21721.267
2020-04-01,19477.444,19649.442
2020-07-01,21138.574,21365.412
2020-10-01,21477.597,21728.223


Now download data for the same 'GDP' and 'GNP' series for the year 2019. Store the result in a dataframe called ``macro2`` and print it.

In [5]:
macro2 = pdr.DataReader(['GDP','GNP'], data_source = 'fred', start='2019-01-01', end='2019-12-31')
macro2

DATE,GDP,GNP
2019-01-01,21001.591,21254.334
2019-04-01,21289.268,21564.924
2019-07-01,21505.012,21780.753
2019-10-01,21694.458,21955.98


Stack ``macro1`` and ``macro2`` on top of each other, with ``macro1`` first. Call the resulting dataframe ``vert`` and print it.

In [6]:
vert = pd.concat([macro1,macro2], axis=0)
vert

DATE,GDP,GNP
2020-01-01,21481.367,21721.267
2020-04-01,19477.444,19649.442
2020-07-01,21138.574,21365.412
2020-10-01,21477.597,21728.223
2019-01-01,21001.591,21254.334
2019-04-01,21289.268,21564.924
2019-07-01,21505.012,21780.753
2019-10-01,21694.458,21955.98
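
Because ``macro1`` (2020) comes first in the list, ``vert`` is not in chronological order. If chronological order is wanted, ``.sort_index()`` is one option. The sketch below uses small hand-built stand-ins for ``macro1`` and ``macro2`` (two quarters each, GDP values copied from the outputs above) so that it runs without a network connection; ``vert_sorted`` is an illustrative name.

```python
import pandas as pd

# Hand-built stand-ins for the downloaded frames (not a real FRED download).
macro1 = pd.DataFrame(
    {'GDP': [21481.367, 19477.444]},
    index=pd.DatetimeIndex(['2020-01-01', '2020-04-01'], name='DATE'))
macro2 = pd.DataFrame(
    {'GDP': [21001.591, 21289.268]},
    index=pd.DatetimeIndex(['2019-01-01', '2019-04-01'], name='DATE'))

# Rows appear in the order the frames are listed: 2020 first, then 2019.
vert = pd.concat([macro1, macro2], axis=0)

# Sorting on the DATE index puts the rows in chronological order.
vert_sorted = vert.sort_index()
print(vert_sorted)
```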


Also from FRED, download data on real GDP (``name = 'GDPC1'``) for 2019 and 2020. Call the resulting dataframe ``rgdp`` and print it out.

In [7]:
rgdp = pdr.DataReader('GDPC1', data_source = 'fred', start='2019-01-01', end='2020-12-31')
rgdp

DATE,GDPC1
2019-01-01,18833.195
2019-04-01,18982.528
2019-07-01,19112.653
2019-10-01,19202.31
2020-01-01,18951.992
2020-04-01,17258.205
2020-07-01,18560.774
2020-10-01,18767.778


Concatenate ``vert`` and ``rgdp`` so their data appears side by side into a new dataframe called ``full``. Print ``full``.

In [8]:
full = pd.concat([vert,rgdp], axis = 1)
full

DATE,GDP,GNP,GDPC1
2019-01-01,21001.591,21254.334,18833.195
2019-04-01,21289.268,21564.924,18982.528
2019-07-01,21505.012,21780.753,19112.653
2019-10-01,21694.458,21955.98,19202.31
2020-01-01,21481.367,21721.267,18951.992
2020-04-01,19477.444,19649.442,17258.205
2020-07-01,21138.574,21365.412,18560.774
2020-10-01,21477.597,21728.223,18767.778


Verify that this gives you the same answer as performing an outer join of ``vert`` and ``rgdp`` using the ``.join()`` method.

In [9]:
with_join = vert.join(rgdp, how='outer')
with_join

DATE,GDP,GNP,GDPC1
2019-01-01,21001.591,21254.334,18833.195
2019-04-01,21289.268,21564.924,18982.528
2019-07-01,21505.012,21780.753,19112.653
2019-10-01,21694.458,21955.98,19202.31
2020-01-01,21481.367,21721.267,18951.992
2020-04-01,19477.444,19649.442,17258.205
2020-07-01,21138.574,21365.412,18560.774
2020-10-01,21477.597,21728.223,18767.778
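
Comparing the two printouts by eye works for eight rows, but the check can also be done programmatically. The sketch below uses ``DataFrame.equals`` on hand-built stand-ins for ``vert`` and ``rgdp`` (values copied from the outputs above, two quarters per year) so that it runs offline. Sorting both results by index first makes the comparison independent of row order. The name ``same`` is illustrative.

```python
import pandas as pd

# Hand-built stand-ins for vert and rgdp (not a real FRED download).
vert = pd.DataFrame(
    {'GDP': [21481.367, 19477.444, 21001.591, 21289.268],
     'GNP': [21721.267, 19649.442, 21254.334, 21564.924]},
    index=pd.DatetimeIndex(['2020-01-01', '2020-04-01',
                            '2019-01-01', '2019-04-01'], name='DATE'))
rgdp = pd.DataFrame(
    {'GDPC1': [18833.195, 18982.528, 18951.992, 17258.205]},
    index=pd.DatetimeIndex(['2019-01-01', '2019-04-01',
                            '2020-01-01', '2020-04-01'], name='DATE'))

full = pd.concat([vert, rgdp], axis=1)
with_join = vert.join(rgdp, how='outer')

# equals() checks shape, labels, dtypes and values in one call.
same = full.sort_index().equals(with_join.sort_index())
print(same)
```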


Reset index on ``vert`` and ``rgdp``. Call the resulting dataframes ``vert_reset`` and ``rgdp_reset`` and print them out.

In [10]:
vert_reset = vert.reset_index()
vert_reset

,DATE,GDP,GNP
0,2020-01-01,21481.367,21721.267
1,2020-04-01,19477.444,19649.442
2,2020-07-01,21138.574,21365.412
3,2020-10-01,21477.597,21728.223
4,2019-01-01,21001.591,21254.334
5,2019-04-01,21289.268,21564.924
6,2019-07-01,21505.012,21780.753
7,2019-10-01,21694.458,21955.98


In [11]:
rgdp_reset = rgdp.reset_index()
rgdp_reset

,DATE,GDPC1
0,2019-01-01,18833.195
1,2019-04-01,18982.528
2,2019-07-01,19112.653
3,2019-10-01,19202.31
4,2020-01-01,18951.992
5,2020-04-01,17258.205
6,2020-07-01,18560.774
7,2020-10-01,18767.778


Concatenate ``vert_reset`` and ``rgdp_reset`` so their data appears side by side in a new dataframe called ``full_reset``. Print ``full_reset``. Note that this time the dates of the two dataframes no longer align. That is because ``pd.concat()`` aligns rows on the index, and after the reset the index is the new 0-7 row number rather than ``DATE``.

In [12]:
full_reset = pd.concat([vert_reset, rgdp_reset], axis = 1)
full_reset

,DATE,GDP,GNP,DATE,GDPC1
0,2020-01-01,21481.367,21721.267,2019-01-01,18833.195
1,2020-04-01,19477.444,19649.442,2019-04-01,18982.528
2,2020-07-01,21138.574,21365.412,2019-07-01,19112.653
3,2020-10-01,21477.597,21728.223,2019-10-01,19202.31
4,2019-01-01,21001.591,21254.334,2020-01-01,18951.992
5,2019-04-01,21289.268,21564.924,2020-04-01,17258.205
6,2019-07-01,21505.012,21780.753,2020-07-01,18560.774
7,2019-10-01,21694.458,21955.98,2020-10-01,18767.778
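
The misalignment above can be reproduced, and undone, on a two-row example. This is a minimal sketch with hand-built frames (values copied from the outputs above). The names ``left``, ``right``, ``misaligned`` and ``aligned`` are illustrative, and moving ``DATE`` back into the index with ``.set_index()`` is one possible fix, not the only one.

```python
import pandas as pd

left = pd.DataFrame({'DATE': pd.to_datetime(['2020-01-01', '2019-01-01']),
                     'GDP': [21481.367, 21001.591]})
right = pd.DataFrame({'DATE': pd.to_datetime(['2019-01-01', '2020-01-01']),
                      'GDPC1': [18833.195, 18951.992]})

# Both frames have index 0, 1, so rows are paired by position, not by date:
# the 2020 GDP value ends up next to the 2019 real GDP value.
misaligned = pd.concat([left, right], axis=1)

# With DATE as the index, concat pairs rows that share the same date.
aligned = pd.concat([left.set_index('DATE'), right.set_index('DATE')], axis=1)

print(misaligned)
print(aligned)
```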


Use ``.merge()`` to perform an outer merge on ``vert_reset`` and ``rgdp_reset`` based on ``DATE``. Print the resulting merged dataframe.

In [13]:
with_merge = vert_reset.merge(rgdp_reset, how = 'outer', on = 'DATE')
with_merge

,DATE,GDP,GNP,GDPC1
0,2020-01-01,21481.367,21721.267,18951.992
1,2020-04-01,19477.444,19649.442,17258.205
2,2020-07-01,21138.574,21365.412,18560.774
3,2020-10-01,21477.597,21728.223,18767.778
4,2019-01-01,21001.591,21254.334,18833.195
5,2019-04-01,21289.268,21564.924,18982.528
6,2019-07-01,21505.012,21780.753,19112.653
7,2019-10-01,21694.458,21955.98,19202.31
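
Unlike the positional concatenation, the merge matches rows on the values in ``DATE``. The ``indicator`` and ``validate`` arguments of ``.merge()`` can help confirm that the match behaved as expected. The sketch below uses hand-built frames (values copied from the outputs above); the stand-in for ``rgdp_reset`` deliberately contains one quarter that is missing from the stand-in for ``vert_reset``, to show how an unmatched row is flagged.

```python
import pandas as pd

# Hand-built stand-ins; the right frame has an extra quarter (2020-04-01).
vert_reset = pd.DataFrame({'DATE': pd.to_datetime(['2020-01-01', '2019-01-01']),
                           'GDP': [21481.367, 21001.591]})
rgdp_reset = pd.DataFrame({'DATE': pd.to_datetime(['2019-01-01', '2020-01-01',
                                                   '2020-04-01']),
                           'GDPC1': [18833.195, 18951.992, 17258.205]})

# indicator=True adds a '_merge' column saying where each row came from;
# validate='one_to_one' raises an error if DATE is duplicated on either side.
with_merge = vert_reset.merge(rgdp_reset, how='outer', on='DATE',
                              indicator=True, validate='one_to_one')
print(with_merge)

n_both = int((with_merge['_merge'] == 'both').sum())
n_right_only = int((with_merge['_merge'] == 'right_only').sum())
print(n_both, n_right_only)
```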
