Data source: https://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/abmi/ukea

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

First, load the data and check what it looks like. The output shows that the file has several rows of metadata at the top.

In [2]:
gdp = pd.read_csv('./data/uk_gdp.csv')
gdp.head(10)

Unnamed: 0,Title,Gross Domestic Product: chained volume measures: Seasonally adjusted £m
0,CDID,ABMI
1,Source dataset ID,UKEA
2,PreUnit,£
3,Unit,m
4,Release date,31-03-2021
5,Next release,30 June 2021
6,Important notes,
7,1948,365418
8,1949,377558
9,1950,390150
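
As an aside, the metadata rows can also be skipped at read time. The following is only a sketch on a hypothetical miniature of the file (the `sample` text and the name `gdp_sample` are mine, not part of the download), assuming the layout is one title row followed by seven metadata rows, as the output above suggests.

```python
import io
import pandas as pd

# Hypothetical miniature of the ONS download: a title row, seven
# metadata rows, then the data rows (values copied from the output above).
sample = io.StringIO(
    'Title,Gross Domestic Product: chained volume measures: Seasonally adjusted £m\n'
    'CDID,ABMI\n'
    'Source dataset ID,UKEA\n'
    'PreUnit,£\n'
    'Unit,m\n'
    'Release date,31-03-2021\n'
    'Next release,30 June 2021\n'
    'Important notes,\n'
    '1948,365418\n'
    '1949,377558\n'
    '1950,390150\n'
)

# skiprows=8 drops the title row plus the seven metadata rows;
# names= supplies short column labels in their place.
gdp_sample = pd.read_csv(sample, skiprows=8, names=['Year', 'GDP'])
print(gdp_sample)
```

On the full file the quarterly rows would still be present after this step, so the row selection further down is needed either way.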


Looking at the end of the file shows that it contains quarterly GDP values as well as yearly ones.

In [3]:
gdp.tail()

Unnamed: 0,Title,Gross Domestic Product: chained volume measures: Seasonally adjusted £m
339,2019 Q4,544733
340,2020 Q1,529223
341,2020 Q2,426197
342,2020 Q3,498429
343,2020 Q4,504742


I'm only interested in the yearly values, so let's use the iloc method to isolate the rows of interest. With a little trial and error we can determine that we need rows 7 to 79, which is the slice 7:80 because the end of an iloc slice is exclusive.

In [4]:
gdp = gdp.iloc[7:80]
gdp.tail()

Unnamed: 0,Title,Gross Domestic Product: chained volume measures: Seasonally adjusted £m
75,2016,2079113
76,2017,2115296
77,2018,2141792
78,2019,2172511
79,2020,1958591
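
An alternative to finding the row positions by trial and error is to keep only the rows whose label looks like a bare year. This is a sketch on a hypothetical stand-in frame (`raw` and the column name `Value` are made up for illustration), assuming annual rows are labelled with exactly four digits and quarterly rows carry a suffix such as "Q4".

```python
import pandas as pd

# Hypothetical stand-in for the loaded frame: metadata, annual and
# quarterly rows, with values copied from the outputs above.
raw = pd.DataFrame({
    'Title': ['CDID', 'Unit', '1948', '1949', '2020', '2020 Q3', '2020 Q4'],
    'Value': ['ABMI', 'm', '365418', '377558', '1958591', '498429', '504742'],
})

# Keep rows whose label is exactly four digits, i.e. a bare year.
# Series.str.fullmatch requires pandas 1.1 or later.
is_year = raw['Title'].str.fullmatch(r'\d{4}')
annual = raw[is_year]
print(annual)
```

Unlike fixed positions, this selection should keep working if a later release of the file adds another year of data.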


Let's just check that we aren't missing any of the years from the top of the file:

In [5]:
gdp.head()

Unnamed: 0,Title,Gross Domestic Product: chained volume measures: Seasonally adjusted £m
7,1948,365418
8,1949,377558
9,1950,390150
10,1951,404591
11,1952,410688
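
Looking at head() only confirms the first five rows. A fuller check, sketched here on a hypothetical sample of the year labels (the `years` series is a stand-in, not the real column), is to compare the years against an unbroken range from the first to the last.

```python
import pandas as pd

# Hypothetical stand-in for the year labels of the sliced frame,
# which are strings as loaded from the CSV.
years = pd.Series(['1948', '1949', '1950', '1951', '1952']).astype(int)

# Every year from the first to the last should appear exactly once, in order.
expected = list(range(years.min(), years.max() + 1))
no_gaps = years.tolist() == expected
print(no_gaps)
```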


The column names are either wrong or too long to be readable, so let's rename them to something more usable and check the dataframe afterwards.

In [6]:
gdp.columns = ['Year', 'GDP']
gdp.head()

Unnamed: 0,Year,GDP
7,1948,365418
8,1949,377558
9,1950,390150
10,1951,404591
11,1952,410688
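
Because both columns originally mixed metadata text with numbers, the values are probably still stored as strings at this point (I have not checked the dtypes above, so treat that as an assumption). A minimal sketch of converting them, on a hypothetical three-row sample named `gdp_sample`:

```python
import pandas as pd

# Hypothetical sample with string values, as they would appear if the
# columns were read alongside the metadata rows.
gdp_sample = pd.DataFrame({
    'Year': ['1948', '1949', '1950'],
    'GDP': ['365418', '377558', '390150'],
})

# Convert both columns to integers so that arithmetic and plotting work.
gdp_sample = gdp_sample.astype({'Year': int, 'GDP': int})
print(gdp_sample.dtypes)
```

This conversion does not matter for the CSV written in the next step, since the file stores text either way, but it is needed before doing any calculations in the same session.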


Finally, let's write the data to a CSV file.

In [7]:
gdp.to_csv('./data/ukgdp_1948-2020.csv', index=False)
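
A quick way to confirm the export is to read the file back in. This sketch uses a temporary directory and a hypothetical sample frame rather than the real output path, so it can be run without touching ./data.

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample standing in for the cleaned frame.
gdp_sample = pd.DataFrame({
    'Year': [1948, 1949, 1950],
    'GDP': [365418, 377558, 390150],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'gdp_sample.csv')
    # index=False keeps the row labels out of the file, as in the cell above.
    gdp_sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded)
```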