# Economic Model - Part 1: Unemployment rate

## 1- Introduction

In this model, the main goal is to determine which components of GDP have the greatest effect on the unemployment rate in Spain, from two angles:

**1. By GDP variable:** final consumption, gross capital formation, exports and imports.

**2. By sector:** which sectors have the strongest impact on Spain's unemployment rate.

To do so, the following steps will be completed:
1. Data wrangling.
2. Determining each variable's correlation with the unemployment rate as a function of the time shift.
3. Building the two regression models mentioned above (by GDP variable and by sector).

## 2- Data wrangling

In this section, we will prepare all the data for our analysis.

In [124]:
#We first import the necessary libraries
import requests
from datetime import datetime
import pandas as pd
import re

In [26]:
#We then define a helper that extracts a table from the INE API:
def INE_extractor(code,num_data):
    #code: INE table identifier; num_data: number of most recent periods to request
    url_template = 'http://servicios.ine.es/wstempus/js/EN/DATOS_TABLA/{codigo}?nult={num_datos}'
    url = url_template.format(codigo=code, num_datos=num_data)
    response = requests.get(url)
    response.raise_for_status() #Stop early on an HTTP error instead of failing while parsing
    data = response.json()
    df = pd.json_normalize(data)
    #Keep only the national series and drop the unit and scale keys, which are not used
    df = df[df['Nombre'].str.contains('National Total')]
    df = df.drop(['FK_Unidad','FK_Escala'],axis = 1)
    return df

In [149]:
#We first extract the unemployment rate data (our dependent variable):
unemployment = INE_extractor('4247',4*20)
unemployment = unemployment[unemployment['Nombre'].str.contains('Both sexes')]
unemployment = unemployment[unemployment['Nombre'].str.contains('All ages')]
unemployment['Nombre'] = 'Unemployment rate'
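#Each row stores its observations as a nested list in 'Data'; we flatten them into one row per observation:
unemployment = pd.concat([pd.DataFrame(x) for x in unemployment['Data']],
                         keys=unemployment['Nombre']).reset_index(level=1, drop=True).reset_index()
#'Fecha' comes as a timestamp in milliseconds, so we convert it to a datetime:
unemployment['Fecha'] = [datetime.fromtimestamp(x/1000) for x in unemployment['Fecha']]
unemployment.drop(['Secreto','FK_TipoDato','FK_Periodo'],axis=1,inplace=True)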
unemployment

| | Nombre | Fecha | Anyo | Valor |
|---|---|---|---|---|
| 0 | Unemployment rate | 2020-04-01 | 2020 | 15.33 |
| 1 | Unemployment rate | 2020-01-01 | 2020 | 14.41 |
| 2 | Unemployment rate | 2019-10-01 | 2019 | 13.78 |
| 3 | Unemployment rate | 2019-07-01 | 2019 | 13.92 |
| 4 | Unemployment rate | 2019-04-01 | 2019 | 14.02 |
| ... | ... | ... | ... | ... |
| 69 | Unemployment rate | 2003-01-01 | 2003 | 11.99 |
| 70 | Unemployment rate | 2002-10-01 | 2002 | 11.61 |
| 71 | Unemployment rate | 2002-07-01 | 2002 | 11.49 |
| 72 | Unemployment rate | 2002-04-01 | 2002 | 11.15 |
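
The reshaping used above, where each row's `Data` list becomes its own small DataFrame and the pieces are concatenated under the series name, may be easier to follow on a toy input. The sketch below uses a hypothetical two-series frame: the series names, timestamps and values are made up for illustration, and only the column names `Nombre`, `Data`, `Fecha` and `Valor` come from the code above.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by INE_extractor:
# one row per series, with the observations nested in the 'Data' column.
raw = pd.DataFrame({
    'Nombre': ['Series A', 'Series B'],
    'Data': [
        [{'Fecha': 1000, 'Valor': 1.0}, {'Fecha': 2000, 'Valor': 2.0}],
        [{'Fecha': 1000, 'Valor': 10.0}, {'Fecha': 2000, 'Valor': 20.0}],
    ],
})

# Build one DataFrame per series and stack them; 'keys' labels each block
# with its series name as the outer index level.
flat = pd.concat([pd.DataFrame(x) for x in raw['Data']], keys=raw['Nombre'])

# Drop the inner positional level, then move the series name into a column.
flat = flat.reset_index(level=1, drop=True).reset_index()

print(flat)
```

The result has one row per observation, with the series name repeated in the `Nombre` column, which is the long layout the later merge relies on.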


In [147]:
#We then extract the main demand-side GDP variables (our independent variables for the first regression model):
GDP_demand = INE_extractor('28604',4*20)
#Keep only the seasonally and calendar adjusted base data:
GDP_demand = GDP_demand[GDP_demand['Nombre'].str.contains(' adjusted')]
GDP_demand = GDP_demand[GDP_demand['Nombre'].str.contains('Base data')]
#Strip the common prefix and suffix so that only the variable name remains (literal match, not regex):
GDP_demand['Nombre'] = GDP_demand['Nombre'].str.replace('National Total. Base 2010. Seasonally and calendar adjusted data. ','',regex=False)
GDP_demand['Nombre'] = GDP_demand['Nombre'].str.replace('. Base data. Current prices. ','',regex=False)
values = ['Final consumption expenditure','Gross capital formation',
          'Exports of goods','Exports of services','Imports of goods','Imports of services']
GDP_demand = GDP_demand[GDP_demand['Nombre'].isin(values)]
#Flatten the nested 'Data' lists into one row per observation, most recent first:
GDP_demand = pd.concat([pd.DataFrame(x) for x in GDP_demand['Data']],
                         keys=GDP_demand['Nombre']).reset_index(level=1, drop=True).sort_values('Fecha',ascending = False).reset_index()
GDP_demand.drop(['Secreto','FK_TipoDato','FK_Periodo'],axis=1,inplace=True)
GDP_demand['Fecha'] = [datetime.fromtimestamp(x/1000) for x in GDP_demand['Fecha']]
GDP_demand.dtypes

Nombre            object
Fecha     datetime64[ns]
Anyo               int64
Valor            float64
dtype: object

In [146]:
#And we finally extract the GDP by sector (our independent variables for the second regression model):
GDP_offer = INE_extractor('28602',4*20)
#Keep only the seasonally and calendar adjusted base data:
GDP_offer = GDP_offer[GDP_offer['Nombre'].str.contains(' adjusted')]
GDP_offer = GDP_offer[GDP_offer['Nombre'].str.contains('Base data')]
#Strip the common prefix and suffix (literal match, not regex):
GDP_offer['Nombre'] = GDP_offer['Nombre'].str.replace('National Total. Base 2010. Seasonally and calendar adjusted data. ','',regex=False)
GDP_offer['Nombre'] = GDP_offer['Nombre'].str.replace('. Base data. Current prices. ','',regex=False)
#Select the series of interest by their position in the list of unique names:
values = list(GDP_offer['Nombre'].unique())
values_refined = [values[1]] + values[3:5] + values[6:13]
GDP_offer = GDP_offer[GDP_offer['Nombre'].isin(values_refined)]
#Flatten the nested 'Data' lists into one row per observation, most recent first:
GDP_offer = pd.concat([pd.DataFrame(x) for x in GDP_offer['Data']],
                         keys=GDP_offer['Nombre']).reset_index(level=1, drop=True).sort_values(['Fecha','Nombre'],ascending = False).reset_index()
GDP_offer.drop(['Secreto','FK_TipoDato','FK_Periodo'],axis=1,inplace=True)
GDP_offer['Fecha'] = [datetime.fromtimestamp(x/1000) for x in GDP_offer['Fecha']]
#Remove the parenthesised codes and the GVA prefix from the names:
GDP_offer['Nombre'] = GDP_offer['Nombre'].apply(lambda x: re.sub(r'\(.+\)','',x))
GDP_offer['Nombre'] = GDP_offer['Nombre'].str.replace('GVApb ','',regex=False)
GDP_offer['Nombre'] = GDP_offer['Nombre'].str.replace('GVAbp ','',regex=False)
#Assign each series to a broad sector type:
GDP_offer['Type'] = ['Construction' if 'Construction' in x
                     else 'Services' if 'Services activities. ' in x
                     else 'Industry' if 'Industry' in x
                     else 'Others' for x in GDP_offer['Nombre']]
#Keep only the part of the name after the last '. ':
GDP_offer['Nombre'] = GDP_offer['Nombre'].apply(lambda x: re.sub(r'.+\. ','',x))
GDP_offer.dtypes

Nombre            object
Fecha     datetime64[ns]
Anyo               int64
Valor            float64
Type              object
dtype: object
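
The two regular expressions used in the cell above are greedy, which determines how much of each name is removed. The sketch below applies the same three cleaning steps to hypothetical sector names; the names are made up here to mimic the shape the code expects and are not copied from the INE table, and the helper `clean` is only an illustration.

```python
import re

# Hypothetical names shaped like the ones the cleaning code expects.
names = [
    'GVAbp Industry. Manufacturing industry (C, CNAE 2009)',
    'GVAbp Construction (F, CNAE 2009)',
]

def clean(name):
    # 1. Remove the parenthesised code (greedy: from the first '(' to the last ')').
    name = re.sub(r'\(.+\)', '', name)
    # 2. Remove the 'GVAbp ' prefix.
    name = name.replace('GVAbp ', '')
    # 3. Keep only what follows the last '. ' (the greedy '.+' runs up to the final match).
    return re.sub(r'.+\. ', '', name)

cleaned = [clean(n) for n in names]
print(cleaned)
```

On names of this shape, the space that preceded the parenthesis is left behind as a trailing space, so a final `.strip()` (or `.str.strip()` on the column) may be worth adding if the names are later used as keys.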

## 3- Unemployment rate vs. GDP variables

In this section, we will try to estimate the unemployment rate based on final consumption, gross capital formation, imports and exports.

## 3.1 - Time shift determination

In this sub-section, we will find, for each variable, the time shift at which it has the highest correlation with the current unemployment rate.
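
One way to read "time shift" is as a lag in quarters: the variable observed k quarters earlier is compared with the current unemployment rate. The sketch below illustrates that reading on synthetic data. The helper `lagged_correlations`, the maximum lag of 8 quarters and the use of the Pearson correlation are assumptions made for illustration, not necessarily the method applied in this notebook.

```python
import numpy as np
import pandas as pd

def lagged_correlations(x, y, max_lag=8):
    """Correlation between y[t] and x[t - k] for k = 0..max_lag.

    Both series must share an index sorted in ascending date order.
    """
    return pd.Series({k: y.corr(x.shift(k)) for k in range(max_lag + 1)})

# Synthetic example: y follows x with a 3-quarter delay plus a little noise.
rng = np.random.default_rng(0)
dates = pd.date_range('2002-01-01', periods=72, freq='QS')
x = pd.Series(rng.normal(size=72), index=dates)
y = x.shift(3) + 0.1 * pd.Series(rng.normal(size=72), index=dates)

corrs = lagged_correlations(x, y)
best_lag = corrs.abs().idxmax()  # lag with the strongest absolute correlation
print(corrs.round(2))
print('Best lag:', best_lag)
```

Note that the frames built in the data wrangling section are sorted from most recent to oldest, so they would need to be sorted in ascending date order before `shift` is interpreted as a lag.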

In [181]:
#We first join the unemployment rate (dependent variable) to the independent variables:
data_1 = GDP_demand.merge(unemployment,on = 'Fecha',how = 'inner')
data_1.rename(columns = {'Nombre_x':'Variable','Fecha':'Date','Valor_x':'X','Valor_y':'Unemployment rate'},inplace = True)
data_1.drop(['Nombre_y','Anyo_y','Anyo_x'],axis = 1, inplace=True)
data_1 = data_1.sort_values(['Date','Variable'],ascending = False).reset_index(drop=True)
#Our analysis will be based on the variations, so we will create the variation columns for both the dependent and independent variables:
data_1

| | Variable | Date | X | Unemployment rate |
|---|---|---|---|---|
| 0 | Imports of services | 2019-04-01 | 19233.0 | 14.02 |
| 1 | Imports of goods | 2019-04-01 | 80655.0 | 14.02 |
| 2 | Gross capital formation | 2019-04-01 | 68637.0 | 14.02 |
| 3 | Final consumption expenditure | 2019-04-01 | 235998.0 | 14.02 |
| 4 | Exports of services | 2019-04-01 | 32664.0 | 14.02 |
| ... | ... | ... | ... | ... |
| 415 | Imports of goods | 2002-01-01 | 42495.0 | 11.55 |
| 416 | Gross capital formation | 2002-01-01 | 48479.0 | 11.55 |
| 417 | Final consumption expenditure | 2002-01-01 | 137251.0 | 11.55 |
| 418 | Exports of services | 2002-01-01 | 15616.0 | 11.55 |
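
The comment in the cell above mentions working with variations. A minimal sketch of that step, assuming "variation" means the quarter-on-quarter percentage change, could look like the following. The numbers are made up, the frame only mimics the long layout of `data_1`, and the column names `X_var` and `Unemployment_var` are hypothetical.

```python
import pandas as pd

# Toy frame in the same long layout as data_1 (values are made up).
toy = pd.DataFrame({
    'Variable': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Date': pd.to_datetime(['2019-04-01', '2019-04-01',
                            '2019-01-01', '2019-01-01',
                            '2018-10-01', '2018-10-01']),
    'X': [110.0, 50.0, 100.0, 40.0, 80.0, 40.0],
    'Unemployment rate': [14.0, 14.0, 15.0, 15.0, 16.0, 16.0],
})

# Sort oldest-first within each variable so that pct_change compares
# each quarter with the previous one.
toy = toy.sort_values(['Variable', 'Date']).reset_index(drop=True)

# Percentage change per variable; the first quarter of each group is NaN.
toy['X_var'] = toy.groupby('Variable')['X'].pct_change()
toy['Unemployment_var'] = toy.groupby('Variable')['Unemployment rate'].pct_change()

print(toy)
```

Grouping by `Variable` keeps the change from being computed across the boundary between two different series, which would happen if `pct_change` were applied to the whole column at once.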
