# 3. 1 The 2008 World Crisis - CDA

Hypothesis: The 2008 Financial Crisis negatively influenced exports and imports of the most developed countries in the world. We will now analyze how the 2008 crisis affected exports in China, the United States and Germany, as they are the best example to represent the three continents with the largest trade flows (Asia, America and Europe).

We are going to divide our data in 2 groups for every country:
- Group A (Pre-Crisis): 2000-2008
- Group B (Post-Crisis): 2009-2020

For a better understanding on the crisis magnitude, we will also use another intermediate group (2008-2009).

Null Hypothesis (H0): The 2008 crisis has not significantly affected export growth.

Alternative Hypothesis (H1): The 2008 crisis has significantly affected export growth.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import shapiro, ttest_ind, skew
import plotly.io as pio
import kaggle

In [2]:
KAGGLE_USERNAME = "marpenalva"
KAGGLE_KEY = "ea42f53179cfa2f6eac83929293413f4"

kaggle.api.authenticate()

def download_file_from_kaggle(dataset, path):
    kaggle.api.dataset_download_files(dataset, path=path, unzip=True)

dataset = "appetukhov/international-trade-database"  # Dataset correcto
download_path = "data/"

if not os.path.exists(download_path):
    os.makedirs(download_path)

download_file_from_kaggle(dataset, download_path)


file_path = os.path.join(download_path, 'trade_1988_2021.csv')
df = pd.read_csv(file_path)

print(df.head())

In [3]:
df['TradeValue'] = df['TradeValue in 1000 USD'] * 1000
df = df.drop(columns=['TradeValue in 1000 USD'])

In [4]:
df = df.drop(columns=['ReporterISO3', 'PartnerISO3'])
df = df.rename(columns={'ReporterName': 'Reporter', 'PartnerName': 'Partner', 'TradeFlowName': 'TradeFlow'})

In [5]:
palabra = 'World'
df = df[~df['Partner'].str.contains(palabra)]

## Data visualization

In [6]:
def calculate_total_growth(data, start_year, end_year):
    start_value = data[data['Year'] == start_year]['TradeValue'].sum()
    end_value = data[data['Year'] == end_year]['TradeValue'].sum()
    
    total_growth = ((end_value - start_value) / start_value) * 100
    
    return total_growth


In [7]:
def calculate_annual_growth(data, start_year, end_year):
    start_value = data[data['Year'] == start_year]['TradeValue'].sum()
    end_value = data[data['Year'] == end_year]['TradeValue'].sum()
    
    num_years = end_year - start_year
    
    total_growth = ((end_value - start_value) / start_value) * 100
    
    annual_growth = (1 + total_growth / 100) ** (1 / num_years) - 1
    annual_growth = annual_growth * 100
    
    return annual_growth

In [8]:
periods = [(2000, 2008), (2008, 2009), (2009, 2020)]
countries = ['Germany', 'United States', 'China']

growth_data = []

for country in countries:
    for start_year, end_year in periods:
        country_data = df[df['Reporter'] == country]
        
        annual_growth = calculate_annual_growth(country_data, start_year, end_year)
        total_growth = calculate_total_growth(country_data, start_year, end_year)
        
        growth_data.append({
            'Country': country, 
            'Period': f'{start_year}-{end_year}', 
            'Average Annual Growth (%)': annual_growth, 
            'Total Growth (%)': total_growth
        })

growth_df = pd.DataFrame(growth_data)
print(growth_df)

         Country     Period  Average Annual Growth (%)  Total Growth (%)
0        Germany  2000-2008                  12.964812        165.182874
1        Germany  2008-2009                 -22.753162        -22.753162
2        Germany  2009-2020                   1.906883         23.094514
3  United States  2000-2008                   6.370362         63.894769
4  United States  2008-2009                 -19.928271        -19.928271
5  United States  2009-2020                   2.334151         28.892162
6          China  2000-2008                  24.415431        474.108515
7          China  2008-2009                 -16.009465        -16.009465
8          China  2009-2020                   7.233239        115.587538


In [9]:
fig_annual = px.bar(growth_df, x='Period', y='Average Annual Growth (%)', color='Country', 
                    title='Average Annual Growth (%) by Country', barmode='group')

fig_annual.show()
pio.write_html(fig_annual, file=f'graphs/cda/annual_growth.html', auto_open=False)

In [10]:
fig_total = px.bar(growth_df, x='Period', y='Total Growth (%)', color='Country', 
                   title='Total Growth (%) by Country', barmode='group')

fig_total.show()
pio.write_html(fig_total, file=f'graphs/cda/growth.html', auto_open=False)

## Germany

In [11]:
pre_crisis = (2000, 2008)
post_crisis = (2009, 2020)

germany_data = df[df['Reporter'] == 'Germany']

def calculate_annual_growth(data, start_year, end_year):
    annual_growths = []
    for year in range(start_year, end_year):
        start_value = data[data['Year'] == year]['TradeValue'].sum()
        end_value = data[data['Year'] == (year + 1)]['TradeValue'].sum()
        
        annual_growth = ((end_value - start_value) / start_value) * 100
        annual_growths.append(annual_growth)
    
    return annual_growths

In [12]:
skewness_pre_crisis = skew(pre_crisis_growths)
print('Skewness (Pre-Crisis):', skewness_pre_crisis)

skewness_post_crisis = skew(post_crisis_growths)
print('Skewness (Post-Crisis):', skewness_post_crisis)

NameError: name 'pre_crisis_growths' is not defined

In [None]:
pre_crisis_growths = calculate_annual_growth(germany_data, pre_crisis[0], pre_crisis[1])

post_crisis_growths = calculate_annual_growth(germany_data, post_crisis[0], post_crisis[1])

t_stat, p_value = ttest_ind(pre_crisis_growths, post_crisis_growths)

print(f"Pre-Crisis ({pre_crisis[0]}-{pre_crisis[1]}): Average Annual Growth: {pd.Series(pre_crisis_growths).mean():.2f}%")
print(f"Post-Crisis ({post_crisis[0]}-{post_crisis[1]}): Average Annual Growth: {pd.Series(post_crisis_growths).mean():.2f}%")
print(f"T-Statistic: {t_stat:.4f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("We reject the null hypothesis: There is a significant difference in Germany's annual export growth before and after the 2008 crisis")
else:
    print("We fail to reject the null hypothesis: There is no significant difference in Germany's annual export growth before and after the 2008 crisis.")


Pre-Crisis (2000-2008): Average Annual Growth: 13.15%
Post-Crisis (2009-2020): Average Annual Growth: 2.25%
T-Statistic: 2.9139, P-Value: 0.0097
We reject the null hypothesis: There is a significant difference in Germany's annual export growth before and after the 2008 crisis


## China

In [None]:
pre_crisis = (2000, 2008)
post_crisis = (2009, 2020)

china_data = df[df['Reporter'] == 'China']

def calculate_annual_growth(data, start_year, end_year):
    annual_growths = []
    for year in range(start_year, end_year):
        start_value = data[data['Year'] == year]['TradeValue'].sum()
        end_value = data[data['Year'] == (year + 1)]['TradeValue'].sum()
        
        annual_growth = ((end_value - start_value) / start_value) * 100
        annual_growths.append(annual_growth)
    
    return annual_growths

In [None]:
skewness_pre_crisis = skew(pre_crisis_growths)
print('Skewness (Pre-Crisis):', skewness_pre_crisis)

skewness_post_crisis = skew(post_crisis_growths)
print('Skewness (Post-Crisis):', skewness_post_crisis)

Skewness (Pre-Crisis): 0.09344677013164297
Skewness (Post-Crisis): 0.11798613519172431


In [None]:
pre_crisis_growths = calculate_annual_growth(china_data, pre_crisis[0], pre_crisis[1])

post_crisis_growths = calculate_annual_growth(china_data, post_crisis[0], post_crisis[1])

t_stat, p_value = ttest_ind(pre_crisis_growths, post_crisis_growths)

print(f"Pre-Crisis ({pre_crisis[0]}-{pre_crisis[1]}): Average Annual Growth: {pd.Series(pre_crisis_growths).mean():.2f}%")
print(f"Post-Crisis ({post_crisis[0]}-{post_crisis[1]}): Average Annual Growth: {pd.Series(post_crisis_growths).mean():.2f}%")
print(f"T-Statistic: {t_stat:.4f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("We reject the null hypothesis: There is a significant difference in China's annual export growth before and after the 2008 crisis")
else:
    print("We fail to reject the null hypothesis: There is no significant difference in China's annual export growth before and after the 2008 crisis.")


Pre-Crisis (2000-2008): Average Annual Growth: 24.74%
Post-Crisis (2009-2020): Average Annual Growth: 7.70%
T-Statistic: 3.5986, P-Value: 0.0022
We reject the null hypothesis: There is a significant difference in China's annual export growth before and after the 2008 crisis


## United States

In [None]:
pre_crisis = (2000, 2008)
post_crisis = (2009, 2020)

us_data = df[df['Reporter'] == 'United States']

def calculate_annual_growth(data, start_year, end_year):
    annual_growths = []
    for year in range(start_year, end_year):
        start_value = data[data['Year'] == year]['TradeValue'].sum()
        end_value = data[data['Year'] == (year + 1)]['TradeValue'].sum()
        
        annual_growth = ((end_value - start_value) / start_value) * 100
        annual_growths.append(annual_growth)
    
    return annual_growths

In [None]:
skewness_pre_crisis = skew(pre_crisis_growths)
print('Skewness (Pre-Crisis):', skewness_pre_crisis)

skewness_post_crisis = skew(post_crisis_growths)
print('Skewness (Post-Crisis):', skewness_post_crisis)

Skewness (Pre-Crisis): -0.7385655776323143
Skewness (Post-Crisis): 0.8338918373492875


In [None]:
pre_crisis_growths = calculate_annual_growth(us_data, pre_crisis[0], pre_crisis[1])

post_crisis_growths = calculate_annual_growth(us_data, post_crisis[0], post_crisis[1])

t_stat, p_value = ttest_ind(pre_crisis_growths, post_crisis_growths)

print(f"Pre-Crisis ({pre_crisis[0]}-{pre_crisis[1]}): Average Annual Growth: {pd.Series(pre_crisis_growths).mean():.2f}%")
print(f"Post-Crisis ({post_crisis[0]}-{post_crisis[1]}): Average Annual Growth: {pd.Series(post_crisis_growths).mean():.2f}%")
print(f"T-Statistic: {t_stat:.4f}, P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("We reject the null hypothesis: There is a significant difference in US's annual export growth before and after the 2008 crisis")
else:
    print("We fail to reject the null hypothesis: There is no significant difference in US's annual export growth before and after the 2008 crisis.")


Pre-Crisis (2000-2008): Average Annual Growth: 6.69%
Post-Crisis (2009-2020): Average Annual Growth: 2.76%
T-Statistic: 0.9065, P-Value: 0.3774
We fail to reject the null hypothesis: There is no significant difference in US's annual export growth before and after the 2008 crisis.


## Conclusion

First, we observed with skew if the data is symmetric. The data is only symmetric in China(pre- and post-crisis) and in Pre-Crisis in Germany. Meanwhile, the United States and Post-Crisis Germany both have a moderately asymmetric distribution.

For Germany and China, we can observe a significant difference in the annual export growth before and after the 2008 crisis, so we reject the null hypothesis. For the United States, we fail to reject the null hypothesis, so there's no significant difference between the annual exports before and after the crisis.