# Factors leading to Economic Crises in Africa

1. Introduction

The global financial crisis is affecting African economies in several ways, the most significant being the decline in export prices and volumes. Largely as a result of falling prices and weaker demand for their commodities, many countries have experienced sharp drops in primary commodity exports.
In this project, the factors behind economic crises are examined and banking crises are predicted using different machine learning models.

# Loading important libraries

In [None]:
# as usual, let us load all the necessary libraries
import numpy as np  # numerical computation with arrays
import pandas as pd # library to manipulate datasets using dataframes
import scipy as sp  # statistical library

# below sklearn libraries for different models
from sklearn.tree import DecisionTreeClassifier as DecisionTree
from sklearn.ensemble import RandomForestClassifier as RandomForest
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.impute import KNNImputer
from sklearn.metrics import mean_squared_error

import seaborn as sns

# plot 
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# List the input files available in the Kaggle environment
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
#loading dataset
df_data = pd.read_csv('../input/africa-economic-banking-and-systemic-crisis-data/african_crises.csv')

df_data.head()

In [None]:
# print a summary of the values for each covariate in the dataset
df_data.describe()

In [None]:
#describing the dataset statistics
stat = df_data.describe().T
stat[['mean', 'max', 'min']]

In [None]:
# get a summary of how many rows are in the dataset and how many values are present in each column
# the 'Non-Null Count' gives the number of values in each column that are not missing
df_data.info()

In [None]:
#Checking missing values in the df Data
df_data.isna().sum()

# **Pre-process the data**


In [None]:
# convert the banking_crisis labels into binary values: 1 = crisis, 0 = no crisis
df_data['banking_crisis'] = df_data['banking_crisis'].replace({'crisis': 1, 'no_crisis':0})
df_data.head(3)
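As a hedged alternative to `replace`, the same encoding can be sketched with `Series.map`, which turns any label missing from the dictionary into `NaN` so that unexpected values are easy to detect. The toy series below is hypothetical and only mimics the two labels used in the notebook.

```python
import pandas as pd

# Hypothetical toy labels mimicking the 'banking_crisis' column
labels = pd.Series(['crisis', 'no_crisis', 'no_crisis', 'crisis'])

# map() returns NaN for any label that is not a key of the dictionary
encoded = labels.map({'crisis': 1, 'no_crisis': 0})

# Fail early if an unexpected label slipped through, then store as integers
if encoded.isna().any():
    raise ValueError('unexpected label in banking_crisis column')
encoded = encoded.astype(int)

print(encoded.tolist())
```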

In [None]:
# Drop the 'case' column, which is not used as a predictor
df_data.drop(columns=['case'], inplace=True)

# Drop 'cc3', which is redundant because it encodes the same information as 'country'
df_data.drop(columns=['cc3'], inplace=True)


In [None]:
df_data.head(3)

In [None]:
# let's see the columns that remain after dropping some
df_data.columns

In [None]:
#lets see the unique countries
unique_countries=df_data['country'].unique()
unique_countries

# Visualizations.

In [None]:
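# heatmap of pairwise correlations between the numeric columns
# (numeric_only=True skips the string column 'country', which recent pandas versions refuse to correlate)
plt.figure(figsize=(10,10))
sns.heatmap(df_data.corr(numeric_only=True), annot=True)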

Visualizing inflation crises by country<br>
Inflation is measured by a central government authority, which is in charge of adopting measures to ensure the smooth running of the economy.
In this dataset, Angola shows the highest count of inflation crises, followed by Zambia and Zimbabwe.

In [None]:
fig, ax = plt.subplots(figsize=(10,10))
# recent seaborn versions require x and hue to be passed as keyword arguments
sns.countplot(data=df_data, x='country', hue='inflation_crises', ax=ax)
plt.xlabel('Countries')
plt.ylabel('Counts')
plt.xticks(rotation=50)
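The ranking read off the count plot can also be checked numerically. The sketch below assumes a hypothetical toy frame with the same column names as the notebook's data (`country`, `inflation_crises`); it is not the real dataset, and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names
toy = pd.DataFrame({
    'country': ['Angola', 'Angola', 'Angola', 'Zambia', 'Zambia', 'Kenya'],
    'inflation_crises': [1, 1, 0, 1, 0, 0],
})

# Summing the 0/1 flag gives the number of crisis years per country
crisis_counts = (
    toy.groupby('country')['inflation_crises']
       .sum()
       .sort_values(ascending=False)
)
print(crisis_counts)
```

On the real data, the same expression applied to `df_data` would give the per-country counts that the plot displays.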

Let's visualize how the USD exchange rate of each country evolved before and after its independence.

In [None]:
import random
sns.set(style='whitegrid')
fig = plt.figure(figsize=(20,35))

for plot_number, country in enumerate(unique_countries, start=1):
    plt.subplot(7, 2, plot_number)
    # random hex colour for this country's panel
    color = "#" + ''.join(random.choice('0123456789ABCDEF') for _ in range(6))

    country_data = df_data[df_data.country == country]
    # first year in which the country is recorded as independent
    independence_year = country_data[country_data.independence == 1]['year'].min()
    max_rate = country_data['exch_usd'].max()

    plt.scatter(country_data['year'], country_data['exch_usd'], color=color, s=20)
    # recent seaborn versions require x and y to be passed as keyword arguments
    sns.lineplot(x=country_data['year'], y=country_data['exch_usd'],
                 label=country, color=color)

    # dotted vertical line and marker at the independence year
    plt.plot([independence_year, independence_year], [0, max_rate],
             color='black', linestyle='dotted', alpha=0.8)
    plt.text(independence_year, max_rate / 2, 'Independence', rotation=-90)
    plt.scatter(x=independence_year, y=0, s=50)

    plt.title(country)

# figure-level title (a plt.title call before the loop is lost once the subplots are created)
fig.suptitle('USD exchange rates per country before and after independence')
plt.tight_layout()
plt.show()


From the visualizations, the USD exchange rate of almost all the countries rises after independence. This may be because a newly independent country needs to import many resources from developed countries, which pushes up the USD exchange rate and can contribute to an economic crisis.
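The visual impression can be backed by a simple number: the mean exchange rate before and after independence for each country. The sketch below uses a hypothetical toy frame with the notebook's column names (`country`, `independence`, `exch_usd`); the values are invented, and a difference in means does not by itself establish that independence caused the change.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names
toy = pd.DataFrame({
    'country': ['A'] * 4 + ['B'] * 4,
    'year': [1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965],
    'independence': [0, 0, 1, 1, 0, 1, 1, 1],
    'exch_usd': [1.0, 1.0, 2.0, 4.0, 0.5, 0.5, 1.5, 2.5],
})

# Mean exchange rate per country, split by the independence flag
mean_rates = (
    toy.groupby(['country', 'independence'])['exch_usd']
       .mean()
       .unstack('independence')
       .rename(columns={0: 'before', 1: 'after'})
)

# Ratio above 1 means the rate was higher on average after independence
mean_rates['ratio'] = mean_rates['after'] / mean_rates['before']
print(mean_rates)
```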

# Predictions

In [None]:
# Drop the country name: it is a string column and cannot be passed to the models as is
df_data.drop(columns=['country'], inplace=True)
# getting the predictors and the target variable
X = df_data.loc[:,df_data.columns != 'banking_crisis']
y = df_data.loc[:, 'banking_crisis']

In [None]:
#train test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)
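The split above is random and unseeded, so results change from run to run. A possible variant, sketched here on hypothetical synthetic data rather than the notebook's dataset, fixes `random_state` for reproducibility and uses `stratify` so that the share of crisis rows is the same in both subsets. The 10% positive rate is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 200 rows, 3 features, 10% positive labels
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = np.array([1] * 20 + [0] * 180)

# stratify keeps the class proportions equal in train and test;
# random_state makes the split reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, train_size=0.75, stratify=y_toy, random_state=0
)

print(y_tr.mean(), y_te.mean())
```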

# Linear Regression

In [None]:
# Step 1: create a sklearn linear regression model
linear_regressor = LinearRegression()

# Step 2: fit the linear regression model on the data
linear_regressor.fit(X_train, y_train)

In [None]:
# Step 3: compute MSE on train dataset
y_pred_train = linear_regressor.predict(X_train)
score_train = mean_squared_error(y_train, y_pred_train)

# Step 4: compute MSE on test dataset
y_pred_test = linear_regressor.predict(X_test)
score_test = mean_squared_error(y_test, y_pred_test)

print('MSE on train set: %.2f' % score_train)
print('MSE on test set: %.2f' % score_test)

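Both the train and test MSE are small and of the same magnitude, which suggests that the model is not overfitting.
We can therefore use this model to predict whether or not a country is in a banking crisis, for example by thresholding its output at 0.5. Because banking_crisis is a binary target, a small MSE should be read with care: if crises are rare in the data, a model that always predicts "no crisis" also achieves a small MSE.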
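Since the target is binary, one alternative is to treat the task as classification. The sketch below swaps in `LogisticRegression` (imported at the top of the notebook but not used in the original analysis) and reports accuracy and recall next to an "always no crisis" baseline. It runs on hypothetical synthetic data, so the scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: the rare positive class depends on feature 0
rng = np.random.default_rng(0)
n_samples = 1000
X_toy = rng.normal(size=(n_samples, 2))
noise = 0.5 * rng.normal(size=n_samples)
y_toy = (X_toy[:, 0] + noise > 1.3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, train_size=0.75, stratify=y_toy, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

acc = accuracy_score(y_te, y_pred)
recall = recall_score(y_te, y_pred)  # share of true crises that were detected
# Baseline: always predict the majority class (no crisis)
baseline_acc = accuracy_score(y_te, np.zeros_like(y_te))

print('accuracy: %.2f, recall: %.2f, baseline accuracy: %.2f'
      % (acc, recall, baseline_acc))
```

Comparing against the baseline shows how much of the score comes from class imbalance alone, and recall shows whether the rare crisis cases are actually being found.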

# Decision Tree

In [None]:

from sklearn.tree import DecisionTreeRegressor

#model
tree_regressor = DecisionTreeRegressor(random_state=0)

#fit the model
tree_regressor.fit(X_train, y_train)

In [None]:
# Step 3: compute MSE on train dataset
y_pred_train = tree_regressor.predict(X_train)
score_train = mean_squared_error(y_train, y_pred_train)

# Step 4: compute MSE on test dataset
y_pred_test = tree_regressor.predict(X_test)
score_test = mean_squared_error(y_test, y_pred_test)

print('MSE on train set: %.2f' % score_train)
print('MSE on test set: %.2f' % score_test)

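Both the train and test MSE are small, so the decision tree also appears to fit the dataset well. Note that an unconstrained tree can fit the training set almost perfectly, so the test MSE is the more informative of the two.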
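One caveat when reading the train MSE of a tree with no depth limit: it can reach zero even when there is nothing to learn. The sketch below illustrates this on hypothetical synthetic data whose labels are pure noise, so only the held-out error is meaningful. The data and the 200/100 split are assumptions made for the illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: binary labels unrelated to the features
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 4))
y_toy = rng.integers(0, 2, size=300)

X_tr, X_te = X_toy[:200], X_toy[200:]
y_tr, y_te = y_toy[:200], y_toy[200:]

# An unconstrained tree keeps splitting until every training row is isolated
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

train_mse = mean_squared_error(y_tr, tree.predict(X_tr))
test_mse = mean_squared_error(y_te, tree.predict(X_te))

print('MSE on train set: %.2f' % train_mse)
print('MSE on test set: %.2f' % test_mse)
```

Limiting the tree with parameters such as `max_depth` or `min_samples_leaf` is a common way to narrow the gap between train and test error.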