
## What You're Aiming For

In this checkpoint, we are going to work on the 'Systemic Crisis, Banking Crisis, Inflation Crisis in Africa' dataset, which is provided on Kaggle.

Dataset description: This dataset covers the banking, debt, financial, inflation and systemic crises that occurred from 1860 to 2014 in 13 African countries: Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia and Zimbabwe. The objective of the ML model is to predict the likelihood of a systemic crisis emerging, given a set of indicators such as the annual inflation rate.

 ➡️ Dataset link

https://i.imgur.com/3XzFz3x.jpg


Instructions

1. Import your data and perform a basic data exploration phase
   - Display general information about the dataset
   - Create a pandas profiling report to gain insights into the dataset
   - Handle missing and corrupted values
   - Remove duplicates, if they exist
   - Handle outliers, if they exist
   - Encode categorical features
2. Select your target variable and the features
3. Split your dataset into training and test sets
4. Based on your data exploration phase, select an ML classification algorithm and train it on the training set
5. Assess your model's performance on the test set using relevant evaluation metrics
6. Discuss with your cohort alternative ways to improve your model's performance
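
Steps 3 to 5 above can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the checkpoint's dataset: the `toy` frame, the choice of `LogisticRegression` with `class_weight="balanced"`, and the stratified 80/20 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset): one numeric indicator,
# one 0/1 flag, and a relatively rare binary target.
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    "inflation_annual_cpi": rng.normal(10, 5, n),
    "banking_crisis": rng.integers(0, 2, n),
})
# The toy target can only be 1 when the banking_crisis flag is 1.
toy["systemic_crisis"] = (
    (toy["banking_crisis"] == 1) & (rng.random(n) < 0.4)
).astype(int)

X = toy[["inflation_annual_cpi", "banking_crisis"]]
y = toy["systemic_crisis"]

# Step 3: stratify so both splits keep the same share of crisis rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 4: a simple baseline classifier; class_weight="balanced"
# compensates for the rare positive class.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# Step 5: precision, recall and F1 per class say more than accuracy
# alone when one class is rare.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```

Logistic regression is only a convenient baseline here; any scikit-learn classifier with `fit` and `predict` could be swapped in at step 4.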



### **Load data**

In [38]:
!pip install ydata-profiling

In [39]:
import pandas as pd
from ydata_profiling import ProfileReport



In [48]:
df = pd.read_csv('/content/African_crises_dataset.csv')
df

Unnamed: 0,country_number,country_code,country,year,systemic_crisis,exch_usd,domestic_debt_in_default,sovereign_external_debt_default,gdp_weighted_default,inflation_annual_cpi,independence,currency_crises,inflation_crises,banking_crisis
0,1,DZA,Algeria,1870,1,0.052264,0,0,0.0,3.441456,0,0,0,crisis
1,1,DZA,Algeria,1871,0,0.052798,0,0,0.0,14.149140,0,0,0,no_crisis
2,1,DZA,Algeria,1872,0,0.052274,0,0,0.0,-3.718593,0,0,0,no_crisis
3,1,DZA,Algeria,1873,0,0.051680,0,0,0.0,11.203897,0,0,0,no_crisis
4,1,DZA,Algeria,1874,0,0.051308,0,0,0.0,-3.848561,0,0,0,no_crisis
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1054,70,ZWE,Zimbabwe,2009,1,354.800000,1,1,0.0,-7.670000,1,1,0,crisis
1055,70,ZWE,Zimbabwe,2010,0,378.200000,1,1,0.0,3.217000,1,0,0,no_crisis
1056,70,ZWE,Zimbabwe,2011,0,361.900000,1,1,0.0,4.920000,1,0,0,no_crisis
1057,70,ZWE,Zimbabwe,2012,0,361.900000,1,1,0.0,3.720000,1,0,0,no_crisis


In [49]:
Profile = ProfileReport(df, title="Ydata_Profiling Report", explorative=True)

In [60]:
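# Save the report as HTML under its own name (the file name is an arbitrary choice) instead of reusing the dataset's CSV path
Profile.to_file('/content/African_crises_profile_report.html')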




In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1059 entries, 0 to 1058
Data columns (total 14 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   country_number                   1059 non-null   int64  
 1   country_code                     1059 non-null   object 
 2   country                          1059 non-null   object 
 3   year                             1059 non-null   int64  
 4   systemic_crisis                  1059 non-null   int64  
 5   exch_usd                         1059 non-null   float64
 6   domestic_debt_in_default         1059 non-null   int64  
 7   sovereign_external_debt_default  1059 non-null   int64  
 8   gdp_weighted_default             1059 non-null   float64
 9   inflation_annual_cpi             1059 non-null   float64
 10  independence                     1059 non-null   int64  
 11  currency_crises                  1059 non-null   int64  
 12  inflation_crises    

In [52]:
missing_values = df.isnull().sum()
missing_values

country_number                     0
country_code                       0
country                            0
year                               0
systemic_crisis                    0
exch_usd                           0
domestic_debt_in_default           0
sovereign_external_debt_default    0
gdp_weighted_default               0
inflation_annual_cpi               0
independence                       0
currency_crises                    0
inflation_crises                   0
banking_crisis                     0
dtype: int64
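
Missing values are only one kind of data problem; the instructions also mention corrupted values. One possible check, sketched below on a hypothetical toy frame (the values are illustrative and not taken from the dataset), is to verify that columns assumed to be 0/1 flags contain nothing else.

```python
import pandas as pd

# Toy frame with illustrative values only (not taken from the dataset).
toy = pd.DataFrame({
    "currency_crises": [0, 1, 0, 2],
    "inflation_crises": [0, 0, 1, 1],
})

# Columns we assume should only ever hold 0 or 1.
binary_cols = ["currency_crises", "inflation_crises"]

# For each flag column, list the distinct values that are neither 0 nor 1.
unexpected = {
    col: toy.loc[~toy[col].isin([0, 1]), col].unique().tolist()
    for col in binary_cols
}
print(unexpected)
```

An empty list for a column means it passed the check; a non-empty list points at values to inspect before deciding whether to recode or drop them.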

In [53]:
# Drop duplicate rows and keep the result in df so that later cells use the deduplicated data
df = df.drop_duplicates()
print(df.head())

   country_number country_code  country  year  systemic_crisis  exch_usd  \
0               1          DZA  Algeria  1870                1  0.052264   
1               1          DZA  Algeria  1871                0  0.052798   
2               1          DZA  Algeria  1872                0  0.052274   
3               1          DZA  Algeria  1873                0  0.051680   
4               1          DZA  Algeria  1874                0  0.051308   

   domestic_debt_in_default  sovereign_external_debt_default  \
0                         0                                0   
1                         0                                0   
2                         0                                0   
3                         0                                0   
4                         0                                0   

   gdp_weighted_default  inflation_annual_cpi  independence  currency_crises  \
0                   0.0              3.441456             0                0  

In [59]:
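from scipy import stats
import numpy as np

# Keep only rows whose z-score is below 3 in absolute value.
# Both columns are 0/1 flags, so for systemic_crisis this filter discards every
# row where systemic_crisis == 1 (rows 0 and 1054 are missing from the output below).
df1 = df[(np.abs(stats.zscore(df['systemic_crisis'])) < 3)]
df2 = df1[(np.abs(stats.zscore(df1['inflation_crises'])) < 3)]
df1  # shows the result of the first filter only; df2 holds the result of both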

Unnamed: 0,country_number,country_code,country,year,systemic_crisis,exch_usd,domestic_debt_in_default,sovereign_external_debt_default,gdp_weighted_default,inflation_annual_cpi,independence,currency_crises,inflation_crises,banking_crisis
1,1,DZA,Algeria,1871,0,5.279800e-02,0,0,0.0,14.149140,0,0,0,no_crisis
2,1,DZA,Algeria,1872,0,5.227400e-02,0,0,0.0,-3.718593,0,0,0,no_crisis
3,1,DZA,Algeria,1873,0,5.168000e-02,0,0,0.0,11.203897,0,0,0,no_crisis
4,1,DZA,Algeria,1874,0,5.130800e-02,0,0,0.0,-3.848561,0,0,0,no_crisis
5,1,DZA,Algeria,1875,0,5.154600e-02,0,0,0.0,-20.924178,0,0,0,no_crisis
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1039,70,ZWE,Zimbabwe,1994,0,8.380000e-26,0,0,0.0,21.113543,1,1,1,no_crisis
1055,70,ZWE,Zimbabwe,2010,0,3.782000e+02,1,1,0.0,3.217000,1,0,0,no_crisis
1056,70,ZWE,Zimbabwe,2011,0,3.619000e+02,1,1,0.0,4.920000,1,0,0,no_crisis
1057,70,ZWE,Zimbabwe,2012,0,3.619000e+02,1,1,0.0,3.720000,1,0,0,no_crisis
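
The z-score filter above was applied to 0/1 flag columns. An alternative, shown here only as a sketch under assumptions, applies the same threshold to continuous columns so that rare crisis rows are not dropped because of their label. The `toy` frame and the `continuous_cols` list are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy data (illustrative only): a continuous column with one extreme value
# in the last row, plus a rare 0/1 flag standing in for the target.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "inflation_annual_cpi": np.append(rng.normal(5, 2, 99), 5000.0),
    "systemic_crisis": [1] * 5 + [0] * 95,
})

# Assumption: only continuous measurements are screened for outliers.
continuous_cols = ["inflation_annual_cpi"]

# Column-wise z-scores; a row is kept if every screened column is within 3 standard deviations.
z = np.abs(stats.zscore(toy[continuous_cols].to_numpy(), axis=0))
keep = (z < 3).all(axis=1)
filtered = toy[keep]

print(len(toy), "rows before,", len(filtered), "rows after")
print("crisis rows kept:", int(filtered["systemic_crisis"].sum()))
```

Whether an extreme value is an error or a real event (hyperinflation, for instance) is a modelling decision, so capping or transforming the column may be preferable to dropping rows.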


In [62]:
target_variable = 'systemic_crisis'
features_variable = ['banking_crisis', 'inflation_crises']
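
`banking_crisis` holds the strings `crisis` and `no_crisis`, so it needs to be encoded as numbers before most scikit-learn estimators can use it. A minimal sketch of encoding it and assembling the feature matrix and target follows; the `toy` frame and the 0/1 mapping are assumptions for illustration.

```python
import pandas as pd

# Toy rows that mimic the column layout (values are illustrative only).
toy = pd.DataFrame({
    "systemic_crisis": [1, 0, 0, 1],
    "inflation_crises": [0, 0, 1, 1],
    "banking_crisis": ["crisis", "no_crisis", "no_crisis", "crisis"],
})

target_variable = "systemic_crisis"
features_variable = ["banking_crisis", "inflation_crises"]

# Encode the string labels as integers; the mapping direction is our choice.
encoded = toy.copy()
encoded["banking_crisis"] = encoded["banking_crisis"].map(
    {"no_crisis": 0, "crisis": 1}
)

# Feature matrix and target vector, ready for train_test_split.
X = encoded[features_variable]
y = encoded[target_variable]
print(X.dtypes)
```

An explicit `map` keeps the meaning of 0 and 1 visible in the code; any label missing from the mapping becomes `NaN`, which makes unexpected categories easy to spot.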

In [None]:
from sklearn.model_selection import train_test_split
from