## Preprocessing Combined CSV Files with Economic Freedom Data from 1995 to 2020

### Step 1: Import the relevant libraries.

In [1]:
import pandas as pd

### Step 2: Load the combined .csv file

In [2]:
raw_data = pd.read_csv('1995_2020.csv')

In [3]:
raw_data.head()

Unnamed: 0,Name,Index Year,Overall Score,Property Rights,Judicial Effectiveness,Government Integrity,Tax Burden,Government Spending,Fiscal Health,Business Freedom,Labor Freedom,Monetary Freedom,Trade Freedom,Investment Freedom,Financial Freedom
0,Afghanistan,1995,,,,,,,,,,,,,
1,Albania,1995,49.7,50.0,,10.0,81.7,34.3,,70.0,,22.1,59.0,70.0,50.0
2,Algeria,1995,55.7,50.0,,50.0,48.8,69.5,,70.0,,59.2,54.2,50.0,50.0
3,Angola,1995,27.4,30.0,,30.0,61.6,0.0,,40.0,,0.0,25.0,30.0,30.0
4,Argentina,1995,68.0,70.0,,50.0,80.7,86.6,,85.0,,61.1,58.4,70.0,50.0


In [4]:
df_preprocessed = raw_data.copy()

### Step 3: Drop unnecessary columns. Our goal is to visualize the 'Overall Score', so every column other than 'Name', 'Index Year' and 'Overall Score' can be dropped.

In [5]:
# Keep the first three columns and drop everything from the fourth column onward
df_preprocessed.drop(columns=df_preprocessed.columns[3:], inplace=True)
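Dropping by position depends on the column order of the file. As a minimal sketch, the same result can be obtained by selecting the three columns by name; the `sample` frame and the `keep` list below are illustrative names, with values taken from the table shown in Step 2.

```python
import pandas as pd

# Toy frame mirroring a few columns of the raw data (illustrative subset)
sample = pd.DataFrame({
    'Name': ['Albania', 'Algeria'],
    'Index Year': [1995, 1995],
    'Overall Score': [49.7, 55.7],
    'Property Rights': [50.0, 50.0],
    'Tax Burden': [81.7, 48.8],
})

# Select the columns to keep by name instead of by position
keep = ['Name', 'Index Year', 'Overall Score']
trimmed = sample[keep].copy()

print(list(trimmed.columns))
```

Selecting by name raises a `KeyError` if one of the expected columns is missing, which makes a changed file layout easier to notice.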

### Step 4: Rename the "Name" column to "Country", "Index Year" to "Year" and "Overall Score" to "Score", then drop rows with missing values.

In [6]:
df_preprocessed = df_preprocessed.rename(columns={'Name':'Country','Index Year':'Year','Overall Score':'Score'})
df_preprocessed = df_preprocessed.dropna(axis=0)
df_preprocessed = df_preprocessed.reset_index(drop=True)
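The three operations in this step can also be written as one method chain. The sketch below runs on a small illustrative frame (`sample` and `cleaned` are names chosen here) built from the first rows of the raw table, where Afghanistan has no score for 1995.

```python
import numpy as np
import pandas as pd

# First three rows of the raw data, reduced to the columns kept in Step 3
sample = pd.DataFrame({
    'Name': ['Afghanistan', 'Albania', 'Algeria'],
    'Index Year': [1995, 1995, 1995],
    'Overall Score': [np.nan, 49.7, 55.7],
})

cleaned = (
    sample
    .rename(columns={'Name': 'Country', 'Index Year': 'Year', 'Overall Score': 'Score'})
    .dropna(axis=0)            # removes the Afghanistan row, which has no score
    .reset_index(drop=True)    # renumber rows from 0 without keeping the old index
)

print(cleaned)
```

After the chain, Albania sits at index 0, which matches the first row of the labelled output shown later.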

### Step 5: Assign a category to each row based on its 'Score' value, following the classification at https://www.heritage.org/index/ranking

Free (80–100)

Mostly Free (70–79.9)

Moderately Free (65–69.9)

Moderately Unfree (60–64.9)

Mostly Unfree (50–59.9)

Repressed (0–49.9)

In [7]:
category = []

# Thresholds are checked from highest to lowest, so each branch
# only needs the lower bound of its range.
for score in df_preprocessed['Score']:
    if score >= 80:
        category.append('Free')
    elif score >= 70:
        category.append('Mostly Free')
    elif score >= 65:
        category.append('Moderately Free')
    elif score >= 60:
        category.append('Moderately Unfree')
    elif score >= 50:
        category.append('Mostly Unfree')
    else:
        category.append('Repressed')

In [8]:
df_labelled = df_preprocessed.copy()
df_labelled['Category'] = category
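As an alternative to the explicit loop, the same binning could be done in one vectorized call with `pd.cut`. This is a sketch, not the method used in this notebook: the `scores`, `bins` and `labels` names are illustrative, and the bin edges simply restate the ranges listed in Step 5.

```python
import pandas as pd

# A few scores taken from the tables in this notebook, plus 80.0 as a boundary case
scores = pd.Series([49.7, 55.7, 27.4, 68.0, 74.1, 60.7, 80.0])

# Left-closed bins: [-inf, 50), [50, 60), [60, 65), [65, 70), [70, 80), [80, inf)
bins = [-float('inf'), 50, 60, 65, 70, 80, float('inf')]
labels = ['Repressed', 'Mostly Unfree', 'Moderately Unfree',
          'Moderately Free', 'Mostly Free', 'Free']

# right=False makes each bin include its lower edge, matching the >= checks in the loop
categories = pd.cut(scores, bins=bins, labels=labels, right=False)

print(categories.astype(str).tolist())
```

`pd.cut` returns a categorical column; `.astype(str)` converts it to plain strings like those produced by the loop. Unlike the loop, `pd.cut` leaves missing scores as missing rather than labelling them 'Repressed', which makes no difference here because rows without a score were dropped in Step 4.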

In [9]:
df_labelled

Unnamed: 0,Country,Year,Score,Category
0,Albania,1995,49.7,Repressed
1,Algeria,1995,55.7,Mostly Unfree
2,Angola,1995,27.4,Repressed
3,Argentina,1995,68.0,Moderately Free
4,Australia,1995,74.1,Mostly Free
...,...,...,...,...
4258,Vanuatu,2020,60.7,Moderately Unfree
4259,Venezuela,2020,25.2,Repressed
4260,Vietnam,2020,58.8,Mostly Unfree
4261,Zambia,2020,53.5,Mostly Unfree


### Step 6: Export to .csv file

In [10]:
df_labelled.to_csv('1995_2020_preprocessed.csv')
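By default `to_csv` also writes the row index as an unnamed first column, which pandas reads back as `Unnamed: 0`. If that column is not wanted downstream, passing `index=False` omits it. The sketch below compares both variants on a small illustrative frame, writing to an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# Two rows from the labelled output above
sample = pd.DataFrame({
    'Country': ['Albania', 'Algeria'],
    'Year': [1995, 1995],
    'Score': [49.7, 55.7],
    'Category': ['Repressed', 'Mostly Unfree'],
})

# to_csv() without a path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(sample.to_csv()))
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index column is a design choice; it is harmless if later steps ignore it.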