## How well can the level of corruption of a country in Europe be quantified? 

* What differences are there between actual corruption and perceived corruption? 

* Are there different forms of corruption prevalent in different countries in Europe? 

* What characteristics of a country predict the level of corruption? 

* What characteristics of a country predict an increase or decrease in the level of corruption?

• Corruption Perceptions Index (CPI) from Transparency International.
Data set that shows the perceived corruption of countries and ranks them accordingly.

• World Bank Development Indicators (economic, social, and governance data).
These indicators could be used to look for correlations with the corruption scores of countries.

• European Social Survey (perception-related data).
• OECD Data on governance and public sector integrity.
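The correlation idea noted for the World Bank indicators could be sketched roughly as follows. This is a minimal illustration, not the analysis itself: the frame `toy`, its column names, the placeholder country codes, and all values are made up, and `indicator` stands in for whichever development indicator is eventually chosen.

```python
import pandas as pd

# Made-up values for illustration only; not taken from any of the data sets above.
toy = pd.DataFrame(
    {
        "iso3": ["AAA", "BBB", "CCC", "DDD", "EEE"],  # placeholder codes
        "cpi_score": [30.0, 40.0, 50.0, 60.0, 70.0],
        "indicator": [1.0, 2.0, 4.0, 8.0, 16.0],  # hypothetical indicator
    }
)

# Pearson correlation measures linear association.
pearson = toy["cpi_score"].corr(toy["indicator"])

# Spearman correlation is Pearson on ranks, so it only needs a monotonic relationship.
spearman = toy["cpi_score"].rank().corr(toy["indicator"].rank())

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

Computing both is a cheap way to spot relationships that are monotonic but not linear, which seems plausible for indicators such as income per capita.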

In [2]:
import pandas as pd
import os

In [3]:
corruption_raw_data = pd.read_csv("../data/processed/CPI.csv")

In [148]:
corruption_raw_data.head()

Unnamed: 0,Economy ISO3,Economy Name,Indicator ID,Indicator,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,ALB,Albania,TI.CPI.Rank,Corruption Perceptions Index Rank,113.0,116.0,110.0,88.0,83.0,91.0,99.0,106.0,104.0,110.0,101.0,98.0
1,ALB,Albania,TI.CPI.STDERR,Corruption Perceptions Index Standard Error,2.0,2.1,1.51,3.58,1.99,1.81,1.65,2.51,0.92,1.33,1.32,1.56
2,ALB,Albania,TI.CPI.Score,Corruption Perceptions Index Score,33.0,31.0,33.0,36.0,39.0,38.0,36.0,35.0,36.0,35.0,36.0,37.0
3,ALB,Albania,TI.CPI.Sources,Corruption Perceptions Index Sources,7.0,7.0,7.0,7.0,7.0,8.0,8.0,8.0,8.0,8.0,8.0,7.0
4,AUT,Austria,TI.CPI.Rank,Corruption Perceptions Index Rank,25.0,26.0,23.0,16.0,17.0,16.0,14.0,12.0,15.0,13.0,22.0,20.0


In [5]:
countries = pd.read_csv("../data/processed/europe_countries.csv")

In [6]:
countries.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Country    48 non-null     object
 1   ISO3 Code  48 non-null     object
 2   ISO2 Code  48 non-null     object
dtypes: object(3)
memory usage: 1.2+ KB


In [13]:
iso3_europe_all = set(countries["ISO3 Code"])

In [20]:
len(iso3_europe_all)

48

In [24]:
iso3_europe_cpi = set(corruption_raw_data["Economy ISO3"])

In [28]:
len(iso3_europe_cpi)

42

In [31]:
iso3_europe_all-iso3_europe_cpi

{'AND', 'LIE', 'MCO', 'RKS', 'SMR', 'VAT'}

Countries missing from the CPI data: Andorra, Liechtenstein, Monaco, Kosovo, San Marino, and Vatican City. I think we don't need these countries due to their size and small impact.
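To avoid translating ISO3 codes into names by hand, the set difference can be looked up in the country table. The sketch below assumes the column names reported by `countries.info()`; `countries_demo` and `cpi_codes_demo` are small hypothetical stand-ins for the real frames so that the snippet runs on its own.

```python
import pandas as pd

# Toy stand-in for europe_countries.csv, using the column names from countries.info().
countries_demo = pd.DataFrame(
    {
        "Country": ["Albania", "Andorra", "Austria", "Monaco"],
        "ISO3 Code": ["ALB", "AND", "AUT", "MCO"],
        "ISO2 Code": ["AL", "AD", "AT", "MC"],
    }
)
# Toy stand-in for the set of ISO3 codes present in the CPI data.
cpi_codes_demo = {"ALB", "AUT"}

# Codes listed as European but absent from the CPI data.
missing_codes = set(countries_demo["ISO3 Code"]) - cpi_codes_demo

# Look the codes up in the country table to get readable names.
missing = countries_demo[countries_demo["ISO3 Code"].isin(missing_codes)]
missing_names = sorted(missing["Country"])
print(missing_names)
```

With the real data, the same lookup would use `countries` and `iso3_europe_all - iso3_europe_cpi` in place of the demo objects.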

In [18]:
corruption_raw_data = corruption_raw_data[corruption_raw_data["Economy ISO3"].isin(iso3_europe_all)]

In [10]:
corruption_raw_data.head()

Unnamed: 0,Economy ISO3,Economy Name,Indicator ID,Indicator,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,ALB,Albania,TI.CPI.Rank,Corruption Perceptions Index Rank,113.0,116.0,110.0,88.0,83.0,91.0,99.0,106.0,104.0,110.0,101.0,98.0
1,ALB,Albania,TI.CPI.STDERR,Corruption Perceptions Index Standard Error,2.0,2.1,1.51,3.58,1.99,1.81,1.65,2.51,0.92,1.33,1.32,1.56
2,ALB,Albania,TI.CPI.Score,Corruption Perceptions Index Score,33.0,31.0,33.0,36.0,39.0,38.0,36.0,35.0,36.0,35.0,36.0,37.0
3,ALB,Albania,TI.CPI.Sources,Corruption Perceptions Index Sources,7.0,7.0,7.0,7.0,7.0,8.0,8.0,8.0,8.0,8.0,8.0,7.0
4,AUT,Austria,TI.CPI.Rank,Corruption Perceptions Index Rank,25.0,26.0,23.0,16.0,17.0,16.0,14.0,12.0,15.0,13.0,22.0,20.0


In [11]:
corruption_raw_data["Economy ISO3"].nunique()

42

In [156]:
# index=False already omits the index column, so index_label is not needed
corruption_raw_data.to_csv("../data/processed/CPI.csv", index=False)

In [157]:
cpi = pd.read_csv("../data/processed/CPI.csv")

In [158]:
cpi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 168 entries, 0 to 167
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Economy ISO3  168 non-null    object 
 1   Economy Name  168 non-null    object 
 2   Indicator ID  168 non-null    object 
 3   Indicator     168 non-null    object 
 4   2012          166 non-null    float64
 5   2013          166 non-null    float64
 6   2014          165 non-null    float64
 7   2015          165 non-null    float64
 8   2016          165 non-null    float64
 9   2017          168 non-null    float64
 10  2018          168 non-null    float64
 11  2019          168 non-null    float64
 12  2020          168 non-null    float64
 13  2021          168 non-null    float64
 14  2022          168 non-null    float64
 15  2023          168 non-null    float64
dtypes: float64(12), object(4)
memory usage: 21.1+ KB
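The table holds four indicator rows per country (168 rows for 42 countries) with one column per year. For later analysis it may be more convenient to keep only the score rows and reshape them into one row per country and year. The following is a sketch of that step under assumptions: `cpi_demo` holds two Albania rows copied from the `head()` output above, truncated to 2012-2014, and the names `scores_long`, `year`, and `cpi_score` are chosen here for illustration.

```python
import pandas as pd

# Two Albania rows copied from the head() output above, truncated to 2012-2014.
cpi_demo = pd.DataFrame(
    {
        "Economy ISO3": ["ALB", "ALB"],
        "Economy Name": ["Albania", "Albania"],
        "Indicator ID": ["TI.CPI.Rank", "TI.CPI.Score"],
        "2012": [113.0, 33.0],
        "2013": [116.0, 31.0],
        "2014": [110.0, 33.0],
    }
)

# Keep only the score rows, then turn the year columns into a single "year" column.
scores = cpi_demo[cpi_demo["Indicator ID"] == "TI.CPI.Score"]
scores_long = scores.melt(
    id_vars=["Economy ISO3", "Economy Name"],
    value_vars=["2012", "2013", "2014"],
    var_name="year",
    value_name="cpi_score",
)

# Year labels come from column names, so convert them from strings to integers.
scores_long["year"] = scores_long["year"].astype(int)
print(scores_long)
```

On the full table, the missing values in the 2012-2016 columns would carry over as `NaN` in `cpi_score`, so they can be inspected or dropped after reshaping.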
