## How well can the level of corruption of a country in Europe be quantified? 

* What differences are there between actual corruption and perceived corruption?

* Are there different forms of corruption prevalent in different countries in Europe? 

* What characteristics of a country predict the level of corruption? 

* What characteristics of a country predict an increase or decrease in the level of corruption?

* Data source: Happiness Report 2005-2022


1. **Log GDP per capita**: Adjusted for PPP in 2017 international dollars, using WDI data; extended for 2022 with real GDP growth forecasts, adjusted for population growth.

2. **Healthy life expectancy**: WHO data for 2005-2019, interpolated and extrapolated to cover 2005-2022.

3. **Social support (0-1)**: National average of binary responses to having friends or relatives to rely on.

4. **Freedom to make life choices (0-1)**: National average of satisfaction with freedom to choose life activities.

5. **Generosity**: Residual from regressing charity donation responses on log GDP per capita.

6. **Perceptions of corruption (0-1)**: Average of responses on government and business corruption.

7. **Positive affect**: Average of previous-day feelings of laughter, enjoyment, and interest.

8. **Negative affect**: Average of previous-day feelings of worry, sadness, and anger.
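The generosity variable is described above as a regression residual, which is why it can take negative values in the tables below. A minimal sketch of that idea follows, assuming a simple one-variable least-squares fit; the numbers are made up for illustration and the report's actual estimation procedure may differ.

```python
import numpy as np

# Synthetic, made-up values purely for illustration (not taken from the report).
log_gdp = np.array([8.5, 9.0, 9.5, 10.0, 10.5, 11.0])
donated_share = np.array([0.20, 0.28, 0.25, 0.40, 0.38, 0.52])

# Fit donated_share = intercept + slope * log_gdp by ordinary least squares.
slope, intercept = np.polyfit(log_gdp, donated_share, deg=1)

# The residual is the part of giving that income does not explain.
generosity = donated_share - (intercept + slope * log_gdp)

print(generosity.round(3))
```

With an intercept in the fit, the residuals sum to zero, so a negative value means a country gives less than its income level would predict in this toy model.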



In [18]:
import pandas as pd
import os

In [19]:
happiness_raw_data = pd.read_csv("../data/raw/happiness_report.csv")

In [20]:
happiness_raw_data.head()

Unnamed: 0,Country Name,Regional Indicator,Year,Life Ladder,Log GDP Per Capita,Social Support,Healthy Life Expectancy At Birth,Freedom To Make Life Choices,Generosity,Perceptions Of Corruption,Positive Affect,Negative Affect,Confidence In National Government
0,Afghanistan,South Asia,2008,3.72359,7.350416,0.450662,50.5,0.718114,0.167652,0.881686,0.414297,0.258195,0.612072
1,Afghanistan,South Asia,2009,4.401778,7.508646,0.552308,50.799999,0.678896,0.190809,0.850035,0.481421,0.237092,0.611545
2,Afghanistan,South Asia,2010,4.758381,7.6139,0.539075,51.099998,0.600127,0.121316,0.706766,0.516907,0.275324,0.299357
3,Afghanistan,South Asia,2011,3.831719,7.581259,0.521104,51.400002,0.495901,0.163571,0.731109,0.479835,0.267175,0.307386
4,Afghanistan,South Asia,2012,3.782938,7.660506,0.520637,51.700001,0.530935,0.237588,0.77562,0.613513,0.267919,0.43544


In [21]:
countries = pd.read_csv("../data/processed/europe_countries.csv")

In [22]:
countries.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Country    49 non-null     object
 1   ISO3 Code  49 non-null     object
 2   ISO2 Code  49 non-null     object
dtypes: object(3)
memory usage: 1.3+ KB


In [23]:
# Note: despite the `iso3_` prefix, this set holds country names, not ISO3 codes.
iso3_europe_all = set(countries["Country"])
len(iso3_europe_all)

49

In [24]:
# All countries in the happiness data (not only European ones), again as names.
iso3_europe_hap = set(happiness_raw_data["Country Name"])
len(iso3_europe_hap)

165

In [25]:
iso3_europe_all-iso3_europe_hap

{'Andorra',
 'Bosnia & Herzegovina',
 'Czech Republic',
 'Kosovo2',
 'Liechtenstein',
 'Monaco',
 'San Marino',
 'Turkey',
 'Vatican City'}
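Some of the names in this difference may simply not be covered by the survey, while others may be spelling mismatches between the two files (for example, 'Kosovo2' appears to carry a stray footnote digit). One way to separate the two cases is to normalise names before comparing. The sketch below uses toy data; the alias pairs and the `normalize_names` helper are hypothetical and would need to be checked against the real files.

```python
import pandas as pd

# Hypothetical alias table: key = a spelling that might appear in the survey
# file, value = the spelling used in the reference list. These pairs are
# assumptions for illustration only; inspect the real data before using them.
NAME_ALIASES = {
    "Czechia": "Czech Republic",
    "Bosnia and Herzegovina": "Bosnia & Herzegovina",
    "Turkiye": "Turkey",
}

def normalize_names(names: pd.Series, aliases: dict) -> pd.Series:
    """Strip surrounding whitespace and map known alternative spellings."""
    return names.str.strip().replace(aliases)

# Toy stand-ins for the survey column and the reference set.
survey = pd.DataFrame(
    {"Country Name": ["Czechia", "Albania ", "Turkiye", "Bosnia and Herzegovina"]}
)
reference = {"Albania", "Czech Republic", "Turkey", "Bosnia & Herzegovina", "Monaco"}

survey["Country Name"] = normalize_names(survey["Country Name"], NAME_ALIASES)

# Whatever remains is absent from the survey rather than merely spelled differently.
still_missing = reference - set(survey["Country Name"])
print(still_missing)
```

Applying such a mapping before the `isin` filter would keep countries that would otherwise be dropped only because of a naming difference.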

In [26]:
happiness_raw_data = happiness_raw_data[happiness_raw_data["Country Name"].isin(iso3_europe_all)]

In [27]:
happiness_raw_data.head()

Unnamed: 0,Country Name,Regional Indicator,Year,Life Ladder,Log GDP Per Capita,Social Support,Healthy Life Expectancy At Birth,Freedom To Make Life Choices,Generosity,Perceptions Of Corruption,Positive Affect,Negative Affect,Confidence In National Government
14,Albania,Central and Eastern Europe,2007,4.634252,9.121704,0.821372,66.760002,0.528605,-0.010429,0.8747,0.488819,0.246335,0.300681
15,Albania,Central and Eastern Europe,2009,5.48547,9.241429,0.833047,67.32,0.525223,-0.159259,0.863665,0.564474,0.279257,
16,Albania,Central and Eastern Europe,2010,5.268937,9.282793,0.733152,67.599998,0.568958,-0.173675,0.726262,0.576077,0.30006,
17,Albania,Central and Eastern Europe,2011,5.867422,9.310619,0.759434,67.879997,0.487496,-0.206186,0.877003,0.565759,0.256577,
18,Albania,Central and Eastern Europe,2012,5.510124,9.326344,0.784502,68.160004,0.601512,-0.170467,0.847675,0.553473,0.271393,0.364894


In [35]:
happiness_raw_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 427 entries, 18 to 2070
Data columns (total 12 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Country Name                       427 non-null    object 
 1   Year                               427 non-null    int64  
 2   Life Ladder                        427 non-null    float64
 3   Log GDP Per Capita                 425 non-null    float64
 4   Social Support                     427 non-null    float64
 5   Healthy Life Expectancy At Birth   416 non-null    float64
 6   Freedom To Make Life Choices       427 non-null    float64
 7   Generosity                         424 non-null    float64
 8   Perceptions Of Corruption          425 non-null    float64
 9   Positive Affect                    427 non-null    float64
 10  Negative Affect                    427 non-null    float64
 11  Confidence In National Government  389 non-null    float64
dtypes: float64(10), int64(1), object(1)
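The non-null counts above show that a few columns have gaps. A short sketch of how those gaps could be counted per column and per country is given below; the toy frame and its values are made up and only reuse column names from this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the column names come from the notebook.
toy = pd.DataFrame({
    "Country Name": ["Albania", "Albania", "Austria", "Austria"],
    "Log GDP Per Capita": [9.3, np.nan, 10.9, 10.9],
    "Confidence In National Government": [0.36, np.nan, np.nan, 0.45],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Number of missing values in each column, broken down by country.
missing_per_country = (
    toy.drop(columns="Country Name")
       .isna()
       .groupby(toy["Country Name"])
       .sum()
)

print(missing_per_column)
print(missing_per_country)
```

A per-country breakdown like this can show whether the gaps are spread evenly or concentrated in a few countries, which matters when deciding between dropping rows and imputing.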

In [38]:
def apply_iso3_code(df, countries_df):
    """
    Add an 'ISO3 Code' column to `df` by matching country names.

    Parameters:
        df (pd.DataFrame): DataFrame with a 'Country Name' column.
        countries_df (pd.DataFrame): DataFrame with at least two columns:
                                     - 'Country': textual name of the country
                                     - 'ISO3 Code': ISO3 code (e.g. FRA, DEU)

    Returns:
        pd.DataFrame: `df` with 'Country' and 'ISO3 Code' appended,
                      restricted to rows whose country name was matched.
    """

    # Left-merge on the country name so every row of `df` is kept at first
    merged = df.merge(
        countries_df[['Country', 'ISO3 Code']],
        how='left',
        left_on='Country Name',
        right_on='Country'
    )

    # Drop rows that found no match in `countries_df` (missing ISO3 code)
    merged = merged[merged['ISO3 Code'].notna()]

    return merged

# --- Example usage with the frames loaded above ---
# df_with_iso3 = apply_iso3_code(happiness_raw_data, countries)
# print(df_with_iso3.head())


In [42]:
happiness_raw_data = apply_iso3_code(df=happiness_raw_data, countries_df=countries)
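A left merge followed by a silent filter can hide rows that failed to match. As a hedged sketch, pandas' `indicator` and `validate` options on `merge` can make such cases visible; the toy frames below reuse this notebook's column names, but their values (including the deliberately unmatched "Atlantis") are made up.

```python
import pandas as pd

# Toy frames with the notebook's column names; the values are made up.
df = pd.DataFrame({
    "Country Name": ["Albania", "Albania", "Atlantis"],
    "Year": [2012, 2013, 2012],
})
countries_df = pd.DataFrame({
    "Country": ["Albania", "Austria"],
    "ISO3 Code": ["ALB", "AUT"],
})

merged = df.merge(
    countries_df[["Country", "ISO3 Code"]],
    how="left",
    left_on="Country Name",
    right_on="Country",
    validate="many_to_one",  # raises MergeError if a name repeats in countries_df
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Names present in df that found no partner in countries_df.
unmatched = merged.loc[merged["_merge"] == "left_only", "Country Name"].unique()
print(list(unmatched))
```

Inspecting `unmatched` before filtering gives a chance to fix spelling differences instead of losing those rows.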

In [33]:
# Keep only observations from 2012 onward
happiness_raw_data = happiness_raw_data[happiness_raw_data['Year'] >= 2012]


In [44]:
# Drop the name columns now that 'ISO3 Code' identifies each country.
# errors='ignore' keeps this cell safe to re-run once the columns are gone.
happiness_raw_data = happiness_raw_data.drop(columns=['Country Name', 'Country'], errors='ignore')

In [45]:
happiness_raw_data.head()

Unnamed: 0,Year,Life Ladder,Log GDP Per Capita,Social Support,Healthy Life Expectancy At Birth,Freedom To Make Life Choices,Generosity,Perceptions Of Corruption,Positive Affect,Negative Affect,Confidence In National Government,ISO3 Code
0,2012,5.510124,9.326344,0.784502,68.160004,0.601512,-0.170467,0.847675,0.553473,0.271393,0.364894,ALB
1,2013,4.550648,9.338146,0.759477,68.440002,0.63183,-0.128825,0.862905,0.540751,0.338379,0.338095,ALB
2,2014,4.813763,9.357805,0.625587,68.720001,0.734648,-0.026298,0.882704,0.572945,0.334543,0.498786,ALB
3,2015,4.606651,9.382662,0.639356,69.0,0.703851,-0.082492,0.884793,0.579072,0.350427,0.506978,ALB
4,2016,4.511101,9.416873,0.638411,69.025002,0.729819,-0.018664,0.901071,0.56708,0.321706,0.40091,ALB


In [30]:
# 'Country Name' is dropped above, so count distinct countries by ISO3 code
happiness_raw_data["ISO3 Code"].nunique()

40

In [31]:
happiness_raw_data.to_csv("../data/processed/happiness.csv", index=False)