---

## Before starting: I had to modify this notebook because the graphs created with *ipywidgets interact*, which let you choose the graph for each year, cannot be displayed in the rendered notebook. So in this version I focused on the visualizations for the year 2021.
## For anyone interested, here is the GitHub link where you can download the notebook with the interactive graphics:
## <a href="https://github.com/aguyanzon/kaggle/blob/main/notebooks/world-press-index/World%20Press%20Index%20(Interactive%20visualization).ipynb">github/aguyanzon</a>

---

# World Press Index (2021)
### Context
This is an attempt to understand how nations throughout the world treat journalists, and journalism in general.

### Content
This dataset covers 180 countries.

***Abuse Score*** : This score is evaluated by experts based on the violence and abuse committed against journalists in a country. It is then used as a weighting metric, together with the Underlying Situation Score, to compute the final Global Score. The lower the Abuse Score, the better.

***Underlying Situation Score*** : This score is evaluated by a group of experts to reflect the extent to which the country's journalists can make truthful and honest information available to the world. The lower the score, the better.

***Global Score*** : This is the final score, based on the Underlying Situation Score and the Abuse Score. The lower a country's Global Score, the better its rank.
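To make the relationship between the three scores concrete, here is a minimal sketch. It assumes, purely for illustration, that the Global Score is a weighted average of the Abuse Score and the Underlying Situation Score with a hypothetical weight `ABUSE_WEIGHT`; RSF's published methodology defines the actual formula. Ranking then simply sorts countries by ascending score, since a lower score means a better rank.

```python
# Hypothetical weight for illustration only; this is NOT RSF's published value.
ABUSE_WEIGHT = 0.2


def global_score(abuse: float, underlying: float, w: float = ABUSE_WEIGHT) -> float:
    """Combine the two sub-scores as a weighted average (assumed formula)."""
    return round(w * abuse + (1 - w) * underlying, 2)


def rank_countries(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Rank countries by ascending score: the lower the score, the better the rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [(position, name, score) for position, (name, score) in enumerate(ordered, start=1)]


# Made-up (abuse, underlying) sub-scores for three fictional countries
subscores = {"Country A": (10.0, 20.0), "Country B": (0.0, 8.0), "Country C": (60.0, 45.0)}
scores = {name: global_score(abuse, underlying) for name, (abuse, underlying) in subscores.items()}

for rank, name, score in rank_countries(scores):
    print(rank, name, score)
```

Only the ordering logic (ascending score gives a better rank) comes from the description above; the weighting is a placeholder.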

![](https://rsf.org/sites/default/files/styles/rsf_full/public/regions_visuel_rsfindex_20205.png?itok=mGeHSptE&timestamp=1618825403)

### More information about world press index
**WHAT IS IT?**

Published every year since 2002 by Reporters Without Borders (RSF), the World Press Freedom Index is an important advocacy tool based on the principle of emulation between states. Because it is well known, its influence over governments is growing. Many heads of state and government fear its annual publication. The Index is a point of reference that is quoted by media throughout the world and is used by diplomats and international entities such as the United Nations and the World Bank.

**WHAT DOES IT MEASURE?**

The Index ranks 180 countries and regions according to the level of freedom available to journalists. It is a snapshot of the media freedom situation based on an evaluation of pluralism, independence of the media, quality of legislative framework and safety of journalists in each country and region. It does not rank public policies even if governments obviously have a major impact on their country’s ranking. Nor is it an indicator of the quality of journalism in each country or region.

**THE GLOBAL INDICATOR AND REGIONAL INDICATORS**

Along with the Index, RSF calculates a global indicator and regional indicators that evaluate the overall performance of countries and regions (in the world and in each region) as regards media freedom. It is an absolute measure that complements the Index’s comparative rankings. The global indicator is the average of the regional indicators, each of which is obtained by averaging the scores of all the countries in the region, weighted according to their population as given by the World Bank.
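The two averaging steps described above can be sketched as follows. The regions, scores and populations are invented; only the structure (a population-weighted average within each region, then a plain average across regions) follows the text.

```python
from statistics import mean

# Hypothetical regions: lists of (country score, population) pairs; all numbers are made up.
regions = {
    "Region 1": [(20.0, 10_000_000), (40.0, 30_000_000)],
    "Region 2": [(50.0, 5_000_000), (70.0, 5_000_000)],
}


def regional_indicator(countries: list[tuple[float, int]]) -> float:
    """Population-weighted average of the country scores in one region."""
    total_pop = sum(pop for _, pop in countries)
    return sum(score * pop for score, pop in countries) / total_pop


def global_indicator(by_region: dict[str, list[tuple[float, int]]]) -> float:
    """Plain (unweighted) average of the regional indicators."""
    return mean(regional_indicator(countries) for countries in by_region.values())


for name, countries in regions.items():
    print(name, regional_indicator(countries))  # Region 1 35.0, then Region 2 60.0
print("Global", global_indicator(regions))      # Global 47.5
```

Note how the more populous country pulls Region 1's indicator toward its own score (35.0 rather than the unweighted 30.0).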

**HOW THE INDEX IS COMPILED**

The degree of freedom available to journalists in 180 countries and regions is determined by pooling the responses of experts to a questionnaire devised by RSF. This qualitative analysis is combined with quantitative data on abuses and acts of violence against journalists during the period evaluated. The criteria used in the questionnaire are pluralism, media independence, media environment and self-censorship, legislative framework, transparency, and the quality of the infrastructure that supports the production of news and information.

Source: <a href="https://rsf.org/">https://rsf.org/</a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Install Packages
*pycountry-convert*: This package provides conversion functions between ISO country names, country-codes, and continent names. (https://pypi.org/project/pycountry-convert/)

In [None]:
!pip install pycountry-convert

In [None]:
# import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.express as px
import plotly.graph_objects as go

import pycountry
import warnings
import pycountry_convert as pc
warnings.filterwarnings("ignore")

## Import dataset

In [None]:
df = pd.read_csv('/kaggle/input/world-press-index-20192021/World Press Index 2021.csv')
df.head()

In [None]:
# 180 rows, 6 columns
df.shape

In [None]:
# There are no NaN or null values in any column
df.info()

In [None]:
df.describe()

# The 20 best and 20 worst Global Scores in 2021
In this study we focus only on the Global Score.

In [None]:
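# .copy() makes `data` an independent DataFrame, so adding columns to it later
# does not trigger pandas' SettingWithCopyWarning
data = df[['Country Name', 'Global Score 2021', 'Global Score 2020', 'Global Score 2019']].copy()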

In [None]:
def best_GS(Year):
    # Sort by the chosen year so the 20 lowest (best) scores are selected even if
    # the file is ordered by another year's ranking
    best = data.sort_values(Year).head(20)
    fig = px.bar(best, y=Year, x='Country Name', text=Year, color='Country Name',
                 color_discrete_sequence=px.colors.qualitative.Set2, title='Best ' + str(Year))
    fig.update_traces(texttemplate='%{text}', textposition='outside')
    fig.update_layout(showlegend=False)
    fig.show()

best_GS('Global Score 2021')

In [None]:
def worst_GS(Year):
    # Sort by the chosen year; tail(20) returns all 20 highest (worst) scores,
    # including the last-ranked country
    worst = data.sort_values(Year).tail(20)
    fig = px.bar(worst, y=Year, x='Country Name', text=Year, color='Country Name',
                 color_discrete_sequence=px.colors.qualitative.Set2, title='Worst ' + str(Year))
    fig.update_traces(texttemplate='%{text}', textposition='outside')
    fig.update_layout(showlegend=False)
    fig.show()

worst_GS('Global Score 2021')
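Picking the best and worst N with positional slicing only works if the rows are already sorted by the year being plotted. A plain-Python sketch with invented scores shows the sort-then-slice approach, and why a `[-n:-1]` slice returns one row too few.

```python
# Made-up (country, score) pairs, deliberately NOT in rank order.
scores = [("D", 55.2), ("A", 6.7), ("F", 81.0), ("B", 9.1), ("E", 70.3), ("C", 30.4)]


def best_and_worst(pairs, n):
    """Return the n lowest-scoring (best) and n highest-scoring (worst) entries."""
    ordered = sorted(pairs, key=lambda p: p[1])  # ascending: lower score = better
    return ordered[:n], ordered[-n:]             # [-n:] keeps the very last entry too


best, worst = best_and_worst(scores, 2)
print(best)   # [('A', 6.7), ('B', 9.1)]
print(worst)  # [('E', 70.3), ('F', 81.0)]

# A [-n:-1] slice stops one short and silently drops the worst entry.
ordered = sorted(scores, key=lambda p: p[1])
print(ordered[-2:-1])  # [('E', 70.3)]
```

In pandas the same idea is `sort_values(column)` followed by `head(n)` or `tail(n)`.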

# Variables alpha2 and alpha3
We convert country names to ISO 3166-1 alpha-2 and alpha-3 codes using the *pycountry_convert* and *pycountry* libraries.

In [None]:
data.head()

In [None]:
def name_toalpha3(names_countries):
    # Return the ISO alpha-3 code, or the sentinel 'not founded' if the name is unknown
    try:
        return pc.country_name_to_country_alpha3(names_countries, cn_name_format="default")
    except KeyError:
        return "not founded"

data['Code A3'] = data['Country Name'].apply(name_toalpha3)

In [None]:
data.head()

In [None]:
data[data['Code A3'] == 'not founded']

These countries were not found. Since there are only a few of them, we look up their codes manually and fill them into the dataframe.
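Keying the manual fixes by row position ties them to the current row order of the CSV. A sketch of an alternative keyed by country name is below. `AUTO_LOOKUP` is a hypothetical stand-in for `pycountry_convert`, and the spellings in `MANUAL_A3` are hypothetical too; the real ones should be copied from the rows that came back as `'not founded'`.

```python
# Hypothetical stand-in for pc.country_name_to_country_alpha3: a tiny lookup table.
AUTO_LOOKUP = {"Norway": "NOR", "Finland": "FIN", "Sweden": "SWE"}

# Manual overrides keyed by country name (hypothetical spellings).
MANUAL_A3 = {"Bosnia-Herzegovina": "BIH", "Guinea Bissau": "GNB"}


def to_alpha3(name: str) -> str:
    """Try the automatic lookup, then the manual table, else the sentinel."""
    if name in AUTO_LOOKUP:
        return AUTO_LOOKUP[name]
    return MANUAL_A3.get(name, "not founded")


names = ["Norway", "Bosnia-Herzegovina", "Guinea Bissau", "Country X"]
print([to_alpha3(n) for n in names])  # ['NOR', 'BIH', 'GNB', 'not founded']
```

With pandas, the same override table could be applied to the country-name column through `Series.map`, so the fix no longer depends on row order.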

In [None]:
# Browse the first entries of pycountry's country list to look up official names and codes
list(pycountry.countries)[:10]
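Scanning the list works, but for a handful of unmatched names fuzzy string matching can suggest candidates directly (pycountry itself provides `pycountry.countries.search_fuzzy` for this). The sketch below swaps in the standard library's `difflib.get_close_matches` over a short, hand-written list of official names, which is an assumption standing in for the full `pycountry.countries` list.

```python
import difflib

# Hand-picked official names for illustration; the real list would come from pycountry.
OFFICIAL_NAMES = [
    "Bosnia and Herzegovina",
    "Côte d'Ivoire",
    "Guinea-Bissau",
    "Congo",
    "Congo, The Democratic Republic of the",
    "Morocco",
    "Haiti",
]


def suggest(name: str, n: int = 3, cutoff: float = 0.5) -> list[str]:
    """Return up to n official names similar to `name`, best match first."""
    return difflib.get_close_matches(name, OFFICIAL_NAMES, n=n, cutoff=cutoff)


print(suggest("Guinea Bissau"))
print(suggest("Bosnia-Herzegovina"))
```

The suggestions still need a human check (for example, "Congo" vs. "Congo, The Democratic Republic of the"), which is why the notebook fills the final codes by hand.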

In [None]:
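# Index labels of the unmatched countries and their alpha-3 codes:
# Bosnia and Herzegovina, Côte d'Ivoire, Haiti, Guinea-Bissau,
# Republic of the Congo, Morocco, DR Congo
countries_A3 = [57, 65, 86, 94, 117, 135, 148]
cod_A3 = ['BIH', 'CIV', 'HTI', 'GNB', 'COG', 'MAR', 'COD']

# Assign all codes at once; the two lists are aligned position by position
data.loc[countries_A3, 'Code A3'] = cod_A3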

In [None]:
# We observe that the values were replaced
data.loc[countries_A3,'Code A3']

In [None]:
# These two countries are the only ones for which no alpha-3 code was found
data[data['Code A3'] == 'not founded']

Now we repeat the same procedure to obtain the alpha-2 codes.

In [None]:
def name_toalpha2(names_countries):
    # Return the ISO alpha-2 code, or the sentinel 'not founded' if the name is unknown
    try:
        return pc.country_name_to_country_alpha2(names_countries, cn_name_format="default")
    except KeyError:
        return "not founded"

data['Code A2'] = data['Country Name'].apply(name_toalpha2)

In [None]:
# Same countries as for alpha-3, in the same order
countries_A2 = [57, 65, 86, 94, 117, 135, 148]
cod_A2 = ['BA', 'CI', 'HT', 'GW', 'CG', 'MA', 'CD']

data.loc[countries_A2, 'Code A2'] = cod_A2
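The alpha-2 and alpha-3 manual lists must stay aligned position by position. A small sketch derives one from the other through a hand-written alpha-3 to alpha-2 table (with pycountry installed, `pycountry.countries.get(alpha_3=code).alpha_2` could supply the same mapping), so a reordering in one list cannot silently mismatch the other.

```python
# ISO 3166-1 alpha-3 -> alpha-2 pairs for the seven manually fixed countries.
A3_TO_A2 = {"BIH": "BA", "CIV": "CI", "HTI": "HT", "GNB": "GW",
            "COG": "CG", "MAR": "MA", "COD": "CD"}

cod_A3 = ['BIH', 'CIV', 'HTI', 'GNB', 'COG', 'MAR', 'COD']

# Derive the alpha-2 list from the alpha-3 list so both keep the same order.
cod_A2 = [A3_TO_A2[code] for code in cod_A3]
print(cod_A2)  # ['BA', 'CI', 'HT', 'GW', 'CG', 'MA', 'CD']
```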

In [None]:
data.head()

# Interactive visualization of the distribution of global scores worldwide 

In [None]:
def world(Year):
    # Choropleth of the chosen year's scores, located by ISO alpha-3 code
    fig = px.choropleth(data, locations="Code A3",
                        color=Year,
                        hover_name="Country Name",
                        title=Year)
    fig.show()

world('Global Score 2021')

# Interactive visualization of the distribution of global scores by continent

In [None]:
def continent_map(Year, continent):
    # Plotly's geo `scope` accepts 'world', 'usa', 'europe', 'asia', 'africa',
    # 'north america' and 'south america'; there is no Oceania scope
    fig = px.choropleth(data, locations="Code A3",
                        color=Year,
                        hover_name="Country Name",
                        scope=continent,
                        title=continent.title())
    fig.show()

continents = ['europe', 'asia', 'south america', 'north america', 'africa']

for continent in continents:
    continent_map('Global Score 2021', continent)

Next, we create a function that maps each country's alpha-2 code to its continent, so that we can plot the average Global Score of each continent over the last three years.

In [None]:
def alpha2_tocontinent(country_alpha2):
    # alpha-2 code -> continent code (e.g. 'EU') -> continent name (e.g. 'Europe');
    # rows without a valid code get the sentinel 'not founded'
    try:
        country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
        return pc.convert_continent_code_to_continent_name(country_continent_code)
    except KeyError:
        return "not founded"

data['Continent'] = data['Code A2'].apply(alpha2_tocontinent)
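The conversion is a two-step table lookup. A sketch with small hand-written excerpts of the tables (hypothetical stand-ins for the ones bundled in `pycountry_convert`) shows how a country that already carries the `'not founded'` sentinel falls through to the sentinel again instead of raising.

```python
# Tiny excerpts of the two lookup tables; the real ones live inside pycountry_convert.
ALPHA2_TO_CONTINENT_CODE = {"NO": "EU", "JP": "AS", "BR": "SA", "NG": "AF", "NZ": "OC", "CA": "NA"}
CONTINENT_CODE_TO_NAME = {"EU": "Europe", "AS": "Asia", "SA": "South America",
                          "AF": "Africa", "OC": "Oceania", "NA": "North America"}


def alpha2_to_continent(alpha2: str) -> str:
    """alpha-2 -> continent code -> continent name, or the sentinel if a step fails."""
    try:
        return CONTINENT_CODE_TO_NAME[ALPHA2_TO_CONTINENT_CODE[alpha2]]
    except KeyError:
        return "not founded"


print([alpha2_to_continent(c) for c in ["NO", "NZ", "not founded"]])
# ['Europe', 'Oceania', 'not founded']
```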

In [None]:
data.head(10)

In [None]:
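# Average Global Score per continent; countries without a continent are excluded.
# Only the numeric score columns are averaged, listed in chronological order so that
# the rows of the transposed table line up with years = [2019, 2020, 2021] in the plot.
score_cols = ['Global Score 2019', 'Global Score 2020', 'Global Score 2021']
avg_continents = data[data['Continent'] != 'not founded'].groupby('Continent')[score_cols].mean()
avg_continents = avg_continents.transpose().reset_index().rename(columns={'index': 'Global Score'})
avg_continents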
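The table above is a per-continent average of each year's score column. A dependency-free sketch with invented scores shows the same computation, with two details made explicit: rows still tagged `'not founded'` are left out, and the years are kept in chronological order so that each average lines up with the right year on the x-axis.

```python
from collections import defaultdict
from statistics import mean

# Invented rows: (continent, {year: global score}).
rows = [
    ("Europe", {2019: 20.0, 2020: 21.0, 2021: 22.0}),
    ("Europe", {2019: 30.0, 2020: 30.0, 2021: 30.0}),
    ("Asia", {2019: 50.0, 2020: 52.0, 2021: 54.0}),
    ("not founded", {2019: 40.0, 2020: 40.0, 2021: 40.0}),
]

YEARS = [2019, 2020, 2021]  # chronological order, matching the plot's x-axis


def continent_averages(rows):
    """Average score per continent and year, skipping rows without a continent."""
    grouped = defaultdict(list)
    for continent, scores in rows:
        if continent != "not founded":
            grouped[continent].append(scores)
    return {
        continent: [mean(s[year] for s in group) for year in YEARS]
        for continent, group in grouped.items()
    }


print(continent_averages(rows))
# {'Europe': [25.0, 25.5, 26.0], 'Asia': [50.0, 52.0, 54.0]}
```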

# Distribution of average global score in each continent by year

In [None]:
# Visualization: one line per continent with its average Global Score by year
years = [2019, 2020, 2021]
fig = go.Figure()
for continent in ['Africa', 'Asia', 'Europe', 'North America', 'South America', 'Oceania']:
    fig.add_trace(go.Scatter(x=years, y=avg_continents[continent],
                             mode='lines+markers',
                             name=continent))

fig.update_layout(
    xaxis=dict(
        tick0=2018,
        dtick=1
    ),
    title='Distribution of average global score in each continent by year'
)

fig.show()

# Conclusion
Oceania has the most favorable average press freedom score, with Europe very close behind. The countries of the Americas follow, with an average of about 30 points. Finally, Africa and Asia have the least favorable scores, with Asia being the worst of all.

---