# Trust in the World Analysis

## Overview

Trust in institutions and individuals plays a crucial role in shaping societal outcomes, from economic performance to healthcare outcomes. This research aims to understand the factors that influence trust and how trust can be improved to promote positive change within communities. The results have implications for policymakers, business leaders, and individuals looking to build stronger relationships and more effective systems.

Trust is a fundamental component of human interaction and plays a critical role in shaping our relationships, communities, and societies. Research on trust helps us understand how trust is built, maintained, and broken, as well as the consequences of trust and mistrust. This understanding is essential for developing interventions that improve trust in individuals and institutions, which can ultimately lead to a more cohesive and functional society.

The key idea of this research is to understand the factors that influence trust in different institutions and individuals across countries, and how this trust relates to economic and healthcare outcomes.

### Materials used:
   - Wellcome Trust & Gallup survey (Wellcome Global Monitor 2018) https://wellcome.org/reports/wellcome-global-monitor/2018. The country economic parameters and the trust questions were used. The trust level was calculated as the % of respondents who answered 'A lot' or 'Some' to questions of the form 'How much do you trust...' for the following categories: Government, Media, Healthcare, Traditional Medicine, Scientists, Neighbors and NGOs
   - Countries classification by Income level https://data.worldbank.org
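
The trust-level calculation described above can be sketched on a tiny made-up table. This is only an illustration: the countries `A` and `B`, the response labels other than 'A lot' and 'Some', and all the numbers are assumptions, not values from the survey.

```python
import pandas as pd

# Hypothetical long-format survey rows (all values are made up for illustration)
toy = pd.DataFrame({
    'Country': ['A'] * 4 + ['B'] * 4,
    'Q': ['Govt'] * 8,
    'Response': ['A lot', 'Some', 'Not much', 'Not at all'] * 2,
    'Value': [0.20, 0.35, 0.30, 0.15, 0.05, 0.15, 0.40, 0.40],
})

# Keep only the "trusting" answers and add up their shares per country and category
trusting = toy[toy['Response'].isin(['A lot', 'Some'])]
trust_level = pd.pivot_table(trusting, index='Country', columns='Q',
                             values='Value', aggfunc='sum')
print(trust_level)
```

With these toy numbers, country `A` ends up with a trust level of 0.20 + 0.35 and country `B` with 0.05 + 0.15.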

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import numpy as np
import plotly.express as px
from scipy.stats import pearsonr

In [3]:
cwd = os.getcwd()
print(cwd)

/Users/klynete/Desktop/Springboard/Wellcome trust


In [4]:
countries=pd.read_csv(f'{cwd}/data/countries_health_data.csv', names=['Country', 'Data', 'Value'], header=None)

In [None]:
# Reshape to wide table
countries_p=pd.pivot(countries, index='Country', columns='Data', values='Value').reset_index()

In [None]:
countries_p.info()

In [None]:
columns_mapping={countries_p.columns[1]:'GDP',
                countries_p.columns[2]:'GINI',
                countries_p.columns[3]:'Life_exp',
                countries_p.columns[4]:'Mort_per_1000',
                countries_p.columns[5]:'Research_per_mil',
                countries_p.columns[6]:'Research_%_GDP'}
countries_p.rename(columns=columns_mapping, inplace=True)
countries_p.head()

In [5]:
wgm=pd.read_csv(f'{cwd}/data/wgm2018.csv')
wgm.head()

FileNotFoundError: [Errno 2] No such file or directory: '../data/wgm2018.csv'

In [None]:
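# We will use only the answers to the trust-related questions (Q11*)
# .copy() makes trust an independent DataFrame, so the in-place edits below do not act on a view of wgm
trust=wgm[wgm['Question Number'].str.contains('Q11')].copy()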
trust.head()

In [None]:
trust.Question.unique()

In [None]:
# create mapping
questions={'Q11A':'Neighbors', 'Q11B':'Govt', 'Q11C':'Scientists', 'Q11D':'Journalists', 'Q11E':'Doctors', 
           'Q11F':'NGO', 'Q11G':'Healers'}
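# assign the result back: replace(..., inplace=True) on a selected column may not update trust itself
trust['Question Number']=trust['Question Number'].replace(questions)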
trust['Question Number'].unique()

In [None]:
trust['Response'].unique()

In [None]:
trust.drop(columns='Question', inplace=True)

In [None]:
trust.rename(columns={'Country Name':'Country','Question Number':'Q'}, inplace=True)
trust.head()

In [None]:
trust['Value']=trust['Response Total %']/100
trust.drop(columns='Response Total %', inplace=True)

In [None]:
trust_clean=trust[trust.Response.str.contains('A lot|Some')]
trust_clean.head()

In [None]:
trust_p=pd.pivot_table(trust_clean, index='Country', columns='Q', values='Value', aggfunc='sum').reset_index()

In [None]:
trust_p.info()

In [None]:
#Load list of countries and their Alpha-3 ISO codes
iso=pd.read_csv(f'{cwd}/data/wikipedia-iso-country-codes.csv', names=['Country', 'A2', 'A3','Num','ISO'], header=None)
iso.drop(columns=['A2', 'Num','ISO'], inplace=True)
# Income level classification for countries from worldbank.org
income=pd.read_csv(f'{cwd}/data/ncome.csv')

In [None]:
#Join trust dataset and countries
df=(trust_p
    .merge(countries_p, how='left', on='Country')
    .merge(iso, how='left', on='Country')
    .merge(income, how='left', on='A3'))
df.head()
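
Left joins on country names silently leave gaps when the spelling differs between sources. One possible way to list the unmatched names is the `indicator` option of `merge`; the sketch below uses small hypothetical tables (the `Atlantis` row and its value are made up) rather than the notebook's data.

```python
import pandas as pd

# Hypothetical tables; "Atlantis" and its value are invented to force a mismatch
left = pd.DataFrame({'Country': ['Norway', 'Greece', 'Atlantis'],
                     'Govt': [0.89, 0.17, 0.50]})
codes = pd.DataFrame({'Country': ['Norway', 'Greece'],
                      'A3': ['NOR', 'GRC']})

# indicator=True adds a "_merge" column saying where each row was found
merged = left.merge(codes, how='left', on='Country', indicator=True)

# Rows found only in the left table have no ISO code after the join
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Country'].tolist()
print(unmatched)
```

Names reported this way could then be harmonised before joining, instead of being dropped.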

In [None]:
df.info()

In [None]:
df[df['A3'].isna()]

In [None]:
df = df[df['A3'].notna()]
df = df[df['GDP'].notna()]

In [None]:
df[df['Govt'].isna()]

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
plt.figure(figsize=(16,7))
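# numeric_only=True skips text columns (Country, A3, Income_rating), which recent pandas versions refuse to correlate
_=sns.heatmap(df.corr(numeric_only=True), annot=True)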
_=plt.title('Correlation heatmap')


In [None]:
df['Income_rating'].unique()

In [None]:
order=['L','LM','UM','H']

In [None]:
# outliers checking
def outliers(df, col, threshold=2):
    # calculate z-scores: distance of each value from the column mean, in standard deviations
    z_scores = (df[col] - np.mean(df[col])) / np.std(df[col])
    # find the row positions whose absolute z-score exceeds the threshold
    outlier_pos = np.where(np.abs(z_scores) > threshold)
    # print the outliers
    print("Outliers for trust in ",col,":\n", df[['Country', col]].iloc[outlier_pos].sort_values(by=col))
    # positions are relative to the df passed in, so use them with .iloc on that same frame
    return outlier_pos
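
As a quick illustration of the z-score rule used above, the sketch below applies the same threshold of 2 to a short made-up series; the numbers are assumptions chosen so that one value stands out.

```python
import numpy as np
import pandas as pd

# Made-up values: nine similar observations and one extreme one
values = pd.Series([0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.95])

# z-score = distance from the mean in units of the (population) standard deviation
z = (values - values.mean()) / values.std(ddof=0)

# Keep the observations that lie more than 2 standard deviations from the mean
flagged = values[np.abs(z) > 2]
print(flagged)
```

Only the last value (0.95) is flagged here. With small groups, a single extreme value also inflates the standard deviation, so the threshold is best treated as a rough screen.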


We use the Pearson correlation coefficient to test for a linear relationship between pairs of columns. The pearsonr() function from the scipy.stats module returns the correlation coefficient and the associated two-sided p-value. The p-value is then compared with the significance level (alpha=0.05) to decide whether to reject the null hypothesis.

Null hypothesis: There is no linear relationship between column 1 and column 2.
Alternative hypothesis: There is a linear relationship (positive or negative) between column 1 and column 2.
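
A minimal sketch of this test on made-up numbers is given below. The two small data sets are assumptions for illustration: one is nearly linear, the other follows a perfect quadratic pattern, which the Pearson coefficient does not pick up because it only measures linear association.

```python
import numpy as np
from scipy.stats import pearsonr

alpha = 0.05  # significance level used in this notebook

# Nearly linear toy data: the test should reject the null hypothesis
x_lin = np.array([1, 2, 3, 4, 5, 6])
y_lin = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.0])
r_lin, p_lin = pearsonr(x_lin, y_lin)
print("linear:", r_lin, p_lin, "reject" if p_lin < alpha else "fail to reject")

# Symmetric quadratic toy data: strongly related, but not linearly
x_sq = np.array([-2, -1, 0, 1, 2])
y_sq = x_sq ** 2
r_sq, p_sq = pearsonr(x_sq, y_sq)
print("quadratic:", r_sq, p_sq, "reject" if p_sq < alpha else "fail to reject")
```

The second case is a reminder that failing to reject the null hypothesis means no linear relationship was detected, not that the two columns are unrelated.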

In [None]:
def corr_hypothesis(df, col1, col2):
    df_corr=df[[col1, col2]].dropna()
    corr, p_value = pearsonr(df_corr[col1], df_corr[col2])
    # print the correlation coefficient and p-value
    print("Correlation coefficient:", corr)
    print("p-value:", p_value)
    if corr>0:
        sign='positive'
    else:
        sign='negative'

    # set the significance level
    alpha = 0.05

    # check if p-value is less than significance level
    if p_value < alpha:
        print("There is a",sign, "relationship between trust in", col1, "and", col2,"(reject null hypothesis)")
    else:
        # failing to reject only means that no significant linear relationship was detected
        print("There is no statistically significant relationship between trust in", col1, "and", col2,"(fail to reject null hypothesis)")


In [None]:
def annotate(df, col1, col2):
    # rows holding the minimum / maximum of each column (NaN values are skipped)
    min1, min2 = df.loc[df[col1].idxmin()], df.loc[df[col2].idxmin()]
    max1, max2 = df.loc[df[col1].idxmax()], df.loc[df[col2].idxmax()]
    # write each country name at its (col1, col2) point; scalars are passed instead of one-element Series
    plt.text(min1[col1], min1[col2], min1['Country'], color='black')
    plt.text(min2[col1], min2[col2], min2['Country'], color='black')
    # the label for the col1 maximum is shifted by 0.02 to the left and 0.02 up
    plt.text(max1[col1]-0.02, max1[col2]+0.02, max1['Country'], color='black')
    plt.text(max2[col1], max2[col2], max2['Country'], color='black')

## Trust in government and economic performance

Let's check the relationship between trust in Government and the economic performance of countries using the following parameters: GDP per capita, the GINI coefficient and income-level group.

In [None]:
df[['Country', 'Govt']].sort_values(by='Govt', ascending=False).head(5)

In [None]:
df[['Country', 'Govt']].sort_values(by='Govt', ascending=False).dropna().tail(5)

In [None]:
df["Income_rating"].value_counts().plot(kind='bar', title='Count of Countries by income level');

In [None]:
sns.boxplot(x="Income_rating", y="GDP", data=df, order=order).set(title='Comparison of GDP levels by Income level');

In [None]:
sns.boxplot(x="Income_rating", y="Govt", data=df, order=order).set(title='Comparison of trust to the Govt by Income level');

In [None]:
_=outliers(df[df['Income_rating']=='H'], 'Govt', 2)

In [None]:
_=outliers(df, 'Govt', 2)

In [None]:
sns.lmplot(data=df, x='Govt', y='GDP', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='GDP vs. trust to the Govt by levels by Income level')
annotate(df,'Govt', 'GDP');

In [None]:
fig = px.choropleth(df, locations="A3",
                    color="Govt", 
                    hover_data=["Country", "GDP"],
                    title="Trust to Government among countries",
                    color_continuous_scale=px.colors.diverging.Spectral)
fig.show()

In [None]:
corr_hypothesis(df, 'Govt', 'GDP')

In [None]:
corr_hypothesis(df[df['Income_rating']!='H'], 'Govt', 'GDP')

In [None]:
corr_hypothesis(df[df['Income_rating']=='H'], 'Govt', 'GDP')

In [None]:
fig, ax = plt.subplots(figsize=(12,6))
sns.scatterplot(ax=ax, x=df['Govt'], y=df['GINI'], hue=df['Income_rating'], hue_order=order).set(title='Trust to Govt vs. GINI coef.')
annotate(df, 'Govt', 'GINI');

In [None]:
corr_hypothesis(df, 'Govt', 'GINI')

### Summary:
   - High income countries form the largest group; at the same time, GDP per capita within this group is spread much more widely than in the other groups.
   - Among High income countries, trust in Government is lowest in Greece (17%) and highest in Norway (89%).
   - The weakest trust in Government overall is in Ukraine (11%).
   - There is no statistically significant relationship between trust in Government and GDP, or between trust in Government and the GINI coefficient, but there is a positive correlation between trust in Government and GDP among High income countries.
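
The last point, a correlation that shows up inside one income group but not in the pooled data, can be reproduced on a toy table. The sketch below is only an illustration of that pattern: the eight rows and their values are made up and do not come from the survey.

```python
import pandas as pd
from scipy.stats import pearsonr

# Made-up rows: in group H, GDP rises with trust; in group L, GDP is low and flat
toy = pd.DataFrame({
    'Income_rating': ['H'] * 4 + ['L'] * 4,
    'Govt': [0.2, 0.4, 0.6, 0.8, 0.3, 0.5, 0.7, 0.9],
    'GDP': [30, 41, 49, 60, 5, 4, 6, 5],
})

# Pooled correlation across both groups
r_all, p_all = pearsonr(toy['Govt'], toy['GDP'])

# Correlation within the high-income group only
high = toy[toy['Income_rating'] == 'H']
r_high, p_high = pearsonr(high['Govt'], high['GDP'])

print("all countries:", r_all, p_all)
print("high income only:", r_high, p_high)
```

Mixing groups with very different GDP levels can hide a within-group trend, which is why the notebook runs the test separately for High income countries.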

## Trust in healthcare and country statistics

Let's investigate how the level of trust in healthcare depends on country health statistics: life expectancy and mortality rate. Let's also look at trust in Healers (traditional medicine) and how it relates to these parameters.

In [None]:
df[['Country', 'Doctors', 'Income_rating']].sort_values(by='Doctors', ascending=False).head(5)

In [None]:
df[['Country', 'Doctors', 'Income_rating']].sort_values(by='Doctors', ascending=False).tail(5)

In [None]:
sns.boxplot(x="Income_rating", y="Doctors", data=df, order=order).set(title='Comparison of trust to The Healthcare by Income level');

In [None]:
outl=outliers(df[df['Income_rating']=='H'], 'Doctors', 2)

In [None]:
df[df['Income_rating']=='H'][['Country', 'Doctors', 'Life_exp','Mort_per_1000']].iloc[outl]

In [None]:
outl2=outliers(df, 'Doctors', 2)

In [None]:
df[['Country', 'Doctors', 'Life_exp','Mort_per_1000']].iloc[outl2]

In [None]:
fig = px.choropleth(df, locations="A3",
                    color="Doctors", 
                    hover_data=["Mort_per_1000", "Life_exp"],
                    title="Trust to the Healthcare in the World",
                    color_continuous_scale=px.colors.diverging.Spectral)
fig.show()

In [None]:
sns.lmplot(data=df, x='Doctors', y='Life_exp', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='Trust to the Healthcare vs. Life Expectancy by levels by Income level')
annotate(df, 'Doctors', 'Life_exp');

In [None]:
corr_hypothesis(df, 'Doctors', 'Life_exp')

In [None]:
sns.lmplot(data=df, x='Doctors', y='Mort_per_1000', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='Trust to the Healthcare vs. Mortality Rate by levels by Income level')
annotate(df, 'Doctors', 'Mort_per_1000');

In [None]:
corr_hypothesis(df, 'Doctors', 'Mort_per_1000')

In [None]:
sns.lmplot(data=df, x='Doctors', y='Healers', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='Trust to the Healthcare vs. trust to the Traditional Medicine by levels by Income level')
annotate(df, 'Doctors', 'Healers');

In [None]:
sns.boxplot(x="Income_rating", y="Healers", data=df, order=order).set(title='Comparison of trust to Traditional medicine by Income level');

In [None]:
_=outliers(df,'Healers',2)

In [None]:
df[['Country', 'Healers']].sort_values(by='Healers', ascending=False).head(5)

In [None]:
df[['Country', 'Healers']].sort_values(by='Healers', ascending=False).tail(5)

In [None]:
fig = px.choropleth(df, locations="A3",
                    color="Healers", 
                    hover_data=["Country","Mort_per_1000", "Life_exp"],
                    title="Trust to Healers in the World",
                    color_continuous_scale=px.colors.diverging.Spectral)
fig.show()

In [None]:
corr_hypothesis(df[df['Income_rating']!='L'], 'Healers', 'Doctors')

In [None]:
corr_hypothesis(df[df['Income_rating']=='L'], 'Healers', 'Doctors')

### Summary:
   - Across all countries, trust in healthcare is positively correlated with life expectancy and negatively correlated with mortality rate.
   - Of the 5 countries with the lowest trust in healthcare, 4 belong to the Low income group and 1 (Gabon) is classified as Upper-Middle income.
   - Trust in traditional medicine has a positive relationship with trust in the healthcare system among countries that do not belong to the Low income group.
   - The strongest trust in traditional medicine is in Sri Lanka.

## Trust in science and country socioeconomic parameters

In [None]:
pd.DataFrame(df['Scientists']).describe().T

In [None]:
df[['Country', 'Scientists', 'Income_rating']].sort_values(by='Scientists', ascending=False).head(5)

In [None]:
df[['Country', 'Scientists', 'Income_rating']].sort_values(by='Scientists', ascending=False).tail(5)

In [None]:
sns.boxplot(x="Income_rating", y="Scientists", data=df, order=order).set(title='Comparison of trust to Science by Income level');

In [None]:
outl=outliers(df[df['Income_rating']=='H'], 'Scientists', 2)

In [None]:
df[df['Income_rating']=='H'][['Country', 'Scientists', 'Life_exp','Mort_per_1000']].iloc[outl]

In [None]:
outl2=outliers(df, 'Scientists', 2)

In [None]:
df[['Country', 'Scientists', 'Life_exp','Mort_per_1000']].iloc[outl2]

In [None]:
fig = px.choropleth(df, locations="A3",
                    color="Scientists", 
                    hover_data=["Mort_per_1000", "Life_exp"],
                    title="Trust to the Science in the World",
                    color_continuous_scale=px.colors.diverging.Spectral)
fig.show()

In [None]:
fig = px.choropleth(df, locations="A3",
                    color="Research_%_GDP", 
                    hover_data=["Scientists"],
                    title="% of GDP allocated to scientific research in the World",
                    color_continuous_scale=px.colors.diverging.Spectral)
fig.show()

In [None]:
df[['Country', 'Scientists', 'Research_%_GDP', 'Income_rating']].sort_values(by='Research_%_GDP', ascending=False).head(10)

In [None]:
sns.lmplot(data=df, x='Scientists', y='Life_exp', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='Trust to the Scientific research vs. Life Expectancy by levels by Income level')
annotate(df, 'Scientists', 'Life_exp');

In [None]:
corr_hypothesis(df, 'Scientists', 'Life_exp')

In [None]:
sns.lmplot(data=df, x='Scientists', y='Research_%_GDP', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='Trust to the Science vs. % of GDP spent on research by levels by Income level')
annotate(df, 'Scientists', 'Research_%_GDP');

In [None]:
corr_hypothesis(df, 'Scientists', 'Research_%_GDP')

In [None]:
sns.lmplot(data=df, x='Scientists', y='Research_per_mil', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='Trust to the Science vs. number of Researchers by levels by Income level')
annotate(df, 'Scientists', 'Research_per_mil');

In [None]:
corr_hypothesis(df, 'Scientists', 'Research_per_mil')

In [None]:
sns.lmplot(data=df, y='Doctors', x='Scientists', height=6, aspect=1.5).set(title='Trust to the Science vs. trust to the Healthcare')
annotate(df, 'Scientists', 'Doctors');

In [None]:
corr_hypothesis(df, 'Scientists','Doctors')

In [None]:
sns.lmplot(data=df, y='Govt', x='Scientists', height=6, aspect=1.5).set(title='Trust to the Science vs. trust to the Government')
annotate(df, 'Scientists', 'Govt');

In [None]:
corr_hypothesis(df, 'Scientists','Govt')

### Summary:
   - The average level of trust in science across all countries is 66%.
   - The most trusting country is Spain, with 94%.
   - High income countries tend to trust science more than the other groups.
   - In general, High income countries tend to spend a higher % of their GDP on research and have more researchers per million of population.
   - There is a strong positive relationship between trust in science and trust in healthcare.
   - Among the top 10 countries by % of GDP spent on scientific research, only Taiwan, with a trust level of 62%, is below the average for all countries.

## Trust in the media, income level and trust in government

Let's check how trust in the media is distributed among countries and whether it is related to other country-level parameters.

In [None]:
df[['Country', 'Journalists', 'Income_rating']].sort_values(by='Journalists', ascending=False).head(5)

In [None]:
df[['Country', 'Journalists', 'Income_rating']].sort_values(by='Journalists', ascending=False).tail(5)

In [None]:
df['Journalists'].describe()

In [None]:
sns.boxplot(x="Income_rating", y="Journalists", data=df, order=order).set(title='Comparison of trust to the Media by Income level');

In [None]:
outl=outliers(df[df['Income_rating']=='H'], 'Journalists', 2)

In [None]:
outl=outliers(df[df['Income_rating']=='UM'], 'Journalists', 2)

In [None]:
fig = px.choropleth(df, locations="A3",
                    color="Journalists", 
                    hover_data=["Govt"],
                    title="Trust to the Media in the World",
                    color_continuous_scale=px.colors.diverging.Spectral)
fig.show()

In [None]:
sns.lmplot(data=df, x='Journalists', y='Govt', height=6, aspect=1.5).set(title='Trust to the Media vs. trust to the Government')
annotate(df, 'Journalists', 'Govt');

In [None]:
corr_hypothesis(df, 'Govt', 'Journalists')

### Summary:
   - The average level of trust in the media across all countries is 55%.
   - Among High income countries, trust in the media is strongest in Finland (81%) and weakest in Greece (12%) and Taiwan (24%).
   - Among the other income groups, trust is strongest in Uzbekistan (89%), Tanzania (86%) and Rwanda (81%).
   - There is a strong relationship between trust in the media and trust in the government.

## Trust in neighbors and NGO workers

How is trust in neighbors and NGO workers distributed across countries, and is it related to other factors such as trust in government, trust in healthcare and economic factors?

In [None]:
df[['Neighbors', 'NGO']].describe().T

In [None]:
df[['Country', 'Neighbors', 'Income_rating']].sort_values(by='Neighbors', ascending=False).head(5)

In [None]:
df[['Country', 'Neighbors', 'Income_rating']].sort_values(by='Neighbors', ascending=True).head(5)

In [None]:
df[['Country', 'NGO', 'Income_rating']].sort_values(by='NGO', ascending=False).head(5)

In [None]:
df[['Country', 'NGO', 'Income_rating']].sort_values(by='NGO', ascending=False).tail(5)

In [None]:
sns.lmplot(data=df, x='Neighbors', y='NGO', height=6, aspect=1.5).set(title='Trust to the Neighbors vs. trust to the NGO by levels by Income level')
annotate(df,'Neighbors','NGO');

In [None]:
corr_hypothesis(df, 'Neighbors', 'NGO')

In [None]:
sns.boxplot(x="Income_rating", y="Neighbors", data=df, order=order).set(title='Comparison of the trust to the Neighbors by Income level');

In [None]:
fig, ax = plt.subplots(figsize=(12,6))
sns.scatterplot(data=df, x='Neighbors', y='Life_exp', hue='Income_rating', hue_order=order, ax=ax).set(title='Trust to the Neighbors vs. Life Expectancy')
annotate(df,'Neighbors','Life_exp');

In [None]:
corr_hypothesis(df, 'Neighbors', 'Life_exp')

In [None]:
fig, ax = plt.subplots(figsize=(12,6))
sns.scatterplot(data=df, x='Neighbors', y='GINI', hue='Income_rating', hue_order=order, ax=ax).set(title='Trust to the Neighbors vs. GINI coef.')
annotate(df, 'Neighbors', 'GINI');

In [None]:
corr_hypothesis(df, 'Neighbors', 'GINI')

In [None]:
fig = px.choropleth(df, locations="A3",
                    color="Neighbors", 
                    hover_data=["Govt", "NGO", "GINI"],
                    title="Trust to Neighbors in the World",
                    color_continuous_scale=px.colors.diverging.Spectral)
fig.show()

In [None]:
sns.boxplot(x="Income_rating", y="NGO", data=df, order=order).set(title='Comparison of the trust to the NGO by Income level');

In [None]:
sns.lmplot(data=df, x='NGO', y='Govt', hue='Income_rating', hue_order=order, height=6, aspect=1.5).set(title='Trust to NGO workers vs. Trust to Government by Income Level')
annotate(df,'NGO', 'Govt');

### Summary:
   - The average level of trust in neighbors across all countries is 74%.
   - By comparison, the average level of trust in NGO workers across all countries is 56%.
   - Trust in neighbors is strongest in Uzbekistan (96%), Iceland (95%) and Afghanistan (94%).
   - The top 5 countries with the highest trust in NGO workers all belong to the High income group. High income countries also tend to have higher life expectancy, which may go together with higher trust in NGO workers.
   - There is a strong positive relationship between trust in neighbors and trust in NGO workers, with a correlation coefficient of 0.67.
   - The more inequality there is in a country, the less people tend to trust their neighbors.


# Analysis Summary

The research aimed to understand the factors that influence trust in different institutions and individuals across countries, and how this trust relates to economic and healthcare outcomes. The data included trust levels in government, scientists, doctors, journalists, neighbors and NGO workers by country, together with GDP, life expectancy, mortality rate and income group (high, upper-middle, lower-middle and low).

   - Trust in government varies widely across countries, with the highest levels of trust observed in high-income countries.
   - Trust in healthcare is positively correlated with life expectancy and negatively correlated with mortality rate.
   - Trust in traditional medicine has a positive relationship with trust in healthcare among countries that don't belong to the low-income group.
   - Trust in science depends on the welfare of the country and its ability to allocate more resources to scientific R&D.
   - Trust in the media and NGOs is quite low across the globe, averaging 55% and 56% respectively.
   - Trust among people is 74% on average and highly correlated with the GINI coefficient, which measures inequality: the more unequal a country is, the less people tend to trust each other.