# CIRI Human Rights Dataset Analysis

The CIRI Human Rights Dataset draws on the US State Department's Country Reports on Human Rights Practices and Amnesty International's Annual Report. It contains quantitative information on government respect for 15 internationally recognized human rights in 202 countries, annually from 1981 to 2011. The scores measure how far governments allow or restrict their citizens' exercise of Physical Integrity Rights (Torture, Extrajudicial Killing, Political Imprisonment, and Disappearance), Empowerment Rights (Foreign Movement, Domestic Movement, Freedom of Speech, Freedom of Assembly & Association, Workers’ Rights, Electoral Self-Determination, and Freedom of Religion), Women’s Economic Rights, Women’s Political Rights, and Women’s Social Rights. The dataset also includes an index for Independence of the Judiciary, which measures the impartiality of each country's judiciary.

I chose this dataset because I'm interested in learning how people around the world are treated differently depending on their government. It gives a broad picture of how each government's human rights practices evolved from 1981 to 2011 and what still needs to improve. I believe it can help us be better informed on this matter and better understand what to do to make the world a fairer place.

From this dataset, I will look at how each government's human rights practices changed over the 30 years from 1981 to 2011. For each right, I plan to find the countries whose governments have historically restricted or respected it. I will also check whether these rights are correlated with one another. Some countries in this dataset experienced periods of interregnum or interruption due to war, and I want to find out how such periods affect the practice of rights afterward. I also want to look into the correlation between a country's economy, education, and well-being and its human rights practices, especially women's rights. Finally, I plan to use machine learning to predict a country's Happiness Score from the evaluation of its government's human rights practices.

*Source: Cingranelli, David L., David L. Richards, and K. Chad Clay. 2014. "The CIRI Human Rights Dataset." http://www.humanrightsdata.com. Version 2014.04.14.*

In [173]:
import pandas as pd
import altair as alt
import numpy as np
import seaborn as sns
from vega_datasets import data
import requests

In [419]:
hr = pd.read_csv('CIRI Data 1981_2011 2014.04.14.csv')
hr.head()

Unnamed: 0,CTRY,YEAR,CIRI,COW,POLITY,UNCTRY,UNREG,UNSUBREG,PHYSINT,DISAP,...,OLD_MOVE,SPEECH,ELECSD,OLD_RELFRE,NEW_RELFRE,WORKER,WECON,WOPOL,WOSOC,INJUD
0,Afghanistan,1981,101,700.0,700.0,4.0,142,62,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,1982,101,700.0,700.0,4.0,142,62,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
2,Afghanistan,1983,101,700.0,700.0,4.0,142,62,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,Afghanistan,1984,101,700.0,700.0,4.0,142,62,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,Afghanistan,1985,101,700.0,700.0,4.0,142,62,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


### [Variable Descriptions](https://drive.google.com/file/d/0BxDpF6GQ-6fbY25CYVRIOTJ2MHM/edit)

In [420]:
#drop the CIRI, COW, POLITY, UNREG, and UNSUBREG columns since they are not needed for this analysis
hr = hr.drop(columns = ['CIRI', 'COW', 'POLITY', 'UNREG', 'UNSUBREG'])
hr.head()

Unnamed: 0,CTRY,YEAR,UNCTRY,PHYSINT,DISAP,KILL,POLPRIS,TORT,OLD_EMPINX,NEW_EMPINX,...,OLD_MOVE,SPEECH,ELECSD,OLD_RELFRE,NEW_RELFRE,WORKER,WECON,WOPOL,WOSOC,INJUD
0,Afghanistan,1981,4.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,Afghanistan,1982,4.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
2,Afghanistan,1983,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,Afghanistan,1984,4.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,Afghanistan,1985,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [349]:
#find country-years that experienced interregnum (code -77) or interruption (code -66)
#isin(...).any(axis=1) flags each row at most once, so no de-duplication is needed
hr1 = hr[hr.isin([-77, -66]).any(axis=1)]
hr1

Unnamed: 0,CTRY,YEAR,UNCTRY,PHYSINT,DISAP,KILL,POLPRIS,TORT,OLD_EMPINX,NEW_EMPINX,...,OLD_MOVE,SPEECH,ELECSD,OLD_RELFRE,NEW_RELFRE,WORKER,WECON,WOPOL,WOSOC,INJUD
11,Afghanistan,1992,4.0,,-77.0,-77.0,-77.0,-77.0,,,...,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0
12,Afghanistan,1993,4.0,,-77.0,-77.0,-77.0,-77.0,,,...,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0
13,Afghanistan,1994,4.0,,-77.0,-77.0,-77.0,-77.0,,,...,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0
14,Afghanistan,1995,4.0,,-77.0,-77.0,-77.0,-77.0,,,...,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0,-77.0
22,Afghanistan,2003,4.0,,-66.0,-66.0,-66.0,-66.0,,,...,-66.0,-66.0,-66.0,-66.0,-66.0,-66.0,-66.0,-66.0,-66.0,-66.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5049,Somalia,2008,706.0,,-77.0,-77.0,-77.0,-77.0,,,...,,-77.0,-77.0,,-77.0,-77.0,-77.0,-77.0,,-77.0
5050,Somalia,2009,706.0,,-77.0,-77.0,-77.0,-77.0,,,...,,-77.0,-77.0,,-77.0,-77.0,-77.0,-77.0,,-77.0
5051,Somalia,2010,706.0,,-77.0,-77.0,-77.0,-77.0,,,...,,-77.0,-77.0,,-77.0,-77.0,-77.0,-77.0,,-77.0
5052,Somalia,2011,706.0,,-77.0,-77.0,-77.0,-77.0,,,...,,-77.0,-77.0,,-77.0,-77.0,-77.0,-77.0,,-77.0


In [143]:
hr1.loc[hr1['CTRY'] == 'Somalia', 'YEAR'].unique() #years in which Somalia experienced interregnum or interruption

array([1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
       2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011])
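
The same lookup can be done for every affected country at once instead of one country at a time. The cell below is a minimal sketch on a toy frame: the countries `A`, `B`, `C` and all values are hypothetical stand-ins for `hr`, and only the `-77` / `-66` codes come from the dataset.

```python
import pandas as pd

# Toy stand-in for `hr` (hypothetical values, not taken from the CIRI file).
toy = pd.DataFrame({
    'CTRY': ['A', 'A', 'A', 'B', 'B', 'C'],
    'YEAR': [1991, 1992, 1993, 1991, 1992, 1991],
    'DISAP': [-77, -77, 1, -66, 2, 0],
    'KILL': [-77, -77, 2, -66, 1, 1],
})

# Keep rows in which any column holds an interregnum (-77) or interruption (-66) code.
flagged = toy[toy.isin([-77, -66]).any(axis=1)]

# Per affected country: number of flagged years, plus the first and last flagged year.
summary = (flagged.groupby('CTRY')['YEAR']
                  .agg(n_years='count', first_year='min', last_year='max')
                  .sort_values('n_years', ascending=False))
print(summary)
```

Applied to the real data, `toy` would be replaced by `hr` before the special codes are converted to NaN.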

In [353]:
#replace the special codes -77 (interregnum), -66 (interruption), and -999 (missing) with NaN
hr = hr.replace([-77, -66, -999], np.nan)

In [352]:
hr[hr['DISAP'] == -66] #sanity check: no -66 codes should remain in DISAP, so this should be empty

Unnamed: 0,CTRY,YEAR,UNCTRY,PHYSINT,DISAP,KILL,POLPRIS,TORT,OLD_EMPINX,NEW_EMPINX,...,OLD_MOVE,SPEECH,ELECSD,OLD_RELFRE,NEW_RELFRE,WORKER,WECON,WOPOL,WOSOC,INJUD
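
The check above only looks at one code in one column. A broader check, sketched here on a toy frame with hypothetical values, counts any leftover special codes across all columns; the name `SENTINELS` is my own and not part of the dataset.

```python
import numpy as np
import pandas as pd

# The three special codes described above.
SENTINELS = [-77, -66, -999]

# Toy stand-in for `hr` (hypothetical values).
toy = pd.DataFrame({
    'CTRY': ['A', 'A', 'B'],
    'DISAP': [-77.0, 1.0, -999.0],
    'KILL': [2.0, -66.0, 0.0],
})

# Same replacement as in the notebook.
cleaned = toy.replace(SENTINELS, np.nan)

# Count remaining special codes per column; every count should be zero.
leftover = cleaned.isin(SENTINELS).sum()
print(leftover)
```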


In [354]:
#find each country's mean value for every rights indicator over all years
hr_mean = hr.groupby('CTRY').mean(numeric_only=True) #NaN values are skipped automatically
hr_mean['Country'] = hr_mean.index #CTRY is the index, so copy it into a column that the chart tooltips can use
hr_mean

CTRY,YEAR,UNCTRY,PHYSINT,DISAP,KILL,POLPRIS,TORT,OLD_EMPINX,NEW_EMPINX,ASSN,...,SPEECH,ELECSD,OLD_RELFRE,NEW_RELFRE,WORKER,WECON,WOPOL,WOSOC,INJUD,Country
Afghanistan,1996,4.0,1.200000,0.600000,0.200000,0.320000,0.080000,1.050000,1.920000,0.240000,...,0.200000,0.360000,0.200000,0.440000,0.040000,0.000000,0.960000,0.052632,0.000000,Afghanistan
Albania,1996,8.0,4.884615,1.935484,1.230769,1.032258,0.451613,5.000000,7.225806,0.967742,...,0.806452,0.935484,0.615385,1.032258,0.903226,1.032258,1.967742,1.045455,0.580645,Albania
Algeria,1996,12.0,3.935484,1.548387,1.000000,0.741935,0.645161,3.730769,4.451613,0.129032,...,0.322581,0.548387,0.423077,0.774194,0.451613,1.071429,1.580645,0.166667,1.096774,Algeria
Andorra,1996,20.0,8.000000,2.000000,2.000000,2.000000,2.000000,8.800000,12.400000,2.000000,...,1.900000,2.000000,1.000000,2.000000,0.500000,1.400000,2.500000,1.000000,1.263158,Andorra
Angola,1996,17.0,2.433333,1.033333,0.366667,0.633333,0.400000,2.040000,3.600000,0.366667,...,0.333333,0.300000,0.720000,1.400000,0.233333,1.000000,2.033333,1.000000,0.200000,Angola
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Yemen, South",1996,,,,,,,,,,...,,,,,,,,,,"Yemen, South"
Yugoslavia,1996,891.0,3.545455,1.363636,1.454545,0.181818,0.545455,4.363636,6.000000,0.090909,...,0.818182,0.909091,0.000000,0.545455,0.818182,1.909091,1.888889,2.000000,0.909091,Yugoslavia
"Yugoslavia, Federal Republic of",1996,891.0,4.000000,1.666667,1.000000,0.666667,0.666667,5.000000,7.333333,1.333333,...,1.000000,1.000000,0.666667,0.333333,0.333333,1.000000,2.000000,1.000000,0.666667,"Yugoslavia, Federal Republic of"
Zambia,1996,894.0,4.580645,1.903226,1.161290,1.129032,0.387097,5.692308,7.483871,0.903226,...,0.935484,1.000000,0.923077,1.451613,0.741935,0.870968,2.000000,0.880000,1.580645,Zambia


In [416]:
#function to map the 1981-2011 mean of an indicator onto a world map

def geograph(indicator):
    """Return a choropleth of the per-country mean of `indicator` (a column of hr_mean)."""
    countries = alt.topo_feature(data.world_110m.url, 'countries')
    #join each map shape to its hr_mean row by matching the shape's 'id' to UNCTRY
    lookup = alt.LookupData(hr_mean, 'UNCTRY', list(hr_mean.columns))
    graph = alt.Chart(countries).mark_geoshape(stroke='white').encode(
        color=indicator + ':Q',
        tooltip=['Country:N', indicator + ':Q']
    ).transform_lookup(
        lookup='id', from_=lookup
    ).properties(width=750, height=450).project('equirectangular')

    return graph

In [335]:
geograph('PHYSINT') #graph for Physical Integrity Rights Index

In [336]:
geograph('NEW_EMPINX') #graph for Empowerment Rights Index

In [341]:
geograph('WECON') #graph for Women’s Economic Rights

In [342]:
geograph('WOPOL') #graph for Women’s Political Rights

In [344]:
geograph('WOSOC') #graph for Women’s Social Rights

In [345]:
geograph('INJUD') #graph for Independence of the Judiciary

In [409]:
#import world_countries dataset
wd = pd.read_csv('world_countries.csv')
wd['CTRY'] = wd['Country'].str.strip()
wd['Region'] = wd.Region.str.strip()
wd = wd.set_index('CTRY')

In [410]:
#merge Region, GDP, and Literacy from wd into the hr_mean dataset
new_hr_mean = pd.merge(hr_mean, wd[['Region', 'GDP', 'Literacy']], how='left', on='CTRY')

In [411]:
#import happiness_2012 dataset
hp = pd.read_csv('happiness_2012.csv')
hp['CTRY'] = hp['country'].str.strip()
hp = hp.set_index('CTRY')

In [412]:
#merge hp['Happiness Score'] into new_hr_mean
new_hr_mean = pd.merge(new_hr_mean, hp['Happiness Score'], how='left', on='CTRY')
new_hr_mean

CTRY,YEAR,UNCTRY,PHYSINT,DISAP,KILL,POLPRIS,TORT,OLD_EMPINX,NEW_EMPINX,ASSN,...,WORKER,WECON,WOPOL,WOSOC,INJUD,Country,Region,GDP,Literacy,Happiness Score
Afghanistan,1996,4.0,1.200000,0.600000,0.200000,0.320000,0.080000,1.050000,1.920000,0.240000,...,0.040000,0.000000,0.960000,0.052632,0.000000,Afghanistan,ASIA (EX. NEAR EAST),700.0,36.0,4.040
Albania,1996,8.0,4.884615,1.935484,1.230769,1.032258,0.451613,5.000000,7.225806,0.967742,...,0.903226,1.032258,1.967742,1.045455,0.580645,Albania,EASTERN EUROPE,4500.0,86.5,5.550
Algeria,1996,12.0,3.935484,1.548387,1.000000,0.741935,0.645161,3.730769,4.451613,0.129032,...,0.451613,1.071429,1.580645,0.166667,1.096774,Algeria,NORTHERN AFRICA,6000.0,70.0,5.422
Andorra,1996,20.0,8.000000,2.000000,2.000000,2.000000,2.000000,8.800000,12.400000,2.000000,...,0.500000,1.400000,2.500000,1.000000,1.263158,Andorra,WESTERN EUROPE,19000.0,100.0,
Angola,1996,17.0,2.433333,1.033333,0.366667,0.633333,0.400000,2.040000,3.600000,0.366667,...,0.233333,1.000000,2.033333,1.000000,0.200000,Angola,SUB-SAHARAN AFRICA,1900.0,42.0,5.589
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Yemen, South",1996,,,,,,,,,,...,,,,,,"Yemen, South",,,,
Yugoslavia,1996,891.0,3.545455,1.363636,1.454545,0.181818,0.545455,4.363636,6.000000,0.090909,...,0.818182,1.909091,1.888889,2.000000,0.909091,Yugoslavia,,,,
"Yugoslavia, Federal Republic of",1996,891.0,4.000000,1.666667,1.000000,0.666667,0.666667,5.000000,7.333333,1.333333,...,0.333333,1.000000,2.000000,1.000000,0.666667,"Yugoslavia, Federal Republic of",,,,
Zambia,1996,894.0,4.580645,1.903226,1.161290,1.129032,0.387097,5.692308,7.483871,0.903226,...,0.741935,0.870968,2.000000,0.880000,1.580645,Zambia,SUB-SAHARAN AFRICA,800.0,80.6,5.006
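
Some rows come out of the left merges with no Region, GDP, or Literacy (the Yugoslavia rows above, for example), which happens when a country name in the CIRI data has no exact match in the other file. One way to list those names is the `indicator=True` option of `pd.merge`. This is a minimal sketch on toy frames: `left` and `right` are hypothetical stand-ins for `hr_mean` and `wd`, with made-up values.

```python
import pandas as pd

# Toy stand-ins for hr_mean (left) and wd (right); values are hypothetical.
left = pd.DataFrame({
    'CTRY': ['Afghanistan', 'Albania', 'Yugoslavia'],
    'PHYSINT': [1.0, 5.0, 3.5],
})
right = pd.DataFrame({
    'CTRY': ['Afghanistan', 'Albania'],
    'GDP': [700.0, 4500.0],
})

# indicator=True adds a '_merge' column that records where each row came from.
merged = pd.merge(left, right, how='left', on='CTRY', indicator=True)

# Rows marked 'left_only' found no partner in the right-hand table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'CTRY'].tolist()
print(unmatched)
```

With the real frames, calling `reset_index()` on both sides first would turn `CTRY` into an ordinary column, as it is in this sketch.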


In [417]:
#scatter-plot matrix: rights indicators (columns) against GDP, Literacy, and Happiness Score (rows)

alt.Chart(new_hr_mean).mark_circle().encode(
    alt.X(alt.repeat('column'), type='quantitative'),
    alt.Y(alt.repeat('row'), type='quantitative'),
    tooltip='Country', color='Region'
).properties(width=150, height=150).repeat(
    row=['GDP', 'Literacy', 'Happiness Score'],
    column=['PHYSINT', 'NEW_EMPINX', 'WECON', 'WOPOL', 'WOSOC', 'INJUD']
).interactive()
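
The scatter plots show the direction of each relationship but not its strength. A correlation table could put numbers on them. The cell below is a sketch on a toy frame with hypothetical values standing in for `new_hr_mean`. Using Pearson correlation (the default of `DataFrame.corr`) is my assumption here, and a rank-based method may suit these ordinal scores better.

```python
import pandas as pd

# Toy stand-in for new_hr_mean (hypothetical values, including one missing score).
toy = pd.DataFrame({
    'PHYSINT': [1.0, 2.0, 3.0, 4.0, 5.0],
    'WECON': [0.5, 1.0, 1.5, 2.0, 2.5],
    'GDP': [100.0, 300.0, 500.0, 700.0, 900.0],
    'Happiness Score': [4.0, 5.5, None, 6.0, 7.0],
})

rights = ['PHYSINT', 'WECON']
outcomes = ['GDP', 'Happiness Score']

# Pairwise Pearson correlations; pairs with a missing value are dropped for that pair only.
# Keep outcomes as rows and rights as columns, mirroring the layout of the chart grid.
corr = toy[rights + outcomes].corr().loc[outcomes, rights]
print(corr.round(2))
```

On the real data, `rights` and `outcomes` would be the `column` and `row` lists passed to `repeat` above.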