# Analyzing Religion Data

----

#### Summary
In this notebook I work with 'Religion.csv'. I am interested in whether religion is related to suicide rates and, if so, which religion shows the strongest association. I compare the religions using a multiple regression model. For reference, here is an article about how different religions view suicide: https://www.mpac.org/programs/anti-terrorism-campaign/islamic-views-regarding-terrorism-and-suicidem/religious-views-on-suicide.php

#### Key Questions
* Is there a relationship between religion and suicide rates? 
* If so, which religion is most negatively correlated with suicide rate?

In [37]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np

df1 = pd.read_csv('../data/Raw_data/Religion.csv')
suicide = pd.read_csv('../data/Cleaned_data/suicide_total.csv')

#### Cleaning up data
* After taking a look at the data frame I realized that it covers the years 2010, 2020, 2030, 2040, and 2050. I decided to focus on 2010 for the regression because it is the only year that also appears in the world suicide data
    * created a subset with data from year 2010 only
    * the first 7 rows of that subset are not countries but aggregates covering all countries in a continent, so I dropped them
* Other than that the data already looks clean, so I left everything else as is

In [38]:
df1
df1.Year.value_counts()
# There are more years than I need, so keep a subset with data from 2010 only
religion = df1.loc[df1['Year'] == 2010]

# Drop the first 7 rows, which are not countries
religion.head(10)
religion = religion.drop(religion.index[:7])
religion

Unnamed: 0,row_number,level,Nation_fk,Year,Region,Country,All Religions,Buddhists,Christians,Folk Religions,Hindus,Jews,Muslims,Other Religions,Unaffiliated
7,8,1,1,2010,Asia-Pacific,Afghanistan,100,< 1.0,< 1.0,< 1.0,< 1.0,< 1.0,>99.0,< 1.0,< 1.0
8,9,1,2,2010,Europe,Albania,100,< 1.0,18.0,< 1.0,< 1.0,< 1.0,80.3,< 1.0,1.4
9,10,1,3,2010,Middle East-North Africa,Algeria,100,< 1.0,< 1.0,< 1.0,< 1.0,< 1.0,97.9,< 1.0,1.8
10,11,1,4,2010,Asia-Pacific,American Samoa,100,< 1.0,98.3,< 1.0,< 1.0,< 1.0,< 1.0,< 1.0,< 1.0
11,12,1,5,2010,Europe,Andorra,100,< 1.0,89.5,< 1.0,< 1.0,< 1.0,< 1.0,< 1.0,8.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
236,237,1,232,2010,Sub-Saharan Africa,Zimbabwe,100,< 1.0,87.0,3.8,< 1.0,< 1.0,< 1.0,< 1.0,7.9
237,238,1,237,2010,Sub-Saharan Africa,South Sudan,100,< 1.0,60.5,32.9,< 1.0,< 1.0,6.2,< 1.0,< 1.0
238,239,1,238,2010,Latin America-Caribbean,Curacao,100,< 1.0,93.9,1.2,< 1.0,< 1.0,< 1.0,< 1.0,3.3
239,240,1,239,2010,Latin America-Caribbean,Sint Maarten,100,< 1.0,93.9,1.2,< 1.0,< 1.0,< 1.0,< 1.0,3.3


In [41]:
# Merging the two data frames on country name leaves 162 countries
merged = pd.merge(left=religion, right=suicide, on='Country')
merged

Unnamed: 0,row_number,level,Nation_fk,Year,Region,Country,All Religions,Buddhists,Christians,Folk Religions,...,Jews,Muslims,Other Religions,Unaffiliated,Sex,2016,2015,2010,2000,means
0,8,1,1,2010,Asia-Pacific,Afghanistan,100,< 1.0,< 1.0,< 1.0,...,< 1.0,>99.0,< 1.0,< 1.0,Both sexes,6.4,6.6,7.4,8.1,7.300000
1,9,1,2,2010,Europe,Albania,100,< 1.0,18.0,< 1.0,...,< 1.0,80.3,< 1.0,1.4,Both sexes,5.6,5.3,7.7,5.8,6.366667
2,10,1,3,2010,Middle East-North Africa,Algeria,100,< 1.0,< 1.0,< 1.0,...,< 1.0,97.9,< 1.0,1.8,Both sexes,3.3,3.4,3.5,4.7,3.833333
3,13,1,6,2010,Sub-Saharan Africa,Angola,100,< 1.0,90.5,4.2,...,< 1.0,< 1.0,< 1.0,5.1,Both sexes,8.9,9.3,10.4,13.9,11.066667
4,15,1,8,2010,Latin America-Caribbean,Antigua and Barbuda,100,< 1.0,93.0,3.6,...,< 1.0,< 1.0,< 1.0,1.7,Both sexes,0.5,0.8,0.2,2.1,0.933333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157,230,1,225,2010,Asia-Pacific,Vanuatu,100,< 1.0,93.3,4.1,...,< 1.0,< 1.0,1.4,1.2,Both sexes,5.4,5.5,6.2,8.7,6.766667
158,235,1,230,2010,Middle East-North Africa,Yemen,100,< 1.0,< 1.0,< 1.0,...,< 1.0,>99.0,< 1.0,< 1.0,Both sexes,9.8,9.9,10.6,9.1,9.833333
159,236,1,231,2010,Sub-Saharan Africa,Zambia,100,< 1.0,97.6,< 1.0,...,< 1.0,< 1.0,< 1.0,< 1.0,Both sexes,11.3,11.2,11.5,14.1,12.300000
160,237,1,232,2010,Sub-Saharan Africa,Zimbabwe,100,< 1.0,87.0,3.8,...,< 1.0,< 1.0,< 1.0,7.9,Both sexes,19.1,18.9,20.6,21.7,20.466667
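
The merge keeps only countries whose names match exactly in both files, so some countries may drop out because of spelling differences. A minimal sketch of how to list them, using toy stand-ins for the two frames (the 'Example Republic' rows and its values are made up for illustration and are not in the data):

```python
import pandas as pd

# Toy stand-ins for the religion and suicide frames.
# 'Example Republic' / 'Republic of Example' are hypothetical names and values.
religion_demo = pd.DataFrame({
    'Country': ['Albania', 'Curacao', 'Example Republic'],
    'Christians': [18.0, 93.9, 50.0],
})
suicide_demo = pd.DataFrame({
    'Country': ['Albania', 'Republic of Example'],
    '2010': [7.7, 5.0],
})

# An outer merge with indicator=True adds a '_merge' column that says
# whether each row came from the left frame, the right frame, or both.
check = pd.merge(religion_demo, suicide_demo, on='Country',
                 how='outer', indicator=True)

# Rows that are not in both frames would be lost in the default inner merge.
unmatched = check.loc[check['_merge'] != 'both', ['Country', '_merge']]
print(unmatched)
```

Running the same check on `religion` and `suicide` would show which countries are missing from the 162 and whether renaming a few of them could recover more rows.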


#### Cleaning up the data 2
After a failed attempt to run a regression on the merged data I realized that I had to perform a few more steps to make the regression model work
* The original data contains the values "< 1.0" and ">99.0", which the model cannot read as numbers. Therefore I changed < 1.0 to 0.5 and >99.0 to 99.5
* In addition, the numbers in the religion file are stored with dtype object, so I had to convert them to float64
* Lastly, the model had trouble reading the column title 2010, so I made a new column with the same values but with a title made of letters

In [None]:
# Replace the censored values with numeric stand-ins
merged = merged.replace(['< 1.0'], '0.5')
merged = merged.replace(['>99.0'], '99.5')

print(merged.dtypes)
# Convert the religion shares used in the regression from object to float
for col in ['Christians', 'Buddhists', 'Hindus', 'Jews', 'Muslims']:
    merged[col] = merged[col].astype(float)

# A bare 2010 in a formula is read as a number, so copy the column under a valid name
merged['main'] = merged['2010']
merged
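
The same conversion can also be written as one small helper applied to each share column. This is only a sketch on a toy frame: the helper name `censored_to_float` is my own, and it assumes the only non-numeric entries are the two markers seen in the data ('< 1.0' and '>99.0').

```python
import pandas as pd

def censored_to_float(value):
    """Map '< 1.0' to 0.5 and '>99.0' to 99.5; parse anything else as a float.

    Assumes the only censored markers are the two that appear in the data.
    """
    text = str(value).strip()
    if text.startswith('<'):
        return 0.5
    if text.startswith('>'):
        return 99.5
    return float(text)

# Two rows copied from the religion data, still stored as strings
demo = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania'],
    'Christians': ['< 1.0', '18.0'],
    'Muslims': ['>99.0', '80.3'],
})

for col in ['Christians', 'Muslims']:
    demo[col] = demo[col].map(censored_to_float).astype(float)

print(demo.dtypes)
```

Any value that is neither a number nor one of the two markers raises a `ValueError`, which makes unexpected entries visible instead of silently passing them through.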

#### Examine relationships
Multiple regression gives a negative coefficient for each of the 5 religions examined, which suggests that a higher share of religious adherents is associated with a lower suicide rate. The relationship is statistically significant at the 5% level only for Christians (p = 0.020) and Muslims (p = 0.001), and the model explains only about 11% of the variation in suicide rates (R-squared = 0.109).


In [6]:
m1 = smf.ols(formula= 'main ~ Christians + Jews + Muslims + Buddhists+Hindus', data=merged).fit()
m1.summary()

Dep. Variable:,main,R-squared:,0.109
Model:,OLS,Adj. R-squared:,0.08
Method:,Least Squares,F-statistic:,3.805
Date:,"Wed, 23 Dec 2020",Prob (F-statistic):,0.0028
Time:,16:32:01,Log-Likelihood:,-522.09
No. Observations:,162,AIC:,1056.0
Df Residuals:,156,BIC:,1075.0
Df Model:,5,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,19.4003,3.467,5.595,0.000,12.551,26.250
Christians,-0.0923,0.039,-2.347,0.020,-0.170,-0.015
Jews,-0.1287,0.090,-1.431,0.154,-0.306,0.049
Muslims,-0.1222,0.036,-3.376,0.001,-0.194,-0.051
Buddhists,-0.0832,0.054,-1.543,0.125,-0.190,0.023
Hindus,-0.0245,0.057,-0.427,0.670,-0.138,0.089

Omnibus:,55.345,Durbin-Watson:,2.04
Prob(Omnibus):,0.0,Jarque-Bera (JB):,125.853
Skew:,1.498,Prob(JB):,4.69e-28
Kurtosis:,6.109,Cond. No.,484.0
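
To make the coefficients concrete, here is a small sketch that plugs one country's shares into the fitted equation. The coefficients are copied (rounded) from the summary table, and the shares are Afghanistan's values after the "< 1.0" to 0.5 and ">99.0" to 99.5 replacement; the function name `predict_rate` is my own.

```python
# Coefficients copied from the OLS summary (rounded as printed there)
coef = {
    'Intercept': 19.4003,
    'Christians': -0.0923,
    'Jews': -0.1287,
    'Muslims': -0.1222,
    'Buddhists': -0.0832,
    'Hindus': -0.0245,
}

def predict_rate(shares):
    """Fitted suicide rate: intercept plus coefficient times share for each religion."""
    return coef['Intercept'] + sum(coef[name] * shares[name] for name in shares)

# Afghanistan's shares after the censored values were replaced
afghanistan = {
    'Christians': 0.5,
    'Jews': 0.5,
    'Muslims': 99.5,
    'Buddhists': 0.5,
    'Hindus': 0.5,
}

fitted = predict_rate(afghanistan)
print(round(fitted, 2))
```

The fitted value comes out at roughly 7.1, compared with the observed 2010 rate of 7.4 for Afghanistan in the merged data. Each coefficient reads as the change in the fitted rate for a one percentage point increase in that religion's share, holding the other four shares fixed.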
