# Joshua Project - a closer look

---  

This notebook analyses data available through <a href='https://joshuaproject.net/'>The Joshua Project</a>, which is described<sup>[1]</sup> as an organization seeking to highlight the ethnic groups of the world with the fewest followers of evangelical Christianity.  
  
Joshua Project is an organisation that aims to convert non-Christians across the world and offers highly targeted and specific action points for each of its target groups across countries. The data is publicly available [here](https://joshuaproject.net/resources/datasets).  

### Importing the tools and libraries

In [1]:
from matplotlib.pylab import rcParams
import matplotlib.pyplot as plt
from iso3166 import countries
import iso3166 as iso
import pandas as pd
import geocoder
import gmaps
import math
import gmplot




%matplotlib inline
rcParams['figure.figsize'] = (17,17)

### Importing the dataset  
  
The dataset is available for download [here](https://joshuaproject.net/resources/datasets).

In [2]:
data = pd.read_csv('data/AllPeoplesByCountry.csv', skiprows=1, skipfooter=16, engine='python')
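The `skiprows` and `skipfooter` arguments drop the non-tabular lines that surround the table in the export, and `skipfooter` is only supported by pandas' Python parser, which is why `engine='python'` is passed. A minimal sketch on a made-up in-memory file (the title line, footer lines, and values below are assumptions for illustration, not the real file contents):

```python
import io
import pandas as pd

# Toy stand-in for the export: one title line, a header row,
# two data rows, and a two-line footer (all values are made up).
lines = [
    "Some export title line",
    "ROG3,Ctry,Population",
    "AA,Country A,1000",
    "BB,Country B,250",
    "Footer note 1",
    "Footer note 2",
]
raw = "\n".join(lines)

# skiprows=1 drops the title line; skipfooter=2 drops the trailing notes.
# skipfooter requires the Python parsing engine.
toy = pd.read_csv(io.StringIO(raw), skiprows=1, skipfooter=2, engine='python')
print(toy)
```

Only the header and the two data rows survive, so the frame has two rows and three columns.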

In [3]:
data.head(2)

Unnamed: 0,ROG3,Ctry,PeopleID3,ROP3,PeopNameAcrossCountries,PeopNameInCountry,Population,JPScale,LeastReached,ROL3,...,RegionCode,RegionName,ROG2,Continent,10_40Window,RaceCode,Latitude,Longitude,WorkersMin,WorkersMax
0,AF,Afghanistan,14372,107989.0,Afghan,Afghan,8207000.0,1,Y,prs,...,5,Central Asia,ASI,Asia,Y,CNT24f,31.15621,62.14612,165,170
1,AF,Afghanistan,19409,100096.0,Afshari,Afshari,13000.0,1,Y,azb,...,5,Central Asia,ASI,Asia,Y,MSY41a,34.44796,69.28976,1,2


### Quick plot of world distribution

In [4]:
# Arguments are centre latitude, centre longitude and initial zoom level
world_map = gmplot.GoogleMapPlotter(20.5937, 78.9629, 12)

In [5]:
# Add the points in chunks of 500 rows (34 chunks cover the first 17,000 rows)
for i in range(1,35):
    world_map.heatmap(data['Latitude'][500*(i-1):500*i],data['Longitude'][500*(i-1):500*i])

In [6]:
world_map.heatmap(data['Latitude'],data['Longitude'])

In [7]:
world_map.draw('World_Map.html')

### Distribution by countries

In [8]:
# Note: from plotly 4 onwards this module lives in the separate chart_studio package
import plotly.plotly as py
import plotly

In [9]:
# Read the key from a local file, closing the file afterwards
with open('PLOTLY_API_KEY','r') as key_file:
    PLOTLY_API_KEY = key_file.read().strip().strip('\'')

In [10]:
plotly.tools.set_credentials_file(username='JanakAJain', api_key=PLOTLY_API_KEY)

In [11]:
wdata = pd.DataFrame(data.groupby(['Ctry']).count()['ROG3'].sort_values(ascending=False))
wdata = wdata.reset_index()

In [12]:
wdata.head(2)

Unnamed: 0,Ctry,ROG3
0,India,2510
1,Papua New Guinea,885


In [13]:
# Adding a column to contain ISO-compliant 3-character (alpha-3) codes for countries

wdata['GCode'] = ''
for i, row in wdata.iterrows():
    try:
        wdata.loc[i,'GCode'] = countries.get(row['Ctry']).alpha3
    except KeyError:
        # Name not recognised by iso3166; leave the code blank for manual entry
        continue
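Country names that do not match the package's spelling end up with an empty code. One way to make the manual step explicit is to combine a primary lookup with a small table of overrides and then list whatever is still unmatched. The sketch below uses plain dictionaries as hypothetical stand-ins for `iso3166` and for the hand-entered corrections; `name_to_alpha3`, `manual_overrides`, and the toy rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical primary lookup (stands in for the iso3166 package)
name_to_alpha3 = {'India': 'IND', 'Nigeria': 'NGA', 'Myanmar': 'MMR'}

# Hypothetical manual overrides for spellings the primary lookup misses
manual_overrides = {'Myanmar (Burma)': 'MMR'}

toy = pd.DataFrame({'Ctry': ['India', 'Myanmar (Burma)', 'Nigeria', 'Atlantis']})

# Try the primary lookup first, then fall back to the overrides
primary = toy['Ctry'].map(name_to_alpha3)
fallback = toy['Ctry'].map(manual_overrides)
toy['GCode'] = primary.fillna(fallback)

# Names that still have no code need to be resolved by hand
unmatched = toy.loc[toy['GCode'].isna(), 'Ctry'].tolist()
print(toy)
print('Unmatched:', unmatched)
```

Listing the unmatched names up front keeps the manual corrections reproducible instead of living only in an edited CSV.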

In [14]:
# Note: Some country codes were inconsistent with the package's names. The details for these records 
# were manually entered. The resultant file is available in the data folder. 

In [15]:
wdata = pd.read_csv('data/geo_data_codes.csv')
wdata.head()

Unnamed: 0.1,Unnamed: 0,Ctry,ROG3,GCode
0,0,India,2510,IND
1,1,Papua New Guinea,885,PNG
2,2,Indonesia,779,IDN
3,3,Nigeria,544,NGA
4,4,China,544,CHN


### Function to plot the world maps

In [16]:
def plot_on_world_map(df, z='ROG3',title='Number of target groups'):
    # Plot column `z` of `df` as a choropleth; `df` must contain 'GCode' and 'Ctry'
    data = [ dict(
        type = 'choropleth',
        locations = df['GCode'],
        z = df[z],
        text = df['Ctry'],
        colorscale = [[0,"rgb(5, 10, 172)"],[0.35,"rgb(40, 60, 190)"],[0.5,"rgb(70, 100, 245)"],\
            [0.6,"rgb(90, 120, 245)"],[0.7,"rgb(106, 137, 247)"],[1,"rgb(220, 220, 220)"]],
        autocolorscale = False,
        reversescale = True,
        marker = dict(
            line = dict (
                color = 'rgb(180,180,180)',
                width = 0.5
            ) ),
        colorbar = dict(
            autotick = False,
            tickprefix = '',
            title = title),
      ) ]
    
    layout = dict(
    title = 'Joshua Project Target Areas - '+ z + '<br>Source:\
            <a href="https://joshuaproject.net/resources/datasets/1">\
            Data Available Here</a>',
    geo = dict(
        showframe = False,
        showcoastlines = False,
        projection = dict(
            type = 'Mercator'
            )
        )
    )
    
    fig = dict( data=data, layout=layout )
    
    return py.iplot( fig, validate=False, filename='d3-world-map' )

In [17]:
plot_on_world_map(wdata,'ROG3')

High five! You successfuly sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~JanakAJain/0 or inside your plot.ly account where it is named 'd3-world-map'


In [18]:
# Total population per country; select the column before aggregating
wdata = wdata.join(data.groupby('Ctry')[['Population']].sum(), on='Ctry', how='left')
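The per-country table is built by aggregating the people-group rows and joining the result back on `Ctry`. A minimal sketch of that pattern on made-up rows (the country names, codes, and populations are assumptions for illustration):

```python
import pandas as pd

# Toy people-group rows: two groups in country A, one in country B
toy = pd.DataFrame({
    'Ctry': ['A', 'A', 'B'],
    'ROG3': ['AA', 'AA', 'BB'],
    'Population': [100.0, 50.0, 10.0],
})

# Number of groups per country, largest first, with Ctry as a column
counts = (toy.groupby('Ctry')['ROG3']
             .count()
             .sort_values(ascending=False)
             .reset_index())

# Total population per country, indexed by Ctry
totals = toy.groupby('Ctry')[['Population']].sum()

# Left join keeps the row order of `counts` and matches on the Ctry column
summary = counts.join(totals, on='Ctry', how='left')
print(summary)
```

Because the join keeps the order of the left frame, sorting the aggregated frame before joining has no effect on the result.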

In [19]:
plot_on_world_map(wdata,'Population','Population')



In [20]:
### Mapping the countries with highest 

In [21]:
# Unweighted mean of JPScale and PercentAdherents across the people groups of each country
wdata = wdata.join(data.groupby('Ctry')[['JPScale','PercentAdherents']].mean(), on='Ctry', how='left')

In [22]:
plot_on_world_map(wdata,'JPScale')



#### Let's check the countries with top 'performing' missions where Christianity doesn't form a simple majority.  
Please note that in many of these countries it is possible for Christianity to still be the primary religion despite not being followed by more than 50% of the population.  
  
**12 out of 18** of these countries are in Africa.

In [23]:
wdata[(wdata['PercentAdherents'] <=50.00) & (wdata['JPScale'] >=3)]

Unnamed: 0.1,Unnamed: 0,Ctry,ROG3,GCode,Population,JPScale,PercentAdherents
3,3,Nigeria,544,NGA,191635230.0,3.441176,39.869982
9,9,Brazil,310,BRA,211097370.0,3.306452,43.550387
14,14,Philippines,200,PHL,103641180.0,3.715,43.8055
19,19,Tanzania,156,TZA,56815900.0,3.653846,47.094551
20,20,Myanmar (Burma),146,MMR,54766900.0,3.390411,35.845212
26,26,Ethiopia,113,ETH,104183400.0,3.327434,45.009327
27,27,Ghana,112,GHA,28568300.0,3.830357,44.04125
30,30,Kenya,111,KEN,48354500.0,3.522523,45.112324
46,46,South Sudan,78,SSD,12858800.0,3.705128,34.032051
58,58,Benin,65,BEN,11389200.0,3.030769,24.286


### Is there a relation between performance and percent of Christian adherents?

In [24]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

In [25]:
# Let us first create a copy of our dataset

reg_data = wdata.copy()

In [26]:
reg_data = reg_data.dropna()

In [27]:
y = reg_data['JPScale']

del reg_data['JPScale']

X = reg_data[['Population','PercentAdherents','ROG3']]

In [28]:
X_train , X_test , y_train , y_test = train_test_split(X,y)

In [29]:
X_train.shape , X_test.shape , y_train.shape , y_test.shape

((176, 3), (59, 3), (176,), (59,))

In [30]:
reg_model = LinearRegression().fit(X=X_train,y=y_train)

In [31]:
y_pred = reg_model.predict(X_test)

In [32]:
print('Score = ' + str(reg_model.score(X_test,y_test)*100) + '%')

Score = 71.9487655487%
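`LinearRegression.score` returns the coefficient of determination, R², rather than an accuracy, so the figure printed above reads as "roughly 72% of the variance in JPScale across the held-out countries is explained by the model". Because the split is random and no seed is set, the exact value changes between runs. A minimal sketch of the computation on made-up numbers (the helper `r_squared` is hypothetical and not part of scikit-learn):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)                 # predictions match exactly
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
reversed_fit = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])

print(perfect, baseline, reversed_fit)
```

A model that always predicts the mean scores 0, and a model worse than that scores below 0, which is why R² is not a percentage in the usual sense.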


In [33]:
reg_model.coef_

array([ -1.92687601e-09,   3.31158932e-02,   8.92252195e-04])

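#### The fitted coefficients suggest that JPScale (performance) is <font color='green'> positively associated </font>with both the number of target groups and the percentage of adherents, while the coefficient on population is slightly negative. These are regression coefficients on unscaled features, so their magnitudes are not directly comparable.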
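The coefficients above are expressed per unit of each feature (per person, per percentage point, per group), so a tiny coefficient does not by itself mean a small effect. One common way to compare them is to standardise the features before fitting. The sketch below does this with NumPy least squares on synthetic data; the data, the true slopes, and the use of `np.linalg.lstsq` instead of scikit-learn are assumptions for illustration, not the notebook's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic features on very different scales (made-up data)
population = rng.uniform(1e5, 1e9, n)
percent = rng.uniform(0, 100, n)

# Noiseless synthetic target with known raw slopes
raw_slopes = np.array([2e-9, 0.03])
X = np.column_stack([population, percent])
y = 1.0 + X @ raw_slopes

# Standardise each column to zero mean and unit standard deviation
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), Xz])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# coef[1:] is the change in y per one standard deviation of each feature
print('raw slopes:         ', raw_slopes)
print('standardised slopes:', coef[1:])
```

The raw slopes differ by seven orders of magnitude, yet after standardising each one equals the raw slope multiplied by that feature's standard deviation, which puts them on a comparable footing.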

--- 
### Let's look at India

In [34]:
india = data[data['Ctry'] == 'India']

In [35]:
india.groupby(['PrimaryReligion']).count()['ROG3'].sort_values(ascending = False)

PrimaryReligion
Hinduism            1800
Islam                380
Christianity         135
Other / Small         93
Buddhism              56
Unknown               45
Ethnic Religions       1
Name: ROG3, dtype: int64

In [36]:
india_top_christian_targets = india[(india['PrimaryReligion'] == 'Christianity') & (india['JPScale'] >= 4)]

In [37]:
india_top_c_lat = india_top_christian_targets['Latitude'].astype(float)
india_top_c_lon = india_top_christian_targets['Longitude'].astype(float)

In [38]:
india_top_c_map = gmplot.GoogleMapPlotter(26.1445, 91.7362, 4)

In [39]:
india_top_c_map.heatmap(india_top_c_lat, india_top_c_lon)

In [40]:
india_top_c_map.draw("India_Top_Christian_Targets.html")

In [41]:
india_top_christian_targets[['Latitude','Longitude']]  # These locations are in the bordering North Eastern areas.

Unnamed: 0,Latitude,Longitude
6032,25.674381,90.331111
6902,23.09445,93.188852
6904,22.980801,93.048139
6905,22.36849,93.0
6906,23.84523,92.879
6907,23.845,92.87


#### All these vulnerable groups belong to (perhaps the most distant) states in the North East.

---  
## References

1. https://en.wikipedia.org/wiki/Joshua_Project