In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from math import sqrt
from collections import OrderedDict
import seaborn as sns


# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))
world = pd.read_csv('../input/global.csv')
region = pd.read_csv('../input/regional.csv')
national = pd.read_csv('../input/national.csv')

# Any results you write to the current directory are saved as output.

In [None]:
list(national['year'].unique())

# Orthodox Judaism on global rise?

The development of the different groups in Judaism was examined from 1945 to 2010. "Orthodox" Judaism was observed to be the fastest-growing group, while the number of followers classified as "other" decreased. Future trends were estimated using linear regression and suggested that within less than two decades the majority of followers of Judaism would be classified as "orthodox". 
A follow-up analysis of trends in single countries indicated that these changes could instead reflect immigration to Eretz Israel. It seemed that different classifications had been used for different countries, as the numbers of followers of "orthodox" and "other" Judaism were strongly correlated with the overall number of followers in Eretz Israel and in the Diaspora (excluding USA and Canada), respectively.

### Data set
The Correlates of War: World Religions data set was analyzed for trends in religious affiliation. It was chosen because it was the only data set on Kaggle representing global data on religious affiliation. Several numbers were spot-checked against Wikipedia articles and the Berman Jewish Databank. The data set assumes that there were 14.023 million Jews living in 2015; according to the Berman Jewish Databank the figure should be 14.310 million, a difference of about 2%. In terms of total numbers, the data set is therefore regarded as accurate, and it is highly interesting.
http://www.jewishdatabank.org/Studies/details.cfm?StudyID=803 

## Motivation
After the Enlightenment in Europe and the opening of European society over the last centuries, reform movements in Judaism emerged and multitudes left the orthodox world. Shortly after, the destruction of European Jewry followed during the Second World War. The European refugees fled wherever they could, struggling to survive and adapt to new circumstances. All signs pointed to decline, as in 1945 orthodox Judaism had only a few hundred thousand followers. At the same time reform movements gained even more popularity. Is Judaism ultimately undergoing a global transformation, abandoning the old ways of life?
In 1948 the State of Israel was founded, serving as a safe harbor for Jews suffering from persecution around the world. Subsequently, new movements in the orthodox world began focusing on educating non-religious and reform Jews and bringing them back to the original way of life.

## Research questions

How is the partitioning of followers of Judaism changing over time?
Or, asked differently: are the Jews becoming frum?

Is there any discernible global pattern showing higher popularity of particular groups in different areas or countries?
Or, asked differently: where are they becoming frum?

In [None]:
fig, axis = plt.subplots(figsize=(20,10))
#axis.yaxis.grid(True)
#axis.xaxis.grid(True)

axis.set_xlim(1945, 2010)
axis.set_ylim(0,7000000)
plt.plot(world['year'].values, world['judaism_orthodox'].values)

axis.set_title('Rise in followers of orthodox Judaism since 1945',fontsize=25)
axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('number of orthodox Jews',fontsize=20)
Y = world['year'].values
X1 = world['judaism_orthodox'].values
axis.fill_between(Y, 0, X1,facecolor='black', alpha=0.9)

line_X1 = axis.plot(Y, X1, label = "orthodox",color='black')


### Constant rise since 1945
The number of followers of orthodox Judaism rose steadily from less than one million in 1945 to over 6 million in 2010.

This could be attributed to the overall growth of Judaism. 

In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)

axis.set_title('Number of followers of the different streams of Judaism',fontsize=25)
axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('number of followers',fontsize=20)
axis.set_xlim(1945, 2010)

Y = world['year']
X1 = world['judaism_orthodox']
X2 = world['judaism_conservative']
X3 = world['judaism_reform']
X4 = world['judaism_other']
X5 = world['judaism_all']

line_X1 = axis.plot(Y, X1, label = "orthodox", linewidth=10, linestyle="-", c="black")
line_X2 = axis.plot(Y, X2, label = "conservative", linewidth=10, linestyle="-", c="blue")
line_X3 = axis.plot(Y, X3, label = "reform",linewidth=10, linestyle="-", c="green")
line_X4 = axis.plot(Y, X4, label = "other",linewidth=10, linestyle="-", c="orange")
line_X5 = axis.plot(Y, X5, label = "all", linewidth =10, linestyle="-", c="purple")


plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()

### Total number of followers of Judaism constant

It appears that over the observed period the total number of followers of Judaism was roughly constant. Of all streams of Judaism, "orthodox" seems to be the only one gaining followers.

In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)

axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('percentage of followers',fontsize=20)
axis.set_title('Proportional distribution of the different streams in Judaism',fontsize=25)


Y = world['year']
X1 = world['judaism_orthodox']/ world['judaism_all']
X2 = world['judaism_conservative']/ world['judaism_all']
X3 = world['judaism_reform']/ world['judaism_all']
X4 = world['judaism_other']/ world['judaism_all']
X5 = world['judaism_all']/ world['judaism_all']

line_X1 = axis.plot(Y, X1, label = "orthodox", linewidth=10, linestyle="-", c="black",
         solid_capstyle="round")
line_X2 = axis.plot(Y, X2, label = "conservative", linewidth=10, linestyle="-", c="blue",
         solid_capstyle="round")
line_X3 = axis.plot(Y, X3, label = "reform", linewidth=10, linestyle="-", c="green",
         solid_capstyle="round")
line_X4 = axis.plot(Y, X4, label = "other", linewidth=10, linestyle="-", c="orange",
         solid_capstyle="round")
line_X5 = axis.plot(Y, X5, label ="all", linewidth=5, linestyle="-", c="purple",
         solid_capstyle="round")


plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()

In [None]:
print(world['judaism_all'].tail(1))
print((14.310-14.023)/14.310)

In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)
axis.set_xlim(1945, 2010)
axis.set_ylim(0,1)

axis.set_title('Percentage distribution of the different streams in Judaism',fontsize=25)
axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('percentage of followers',fontsize=20)

#line_ort = axis.plot(year, ort, label = "real orthodox")

Y = world['year']
X1 = world['judaism_orthodox']/ world['judaism_all']
X2 = world['judaism_conservative']/ world['judaism_all']
X3 = world['judaism_reform']/ world['judaism_all']
X4 = world['judaism_other']/ world['judaism_all']
X5 = world['judaism_all']/ world['judaism_all']
axis.stackplot(Y.values.flatten('F'), X1.values.flatten('F'), X2.values.flatten('F'), X3.values.flatten('F'), X4.values.flatten('F'),colors=['black','blue','green','orange'],labels=['orthodox','conservative','reform','other'])

plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()

In [None]:
labels = 'orthodox', 'conservative', 'reform', 'other'
size1945 = [X1[0], X2[0], X3[0], X4[0]]
size1975 = [X1[6],X2[6],X3[6],X4[6]]
size2010 = [X1[13], X2[13],X3[13],X4[13]]
explode = (0, 0, 0, 0)  # all zeros: no slice is pulled out of the pie

fig1, (ax1,ax2,ax3) = plt.subplots(1,3,figsize=[30,10])

ax1.set_title('1945',fontsize=25)
ax1.pie(size1945, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

ax2.set_title('1975',fontsize=25)
ax2.pie(size1975, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

ax3.set_title('2010',fontsize=25)
ax3.pie(size2010, explode=explode, labels=labels,textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
ax3.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.suptitle('Proportional distribution of followers of different streams of Judaism at selected timepoints ',fontsize=30)
plt.show()

### Proportional Distribution
The proportional distribution of followers across the different streams of Judaism shows a clear proportional rise of orthodox Judaism over the other streams, particularly at the expense of "other" Judaism.

In [None]:
Y = world['year']
X1 = world['judaism_orthodox']
X2 = world['judaism_conservative']
X3 = world['judaism_reform']
X4 = world['judaism_other']
X5 = world['judaism_all']

year = Y.values.reshape(-1,1)
ort = X1.values.reshape(-1,1)
con = X2.values.reshape(-1,1)
ref = X3.values.reshape(-1,1)
oth = X4.values.reshape(-1,1)
together = X5.values.reshape(-1,1)

## Prediction with linear regression
To estimate how the proportional distribution will develop in the future, linear regressions were fitted to the current data. Other generalized linear models might be better suited, but linear regression was chosen for simplicity.
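
The fit-and-score step can be sketched without scikit-learn. This is a minimal sketch assuming ordinary least squares on a single predictor (the year); the follower counts are made-up illustration values, not taken from global.csv, and `fit_line`, `predict` and `normalized_rmse` are hypothetical helper names. The notebook scores each model on a held-out 25% split, whereas the sketch scores on the fitted points to stay short.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, xs):
    """Evaluate the fitted line at each x."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

def normalized_rmse(y_true, y_pred):
    """Root mean squared error divided by the range of the observed values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return sqrt(mse) / (max(y_true) - min(y_true))

# Hypothetical follower counts, for illustration only
years = [1945, 1950, 1955, 1960, 1965, 1970]
followers = [100, 210, 290, 405, 500, 590]

model = fit_line(years, followers)
fitted = predict(model, years)
print(round(normalized_rmse(followers, fitted), 3))  # fit quality on known years
print(round(predict(model, [2000])[0]))              # extrapolation beyond the data
```

Dividing the RMSE by the observed range makes the error comparable across streams whose absolute follower counts differ by orders of magnitude.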

In [None]:
ort_train, ort_test, y_train, y_test = train_test_split(ort, year, test_size=0.25, random_state=665)
ort_regressor = LinearRegression()
ort_regressor.fit(y_train,ort_train)
ort_prediction = ort_regressor.predict(y_test)
RMSE = sqrt(mean_squared_error(y_true = ort_test, y_pred = ort_prediction))
print("{0:.2f}".format(RMSE/(np.amax(ort_test)-np.amin(ort_test))), " normalized RMSE")
print("{0:.2f}".format(ort_test.std()/(np.amax(ort_test)-np.amin(ort_test))), " normalized STD test data")
print("{0:.2f}".format(ort.std()/(np.amax(ort)-np.amin(ort))), " normalized STD all data")


In [None]:
con_train, con_test, y_train, y_test = train_test_split(con, year, test_size=0.25, random_state=665)
con_regressor = LinearRegression()
con_regressor.fit(y_train,con_train)
con_prediction = con_regressor.predict(y_test)
RMSE = sqrt(mean_squared_error(y_true = con_test, y_pred = con_prediction))
print("{0:.2f}".format(RMSE/(np.amax(con_test)-np.amin(con_test))), " normalized RMSE")
print("{0:.2f}".format(con_test.std()/(np.amax(con_test)-np.amin(con_test))), " normalized STD test data")
print("{0:.2f}".format(con.std()/(np.amax(con)-np.amin(con))), " normalized STD all data")


In [None]:
ref_train, ref_test, y_train, y_test = train_test_split(ref, year, test_size=0.25, random_state=665)
ref_regressor = LinearRegression()
ref_regressor.fit(y_train,ref_train)
ref_prediction = ref_regressor.predict(y_test)
RMSE = sqrt(mean_squared_error(y_true = ref_test, y_pred = ref_prediction))
print("{0:.2f}".format(RMSE/(np.amax(ref_test)-np.amin(ref_test))), " normalized RMSE")
print("{0:.2f}".format(ref_test.std()/(np.amax(ref_test)-np.amin(ref_test))), " normalized STD test data")
print("{0:.2f}".format(ref.std()/(np.amax(ref)-np.amin(ref))), " normalized STD all data")


In [None]:
oth_train, oth_test, y_train, y_test = train_test_split(oth, year, test_size=0.25, random_state=665)
oth_regressor = LinearRegression()
oth_regressor.fit(y_train,oth_train)
oth_prediction = oth_regressor.predict(y_test)
RMSE = sqrt(mean_squared_error(y_true = oth_test, y_pred = oth_prediction))
print("{0:.2f}".format(RMSE/(np.amax(oth_test)-np.amin(oth_test))), " normalized RMSE")
print("{0:.2f}".format(oth_test.std()/(np.amax(oth_test)-np.amin(oth_test))), " normalized STD test data")
print("{0:.2f}".format(oth.std()/(np.amax(oth)-np.amin(oth))), " normalized STD all data")


In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)

axis.set_title('Linear Regression Model prediction and real data of followers of Judaism',fontsize=25)
axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('number of followers',fontsize=20)


line_o_p = axis.plot(year, ort_regressor.predict(year), label = "predicted orthodox", linewidth=16, linestyle="-", c="black",
         solid_capstyle="round", alpha=0.5)
line_ort = axis.plot(year, ort, label = "real orthodox", linewidth=8, linestyle="-.", c="black",
         solid_capstyle="round")


line_c_p = axis.plot(year, con_regressor.predict(year), label = "predicted conservative", linewidth=16, linestyle="-", c="green",
         solid_capstyle="round", alpha = 0.5)
line_con = axis.plot(year, con, label = "real conservative", linewidth=8, linestyle="-.", c="green",
         solid_capstyle="round")


line_r_p = axis.plot(year, ref_regressor.predict(year), label = "predicted reform", linewidth=16, linestyle="-", c="blue",
         solid_capstyle="round",alpha=0.5)
line_ref = axis.plot(year, ref, label = "real reform", linewidth=8, linestyle="-.", c="blue",
         solid_capstyle="round")


line_t_p = axis.plot(year, oth_regressor.predict(year), label = "predicted other", linewidth=16, linestyle="-", c="orange",
         solid_capstyle="round",alpha=0.5)
line_oth = axis.plot(year, oth, label = "real other", linewidth=8, linestyle="-.", c="orange",
         solid_capstyle="round")


line_a_p = axis.plot(year,np.sum(np.array([ort_regressor.predict(year),con_regressor.predict(year),ref_regressor.predict(year),oth_regressor.predict(year)]),axis=0), label ="predicted sum", linewidth=16, linestyle="-", c="purple",
         solid_capstyle="round", alpha=0.5)
line_all = axis.plot(year,together, label ="real sum", linewidth=8, linestyle="-.", c="purple",
         solid_capstyle="round")


plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()


In [None]:
future = np.array([1945,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015,2020,2025,2030,2035,2040,2045,2050,2055,2060,2065]).reshape(-1,1)
ort_future = ort_regressor.predict(future)
con_future = con_regressor.predict(future)
ref_future = ref_regressor.predict(future)
oth_future = oth_regressor.predict(future)
array_all_future = np.array([ort_future,con_future,ref_future,oth_future])
sum_all = np.sum(array_all_future, axis = 0 )
print(array_all_future.shape)
print(sum_all.shape)
print(future.shape)


In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)

axis.set_title('Predicted future numbers of followers of the different streams of Judaism',fontsize=25)
axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('number of followers',fontsize=20)

#line_ort = axis.plot(year, ort, label = "real orthodox")
line_o_p = axis.plot(future, ort_future, label = "predicted orthodox",linewidth=16, linestyle="-", c="black",
         solid_capstyle="round",alpha=0.5)

#line_con = axis.plot(year, con, label = "real conservative")
line_c_p = axis.plot(future, con_future, label = "predicted conservative",linewidth=16, linestyle="-", c="green",
         solid_capstyle="round",alpha=0.5)

#line_ref = axis.plot(year, ref, label = "real reform")
line_r_p = axis.plot(future, ref_future, label = "predicted reform",linewidth=16, linestyle="-", c="blue",
         solid_capstyle="round",alpha=0.5)

#line_oth = axis.plot(year, oth, label = "real other")
line_t_p = axis.plot(future, oth_future, label = "predicted other",linewidth=16, linestyle="-", c="orange",
         solid_capstyle="round",alpha=0.5)

line_t_p = axis.plot(future, sum_all, label = "predicted all",linewidth=16, linestyle="-", c="purple",
         solid_capstyle="round",alpha=0.5)


plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()

In [None]:
orthodox = ort_future/sum_all
conservative = con_future/sum_all
reform = ref_future/sum_all
others = oth_future/sum_all

In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)

axis.set_title('Prediction of proportional distribution of the different streams in Judaism',fontsize=25)

axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('percentage of followers',fontsize=20)

#line_ort = axis.plot(year, ort, label = "real orthodox")
line_o_p = axis.plot(future, orthodox, label = "predicted orthodox", linewidth=16, linestyle="-", c="black",
         solid_capstyle="round",alpha=0.5)

#line_con = axis.plot(year, con, label = "real conservative")
line_c_p = axis.plot(future, conservative, label = "predicted conservative", linewidth=16, linestyle="-", c="green",
         solid_capstyle="round",alpha=0.5)

#line_ref = axis.plot(year, ref, label = "real reform")
line_r_p = axis.plot(future, reform, label = "predicted reform", linewidth=16, linestyle="-", c="blue",
         solid_capstyle="round",alpha=0.5)

#line_oth = axis.plot(year, oth, label = "real other")
line_t_p = axis.plot(future, others, label = "predicted other", linewidth=16, linestyle="-", c="orange",
         solid_capstyle="round",alpha=0.5)

line_t_p = axis.plot(future, sum_all/sum_all, label = "predicted all", linewidth=8, linestyle="-", c="purple",
         solid_capstyle="round",alpha=0.5)



plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()

In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)
axis.set_ylim(0,1)
axis.set_xlim(1945,2065)
axis.set_title('Prediction of percentage distribution of the different streams in Judaism',fontsize=25)
axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('percentage of followers',fontsize=20)

#line_ort = axis.plot(year, ort, label = "real orthodox")


axis.stackplot(future.flatten(), orthodox.flatten(), conservative.flatten(), reform.flatten(), others.flatten(),colors=['black','blue','green','orange'],labels=['orthodox','conservative','reform','other'])

plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()

In [None]:
(2065-1945)/8



In [None]:

def makeInt(n):
    res = []
    for i in n:
        res.append(int(100 * i))
    return res

labels = 'orthodox', 'conservative', 'reform', 'other'
size1945 = makeInt([orthodox[0],conservative[0],reform[0],others[0]])
size1960 = makeInt([orthodox[3],conservative[3],reform[3],others[3]])
size1975 = makeInt([orthodox[6],conservative[6],reform[6],others[6]])
size1990 = makeInt([orthodox[9],conservative[9],reform[9],others[9]])
size2005 = makeInt([orthodox[12],conservative[12],reform[12],others[12]])
size2020 = makeInt([orthodox[15],conservative[15],reform[15],others[15]])
size2035 = makeInt([orthodox[18],conservative[18],reform[18],others[18]])
size2050 = makeInt([orthodox[21],conservative[21],reform[21],others[21]])
size2065 = makeInt([orthodox[24],conservative[24],reform[24],others[24]])

explode = (0, 0, 0, 0)  # all zeros: no slice is pulled out of the pie

fig, axes = plt.subplots(nrows=3,ncols= 3, figsize=(30,20))

axes[0, 0].set_title('1945',fontsize=25)
axes[0, 0].pie(size1945, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[0, 0].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

axes[0, 1].set_title('1960',fontsize=25)
axes[0, 1].pie(size1960, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[0, 1].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

axes[0, 2].set_title('1975',fontsize=25)
axes[0, 2].pie(size1975, explode=explode, labels=labels,textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[0, 2].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

axes[1, 0].set_title('1990',fontsize=25)
axes[1, 0].pie(size1990, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[1, 0].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

axes[1, 1].set_title('2005',fontsize=25)
axes[1, 1].pie(size2005, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[1,1].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

axes[1, 2].set_title('2020',fontsize=25)
axes[1, 2].pie(size2020, explode=explode, labels=labels,textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[1, 2].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
axes[2, 0].set_title('2035',fontsize=25)
axes[2, 0].pie(size2035, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[2, 0].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

axes[2, 1].set_title('2050',fontsize=25)
axes[2, 1].pie(size2050, explode=explode, labels=labels, textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[2,1].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

axes[2, 2].set_title('2065',fontsize=25)
axes[2, 2].pie(size2065, explode=explode, labels=labels,textprops={'fontsize':18},
        shadow=True, startangle=90,colors=['black','blue','green','orange'])
axes[2, 2].axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.suptitle('Estimated proportional distribution of followers of different streams of Judaism at selected timepoints ',fontsize=30)
plt.show()

### Orthodox Judaism to become dominant stream

It appears that "orthodox" Judaism will be the major stream in Judaism in the future. According to the prediction, around 2030 more than half of the followers of Judaism are expected to follow "orthodox" Judaism.
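
How such a crossover year follows from fitted lines can be sketched as below. Because each share is one line divided by the sum of all four lines, the share itself is not linear in the year. The slopes and intercepts here are hypothetical placeholders, not the coefficients fitted in this notebook, and `share` and `first_majority_year` are illustrative helpers.

```python
def share(lines, name, year):
    """Share of one stream: its fitted line divided by the sum of all lines."""
    values = {k: slope * year + intercept for k, (slope, intercept) in lines.items()}
    return values[name] / sum(values.values())

def first_majority_year(lines, name, years):
    """Return the first year in which the stream's share exceeds 50%, else None."""
    for year in years:
        if share(lines, name, year) > 0.5:
            return year
    return None

# Hypothetical (slope, intercept) pairs in millions, with the year counted from 1945
lines = {
    "orthodox": (0.08, 1.2),
    "conservative": (0.0, 1.5),
    "reform": (0.0, 1.5),
    "other": (-0.08, 9.8),
}
years_since_1945 = range(0, 125, 5)

offset = first_majority_year(lines, "orthodox", years_since_1945)
print(1945 + offset if offset is not None else "no majority in range")
```

Scanning the five-year grid mirrors the resolution of the data; a finer grid would only be meaningful if the underlying counts were available at that resolution.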

### "Other" Judaism to disappear

"Other" Judaism seems to basically disappear while "reform" and "conservative" streams keep the same percentage of followers out of all followers of Judaism.



## Are there local differences in the numbers of followers according to country?

After observing general trends in the global data the question arises how those trends are manifested in the data set listing the followers per country. 
Can we observe regional centers for the different streams? 
Do we see changes inside single countries over time?

In [None]:
national_jews = national[['year','state','judaism_orthodox','judaism_conservative','judaism_reform','judaism_other','judaism_all']]
national_jews['year'].unique().shape

In [None]:
np.sum(national_jews['judaism_all']>10000)

In [None]:
national_jews['state'][national_jews['judaism_all']>10000].unique()

### Nations with more than 10000 followers of Judaism at any given point in time

#### North America

* 'United States of America'
* 'Canada'
* 'Mexico'

#### South America

* 'Colombia'
* 'Venezuela'
* 'Brazil'
* 'Chile'
* 'Argentina'
* 'Uruguay'

#### Europe

* 'United Kingdom'
* 'Netherlands'
* 'Belgium'
* 'France'
* 'Switzerland'
* 'Spain'
* 'Germany'
* 'German Federal Republic'
* 'Poland'
* 'Austria'
* 'Hungary'
* 'Czechoslovakia'
* 'Italy'
* 'Yugoslavia'
* 'Sweden'
* 'Bulgaria'
* 'Romania'

#### Central Eurasia

* 'Russia'
* 'Latvia'
* 'Ukraine'
* 'Belarus'
* 'Georgia'
* 'Azerbaijan'
* 'Moldova'
* 'Uzbekistan'
* 'Kazakhstan'

#### Africa

* 'Ethiopia'
* 'Zimbabwe'
* 'South Africa'
* 'Morocco'
* 'Algeria'
* 'Tunisia'

#### Middle East

* 'Iran'
* 'Turkey'
* 'Iraq'
* 'Egypt'
* 'Syria'
* 'Israel'
* 'Yemen Arab Republic'

#### Other 

* 'India'
* 'Australia'

In [None]:
nations = national_jews[national_jews['state'].isin(national_jews['state'][national_jews['judaism_all']>10000].unique())]

In [None]:
len(nations['state'].unique())

In [None]:
nations.shape

In [None]:
nations[nations['state'].str.contains('France')]

### Diaspora has only "other" Judaism

In the data set that breaks followers down by country, it appears that most countries have all followers of Judaism listed as "other".
'France' was picked above as an example to showcase this finding. The findings above, particularly the decrease in "other" Judaism, could therefore be caused by this classification artifact rather than by real changes in followers.

To probe this suspicion, all countries with more than 1 follower of "orthodox", "reform" or "conservative" Judaism were printed. The only three countries with any non-"other" followers of Judaism were the USA, Canada and Israel.

In [None]:
nations[(nations['judaism_conservative']>1) | (nations['judaism_orthodox']>1) | (nations['judaism_reform']>1)]

The State of Israel has approximately 8.8 million citizens, approximately 75% of whom are Jewish. The number of Jewish Israelis, including secular Jews, was therefore assumed to be 6.6 million.
https://en.wikipedia.org/wiki/Demographics_of_Israel

In [None]:
8.8*0.75

If 5.5 million out of 6.6 million Israeli Jews are "orthodox", about 84% of Israeli Jews would be classified as "orthodox".

In [None]:
5517567/6600000

## Accuracy of national data

A preliminary analysis of the data set from national.csv has raised several concerns. 
Two main shortcomings were observed: 

### Diaspora Jews are classified as "other"

All countries except for Israel, the USA and Canada have all Jews classified as "other". These three are therefore the only countries with any distinction between the different groups in Judaism. This, of course, does not reflect reality: countries like Belgium, England and Switzerland have relatively large orthodox communities. In Israel, on the other hand, no Jews are classified as "other".

### Too high numbers for "orthodox" in Eretz Israel
According to a report on Ynet citing findings from the Central Bureau of Statistics of the State of Israel (https://www.ynetnews.com/articles/0,7340,L-3890330,00.html), of all Jewish Israelis above age 20

* 42% refer to themselves as secular
* 25% as traditional
* 13% as traditional religious
* 12% as religious
* 8% as haredi (ultra orthodox)

The figure of 5 517 567 orthodox Jews out of a population of approximately 6.6 million Jewish Israelis today seems too high. It would indicate that more than 80% of Jewish Israelis were orthodox in 2010, which is inconsistent with the literature (and reality).
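
The mismatch can be made concrete with a little arithmetic. This is a rough sketch: which self-identification categories count as "orthodox" is an assumption made here for illustration (the survey does not map directly onto the data set's classification), so three increasingly generous readings are shown.

```python
# Figures quoted in this notebook: data set count and estimated Jewish population of Israel
orthodox_in_dataset = 5_517_567
jewish_israelis = 6_600_000
dataset_share = orthodox_in_dataset / jewish_israelis

# Survey shares (percent) of Jewish Israelis above age 20, as listed in this section
survey = {
    "secular": 42,
    "traditional": 25,
    "traditional religious": 13,
    "religious": 12,
    "haredi": 8,
}

# Assumption: three possible readings of "orthodox", from narrow to generous
readings = {
    "haredi only": ["haredi"],
    "religious + haredi": ["religious", "haredi"],
    "traditional religious + religious + haredi": [
        "traditional religious", "religious", "haredi"
    ],
}

print(f"data set: {dataset_share:.1%}")
for name, groups in readings.items():
    print(f"{name}: {sum(survey[g] for g in groups)}%")
```

Even the most generous of these readings stays far below the share implied by the data set.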

### Changes in religious affiliation could represent immigration to Israel

The global analysis showed a steady increase in followers of "orthodox" Judaism and a decrease in followers of "other" Judaism. This could instead reflect the immigration of Jews to Eretz Israel. Since all diaspora Jews except for Canadian and American Jews are counted as "other" and 80% of Israeli Jews are counted as "orthodox", the decrease in "other" could stem from emigration from all countries but Israel, the USA and Canada, and the increase in "orthodox" could stem from immigration to Eretz Israel. The above analysis may therefore show something different than intended.

### Investigating
To investigate this suspicion further, the total number of followers of Judaism in the diaspora was correlated with the global number of "other" followers of Judaism. In this analysis, diaspora Jews means all Jews outside Eretz Israel, the USA and Canada.

In [None]:
national_jews.head()

In [None]:
diaspora = national_jews[(national_jews['state'] != 'Israel') & (national_jews['state'] != 'United States of America') & (national_jews['state'] != 'Canada')]

In [None]:
# Keep only the columns needed for the yearly totals
# (taking a copy avoids pandas' SettingWithCopyWarning on the filtered frame)
diaspora = diaspora[['year', 'judaism_all']].copy()

In [None]:
diaspora.head()

In [None]:
# Sum the followers of Judaism over all diaspora countries for each year.
# groupby sorts by year, so the totals come out in chronological order.
diaspora_dict = diaspora.groupby('year')['judaism_all'].sum().to_dict()
print(diaspora_dict)

In [None]:
print(diaspora_dict)
whole_diaspora = pd.Series(diaspora_dict).values.reshape(-1,1)
print(whole_diaspora)

In [None]:
year = world['year'].values.reshape(-1,1)
ort = world['judaism_orthodox'].values.reshape(-1,1)
oth = world['judaism_other'].values.reshape(-1,1)

In [None]:
israeli_all = national_jews['judaism_all'][national_jews['state']=='Israel'].values.reshape(-1,1)
israeli_all.shape

In [None]:
fig, axis = plt.subplots(figsize=(20,10))
# Grid lines, Xticks, Xlabel, Ylabel

axis.yaxis.grid(True)
axis.xaxis.grid(True)

axis.set_title('Number of diaspora and Israeli Jews correspond with "other" and "orthodox" respectively',fontsize=25)
axis.set_xlabel('year',fontsize=20)
axis.set_ylabel('number of followers',fontsize=20)

#line_ort = axis.plot(year, ort, label = "real orthodox")
line_who = axis.plot(year, whole_diaspora, label = "diaspora jews", linewidth=10, linestyle="-", c="#994d00",
         solid_capstyle="round")
line_oth = axis.plot(year, oth, label = "global other", linewidth=10, linestyle="-", c="orange",
         solid_capstyle="round")
line_ort = axis.plot(year,ort, label = "global orthodox", linewidth=10, linestyle="-", c="black",
         solid_capstyle="round")
line_isr = axis.plot(year[1:],israeli_all, label = "israeli jews", linewidth=10, linestyle="-", c="#0052cc",
         solid_capstyle="round")


plt.legend(bbox_to_anchor=(1.05, 1), loc=2,
           ncol=1,prop={'size': 18}, borderaxespad=0.)
plt.show()

The above plot shows 

* the decrease in global "other" Judaism corresponds with the decrease in diaspora Judaism

* the increase in global "orthodox" Judaism corresponds with the increase in Israeli Judaism

The gap between Israeli Jews and "global orthodox" consists of the "orthodox" in Canada and the USA and the non-"orthodox" Israeli Jews.
The gap between diaspora Jews and "global other" consists of the "other" in Canada and the USA.
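
These two gaps can be written out explicitly. The decomposition below is a sketch that assumes, as found earlier in this notebook, that only Israel, the USA and Canada report any non-"other" followers and that Israel reports no "other" followers; the symbols are introduced here for illustration only.

$$
\begin{aligned}
\text{orthodox}_{\text{global}} - \text{all}_{\text{Israel}}
  &= \text{orthodox}_{\text{USA+Canada}} - \text{non-orthodox}_{\text{Israel}} \\
\text{other}_{\text{global}} - \text{all}_{\text{diaspora}}
  &= \text{other}_{\text{USA+Canada}}
\end{aligned}
$$

The first line follows from splitting the global "orthodox" count into its Israeli and North American parts and the Israeli total into "orthodox" and non-"orthodox" parts; the second follows because every diaspora follower is counted as "other".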

In [None]:
to_correlate=pd.DataFrame(np.hstack((whole_diaspora[1:],oth[1:],israeli_all,ort[1:])),columns=['Diaspora','"other"','Israeli','"orthodox"'])
f, ax = plt.subplots(figsize=(10, 8))
ax.set_title('Correlation ',fontsize=25)


corr = to_correlate.corr()
# np.bool was removed in NumPy 1.24; the builtin bool is equivalent here
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool), cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax)

# Enlarge the tick labels on both axes and keep the y labels horizontal
ax.tick_params(axis='both', labelsize=16)
ax.tick_params(axis='y', labelrotation=0)

## Correlation
To statistically verify the correspondence between diaspora Jews and the "other" classification, and between Israeli Jews and the "orthodox" classification, correlations were computed. They showed

* Israeli Jews correlate with the "orthodox" classification
* diaspora Jews correlate with the "other" classification
* Israeli Jews correlate negatively with diaspora Jews
* the "orthodox" classification correlates negatively with the "other" classification
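
The coefficient behind each cell of the heatmap is the Pearson correlation, which is what `DataFrame.corr()` computes by default. A minimal sketch in plain Python, using made-up series (two rising, one falling) rather than the notebook's data; `pearson` is a hypothetical helper name.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical series for illustration only
rising = [1.0, 2.1, 2.9, 4.2, 5.0]
also_rising = [0.5, 1.4, 2.6, 3.3, 4.8]
falling = [9.8, 8.1, 7.2, 5.9, 5.1]

print(round(pearson(rising, also_rising), 2))  # close to +1: the series move together
print(round(pearson(rising, falling), 2))      # close to -1: the series move in opposite directions
```

Any two series with steady trends will correlate strongly, so a high coefficient here indicates co-movement over time rather than, by itself, a causal link.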


# Conclusion

A change in Judaism from "other" to "orthodox" is observable in the data, which can be attributed to either

* Jews becoming frum
    or
* Jews moving to Eretz Israel

With the data set as it is right now, the change from "other" to "orthodox" seems to stem from migration and from differences in classification between countries. The data set might therefore not be fully suited to answer the question whether "Jews turn frum".

What was found instead is that Jews are leaving the diaspora and emigrating to Eretz Israel.



## Possible future analysis

It might be interesting to analyse the migration patterns of the followers of Judaism. Following up on the finding that, on a global scale, the followers of Judaism seem to migrate to Israel, it would be interesting to see whether and how this trend is present in single countries. This could be put in further context with events from world history such as the collapse of the Soviet Union.