In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from scipy import stats

### Analyzing Suicides in India by Gender

**Are men as likely to commit suicide as women?**

The data is shared by the National Crime Records Bureau (NCRB), Govt of India. 
To perform this analysis, we need to know the sex ratio in India, which is 940 females to 1000 males.

Let p denote the fraction of women in India.

In [2]:
# fraction of women in the population
p = 940/(940+1000)
p

0.4845360824742268

If there is no correlation between gender and suicide, then the sex ratio of people committing suicides should closely reflect that of the general population. 


In [3]:
df = pd.read_csv('Suicides in India 2001-2012.csv')

In [8]:
df.head(n=3)

Unnamed: 0,State,Year,Type_code,Type,Gender,Age_group,Total
0,A & N Islands,2001,Causes,Illness (Aids/STD),Female,0-14,0
1,A & N Islands,2001,Causes,Bankruptcy or Sudden change in Economic,Female,0-14,0
2,A & N Islands,2001,Causes,Cancellation/Non-Settlement of Marriage,Female,0-14,0


In [9]:
df.shape

(237519, 7)

In [11]:
df.tail()

Unnamed: 0,State,Year,Type_code,Type,Gender,Age_group,Total
237514,West Bengal,2012,Social_Status,Seperated,Male,0-100+,149
237515,West Bengal,2012,Social_Status,Widowed/Widower,Male,0-100+,233
237516,West Bengal,2012,Social_Status,Married,Male,0-100+,5451
237517,West Bengal,2012,Social_Status,Divorcee,Male,0-100+,189
237518,West Bengal,2012,Social_Status,Never Married,Male,0-100+,2658


In [13]:
# count the rows recorded for each gender (the suicide count of each row is in 'Total')
df['Gender'].value_counts()

Male      118879
Female    118640
Name: Gender, dtype: int64

The number of records for female suicides is slightly lower than the number of records for male suicides. But the number of females in the population is also lower than the number of males.
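
Note that `value_counts()` counts rows, while the number of suicides in each row is stored in the `Total` column. A minimal sketch of weighting by `Total` instead is shown here. It uses made-up toy rows (not the NCRB data) and assumes that each `Type_code` is a separate breakdown of the same deaths, so only one breakdown is summed:

```python
import pandas as pd

# Toy rows mimicking the dataset's columns; the values are invented for illustration
toy = pd.DataFrame({
    "Type_code": ["Causes", "Causes", "Social_Status", "Social_Status"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Total": [3, 5, 3, 5],
})

# Assumption: every Type_code re-describes the same suicides, so keep a single one
# to avoid counting the same deaths several times
one_breakdown = toy[toy["Type_code"] == "Causes"]

# Sum the per-row counts for each gender
totals_by_gender = one_breakdown.groupby("Gender")["Total"].sum()
female_fraction = totals_by_gender["Female"] / totals_by_gender.sum()
```

Whether a single `Type_code` really covers every suicide exactly once should be checked against the NCRB documentation before relying on such totals.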

* **Null Hypothesis (H0)**: Men and women are equally likely to commit suicide.
* **Alternate Hypothesis (H1)**: Men and women are not equally likely to commit suicide.<br>

If the null hypothesis is true, the fraction of women among people committing suicide should be the same as the fraction of women in the general population, p.

**p-value**

![Source: Stack Overflow](https://i.stack.imgur.com/idDTA.png)

#### Compute the p-value

In [16]:
h1_prop = df['Gender'].value_counts()['Female']/len(df) # observed fraction of female suicide records
h1_prop

0.49949688235467476

In [17]:
# let h0_prop be p
h0_prop = p

In [19]:
sigma_prop = np.sqrt((h0_prop * (1 - h0_prop))/len(df))
sigma_prop

0.0010254465276083747

In [20]:
z = (h1_prop - h0_prop)/sigma_prop
z

14.589546580591277
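
The steps above amount to a one-sample z-test for a proportion. As a sketch, write $\hat{p}$ for the observed female fraction (`h1_prop`), $p_0$ for the fraction expected under H0 (`h0_prop`), and $n$ for the number of rows (`len(df)`); these symbols are notation introduced here, not part of the original notebook. The statistic is then:

$$
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}
$$

With the values printed above, $z \approx (0.49950 - 0.48454) / 0.0010254 \approx 14.59$. The normal approximation behind this statistic assumes that the $n$ observations are independent.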

In [22]:
def pvalue(z):
    # two-sided p-value; abs(z) keeps the result valid when z is negative
    return 2 * (1 - stats.norm.cdf(abs(z)))

In [24]:
p_val = pvalue(z)
p_val

0.0
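
The printed `0.0` is a floating-point artifact: for a z-score this large, `stats.norm.cdf(z)` rounds to exactly 1.0, so the subtraction loses the tail probability. A minimal sketch of one way around this, assuming SciPy's survival function `stats.norm.sf` (and `stats.norm.logsf` for its logarithm) is an acceptable substitute for `1 - cdf`:

```python
from scipy import stats

z = 14.589546580591277  # z-score printed above

# 1 - cdf: the cdf rounds to 1.0 in double precision, so the result is exactly 0.0
naive = 2 * (1 - stats.norm.cdf(abs(z)))

# Survival function: computes the upper tail directly and keeps the tiny value
p_two_sided = 2 * stats.norm.sf(abs(z))

# Log of the one-sided tail, useful when even sf would underflow
log_p = stats.norm.logsf(abs(z))
```

The conclusion does not change, but `p_two_sided` is a tiny positive number rather than a literal zero.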

The p-value is numerically zero, far below any conventional significance level $\alpha$ (for example, 0.05). We can therefore reject the null hypothesis in favor of the alternate hypothesis, which is its negation.

**Men and women in India are not equally likely to commit suicide.**

Note that this test says nothing about whether men are more likely than women to commit suicide or vice versa. It only states that they are not equally likely.