# Gender Inequality Index (GII)

The GII is an inequality index. It measures gender inequalities in three important aspects of human development—reproductive health, measured by maternal mortality ratio and adolescent birth rates; empowerment, measured by proportion of parliamentary seats occupied by females and proportion of adult females and males aged 25 years and older with at least some secondary education; and economic status, expressed as labour market participation and measured by labour force participation rate of female and male populations aged 15 years and older. The GII is built on the same framework as the IHDI—to better expose differences in the distribution of achievements between women and men. It measures the human development costs of gender inequality. Thus the higher the GII value the more disparities between females and males and the more loss to human development.

The GII sheds new light on the position of women in 160 countries; it yields insights into gender gaps in major areas of human development. The component indicators highlight areas in need of critical policy intervention, and the index stimulates proactive thinking and public policy to overcome systematic disadvantages of women.


![GII](http://hdr.undp.org/sites/default/files/gii.png "GII")
Ref: http://hdr.undp.org/en/content/gender-inequality-index-gii

# GII - India

Violence against women in India is actually more present than it may appear at first glance, as many expressions of violence are not considered crimes, or may otherwise go unreported or undocumented due to certain Indian cultural values and beliefs. These reasons all contribute to India's Gender Inequality Index rating of 0.524 in 2017, putting it in the bottom 20% of ranked countries for that year.

Ref: https://en.wikipedia.org/wiki/Violence_against_women_in_India

https://data.gov.in/ had statistics from the National Crime Records Bureau (https://en.wikipedia.org/wiki/National_Crime_Records_Bureau) on crimes committed against women from 2001 to 2014.

In this kernel, I'm going to see if we can find trends or patterns. We'll also find the states with the most reported crimes against women, and see if the details in the Wiki page actually show up in the data.

### Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

### Reading Data

In [None]:
crime_df = pd.read_csv('/kaggle/input/crime-against-women-20012014-india/crimes_against_women_2001-2014.csv')
crime_df.info()

### Data Cleaning

Drop the columns that are unnecessary or that carry total counts, to keep the data consistent.

In [None]:
crime_df = crime_df.drop(['Unnamed: 0', 'DISTRICT'], axis=1)
crime_df.head()
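
A quick sanity check is worth doing before DISTRICT is dropped. If that column also holds state-level aggregate rows (for example a value such as TOTAL, which is an assumption here and has not been verified against the CSV), those rows would be counted a second time in every later groupby sum. The sketch below uses a made-up frame, `raw`, to show how such rows could be flagged and filtered out first; all names and numbers in it are hypothetical.

```python
import pandas as pd

# Toy frame standing in for the raw CSV; the 'TOTAL' district row is an assumption.
raw = pd.DataFrame({
    'STATE/UT': ['Goa', 'Goa', 'Goa'],
    'DISTRICT': ['North Goa', 'South Goa', 'TOTAL'],
    'Year': [2001, 2001, 2001],
    'Rape': [5, 7, 12],
})

# Flag rows whose DISTRICT value looks like a state-level aggregate.
is_total = raw['DISTRICT'].str.upper().str.contains('TOTAL', na=False)

# Keep only district-level rows, then drop the column as in the cell above.
district_rows = raw.loc[~is_total].drop(columns=['DISTRICT'])

# Summing the remaining rows now counts each case once.
state_sum = district_rows.groupby(['STATE/UT', 'Year'])['Rape'].sum()
```

If no rows are flagged on the real data, nothing changes and the later sums can be used as they are.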

In [None]:
# Normalise state names: trim whitespace and use title case throughout.
crime_df['STATE/UT'] = crime_df['STATE/UT'].str.strip().str.title()
crime_df['STATE/UT'].unique()

### Bucketing States into Zones 

Let's get the zonal details. Using https://www.mapsofindia.com/zonal/ the states are divided into Zones.

In [None]:
north_india = ['Jammu & Kashmir', 'Punjab', 'Himachal Pradesh', 'Haryana', 'Uttarakhand', 'Uttar Pradesh', 'Chandigarh']
east_india = ['Bihar', 'Odisha', 'Jharkhand', 'West Bengal']
south_india = ['Andhra Pradesh', 'Karnataka', 'Kerala', 'Tamil Nadu', 'Telangana']
west_india = ['Rajasthan', 'Gujarat', 'Goa', 'Maharashtra']
central_india = ['Madhya Pradesh', 'Chhattisgarh']
north_east_india = ['Assam', 'Sikkim', 'Nagaland', 'Meghalaya', 'Manipur', 'Mizoram', 'Tripura', 'Arunachal Pradesh']
ut_india = ['A & N ISLANDS', 'Delhi', 'LAKSHADWEEP', 'PUDUCHERRY', 'A&N Islands', 'Daman & Diu', 'Delhi Ut', 'Lakshadweep',
       'Puducherry', 'D & N Haveli', 'DAMAN & DIU', 'D&N Haveli', 'A & N Islands']

def get_zonal_names(row):
    if row['STATE/UT'].title().strip() in north_india:
        val = 'North Zone'
    elif row['STATE/UT'].title().strip()  in south_india:
        val = 'South Zone'
    elif row['STATE/UT'].title().strip()  in east_india:
        val = 'East Zone'
    elif row['STATE/UT'].title().strip()  in west_india:
        val = 'West Zone'
    elif row['STATE/UT'].title().strip()  in central_india:
        val = 'Central Zone'
    elif row['STATE/UT'].title().strip()  in north_east_india:
        val = 'NE Zone'
    elif row['STATE/UT'].title().strip()  in ut_india:
        val = 'Union Terr'
    else:
        val = 'No Value'
    return val

crime_df['Zones'] = crime_df.apply(get_zonal_names, axis=1)
crime_df['Zones'].unique()

In [None]:
crime_df[(crime_df['Zones'] == 'No Value')]['STATE/UT'].unique()
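
The zone bucketing above can also be written as a single lookup table, which keeps the normalisation in one place and makes unmapped names easy to list. This is only a minimal sketch: `ZONE_BY_STATE` and `zone_for` are hypothetical names, the table is abbreviated to a few states, and 'Atlantis' is an invented value used to show the fallback.

```python
# Hypothetical lookup table: one entry per normalised state name (abbreviated here).
ZONE_BY_STATE = {
    'Punjab': 'North Zone',
    'Bihar': 'East Zone',
    'Kerala': 'South Zone',
    'Goa': 'West Zone',
    'Madhya Pradesh': 'Central Zone',
    'Assam': 'NE Zone',
    'Delhi Ut': 'Union Terr',
}

def zone_for(state_name):
    """Normalise a raw STATE/UT value and return its zone, or 'No Value'."""
    key = state_name.strip().title()
    return ZONE_BY_STATE.get(key, 'No Value')

# Raw names with inconsistent case and whitespace, plus one unknown value.
raw_names = ['  PUNJAB', 'kerala ', 'Delhi UT', 'Atlantis']
zones = [zone_for(name) for name in raw_names]

# Names that fell through to 'No Value' and still need a table entry.
unmapped = sorted({n.strip().title() for n in raw_names if zone_for(n) == 'No Value'})
```

On the real frame, the same function could be applied with `crime_df['STATE/UT'].map(zone_for)`.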

In [None]:
crimes = ['Rape','Kidnapping and Abduction','Dowry Deaths','Assault on women with intent to outrage her modesty','Insult to modesty of Women',
'Cruelty by Husband or his Relatives','Importation of Girls']

### Zones and Crimes:

### (i) Rape:

In [None]:
rape_df = crime_df.groupby(by=['Year', 'STATE/UT', 'Zones'])['Rape'].sum().reset_index().sort_values(crimes[0], ascending=False)

#### Registered Rape Cases - 2001 - 2014

In [None]:
plt.figure(figsize=(20,15))
count = 1

for zone in rape_df.Zones.unique():
    plt.subplot(len(rape_df.Zones.unique()),1,count)

    # Recent seaborn versions require x and y to be passed by keyword.
    zone_df = rape_df[rape_df['Zones'] == zone]
    sns.lineplot(x=zone_df['Year'], y=zone_df['Rape'], ci=None)
    plt.subplots_adjust(hspace=0.8)
    plt.xlabel('Years')
    plt.ylabel('# Rape Cases')
    plt.title(zone)
    count+=1

The East Zone shows rape cases rising from 2004 onwards, and the NE Zone also shows a roughly linear increase.
The West, Central, South and UT zones, however, stay flat until 2006-2007. Then something happens in 2012: the North, Central, West and UT zones all show a sharp increase after that year.
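
To put a number on a jump like this rather than reading it off the chart, the year-over-year percentage change per zone can be computed. The sketch below assumes a frame shaped like rape_df (Year, Zones, Rape); the `toy` frame and its values are invented for illustration and are not NCRB figures.

```python
import pandas as pd

# Made-up yearly totals for two zones; not taken from the NCRB data.
toy = pd.DataFrame({
    'Year':  [2011, 2012, 2013, 2011, 2012, 2013],
    'Zones': ['North Zone'] * 3 + ['East Zone'] * 3,
    'Rape':  [100, 110, 165, 200, 210, 220],
})

# One row per year, one column per zone, summing over states.
yearly = toy.pivot_table(index='Year', columns='Zones', values='Rape', aggfunc='sum')

# Percentage change from the previous year, rounded to one decimal place.
yoy_pct = yearly.pct_change().mul(100).round(1)
```

A zone whose 2013 value stands well above its earlier year-over-year changes would support the reading of a sharp rise after 2012.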


#### Zonal Registered Rape Cases
Let's look at the zonal numbers collectively. 

In [None]:
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=rape_df.Zones, y=rape_df.Rape, errwidth=0)
plt.ylabel('# Rape Cases')
plt.title('Zone-Wise Rape Cases Registered', fontdict = {'fontsize' : 15})

Central Zone has the highest number of cases, followed by East. Let's look at the states with the most registered rape cases.
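
One thing to keep in mind when reading this chart: sns.barplot aggregates with the mean by default, so each bar is the average count per state-year row in a zone, not the zone's total. A zone with only a few states can therefore rank differently by mean than by sum. The sketch below uses invented numbers and hypothetical state labels (C1, N1, ...) to show the two rankings diverging.

```python
import pandas as pd

# Toy state-level rows: a two-state zone versus a four-state zone (invented numbers).
toy = pd.DataFrame({
    'Zones': ['Central Zone'] * 2 + ['North Zone'] * 4,
    'STATE/UT': ['C1', 'C2', 'N1', 'N2', 'N3', 'N4'],
    'Rape': [300, 100, 150, 150, 150, 150],
})

# Mean is what the default barplot shows; sum is the zone total.
per_zone = toy.groupby('Zones')['Rape'].agg(['mean', 'sum'])

top_by_mean = per_zone['mean'].idxmax()
top_by_sum = per_zone['sum'].idxmax()
```

If totals are what the chart should show, passing estimator=sum to sns.barplot should give them.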

#### Statewise Registered Rape Cases

In [None]:
rape_st_df = rape_df[(rape_df['Zones'] == 'Central Zone') | (rape_df['Zones'] == 'East Zone')]
rape_st_df = rape_st_df.groupby(by=['STATE/UT'])['Rape'].sum().reset_index().sort_values('Rape', ascending=False)
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=rape_st_df['STATE/UT'], y=rape_st_df.Rape, errwidth=0)
plt.ylabel('# Rape Cases')
plt.title('States with Rape Cases Registered', fontdict = {'fontsize' : 15})
rape_st_df.head(5)

Madhya Pradesh, in the Central Zone, has the highest number of cases, followed by West Bengal in the East.

### (ii) Kidnapping and Abduction

In [None]:
kidnap_df = crime_df.groupby(by=['Year', 'STATE/UT', 'Zones'])['Kidnapping and Abduction'].sum().reset_index().sort_values(crimes[1], ascending=False)

#### Registered Kidnapping and Abduction Cases - 2001-2014

In [None]:
plt.figure(figsize=(20,15))
count = 1

for zone in kidnap_df.Zones.unique():
    plt.subplot(len(kidnap_df.Zones.unique()),1,count)

    # Recent seaborn versions require x and y to be passed by keyword.
    zone_df = kidnap_df[kidnap_df['Zones'] == zone]
    sns.lineplot(x=zone_df['Year'], y=zone_df['Kidnapping and Abduction'], ci=None)
    plt.subplots_adjust(hspace=0.9)
    plt.xlabel('Years')
    plt.ylabel('# Cases')
    plt.title(zone)
    count+=1

From 2007 onwards, cases in every zone except Central start to rise in a slow, linear pattern.


#### Zonal Registered Kidnapping/Abduction Cases:

In [None]:
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=kidnap_df.Zones, y=kidnap_df['Kidnapping and Abduction'], errwidth=0)
plt.ylabel('# Kidnapping/Abduction Cases')
plt.title('Zone-Wise Kidnapping/Abduction Cases Registered', fontdict = {'fontsize' : 15})

The East Zone has the highest number of cases, followed closely by the West Zone. Let's look at the states now.

In [None]:
kidnap_st_df = kidnap_df[(kidnap_df['Zones'] == 'East Zone') | (kidnap_df['Zones'] == 'West Zone')]
kidnap_st_df = kidnap_st_df.groupby(by=['STATE/UT'])['Kidnapping and Abduction'].sum().reset_index().sort_values('Kidnapping and Abduction', ascending=False)
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=kidnap_st_df['STATE/UT'], y=kidnap_st_df['Kidnapping and Abduction'], errwidth=0)
plt.ylabel('# Kidnapping and Abduction Cases')
plt.title('States with Kidnapping and Abduction Cases Registered', fontdict = {'fontsize' : 15})
kidnap_st_df.head(5)

Rajasthan has the highest number of reported kidnapping and abduction cases.

### (iii) Dowry Deaths

In [None]:
dowry_df = crime_df.groupby(by=['Year', 'STATE/UT', 'Zones'])['Dowry Deaths'].sum().reset_index().sort_values('Dowry Deaths', ascending=False)

#### Registered Cases for Deaths for/related to Dowry Demands - 2001 - 2014

In [None]:
plt.figure(figsize=(20,15))
count = 1

for zone in dowry_df.Zones.unique():
    plt.subplot(len(dowry_df.Zones.unique()),1,count)

    # Recent seaborn versions require x and y to be passed by keyword.
    zone_df = dowry_df[dowry_df['Zones'] == zone]
    sns.lineplot(x=zone_df['Year'], y=zone_df['Dowry Deaths'], ci=None)
    plt.subplots_adjust(hspace=0.9)
    plt.xlabel('Years')
    plt.ylabel('# Cases')
    plt.title(zone)
    count+=1

The South Zone has an interesting graph: dowry death cases show no drop until 2013, when they start to decline.

#### Zonal Registered Dowry Death Cases

In [None]:
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=dowry_df.Zones, y=dowry_df['Dowry Deaths'], errwidth=0)
plt.ylabel('# Dowry Deaths Cases')
plt.title('Zone-Wise Dowry Deaths Cases Registered', fontdict = {'fontsize' : 15})

Eastern and Central Zones of India show a huge number of cases where women were killed for Dowry. 
Let's look at the States now.

#### Statewise Dowry Deaths

In [None]:
dowry_st_df = dowry_df[(dowry_df['Zones'] == 'East Zone') | (dowry_df['Zones'] == 'Central Zone')]
dowry_st_df = dowry_st_df.groupby(by=['STATE/UT'])['Dowry Deaths'].sum().reset_index().sort_values('Dowry Deaths', ascending=False)
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=dowry_st_df['STATE/UT'], y=dowry_st_df['Dowry Deaths'], errwidth=0)
plt.ylabel('# Dowry Deaths Cases')
plt.title('States with Dowry Deaths Cases Registered', fontdict = {'fontsize' : 15})
dowry_st_df.head(5)

Bihar in the East Zone, followed by Madhya Pradesh, registered the highest number of dowry deaths.

### (iv) Assault on women with intent to outrage her modesty

In [None]:
assault_df = crime_df.groupby(by=['Year', 'STATE/UT', 'Zones'])['Assault on women with intent to outrage her modesty'].sum().reset_index().sort_values('Assault on women with intent to outrage her modesty', ascending=False)

#### Registered Cases for Assault on Women 2001-2014

In [None]:
plt.figure(figsize=(20,15))
count = 1

for zone in assault_df.Zones.unique():
    plt.subplot(len(assault_df.Zones.unique()),1,count)

    # Recent seaborn versions require x and y to be passed by keyword.
    zone_df = assault_df[assault_df['Zones'] == zone]
    sns.lineplot(x=zone_df['Year'], y=zone_df['Assault on women with intent to outrage her modesty'], ci=None)
    plt.subplots_adjust(hspace=0.9)
    plt.xlabel('Years')
    plt.ylabel('# Cases')
    plt.title(zone)
    count+=1

So, finally, we see a graph where this type of crime does not seem to have increased over the years. For once, lines that don't increase linearly or exponentially feel good to look at. But look: something is triggered in 2012, and the number of cases spikes.
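
A spike like this can be checked across several crime columns at once by comparing average yearly totals before and after 2012. The sketch below is only illustrative: the `toy` frame, its two columns and its values are invented, and the 2012 cut-off is taken from the charts, not derived from the data.

```python
import pandas as pd

# Invented yearly counts for two crime columns; not NCRB figures.
toy = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014],
    'Rape': [100, 100, 100, 150, 150],
    'Dowry Deaths': [80, 80, 80, 80, 80],
})

crime_cols = ['Rape', 'Dowry Deaths']
yearly = toy.groupby('Year')[crime_cols].sum()

# Average yearly total up to and including 2012, and from 2013 onwards.
before = yearly.loc[:2012].mean()
after = yearly.loc[2013:].mean()

# A ratio well above 1 marks a crime whose registered cases rose after 2012.
ratio = (after / before).round(2)
```

On the real data, crime_cols could be the crimes list defined earlier in the notebook.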

Let's hold on to that thought - we will explore the happenings in 2012 later. Moving on,

#### Zonal Cases Registered for Assault on Women

In [None]:
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=assault_df.Zones, y=assault_df['Assault on women with intent to outrage her modesty'], errwidth=0)
plt.ylabel('# Assault on Women Cases')
plt.title('Zone-Wise Assault on Women Cases Registered', fontdict = {'fontsize' : 15})

Central Zone and South this time. Looking at State numbers now.

#### State-wise Cases for Assault on Women

In [None]:
assault_st_df = assault_df[(assault_df['Zones'] == 'Central Zone') | (assault_df['Zones'] == 'South Zone')]
assault_st_df = assault_st_df.groupby(by=['STATE/UT'])['Assault on women with intent to outrage her modesty'].sum().reset_index().sort_values('Assault on women with intent to outrage her modesty', ascending=False)
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=assault_st_df['STATE/UT'], y=assault_st_df['Assault on women with intent to outrage her modesty'], errwidth=0)
plt.ylabel('# Assault on women with intent to outrage her modesty Cases')
plt.title('States with Assault on Women Cases Registered', fontdict = {'fontsize' : 15})
assault_st_df.head(5)

Madhya Pradesh and Andhra Pradesh registered the most cases of assault on women.

### (v) Insult to modesty of Women

In [None]:
insult_df = crime_df.groupby(by=['Year', 'STATE/UT', 'Zones'])['Insult to modesty of Women'].sum().reset_index().sort_values('Insult to modesty of Women', ascending=False)

#### Registered Cases for Crimes Committed to Insult the Modesty of Women 2001-2014

In [None]:
plt.figure(figsize=(20,15))
count = 1

for zone in insult_df.Zones.unique():
    plt.subplot(len(insult_df.Zones.unique()),1,count)

    # Recent seaborn versions require x and y to be passed by keyword.
    zone_df = insult_df[insult_df['Zones'] == zone]
    sns.lineplot(x=zone_df['Year'], y=zone_df['Insult to modesty of Women'], ci=None)
    plt.subplots_adjust(hspace=0.9)
    plt.xlabel('Years')
    plt.ylabel('# Cases')
    plt.title(zone)
    count+=1

Now, this is interesting. Look at the Central Zone: the line is almost flat and, after 2013, starts to dip. The North Zone is similar, with cases dropping after 2010.
The South Zone, however, jumps after 2006 and returns to low levels only in 2014.

Maybe we can get more details on Zonal counts.

#### Zonal Cases Registered for Crimes that Insult Modesty of Women

In [None]:
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=insult_df.Zones, y=insult_df['Insult to modesty of Women'], errwidth=0)
plt.ylabel('# Insult to modesty of Women Cases')
plt.title('Zone-Wise Insult to modesty of Women Cases Registered', fontdict = {'fontsize' : 15})

As the timeline charts suggested, South leads here. Moving on to the states in the South Zone.

I'm going to consider only this one zone, as it clearly dominates.

#### Statewise Crimes Registered for Crimes leading to Insult to modesty of Women

In [None]:
insult_st_df = insult_df[(insult_df['Zones'] == 'South Zone')]
insult_st_df = insult_st_df.groupby(by=['STATE/UT'])['Insult to modesty of Women'].sum().reset_index().sort_values('Insult to modesty of Women', ascending=False)
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=insult_st_df['STATE/UT'], y=insult_st_df['Insult to modesty of Women'], errwidth=0)
plt.ylabel('# Insult to modesty of Women Cases')
plt.title('States with Insult to modesty of Women Cases Registered', fontdict = {'fontsize' : 15})
insult_st_df.head(5)

Woah! Andhra Pradesh leads by a wide margin: the second-highest count is less than half of its registered cases.

### (vi) Cruelty by Husband or his Relatives

In [None]:
cruel_df = crime_df.groupby(by=['Year', 'STATE/UT', 'Zones'])['Cruelty by Husband or his Relatives'].sum().reset_index().sort_values('Cruelty by Husband or his Relatives', ascending=False)

#### Registered Cases for Cruelty by Husband or his Relatives 2001-2014

In [None]:
plt.figure(figsize=(20,15))
count = 1

for zone in cruel_df.Zones.unique():
    plt.subplot(len(cruel_df.Zones.unique()),1,count)

    # Recent seaborn versions require x and y to be passed by keyword.
    zone_df = cruel_df[cruel_df['Zones'] == zone]
    sns.lineplot(x=zone_df['Year'], y=zone_df['Cruelty by Husband or his Relatives'], ci=None)
    plt.subplots_adjust(hspace=0.9)
    plt.xlabel('Years')
    plt.ylabel('# Cases')
    plt.title(zone)
    count+=1

Again, 2006 seems to be a noticeable year, where the numbers start to increase.

#### Zonal Cases Registered for Cruelty by Husband or his Relatives

In [None]:
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=cruel_df.Zones, y=cruel_df['Cruelty by Husband or his Relatives'], errwidth=0)
plt.ylabel('# Cruelty by Husband or his Relatives Cases')
plt.title('Zone-Wise Cruelty by Husband or his Relatives Cases Registered', fontdict = {'fontsize' : 15})

West leads, followed by South and East, which are close in numbers. Let's get the states now.

#### State-wise Cruelty by Husband or his Relatives Cases Registered

In [None]:
cruel_st_df = cruel_df[(cruel_df['Zones'] == 'West Zone') | (cruel_df['Zones'] == 'South Zone')]
cruel_st_df = cruel_st_df.groupby(by=['STATE/UT'])['Cruelty by Husband or his Relatives'].sum().reset_index().sort_values('Cruelty by Husband or his Relatives', ascending=False)
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=cruel_st_df['STATE/UT'], y=cruel_st_df['Cruelty by Husband or his Relatives'], errwidth=0)
plt.ylabel('# Cruelty by Husband or his Relatives Cases')
plt.title('States with Cruelty by Husband or his Relatives Registered', fontdict = {'fontsize' : 15})
cruel_st_df.head(5)

Andhra Pradesh and Rajasthan top the list.

### (vii) Importation of Girls

In [None]:
import_df = crime_df.groupby(by=['Year', 'STATE/UT', 'Zones'])['Importation of Girls'].sum().reset_index().sort_values('Importation of Girls', ascending=False)

#### Registered Cases of Importation of Girls - 2001 - 2014

In [None]:
plt.figure(figsize=(20,15))
count = 1

for zone in import_df.Zones.unique():
    plt.subplot(len(import_df.Zones.unique()),1,count)

    # Recent seaborn versions require x and y to be passed by keyword.
    zone_df = import_df[import_df['Zones'] == zone]
    sns.lineplot(x=zone_df['Year'], y=zone_df['Importation of Girls'], ci=None)
    plt.subplots_adjust(hspace=0.9)
    plt.xlabel('Years')
    plt.ylabel('# Cases')
    plt.title(zone)
    count+=1

Another graph where we see some flat low lines, except a bumpy Eastern and North Eastern Zone.

#### Zonal Cases Registered for Importation of Girls

In [None]:
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=import_df.Zones, y=import_df['Importation of Girls'], errwidth=0)
plt.ylabel('# Importation of Girls Cases')
plt.title('Zone-Wise Importation of Girls Cases Registered', fontdict = {'fontsize' : 15})

#### State-wise Importation of Girls Cases Registered

In [None]:
import_st_df = import_df[(import_df['Zones'] == 'East Zone')]
import_st_df = import_st_df.groupby(by=['STATE/UT'])['Importation of Girls'].sum().reset_index().sort_values('Importation of Girls', ascending=False)
fig, ax = plt.subplots(figsize=(15,10))
sns.barplot(x=import_st_df['STATE/UT'], y=import_st_df['Importation of Girls'], errwidth=0)
plt.ylabel('# Importation of Girls Cases')
plt.title('States with Importation of Girls Cases Registered', fontdict = {'fontsize' : 15})
import_st_df.head(5)

The counts for Jharkhand, West Bengal and Odisha combined do not add up to the number of cases registered in Bihar.

# Insights and Conclusion:

Let's move on to the most pressing question.

## <font color='blue'> What happened in 2012? </font>

## <font color='blue'> What event triggered sudden spikes in the numbers that the charts showed? </font>

## 2012 Nirbhaya Gang Rape Case

Ref: https://en.wikipedia.org/wiki/2012_Delhi_gang_rape

http://www.gnovisjournal.org/2017/05/02/the-nirbhaya-movement-an-indian-feminist-revolution/

After this horrific case, which happened in the last month of 2012, a wave of awareness of crimes committed against women swept the nation. It was termed India's Arab Spring. The ‘Nirbhaya’ case was marked by unprecedented public outrage on social media as well as on the ground. People came out in huge numbers to show their support, to demand justice, and to call on the Government to put measures in place to stop these crimes.

And though it may seem that there were more cases after 2012, you may have noticed that throughout the notebook I have used the word 'Registered'. The numbers increased only because people, women especially, started reporting the crimes against them.
They were no longer silent.

It is also why I quoted the text from Wiki at the beginning of this kernel -

''' Violence against women in India is actually more present than it may appear at first glance, as many expressions of violence are not considered crimes, or may otherwise go unreported or undocumented due to certain Indian cultural values and beliefs. '''

The section in Wiki also states - '''Although rapes are becoming more frequently reported, many go unreported or have the complaint files withdrawn due to the perception of family honour being compromised. Women frequently do not receive justice for their rapes, because police often do not give a fair hearing, and/or medical evidence is often unrecorded which makes it easy for offenders to get away with their crimes under the current laws.

Increased attention in the media and awareness among both Indians and the outside world is both bringing attention to the issue of rape in India and helping empower women to report the crime. After international news reported the gang rape of a 23-year-old student on a moving bus that occurred in Delhi, in December 2012, Delhi experienced a significant increase in reported rapes. The number of reported rapes nearly doubled from 143 reported in January–March 2012 to 359 during the three months after the rape. After the Delhi rape case, Indian media has committed to report each and every rape case'''

Ref:
http://world.time.com/2013/11/08/why-rape-seems-worse-in-india-than-everywhere-else-but-actually-isnt/


On 1 February 2013, the Union Cabinet approved bringing an ordinance to give effect to the changes in law suggested by the Verma Committee Report.

Ref: 
https://en.wikipedia.org/wiki/Criminal_Law_(Amendment)_Act,_2013

https://pib.gov.in/newsite/erelease.aspx?relid=91979

https://www.indiatoday.in/india/north/story/president-signs-ordinance-to-effect-changes-in-laws-against-sexual-crimes-153156-2013-02-03

If India wants to improve its GII, we need to work on this issue. More cases being reported leads to higher numbers showing up. It can also prompt more stringent measures, thereby putting a check on the crimes committed against women.

It is definitely alarming to see these numbers soar. But if people don't report crimes, they won't be solved.

It's a cycle!

Also read:
http://hdr.undp.org/en/content/what-are-strengths-and-limitations-gii

http://hdr.undp.org/sites/default/files/hdr2018_technical_notes.pdf

https://www.in.undp.org/content/dam/india/docs/india_factsheet_gender_n_social_exclusion_indicators.pdf

https://halshs.archives-ouvertes.fr/halshs-00462463/document