In [None]:
import numpy as np 
import pandas as pd 

import matplotlib.pyplot as plt
import seaborn as sns

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## **What is the dataset about?**

This dataset shows the percentage of respondents who agree that a husband is justified in hitting or beating his wife for certain reasons, broken down by country and social group.

In [None]:
df = pd.read_csv('../input/violence-against-women-and-girls/makeovermonday-2020w10/violence_data.csv')
df.head()

**Let's get to know the data.**

In [None]:
df.info()

There are 8 columns, and the 'Value' column has some missing data.
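Here is a minimal sketch of counting and dropping the missing 'Value' entries. It uses a tiny hand-made DataFrame that borrows a few column names from violence_data.csv. The numbers are invented and are not survey results. On the real data you would call the same methods on df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using some of the dataset's column names (values invented)
toy = pd.DataFrame({
    'Country': ['Ethiopia', 'Ethiopia', 'India', 'India'],
    'Gender': ['F', 'M', 'F', 'M'],
    'Question': ['Q1'] * 4,
    'Value': [30.0, np.nan, 20.0, 15.0],
})

# Number of missing values per column; only 'Value' has a gap here
missing = toy.isna().sum()
print(missing)

# Keep only the rows that actually have a percentage
complete = toy.dropna(subset=['Value'])
print(len(toy), '->', len(complete))
```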

In [None]:
df.describe(include='all')

Passing include='all' makes describe() summarize every column. By default it only summarizes the numeric columns, which would leave out the object (text) columns.
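The difference is easy to see on a small frame. The sketch below uses invented toy data with one numeric column and two text columns. The default call summarizes only 'Value'. With include='all', the text columns are added, along with the count/unique/top/freq rows.

```python
import pandas as pd

# Hypothetical toy frame: one numeric column and two object (string) columns
toy = pd.DataFrame({
    'Country': ['Ethiopia', 'Ethiopia', 'India'],
    'Gender': ['F', 'M', 'F'],
    'Value': [30.0, 25.0, 20.0],
})

# Default: only numeric columns (count, mean, std, min, quartiles, max)
default_summary = toy.describe()
print(default_summary)

# include='all': object columns are added with count, unique, top and freq
full_summary = toy.describe(include='all')
print(full_summary)
```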

What we learned:
* There are 12,600 rows.
* The survey covers 70 countries.
* There are 5 demographic topics ('Demographics Question'), each with its own responses.
* Participants were asked 6 questions.
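These counts can also be read off directly with nunique(), which returns the number of distinct values in each column. The sketch runs on a few invented toy rows. On the real data, the same call on df should reproduce the figures listed above.

```python
import pandas as pd

# Hypothetical toy rows; on the real data call the same method on df
toy = pd.DataFrame({
    'Country': ['Ethiopia', 'Ethiopia', 'India', 'Pakistan'],
    'Demographics Question': ['Education', 'Age', 'Education', 'Residence'],
    'Question': ['Q1', 'Q2', 'Q1', 'Q1'],
})

# Distinct values per column: countries, demographic topics, questions
counts = toy[['Country', 'Demographics Question', 'Question']].nunique()
print(counts)
print('rows:', len(toy))
```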

Let's take a look at these countries, topics and questions.

In [None]:
df.Country.unique()

In [None]:
df['Demographics Question'].unique().tolist()

In [None]:
df.Question.unique().tolist()

Let's choose a country and visualize its data.
I'm going to choose Ethiopia.

In [None]:
Ethiopia = df[df.Country == 'Ethiopia']
Ethiopia.head(3)


In [None]:
graph = Ethiopia[Ethiopia['Demographics Question'] == 'Education']

g = sns.catplot(x='Demographics Response',y='Value',col='Gender',hue='Question',
                order=['No education','Primary','Secondary','Higher'],
                data=graph,kind='bar',errorbar=None)  # ci=None is deprecated since seaborn 0.12
g.set_axis_labels('Education Level','Percentage (%)')
g.fig.suptitle('Ethiopians agreeing a husband is justified in hitting his wife',y=1.05)

Surprisingly, the percentages for men are lower than those for women.

We can see this more simply by putting gender on the hue instead of splitting the bars by question:

In [None]:
graph = Ethiopia[Ethiopia['Demographics Question'] == 'Education']

g = sns.catplot(x='Demographics Response',y='Value',hue='Gender',
                order=['No education','Primary','Secondary','Higher'],
                data=graph,kind='bar',errorbar=None)
g.set_axis_labels('Education Level','Percentage (%)')
g.fig.suptitle('Ethiopians agreeing a husband is justified in hitting his wife',y=1.05)

Here each bar is the mean percentage across the 6 questions for that education level and gender.
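To check what each bar shows, the sketch below reproduces the bar heights with groupby(...).mean(). The percentages are invented, and only three questions are used to keep the example short. Seaborn's bar plot uses the mean as its default estimator, so on the real graph frame the same groupby should match the plotted bars, assuming there is one survey year per country.

```python
import pandas as pd

# Hypothetical percentages for one education level, one row per question and gender
graph = pd.DataFrame({
    'Demographics Response': ['Primary'] * 6,
    'Gender': ['F', 'F', 'F', 'M', 'M', 'M'],
    'Question': ['Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3'],
    'Value': [30.0, 40.0, 50.0, 10.0, 20.0, 30.0],
})

# Mean over the questions for each (education level, gender) pair:
# this is what each bar shows when 'Question' is not used as hue or col
bar_heights = graph.groupby(['Demographics Response', 'Gender'])['Value'].mean()
print(bar_heights)
```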

Now let's look at the percentages for women in two other demographic groups: residence and age.


In [None]:
graph = Ethiopia[(Ethiopia['Demographics Question'] == 'Residence') & (Ethiopia['Gender'] == 'F')]

g = sns.catplot(x='Demographics Response',y='Value',
                order=['Rural','Urban'],
                data=graph,kind='bar',errorbar=None)
g.set_axis_labels('Residence Type','Percentage (%)')
g.fig.suptitle('Ethiopian women agreeing a husband is justified in hitting his wife',y=1.05)

In [None]:
graph = Ethiopia[(Ethiopia['Demographics Question'] == 'Age') & (Ethiopia['Gender'] == 'F')]

g = sns.catplot(x='Demographics Response',y='Value',
                data=graph,kind='bar',errorbar=None)
g.set_axis_labels('Age','Percentage (%)')
g.fig.suptitle('Ethiopian women agreeing a husband is justified in hitting his wife',y=1.05)

Let's compare two countries. I'm going to analyze the survey results for Pakistan and India, so first we create a new DataFrame that contains only those two countries.

In [None]:
cts = df[df.Country.isin(['Pakistan', 'India'])]  # cts = countries
cts

In [None]:
graph = cts[cts['Demographics Question'] == 'Education']

g = sns.catplot(x='Demographics Response',y='Value',col='Country',row='Gender',
                order=['No education','Primary','Secondary','Higher'],
                data=graph,kind='bar',errorbar=None)
g.set_axis_labels('Education Level','Percentage (%)')
g.fig.suptitle('Those agreeing a husband is justified in hitting his wife',y=1.05)

This chart makes a strong case that access to education matters a great deal in whether people justify violence.

Another observation is that education seems to make a bigger difference in Pakistan than in India. The percentage among Pakistanis with no education is higher than among Indians with no education, but among people with higher education the Pakistani percentage is lower than the Indian one.
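One way to put a number on "bigger difference" is the drop in percentage points between the 'No education' and 'Higher' groups for each country. The sketch below runs on invented numbers. On the real data you would first filter cts to the Education rows, and optionally to a single gender. The 'Gap' column is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical percentages; the real numbers would come from the cts frame
edu = pd.DataFrame({
    'Country': ['India', 'India', 'Pakistan', 'Pakistan'],
    'Demographics Response': ['No education', 'Higher', 'No education', 'Higher'],
    'Value': [50.0, 30.0, 60.0, 20.0],
})

# One row per country, one column per education level (mean over any repeats)
table = edu.pivot_table(index='Country', columns='Demographics Response',
                        values='Value', aggfunc='mean')

# Hypothetical 'Gap' column: drop from no education to higher education
table['Gap'] = table['No education'] - table['Higher']
print(table)
```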

Let's continue comparing the two countries across the other demographic groups.

In [None]:
graph = cts[cts['Demographics Question'] == 'Age']

g = sns.catplot(x='Demographics Response',y='Value',col='Country',row='Gender',
                data=graph,kind='bar',errorbar=None)
g.set_axis_labels('Age','Percentage (%)')
g.fig.suptitle('Those agreeing a husband is justified in hitting his wife',y=1.05)

For Indian women the percentage increases with age, whereas for Pakistani women it decreases.
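A trend like this can also be checked without the chart by testing whether the mean percentage is monotonic across age groups. The sketch below uses invented values and assumes age bands labelled like '15-24', '25-34' and '35-49'. Those labels sort correctly as strings, so the youngest group comes first in the pivot table.

```python
import pandas as pd

# Hypothetical percentages for women by age group (invented values)
age = pd.DataFrame({
    'Country': ['India'] * 3 + ['Pakistan'] * 3,
    'Demographics Response': ['15-24', '25-34', '35-49'] * 2,
    'Value': [40.0, 45.0, 50.0, 45.0, 40.0, 35.0],
})

# Age groups as rows (sorted, youngest first), countries as columns
trend = age.pivot_table(index='Demographics Response', columns='Country',
                        values='Value', aggfunc='mean')

# True if the mean percentage rises at every step to an older age group
rising = trend.apply(lambda col: col.is_monotonic_increasing)
falling = trend.apply(lambda col: col.is_monotonic_decreasing)
print(trend)
print(rising)
```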

In [None]:
graph = cts[cts['Demographics Question'] == 'Residence']

g = sns.catplot(x='Demographics Response',y='Value',col='Country',row='Gender',
                order=['Rural','Urban'],
                data=graph,kind='bar',errorbar=None)
g.set_axis_labels('Residence Type','Percentage (%)')
g.fig.suptitle('Those agreeing a husband is justified in hitting his wife',y=1.05)