# 16 days of activism against GBV

Data sources: https://genderdata.worldbank.org/en/topics/violence

## I. Load libraries and data

In [5]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [7]:
# Define paths to the data, figures and tables folders
data_path = '/Users/nataschajademinnitt/Documents/5. Data Analysis/distributionofthings.analysis/7_gbv/data/'
figure_path = '/Users/nataschajademinnitt/Documents/5. Data Analysis/distributionofthings.analysis/7_gbv/results/figures/'
tables_path = '/Users/nataschajademinnitt/Documents/5. Data Analysis/distributionofthings.analysis/7_gbv/results/tables/'

# Load data
raw = pd.read_csv(data_path + 'violence.csv')
raw.head()

| | Indicator Name | Indicator Code | Country Name | Country Code | Year | Value |
|---|---|---|---|---|---|---|
| 0 | Proportion of women subjected to physical and/... | SG.VAW.15PL.ME.ZS | East Asia & Pacific | EAS | 2018 | 6.195 |
| 1 | Proportion of women subjected to physical and/... | SG.VAW.15PL.ME.ZS | Europe & Central Asia | ECS | 2018 | 4.813 |
| 2 | Proportion of women subjected to physical and/... | SG.VAW.15PL.ME.ZS | High income | HIC | 2018 | 4.142 |
| 3 | Proportion of women subjected to physical and/... | SG.VAW.15PL.ME.ZS | Latin America & Caribbean | LCN | 2018 | 6.955 |
| 4 | Proportion of women subjected to physical and/... | SG.VAW.15PL.ME.ZS | Low income | LIC | 2018 | 20.062 |
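The paths above are absolute and tied to one machine, and they rely on each string ending in `/` before a file name is appended. A minimal sketch of the same folder layout with `pathlib`, assuming a hypothetical project root called `7_gbv` relative to the working directory (`pd.read_csv` accepts `Path` objects directly):

```python
from pathlib import Path

# Hypothetical project root; the notebook itself uses absolute paths
project = Path("7_gbv")

data_path = project / "data"
figure_path = project / "results" / "figures"
tables_path = project / "results" / "tables"

# '/' joins path segments, so no trailing slash is needed on the folder paths
violence_csv = data_path / "violence.csv"
print(violence_csv.as_posix())
```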


## II. Data exploration

In [35]:
raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69086 entries, 0 to 69085
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Indicator Name  69086 non-null  object 
 1   Indicator Code  69086 non-null  object 
 2   Country Name    69086 non-null  object 
 3   Country Code    69086 non-null  object 
 4   Year            69086 non-null  int64  
 5   Value           69086 non-null  float64
dtypes: float64(1), int64(1), object(4)
memory usage: 3.2+ MB


In [129]:
# Subset the data for 2018 (.copy() avoids SettingWithCopyWarning when df is modified later)
df = raw[raw['Year'] == 2018].copy()
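Filtering on `Year == 2018` keeps only values reported for that year, so indicators that a country measured in a different year drop out. A sketch of an alternative (not what this notebook does) that keeps the most recent observation for each indicator/country pair, shown on hypothetical toy rows with the `violence.csv` column names:

```python
import pandas as pd

# Toy long-format data (hypothetical values)
raw = pd.DataFrame({
    "Indicator Code": ["X", "X", "X", "Y"],
    "Country Code": ["ALB", "ALB", "KEN", "KEN"],
    "Year": [2009, 2018, 2014, 2022],
    "Value": [30.0, 25.0, 40.0, 1.0],
})

# Sort by year, then keep the last (latest) row in each indicator/country group
latest = (
    raw.sort_values("Year")
       .groupby(["Indicator Code", "Country Code"])
       .tail(1)
       .sort_values(["Indicator Code", "Country Code"])
       .reset_index(drop=True)
)
print(latest)
```

The trade-off is that values from different years are then compared side by side, so keeping the `Year` column alongside each value is worthwhile.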

The 'Country Name' column also contains aggregates that group countries by region or income level (e.g. 'East Asia & Pacific', 'High income'). Using the WB API, the column was cleaned to include only individual country names, and separate columns were created for region and income level.
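A minimal sketch of this cleaning step on toy data, assuming a classification table with the 'Country', 'Region', 'Income group' and 'Code' columns used in this notebook. Here a single left merge with `indicator=True` both adds the classification columns and flags names with no match (aggregates, or countries spelled differently), so they can be inspected before being dropped; `validate="many_to_one"` guards against duplicate countries in the classification table:

```python
import pandas as pd

# Toy long-format rows (hypothetical values): two countries and two aggregates
data = pd.DataFrame({
    "Country Name": ["Albania", "High income", "Kenya", "East Asia & Pacific"],
    "Year": [2018, 2018, 2018, 2018],
    "Value": [5.0, 4.1, 20.0, 6.2],
})

# Toy classification table using the column names from this notebook
classification = pd.DataFrame({
    "Country": ["Albania", "Kenya"],
    "Region": ["Europe & Central Asia", "Sub-Saharan Africa"],
    "Income group": ["Upper middle income", "Lower middle income"],
    "Code": ["ALB", "KEN"],
})

flagged = data.rename(columns={"Country Name": "Country"}).merge(
    classification,
    on="Country",
    how="left",
    validate="many_to_one",  # raises MergeError if a country appears twice in classification
    indicator=True,          # adds '_merge' with 'both' or 'left_only'
)

# Names with no match in the classification table
unmatched = sorted(flagged.loc[flagged["_merge"] == "left_only", "Country"].unique())
print(unmatched)

# Keep matched rows only and drop the helper column
countries = flagged[flagged["_merge"] == "both"].drop(columns="_merge")
```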

In [131]:
# Load the country classification (country name, region, income group, code)
classification = pd.read_excel(data_path + 'name_classification.xlsx', sheet_name=0)

# Drop rows whose 'Country Name' is not an individual country (e.g. regional or income aggregates)
country_list = classification['Country'].tolist()
df = df[df['Country Name'].isin(country_list)]

In [142]:
# Rename 'Country Name' to 'Country' to match the classification table
df = df.rename(columns={'Country Name': 'Country'})

# Merge to add 'Region', 'Income group' and 'Code' (country code)
merged_df = df.merge(
    classification[['Country', 'Region', 'Income group', 'Code']],
    on='Country',
    how='left'
)
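The merge brings in 'Code' next to the existing 'Country Code', and a left merge leaves 'Region' empty for any row that found no match. A quick consistency check, sketched on hypothetical toy rows (in the notebook the frame would be `merged_df`):

```python
import pandas as pd

# Toy merged rows with placeholder countries (hypothetical values)
merged_df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Country Code": ["AAA", "BBB", "CCC"],
    "Code": ["AAA", "BBB", "CCC"],
    "Region": ["Region 1", "Region 2", None],
})

# Rows where the two code columns disagree (expected to be empty)
code_mismatch = merged_df[merged_df["Country Code"] != merged_df["Code"]]

# Rows where the left merge found no region
missing_region = merged_df[merged_df["Region"].isna()]

print(len(code_mismatch), missing_region["Country"].tolist())
```

If both checks come back empty, one of the two code columns can be dropped.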

In [184]:
# Count rows per indicator and save the list for reference
indicators = merged_df['Indicator Name'].value_counts()
indicators = pd.DataFrame(indicators)
indicators.to_csv(data_path + 'indicators.csv', index=True)

In [198]:
# Keep only overall indicators, excluding breakdowns whose names contain ': Q1', ': Q2', etc.
overall_indicators = merged_df['Indicator Name'][~merged_df['Indicator Name'].str.contains(r':\s*Q\d', na=False)].unique()

overall_indicators = pd.DataFrame(overall_indicators)

# Set max column width to display full content
pd.set_option('display.max_colwidth', None)

overall_indicators

| | 0 |
|---|---|
| 0 | Proportion of women subjected to physical and/or sexual violence in the last 12 months (modeled estimate, % of ever partnered women ages 15+) |
| 1 | Men who believe religion requires female genital mutilation (% of men who have heard about FGM) |
| 2 | Suicide mortality rate, male (per 100,000 male population) |
| 3 | There is legislation on sexual harassment in employment (1=yes; 0=no) |
| 4 | Female genital mutilation prevalence (%) |
| 5 | Proportion of women who have ever experienced any form of sexual violence (% of women ages 15-49) |
| 6 | There is legislation specifically addressing domestic violence (1=yes; 0=no) |
| 7 | Women whose husband or partner has never demonstrated controlling behaviors (% of ever-married women ages 15-49) |
| 8 | Women who believe a husband is justified in beating his wife (any of five reasons) (%) |
| 9 | Women who believe a husband is justified in beating his wife when she refuses sex with him (%) |
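The pattern `r':\s*Q\d'` matches a colon, optional whitespace, then `Q` followed by a digit, so any name with a suffix such as ': Q1' is excluded. A small check on made-up names (the ': Q1 (lowest)' suffix style is an assumption, not copied from the data):

```python
import pandas as pd

# Hypothetical indicator names, including one missing value
names = pd.Series([
    "Female genital mutilation prevalence (%)",
    "Female genital mutilation prevalence (%): Q1 (lowest)",
    "Female genital mutilation prevalence (%): Q5 (highest)",
    None,
])

# True where the name has a ': Q<digit>' suffix; na=False makes missing names count as no match
has_q_suffix = names.str.contains(r":\s*Q\d", na=False)

# Negating the mask keeps the missing name too, so drop it before taking unique values
overall = names[~has_q_suffix].dropna().unique()
print(list(overall))
```

Note that with `na=False` the negated mask keeps rows with a missing name, which is why `dropna()` is applied here.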


### Indicators of interest



In [56]:
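# Count the number of unique indicators
# `df_recent` is not created in any cell shown above (this cell ran earlier, In [56]); `merged_df` holds the cleaned 2018 data
print(f"There are {df_recent['Indicator Name'].nunique()} indicators in the dataframe.")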

There are 114 indicators in the dataframe.


In [59]:
df_grouped = df_recent.groupby(['Indicator Name', 'Indicator Code', 'Country Name', 'Country Code'])

In [65]:
df_grouped['Indicator Name'].unique()

Indicator Name                                                                                                    Indicator Code     Country Name         Country Code
Criminal penalties or civil remedies exist for sexual harassment in employment (1=yes; 0=no)                      SG.PEN.SXHR.EM     Afghanistan          AFG             [Criminal penalties or civil remedies exist fo...
                                                                                                                                     Albania              ALB             [Criminal penalties or civil remedies exist fo...
                                                                                                                                     Algeria              DZA             [Criminal penalties or civil remedies exist fo...
                                                                                                                                     Angola               AGO             [Cr
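Because 'Indicator Name' is itself one of the grouping keys, `df_grouped['Indicator Name'].unique()` returns each group's own name. To see how many countries report each indicator, a sketch that counts distinct countries per indicator, on hypothetical toy rows (`coverage` is an illustrative name):

```python
import pandas as pd

# Toy data (hypothetical values) with the notebook's column names
df = pd.DataFrame({
    "Indicator Name": ["A", "A", "A", "B"],
    "Country Name": ["Albania", "Kenya", "Kenya", "Kenya"],
    "Year": [2018, 2014, 2018, 2018],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# Number of distinct countries with at least one value, per indicator
coverage = (
    df.groupby("Indicator Name")["Country Name"]
      .nunique()
      .sort_values(ascending=False)
)
print(coverage)
```

Sorting by coverage puts the indicators with the widest country coverage first, which can help when choosing indicators of interest.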

## III. Data cleaning

## IV. Data visualisation

## V. Data analysis