## Business Understanding

I want to use data analysis to learn more about the situation of women in computer programming, and I hope the results are useful to anyone who needs this kind of research. The key questions I would like to answer are:

- What is the situation of women in the programming world in Italy compared to men?
- What is the profile of women who work in computer programming in Italy? What are their qualifications?
- By contrast, what is the situation of highly qualified women in tech in the USA?

## Data Understanding

The data used in this analysis is Stack Overflow’s developer survey data from 2017 to 2019. Respondents from about 200 countries answered about 150 survey questions. This notebook uses those survey responses to answer the three questions listed in the Business Understanding section.

## Gather Data

The data was collected through the Stack Overflow developer survey. The following cells import the necessary Python libraries and read the survey results into a pandas DataFrame.

In [11]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
from pandasql import sqldf

## Italy situation in 2019

## Prepare Data

The following cells load the data and select the columns and values needed for the analysis.

In [2]:
## Read Data
stack_2019 = pd.read_csv('2019_survey_results_public.csv')

In [3]:
Italy_2019 = stack_2019.loc[stack_2019['Country']=='Italy']

In [5]:
stack_2019 = pd.read_csv('2019_survey_results_public.csv')

stack_2019 = stack_2019[['Country','Gender','EdLevel','DevType']]

Italy_2019 = stack_2019.loc[stack_2019['Country']=='Italy']
# Drop rows with a missing Gender value; the analysis only compares respondents who answered 'Man' or 'Woman'
Ita_2019 = Italy_2019.dropna(subset = ['Gender'])

Ita_19 = Ita_2019[Ita_2019['Gender'].isin(['Man','Woman'])]
Ita_Dev19 = Ita_19.dropna(subset = ['DevType'])

Ita_F19 = Ita_Dev19.loc[Ita_Dev19['Gender']=='Woman']


Ita_M19 = Ita_Dev19.loc[Ita_Dev19['Gender']=='Man']
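
The same preparation steps are repeated later for the USA, so they could be wrapped in a small helper. The sketch below is only an illustration: the function name `prepare_country` and the tiny sample DataFrame are hypothetical and are not part of the survey data or of the original notebook.

```python
import pandas as pd


def prepare_country(survey: pd.DataFrame, country: str) -> pd.DataFrame:
    """Return respondents from `country` who answered Man/Woman and have a DevType."""
    subset = survey.loc[survey["Country"] == country]
    # isin() is False for missing values, so rows without a Gender are dropped here too
    subset = subset[subset["Gender"].isin(["Man", "Woman"])]
    return subset.dropna(subset=["DevType"])


# Made-up rows, only to show how the filter behaves
sample = pd.DataFrame({
    "Country": ["Italy", "Italy", "Italy", "Italy", "United States"],
    "Gender": ["Man", "Woman", None, "Man", "Woman"],
    "EdLevel": ["Bachelor’s degree", "Master’s degree", None, None, "Associate degree"],
    "DevType": ["Developer, back-end", "Developer, front-end", "Student", None, "Developer, full-stack"],
})

italy_dev = prepare_country(sample, "Italy")
italy_women = italy_dev[italy_dev["Gender"] == "Woman"]
italy_men = italy_dev[italy_dev["Gender"] == "Man"]
print(len(italy_dev), len(italy_women), len(italy_men))
```

With the real data, `prepare_country(stack_2019, 'Italy')` would be expected to give the same rows as `Ita_Dev19`, assuming `stack_2019` still contains the `Country`, `Gender` and `DevType` columns.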


## Results for Italy in 2019

In [6]:
# Total number of Italian respondents with a developer type (men and women)
print(Ita_Dev19.shape[0])
# Number of women among them
print(Ita_F19.shape[0])
# Number of men among them
print(Ita_M19.shape[0])

1347
47
1300


In [7]:
perc_Ita_F19 = Ita_F19.shape[0] / Ita_Dev19.shape[0] *100

In [8]:
# Percentage of women among Italian developers in the survey
"{:.2f} %".format(perc_Ita_F19)

'3.49 %'
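
As a cross-check, the same share can be obtained with `value_counts(normalize=True)`. The sketch below rebuilds a `Gender` column from the counts printed above (1300 men, 47 women) instead of reading the survey file; the variable names are illustrative.

```python
import pandas as pd

# Rebuild a Gender column from the counts printed above (no survey file needed)
genders = pd.Series(["Man"] * 1300 + ["Woman"] * 47, name="Gender")

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages
share = genders.value_counts(normalize=True) * 100
print("{:.2f} %".format(share["Woman"]))
```

On the real data the equivalent call would be `Ita_Dev19['Gender'].value_counts(normalize=True)`.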

In [9]:
func = lambda q : sqldf(q , globals())

q = """
select *
from Ita_Dev19
    where EdLevel like '%Master%' or 
    EdLevel like'%Bachelor%' or 
    EdLevel like'%Professional%' or 
    EdLevel like'%doctoral%';
"""

In [12]:
Dev_OverQual19_Ita = func(q)

In [13]:
Dev_OverQual19_Ita.head()

Unnamed: 0,Country,Gender,EdLevel,DevType
0,Italy,Man,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Developer, back-end;Developer, desktop or ente..."
1,Italy,Man,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Developer, back-end;Developer, embedded applic..."
2,Italy,Woman,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Developer, front-end"
3,Italy,Man,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Database administrator;Developer, full-stack"
4,Italy,Man,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Developer, back-end"
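
The SQL filter above can also be written in pandas alone, which avoids the `pandasql` dependency. This is a minimal sketch on made-up rows, not the notebook's method: `str.contains` with `case=False` is used because SQLite's `LIKE` is case-insensitive for ASCII letters by default, and `na=False` treats a missing `EdLevel` as "no degree".

```python
import pandas as pd

# Made-up rows, only to show the filter
sample = pd.DataFrame({
    "Gender": ["Man", "Woman", "Woman", "Man"],
    "EdLevel": [
        "Bachelor’s degree (BA, BS, B.Eng., etc.)",
        "Other doctoral degree (Ph.D, Ed.D., etc.)",
        "Associate degree",
        None,
    ],
})

# Same four keywords as the LIKE clauses, combined into one regular expression
degree_pattern = "Master|Bachelor|Professional|doctoral"
has_degree = sample["EdLevel"].str.contains(degree_pattern, case=False, na=False)
degree_holders = sample[has_degree]
print(degree_holders)
```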


In [14]:
# Share of women developers holding a university degree (Bachelor's, Master's, professional or doctoral), labelled "overqualified" here, out of all women developers
Dev_OverQual_F19_Ita = len(Dev_OverQual19_Ita.loc[Dev_OverQual19_Ita['Gender']=='Woman'])/Ita_F19.shape[0] * 100 

In [15]:
# Share of men developers holding a university degree (labelled "overqualified" here), out of all men developers
Dev_OverQual_M19_Ita = len(Dev_OverQual19_Ita.loc[Dev_OverQual19_Ita['Gender']=='Man'])/Ita_M19.shape[0] * 100 

In [16]:
# Percentage of "overqualified" women developers out of all women developers
print("{:.2f} %".format(Dev_OverQual_F19_Ita))
# Percentage of "overqualified" men developers out of all men developers
print("{:.2f} %".format(Dev_OverQual_M19_Ita))

70.21 %
56.38 %
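
Both percentages can be computed in one step by grouping a boolean "has a degree" flag by gender, since the mean of a boolean column is the fraction of `True` values. The sketch below uses made-up rows; as in the cells above, respondents with a missing `EdLevel` stay in the denominator.

```python
import pandas as pd

# Made-up rows, only to show the calculation
sample = pd.DataFrame({
    "Gender": ["Woman", "Woman", "Woman", "Man", "Man", "Man", "Man"],
    "EdLevel": [
        "Master’s degree", "Other doctoral degree", "Associate degree",
        "Bachelor’s degree", None, "Primary/elementary school", "Professional degree",
    ],
})

has_degree = sample["EdLevel"].str.contains(
    "Master|Bachelor|Professional|doctoral", case=False, na=False
)

# Mean of a boolean Series per group = share of degree holders per gender
share_by_gender = has_degree.groupby(sample["Gender"]).mean() * 100
print(share_by_gender.round(2))
```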


In [17]:
##################### export Ita19 outputs ##########################################
Italy_2019.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Ita_19\Italy_2019.xlsx', index=False, header=True)
Ita_2019.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Ita_19\ItaDropnaByGender_2019.xlsx', index=False, header=True)
Ita_19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Ita_19\ItaMF_19.xlsx', index=False, header=True)
Ita_Dev19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Ita_19\ItaDev_19.xlsx', index=False, header=True)
Ita_F19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Ita_19\ItaDev_F19.xlsx', index=False, header=True)
Ita_M19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Ita_19\ItaDev_M19.xlsx', index=False, header=True)
Dev_OverQual19_Ita.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Ita_19\ItaDev_OverQ19.xlsx', index=False, header=True)
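
The export cell repeats one absolute Windows path per file. A loop over a dictionary of frames with `pathlib` keeps the folder in one place. This is a sketch under assumptions: the helper `export_frames` is hypothetical, it writes CSV instead of Excel so that it runs without an Excel writer package, and it uses a temporary folder for the demonstration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_frames(frames: dict, out_dir: Path) -> list:
    """Write each DataFrame to <out_dir>/<name>.csv and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in frames.items():
        path = out_dir / f"{name}.csv"
        frame.to_csv(path, index=False)
        written.append(path)
    return written


# Made-up frame, only to show the call
demo = pd.DataFrame({"Country": ["Italy"], "Gender": ["Woman"]})

with tempfile.TemporaryDirectory() as tmp:
    paths = export_frames(
        {"ItaDev_19": demo, "ItaDev_F19": demo},
        Path(tmp) / "output_19" / "Ita_19",
    )
    exported_names = sorted(p.name for p in paths)
    all_exist = all(p.exists() for p in paths)

print(exported_names)
```

To keep the Excel output of the original cell, `frame.to_csv(path, index=False)` could be swapped for `frame.to_excel(path, index=False)` with an `.xlsx` suffix.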



## USA Situation in 2019  

## Prepare Data

The following cell selects the USA respondents and applies the same filters used for Italy.

In [18]:
Usa_2019 = stack_2019.loc[stack_2019['Country']=='United States']
# Drop rows with a missing Gender value; the analysis only compares respondents who answered 'Man' or 'Woman'
Usa_2019 = Usa_2019.dropna(subset = ['Gender'])

USA_MF19 = Usa_2019[Usa_2019['Gender'].isin(['Man','Woman'])]
Usa_Dev19 = USA_MF19.dropna(subset = ['DevType'])

Usa_F19 = Usa_Dev19.loc[Usa_Dev19['Gender']=='Woman']


Usa_M19 = Usa_Dev19.loc[Usa_Dev19['Gender']=='Man']

## Results for USA in 2019

In [19]:
# Total number of USA respondents with a developer type (men and women)
print(Usa_Dev19.shape[0])
# Number of women among them
print(Usa_F19.shape[0])
# Number of men among them
print(Usa_M19.shape[0])

18439
1945
16494


In [20]:
perc_Usa_F19 = Usa_F19.shape[0] / Usa_Dev19.shape[0] *100

In [21]:
# Percentage of women among USA developers in the 2019 survey
perc_Usa_F19

10.548294376050762

In [22]:
func_1 = lambda d : sqldf(d , globals())

d = """
select *
from Usa_Dev19
where EdLevel like '%Master%' or 
EdLevel like '%Bachelor%' or 
EdLevel like '%Professional%' or 
EdLevel like '%doctoral%';
"""

In [23]:
Dev_OverQual19_Usa = func_1(d)

In [24]:
# USA: share of women developers holding a university degree (labelled "overqualified" here), out of all women developers

Dev_OverQual_F19_Usa = len(Dev_OverQual19_Usa.loc[Dev_OverQual19_Usa['Gender']=='Woman'])/Usa_F19.shape[0] * 100 

In [25]:
# USA: share of men developers holding a university degree (labelled "overqualified" here), out of all men developers

Dev_OverQual_M19_Usa = len(Dev_OverQual19_Usa.loc[Dev_OverQual19_Usa['Gender']=='Man'])/Usa_M19.shape[0] * 100


In [26]:
# Percentage of "overqualified" women developers out of all women developers
print("{:.2f} %".format(Dev_OverQual_F19_Usa))
# Percentage of "overqualified" men developers out of all men developers
print("{:.2f} %".format(Dev_OverQual_M19_Usa))

82.93 %
75.41 %
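
To put the two countries side by side, the counts printed earlier in this notebook can be collected in one small table. The sketch below only re-enters those printed counts by hand; the column names are illustrative.

```python
import pandas as pd

# Counts copied from the printed outputs of this notebook (2019 survey)
counts = pd.DataFrame(
    {"total": [1347, 18439], "women": [47, 1945], "men": [1300, 16494]},
    index=["Italy", "United States"],
)

# Share of women among developers, in percent
counts["women_pct"] = (counts["women"] / counts["total"] * 100).round(2)
print(counts)
```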


In [None]:
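################################# export Usa19 outputs ######################

# Unfiltered USA rows (Usa_2019 was reassigned after dropna above, so they are selected again here)
stack_2019.loc[stack_2019['Country']=='United States'].to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Usa_19\Usa_2019.xlsx', index=False, header=True)
# Usa_2019 holds the USA rows with missing Gender values dropped
Usa_2019.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Usa_19\UsaDropnaByGender_2019.xlsx', index=False, header=True)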
USA_MF19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Usa_19\UsaMF_19.xlsx', index=False, header=True)
Usa_Dev19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Usa_19\UsaDev_19.xlsx', index=False, header=True)
Usa_F19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Usa_19\UsaDev_F19.xlsx', index=False, header=True)
Usa_M19.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Usa_19\UsaDev_M19.xlsx', index=False, header=True)
Dev_OverQual19_Usa.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_19\Usa_19\UsaDev_OverQ19.xlsx', index=False, header=True)