## Business Understanding

I am interested in using a data analysis approach to learn more about the situation of women in computer programming. I hope the results provide useful information to anyone who needs this kind of research. The key questions I would like to answer are:

- What is the situation of women in the programming world in Italy compared to men?
- What is the profile of women who work in computer programming in Italy? What are their qualifications?
- In the USA, by contrast, what is the situation of highly qualified women in tech?

## Data Understanding

The data used in this analysis is Stack Overflow’s developer survey data from 2017 to 2019. Respondents from about 200 countries answered about 150 survey questions. This notebook uses the survey responses to answer the three questions listed in the Business Understanding section.

## Gather Data

The data was gathered through the Stack Overflow survey. The following cells import the necessary Python libraries and read the data into a pandas DataFrame.

In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
from pandasql import sqldf 
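
Since only five columns are used later, the CSV could be read with `usecols` so that the unused survey questions never enter memory. pandas can also emit a `DtypeWarning` on wide files with mixed-type columns, and `low_memory=False` is one way to avoid it. The sketch below is a minimal illustration under those assumptions: `load_survey` is a hypothetical helper, and the in-memory CSV is a made-up stand-in for the real survey file.

```python
import io

import pandas as pd

# Columns the analysis relies on (the same ones selected later in the notebook).
COLUMNS = ["Country", "Gender", "FormalEducation", "DevType", "Salary"]


def load_survey(source):
    """Read only the needed columns from a survey CSV path or file-like object."""
    return pd.read_csv(source, usecols=COLUMNS, low_memory=False)


# Made-up rows standing in for the real survey file (illustration only).
sample_csv = io.StringIO(
    "Respondent,Country,Gender,FormalEducation,DevType,Salary,Hobby\n"
    "1,Italy,Female,Master's degree,Back-end developer,,Yes\n"
    "2,Italy,Male,Bachelor's degree,Student,,No\n"
)
sample = load_survey(sample_csv)
print(sample.columns.tolist())
```

With the real data, the same helper would be called with the path to the survey CSV instead of the in-memory sample.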

## Italy situation in 2018

## Prepare Data

The following cell loads the data and selects the columns and values needed for the analysis.


In [2]:
stack_2018 = pd.read_csv('2018_survey_results_public.csv')

stack_2018 = stack_2018[['Country','Gender','FormalEducation','DevType','Salary']]

Italy_2018 = stack_2018.loc[stack_2018['Country']=='Italy']
# Drop rows with a null Gender; the analysis only needs respondents whose gender is Male or Female
Ita_2018 = Italy_2018.dropna(subset = ['Gender'])

Ita_18 = Ita_2018[Ita_2018['Gender'].isin(['Male','Female'])]
Ita_Dev18 = Ita_18.dropna(subset = ['DevType'])

Ita_F18 = Ita_Dev18.loc[Ita_Dev18['Gender']=='Female']


Ita_M18 = Ita_Dev18.loc[Ita_Dev18['Gender']=='Male']
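
The same preparation steps are repeated later for the USA, so they could be wrapped in one function. The sketch below is one possible way to do that; `prepare_country` is a hypothetical helper name, and the demo rows are made up for illustration.

```python
import pandas as pd


def prepare_country(df, country):
    """Return (developers, women, men) for one country.

    Mirrors the notebook's steps: keep one country, keep respondents whose
    Gender is exactly 'Male' or 'Female', and drop rows without a DevType.
    """
    subset = df.loc[df["Country"] == country]
    # isin() is False for missing values, so rows with a null Gender drop out too.
    subset = subset[subset["Gender"].isin(["Male", "Female"])]
    devs = subset.dropna(subset=["DevType"])
    women = devs.loc[devs["Gender"] == "Female"]
    men = devs.loc[devs["Gender"] == "Male"]
    return devs, women, men


# Made-up rows, for illustration only.
demo = pd.DataFrame({
    "Country": ["Italy", "Italy", "Italy", "Italy", "United States"],
    "Gender": ["Female", "Male", None, "Male", "Female"],
    "DevType": ["Back-end developer", None, "Student", "Student", "Designer"],
})
devs, women, men = prepare_country(demo, "Italy")
print(len(devs), len(women), len(men))
```

Calling the helper once per country would keep the Italy and USA preparation identical by construction.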




## Results for Italy in 2018

In [3]:
# Total number of developers in Italy in the survey data
print(Ita_Dev18.shape[0])
# Number of women among them
print(Ita_F18.shape[0])
# Number of men among them
print(Ita_M18.shape[0])

1025
24
1001


In [4]:
perc_Ita_F18 = Ita_F18.shape[0] / Ita_Dev18.shape[0] *100

In [5]:
# Percentage of women among developers in Italy
"{:.2f} %".format(perc_Ita_F18)

'2.34 %'
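
To make the arithmetic behind this figure explicit, the share can be recomputed directly from the two counts printed earlier. `share_percent` is a hypothetical helper name used only for this sketch.

```python
def share_percent(part, total):
    """Return part/total as a percentage; raise ValueError if total is zero."""
    if total == 0:
        raise ValueError("total must be non-zero")
    return part / total * 100


# Counts printed earlier for Italy in 2018: 24 women out of 1025 developers.
italy_share = share_percent(24, 1025)
print("{:.2f} %".format(italy_share))
```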

In [6]:
func = lambda q : sqldf(q , globals())

q = """
select *
from Ita_Dev18
where FormalEducation like '%Master%' or 
    FormalEducation like '%Bachelor%' or 
    FormalEducation like '%Professional%' or 
    FormalEducation like '%doctoral%';
"""

Dev_OverQual18_Ita = func(q)
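
The same filter could also be written in pandas alone, without going through SQL. The sketch below assumes that `pandasql` runs the query on SQLite, whose `LIKE` is case-insensitive for ASCII letters, hence `case=False`. `with_degree` is a hypothetical helper, and the demo rows are made up for illustration.

```python
import pandas as pd

# Same keywords as the LIKE clauses in the SQL query.
DEGREE_PATTERN = "Master|Bachelor|Professional|doctoral"


def with_degree(df):
    """Keep rows whose FormalEducation mentions one of the degree keywords."""
    # case=False mimics a case-insensitive LIKE; na=False treats a missing
    # FormalEducation as "no match" instead of propagating NaN into the mask.
    mask = df["FormalEducation"].str.contains(DEGREE_PATTERN, case=False, na=False)
    return df.loc[mask]


# Made-up rows, for illustration only.
demo = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female"],
    "FormalEducation": [
        "Master's degree (MA, MS, M.Eng., MBA, etc.)",
        "Other doctoral degree (Ph.D, Ed.D., etc.)",
        "Associate degree",
        None,
    ],
})
print(len(with_degree(demo)))
```

One difference from the SQL version is that the pandas result keeps the original row index, whereas `sqldf` returns a frame with a fresh index.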

In [7]:
# Percentage of "overqualified" women developers (bachelor's, master's, professional or doctoral degree) out of all women developers
Dev_OverQual_F18_Ita = len(Dev_OverQual18_Ita.loc[Dev_OverQual18_Ita['Gender']=='Female'])/Ita_F18.shape[0] * 100 

# Percentage of "overqualified" men developers out of all men developers
Dev_OverQual_M18_Ita = len(Dev_OverQual18_Ita.loc[Dev_OverQual18_Ita['Gender']=='Male'])/Ita_M18.shape[0] * 100 

In [8]:
print("{:.2f} %".format(Dev_OverQual_F18_Ita))
print("{:.2f} %".format(Dev_OverQual_M18_Ita))

70.83 %
55.94 %
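
Both percentages could also be obtained in a single step by grouping a boolean "has a degree" column by gender, since the mean of a boolean column is the fraction of `True` values. This is a sketch of that alternative rather than the method used in this notebook; `degree_share_by_gender` is a hypothetical helper, and the demo rows are made up.

```python
import pandas as pd


def degree_share_by_gender(df, pattern="Master|Bachelor|Professional|doctoral"):
    """Percentage of rows per gender whose FormalEducation matches pattern."""
    has_degree = df["FormalEducation"].str.contains(pattern, case=False, na=False)
    # Grouping the boolean mask by gender and averaging gives the share per group.
    return has_degree.groupby(df["Gender"]).mean() * 100


# Made-up rows, for illustration only.
demo = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Male", "Male"],
    "FormalEducation": [
        "Master's degree", "Associate degree",
        "Bachelor's degree", "Secondary school", "Primary school", None,
    ],
})
shares = degree_share_by_gender(demo)
print(shares.round(2).to_dict())
```

The denominator is every developer of that gender, including those with a missing education value, which matches dividing by the size of the female and male developer frames.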


In [48]:
####### export Ita18 outputs #######
Italy_2018.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Ita_18\Italy_2018.xlsx', index=False, header=True)
Ita_2018.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Ita_18\ItaDropnaByGender_2018.xlsx', index=False, header=True)
Ita_18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Ita_18\ItaMF_18.xlsx', index=False, header=True)
Ita_Dev18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Ita_18\ItaDev_18.xlsx', index=False, header=True)
Ita_F18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Ita_18\ItaDev_F18.xlsx', index=False, header=True)
Ita_M18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Ita_18\ItaDev_M18.xlsx', index=False, header=True)
Dev_OverQual18_Ita.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Ita_18\ItaDev_OverQ18.xlsx', index=False, header=True)
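
The export paths could be built with `pathlib` instead of repeating the full folder on every line. The sketch below only constructs the paths; `build_export_paths` is a hypothetical helper and the relative `output` folder is an assumption made for the example, not the location used in this notebook.

```python
from pathlib import Path


def build_export_paths(out_dir, names, suffix=".xlsx"):
    """Map each output name to a file path under out_dir."""
    out_dir = Path(out_dir)
    return {name: out_dir / f"{name}{suffix}" for name in names}


# Hypothetical relative folder, used here only for illustration.
paths = build_export_paths(
    Path("output") / "output_18" / "Ita_18",
    ["Italy_2018", "ItaMF_18", "ItaDev_18"],
)
for name, path in paths.items():
    # With the real frames, each one would be written with
    # frame.to_excel(path, index=False, header=True).
    print(name, "->", path.as_posix())
```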

## USA situation in 2018

In [9]:
Usa_2018 = stack_2018.loc[stack_2018['Country']=='United States']

# Drop rows with a null Gender; the analysis only needs respondents whose gender is Male or Female
Usa_18 = Usa_2018.dropna(subset = ['Gender'])

# Build the mask from Usa_18 itself so that its index matches the frame being filtered
USA_MF18 = Usa_18[Usa_18['Gender'].isin(['Male','Female'])]
Usa_Dev18 = USA_MF18.dropna(subset = ['DevType'])

Usa_F18 = Usa_Dev18.loc[Usa_Dev18['Gender']=='Female']

Usa_M18 = Usa_Dev18.loc[Usa_Dev18['Gender']=='Male']


## Results for USA in 2018

In [10]:
# Total number of developers in the USA in the survey data
print(Usa_Dev18.shape[0])
# Number of women among them
print(Usa_F18.shape[0])
# Number of men among them
print(Usa_M18.shape[0])

14863
1251
13612


In [11]:
perc_Usa_F18 = Usa_F18.shape[0] / Usa_Dev18.shape[0] *100

In [12]:
"{:.2f} %".format(perc_Usa_F18)

'8.42 %'

In [13]:
func_1 = lambda d : sqldf(d , globals())

d = """
select *
from Usa_Dev18
where FormalEducation like '%Master%' or 
FormalEducation like '%Bachelor%' or 
FormalEducation like '%Professional%' or 
FormalEducation like '%doctoral%';
"""

Dev_OverQual18_Usa = func_1(d)

In [14]:
Dev_OverQual18_Usa.tail()

| | Country | Gender | FormalEducation | DevType | Salary |
|---|---|---|---|---|---|
| 11045 | United States | Male | Other doctoral degree (Ph.D, Ed.D., etc.) | Educator or academic researcher | |
| 11046 | United States | Male | Bachelor’s degree (BA, BS, B.Eng., etc.) | Student | |
| 11047 | United States | Male | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Educator or academic researcher | |
| 11048 | United States | Male | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Student | |
| 11049 | United States | Male | Bachelor’s degree (BA, BS, B.Eng., etc.) | C-suite executive (CEO, CTO, etc.) | |


In [15]:
# Percentage of "overqualified" women developers (bachelor's, master's, professional or doctoral degree) out of all women developers
Dev_OverQual_F18_Usa = len(Dev_OverQual18_Usa.loc[Dev_OverQual18_Usa['Gender']=='Female'])/Usa_F18.shape[0] * 100 
# Percentage of "overqualified" men developers out of all men developers
Dev_OverQual_M18_Usa = len(Dev_OverQual18_Usa.loc[Dev_OverQual18_Usa['Gender']=='Male'])/Usa_M18.shape[0] * 100 

In [16]:
# Share of women developers in the USA who are "overqualified"
print("{:.2f} %".format(Dev_OverQual_F18_Usa))
# Share of men developers in the USA who are "overqualified"
print("{:.2f} %".format(Dev_OverQual_M18_Usa))

84.49 %
73.41 %
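
To compare the two countries side by side, the percentages printed in this notebook can be collected in one structure. The sketch below copies the rounded figures from the outputs above, so the gaps inherit that rounding; `degree_gap` and the dictionary keys are hypothetical names chosen for this example.

```python
# Percentages copied from the outputs printed in this notebook (2018 survey).
results = {
    "Italy": {"women_share": 2.34, "women_degree": 70.83, "men_degree": 55.94},
    "USA": {"women_share": 8.42, "women_degree": 84.49, "men_degree": 73.41},
}


def degree_gap(row):
    """Women's degree share minus men's, in percentage points."""
    return round(row["women_degree"] - row["men_degree"], 2)


gaps = {country: degree_gap(row) for country, row in results.items()}
for country, row in results.items():
    print(f"{country}: women are {row['women_share']:.2f} % of developers, "
          f"degree gap {gaps[country]:+.2f} points")
```

In both countries the share of degree holders is higher among women than among men, while women remain a small fraction of respondents. The Italian figures rest on only 24 women, so they should be read with caution.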


In [52]:
########### export output Usa 2018 #########################

Usa_2018.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Usa_18\Usa_2018.xlsx', index=False, header=True)
Usa_18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Usa_18\UsaDropnaByGender_2018.xlsx', index=False, header=True)
USA_MF18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Usa_18\UsaMF_18.xlsx', index=False, header=True)
Usa_Dev18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Usa_18\UsaDev_18.xlsx', index=False, header=True)
Usa_F18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Usa_18\UsaDev_F18.xlsx', index=False, header=True)
Usa_M18.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Usa_18\UsaDev_M18.xlsx', index=False, header=True)
Dev_OverQual18_Usa.to_excel(r'C:\Users\moryb\OneDrive\Desktop\Project1\output\output_18\Usa_18\UsaDev_OverQ18.xlsx', index=False, header=True)