![Workforce](https://media.licdn.com/media/gcrc/dms/image/C4E12AQFh8lR4q1oi0w/article-cover_image-shrink_600_2000/0?e=2123092800&v=beta&t=pZz7F6dquEe6l2NEx1SvvOzKCt_tbC0GHqtOyVm9b8k)
# <center>Silicon Valley</center>

## Why Care

Women today hold only about a quarter of U.S. computing and mathematical jobs—a fraction that has actually fallen slightly over the past 15 years, even as women have made big strides in other fields. Women not only are hired in lower numbers than men are; they also leave tech at more than twice the rate men do. It’s not hard to see why. Studies show that women who work in tech are interrupted in meetings more often than men. They are evaluated on their personality in a way that men are not. They are less likely to get funding from venture capitalists, who, studies also show, find pitches delivered by men—especially handsome men—more persuasive. And in a particularly cruel irony, women’s contributions to open-source software are accepted more often than men’s are, but only if their gender is unknown.

In short, the technology industry has social problems. In this notebook I address the gender gap by exploring 2016 gender data for the top 23 Silicon Valley companies.



## **The Data** 

There are six columns in this dataset:

**company:** Company name

**year:** For now, 2016 only

**race:** Possible values: "American_Indian_Alaskan_Native", "Asian", "Black_or_African_American", "Latino", "Native_Hawaiian_or_Pacific_Islander", "Two_or_more_races", "White", "Overall_totals"

**gender:** Possible values: "male", "female". Non-binary gender is not counted in EEO-1 reports.

**job_category:** Possible values: "Administrative support", "Craft workers", "Executive/Senior officials & Mgrs", "First/Mid officials & Mgrs", "laborers and helpers", "operatives", "Professionals", "Sales workers", "Service workers", "Technicians", "Previous_totals", "Totals"

**count:** Mostly integer values, but contains the string "na" where no data is available.

## Plotly and SNS Visualizations

This will also be a crash course in one of the emerging data visualization libraries, [Plotly](https://plot.ly/). Be prepared to learn these three techniques: 
* Pie Charts
* Bar Graphs Vertical and Horizontal
* Multivariate Bar Graphs



## Steps to Success
### Step 1 | Create our gender data frame
### Step 2 | Remove impossible values from count column
### Step 3 | Total employee count
### Step 4 | Bar graph with sns plot of total employee count
### Step 5 | Exploring gender data with plotly pie charts :)
### Step 6 | Multivariate Bar Chart with Plotly
### Step 7 | Male to Female Ratios Calculation
### Step 8 | Visualize the Male to Female Ratio




# Load in Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
pd.options.mode.chained_assignment = None

from IPython.display import HTML

# plotly.plotly moved to the separate chart_studio package in Plotly 4,
# so only the offline plotting helpers are imported here
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode

# Enable Plotly rendering inside the notebook
init_notebook_mode()
```

# Step 1 | Create our gender data frame
* Read the CSV data into our gender data frame
* Preview the data by looking at the head of the data frame

```python
gender_data = pd.read_csv('../input/Reveal_EEO1_for_2016.csv')
gender_data.head()
```

# Step 2 | Remove Impossible Values
* Replace every "na" entry in the count column with 0 (we can't do math if we have "na" text involved :) )
* Convert the count column to an integer type

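```python
# "na" marks missing counts: coerce every non-numeric entry to NaN,
# treat the missing counts as 0, and store the column as integers.
# (Assigning the result back avoids the chained inplace=True replace,
# which may not update gender_data under pandas copy-on-write.)
gender_data['count'] = pd.to_numeric(gender_data['count'], errors='coerce').fillna(0).astype(int)
gender_data.head()
```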
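The lowercase string "na" is not in pandas' default list of missing-value markers (the defaults include "NA", "n/a", and "nan", but not "na"), which is why the count column arrives as text. A minimal sketch of an alternative, assuming you would rather handle it at load time: pass `na_values=['na']` to `pd.read_csv` so those cells become NaN, then fill them. The tiny inline CSV and the company names in it are made up and only stand in for the real file.

```python
import io

import pandas as pd

# A made-up miniature of the EEO-1 file with one "na" cell in the count column
raw = io.StringIO(
    "company,gender,count\n"
    "CompanyA,male,10\n"
    "CompanyA,female,na\n"
    "CompanyB,male,7\n"
)

# Default parsing keeps "na" as text, so the whole column becomes object dtype
default_frame = pd.read_csv(raw)
print(default_frame["count"].dtype)  # object

# Rewind and re-read, this time telling pandas that "na" means missing
raw.seek(0)
frame = pd.read_csv(raw, na_values=["na"])

# Missing counts are treated as 0, matching the approach in Step 2
frame["count"] = frame["count"].fillna(0).astype(int)
print(frame["count"].tolist())  # [10, 0, 7]
```

Either way the result is the same integer column; doing it in `read_csv` just keeps the cleanup next to the load.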

# Step 3 | Total employee Count
How many people work for each Silicon Valley company in the dataset? We group the rows by company and sum the count column for each one. 

```python
# Sum the count column across all rows for each Silicon Valley company
# under exploration (count is already an integer column after Step 2)
company_count = gender_data.groupby(['company']).agg({'count': 'sum'})
company_count.head()
```
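According to the column descriptions above, race includes an "Overall_totals" value and job_category includes "Totals" and "Previous_totals". If those rows are subtotals of the detailed rows, as their names suggest, summing every row counts the same employees more than once. A minimal sketch on a hand-built frame, assuming that subtotal interpretation (the frame and its company name are hypothetical), that keeps only the detailed rows before summing:

```python
import pandas as pd

# Hypothetical miniature of the EEO-1 layout for one company and one gender:
# two detailed rows plus subtotal rows of the kind the column descriptions list
rows = pd.DataFrame(
    {
        "company": ["CompanyA"] * 5,
        "race": ["Asian", "White", "Asian", "White", "Overall_totals"],
        "job_category": ["Professionals", "Professionals", "Totals", "Totals", "Totals"],
        "count": [4, 6, 4, 6, 10],
    }
)

# Summing everything mixes detailed rows with their own subtotals
naive = rows.groupby("company")["count"].sum()

# Keep only the detailed rows by dropping race-level and job-level totals
# (assumes "Overall_totals", "Totals" and "Previous_totals" are subtotal rows)
detail = rows[
    (rows["race"] != "Overall_totals")
    & (~rows["job_category"].isin(["Totals", "Previous_totals"]))
]
deduplicated = detail.groupby("company")["count"].sum()

print(naive["CompanyA"], deduplicated["CompanyA"])  # 30 10
```

Because each subtotal row repeats people already counted in the detailed rows, the naive sum here is three times the real headcount.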


# Step 4 | Bar Graph with SNS Bar Plot
We explore the employee count data with a seaborn style and a seaborn bar plot, which is drawn with matplotlib. 
* Create the figure and set its size with plt.figure
* Set the style for the graph. I used 'whitegrid' so the reader can see the grid lines, but feel free to use 'white' instead if you want the maximum amount of whitespace
* Specify the x-axis values, the y-axis values, and the color palette used to display the data

Seaborn barplot [docs](https://seaborn.pydata.org/generated/seaborn.barplot.html)

```python
# Use a large figure for readability
plt.figure(figsize=(10, 8))

# The whitegrid style draws grid lines behind the bars
sns.set_style('whitegrid')

# Draw one bar per company; Index.get_values() was removed in pandas 1.0,
# so the index is passed directly
sns.barplot(x=company_count.index, y=company_count['count'],
            palette=sns.color_palette("Paired", 10))

plt.title('Silicon Valley Companies', size=25)
plt.ylabel('Number of employees', size=14)
plt.xlabel('Companies', size=14)
plt.yticks(size=14)
plt.xticks(size=14, rotation=90)
sns.despine()
plt.show()
```
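The bars above follow alphabetical company order, because `groupby` sorts its keys by default. Sorting by headcount first usually makes a bar chart easier to scan. A sketch using plain matplotlib and made-up headcounts, assuming the same single-column layout as `company_count`:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up headcounts in the same shape as company_count (index = company)
counts = pd.DataFrame(
    {"count": [120, 450, 300]}, index=["CompanyA", "CompanyB", "CompanyC"]
)

# Sort from largest to smallest so the tallest bar comes first
ordered = counts.sort_values("count", ascending=False)

fig, ax = plt.subplots(figsize=(10, 8))
ax.bar(ordered.index, ordered["count"])
ax.set_title("Silicon Valley Companies", size=25)
ax.set_ylabel("Number of employees", size=14)
ax.tick_params(axis="x", labelrotation=90)
plt.close(fig)  # not needed in a notebook, where the figure is displayed
```

The same sorted frame can be passed to `sns.barplot` instead; seaborn keeps the order of the data it is given.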

# Step 5 | Exploring Gender Data with Plotly Pie Charts :)
Let's explore the gender data of these Silicon Valley companies with a pie chart.
Below we highlight the ratio of male to female employees.

* Group by the gender column and sum the employee counts
* Store the gender values (male and female) in the labels variable
* Build the trace variable from the labels, the summed counts, and the percentage of each gender

Review the Plotly [pie chart documentation](https://plot.ly/python/pie-charts/) for more ways to customize your pie charts

```python
# Total employee count per gender; the index holds the gender labels
gender_totals = gender_data.groupby(['gender']).agg({'count': 'sum'})
labels = gender_totals.index.to_numpy()  # get_values() was removed in pandas 1.0
values = gender_totals['count'].values
colors = ['#a1d99b', '#deebf7']
trace = go.Pie(labels=labels, values=values,
               textinfo="label+percent",
               textfont=dict(size=20),
               marker=dict(colors=colors,
                           line=dict(color='#000000', width=2)))
layout = go.Layout(title='Pie Chart of Female and Male Employees')
data = [trace]

fig = dict(data=data, layout=layout)
iplot(fig, filename='Pie Chart of Female and Male Employees')
```
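With `textinfo="label+percent"`, Plotly computes each slice's percentage as its value divided by the sum of all values. The same shares, and the male-to-female ratio behind them, can be checked in pandas. A sketch on made-up counts that only stand in for `gender_data`:

```python
import pandas as pd

# Made-up rows standing in for gender_data after Step 2
rows = pd.DataFrame(
    {"gender": ["male", "female", "male", "female"], "count": [50, 15, 25, 10]}
)

# Total employees per gender (the same grouping as the pie chart's values)
totals = rows.groupby("gender")["count"].sum()

# Share of each gender in percent, and the male-to-female ratio
shares = (totals / totals.sum() * 100).round(1)
ratio = totals["male"] / totals["female"]
print(shares.to_dict())  # {'female': 25.0, 'male': 75.0}
print(ratio)  # 3.0
```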

### For every female employee there are about two male employees at these companies. I wonder what the numbers look like for each individual company that add up to such a large difference. 

# Step 6 | Multivariate Bar Chart with Plotly 
### **Let's take a closer look at individual companies by plotting the number of male and female employees at each company.**
* Create trace1 and trace2 to store the male and female data
* x stores the company names
* y stores the male or female employee count for each company


A trace is the term Plotly uses throughout its API for a single data series in a chart.

```python
# Total employees for each (gender, company) pair, as a flat data frame
d = gender_data.groupby(['gender', 'company']).agg({'count': 'sum'}).reset_index()
trace1 = go.Bar(
    x=d[d.gender == 'male']['company'],
    y=d[d.gender == 'male']['count'],
    name='Males',
    marker=dict(
        color='rgb(158,202,225)'
    )
)
trace2 = go.Bar(
    x=d[d.gender == 'female']['company'],
    y=d[d.gender == 'female']['count'],
    name='Females',
    marker=dict(
        color='rgb(161,217,155)'
    )
)
data = [trace1, trace2]
# barmode='group' places the male and female bars side by side per company
layout = go.Layout(
    barmode='group', title='Distribution of Male and Female Employees by Company')

fig = dict(data=data, layout=layout)
iplot(fig, filename='Distribution of Male and Female Employees by Company')
```
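To put a number on the gap the grouped bars show, one option is to pivot the same (gender, company) totals into one column per gender and subtract. A sketch with made-up counts; `long_totals`, `wide`, and `ranked` are hypothetical names chosen so the notebook's own `d` is not overwritten:

```python
import pandas as pd

# Made-up totals in the same long format as d in Step 6:
# one row per (gender, company) pair
long_totals = pd.DataFrame(
    {
        "gender": ["male", "male", "female", "female"],
        "company": ["CompanyA", "CompanyB", "CompanyA", "CompanyB"],
        "count": [900, 300, 300, 200],
    }
)

# Pivot to one row per company and one column per gender
wide = long_totals.pivot(index="company", columns="gender", values="count")

# Absolute gap: how many more men than women each company employs
wide["gap"] = wide["male"] - wide["female"]

# Largest gap first
ranked = wide["gap"].sort_values(ascending=False)
print(ranked)
```

An absolute gap naturally favors large employers, which is one reason the next step switches to ratios.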

<h3><b> We can see that some of the biggest companies, such as Intel, Apple, Cisco, and Google, have a huge gap between the number of male and female employees. Let's dig a little deeper to see how wide the gap between female and male employees is. </b></h3>

# Step 7 | Male to Female Ratios Calculation
* Sum the employee counts for each company and gender
* Call the [unstack function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) on the resulting data frame to pivot the gender level of the index into columns
* Leverage a lambda to find the percentages of males and females at each Silicon Valley company
* Divide the male figures by the female figures to get the male-to-female ratio
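Because both percentages are divided by the same company total, the ratio of the percentages equals the ratio of the raw counts. Rounding the percentages to whole numbers before dividing, however, can shift the result. A small worked example with made-up counts of 2 men and 1 woman:

```python
import numpy as np

# Made-up counts for a single company
male, female = 2, 1
total = male + female

# Exact percentages: 66.666... and 33.333...
male_pct = male / total * 100
female_pct = female / total * 100

# Ratio from the unrounded values equals male / female
exact_ratio = round(male_pct / female_pct, 2)  # 2.0

# Ratio after rounding the percentages to whole numbers first: 67 / 33
rounded_ratio = round(np.round(male_pct) / np.round(female_pct), 2)  # 2.03
```

Computing the ratio from the counts, and rounding only for display, avoids this drift.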


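```python
# Total employees for each (company, gender) pair, with gender pivoted
# into columns ('female', 'male') by unstack
d = gender_data.groupby(['company', 'gender'])['count'].sum().unstack()

# Male-to-female ratio from the raw counts, so it is not distorted by
# rounding the percentages below
ratio = np.round(d['male'] / d['female'], 2)

# Express each company's row as percentages of its own total headcount
d = np.round(d.apply(lambda x: (x / x.sum()) * 100, axis=1))
d['Ratio'] = ratio
d.sort_values(by='Ratio', inplace=True, ascending=False)
d = d.rename(columns={'female': 'Female %', 'male': 'Male %'})
```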

```python
d
```

#### Even big companies such as <b>Nvidia, Intel, Cisco, Uber, Google, Apple, Facebook </b> and many more have more than <b>2 male employees for each female employee.</b>
#### <b>Nvidia</b> seems to have the highest male to female ratio of all 23 Silicon Valley companies, <b>with almost 5 men for each female employee</b>
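Both observations can be read off the Ratio column directly rather than by eye. A sketch on a made-up three-company table that uses the same column names as `d`:

```python
import pandas as pd

# Made-up stand-in for d after Step 7 (index = company)
ratios = pd.DataFrame(
    {
        "Female %": [17.0, 30.0, 45.0],
        "Male %": [83.0, 70.0, 55.0],
        "Ratio": [4.88, 2.33, 1.22],
    },
    index=["CompanyA", "CompanyB", "CompanyC"],
)

# Companies with more than two men for every woman
over_two = ratios.index[ratios["Ratio"] > 2].tolist()

# Company with the highest male-to-female ratio
highest = ratios["Ratio"].idxmax()
print(over_two, highest)  # ['CompanyA', 'CompanyB'] CompanyA
```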

# Step 8 | Let's Visualize the Male to Female Ratio
* We use Plotly to visualize the ratio data with a horizontal bar chart
* Setting orientation='h' on the trace object makes the bars horizontal

More details on horizontal bar charts in Plotly can be found in the [docs](https://plot.ly/python/bar-charts/).


```python
trace1 = go.Bar(
    y=d.index,  # company names; get_values() was removed in pandas 1.0
    x=d['Ratio'], text=d['Ratio'], textposition='auto',
    orientation='h',  # draw horizontal bars
    marker=dict(
        color='rgb(158,202,225)',
        line=dict(
            color='rgb(8,48,107)',
            width=1.5,
        )
    ),
    opacity=0.6
)

data = [trace1]
layout = go.Layout(
    barmode='group', title='Ratio of Male to Female Employees')

fig = dict(data=data, layout=layout)
iplot(fig, filename='Ratio of Male to Female Employees')
```

![](https://media.licdn.com/dms/image/C4E12AQElIcntLKXDNg/article-inline_image-shrink_1500_2232/0?e=2123110800&v=beta&t=sjk8VHF38xSNxdt5wWTwOZZIEI6SjVoHZAP0E0qt_QI)
# <center> Conclusion | Hope </center>

In the past several years, Silicon Valley has begun to grapple with these problems, or at least to quantify them. In 2014, Google released data on the number of women and minorities it employed. Other companies followed, including LinkedIn, Yahoo, Facebook, Twitter, Pinterest, eBay, and Apple. The numbers were not good, and neither was the resulting news coverage, but the companies pledged to spend hundreds of millions of dollars changing their work climates, altering the composition of their leadership, and refining their hiring practices.

#### Project Next Steps

In part two of this tutorial I will explore the race gap numbers and how to visualize them as well.