**Summary**

This notebook explores the representation of women among named city properties in San Francisco using visualization techniques. The goal is to find where women and men have larger representation across the different departments/sources. The findings show that women have a significantly larger representation in the Port, while men have a significantly larger representation in the SFMTA, PUC and Airport departments. The Rec and Park and Library departments have slightly more women represented than men, and the Administrator and RED departments are evenly balanced between women and men.

**Introduction**

This notebook analyzes the representation of women among named city properties in San Francisco. The goal is to carry out an exploratory data analysis (EDA) of how men and women are represented across the city's facilities, find where each gender is more represented, and suggest some possible reasons. There are 82 places in the city's property named after people. For each named place, the file includes its department, the name of the place, the person it is named after, that person's gender, a reference, comments, and additional columns about the different districts. I will use visualization techniques to show the disproportion between places named after women and places named after men in different areas, using the department information to define those areas.

**Notebook Set-Up and Data Exploration Section**

First, I chose four libraries: pandas and numpy for data manipulation, and matplotlib and seaborn for data visualization.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # more visualization

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Load the CSV file.

In [None]:
file_path = '../input/women-representation-in-city-property-sanfrancisco/WomenRepresentaionInCityProperty-SanFrancisco.csv'
women_data = pd.read_csv(file_path)
print("Data loaded")

Let's see how the data is structured:

In [None]:
women_data.head()

In [None]:
women_data.tail()

The first and last rows above show places named after both men and women, so next we look at the counts of female vs. male names.

In [None]:
women_data['Gender'].value_counts()

The `Gender` column also includes places named after more than one person (female/male, male/male and male/female). Since the goal is to compare places named after women with places named after men, I will leave out these shared-name places, as they don't help measure the disproportion between the two genders.

In [None]:
# Keep only places named after a single woman or a single man
w_data2 = women_data[~women_data['Gender'].isin(['F & M', 'M & M', 'M & F'])]
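
A minimal sketch of the same filter on a small made-up table (the names and rows are hypothetical, not taken from the San Francisco file). `Series.isin` combined with `~` keeps every row whose `Gender` is not one of the shared-name codes, which gives the same result as chaining three `!=` comparisons:

```python
import pandas as pd

# Hypothetical toy data; the notebook itself reads the Kaggle CSV instead
toy = pd.DataFrame({
    "Person": ["A", "B", "C & D", "E & F", "G"],
    "Gender": ["F", "M", "F & M", "M & M", "M"],
})

# Gender codes that mark places named after more than one person
shared_codes = ["F & M", "M & M", "M & F"]

# Chained comparisons, one per shared-name code
chained = toy[(toy["Gender"] != "F & M") & (toy["Gender"] != "M & M") & (toy["Gender"] != "M & F")]

# Equivalent, shorter form: drop rows whose code appears in shared_codes
single = toy[~toy["Gender"].isin(shared_codes)]

print(single)
```

Using a list of codes keeps the filter in one place if more combinations show up in the data.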

After filtering, there are 53 places named after men and 19 named after women, as shown below:

In [None]:
w_data2['Gender'].value_counts()

In [None]:
counts = [53, 19]
gender_label = ["Male", "Female"]

In [None]:
plt.pie(counts, labels=gender_label)
plt.legend()
plt.show()
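
The pie slices correspond to each gender's share of the 72 single-gender places: 53/72 ≈ 73.6% male and 19/72 ≈ 26.4% female. A small sketch of that arithmetic, using the counts reported above; matplotlib's `autopct='%1.1f%%'` argument to `plt.pie` is one way to print these percentages on the chart itself.

```python
# Counts taken from the value_counts output above
counts = {"Male": 53, "Female": 19}
total = sum(counts.values())

# Share of single-gender named places for each gender
shares = {gender: n / total for gender, n in counts.items()}

for gender, share in shares.items():
    print(f"{gender}: {share:.1%}")
# prints:
# Male: 73.6%
# Female: 26.4%
```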

However, the first and last rows already show that some names repeat (for example, "George R. Moscone"). Repeated names could bias the visualizations toward either gender, so we need to find them and take them out.

In [None]:
repeats = w_data2[w_data2.duplicated('Person')]
print(repeats)

In [None]:
w_data3 = w_data2.drop([2,22, 31, 38, 42, 46,50,52,70,76,77,80])
w_data3 = w_data3.reset_index(drop=True)
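
Instead of listing the row indices by hand, the rows flagged by `duplicated('Person')` could be dropped directly. A sketch on a made-up table (hypothetical names and places) shows that `drop(repeats.index)` and `drop_duplicates(subset='Person')` both keep the first occurrence of each person and remove exactly the flagged rows. This assumes a person's name alone identifies them, which is an assumption about the data.

```python
import pandas as pd

# Hypothetical toy data with one person appearing twice
toy = pd.DataFrame({
    "Person": ["Ann", "Bob", "Ann", "Cy"],
    "Place": ["Park", "Library", "Port", "Airport"],
})

# Rows whose Person already appeared earlier (the first occurrence is not flagged)
repeats = toy[toy.duplicated("Person")]

# Drop the flagged rows by their index labels, then renumber 0..n-1
by_index = toy.drop(repeats.index).reset_index(drop=True)

# One-step equivalent
deduped = toy.drop_duplicates(subset="Person").reset_index(drop=True)

print(deduped)
```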

In [None]:
w_data3.tail()

We now have 59 rows, each a unique place named after a single woman or a single man. The data is ready for visualization.

**Visualization of Genders in Different Departments**

With the cleaned data, we have 59 places named after either women or men. These places are divided among eight departments/sources: Administrator, Rec and Parks, Library, RED, SFMTA, PUC, Port and Airport.

We can see how many places are in each one below: 

In [None]:
dpt_counts = w_data3["Department/Source"].value_counts()
print(dpt_counts)

In [None]:
gender_counts = w_data3['Gender'].value_counts()
print(gender_counts)

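We can now visualize the female vs. male counts for each department/source.

The first plot shows all the departments on one graph, followed by individual plots for each department. In the individual plots, each bar shows the share of that gender's named places that fall in the department, rather than a raw count.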

In [None]:
sns.countplot(y="Department/Source", hue="Gender", data=w_data3)
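
The same department-by-gender counts can also be read as a table with `pd.crosstab`. A minimal sketch on hypothetical rows standing in for `w_data3` (not the real data), with `margins=True` adding row and column totals under the label `All`:

```python
import pandas as pd

# Hypothetical rows standing in for w_data3
toy = pd.DataFrame({
    "Department/Source": ["Port", "Port", "SFMTA", "LIBRARY", "LIBRARY"],
    "Gender": ["F", "F", "M", "F", "M"],
})

# Count of named places per department (rows) and gender (columns), with totals;
# department/gender combinations with no rows are filled with 0
table = pd.crosstab(toy["Department/Source"], toy["Gender"], margins=True)
print(table)
```

A table like this makes departments with zero places for one gender explicit, which a bar chart shows only as a missing bar.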

In [None]:
fig, axes = plt.subplots(4, 2, figsize=(15, 20), constrained_layout=True)

# (value in the 'Department/Source' column, label used in the panel title)
departments = [
    ('Administrator', 'the Administrator Department'),
    ('REC AND PARKS', 'the Rec and Parks Department'),
    ('LIBRARY', 'the Library'),
    ('RED', 'the RED'),
    ('SFMTA', 'the SFMTA'),
    ('PUC', 'the PUC'),
    ('Port', 'the Port'),
    ('AIRPORT', 'the Airport'),
]

# Fill the 4x2 grid row by row, one department per panel
for ax, (dept, label) in zip(axes.flat, departments):
    # y is a boolean mask, so with barplot's default mean estimator each bar is
    # the share of that gender's named places that belong to this department
    sns.barplot(ax=ax, x=w_data3['Gender'], y=w_data3['Department/Source'] == dept)
    ax.set_title(f"Females vs Males of {label}")

plt.show()
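
Because `y` in each panel is a boolean mask, seaborn's default `mean` estimator turns each bar into the fraction of that gender's named places that fall in the department. Since there are far fewer female-named places overall, this normalization matters when comparing bars. A sketch on hypothetical rows (not the real data) reproduces those bar heights numerically with `pd.crosstab(..., normalize='columns')` and checks them against the mean of the mask per gender, assuming seaborn treats the boolean `y` as numeric:

```python
import pandas as pd

# Hypothetical rows standing in for w_data3
toy = pd.DataFrame({
    "Department/Source": ["Port", "Port", "SFMTA", "LIBRARY", "LIBRARY", "SFMTA"],
    "Gender": ["F", "M", "M", "F", "M", "M"],
})

# Each column sums to 1: the share of each gender's places in each department
shares = pd.crosstab(toy["Department/Source"], toy["Gender"], normalize="columns")

# Mean of the Port mask per gender: the heights of the Port panel's bars
port_mask_mean = (toy["Department/Source"] == "Port").groupby(toy["Gender"]).mean()

print(shares)
print(port_mask_mean)
```

Here women have 2 places and men 4, so one Port place each gives shares of 0.5 and 0.25: equal counts, but unequal bars.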


**Summary Analysis**

Women have a significantly larger representation in the Port, with almost double the representation of men. That is the only area where women are significantly more represented. The Administrator and RED departments are evenly balanced between women and men, while the Rec and Park and Library departments have slightly more women represented. Women are absent from the SFMTA, PUC and Airport departments. Women's representation is *likely* concentrated in the Port, Administrator, RED, Rec and Park and Library departments because (1) these are longer-established departments and (2) more women have broken into these fields.

Men have a significantly larger representation in the SFMTA, PUC and Airport departments, where no women are represented. Men are evenly balanced with women in the Administrator and RED departments, and slightly less represented in the Rec and Park and Library departments.