# My First Exploratory Data Analysis (EDA) Project

As a female computer science student in Canada, I wanted to see how education levels and job titles are distributed among female data scientists in Canada, using the responses to the 2020 Kaggle survey.

In [None]:
# import packages
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## 1. Loading data

In [None]:
data = pd.read_csv('/kaggle/input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')
# only the last expression in a cell is rendered, so show the summary explicitly
display(data.describe())
data.head()

## 2. Removing unnecessary row

In [None]:
# the first row holds the question text rather than a response, so drop it
survey_data = data.iloc[1:,:]
survey_data.head()
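
The first row is dropped because it repeats the question text instead of holding a response. As a minimal sketch on a made-up two-column CSV (the values are hypothetical stand-ins, not survey data), the question text can be kept in a dictionary before the row is removed. Resetting the index afterwards is optional and is not done in this notebook.

```python
import io
import pandas as pd

# Toy stand-in for the survey file: row 0 repeats the question text (made-up values).
toy_csv = io.StringIO(
    "Q2,Q3\n"
    "What is your gender?,In which country do you currently reside?\n"
    "Woman,Canada\n"
    "Man,India\n"
)
toy = pd.read_csv(toy_csv)

# Keep the question text for later reference, keyed by column name.
questions = toy.iloc[0].to_dict()

# Drop the question row and renumber the remaining rows from 0.
responses = toy.iloc[1:].reset_index(drop=True)

print(questions["Q2"])
print(responses.shape)
```

Keeping `questions` around makes it easy to look up what a column such as `Q2` asked without reopening the file.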

## 3. Renaming some columns

In [None]:
# rename columns
renamed_data = survey_data.rename(columns={'Q2':'gender',
                                           'Q3':'country',
                                           'Q4':'education',
                                           'Q5':'job_title'
                                           })

new_renamed_data = renamed_data[['gender', 'country', 'education', 'job_title']].copy()
new_renamed_data

## 4. Extracting a dataframe on female data scientists in Canada

In [None]:
# build the mask from the same dataframe that is being filtered
women_in_canada = new_renamed_data.loc[(new_renamed_data.country == 'Canada') & (new_renamed_data.gender == 'Woman')]
women_in_canada
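
The filter in this step combines two conditions into one boolean mask. A small sketch on made-up rows (only the column names match the notebook) shows how the mask selects rows:

```python
import pandas as pd

# Made-up rows; only the column names match the notebook.
toy = pd.DataFrame({
    "gender": ["Woman", "Man", "Woman", "Woman"],
    "country": ["Canada", "Canada", "India", "Canada"],
    "education": ["Master’s degree", "Doctoral degree",
                  "Bachelor’s degree", "Bachelor’s degree"],
})

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are required because & binds more tightly than ==.
mask = (toy["country"] == "Canada") & (toy["gender"] == "Woman")
subset = toy.loc[mask]

print(subset.index.tolist())
```

Only rows where both conditions hold are kept, and they retain their original index labels.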

## 5. Getting a percentage of null values in selected questions

In [None]:
# percentage of null values in selected questions
# (the mean of the boolean mask is the fraction of nulls; multiply by 100 for a percentage)
women_in_canada.isnull().mean() * 100

Because the percentage of null values is small, I decided to leave the missing values as they are.
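
To make the difference between a fraction and a percentage of nulls concrete, here is a minimal sketch on made-up data (the values are hypothetical; only the column names come from this notebook):

```python
import numpy as np
import pandas as pd

# Made-up answers with a few missing values.
toy = pd.DataFrame({
    "education": ["Master’s degree", np.nan, "Bachelor’s degree", "Doctoral degree"],
    "job_title": ["Student", "Data Analyst", np.nan, np.nan],
})

# Fraction of nulls per column: a value between 0 and 1.
null_share = toy.isnull().sum() / toy.shape[0]

# Same quantity as a percentage: the mean of a boolean mask is the fraction of True values.
null_percent = toy.isnull().mean() * 100

print(null_share)
print(null_percent)
```

Both forms carry the same information; the percentage is simply easier to read in a summary.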

## 6. Creating a bar chart on education distribution

In [None]:
# 'No formal education past high school' and 'I prefer not to answer' have no
# responses in this subset, so they are left out of the order
education_order = [
    'Professional degree',
    'Some college/university study without earning a bachelor’s degree',
    'Bachelor’s degree',
    'Master’s degree',
    'Doctoral degree',
]

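plt.figure(figsize=(10,6))
plt.title("Education Distribution", fontdict={'fontsize': 20})

# countplot draws the number of respondents in each category;
# barplot with y=women_in_canada.index would plot the mean row index instead
sns.countplot(x=women_in_canada['education'], order=education_order,
              palette="pastel")
# rotate the tick labels after plotting so the setting applies to the final labels
plt.xticks(rotation=80, fontsize=10)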

According to the chart above, education levels are almost evenly distributed among female data scientists in Canada, with 'Professional degree' as the exception.
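
A distribution chart of this kind is read as the number of respondents per category. As a sketch on made-up answers (not survey results), the same counts can be computed directly with `value_counts` and arranged in a fixed order with `reindex`, which is a handy cross-check on what the bars show. The `order` list here is a shortened, hypothetical version of `education_order`.

```python
import pandas as pd

# Made-up answers, not survey results.
education = pd.Series([
    "Master’s degree", "Bachelor’s degree", "Master’s degree",
    "Doctoral degree", "Master’s degree",
])

# Fixed display order; categories that never occur get a count of 0.
order = ["Professional degree", "Bachelor’s degree",
         "Master’s degree", "Doctoral degree"]
counts = education.value_counts().reindex(order, fill_value=0)

print(counts)
```

Calling `counts.plot(kind='bar')` on the result would draw the same bars as a count plot with that order.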

## 7. Creating a bar chart on job title
The chart is sorted in descending order to show which job titles are most common among female data scientists.

In [None]:
# job title distribution in descending order
# value_counts() counts each job title, skips nulls, and sorts in descending order
plot_order = women_in_canada['job_title'].value_counts()

plt.figure(figsize=(10,6))
plt.title("Job Title Distribution", fontdict={'fontsize': 20})

plot_order.plot(kind='bar', rot=70, fontsize=10)

According to the chart above, most of them are students. Among the rest, 'Data Analyst' and 'Data Scientist' are the two most common job titles.
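
Statements such as "most" are easier to check when the counts are also expressed as shares. A minimal sketch on made-up job titles (hypothetical values, not the survey results):

```python
import pandas as pd

# Made-up job titles with one missing answer.
job_title = pd.Series(
    ["Student", "Data Analyst", "Student", None, "Data Scientist", "Student"]
)

# Counts per title: sorted in descending order, missing answers excluded.
counts = job_title.value_counts()

# Share of each title among the non-missing answers, in percent.
shares = job_title.value_counts(normalize=True) * 100

print(counts)
print(shares.round(1))
```

Applied to `women_in_canada['job_title']`, the same two calls would give the counts behind the chart and the percentage for each title.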