# Should I pursue higher education?
This notebook analyses the education demographics of different jobs and offers insight into how higher education affects job prospects.

Note: Because the survey was conducted on Kaggle, the sample is biased, especially for roles such as software engineer.

One probably does not need more than a bachelor's degree for software engineering (SWE).

However, the survey probably captures people who are trying to transition from SWE to machine-learning roles.

# TLDR
For more technical roles, meaning those with the word 'Scientist' in the job title, pursuing higher education would probably improve job prospects.

There will always be exceptions: people with no degree can also become research scientists, but this is considered very rare.
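One way to put a number on this claim is the share of respondents in each role who report a master's or doctoral degree. The sketch below runs on a small made-up table, not on the survey data. The column names `Q4` (education) and `Q5` (job title) follow the survey file, while `ADVANCED` and `advanced_share` are hypothetical names introduced only for illustration, and treating master's and doctoral degrees as "advanced" is an assumption.

```python
import pandas as pd

# Toy data for illustration only; these are NOT survey results.
toy = pd.DataFrame({
    "Q5": ["Research Scientist"] * 4 + ["Software Engineer"] * 4,
    "Q4": [
        "Doctoral degree", "Doctoral degree", "Master’s degree", "Bachelor’s degree",
        "Bachelor’s degree", "Bachelor’s degree", "Master’s degree",
        "No formal education past high school",
    ],
})

# Assumption: "advanced" means a master's or doctoral degree.
ADVANCED = {"Master’s degree", "Doctoral degree"}

def advanced_share(df):
    """Return the % of respondents per job title who hold an advanced degree."""
    is_advanced = df["Q4"].isin(ADVANCED)
    # The mean of a boolean Series is the fraction of True values.
    return (is_advanced.groupby(df["Q5"]).mean() * 100).round(1)

shares = advanced_share(toy)
print(shares)
```

Applied to the real responses, the same function would take `df1` in place of `toy`.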

# Higher Education
One should not pursue higher education merely for the piece of paper that gets one through the front door.

One should weigh one's interests, the learning opportunities, and the opportunity costs before making the decision.

In [None]:
import pandas as pd
import plotly.express as px
from plotly.offline import init_notebook_mode
import plotly.graph_objects as go
init_notebook_mode(connected=True)

In [None]:
df = pd.read_csv("../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv")

In [None]:
# Row 0 holds the question text rather than a response, so drop it.
df1 = df.drop(0)

In [None]:
df1["Q4"].unique()

In [None]:
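# Display order for the education levels (Q4) on the x-axis.
d = {
    "I prefer not to answer": 0,
    "No formal education past high school": 1,
    "Some college/university study without earning a bachelor’s degree": 2,
    "Professional doctorate": 3,
    "Bachelor’s degree": 4,
    "Master’s degree": 5,
    "Doctoral degree": 6,
}

def get_percentage(job_name):
    """Return the % of respondents at each education level (Q4) for one job title (Q5)."""
    # Count responses per education level for the given job title.
    df2 = df1[df1["Q5"] == job_name][["Q4", "Q5"]].groupby("Q4").count().reset_index()
    # After count(), the "Q5" column holds the number of respondents.
    df2["%"] = 100 * df2["Q5"] / df2["Q5"].sum()
    # Sort by the custom order in `d` instead of alphabetically.
    return df2.sort_values("Q4", key=lambda x: x.map(d))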
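
To see what `get_percentage` produces, here is a self-contained sketch that repeats the same steps (filter by job, count per education level, convert to percentages, sort by a custom order) on a made-up table. The data, `order`, and `education_percentages` are hypothetical stand-ins and are not part of the notebook.

```python
import pandas as pd

# Toy responses for illustration only; NOT survey data.
toy = pd.DataFrame({
    "Q4": ["Master’s degree", "Bachelor’s degree", "Master’s degree",
           "Doctoral degree", "Bachelor’s degree"],
    "Q5": ["Data Scientist", "Data Scientist", "Data Scientist",
           "Data Scientist", "Data Analyst"],
})

# Hypothetical display order for the education levels.
order = {"Bachelor’s degree": 0, "Master’s degree": 1, "Doctoral degree": 2}

def education_percentages(df, job_name):
    """Percentage of respondents at each education level for one job title."""
    counts = (
        df.loc[df["Q5"] == job_name, "Q4"]
        .value_counts()
        .rename_axis("Q4")
        .reset_index(name="count")
    )
    counts["%"] = 100 * counts["count"] / counts["count"].sum()
    # Sort by rank in `order` rather than alphabetically.
    return counts.sort_values("Q4", key=lambda s: s.map(order)).reset_index(drop=True)

result = education_percentages(toy, "Data Scientist")
print(result)
```

The percentages are computed within a single job title, so they sum to 100 for each role regardless of how many respondents that role has.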

In [None]:
df_swe = get_percentage("Software Engineer")
df_rs = get_percentage("Research Scientist")
df_ds = get_percentage("Data Scientist")
df_da = get_percentage("Data Analyst")
df_mle = get_percentage("Machine Learning Engineer")
df_ba = get_percentage("Business Analyst")
df_de = get_percentage("Data Engineer")
df_dba = get_percentage("DBA/Database Engineer")

In [None]:
fig = go.Figure(data=[
    go.Bar(name='DBA/Database Engineer', x=df_dba["Q4"], y=df_dba["%"]),
    go.Bar(name='Software Engineer', x=df_swe["Q4"], y=df_swe["%"]),
    go.Bar(name='Data Engineer', x=df_de["Q4"], y=df_de["%"]),
    go.Bar(name='Business Analyst', x=df_ba["Q4"], y=df_ba["%"]),
    go.Bar(name='Data Analyst', x=df_da["Q4"], y=df_da["%"]),
    go.Bar(name='Machine Learning Engineer', x=df_mle["Q4"], y=df_mle["%"]),
    go.Bar(name='Data Scientist', x=df_ds["Q4"], y=df_ds["%"]),
    go.Bar(name='Research Scientist', x=df_rs["Q4"], y=df_rs["%"])
])
fig.update_layout(
    title="Education demographics for various jobs",
    yaxis=dict(title="% of respondents"),
    autosize=True,
    width=1000,
    height=800,
)
fig.show()
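
The same comparison can also be computed for all job titles at once with `pd.crosstab` and `normalize="index"`, which avoids one `get_percentage` call per role. This is an alternative sketch on made-up data, not the approach the notebook uses; `toy` and `pct` are hypothetical names.

```python
import pandas as pd

# Toy responses for illustration only; NOT survey data.
toy = pd.DataFrame({
    "Q5": ["Data Scientist", "Data Scientist", "Data Analyst", "Data Analyst"],
    "Q4": ["Master’s degree", "Doctoral degree", "Bachelor’s degree", "Master’s degree"],
})

# Rows: job titles; columns: education levels.
# normalize="index" divides each row by its total, so every row sums to 100.
pct = pd.crosstab(toy["Q5"], toy["Q4"], normalize="index") * 100
print(pct.round(1))
```

Education levels that no respondent in a role selected appear as 0 in this table, whereas the per-role `groupby` result simply omits them.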

In [None]:
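def make_plotly_bar(df, title):
    """Plot the education breakdown (in %) for a single job title."""
    fig = px.bar(
        df,
        x="Q4",
        y="%",
        text=df["%"].round(1).values,  # label each bar with its percentage
        orientation="v",
    )
    fig.update_layout(
        title=title,
        yaxis=dict(title="% of respondents"),
        # The x-axis shows Q4 (education level), not the job title.
        xaxis=dict(title="education level"),
        autosize=False,
        width=800,
        height=600,
    )
    fig.show()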

In [None]:
make_plotly_bar(df_swe,"Software Engineer")

In [None]:
make_plotly_bar(df_rs,"Research Scientist")

In [None]:
make_plotly_bar(df_ds,"Data Scientist")

In [None]:
make_plotly_bar(df_da,"Data Analyst")

In [None]:
make_plotly_bar(df_mle,"Machine Learning Engineer")

In [None]:
make_plotly_bar(df_ba,"Business Analyst")

In [None]:
make_plotly_bar(df_de,"Data Engineer")

In [None]:
make_plotly_bar(df_dba,"DBA/Database Engineer")