# How are Coding Bootcamp graduates valued in the field?
There are many arguments whether or not coding boocamps are worth it.
People debate a lot in the communities <a href="https://www.reddit.com/r/webdev/comments/6tcnt9/are_coding_bootcamps_worth_the_time_and_money/" target="_blank">reddit discussion link</a>  and it is hot issue among those who want to start their new professional software engineering career. 

I can talk about pros and cons of going to bootcamp here, but I won't do it because there are already <a href="https://careerkarma.com/blog/are-coding-bootcamps-worth-it/" target="_blank">many discussion out there</a>. 

In this article, I would instead talk about how much coding boocamp are valued in the field based on <a href="https://www.kaggle.com/mchirico/stack-overflow-developer-survey-results-2019" target="_blank">stackoverflow survey 2019</a> data.

There are some assumptions that I have to tell you. 

1. Not all coding bootcamp graduates probably got their job 100%, so this analysis would be fully based on those survey results by whom I belive that they got their job. Therefore, my argument is not focused on "Can you get a job through coding bootcamp?'.

2. There are no specific educational backgroudn group with labed as 'coding bootcamp' but there is one like 'Participated in a full-time developer training program or bootcamp'. Therefore I would like to consider the group as 'coding bootcamp' group since 9 years ago because wikipedia says the very first bootcamp was in 2001 in U.S.
3. I restricted the region to U.S. becasue the compensation is relative all over the world and did not want to go into market and exchange rate issue. 

## My Hypothesis
The bootcamp graduates are valued as equal as B.S. degree in Computer Science in the field

## Steps to validate my Hypothesis
1. EDA: let's explore the data
2. visualize data
3. analyze and see if my hypothesis is correct

## Step 1: EDA

In [225]:
# imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [226]:
# Plotly imports
import plotly as py
import plotly.graph_objs as go
import plotly.express as px

In [227]:
# original survey_data_2019.csv file size was 150Mb
# had to compress the size down to 50Mb with pickle format
df = pd.read_pickle('./survey_data_2019.pkl')
df_sch = pd.read_csv('./survey_results_schema.csv')

In [228]:
df.head()

Unnamed: 0,Respondent,MainBranch,Hobbyist,OpenSourcer,OpenSource,Employment,Country,Student,EdLevel,UndergradMajor,...,WelcomeChange,SONewContent,Age,Gender,Trans,Sexuality,Ethnicity,Dependents,SurveyLength,SurveyEase
0,1,I am a student who is learning to code,Yes,Never,The quality of OSS and closed source software ...,"Not employed, and not looking for work",United Kingdom,No,Primary/elementary school,,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,14.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
1,2,I am a student who is learning to code,No,Less than once per year,The quality of OSS and closed source software ...,"Not employed, but looking for work",Bosnia and Herzegovina,"Yes, full-time","Secondary school (e.g. American high school, G...",,...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,19.0,Man,No,Straight / Heterosexual,,No,Appropriate in length,Neither easy nor difficult
2,3,"I am not primarily a developer, but I write co...",Yes,Never,The quality of OSS and closed source software ...,Employed full-time,Thailand,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Web development or web design,...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,28.0,Man,No,Straight / Heterosexual,,Yes,Appropriate in length,Neither easy nor difficult
3,4,I am a developer by profession,No,Never,The quality of OSS and closed source software ...,Employed full-time,United States,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech articles written by other developers;Indu...,22.0,Man,No,Straight / Heterosexual,White or of European descent,No,Appropriate in length,Easy
4,5,I am a developer by profession,Yes,Once a month or more often,"OSS is, on average, of HIGHER quality than pro...",Employed full-time,Ukraine,No,"Bachelor’s degree (BA, BS, B.Eng., etc.)","Computer science, computer engineering, or sof...",...,Just as welcome now as I felt last year,Tech meetups or events in your area;Courses on...,30.0,Man,No,Straight / Heterosexual,White or of European descent;Multiracial,No,Appropriate in length,Easy


The shape of the data is 88883 x 85

In [229]:
df.shape

(88883, 85)

In [230]:
# let's see what each column survey question asks
pd.set_option('display.max_rows', df_sch.shape[0]+1)
print(df_sch)

Column                                       QuestionText
0               Respondent  Randomized respondent ID number (not in order ...
1               MainBranch  Which of the following options best describes ...
2                 Hobbyist                            Do you code as a hobby?
3              OpenSourcer        How often do you contribute to open source?
4               OpenSource  How do you feel about the quality of open sour...
5               Employment  Which of the following best describes your cur...
6                  Country          In which country do you currently reside?
7                  Student  Are you currently enrolled in a formal, degree...
8                  EdLevel  Which of the following best describes the high...
9           UndergradMajor  What was your main or most important field of ...
10                EduOther  Which of the following types of non-degree edu...
11                 OrgSize  Approximately how many people are employed by ...
12    

# 

In [231]:
# types of each column
df.dtypes

Respondent                  int64
MainBranch                 object
Hobbyist                   object
OpenSourcer                object
OpenSource                 object
Employment                 object
Country                    object
Student                    object
EdLevel                    object
UndergradMajor             object
EduOther                   object
OrgSize                    object
DevType                    object
YearsCode                  object
Age1stCode                 object
YearsCodePro               object
CareerSat                  object
JobSat                     object
MgrIdiot                   object
MgrMoney                   object
MgrWant                    object
JobSeek                    object
LastHireDate               object
LastInt                    object
FizzBuzz                   object
JobFactors                 object
ResumeUpdate               object
CurrencySymbol             object
CurrencyDesc               object
CompTotal     

In [232]:
# what are the numeric answers?
df.select_dtypes(include=np.number).columns.tolist()

['Respondent',
 'CompTotal',
 'ConvertedComp',
 'WorkWeekHrs',
 'CodeRevHrs',
 'Age']

In [233]:
# Are there null values?
df.isna().sum()

Respondent                    0
MainBranch                  552
Hobbyist                      0
OpenSourcer                   0
OpenSource                 2041
Employment                 1702
Country                     132
Student                    1869
EdLevel                    2493
UndergradMajor            13269
EduOther                   4623
OrgSize                   17092
DevType                    7548
YearsCode                   945
Age1stCode                 1249
YearsCodePro              14552
CareerSat                 16036
JobSat                    17895
MgrIdiot                  27724
MgrMoney                  27726
MgrWant                   27651
JobSeek                    8328
LastHireDate               9029
LastInt                   21728
FizzBuzz                  17539
JobFactors                 9512
ResumeUpdate              11006
CurrencySymbol            17491
CurrencyDesc              17491
CompTotal                 32938
CompFreq                  25615
Converte

In [234]:
# let's see if I can filter out non-numeric columns from df
df_obj = df.select_dtypes(['object'])
print(df_obj.dtypes)

MainBranch                object
Hobbyist                  object
OpenSourcer               object
OpenSource                object
Employment                object
Country                   object
Student                   object
EdLevel                   object
UndergradMajor            object
EduOther                  object
OrgSize                   object
DevType                   object
YearsCode                 object
Age1stCode                object
YearsCodePro              object
CareerSat                 object
JobSat                    object
MgrIdiot                  object
MgrMoney                  object
MgrWant                   object
JobSeek                   object
LastHireDate              object
LastInt                   object
FizzBuzz                  object
JobFactors                object
ResumeUpdate              object
CurrencySymbol            object
CurrencyDesc              object
CompFreq                  object
WorkPlan                  object
WorkChalle

In [235]:
# let's filter out numeric columns
df_num = df.select_dtypes(np.number)
print(df_num.shape)
print(df_num.dtypes)

(88883, 6)
Respondent         int64
CompTotal        float64
ConvertedComp    float64
WorkWeekHrs      float64
CodeRevHrs       float64
Age              float64
dtype: object


## Startegy
In order to validate my hypothesis, I would like to compare Salaries for two different gropus. 
1. B.S. degree in Computer Science or similar computer related study
2. Coding Bootcamp graduates

Stackoverflow gives two columns, one with 'EdLevel' where I can figure who got B.S. degree in computer science or similar. 

In [236]:
# see if EdLevel has null values
df['EdLevel'].isna().sum()

2493

In [237]:
# see how the value counts for each categorical value
df['EdLevel'].value_counts()

Bachelor’s degree (BA, BS, B.Eng., etc.)                                              39134
Master’s degree (MA, MS, M.Eng., MBA, etc.)                                           19569
Some college/university study without earning a degree                                10502
Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)     8642
Associate degree                                                                       2938
Other doctoral degree (Ph.D, Ed.D., etc.)                                              2432
Primary/elementary school                                                              1422
Professional degree (JD, MD, etc.)                                                     1198
I never completed any formal education                                                  553
Name: EdLevel, dtype: int64

In [238]:
df['EduOther'].value_counts().index.tolist()

ipated in online coding competitions (e.g. HackerRank, CodeChef, TopCoder);Participated in a hackathon;Contributed to open source software',
 'Participated in a full-time developer training program or bootcamp;Completed an industry certification program (e.g. MCPD);Received on-the-job training in software development;Taught yourself a new language, framework, or tool without taking a formal course;Participated in online coding competitions (e.g. HackerRank, CodeChef, TopCoder);Participated in a hackathon',
 'Taken an online course in programming or software development (e.g. a MOOC);Participated in a full-time developer training program or bootcamp;Completed an industry certification program (e.g. MCPD);Participated in online coding competitions (e.g. HackerRank, CodeChef, TopCoder);Participated in a hackathon;Contributed to open source software',
 'Taken an online course in programming or software development (e.g. a MOOC);Taken a part-time in-person course in programming or software 

In [239]:
df['EduOther'].iloc[25875]
str_self_taught = 'Taught yourself a new language, framework, or tool without taking a formal course;Contributed to open source software'

In [240]:
# let's visualize education level
data = go.Histogram(
    x=df['EdLevel'],
    # histnorm='percent'
    )
fig = go.Figure(data=[data])
fig.update_layout(
    title_text='Education Level', # title of plot
    xaxis_title_text='Counts', # xaxis label
    yaxis_title_text='Ed Level', # yaxis label
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)
fig.update_yaxes(automargin=True)
fig.show()

In [241]:
# let's define df for only United States Country respondent
df_us = df[df['Country'] == 'United States']

In [242]:
df_us['EdLevel'].value_counts()

Bachelor’s degree (BA, BS, B.Eng., etc.)                                              10953
Master’s degree (MA, MS, M.Eng., MBA, etc.)                                            3585
Some college/university study without earning a degree                                 2779
Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)     1113
Associate degree                                                                        977
Other doctoral degree (Ph.D, Ed.D., etc.)                                               673
Primary/elementary school                                                               310
Professional degree (JD, MD, etc.)                                                      119
I never completed any formal education                                                   96
Name: EdLevel, dtype: int64

In [243]:
# Salary with education levels
print('salary median for B.S. degree in all majors\n', df_us.groupby(['EdLevel'])['ConvertedComp'].median())
str_bs_ed_level = 'Bachelor’s degree (BA, BS, B.Eng., etc.)'

salary median for B.S. degree in all majors
 EdLevel
Associate degree                                                                       90000.0
Bachelor’s degree (BA, BS, B.Eng., etc.)                                              110000.0
I never completed any formal education                                                123500.0
Master’s degree (MA, MS, M.Eng., MBA, etc.)                                           125000.0
Other doctoral degree (Ph.D, Ed.D., etc.)                                             140000.0
Primary/elementary school                                                             120000.0
Professional degree (JD, MD, etc.)                                                    106500.0
Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)     95000.0
Some college/university study without earning a degree                                107000.0
Name: ConvertedComp, dtype: float64


## result
B.S. undergraduate degree salary median is 110,000 USD.
However, I think I need to filter only C.S. or similar degree here. That way, I expect to get higher median value. Another way to say is that I assum non-CS major in SW engieering field will have lower salary. 

In [244]:
# what majors do i have
df_us['UndergradMajor'].value_counts()

Computer science, computer engineering, or software engineering          10747
Another engineering discipline (ex. civil, electrical, mechanical)        1344
Information systems, information technology, or system administration     1186
A natural science (ex. biology, chemistry, physics)                        961
Mathematics or statistics                                                  895
A humanities discipline (ex. literature, history, philosophy)              718
A social science (ex. anthropology, psychology, political science)         695
Fine arts or performing arts (ex. graphic design, music, studio art)       667
A business discipline (ex. accounting, finance, marketing)                 615
Web development or web design                                              493
I never declared a major                                                   345
A health science (ex. nursing, pharmacy, radiology)                         87
Name: UndergradMajor, dtype: int64

In [245]:
# get the string for C.S or similar major
str_cs_degree = df_us['UndergradMajor'].iloc[3733]
str_cs_degree

'Computer science, computer engineering, or software engineering'

In [246]:
# Salary with education levels specifically CS major
condition_bs_degree = df_us['EdLevel'] == str_bs_ed_level
condition_cs_bs_major = condition_bs_degree & (df_us['UndergradMajor'] == str_cs_degree)

# there is something wrong with 10953 because number of bs degree cannot be number of cs undergrad major
# I need to show that b.s. degree filter has all majors
bs_values = df_us[condition_bs_degree]['EdLevel'].value_counts()
bs_cs_values = df_us[condition_cs_bs_major]['EdLevel'].value_counts()
print('number of people who has B.S. degree in all majors\n', bs_values)
print('number of people who has B.S. degree in computer science or similar\n', bs_cs_values)
print('salary median for people with B.S. degree in C.S and their education level is B.S.\n', df_us[condition_cs_bs_major].groupby(['EdLevel'])['ConvertedComp'].median())

number of people who has B.S. degree in all majors
 Bachelor’s degree (BA, BS, B.Eng., etc.)    10953
Name: EdLevel, dtype: int64
number of people who has B.S. degree in computer science or similar
 Bachelor’s degree (BA, BS, B.Eng., etc.)    6449
Name: EdLevel, dtype: int64
salary median for people with B.S. degree in C.S and their education level is B.S.
 EdLevel
Bachelor’s degree (BA, BS, B.Eng., etc.)    110250.0
Name: ConvertedComp, dtype: float64


## result
The median value for C.S. or similar undergraduate B.S. degree is more or less the same than overal B.S. degree people. Therefore, my assumption that 'C.S. degree median salary will be higher' is wrong.

In [247]:
# here we have to update years of coding as pforessional column string values to integere values
df_us['YearsCodePro'] = df_us['YearsCodePro'].apply(lambda x: 0.5 if x == 'Less than 1 year' else (50 if x == 'More than 50 years' else x))

In [248]:
df_us['YearsCodePro'].sample(20)

4865       5
31436      2
67575     12
73448     34
46822    0.5
75967    0.5
77366      1
79736     15
65415      5
73205    NaN
66628     12
18351      4
1393      20
42054     12
36980      4
42641     12
2835      13
38443    NaN
1161       5
16381     10
Name: YearsCodePro, dtype: object

In [249]:
# I'm not sure why but below astype cannot take care of NaN
# df_us['YearsCodePro'] = df_us['YearsCodePro'].astype(int) 
df_us['YearsCodePro'] = pd.to_numeric(df_us['YearsCodePro'])

In [250]:
# see if the type updated
df_us['YearsCodePro'].dtypes

dtype('float64')

In [251]:
edu_level = df_us.pivot_table(
    index = 'EdLevel',
    columns='YearsCodePro',
    values='ConvertedComp',
    aggfunc={
        'ConvertedComp': np.median
        }
    )
bs_degree = edu_level[edu_level.index == str_bs_ed_level]
bs_degree

YearsCodePro,0.5,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,...,40.0,41.0,42.0,43.0,44.0,45.0,47.0,48.0,49.0,50.0
EdLevel,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"Bachelor’s degree (BA, BS, B.Eng., etc.)",70700.0,75000.0,81000.0,90000.0,95000.0,101000.0,106000.0,111000.0,120000.0,130000.0,...,132500.0,133400.0,125000.0,92000.0,115000.0,180000.0,145000.0,157500.0,176000.0,


In [252]:
edu_level = df_us[condition_cs_bs_major].pivot_table(
    index = 'EdLevel',
    columns='YearsCodePro',
    values='ConvertedComp',
    aggfunc={
        'ConvertedComp': np.median
        }
    )
bs_cs_degree = edu_level[edu_level.index == str_bs_ed_level]
bs_cs_degree

YearsCodePro,0.5,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,...,37.0,38.0,39.0,40.0,41.0,42.0,43.0,44.0,45.0,47.0
EdLevel,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"Bachelor’s degree (BA, BS, B.Eng., etc.)",72900.0,75000.0,82000.0,90000.0,95000.0,105000.0,105000.0,114500.0,130000.0,130000.0,...,123500.0,145000.0,150000.0,155000.0,133400.0,90000.0,91000.0,115000.0,180000.0,220000.0


In [253]:
edu_other = df_us.pivot_table(
    index = 'EduOther',
    columns='YearsCodePro',
    values='ConvertedComp',
    aggfunc={
        'ConvertedComp': np.median
        }
    )
# edu_other.head()
# bootcamp.shape
str_bootcamp_graduate = 'Participated in a full-time developer training program or bootcamp'
bootcamp_graduate = edu_other[edu_other.index == str_bootcamp_graduate]

bootcamp_graduate

YearsCodePro,0.5,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,...,40.0,41.0,42.0,43.0,44.0,45.0,47.0,48.0,49.0,50.0
EduOther,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Participated in a full-time developer training program or bootcamp,68000.0,71000.0,83750.0,88500.0,87000.0,112000.0,567500.0,,65000.0,,...,,,,,,,,,,


In [254]:
years_code_pro = bootcamp_graduate.columns.tolist()

fig = go.Figure(
    data=[
        go.Bar(name='B.S in Computer Science', x=years_code_pro, y=bs_cs_degree.values.tolist()[0]),
        go.Bar(name='Bootcamp Graduates', x=years_code_pro, y=bootcamp_graduate.values.tolist()[0])
    ]
)
# Change the bar mode
fig.update_layout(
    title='Salary Comparison between CS degree and Bootcamp',
    xaxis_title='number of years of professional coding',
    yaxis_title='Salaray(median) in dollars',
    barmode='group'
    )
fig.show()

# Plot Result
I have an insane salary median peak for bootcamp graduate at 6 years of professional coding expereience. I am very skeptical about it. This can't be right unless someone went to coding bootcamp years ago and he or she quickly went IPO and somehow he or she submitted stackoverflow survey. Also, there should not many people in that condition unless otherwise the insane peak cannot be shown. Therefore, let's investigate what's going on here.   

In [255]:
# we will find if the 6 years of pro coding exp people really have insane median peak here
condition_4 = (df['EduOther'] == str_bootcamp_graduate)
bootcamp_grad_salary = df_us[condition_4]['ConvertedComp']
bootcamp_grad_years = df_us[condition_4]['YearsCodePro']
bootcamp_grad_comp_per_years_med = df_us[condition_4].groupby(['YearsCodePro'])['ConvertedComp'].median()
condition_5 = condition_4 & (df_us['YearsCodePro'] == 6)
bootcamp_grad_with_6_yrs = df_us[condition_5]['ConvertedComp']
print(bootcamp_grad_with_6_yrs)

67663    1000000.0
73526     135000.0
Name: ConvertedComp, dtype: float64


## Investigation Result
we found that index 67663 gave 1,000,000 for his convertedComp, salary
which is either a additional digit mistake or on purpose
1. fix digit
2. drop the row

Let's fix the digit assuming the respondent had a typo.

In [256]:
# let's fix the digit typo
df_us.set_value(67663, 'ConvertedComp', 100000)
print('is it fixed?', df_us[df_us.index == 67663]['ConvertedComp'])

is it fixed? 67663    100000.0
Name: ConvertedComp, dtype: float64


In [257]:
# let's plot it again with fixed typo
edu_level = df_us[condition_cs_bs_major].pivot_table(
    index = 'EdLevel',
    columns='YearsCodePro',
    values='ConvertedComp',
    aggfunc={
        'ConvertedComp': np.median
        }
    )
edu_other = df_us.pivot_table(
    index = 'EduOther',
    columns='YearsCodePro',
    values='ConvertedComp',
    aggfunc={
        'ConvertedComp': np.median
        }
    )
bootcamp_graduate_fix = edu_other[edu_other.index == 'Participated in a full-time developer training program or bootcamp']


fig = go.Figure(
    data=[
        go.Bar(name='B.S in Computer Science', x=years_code_pro, y=bs_cs_degree.values.tolist()[0]),
        go.Bar(name='Bootcamp Graduates', x=years_code_pro, y=bootcamp_graduate_fix.values.tolist()[0])
    ]
)
# Change the bar mode
fig.update_layout(
    xaxis=dict(range=[0, 10], dtick=1),
    title='Salary Comparison between CS degree and Bootcamp',
    xaxis_title='number of years of professional coding',
    yaxis_title='Salaray(median) in dollars',
    barmode='group'
    )
fig.show()

## Plot
plot is now updatedd with fixed data.
I trimmed years of coding prefossional down to 10 years max because according to wiki it says the coding bootcamp has been for 9 years. 


# Final Analysis
My hypothesis was
The bootcamp graduates are valued as equal as B.S. degree in Computer Science in the field

I think i came up with reasonable data which supports my hypothesis because the first 6 years of professional career years, the salary medians are pretty close each other. For range from 5 to 6 years, coding bootcamp graduates were even more valued in the field.

Of course, I think we need more years to validate this idea because coding bootcamp graduates gorup is a lot less in terms of number of people in the group. 



In [262]:
# let's find numbers for each group
cond_bootcamp_graduate = df_us['EduOther'] == str_bootcamp_graduate
cond_bs = df_us['EdLevel'] == str_bs_ed_level
cond_cs = df_us['UndergradMajor'] == str_cs_degree
cond_bs_cs = cond_bs & cond_cs
years_bootcamp = df_us[cond_bootcamp_graduate]['YearsCodePro'].value_counts()
years_bs_cs = df_us[cond_bs_cs]['YearsCodePro'].value_counts()

In [267]:
# visualize how quickly bootcamp graduate grow as well as CS degree graduate
fig = go.Figure(
    data=[
        go.Bar(name='CS Graduates', x=years_code_pro, y=years_bs_cs.values.tolist()),
        go.Bar(name='Bootcamp Graduates', x=years_code_pro, y=years_bootcamp.values.tolist())
    ]
)
# Change the bar mode
fig.update_layout(
    xaxis=dict(range=[0, 10], dtick=1),
    title='Number Comparison between CS degree and Bootcamp',
    xaxis_title='number of professional coding years',
    yaxis_title='Counts',
    barmode='group'
    )
fig.show()

## Thoughts
Number of bootcamp graduates are significantly small relative to CS degree, so I cannot have a strong opinion on the salary comparison result. According to the [careerkarma.com](https://careerkarma.com/blog/bootcamp-market-report-2020/), a number of bootcamp graduates in U.S. in 2019 was 33,959(only 0.3% of the whole population responded via stackoverflow). Therefore, as the number of bootcamp gradutes grow quickly, we will have better analysis on values in the field. 

In [268]:
# condition for B.S. in computer science or similar major in U.S
cond_cs_mjr = df_us['UndergradMajor'] == str_cs_degree
cond_bs_dgr = df_us['EdLevel'] == str_bs_ed_level
cond_less_3_yrs = df_us['YearsCodePro'] < 3
cond_more_3_yrs = (df_us['YearsCodePro'] >= 3) & df_us['YearsCodePro'] <= 9

result_bs_cs_less_3 = df_us[cond_cs_mjr & cond_bs_dgr & cond_less_3_yrs].groupby(['EdLevel'])['ConvertedComp'].median()
result_bs_cs_more_3 = df_us[cond_cs_mjr & cond_bs_dgr & cond_more_3_yrs].groupby(['EdLevel'])['ConvertedComp'].median()
print('Compensation with B.S. in CS with less than 3 years of professional coding expereince\n', result_bs_cs_less_3)
print('Compensation with B.S. in CS with more than 3 years of professional coding expereince\n', result_bs_cs_more_3)

Compensation with B.S. in CS with less than 3 years of professional coding expereince
 EdLevel
Bachelor’s degree (BA, BS, B.Eng., etc.)    79644.0
Name: ConvertedComp, dtype: float64
Compensation with B.S. in CS with more than 3 years of professional coding expereince
 EdLevel
Bachelor’s degree (BA, BS, B.Eng., etc.)    110250.0
Name: ConvertedComp, dtype: float64


In [269]:
# I need to show exact median values for B.S. degree vs I never completed formal education


condition_1 = df['EduOther'] == str_bootcamp_graduate
bootcamp_grad_comp = df_us[condition_1].groupby(['EduOther'])['ConvertedComp'].median()

condition_2 = condition_1 & (df_us['YearsCodePro'] <= 3)
result_bootcamp_grad_three_years_less_exp_comp = df_us[condition_2].groupby(['EduOther'])['ConvertedComp'].median()

condition_3 = condition_1 & (df_us['YearsCodePro'] >= 3)
result_bootcamp_grad_three_years_more_exp_comp = df_us[condition_3].groupby(['EduOther'])['ConvertedComp'].median()

print('boot camp grad compensation\n', bootcamp_grad_comp)
print('\n')
print('boot camp grad with less than 3 years of professional coding expereince compensation\n', result_bootcamp_grad_three_years_less_exp_comp)
print('\n')
print('boot camp grad with more than 3 years of professional coding expereince compensation\n', result_bootcamp_grad_three_years_more_exp_comp)

boot camp grad compensation
 EduOther
Participated in a full-time developer training program or bootcamp    84375.0
Name: ConvertedComp, dtype: float64


boot camp grad with less than 3 years of professional coding expereince compensation
 EduOther
Participated in a full-time developer training program or bootcamp    80000.0
Name: ConvertedComp, dtype: float64


boot camp grad with more than 3 years of professional coding expereince compensation
 EduOther
Participated in a full-time developer training program or bootcamp    91500.0
Name: ConvertedComp, dtype: float64


# Takeaways
1. Coding Bootcamp is not a bad choice in terms of values in the field (if hired)
2. 